I have a problem with writing this article because, as in many other areas, a rational explanation of the path to the selected solution is in fact very artificial, and so weak from the intellectual point of view: the serious approach would be to test all reasonably simple solutions and report the problems met with each, but that is not possible in a decent number of lines. The real inventor's effective process in fact already has both kinds of trouble, since it is a mixture of intuition (artificial) and experiments (long); so, at reporting time, fully assuming the artificial weakness is the obvious choice, since it mostly cannot be avoided anyway.
History of mainstream storage standards
The first level of storage was laid down by Unix in the seventies: permanent data are stored on disk as a tree of files, where each file is a flat sequence of bytes. This concept has proved over the years to be good because it is both simple and quite flexible. (1)
Some kinds of data fit nicely into the file notion, such as ASCII text files or images: in very few words, most data that you load at some point in the application, then process in main memory, and finally store back to disk. On the other hand, database information (tables containing large sets of records, each containing several fields) does not fit that nicely into the tree-of-files concept. One of the reasons is: what should we map a file to? A whole database, a table, a record or a field? If we map a database, or even a table, to a file, then we have an integrity issue, because the complex in-file layout required to map a whole table efficiently into a single file might become inconsistent in case of an application crash. On the other hand, if we decide to map each record, or even each field, to a file, then most file systems will collapse in both speed and space usage. This shows that two concepts are missing in the Unix file system layout: transactions (guaranteeing that either all operations or none will be applied, but never only some, so that in the end the complex on-disk layout is guaranteed to remain consistent) and efficient handling of large sets of very small data. These two extra concepts have been added at application level in relational database servers, so that in most modern computing systems, two storage mechanisms are used simultaneously: the Unix file system for applications that load and save, and a relational database server for handling large sets of small data.
Constraints and opportunities provided by hardware evolution
Back in 1984, a hard disk's typical performance figures were:
• transfer rate: 100 KB/s
• access time: 100 ms
In 2008, they are:
• transfer rate: 50 MB/s
• access time: 10 ms
So, in roughly 25 years, the transfer (continuous read or write) rate has been multiplied by 500, but the access time has been divided by only 10. As a result, a modern computer hard disk should be seen more as a tape than as a device with direct access capabilities (though this might change with emerging flash memory devices). The few remaining slow behaviors of modern desktop computers (an application needing several seconds to start up, or a disk check lasting many minutes) are a direct result of this fact: most computer capabilities are growing exponentially, but access time (and network latency) is not.
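To make this concrete, here is a back-of-envelope computation (a plain Python sketch of mine, using the 2008 figures above) comparing 1000 scattered 1 KB reads with one sequential 1 MB read:

    # 2008 figures: 10 ms per random access, 50 MB/s transfer rate
    access_time = 0.010        # seconds per random access
    transfer_rate = 50e6       # bytes per second

    scattered = 1000 * (access_time + 1024 / transfer_rate)
    sequential = access_time + 1000000 / transfer_rate
    print(scattered, sequential)   # roughly 10.02 vs 0.03 seconds

Reading the same megabyte is some 300 times slower when it is scattered, hence the disk-as-tape picture.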
In order to cope with disk access slowness, all modern operating systems do disk caching: some of the on-disk data are also stored in the main memory, so that they can be read and, more importantly, accessed much faster. Now, if we focus more specifically on database storage, the direct consequence of disk technology evolution is that under any decent load, a database server will behave smoothly only if most of the frequently accessed data are stored in the operating system (or database server) disk cache. Well, since a modern server's memory can be counted in GB, it is no longer a serious issue to have a copy of all the database files in main memory for anything but very large databases. As a result, the next question is: if we have all database data in main memory, what is the disk useful for? The obvious answer is: it is only a backup of the main memory, in case of a crash. So, the next question after that is: since memory and disk are used to provide very different features (respectively processing and restoring), is it still a good idea to have the same data structure in memory and on disk, I mean to organize in-memory data the same way as on disk? (2) Well, I bet you will not be surprised if I now tell you that my answer is just no.
As a summary: at step 1, we see that the slow evolution of disk access time, combined with cheap, huge amounts of main memory, now makes it mandatory to have most of the database not only on the disk, but also in the main memory, in order to fully benefit from modern server capabilities. Then, at step 2, we see that if we decide to have not most of the database but all of it in memory, then we get some extra freedom: the in-memory and on-disk data organizations no longer need to be the same, so each of them can be optimized according to what it is really used for.
The computer is a pile of caches: main memory organization
As soon as I asserted that we are going to process the database in main memory instead of on the disk, I bet many of you heard a bell immediately ringing in your head: what if we run out of memory because the database keeps growing?
Rather than answering directly, I'll first provide a very high-level overview of what a modern computer truly is. First of all, let's say that a modern computer is just a pile of caches:
• processor registers
• processor L1 cache (8 KB)
• processor L2 and L3 caches (4 MB)
• main memory (2 GB)
• disk (250 GB)
So, if we want to keep the freedom of different organizations on disk and in memory, yet have performance degrade slowly when memory is full instead of stopping abruptly, all we need is to split the data set into several pieces (objects) so that not all of them are used at a given time. As a result, should we need to free some memory, we can just drop some objects not used at the moment, because we will be able to rebuild them from the on-disk log when needed, at the price of some performance penalty.
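A minimal sketch of this drop-and-rebuild policy (plain Python; the rebuild_from_log function stands for whatever replays the on-disk log for one object, and is my assumption, not Pliant's actual interface):

    # Hypothetical object cache: objects can be dropped at any time
    # and rebuilt later by replaying the on-disk log.
    class ObjectCache:
        def __init__(self, rebuild_from_log):
            self.rebuild_from_log = rebuild_from_log
            self.objects = {}                # identifier -> object

        def get(self, object_id):
            if object_id not in self.objects:
                # cache miss: pay the rebuild penalty once
                self.objects[object_id] = self.rebuild_from_log(object_id)
            return self.objects[object_id]

        def drop(self, object_id):
            # called under memory pressure: losing the object is safe,
            # since the log allows rebuilding it later
            self.objects.pop(object_id, None)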
There are mainly three different ways to organize main memory:
• Explicit allocation: the program not only reserves a memory area when it needs one, but also has to explicitly release it when it will not be used any more. This is what the C and C++ languages do through the well-known malloc/free functions. The problem is that explicit releasing has to be done in many, many places in a program, so it is both boring (and risky, because of the high probability of stupid mistakes) and sometimes very difficult (when the object is shared).
• Reference counting: pointers automatically increment and decrement a counter on the object they point to, so that when the counter falls to zero, the object can be automatically released. This is what Python uses. The problem is that any circular reference will prevent the automatic releasing from working (see the sketch after this list), so finding a working layout can be difficult or even impossible for very complex data layouts.
• Garbage collection: pointers follow strict rules so that when memory is exhausted, an automatic process can scan all memory and safely decide what can be released. This is used by most if not all logical programming languages, plus most modern high-level languages such as Java. There are two problems with this mechanism: first, the rules prevent very low-level programming in performance-critical areas; second, performance is good only if the available memory is significantly larger than the memory really in use, so that scanning does not happen too often. (3)
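Here is a minimal Python sketch of the circular reference problem: the two objects keep each other's counter above zero, so pure reference counting never frees them (CPython works around this with an extra cycle detector in its gc module):

    import gc

    class Node:
        def __init__(self):
            self.other = None

    a = Node()
    b = Node()
    a.other = b    # a holds a reference to b
    b.other = a    # b holds a reference to a: a cycle

    del a, b       # both counters stay at 1: refcounting alone leaks them
    print(gc.collect())   # the cycle detector reports the objects it freed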
Pliant uses a mixture of explicit allocation, reference counting and a global cache.
At the beginning of the article, we specified that in any modern operating system, on-disk data are stored in a tree of files. In Pliant, each storage object loaded in memory is placed in a tree of objects called the global cache. I am not aware of any other mainstream language providing an equivalent of the Pliant global cache, even though it could be implemented with more or less success in most of them.
Anyway, the view of the pile of caches as the spinal column of a modern computer's memory layout gives you the design reason for Pliant to use reference counting instead of a garbage collector. The most important part is the global cache, and reference counting is a better way to support it, because a garbage collector would prevent filling most of the memory without a serious performance drop, and would bring constraints on the way low-level operations are handled.
Pliant storage model
The data organization adopted by the Pliant storage system is a set of objects, with each object stored:
• in the main memory, as a mostly unconstrained data structure (a graph) in the global cache
• on disk, as a log of changes
Any implementation attempt shows that the only serious issue with this new model is to find a way to identify, in the on-disk log, the in-memory nodes involved in each change. In memory, a node is identified by its address, but that is not a permanent identity, since it is likely to change with each restart of the application. Several solutions exist, but they are beyond the scope of this introduction.
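To make the problem concrete, here is one hypothetical, simplified solution (my sketch, not Pliant's actual mechanism): assign each node a permanent identifier the first time it is logged, and record every change against that identifier rather than against a memory address:

    import json

    next_id = 0
    node_ids = {}    # memory address -> permanent identifier

    def permanent_id(node):
        # assign a stable identifier the first time a node is logged
        global next_id
        if id(node) not in node_ids:
            node_ids[id(node)] = next_id
            next_id += 1
        return node_ids[id(node)]

    def log_change(log_file, node, field, value):
        # one self-describing line per change, replayable after a restart
        record = {"node": permanent_id(node), "field": field, "value": value}
        log_file.write(json.dumps(record) + "\n")
        log_file.flush()   # a real system would also fsync here

Replaying the log after a restart is then a matter of reading it line by line and reapplying each change to the node it names.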
Let's finish this introduction to Pliant storage by listing a few properties of the Pliant storage model:
Providing an infinite log is very easy and enables two very important things in open systems:
• It is easy to answer the question: who modified this data, and when?
• In case an account has been stolen and used to modify some data, it is possible to selectively rewind the modifications performed using the stolen account between two specified dates (see the sketch after this list).
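A minimal sketch of such a selective rewind, over the hypothetical log format sketched earlier, extended with an account and a date per entry (again my assumption, not Pliant's actual format): rebuild the state while skipping the suspicious changes.

    import json
    from datetime import datetime

    def replay_without(log_path, stolen_account, start, end):
        # rebuild the state from scratch, dropping the stolen account's
        # changes inside the given time window
        state = {}
        with open(log_path) as log:
            for line in log:
                change = json.loads(line)
                when = datetime.fromisoformat(change["date"])
                if change["account"] == stolen_account and start <= when <= end:
                    continue   # selectively rewind this change
                state[(change["node"], change["field"])] = change["value"]
        return state

The same account and date fields answer the "who modified this data, and when?" question for free.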
Synchronizing several distant servers is not too complex, since it is mostly a matter of sending the log of each of them to all the others and applying it.
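A sketch of the idea (assuming, as a simplification of mine, that each server remembers how far into each peer's log it has already read):

    def apply_change(state, change):
        # same replay step as crash recovery
        state[(change["node"], change["field"])] = change["value"]

    def synchronize(local_state, peer_log, position):
        # apply every peer change we have not seen yet,
        # and remember how far we got for next time
        for change in peer_log[position:]:
            apply_change(local_state, change)
        return len(peer_log)   # new position for this peer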
Desktop documents (XML trees), which do not fit nicely into the mainstream relational database model, do fit nicely on top of the Pliant storage model, so that we can get rid of the load-and-save feature of desktop applications, and do not need to reinvent a dirty little logging feature in each application (for coping with unexpected crashes).
Pliant database model
The Pliant database model sits on top of the Pliant storage model. The data layout model adopted by the Pliant database engine is not the mainstream relational model, but a tree model.
Let's start by explaining what the relational model is, what the Pliant model is, and the properties attached to each of them.
The relational model says that a database is a set of tables, a table is a set of records, and each record in a table contains the same set of fields. Then, 1 to 1, 1 to n and n to n relations are established through fields having the same value in different tables. As an example, in the shopping database we use as the example in the 'gentle introduction to using Pliant databases' article, an order contains n articles, and the connection between the order and its articles (a subset of all the article records stored in the single articles table) is established using a field in the articles table that has to match the corresponding field value in the order record.
On the other hand, the Pliant database model is closer to what people do when they handle data directly in their favorite programming language, without the help of a specific storage engine. Back to the sample: an order is a record, and it has a field which is a set of articles (an embedded subtable).
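Here is how the same order might look under each model (my own illustrative encoding in Python, not actual Pliant or SQL syntax): the relational version spreads the order across two flat tables joined by an order_id field, while the tree version embeds the articles directly in the order.

    # Relational layout: two flat tables related through order_id
    orders   = [{"order_id": 42, "customer": "Alice"}]
    articles = [{"order_id": 42, "item": "pen",    "quantity": 3},
                {"order_id": 42, "item": "eraser", "quantity": 1}]

    # Tree layout: the articles are an embedded subtable of the order
    order = {"order_id": 42, "customer": "Alice",
             "articles": [{"item": "pen",    "quantity": 3},
                          {"item": "eraser", "quantity": 1}]}

In the relational layout, fetching a whole order means scanning (or indexing) the articles table for matching order_id values; in the tree layout, loading the order object brings its articles with it.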
From the theory point of view, the relational model is an arbitrary graph with all relations being equal, whereas the Pliant model forces you to extract a main tree from that graph, with two different classes of relations: the ones matching the main tree structure are easier and more efficient to use than the ones crossing the tree.
Each system provides an immediate advantage:
Since the relational model keeps all relations equal (the database engine automatically decides to create indexes to speed up the most often used 1 to n relations), it adapts much better if the main usage of the database changes over time, so that the main tree becomes different. Moreover, n to n relations are much better handled, because they fit naturally and efficiently into the model, whereas with the Pliant tree model, all n to n relations cannot be part of the main tree, so they have to be handled in the harder, less efficient way. In summary, the relational model is a much more general model, so in the end it handles complex relations more cleanly and more efficiently.
On the other hand, many databases have a fairly evident main tree. In our sample database: an order with its set of articles. In such a situation, the locality of the Pliant tree model can be a serious advantage. If we want to display a single form to a user showing an order, and the order is not yet loaded in main memory, then with the Pliant model we will do a single disk access to read the order record, and it will load all the embedded subtables (a single articles one in our simple sample database) at once. On a relational database server, the order record is read, then for each subtable, one or more extra disk accesses are required (remember that direct access on modern disks is painfully slow) to read the related records in the other tables involved in the 1 to n relations. In other words, the tree model better matches the Pliant storage global cache notion.
Then, we have to study how the application deals with the data. Only afterwards will it be possible to conclude what each system seems best suited for.
The relational model is nearly always associated with the SQL access language. SQL is a logical programming language, and in fact the only logical programming language that has succeeded in becoming mainstream. Logical programming is opposed to procedural programming: in procedural programming, you describe to the computer how to build the result, step by step; in logical programming, you describe the assertions the result must match, and the computer deduces a way to compute it. So, the question is: why did nearly all database servers end up using logical programming, whereas nearly everything else ended up using procedural programming? The answer is twofold. First, we have seen that the relational model treats all relations as equal, so that the database engine is (at least in theory) responsible for deciding what indexes it creates and maintains to speed up operations; using a logical programming language for queries maximizes the freedom of the database engine, because it will also decide the way to compute the result of the queries. Second, the main difference between logical and procedural programming is that logical programming is better because it enables more elegant solutions to some problems, but also worse because it requires the programmer to be more clever (because it is more abstract), so that it narrows rather than amplifies programmers' capabilities. (4) Database queries are an area where most programs are either trivial or quite simple, so logical programming, which gives the database engine the freedom to maximize the benefits of the relational model at the expense of limiting programmers' capabilities, makes sense in the end.
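To illustrate the difference with a toy example of mine: the SQL request below (shown as a comment) states what the result must be, while the procedural version spells out how to build it, step by step.

    # Logical (declarative) style, stated here as a comment:
    #   SELECT customer, SUM(total) FROM orders GROUP BY customer
    # The engine alone decides about indexes, scan order and algorithm.

    # Procedural style: the programmer spells out every step.
    def totals_by_customer(orders):
        totals = {}
        for order in orders:
            customer = order["customer"]
            totals[customer] = totals.get(customer, 0) + order["total"]
        return totals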
On the other hand, relational SQL servers have serious latency issues. The reason is that since the server handles complex on-disk data structures (variable-length fields and indexes), it has to run in its own process, or even on its own server, because a crash can cause serious trouble (5). So operations that require many small requests as opposed to one big one, such as just displaying a form to the operator, tend to be poorly handled, both on the efficiency front and on the program simplicity front. On the other hand, the Pliant storage model, with storage objects that match standard language organization and an infinite on-disk log, makes things efficient (the storage server is the application server and works in main memory), safe (the on-disk format is very simple) and clean to program (the database data are no different from other data).
So, my general conclusion is that the Pliant storage and database models are better suited than the relational model supported by SQL servers for the front end, I mean displaying forms to the operators, keeping data safely on disk (including an infinite log that can be rewound selectively), and live synchronization among several servers over the Internet. On the other hand, the relational model with SQL queries on top is better suited for the back end, I mean computing statistics, due to the ability of the SQL engine to combine the relational model's freedom with logical programming's freedom in order to automatically organize computations efficiently when handling batch requests over large sets of data. Now, the very simple log structure used by Pliant database servers makes it fairly easy to propagate any change on the fly to some SQL server, so that a mixed configuration benefiting from both worlds' advantages is easy to set up, provided the programs modifying data do it on the Pliant database server side.
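A sketch of such on-the-fly propagation (again over my hypothetical log format; the table and column mapping would be application-specific, and field names would need validation in real code):

    import json

    def change_to_sql(line):
        # translate one log entry into a statement for the mirror SQL server
        change = json.loads(line)
        statement = "UPDATE mirror SET %s = ? WHERE node_id = ?" % change["field"]
        return statement, (change["value"], change["node"])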
I will end this article explaining the Pliant storage design with a piece of management-level advice: any ERP (home-made, or home-specialized database system that closely matches the company organization) is likely to fall to such an inconsistent level over time that it needs replacement (6). As a result, using a database system whose logging can easily be parsed to plug changes live into another, completely different system, through filtering of the simple log, is the feature that enables running a mixed configuration at some point, and so makes the transition less scary. In very few words, at project startup time, checking the log your new system will provide might be a better use of your time than checking the features it will provide.
(1)
In the early days, there used to be different proposals for on-disk data organization. On modern operating systems, there have been some attempts to provide database-like storage at the operating system level, or to provide a disk organization better adapted to storing many small files, maybe with richer semantics such as transactions (Reiser4).
(2)
Modern database servers can have a special (better optimized) data encoding for in-memory-only tables (temporary ones), but the general software organization remains disk (block) oriented. There are also some object-oriented database proposals that work very differently from mainstream SQL (relational model) database servers. I just don't know what they are worth.
(3)
There are endless research papers describing solutions to reduce the cost of scanning all memory in garbage collectors. From my point of view, checking a memory allocation mechanism requires writing a database server (huge volumes) and a graphical stack (huge volumes and mandatory low-level programming) on top of it, and seeing how it carries the load.
(4)
This is a very strong assertion, and plenty of experienced computing guys will not agree, and will say that one is just better than the other. Discussions about programming language qualities tend to end in endless flames. Unless somebody points me to another name, I assume I am the single person who has written the largest number of programs in the world, so I'll just back up my assertion with: this is the result of my experience, and its strength is just related to the number of programs I have written.
(5)
On some high-end SQL servers, the danger of data corruption due to a crash is properly reduced through advanced logging features, but it makes the whole thing really complex and has some performance impact. There are also stored procedures, used either to reduce the execution time of tiny requests or to enforce tighter data access rules, but that is not enough to change the general picture if we compare with the Pliant in-memory database model.
(6)
One reason could be poor management by the computing department that was not corrected soon enough by top management, because top management was not aware of (or focused on) computing-field issues. Another reason could be fast changes in company organization that make it really hard to adjust the tightly connected, complex software accordingly.