Pliant database engine layout

The main design issue with the Pliant database sitting on top of Pliant storage is that too many layers are stacked. As a result, the whole thing is fairly hard to understand.

Light data, fat pointers

In the 'The storage machinery layout' article, we have shown that a Pliant database is stored on disk as a PML encoded set of instructions (or as an XML like ASCII file, but that does not matter here). Executing all the instructions is the way to restore the database state in the main memory global cache when the Pliant process is restarted.
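
To make the "replay the instructions" idea concrete, here is a minimal sketch written in Python rather than Pliant, purely as a model. The instruction format (an operation, a slash separated path, a value) and the sample paths are assumptions made for the illustration; the real files are PML encoded or XML like.

```python
# Illustrative model only: the real engine reads PML or XML-like instructions.
def replay(instructions):
    """Rebuild an in-memory tree by executing logged instructions in order."""
    root = {}
    for op, path, value in instructions:
        keys = path.strip("/").split("/")
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        if op == "set":
            node[keys[-1]] = value
        elif op == "delete":
            node.pop(keys[-1], None)
    return root


# Hypothetical log: the path layout is an assumption, not the real format.
log = [
    ("set", "/order/o1/article/a1/count", "1"),
    ("set", "/order/o1/article/a1/ref", "pen"),
    ("set", "/order/o1/article/a1/count", "3"),  # a later instruction wins
]
state = replay(log)
print(state["order"]["o1"]["article"]["a1"])
```

The point of the sketch is only that the file is a history of changes, so the final state is whatever remains after executing every instruction in order.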

Now, once again, the main difficulty is to design the instruction set, and more precisely to translate the in memory pointer notion to disk. Let's get back to the sample shopping database we introduced in 'A gentle introduction to using Pliant databases'.
Let's assume that variable 'a' is a pointer to a record with type 'Article':

var Data:Article a :> ...

and we execute:

a count := 3

The 'count' field of the 'Article' record will be modified in the database object in the computer main memory, but we must also immediately write an instruction to the disk (either to the PML encoded file if the database engine is sitting on top of the storage machinery, or to an XML like ASCII file) so that the change is not lost if the Pliant process crashes soon after.
The only serious problem is how to specify, in the on disk file, which 'Article' record the 'count' field belongs to. Let's start by answering the following question: where do we pick this information from?

We could store the information in each field ('count' in our example would not have type Int but a bigger one, let's say DbInt, that would contain the required references to produce the PML stream) but it would be terribly expensive in terms of memory consumption.

A better approach would be to store the record identifier at record level instead of field level, so the 'Article' data type would turn into:

type Article
  field Str order_id
  field Str article_id
  field Str ref
  field Int count
  field Float ppu

This is what the relational database model does.
Now, the problem still holds: if 'a' is a pointer to an article, and furthermore 'c' is a pointer to an Int, and we then do:

c :> a count
foo c

where 'foo' is a function that modifies the content of 'c', then we still have no way to determine which order and article 'c' refers to.

The selected solution is to use fat pointers for accessing database data. Basically, the pointer contains the path to the data it points to. This explains why, when handling database data, 'c' does not have type Pointer:Int but rather Data:Int.

The same applies at object level: a Pliant database field is not a real Pliant object (it does not have a header with reference count and type pointer) so generic methods cannot be called on it, but the database fat pointer contains an 'interface' field that makes it possible to call database generic methods.
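
The following sketch models the idea in Python (not Pliant, and not the real implementation). The class names 'Database' and 'FatPointer', the log format and the sample path are all hypothetical; the only point is that a pointer carrying its own path lets a function such as 'foo' modify a value and still have the change logged.

```python
class Database:
    """Hypothetical stand-in for the database object: data plus a change log."""

    def __init__(self):
        self.root = {}
        self.log = []


class FatPointer:
    """Pointer that remembers the path leading to the value it designates."""

    def __init__(self, base, path):
        self.base = base
        self.path = tuple(path)

    def field(self, key):
        # Following a field yields a new fat pointer with a longer path.
        return FatPointer(self.base, self.path + (key,))

    def get(self):
        node = self.base.root
        for key in self.path:
            node = node[key]
        return node

    def set(self, value):
        node = self.base.root
        for key in self.path[:-1]:
            node = node.setdefault(key, {})
        node[self.path[-1]] = value
        # The path travels with the pointer, so the change can be logged here.
        self.base.log.append(("/" + "/".join(self.path), value))


def foo(c):
    """Modifies whatever 'c' points to, without knowing where it lives."""
    c.set(c.get() + 1)


db = Database()
a = FatPointer(db, ["order", "o1", "article", "a1"])
c = a.field("count")   # models: c :> a count
c.set(3)
foo(c)
print(db.log)
```

With a plain pointer to an Int, 'foo' would receive nothing but an address, and the second log entry could not be produced.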

Fat pointers detailed content

The support data type for Pliant database fat pointers is defined in module /pliant/storage/database/prototype.pli and named 'Data_'.

In our previous sample where 'c' was a pointer to 'count' field, the 'Data_' fat pointer support fields would have the following content:

   •   'adr' field contains the effective address of the 'count' field in the 'Article' record.
   •   'object' field contains the address of the 'Article' record.
   •   'interface' contains the address of a true Pliant object that will be used to access the generic methods (more details on this later).
   •   'base' field contains the address of the database (among other things, the database contains the semaphore that is needed to make all database operations automatically thread safe).
   •   'path1' and 'path2' contain the path to the pointed field that we would use to write the modify instruction in the on disk PML encoded file.

The effective path of the pointed database field can be obtained at any time by calling the 'path' method or the 'dbpath' method (they are defined in the same module).
The fact that we use 'path1' and 'path2' in the 'Data_' type rather than a simple 'path' field with type 'Str' is just for speed reasons.
Now, 'path' returns the absolute path, and 'dbpath' returns the path relative to the database root path. So, in the PML encoded stream, 'dbpath' will be used as the field unique identifier.

At application level, the fat pointers to database data will have type Data:xxx where xxx is the expected format of the underlying data, so in our sample database, we would have:

var Data:Article a
var Data:Int c :> a count

Each time the application does something on such a pointer (in this example, we get the 'count' field), a generic method on the 'interface' object of the underlying 'Data_' is called.
Here, it will be 'search', with "count" for parameter 'k' (key) value.
The interesting thing is that the 'search' method is responsible for setting all the fields of the resulting underlying 'Data_' fat pointer. So, among other things, it will compute the 'path1' and 'path2' fields of the new underlying 'Data_' fat pointer, and set its 'interface' field.

So, you now know the overall high level machinery of the Pliant database engine: each time the application wants to do something on the database (get a subrecord, create a new subrecord, get a field, change it, etc), a generic method of the 'DataInterface_' object that is pointed by the 'interface' field of the 'Data_' underlying fat pointer data type will be called.
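
As a rough model of that machinery, here is a Python sketch (not Pliant) in which the 'search' method of an interface object builds the new fat pointer. The 'Data' class is a loose stand-in for 'Data_': a single 'path' string replaces 'path1' and 'path2', 'adr' holds a Python value instead of an address, and the behaviour on an unknown key is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Data:
    """Loose model of 'Data_'; a single 'path' string replaces path1/path2."""
    adr: Any        # stand-in for the address: the pointed Python value, or None
    interface: Any  # object providing the generic methods for that value
    path: str


class DataField:
    """Interface for a plain field: it has no subfields to search."""

    def search(self, d, k):
        return Data(adr=None, interface=self, path=d.path + "/" + k)


class DataRecord:
    """Interface for a record; maps field names to their interface objects."""

    def __init__(self, field_interfaces):
        self.field_interfaces = field_interfaces

    def search(self, d, k):
        # 'search' is responsible for filling every field of the new pointer.
        path = d.path + "/" + k
        if k not in self.field_interfaces or k not in d.adr:
            # Assumption: an unknown key yields a null pointer (adr is None).
            return Data(adr=None, interface=DataField(), path=path)
        return Data(adr=d.adr[k], interface=self.field_interfaces[k], path=path)


article = {"ref": "pen", "count": 3}
article_interface = DataRecord({"ref": DataField(), "count": DataField()})
a = Data(adr=article, interface=article_interface, path="/order/o1/article/a1")

c = a.interface.search(a, "count")   # models: var Data:Int c :> a count
print(c.path, c.adr)
```

Every application level operation follows the same pattern: take the 'interface' of the pointer at hand, call one generic method on it, and get back a status or a new pointer.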

Still in /pliant/storage/database/prototype.pli, we define the 'Database_' data type, which is the prototype for a Pliant database. I mean, it is the data structure that connects the database to the Pliant storage layer (global cache, PML encoded file, etc). The data type is declared at the beginning of the module, but the generic methods it provides are defined at the end.
The main one at the moment is 'get_root', which returns a database fat pointer to the root node of the database content, so it is called when using the 'data' method at application level.

From high level application interface down to real code

Let's see how things truly happen on a very simple example:

var Data:Article a :> ...
var Data:Int c :> a count
c := 3

First, the data type of 'c', I mean 'Data:Int' is defined in /pliant/storage/database/pointer.pli, in:

function Data t -> tt
  arg Type t ; arg_R Type tt
  ...

and we have is_field = true (see the code in order to understand what I mean by 'is_field').
'is_field' picks its value from the 'data_kind' function defined in /pliant/storage/database/interface.pli

As a result, with a bit of oversimplification in order to provide a stepping stone towards the real code, function Data will dynamically compile:

type Data:Int
  field Data_ data

function 'cast Int' d -> v
  arg Data:Int d ; arg Int v
  implicit
  if (d:data:interface get d:data addressof:v Int)=failure
    v := undefined

Implementation note: where is the locking?

So, if we write in our application:

var Int i := c

then the Pliant language casting machinery will silently (because we have set the 'implicit' flag in the 'cast Int' function) call the 'cast Int' function, so that the 'get' method of the underlying fat pointer interface will be called.

I will now explain where the 'interface' field of the fat pointer comes from. It provides all the generic methods, so it is what does the real work, with everything else being mostly glue code.
Everything starts in /pliant/storage/database/interface.pli, which defines the 'data_interface' function. It is called every time we want to get an interface object (so, in fact, a set of generic methods) for a given data type.
The code for 'data_interface' is fairly short, because all it does is call all the functions listed in the 'data_interface_generators' list, with the hope that one will provide the solution.
This is just a standard trick to make the machinery extensible: new modules can record new functions for dealing with special data types.
Now, the true function that will be called to get an interface object providing generic methods for a given data type is 99% of the time 'inmemory_interface', defined in /pliant/storage/database/inmemory.pli, which is recorded in the list just after being defined.
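
The extensibility trick can be sketched as follows, in Python rather than Pliant. The function and list names mirror the ones quoted above, but the bodies are guesses: types are plain strings, the interface objects are just names, the 'Date' generator is invented for the example, and raising an error when no generator answers is an assumption.

```python
data_interface_generators = []

FIELD_TYPES = {"Int", "Float", "Str"}          # assumption: plain field types
RECORD_TYPES = {"Article", "Order", "Shop"}    # assumption: known record types


def data_interface(type_name):
    """Ask each recorded generator in turn; the first answer wins."""
    for generator in data_interface_generators:
        interface = generator(type_name)
        if interface is not None:
            return interface
    # Assumption: what the real function does when nobody answers is not shown.
    raise LookupError("no interface for " + type_name)


def inmemory_interface(type_name):
    """Default generator: fields get 'DataField', records get 'DataRecord'."""
    if type_name in FIELD_TYPES:
        return "DataField"
    if type_name in RECORD_TYPES:
        return "DataRecord"
    return None


# Recorded in the list just after being defined, as described above.
data_interface_generators.append(inmemory_interface)


def date_interface(type_name):
    """Hypothetical extension module handling one special data type."""
    return "DataDate" if type_name == "Date" else None


data_interface_generators.append(date_interface)

print(data_interface("Int"), data_interface("Article"), data_interface("Date"))
```

A new module only has to append its own generator; 'data_interface' itself never needs to change.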

We see (assuming you also read the code while reading the documentation) that if the type is a field ('c' in our example), the interface object will have type 'DataField', and if the type is a record ('a' in our example), the interface object will have type 'DataRecord'.
Now, you should read the definition of 'DataField' and 'DataRecord' in /pliant/storage/database/inmemory.pli since they are fairly simple.
Let's provide just one example; the 'get' function for 'DataField' which is called by the 'cast Int' function we have seen earlier in the paragraph:

method df get d adr type -> status
  oarg DataField df ; arg Data_ d ; arg Address adr ; arg Type type ; arg Status status
  if type=df:type
    type copy_instance d:adr adr
    status := success
  else
    var Str s := to_string d:adr df:type "db"
    status := from_string adr type s "db"

So, if the requested type provided in 'type' argument matches the real type of the field stored in memory which is 'df:type', then we just copy it through calling 'copy_instance', else we cast the value stored in memory to a string, then cast the string to the requested type.
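
The same two branches can be modelled in a few lines of Python (not Pliant). Python's own 'str' and type constructors stand in for 'to_string' and 'from_string', so the conversion rules are Python's, not those of the "db" format, and the returned tuple is just a convenient way to expose the status.

```python
def get(stored_value, stored_type, requested_type):
    """Return (status, value), mimicking the two branches of 'get'."""
    if requested_type is stored_type:
        # Same type: plain copy, no conversion needed.
        return "success", stored_value
    # Different type: go through a string representation.
    s = str(stored_value)
    try:
        return "success", requested_type(s)
    except ValueError:
        return "failure", None


print(get(3, int, int))       # same type: direct copy
print(get(3, int, str))       # Int read as Str
print(get("42", str, int))    # Str read as Int
print(get("abc", str, int))   # conversion fails
```

Going through a string is slow but general: any pair of types that can be written to and parsed from a string can be converted this way without a dedicated cast for each pair.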

Let's continue with our initial example in this paragraph through explaining how 'c := 3' will be processed.

First of all, it will be compiled by:

meta ':=' e
  ...

defined in /pliant/storage/database/pointer.pli, so that function 'data_set' defined in the same module is called:

function data_set d v t
  arg_rw Data_ d ; arg Universal v ; arg Type t
  ...

After obtaining the database semaphore, 'data_set' function issues:

d:interface set d addressof:v t

So, the 'set' generic method of the 'interface' field of the database fat pointer is called, and we have just seen that since 'c' has type Data:Int, it will be the 'set' method of the 'DataField' data type, defined in /pliant/storage/database/inmemory.pli.
Let's see what it really does:

method df set d adr type -> status
  oarg_rw DataField df ; arg_rw Data_ d ; arg Address adr ; arg Type type ; arg Status status
  if type=df:type
    type copy_instance adr d:adr
    status := success
  else
    var Str s := to_string adr type "db"
    status := from_string d:adr df:type s "db"
  if status=success
    d:base notify_set d d:adr df:type

It's basically the same as 'get', but the other way round, with an extra call to 'notify_set' at the end.
'notify_set' is a generic method of the 'base' field of the fat pointer. The 'base' field points to the underlying database object. 'notify_set' is the function that will write the on disk instruction so that the change is not lost when the Pliant process is restarted. Let's track where the real code stands in the middle of the database storage glue.

We start with the application defining:

(gvar Database:Shop shop_database) load "data:/my_corp/shop/shop.pdb"

so that function 'Database' defined in /pliant/storage/database/pointer.pli is called to build the real data type of 'shop_database'.
In the code of function 'Database', we see that the real data type of 'Database:Shop' will be 'DatabaseFile'.
So, the real 'notify_set' code is defined in module /pliant/storage/database/file.pli as:

method df notify_set d adr type
  arg_rw DatabaseFile df ; arg Data_ d ; arg Address adr ; arg Type type
  plugin notify_set
  if df:log_required
    if type=Str
      df log_line "<pdata path=[dq]"+d:dbpath+"[dq]>"+html_encode:(adr map Str)+"</pdata>"
    else
      var Str value := to_string adr type "raw"
      df log_line "<pdata path=[dq]"+d:dbpath+"[dq]>"+html_encode:value+"</pdata>"
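
To show what kind of line ends up in the log, here is a small Python model (not Pliant) of the string built above. Python's 'html.escape' stands in for 'html_encode' and may not escape exactly the same characters, 'str' stands in for the "raw" conversion, and the sample 'dbpath' values are made up.

```python
from html import escape


def pdata_line(dbpath, value):
    """Build one log line in the style of 'notify_set' (model only)."""
    # As in the original code, only the value is encoded, not the path.
    return '<pdata path="' + dbpath + '">' + escape(str(value)) + "</pdata>"


# Hypothetical paths: the real 'dbpath' layout is not shown in this article.
print(pdata_line("/order/o1/article/a1/count", 3))
print(pdata_line("/order/o1/article/a1/ref", "pen & <ink>"))
```

Since the path is the only thing identifying the field in the log, this is the place where the fat pointer pays off: 'd:dbpath' is available without any lookup.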

This paragraph is probably fairly hard to understand because the Pliant database glue code is a fairly big pile, so I'll now try to provide an overview of it:
   •   pointer.pli module provides the 'Data' and 'Database' functions that implement the application level data types Data:xxx and Database:xxx
   •   For Database:xxx, the underlying data type is DatabaseFile, defined in file.pli module.
   •   For Data:xxx, the underlying data type is always Data_ defined in prototype.pli, but the generic methods come from the 'interface' field. The 'interface' object is constructed by calling the 'data_interface' dispatching function defined in interface.pli, and most of the time the winner is the 'inmemory_interface' function defined in inmemory.pli, which returns an object with either 'DataField' or 'DataRecord' type.

Mapping database application level features to interface generic methods

Fields

Let's get back to the sample database, and assume that we have:

var Data:Article a :> ...
var Data:Int c :> a count

First, let's read a field value. We have seen in the previous paragraph that:

var Int i := c

is handled through the 'cast Int' function defined at the time Data:Int is defined, and ends up calling the 'get' generic method of the interface field of the fat pointer:

method di get d adr type -> status
  oarg DataInterface_ di ; arg Data_ d ; arg Address adr ; arg Type type ; arg Status status

Then, let's write a field value. We have also seen that:

c := 3

is handled by meta ':=' defined in /pliant/storage/database/pointer.pli and translates to calling 'data_set' function, which calls 'set' generic method of the interface field of the fat pointer:

method di set d adr type -> status
  oarg_rw DataInterface_ di ; arg_rw Data_ d ; arg Address adr ; arg Type type ; arg Status status

Records

Let's access a field within a record:

c :> a count

is handled by meta '' defined in /pliant/storage/database/pointer.pli and translates to calling 'map_field' function, which ends in either calling 'apply' function defined in /pliant/storage/database/inmemory.pli or calling the 'search' generic method:

method di search d k -> d2
  oarg DataInterface_ di ; arg Data_ d ; arg Str k ; arg Data_ d2

The 'apply' function is just a way to speed things up, by avoiding looking up the field in a dictionary at execution time.

Tables

We are assuming 'o' to be a database pointer to an 'Order' record in our sample database:

var Data:Order o :> ...

First, create a new record.

o:article create "a1"

The 'create' meta is also defined in /pliant/storage/database/pointer.pli, and translates to calling 'data_create', which ends up calling the 'create' generic method:

method di create d k -> status
  oarg_rw DataInterface_ di ; arg_rw Data_ d ; arg Str k ; arg Status status

Deleting is not very different:

o:article delete "a1"

is compiled by the 'delete' meta defined in the same module, and translates to calling 'data_delete', which ends up calling the 'delete' generic method:

method di delete d k -> status
  oarg_rw DataInterface_ di ; arg_rw Data_ d ; arg Str k ; arg Status status

Testing the existence of a record is performed at application level through 'exists':

if exists:a
  ...

It is compiled by the 'exists' meta, still in the same module, and translates to calling 'data_exists', which translates to ... just testing that the 'adr' field of the database fat pointer is not null.

We'll finish with the scanning application level instruction:

each a o:article
  ...

'each' is compiled by meta 'each', still in /pliant/storage/database/pointer.pli, and translates (when no extra option is provided) mostly to calling 'scan_first' and 'scan_next', which end up calling the 'first' and 'next' generic methods:

method di first d start stop buf -> d2
  oarg DataInterface_ di ; arg Data_ d ; arg Str start stop ; arg_w DataScanBuffer buf ; arg Data_ d2

method di next d start stop buf -> d2
  oarg DataInterface_ di ; arg Data_ d ; arg Str start stop ; arg_rw DataScanBuffer buf ; arg Data_ d2
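
The 'first' and 'next' pair can be pictured with the following Python model (not Pliant). Everything about the semantics is an assumption made for the sketch: keys are visited in sorted order, 'start' and 'stop' are treated as inclusive bounds on the key, and the scan buffer simply holds a snapshot of the keys plus a position. The real 'DataScanBuffer' may work quite differently.

```python
class TableInterface:
    """Model of an interface object offering 'first' and 'next' on a table."""

    def first(self, table, start, stop, buf):
        # Assumption: the buffer keeps a sorted snapshot of the keys in range.
        buf["keys"] = sorted(k for k in table if start <= k <= stop)
        buf["pos"] = 0
        return self._current(table, buf)

    def next(self, table, start, stop, buf):
        buf["pos"] += 1
        return self._current(table, buf)

    def _current(self, table, buf):
        if buf["pos"] >= len(buf["keys"]):
            return None   # plays the role of a null pointer: scan is over
        key = buf["keys"][buf["pos"]]
        return key, table[key]


def each(interface, table, start="", stop="\uffff"):
    """Model of what the 'each' meta expands to: first, then next until null."""
    buf = {}
    item = interface.first(table, start, stop, buf)
    while item is not None:
        yield item
        item = interface.next(table, start, stop, buf)


articles = {"a2": {"count": 1}, "a1": {"count": 3}, "a3": {"count": 7}}
for key, record in each(TableInterface(), articles):
    print(key, record["count"])
```

Keeping the scan state in a buffer owned by the caller, rather than in the interface object, is what allows several scans of the same table to run at the same time.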

Getting further

We have not introduced all generic methods of the 'DataInterface_' type (the 'interface' field of the database fat pointer) yet:

method di reset d -> status
  oarg_rw DataInterface_ di ; arg_rw Data_ d ; arg Status status

method di type d -> t
  oarg DataInterface_ di ; arg Data_ d ; arg_R Type t

method di address d -> a
  oarg DataInterface_ di ; arg Data_ d ; arg Address a

method di count d start stop -> count
  oarg_rw DataInterface_ di ; arg Data_ d ; arg Str start stop ; arg Int count

method di first_to_store d start stop buf -> d2
  oarg DataInterface_ di ; arg Data_ d ; arg Str start stop ; arg_w DataScanBuffer buf ; arg Data_ d2

method di next_to_store d start stop buf -> d2
  oarg DataInterface_ di ; arg Data_ d ; arg Str start stop ; arg_rw DataScanBuffer buf ; arg Data_ d2

method di pre_delete d k
  arg DataInterface_ di ; arg Data_ d ; arg Str k

Well, if you have reached this point, you are probably ready for 'the code is the documentation', and if you get stuck, then dropping me an email will probably make me happily resume trying to explain how it is currently laid out.