Language English (original text) Français

# Pointers and memory allocation

From an just a bit oversimple point of view, a computer memory is a continuous set of byte. Each byte has an index called it's address.

Each data has an address that we can get through 'addressof' keyword:

var Int i

On a 32 bits system, an integer, 'i' in this sample is stored on 4 bytes since the integer is 32 bits long and each byte is 8 bits long. The address returned by 'addressof' is the address of the first of the four bytes containing 'i' value.

This might seem strange at first. If the address is the index of a memory byte, then it's an integer value, so why do we handle it as a data with type 'Address' instead of handling it as a data with type 'Int' ?
Because typing is not only intended to specify what a data is from the implementation point of view, but also what a data is from a logical point of view.

Here are some operations we can do on an address:

b := a translate Int 3

means that 'b' will receive the address of the memory byte which is 12 bytes after the one 'a' contains the address of. 12 is 3 times 4, where 4 is the number of bytes of an integer on a 32 bits plateform.

If translate is given only a type parameter but no count (3 in our sample), then count 1 is assumed. So:

b := a translate Int

is the same as:

b := a translate Int 1

Then we can access the data at any given address:

i := b map Int

If the memory bytes specified by b don't contain an integer address, the result is unspecified and might just crash the program.

Here is a first shortcut:

b := a translate Int 3
i := b map Int

is the same as:

i := a map Int 3

Finally, in some rare situations, we might want to do real computations on some address, so we can cast it to a real integer:

i := cast a Int

then do some computations, and finally cast back:

So, when we write:

b := a translate Int 3

with an absolute lack of taste, we could instead write it as:

b := cast (cast a Int)+3*Int:size Address

## Pointers

If your program has a lot to do on the interger data at address 'a', then explicitely dereferencing each time using 'map' might be boring in the end:

...
a map Int := (a map Int)*3+2

So, we introduction pointers, as a way to make dereferencing lighter (from the writing point of view only since from the computational point of view it's just the same) since implicit:

var Pointer:Int p
...
p := p*3+2

In this first example, we see that a pointer uses two set instruction, one being ':>' and the other beeing ':='
Let's start from an even simpler example:

var Int i j
var Pointer:Int p
p :> i
p := j

':>' means starting from now, 'p' points to the address of 'i'.
':=' means the value pointed by 'p' will receive the value of 'j'.

Of course, we could get the exact same result, with ugly writing, using an address variable 'a' instead of a pointer variable 'p':

var Int i j
a map Int := j

For readers used to C language, the program would be written as:

int i,j;
int *p;
p = &i;
*p = j;

In other words, Pliant pointers do automatic dereferencing, whereas C pointers require explicit dereferencing. C notation is closer to the computational reality since dereferencing is an operation that the processor really has to do so showing it as a '*' sign can be considered a good thing; on the other hand, it makes program much less clear to read.

There is a 'null' constant address that points to memory byte with index 0, and it is used in many applications as a special value. See other programming languages for extra details.

a := null
var Int i := cast a Int
console "i = " i eol # will display 'i = 0'

When dealing with pointers, testing for null address is done through 'exists' function:

var Pointer:Int p
if exists:p
...

which is equivalent to:

...

## Double indirection

Some algorithms need to handle a pointer on a pointer. As an example, in an algorithm dealing with a list of elements, we might need to have 'l' variable beeing a pointer to the last pointer of the list so that we can modify it to add a new element in the end of the list:

var Int i
var Pointer:Int p
var (Pointer Pointer:Int) l
l := i
l :> i
l :> p
:>> p

The first instruction 'l := i' means full dereferencation, so the integer data pointed by the pointer pointed by 'l' will receive the value of 'i'..
The second instruction 'l :> i' means the pointer pointed by 'l' will now point 'i'.
The third instruction 'l :> p' means the pointer pointed by 'l' will now point the integer data currently pointed by 'p'.
The last instruction 'l :>> p' means that 'l' is now pointing 'p' pointer.

The first instruction 'a := addressof i' means 'a' address variable will now contain the address of the 'i' variable content.
The second instruction 'a := addressof p' means that 'a' address variable will now contain the address of the integer data pointed by 'p'.
The third instruction 'a := addressof Pointer:Int p' means that 'a' address variable will now contain the address of the 'p' pointer.

## Memory allocation

Reserving some raw bytes of memory is done trough 'memory_allocate' function which is an equivalent of C language 'malloc' function:

a := memory_allocate 100 null

The first parameter is the number of consecutive bytes requested, and the second is an address that recommands to provide an address near the provided one.

This second does not exist is C language 'malloc' and is ... not used by Pliant at the moment.
The general idea for keeping it is that when parallelism grows, at some point the bottleneck becomes the bus connecting the processor to the memory, so that one solution is to connect more tighly each part of the memory to one processor. Then, accessing the part of the memory tightly connected is faster than accessing the other part because it does not have to go through the shared bus.
So, the second parameter to memory_allocate is mostly used by data types so that if the root of a complex data is somewhere in memory (let's say the head part of the string in on a thread stack), then the subparts (the characters varying part of the string in our example) will be assigned to the right portion of memory (the one providing allocation tightly connected to the memory assigned with peticular processor handling that thread in our example).

Releasing the memory is done through calling 'memory_free':

memory_free a

Then there are the classical variants.
First allocating some memory with all bytes initialy set to zero:

a := memory_zallocate 100 null

and changing the size of an allocated area:

a := memory_resize a 200 null

and, if extra bytes have to be set to zero:

a := memory_zresize a 200 null

We can get the exact size allocated through:

var Int i
i := memory_size a

We can also get global informations or configuration about memory usage through several not frequently used functions.

First the amount of memory consumed by Pliant process through allocating pages (at operating system level, memory is not assigned to application as bytes, but as pages) can be obtained through:

i := memory_currently_consumed

then, we can now the maximum amount of memory consumed by Pliant since the start of the process:

i := memory_maximum consumed

We can also know the current amount of memory really used (the consumed memory is bigger than the used one because of holes resulting from various allocations and releases):

i := memory_currently_used

It is possible to check that the memory data structure has not been corrupted as a result of poking to a wrong place. Of course, it's just a debugging helper function which is not granted to work:

memory_checkup

## How memory allocation is implemented

The general picture is that it's a modification of the Doug Lea well known algorithm.
I wrote my own at some point, but Doug Lea one was just much better so I switched. Well, I did it so badly that my rewrite of Doug Lea algorithm with Pliant C way of coding is significantly slower than the one he provided. Anyway, implementation is in /pliant/language/memory/allocate.c and it adds the possibility to decommit pages in large free chunks, which is important to release memory to the operating system in case of a long term running process that consumed a lot of memory at some point but currently uses much less.
With released 112, Pliant switched away from Doug Lea 'buckets' (lists) data structure in favor of a slower full tree sorting chunks by size, because I noticed on some production server that the list linear algorithm would collapse when many many quite large blocks have been freed, and an even larger one was requested.

Then, in /pliant/language/schedule/multi_threads_locks.pli which is implementing the locking to turn Pliant to multithread safe (the C written Pliant bootstrap code is not), I added a mechanism to provide better allocation performances in multithreaded applications than using a single lock. The machinism consists of a small list of free chunks for each possible small number of bytes and is actived through setting 'alloc_steedup' constant. See 'memory_allocate_mt' and 'memory_free_mt' functions in the code for extra details.

- a more step by step explaination of the memory allocation mechanism would be required, starting by the operating system glue page allocation functions, but it's out the scope of this first generation of documention -

## Objects and references count

This paragraph is explaining how to use references count allocation mechanism, not why and when to use it. For a general introduction to various memory allocation techniques, see external literature, and also the 'An introduction to Pliant storage' article.

A Pliant object is a data with a header containing two special fields in front: the references counter and the type pointer. So, on a 32 bits plateform, an 'Int' data consumes 4 bytes, but an object with type 'Int' consomes 12 bytes (4 for the references counter, 4 for the type pointer, and 4 for the integer value).

We can create an object with type 'Int' through using the low level function (please notice that it's not recommanded at application level unless you know what you do):

a := entry_new Int # don't do that in real code

the returned address is not address of the first byte of the object header but the address of the first byte of the data. So, we can use an object as if it was a standard data through something like:

var Int i
i := a map Int

When the address is pointing a true Pliant object, you can also get access to the object using 'omap':

i := a omap Int

The difference between 'map' and 'omap' is just that 'omap' says to Pliant compiler that the underlying data is an object, so it can be passed to a function that requires an object as a parameter. Here is a sample:

function foo i
oarg Int i
...
foo (a map Int) # rejected
foo (a omap Int) # accepted

Then, an 'Arrow' is just like an 'Address' but it MUST contain the address of an object. Setting an arrow to an address which is not the one of an object will corrupt Pliant process memory layout, so probably turn it to unpredictable and finally crash. Such a mistake is nearly always catched immediately when running Pliant at debugging level 2, and it is one of the major reasons for running your software at debugging level 2 from time to time. Now, the magic feature of an arrow is that it automatically adjusts the references counter of the object it contains the address of:

var Arrow r
r := a

With greater details, when the address contained in a arrow is changed:

 1 the references counter of the object it used to contain the address of is decreased by 1. 2 if this references counter falls to zero as result, then the object is automaticaly destroyed and it's memory freed. 3 the references counter of the object the arrow receives the address of is increased by one.

The last rule is that 'null' address is supported and will just skip the corresponding increase or decrease operations.

The operations related to arrows are:
Get the type of the object the arrow contains the address of:

var Pointer:Type t :> entry_type r

What getting the pointer to the effective type might be usefull for is outside the scope of this article and will be later explained in an article about typing.

Get the references counter of the object the arrow contains the address of:

var Int i := entry_counter r

Please notice that calling 'entry_type' or 'entry_counter' on an arrow with null address is a bug that will immediately crash the Pliant process.
On the other hand these two function perfectly work with a parameter with type 'Address' instead of type 'Arrow' provided the address is the one of an object.

Now, just like we have 'Arrow' as a special case of 'Address' that automatically handles adjusting the references count of the pointed object, we have 'Link' as a special case of 'Pointer' that also automatically handles adjusting the references count:

l :> new Int

'new' is a function that allocates and builds an object with the specified type, just like 'entry_new', but returns a pointer to it instead of returning it's address. So, the previous code does the same as:

var Arrow r
r := entry_new Int

and the difference is that, in the first case, we could increment the object integer data value by one through:

l += 1

but in the second case, we would have to use the ugly explicit dereferencing notation:

r map Int += 1

Just like an arrow, a link must point to an object, or the Pliant memory layout will be corrupted. Here is one of the most classical mistakes in Pliant:

var Int i
...
var Link:Int l :> i # this is a bug

Since 'i' is a local variable, it is not an object, so setting a link to it is a bug immediately caught only when running at debugging level 2.