Understanding the implications of memory allocation schemes
When I first published Pliant in 1999, several people told me: hey, it does not have a garbage collector (GC), whereas all modern languages do.
I was a bit surprised, because I knew that a GC always brings serious issues that will forever ... be corrected in the next release.
Since then, Apple removed the garbage collector from its phone platform in order to get rid of latency issues, but it seems that still very few people understand the nasty implications of garbage collection. So this article is intended to enlighten people who are not necessarily programmers, but would still like to understand the consequences of choices made at the language design level.
There are mainly two allocation strategies:
First, explicit memory allocation. Its best known usage is in the C language, where it is implemented as the 'malloc' and 'free' functions.
'malloc' requests a new memory area, and the application stores whatever it wants inside.
'free' releases a previously allocated memory area so that it can be reused to satisfy a future 'malloc' call.
In such a system, the main role of the memory allocation system is to keep track of the freed areas in order to enable future reuse.
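The malloc/free pairing described above can be illustrated with a small C function (the `copy_string` helper is just an example of mine, not part of any standard API):

```c
#include <stdlib.h>
#include <string.h>

/* Explicit allocation in C: the application asks for memory with
   malloc and must later give it back with free. */
char *copy_string(const char *s) {
    char *p = malloc(strlen(s) + 1);  /* request a new memory area */
    if (p != NULL)
        strcpy(p, s);                 /* store whatever we want inside */
    return p;                         /* the caller must free() it later */
}
```

The cost model is visible right in the code: the application pays once at the 'malloc' call and once more at the matching 'free' call, and nothing happens in between.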
Then, garbage collection, also called automatic memory allocation, is a mechanism for higher level languages, where the allocation system understands all kinds of data in memory.
There is basically a single pointer that specifies where the not yet used memory starts.
So, when some room is required to store new data, the pointer is simply advanced. As a result, at some point the pointer reaches the end of the available memory, and garbage collection must take place.
Garbage collection (in a stop-and-copy style implementation) means following all data structure indirections (that is why the allocation system needs to understand all kinds of data in memory), starting from some well defined roots, copying all the data met along the way to a new area of memory, and updating pointers accordingly. At the end of the garbage collection, all the still-in-use data has been compacted at the beginning, so the not-yet-used memory pointer has moved backward.
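The bump pointer and the stop-and-copy pass can be sketched together in C as a toy Cheney-style collector for cons cells. All the names here are illustrative, not taken from any real runtime, and a real collector would of course handle arbitrary object layouts rather than a single cell type:

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CELLS 1024

typedef struct Cell {
    struct Cell *car, *cdr;   /* NULL or pointer into the heap */
    struct Cell *forward;     /* forwarding pointer, set during GC */
    int value;                /* payload for leaf cells */
} Cell;

static Cell space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static Cell *from_space = space_a, *to_space = space_b;
static size_t next = 0;       /* bump pointer: index of next free cell */

static Cell *alloc(int value) {   /* allocation just advances the pointer */
    assert(next < HEAP_CELLS);    /* a real runtime would trigger gc() here */
    Cell *c = &from_space[next++];
    c->car = c->cdr = c->forward = NULL;
    c->value = value;
    return c;
}

static Cell *copy(Cell *c) {      /* move one cell into to_space, once */
    if (c == NULL) return NULL;
    if (c->forward) return c->forward;   /* already moved */
    Cell *n = &to_space[next++];
    *n = *c;
    n->forward = NULL;
    c->forward = n;
    return n;
}

/* Collect: 'roots' is an array of root pointers, updated in place. */
static void gc(Cell **roots, size_t nroots) {
    next = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = copy(roots[i]);
    /* Cheney scan: to_space itself serves as the work queue. */
    for (size_t scan = 0; scan < next; scan++) {
        to_space[scan].car = copy(to_space[scan].car);
        to_space[scan].cdr = copy(to_space[scan].cdr);
    }
    Cell *t = from_space; from_space = to_space; to_space = t;
}
```

Note how the cost of `gc` is proportional to the number of reachable cells, while unreachable cells are simply never touched: they are reclaimed for free, but only when a collection actually runs.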
With explicit memory allocation, the cost is paid each time the application calls 'malloc' or 'free'.
Depending on implementation details, the cost ranges from constant time to log(n), where n is the number of allocated objects. Constant time means that freed areas are stored in a set of lists, where each list contains freed areas within some well defined size range. log(n) time means that freed areas are stored in a tree.
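The constant time variant can be sketched in C with segregated free lists over a static arena. The size classes and function names are my own illustrative choices, and real allocators add many refinements (block headers, coalescing, large-object paths) that are omitted here:

```c
#include <stddef.h>

#define ARENA_BYTES 4096
#define NCLASSES 4
static const size_t class_size[NCLASSES] = { 16, 32, 64, 128 };

static _Alignas(void *) unsigned char arena[ARENA_BYTES];
static size_t arena_used = 0;
static void *free_list[NCLASSES];   /* head of freed blocks per size class */

static int size_class(size_t n) {
    for (int c = 0; c < NCLASSES; c++)
        if (n <= class_size[c]) return c;
    return -1;                      /* too big for this toy allocator */
}

static void *my_alloc(size_t n) {
    int c = size_class(n);
    if (c < 0) return NULL;
    if (free_list[c]) {             /* reuse a freed block: O(1) */
        void *p = free_list[c];
        free_list[c] = *(void **)p; /* next pointer is stored in the block */
        return p;
    }
    if (arena_used + class_size[c] > ARENA_BYTES) return NULL;
    void *p = &arena[arena_used];   /* carve a fresh block from the arena */
    arena_used += class_size[c];
    return p;
}

static void my_free(void *p, size_t n) {   /* O(1): push on the class list */
    int c = size_class(n);
    *(void **)p = free_list[c];
    free_list[c] = p;
}
```

Both operations do a fixed amount of work regardless of how many objects are alive, which is exactly the constant time case described above.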
With garbage collection, the cost is paid each time the garbage collection mechanism kicks in to move the pointer backward, and that cost is linear in the amount of memory still in use.
So, the important metaphor to memorise in order to properly understand the implications of memory allocation schemes is:
with explicit memory allocation, the application buys memory. It pays once for each allocated area (twice in fact, because it will also pay to sell it back, but that does not change the overall application behaviour in the end).
with garbage collection, the application rents memory. It pays at the end of each period until it stops using the area.
Most language benchmarks found on the Internet are micro benchmarks comparing various languages on a trivial short term task.
As a result, they tend to find garbage collection based languages more efficient because, since the end of the period is never reached, the garbage collection cost is never paid.
As stated earlier in this document, garbage collection implies that at some point the memory allocation engine must be able to traverse the entire still-in-use data set. As a result, garbage collection becomes a nightmare in multithreaded applications, which are required on today's hardware to get the most performance out of multi-core processors. The reason is that the memory allocation engine has to either stop all threads so that they do not disturb the garbage collection while it is taking place, or use some fine grained locking mechanism that imposes a cost on every memory access at all times.
Another important thing to notice about garbage collection is that its cost rises when the machine load increases, because garbage collection kicks in more often and lasts longer, since more data needs to be copied. The negative side effect is that it tends to amplify the consequences of any overload on the machine.
As a result, you can see garbage collection as a devil that stays silent and unnoticed when everything is fine (at software validation time, when each feature tends to be checked one after the other), and shows up just to make things worse when things start to go bad (when all users are rushing on the system to meet an important job deadline).
Let's assume that a programmer developed an application for you using a garbage collection based language.
His job was easier because memory allocation was automatic.
Now, when using the application, you pay for it through a strangely crawling system.
Pliant's memory allocation is explicit allocation plus reference counting.
Reference counting is a mechanism somewhat less powerful than garbage collection from the programmer's point of view, because he has to make sure that the application data layout does not create cycles. In most cases, however, it makes the programmer's life just as easy as automatic memory allocation, without changing the behaviour of the running system, so that in the end the user does not have to pay for the programmer.
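The principle of reference counting can be sketched in C in a few lines. The names below (`object_new`, `retain`, `release`) are illustrative, not Pliant's actual API; the point is that memory is reclaimed at a well defined moment, when the last reference disappears, rather than at an unpredictable collection pause:

```c
#include <stdlib.h>

/* Minimal reference counting sketch: each object carries a counter
   and is freed as soon as the counter drops to zero. */

typedef struct {
    int refcount;
    int value;
} Object;

static int live_objects = 0;     /* for demonstration only */

static Object *object_new(int value) {
    Object *o = malloc(sizeof *o);
    o->refcount = 1;             /* the creator holds the first reference */
    o->value = value;
    live_objects++;
    return o;
}

static Object *retain(Object *o) {   /* a new reference is taken */
    o->refcount++;
    return o;
}

static void release(Object *o) {     /* a reference is dropped */
    if (--o->refcount == 0) {        /* last reference gone: free now */
        live_objects--;
        free(o);
    }
}
```

Because `release` frees the object immediately, the cost stays attached to the code that drops the reference, just as with explicit 'free'; the one thing this scheme cannot reclaim on its own is a cycle of objects pointing at each other, which is the restriction mentioned above.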