There are not many ways of extending the available memory on a computer. The easiest solution is to buy more RAM chips and install them on the motherboard, but this is expensive, requires manual intervention, and might not even solve the problem when input data sizes are overwhelming (larger than the largest amount of RAM one can possibly install on the motherboard).
Another solution, on systems supporting swap space (i.e. practically all modern operating systems), is to create additional swap space on the local disks. Unfortunately, this must be done by the superuser and requires some technical knowledge of the OS. It also has the limitation that the input data might still be larger than the largest amount of swap one can define on the machine. It is nevertheless a cheap way of building dedicated number-crunching machines: for an equivalent price, a hard disk has always provided an order of magnitude more storage space than RAM chips. But in the general case of a user wanting to run a data processing program on systems that cannot be customized for this use, this solution is not applicable.
A third solution is to make use of memory-mapped files. This is supported on all POSIX-compliant operating systems that have the mmap() system call (i.e. all modern Unix flavours). The idea is that mmap() establishes a mapping between a range of memory, accessed through a pointer, and a file on disk. All modifications made to the memory behind the pointer are reflected in the file. Since the memory zone is backed by a file, it in effect acts as an extra swap buffer added to the system on the fly. The main difference is that mapped files are not swap space and are thus less efficient, but they need no superuser privilege to be created and used.
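The mapping described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code of the present module: the helper names scratch_map() and scratch_unmap() are hypothetical, the backing file is assumed to live in a writable directory with enough free disk space, and error handling is reduced to returning NULL.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): back `size` bytes of
 * memory with the file `path`. Returns NULL on failure. */
void *scratch_map(const char *path, size_t size)
{
    assert(path != NULL && size > 0);   /* mmap() rejects zero-length mappings */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return NULL;

    /* Grow the file to the requested size; the new bytes read as zeros. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        unlink(path);
        return NULL;
    }

    /* MAP_SHARED: stores through the returned pointer are carried to the file. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping remains valid after close() */
    if (p == MAP_FAILED) {
        unlink(path);
        return NULL;
    }
    return p;
}

/* Hypothetical helper: release the mapping and delete the backing file.
 * Returns 0 on success, -1 if either step failed. */
int scratch_unmap(void *p, size_t size, const char *path)
{
    int rc = munmap(p, size);
    if (unlink(path) == -1)
        rc = -1;
    return rc;
}
```

A caller would use the returned pointer like any block obtained from malloc(), for instance `double *v = scratch_map("/tmp/work.bin", n * sizeof *v);`, and hand it back with scratch_unmap() when done. No superuser privilege is involved at any point; the cost is that page traffic goes through an ordinary file rather than through swap space.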
What are large amounts of memory used for? In general, the first memory-greedy task is the data loader. It is responsible for bringing the potentially large input stream into a working space accessible to the programmer (RAM or swap space). Whatever the data format used for the input stream, a memory allocator is needed to provide space to hold the contents of the file being read. This is another task for which memory-mapping of files can be of great help. This memory module should offer some facilities related to that task.
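For the loading task, one possibility is to map the input file itself instead of allocating a buffer and copying the file into it. The following sketch assumes a regular, non-empty input file; the names input_map() and input_unmap() are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: expose the contents of `path` as read-only memory.
 * On success, returns a pointer to the data and stores its length in *size.
 * Returns NULL if the file cannot be opened, is empty, or cannot be mapped. */
const void *input_map(const char *path, size_t *size)
{
    assert(path != NULL && size != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* Pages are fetched from the file on first access; no copy is made here. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: release a mapping obtained from input_map(). */
int input_unmap(const void *p, size_t size)
{
    return munmap((void *)p, size);
}
```

With this approach the loader does not need RAM or swap for the whole file up front: the operating system brings pages in as the program touches them and can drop them again without writing them anywhere, since the file on disk already holds the data.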
Note that a file/memory mapping primitive is usually available on modern operating systems. The present module uses the facility provided on POSIX-compatible systems, but an equivalent of mmap() can probably be found on other platforms.
In summary, the issues to solve are:
These issues are addressed in the following sections.