This behavior is not unique to MATLAB. In fact, MATLAB has no control over it, as it is Windows that causes it. Linux and MacOS show the same behavior.
I had noticed this exact same thing in a C program many years ago. It turns out that this is well documented behavior. This excellent answer explains in gory details how memory management works in most modern OSes (thanks Amro for sharing the link!). Read it if this answer doesn't have enough detail for you.
First, let's repeat Ander's experiment in C:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main (void) {
const int size = 1e8;
/* For Linux: */
// const char* ps_command = "ps --no-headers --format "rss vsz" -C so";
/* For MacOS: */
char ps_command[128];
sprintf(ps_command, "ps -o rss,vsz -p %d", getpid());
puts("At program start:");
system(ps_command);
/* Allocate large chunck of memory */
char* mem = malloc(size);
puts("After malloc:");
system(ps_command);
for(int ii = 0; ii < size/2; ++ii) {
mem[ii] = 0;
}
puts("After writing to half the array:");
system(ps_command);
for(int ii = size/2; ii < size; ++ii) {
mem[ii] = 0;
}
puts("After writing to the whole array:");
system(ps_command);
char* mem2 = calloc(size, 1);
puts("After calloc:");
system(ps_command);
free(mem);
free(mem2);
}
The code above works on a POSIX-compliant OS (i.e. any OS except Windows), but on Windows you can use Cygwin to become (mostly) POSIX-compliant. You might need to change the ps
command syntax depending on your OS. Compile with gcc so.c -o so
, run with ./so
. I see the following output on MacOS:
At program start:
RSS VSZ
800 4267728
After malloc:
RSS VSZ
816 4366416
After writing to half the array:
RSS VSZ
49648 4366416
After writing to the whole array:
RSS VSZ
98476 4366416
After calloc:
RSS VSZ
98476 4464076
There are two columns displayed, RSS and VSZ. RSS stands for "Resident set size", it is the amount of physical memory (RAM) that the program is using. VSZ stands for "Virtual size", it is the size of the virtual memory assigned to the program. Both quantities are in KiB.
The VSZ column shows 4 GiB at program start. I'm not sure what that is about, it seems over the top. But the value grows after malloc
and again after calloc
, both times with approximately 98,000 KiB (slightly over the 1e8 bytes we allocated).
In contrast, the RSS column shows an increase of only 16 KiB after we allocated 1e8 bytes. After writing to half the array, we have a bit over 5e7 bytes of memory in use, and after writing to the full array we have a bit over 1e8 bytes in use. Thus, the memory gets assigned as we use it, not when we first ask for it. Next, we allocate another 1e8 bytes using calloc
, and see no change in the RSS. Note that calloc
returns a memory block that is initialized to 0, exactly like MATLAB's zeros
does.
I am talking about calloc
because it is likely that MATLAB's zeros
is implemented through calloc
.
Explanation:
Modern computer architectures separate virtual memory (the memory space that a process sees) from physical memory. The process (i.e. a program) uses pointers to access memory, these pointers are addresses in virtual memory. These addresses are translated by the system into physical addresses when used. This has many advantages, for example it is impossible for one process to address memory assigned to another process, since none of the addresses it can generate will ever be translated to physical memory not assigned to that process. It also allows the OS to swap out memory of an idling process to let another process use that physical memory. Note that the physical memory for a contiguous block of virtual memory doesn't need to be contiguous!
The key is the bolded italic text above: when used. Memory assigned to a process might not actually exist until the process tries to read from or write to it. This is why we don't see any change in RSS when allocating a large array. Memory used is assigned to physical memory in pages (blocks typically of 4 KiB, sometimes up to 1 MiB). So when we write to one byte of our new memory block, only one page gets assigned.
Some OSes, like Linux, will even "overcommit" memory. Linux will assign more virtual memory to processes than it has the capacity to put into physical memory, under the assumption that those processes will not use all the memory they are assigned anyway. This answer will tell you more over overcommitting than you will want to know.
So what happens with calloc
, which returns zero-initialized memory? This is also explained in the answer I linked earlier. For small arrays malloc
and calloc
return a block of memory from a larger pool obtained from the OS at the start of the program. In this case, calloc
will write zeros to all bytes to make sure it is zero-initialized. But for larger arrays, a new block of memory is directly obtained from the OS. The OS always gives out memory that is zeroed out (again, it prevents one program to see data from another program). But because the memory doesn't get physically assigned until used, the zeroing out is also delayed until a memory page is put into physical memory.
Back to MATLAB:
The experiment above shows that it is possible to obtain a zeroed-out block of memory in constant time and without changing the physical size of a program's memory. This is how MATLAB's function zeros
allocates memory without you seeing any change in MATLAB's memory footprint.
The experiment also shows that zeros
allocates the full array (likely through calloc
), and that memory footprint only increases as this array is used, one page at a time.
The preallocation advice by the MathWorks states that
you can improve code execution time by preallocating the maximum amount of space required for the array.
If we allocate a small array, then want to increase its size, a new array has to be allocated and data copied over. How the array is associated to RAM has no influence on this, MATLAB only sees virtual memory, it has no control (or even knowledge?) of where in the physical memory (RAM) these data are stored. All that matters for an array from MATLAB's point of view (or that of any other program) is that the array is a contiguous block of virtual memory. Enlarging an existing block of memory is not always (usually not?) possible, and so a new block is obtained and data copied over. For example, see the graph in this other answer: when the array is enlarged (this happens at the large vertical spikes) data is copied; the larger the array, the more data needs to be copied.
Preallocating avoids enlarging the array, as we make it large enough to begin with. In fact, it is more efficient to make an array that is way too large for what we need, as the portion of the array that we don't use is actually never really given to the program. That is, if we allocate a very large block of virtual memory, and only use the first 1000 elements, we'll only really use a few pages of physical memory.
The behavior of calloc
described above explains also this other strange behavior of the zeros
function: For small arrays, zeros
is more expensive than for large arrays, because small arrays need to be zeroed explicitly by the program, whereas large arrays are implicitly zeroed by the OS.