The tracemem function (available only if R was compiled with memory profiling support) indicates when copying occurs. Here's what you do:
> a <- 1:1000000; tracemem(a)
[1] "<0x7f791b39e010>"
> a[1] = 2
tracemem[0x7f791b39e010 -> 0x7f791a9d4010]:
Indeed there is a copy, but only because you're coercing a from an integer vector (1:1000000 creates a sequence of integers) to a numeric vector (2 is a numeric value, and R coerces to a common type). If instead you update your integer vector with an integer value, or a numeric vector with a numeric value, there is no copying:
> a <- 1:1000000; tracemem(a)
[1] "<0x7f791a4ef010>"
> a[1] = 2L
> a = c(1, 2, 3); tracemem(a)
[1] "<0x5180470>"
> a[1] = 2
>
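The coercion itself can be checked with typeof(), which does not depend on tracemem being available. A minimal sketch (the names a and b are only for illustration):

```r
a <- 1:10
typeof(a)      # "integer"
a[1] <- 2      # 2 is a double, so the whole vector is coerced to double
typeof(a)      # "double"

b <- 1:10
b[1] <- 2L     # 2L is an integer literal, so the type is unchanged
typeof(b)      # "integer"
```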
A little further insight comes from understanding, at a superficial level, how R's memory management works. Each allocation has a NAMED level associated with it. NAMED=0 or 1 indicates that at most 1 symbol refers to it; it is therefore safe to modify in place. NAMED=2 means that there are, or have been, at least 2 symbols pointing to the same location, and that any attempt to update the value requires a duplication to preserve R's illusion of 'copy on change'. The following reveals some of the internal structure of a, including that it is of type INTSXP (integer) with NAM(1) (NAMED level 1) and that it's being TRaced. Hence updating (with an integer!) does not require a copy:
> a = 1:10; tracemem(a); .Internal(inspect(a))
[1] "<0x5170818>"
@5170818 13 INTSXP g0c4 [NAM(1),TR] (len=10, tl=0) 1,2,3,4,5,...
> a[1] = 2L
>
On the other hand, here two symbols refer to the same location in memory, hence NAMED is 2 and a copy is required:
> a = b = 1:10; tracemem(a); .Internal(inspect(a))
[1] "<0x576d1a0>"
@576d1a0 13 INTSXP g0c4 [NAM(2),TR] (len=10, tl=0) 1,2,3,4,5,...
> a[1] = 2L
tracemem[0x576d1a0 -> 0x576d148]:
It is difficult to reason about NAMED, so there is a degree of futility in trying to predict copies this way.
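Whatever the internal bookkeeping, the behaviour it protects is easy to observe: modifying one of two symbols that share a value leaves the other untouched. A small sketch follows. The tracemem call is guarded because it only works in builds with memory profiling, and whether and when a copy is reported may vary across R versions and front ends:

```r
a <- b <- 1:10          # two symbols, one underlying vector

# tracemem() works only in builds with memory profiling support
traced <- capabilities("profmem")
if (traced) tracemem(a)

a[1] <- 2L              # R must duplicate so that b is not affected
if (traced) untracemem(a)

a[1]                    # 2
b[1]                    # 1, b still sees the original values
```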
inspect returns other information. Each R type is represented internally as an 'SEXP' (S-expression) type. These are enumerated, and integer vectors have type code 13, hence 13 INTSXP. Check out .Internal(inspect(...)) for a numeric vector, character vector, or even a function: .Internal(inspect(function() {})).
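A short sketch of that exploration. The inspect output differs between R versions and platforms, so none is shown here; typeof() gives the user-level name of each type for comparison. The object names are made up for the example:

```r
x_num <- c(1, 2, 3)
x_chr <- c("a", "b")
f <- function() {}

# Low-level view; the first number after the address is the SEXP type code
invisible(.Internal(inspect(x_num)))
invisible(.Internal(inspect(x_chr)))
invisible(.Internal(inspect(f)))

# User-level type names for the same objects
typeof(x_num)   # "double"
typeof(x_chr)   # "character"
typeof(f)       # "closure"
```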
R manages memory by periodically running a 'garbage collector' that checks whether memory is still referenced; if it is not, it is reclaimed for reuse. The garbage collector is 'generational', which means that recently allocated memory is checked for reclamation more frequently than older memory. This is because, empirically, variables tend to have a short half-life (e.g., the duration of a function call), so recently allocated memory is more likely to be available for reclamation than memory that has been in use for a longer time. The g0c4 and similar annotations provide information about the generation the SEXP belongs to (the g part; the c part is the allocator's node class).
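The collector can also be triggered by hand with gc(), which reports current usage. A rough sketch, assuming nothing else holds a reference to the vector (the name big is illustrative):

```r
big <- numeric(1e6)     # one million doubles, roughly 8 MB
before <- gc()          # run a collection and record memory usage
rm(big)                 # drop the only reference to the vector
after <- gc()           # the vector's memory can now be reclaimed

# Vector-cell usage before and after; the second is typically smaller
before["Vcells", "used"]
after["Vcells", "used"]
```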
The TR represents a 'bit' set in the SEXP to indicate that the variable is being traced; it was set when we said tracemem(a).
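The bit can be cleared again with untracemem(). A minimal sketch, guarded because tracemem() signals an error in builds without memory profiling:

```r
if (capabilities("profmem")) {
  a <- c(1, 2, 3)
  addr <- tracemem(a)   # sets the trace bit; returns the address as a string
  print(addr)
  untracemem(a)         # clears the trace bit, so later copies go unreported
} else {
  message("this build of R does not support tracemem()")
}
```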
Some of these topics are discussed in the documentation of R's internal implementation, RShowDoc("R-ints"), and in the C header file Rinternals.h.