You are getting this error because the data.frame / data.table created by the join has more than 2^31 - 1 rows (2,147,483,647).
Due to the way vectors are constructed internally by R, the maximum length of a vector was historically 2^31 - 1 elements (see: https://stackoverflow.com/a/5234293/2341679). R 3.0.0 and later support longer atomic vectors on 64-bit builds, but data.table does not use them, so since a data.frame / data.table is really a list() of vectors, this limit still applies in practice to the number of rows.
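The limit is easy to confirm from an R session. This is a minimal sketch using only base R; the row counts n_left and n_right are made-up numbers chosen for illustration, not taken from the question:

```r
# Largest value a 32-bit signed integer can hold; this is also the row limit
max_rows <- .Machine$integer.max
max_rows == 2^31 - 1

# Hypothetical sizes: if 50,000 rows on one side each match 50,000 rows
# on the other, the joined table would already be too large
n_left  <- 50000
n_right <- 50000
too_big <- n_left * n_right > max_rows
too_big
```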
As other people have commented and answered, unfortunately you won't be able to construct this data.table, and it's likely there are that many rows because of duplicate matches between your two data.tables (these may or may not be intentional on your part).
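One way to check whether duplicate matches are the cause is to count how many rows the join would produce before running it. The following is only a sketch under assumptions: the toy tables stand in for your real ones, the helper names left_counts, right_counts and expected_rows are made up, and it counts matched keys only (rows of dt_right with no match in dt_left would each add one more row to the join).

```r
library(data.table)

# Toy stand-ins for the real tables (values are invented for illustration)
dt_left  <- data.table(GVKEY = c(1, 1, 1, 2), YEAR = c(2000, 2000, 2000, 2001))
dt_right <- data.table(GVKEY = c(1, 1, 2),    YEAR = c(2000, 2000, 2001))

# Number of rows per join key in each table
left_counts  <- dt_left[, .(n_left = .N), by = .(GVKEY, YEAR)]
right_counts <- dt_right[, .(n_right = .N), by = .(GVKEY, YEAR)]

# Inner join of the per-key counts; each key contributes n_left * n_right rows.
# as.numeric() avoids integer overflow when the product is very large.
counts <- left_counts[right_counts, on = .(GVKEY, YEAR), nomatch = NULL]
expected_rows <- counts[, sum(as.numeric(n_left) * n_right)]
expected_rows

# Keys with the most duplicate matches come first
counts[order(-(as.numeric(n_left) * n_right))]
```

If expected_rows is above 2^31 - 1, the full join cannot be materialised, and the keys at the top of the last table show where the duplication comes from.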
The good news is, if the duplicate matches are not errors, and you still want to perform the join, there is a way around it: you just need to do whatever computation you wanted to do on the resulting data.table in the same call as the join using the data.table[] operator, e.g.:
dt_left[dt_right, on = .(GVKEY, YEAR),
        j = .(sum(firm_related_wealth), mean(fracdirafterindep)),
        by = .EACHI]
If you're not familiar with the data.table syntax, you can perform calculations on columns within a data.table as shown above using the j argument. When performing a join using this syntax, computation in j is performed on the data.table created by the join.
The key here is the by = .EACHI argument. This breaks the join (and subsequent computation in j) down into smaller components: one data.table for each row in dt_right and its matches in dt_left, avoiding the problem of creating a data.table with > 2^31 - 1 rows.
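To see the effect on a small scale, here is a self-contained sketch with toy data. The column names follow the call above, but the values, the assumption that both measure columns live in dt_left, and the output names total_wealth and mean_frac are invented for illustration:

```r
library(data.table)

# Toy data; values are made up
dt_left <- data.table(
  GVKEY = c(1, 1, 2, 2),
  YEAR  = c(2000, 2000, 2001, 2001),
  firm_related_wealth = c(10, 20, 30, 40),
  fracdirafterindep   = c(0.2, 0.4, 0.6, 0.8)
)
dt_right <- data.table(GVKEY = c(1, 2), YEAR = c(2000, 2001))

# j is evaluated once per row of dt_right, on that row's matches in dt_left,
# so the full joined table is never built
res <- dt_left[dt_right, on = .(GVKEY, YEAR),
               j = .(total_wealth = sum(firm_related_wealth),
                     mean_frac    = mean(fracdirafterindep)),
               by = .EACHI]
res
```

The result has one row per row of dt_right, holding the join columns plus the aggregates, so its size depends on dt_right rather than on the number of matches.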