I am trying to exclude only the empty rows that are at the end of a data.table. Is there a packaged and fast way of doing it?
EDIT1: to be precise, the selection criterion is: drop every row that is empty (NA in all columns) AND is either the last row or followed only by empty rows.
I came up with the solution below, which works but is too slow (I am using this function on thousands of tables), probably because of the while loop.
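library(data.table)
library(magrittr)  # provides %>%, used in the last test case
## Aux function to remove NA rows below table
remove_empty_row_last <- function(dt){
  dt <- copy(dt)  # work on a copy so := does not add a column to the caller's table
  dt[, row_empty := rowSums(is.na(dt)) == ncol(dt)]
  # drop the last row while it is empty; stop once no rows are left
  while (nrow(dt) > 0 && dt[.N, row_empty]) {
    dt <- dt[seq_len(.N - 1L)]  # 1:(.N-1) would give c(1, 0) when .N is 1
  }
  dt[, row_empty := NULL]  # remove the helper column before returning
  dt
}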
d <- data.table(a=c(1,NA,3,NA,5,NA,NA),b=c(1,NA,3,4,5,NA,NA))
remove_empty_row_last(d)
# EDIT2: adding more test cases
d2 <- data.table(A=c(1,NA,3,NA,5,1 ,NA),B=c(1,NA,3,4,5,NA,NA))
remove_empty_row_last(d2)
d3 <- data.table(A=c(1,NA,3,NA,5,NA,NA),B=c(1,NA,3,4,5,1,NA))
remove_empty_row_last(d3)
# EDIT3: adding a test case with no fully empty rows
d4 <- data.table(A=c(1,2,3,NA,5,NA,NA),B=c(1,2,3,4,5,1,7))
d4 %>% remove_empty_row_last()
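
For comparison, the criterion in EDIT1 can also be written without a loop: find the last row that has at least one non-NA value and keep everything up to it. The following is only a sketch of that idea, not a benchmarked answer; the name `drop_trailing_empty_rows` is hypothetical and is not part of data.table.

```r
library(data.table)

# Hypothetical helper (not from the original post): keep all rows up to and
# including the last row that has at least one non-NA value.
drop_trailing_empty_rows <- function(dt) {
  # TRUE for rows with at least one non-NA value
  non_empty <- rowSums(!is.na(dt)) > 0
  # index of the last non-empty row, or 0 if every row is empty
  last_keep <- if (any(non_empty)) max(which(non_empty)) else 0L
  dt[seq_len(last_keep)]
}

d <- data.table(a = c(1, NA, 3, NA, 5, NA, NA), b = c(1, NA, 3, 4, 5, NA, NA))
drop_trailing_empty_rows(d)  # keeps rows 1 to 5, including the empty row 2
```

Because the row test is computed once for the whole table and the subset is taken in a single step, the table is not copied once per trailing empty row as it is in the while loop. Empty rows in the middle of the table are left untouched, and a table consisting only of empty rows comes back with zero rows.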