tidyverse - dplyr: different variables defined based on different lists

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

Take a dataset like

library(dplyr)

df <- data.frame(ID = 1:3,
                 animal = c("elephant", "bee", "dog"))

listofmammals <- c("elephant", "dog", "cat")
listofinsects <- c("bee", "wasp")

To define the variables mammal and insect I could do

df <- df %>%
   mutate(mammal = (animal %in% listofmammals),
          insect = (animal %in% listofinsects))

But I would rather have a more automated approach based on the list:

listoflists <- list(listofmammals, listofinsects)

where, if I extend the listoflists, an extra variable is automatically created based on the additional element. I can think of a for-loop to do this, but ideally, there would be a dplyr-approach.

question from:https://stackoverflow.com/questions/65883450/dplyr-different-variables-defined-based-on-different-lists

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:20:37+0000

Method 1

I think, rather than making lists and lists of lists, you'd be better off making a new table and using bind_rows each time you need to extend your categorisation table, then using left_join to get those categorisations onto your df.

For example, let's say you have a categorisation table:

animals_df <- tibble(animal = c("elephant", "dog", "cat"), type = "mammal")

and you want to add to it:

new_animals_df <- tibble(animal = c("bee", "wasp"), type = "insect")

(tibble recycles the "insect" value across the "type" column because it has length 1.)
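A quick check of that recycling rule, as a minimal sketch. The third animal ("ant") is made up for illustration and is not part of the original data.

```r
library(dplyr)  # re-exports tibble()

# The length-1 value "insect" is recycled to match the two animals.
new_animals_df <- tibble(animal = c("bee", "wasp"), type = "insect")
n_rows <- nrow(new_animals_df)

# Only length-1 vectors are recycled; any other length mismatch is an error.
# "ant" is a hypothetical extra animal used to trigger the mismatch.
mismatch <- tryCatch(
  tibble(animal = c("bee", "wasp", "ant"), type = c("insect", "insect")),
  error = function(e) "error"
)
```

So a category label can be written once per table, but a label vector of the wrong length is rejected rather than silently repeated.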

Extend your original categorisation table:

animals_df <- bind_rows(animals_df, new_animals_df)

And now, to define which is which (I know it's totally redundant in this example, but running with it...):

left_join(df, animals_df, by = "animal")
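Putting Method 1 together, here is a minimal end-to-end sketch. The last step is an addition that is not part of the answer above: it assumes tidyr is available and uses pivot_wider to turn the "type" column into one logical column per category, which is the shape the question asked for. The helper column name in_type is hypothetical.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; only needed for pivot_wider()

df <- data.frame(ID = 1:3, animal = c("elephant", "bee", "dog"))

# Categorisation table, extended with bind_rows() as described above.
animals_df <- bind_rows(
  tibble(animal = c("elephant", "dog", "cat"), type = "mammal"),
  tibble(animal = c("bee", "wasp"), type = "insect")
)

# Attach the category to each row of df.
joined <- left_join(df, animals_df, by = "animal")

# Optional: spread the categories into one logical column each.
# in_type is a hypothetical helper column; values_fill supplies FALSE
# wherever an animal does not belong to a category.
flags <- joined %>%
  mutate(in_type = TRUE) %>%
  pivot_wider(names_from = type, values_from = in_type, values_fill = FALSE)
```

With this layout, adding a new category only means adding rows to animals_df; the new logical column appears without changing the code.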

Method 2

Ultimately, if you want to use lists and lists of lists, the way I'd do it might be a bit hacky, but I'd just convert the lists into something resembling the above table, and then left_join in the same way:

listofmammals <- c("elephant", "dog", "cat")
listofinsects <-  c("bee", "wasp")
listoflists <- list(listofmammals, listofinsects)

Make a table similar to the one we had before:

library(purrr)  # provides map_df()
animal_types <- listoflists %>% map_df(~tibble(animal = .x), .id = "type")

And then left_join it to get the categories:

left_join(df, animal_types, by = "animal")

There may be a more elegant way of doing this with lists. With the list method, the "type" column contains only "1" and "2" (the positions of the list elements), which I'd then have to recode, and that isn't very satisfactory. Personally, I find the first method far easier to understand and almost as easy to type out.
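One possible way around the recoding, sketched under the assumption that you are free to name the elements of listoflists: when the list is named, the .id argument of map_df stores the names instead of the positions. The names "mammal" and "insect" below are illustrative. The same named list can also produce the logical columns from the question directly.

```r
library(dplyr)
library(purrr)

df <- data.frame(ID = 1:3, animal = c("elephant", "bee", "dog"))

# Assumption: the list elements are named; the names are illustrative.
listoflists <- list(
  mammal = c("elephant", "dog", "cat"),
  insect = c("bee", "wasp")
)

# With a named list, .id holds the element names rather than "1", "2".
animal_types <- map_df(listoflists, ~ tibble(animal = .x), .id = "type")
typed <- left_join(df, animal_types, by = "animal")

# Alternative: one logical column per list element, named after the element.
flags <- map(listoflists, ~ df$animal %in% .x)
flagged <- bind_cols(df, as_tibble(flags))
```

Adding a third named element to listoflists would add a third category to typed and a third logical column to flagged without further changes.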
