I have a list of articles obtained from an API, with my dataframe looking like this:
PMID Year Title Journal Author
33326729 2020 Avelumab Maintenance PLoS biology T., Powles
33326729 2020 Avelumab Maintenance PLoS biology B., Huang
33326729 2020 Avelumab Maintenance PLoS biology A., Di Pietro
Which I need to merge to this:
PMID Year Title Journal Author-1 Author-2 Author-3
33326729 2020 Avelumab Maintenance PLoS biology T., Powles B., Huang A., Di Pietro
So basically, I need the articles to have the authors merged in a single row. I thought of sorting by id by doing as follows:
test <- setDT(PubMed_df)[, lapply(.SD, function(x) toString(na.omit(x))), by = "pmid"]
Outputs:
33326729 2020,2020,2020 Avelumab Maintenance,Avelumab Maintenance,Avelumab Maintenance PLoS biology,PLoS biology,PLoS biology T., Powles,B., Huang,A., Di Pietro
However, that generates the data with commas instead of separate columns. Does anyone know of a different function or how to adapt the setDT function in order to get my intended result? Thanks in advance
Edit:
As requested the output of dput(head(PubMed_df))
:
structure(list(pmid = c("33326729", "33326729", "33326729", "33320856",
"33320856", "33320856"), year = c("2020", "2020", "2020", "2021",
"2021", "2021"), month = c("12", "12", "12", "01", "01", "01"
), day = c("21", "21", "21", "07", "07", "07"), lastname = c("Powles",
"Huang", "di Pietro", "Reijns", "Thompson", "Acosta"), firstname = c("Thomas",
"Bo", "Alessandra", "Martin A M", "Louise", "Juan Carlos"), address = c("St. Bartholomew's Hospital, London, United Kingdom thomas.powles1@nhs",
"Pfizer, Groton, CT", "Pfizer, Milan, Italy", "MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, United Kingdom",
"The South East of Scotland Clinical Genetic Service, Western General Hospital, NHS Lothian, Edinburgh, United Kingdom",
"Cancer Research UK Edinburgh Centre, MRC Institute of Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, United Kingdom"
), journal = c("The New England journal of medicine", "The New England journal of medicine",
"The New England journal of medicine", "PLoS biology", "PLoS biology",
"PLoS biology"), title = c("Avelumab Maintenance for Urothelial Carcinoma. Reply.",
"Avelumab Maintenance for Urothelial Carcinoma. Reply.", "Avelumab Maintenance for Urothelial Carcinoma. Reply.",
"A sensitive and affordable multiplex RT-qPCR assay for SARS-CoV-2 detection.",
"A sensitive and affordable multiplex RT-qPCR assay for SARS-CoV-2 detection.",
"A sensitive and affordable multiplex RT-qPCR assay for SARS-CoV-2 detection."
), abstract = c(NA, NA, NA, "", "", ""), doi = c("10.1056/NEJMc2032018",
"10.1056/NEJMc2032018", "10.1056/NEJMc2032018", "10.1371/journal.pbio.3001030",
"10.1371/journal.pbio.3001030", "10.1371/journal.pbio.3001030"
), keywords = c("Antibodies, Monoclonal; Antibodies, Monoclonal, Humanized; Carcinoma, Transitional Cell; Humans; Urologic Neoplasms",
"Antibodies, Monoclonal; Antibodies, Monoclonal, Humanized; Carcinoma, Transitional Cell; Humans; Urologic Neoplasms",
"Antibodies, Monoclonal; Antibodies, Monoclonal, Humanized; Carcinoma, Transitional Cell; Humans; Urologic Neoplasms",
"COVID-19; COVID-19 Testing; Humans; Multiplex Polymerase Chain Reaction; RNA, Viral; Reverse Transcriptase Polymerase Chain Reaction; SARS-CoV-2",
"COVID-19; COVID-19 Testing; Humans; Multiplex Polymerase Chain Reaction; RNA, Viral; Reverse Transcriptase Polymerase Chain Reaction; SARS-CoV-2",
"COVID-19; COVID-19 Testing; Humans; Multiplex Polymerase Chain Reaction; RNA, Viral; Reverse Transcriptase Polymerase Chain Reaction; SARS-CoV-2"
)), row.names = c(NA, 6L
), class = c("data.table", "data.frame"))
Edit 2:
Highly detailed and specific request:
I need to get the data of which the head is presented above into a form in which every row has:
PMID | Date of publication | Author 1 | Affiliation | Address | City | State (if US) | Country | Author 2 | Affiliation of Author 2 | Address | City | State (if US) | Country | And so on for each co-Author | Journal | Title | Abstract* | MH term
I will have to break up the addresses but thats something ill be focusing on later. For now my goal is to just get all the info for every author added to the right article without having 3 rows of the same article.
Edit 2 - To be used to get the answer from @r2evans to work in my case:
The answer provided works if you use dcast as data.table::dcast!