Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


import - Cannot read file with "#" and space using read.table or read.csv in R

I have a file where the first row is a header. The header can contain spaces and the # symbol (there may be other special characters as well). I am trying to read this file using read.csv or read.table, but I keep getting errors such as:

undefined columns selected 

more columns than column names 

My tab-delimited chromFile file looks like:

Chromosome# Chr chr Size    UCSC NCBI36/hg18    NCBIBuild36 NCBIBuild37
1   Chr1    chr1    247199719   247249719   247249719   249250621
2   Chr2    chr2    242751149   242951149   242951149   243199373

Command:

chromosomes <- read.csv(chromFile, sep = "", skip = 0, header = TRUE)

I first want to find a way to read the file as it is, without replacing the spaces or the # with some other readable symbol.



1 Answer


From the documentation (?read.csv):

comment.char character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.

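For read.table the default is comment.char = "#", so everything from the # onward on the header line is treated as a comment and dropped, which is what is causing you trouble. (read.csv already defaults to comment.char = "".) Following the documentation, you should use comment.char = "".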

Spaces in the header are another issue which, as mrdwab kindly pointed out, can be addressed by setting check.names = FALSE, so that R does not rewrite the column names into syntactically valid ones.

chromosomes <- read.csv(chromFile, sep = "", skip = 0, header = TRUE,
                        comment.char = "", check.names = FALSE)
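
To check the two settings in isolation, here is a minimal, self-contained sketch. It writes a small temporary file modelled on the sample rows in the question (the real chromFile is not available, so the file is a stand-in). It also assumes that the file really is tab-delimited and that "UCSC NCBI36/hg18" is a single column name. Under that assumption sep = "\t" is used instead of sep = "", because sep = "" splits on any whitespace and would break a header containing a space into two names.

```r
# Hypothetical stand-in for the question's chromFile: a temporary
# tab-delimited file built from the sample rows shown in the question.
chromFile <- tempfile(fileext = ".txt")
writeLines(c(
  "Chromosome#\tChr\tchr\tSize\tUCSC NCBI36/hg18\tNCBIBuild36\tNCBIBuild37",
  "1\tChr1\tchr1\t247199719\t247249719\t247249719\t249250621",
  "2\tChr2\tchr2\t242751149\t242951149\t242951149\t243199373"
), chromFile)

# comment.char = "" stops "#" from starting a comment.
# check.names = FALSE keeps the header names exactly as written.
# sep = "\t" is an assumption: it splits on tabs only, so a header
# containing a space stays in one piece.
chromosomes <- read.table(chromFile, sep = "\t", header = TRUE,
                          comment.char = "", check.names = FALSE)

# Column names now match the header line verbatim.
names(chromosomes)

# Non-syntactic names need [[ ]] or backticks when accessed.
chromosomes[["Chromosome#"]]
chromosomes$`UCSC NCBI36/hg18`
```

Because the names are kept verbatim, columns such as Chromosome# can no longer be reached with plain $ syntax; use [[ ]] or backticks as in the last two lines.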
