reading csv files in scipy/numpy in Python

Question

Welcome To Ask or Share your Answers For Others

reading csv files in scipy/numpy in Python

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

reading csv files in scipy/numpy in Python

I am having trouble reading a csv file, delimited by tabs, in python. I use the following function:

def csv2array(filename, skiprows=0, delimiter='', raw_header=False, missing=None, with_header=True):
    """
    Parse a file name into an array. Return the array and additional header lines. By default,
    parse the header lines into dictionaries, assuming the parameters are numeric,
    using 'parse_header'.
    """
    f = open(filename, 'r')
    skipped_rows = []
    for n in range(skiprows):
        header_line = f.readline().strip()
        if raw_header:
            skipped_rows.append(header_line)
        else:
            skipped_rows.append(parse_header(header_line))
    f.close()
    if missing:
        data = genfromtxt(filename, dtype=None, names=with_header,
                          deletechars='', skiprows=skiprows, missing=missing)
    else:
    if delimiter != '':
        data = genfromtxt(filename, dtype=None, names=with_header, delimiter=delimiter,
                  deletechars='', skiprows=skiprows)
    else:
        data = genfromtxt(filename, dtype=None, names=with_header,
                  deletechars='', skiprows=skiprows)        
    if data.ndim == 0:
    data = array([data.item()])
    return (data, skipped_rows)

the problem is that genfromtxt complains about my files, e.g. with the error:

Line #27100 (got 12 columns instead of 16)

I am not sure where these errors come from. Any ideas?

Here's an example file that causes the problem:

#Gene   120-1   120-3   120-4   30-1    30-3    30-4    C-1 C-2 C-5 genesymbol  genedesc
ENSMUSG00000000001  7.32    9.5 7.76    7.24    11.35   8.83    6.67    11.35   7.12    Gnai3   guanine nucleotide binding protein alpha
ENSMUSG00000000003  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Pbsn    probasin

Is there a better way to write a generic csv2array function? thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:18:23+0000

Check out the python CSV module: http://docs.python.org/library/csv.html

import csv
reader = csv.reader(open("myfile.csv", "rb"), 
                    delimiter='', quoting=csv.QUOTE_NONE)

header = []
records = []
fields = 16

if thereIsAHeader: header = reader.next()

for row, record in enumerate(reader):
    if len(record) != fields:
        print "Skipping malformed record %i, contains %i fields (%i expected)" %
            (record, len(record), fields)
    else:
        records.append(record)

# do numpy stuff.

Categories

reading csv files in scipy/numpy in Python

reading csv files in scipy/numpy in Python

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags