Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

string - How to read a dataset from a txt file in Python?

I have a dataset in this format:

example data

I need to import the data and work with it.

The main problem is that the first and the fourth columns are strings while the second and third columns are floats and ints, respectively.

I'd like to put the data in a matrix or at least obtain a list of each column's data.

I tried to read the whole dataset as a string but it's a mess:

    f = open('input.txt', 'r')
    l = [map(str, line.split('\t')) for line in f]

What could be a good solution?

1 Answer

You can use pandas, which handles CSV files, tab-delimited files, and similar formats well. It usually infers each column's data type correctly, and selecting a row or column gives you data backed by a NumPy array, as demonstrated.

I used this tab delimited 'test.txt' file:

    bbbbffdd    434343  228 D 
    bbbWWWff    43545343    289 E
    ajkfbdafa   2345345 2312    F

Here is the pandas code. It reads your file into a DataFrame in one line of Python. Change the 'sep' value to match the delimiter your file uses.

    import pandas as pd
    X = pd.read_csv('test.txt', sep="\t", header=None)

Then try:

    print(X)
            0         1     2   3
    0   bbbbffdd    434343   228   D
    1   bbbWWWff  43545343   289   E
    2  ajkfbdafa   2345345  2312   F

    print(X[0])
    0     bbbbffdd
    1     bbbWWWff
    2    ajkfbdafa

    print(X[2])
    0     228
    1     289
    2    2312

    print(X[1][1:])
    1    43545343
    2     2345345

You can add column names as:

    X.columns = ['random_letters', 'number', 'simple_number', 'letter']

And then get the columns as:

    X['number'].values
    array([  434343, 43545343,  2345345])
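
Since the question asks for a list of each column's data, here is a minimal sketch of one way to get that from the same DataFrame. It assumes the three sample rows above, inlined through `io.StringIO` so it runs without a file on disk. The names `sample`, `cols`, `df`, and `columns` are illustrative only and not part of the original answer.

```python
import io
import pandas as pd

# The same three rows as test.txt above, inlined for a self-contained run.
sample = (
    "bbbbffdd\t434343\t228\tD\n"
    "bbbWWWff\t43545343\t289\tE\n"
    "ajkfbdafa\t2345345\t2312\tF\n"
)

# Column names taken from the answer above; passing them via `names`
# labels the columns at read time instead of assigning X.columns afterwards.
cols = ["random_letters", "number", "simple_number", "letter"]
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=cols)

# Build one plain Python list per column.
columns = {name: df[name].tolist() for name in cols}

print(columns["simple_number"])  # [228, 289, 2312]
print(df.dtypes)                 # inferred type of each column
```

With a real file, replacing `io.StringIO(sample)` with the file path should behave the same way, provided the file really is tab-delimited.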
