Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
513 views
in Technique[技术] by (71.8m points)

python - Merge files with similar name convention to a dataframe

I have a list of files stored in directory such as

filenames=[
        abc_1.txt
        abc_2.txt
        abc_3.txt

        bcd_1.txt
        bcd_2.txt
        bcd_3.txt
       ]

pattern=[abc]

I want to read multiple txt files into one dataframe such that all files starting with abc will be in one dataframe then all all filename starting with bcd etc.

My code:

file_path = '/home/iolie/Downloads/test/'
filenames = os.listdir(file_path)


prefixes = list(set(i.split('_')[0] for i in filenames))

for prefix in prefixes:
    print('Reading files with prefix:',prefix)
    for file in filenames: 
        if file.startswith(prefix):
            print('Reading files:',file)
            list_of_dfs = [pd.concat([pd.read_csv(os.path.join(file_path, file), header=None) ],ignore_index=True)]
            final = pd.concat(list_of_dfs)

This code doesnt't append but overwrites the dataframe. Can someone help wih this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

A better idea than creating an arbitrary number of unlinked dataframes is to output a dictionary of dataframes, where the key is the prefix:

from collections import defaultdict

filenames = ['abc_1.txt', 'abc_2.txt', 'abc_3.txt',
             'bcd_1.txt', 'bcd_2.txt', 'bcd_3.txt']

dd = defaultdict(list)

for fn in filenames:
    dd[fn.split('_')[0]].append(fn)

dict_of_dfs = {}
for k, v in dd.items():
    dict_of_dfs[k] = pd.concat([pd.read_csv(fn) for fn in v], ignore_index=True)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...