python - How to append rows in a pandas dataframe in a for loop?

Question

Welcome To Ask or Share your Answers For Others

python - How to append rows in a pandas dataframe in a for loop?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - How to append rows in a pandas dataframe in a for loop?

I have the following for loop:

for i in links:
     data = urllib2.urlopen(str(i)).read()
     data = json.loads(data)
     data = pd.DataFrame(data.items())
     data = data.transpose()
     data.columns = data.iloc[0]
     data = data.drop(data.index[[0]])

Each dataframe so created has most columns in common with the others but not all of them. Moreover, they all have just one row. What I need to to is to add to the dataframe all the distinct columns and each row from each dataframe produced by the for loop

I tried pandas concatenate or similar but nothing seemed to work. Any idea? Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:08:46+0000

Suppose your data looks like this:

import pandas as pd
import numpy as np

np.random.seed(2015)
df = pd.DataFrame([])
for i in range(5):
    data = dict(zip(np.random.choice(10, replace=False, size=5),
                    np.random.randint(10, size=5)))
    data = pd.DataFrame(data.items())
    data = data.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
    df = df.append(data)
print('{}
'.format(df))
# 0   0   1   2   3   4   5   6   7   8   9
# 1   6 NaN NaN   8   5 NaN NaN   7   0 NaN
# 1 NaN   9   6 NaN   2 NaN   1 NaN NaN   2
# 1 NaN   2   2   1   2 NaN   1 NaN NaN NaN
# 1   6 NaN   6 NaN   4   4   0 NaN NaN NaN
# 1 NaN   9 NaN   9 NaN   7   1   9 NaN NaN

Then it could be replaced with

np.random.seed(2015)
data = []
for i in range(5):
    data.append(dict(zip(np.random.choice(10, replace=False, size=5),
                         np.random.randint(10, size=5))))
df = pd.DataFrame(data)
print(df)

In other words, do not form a new DataFrame for each row. Instead, collect all the data in a list of dicts, and then call df = pd.DataFrame(data) once at the end, outside the loop.

Each call to df.append requires allocating space for a new DataFrame with one extra row, copying all the data from the original DataFrame into the new DataFrame, and then copying data into the new row. All that allocation and copying makes calling df.append in a loop very inefficient. The time cost of copying grows quadratically with the number of rows. Not only is the call-DataFrame-once code easier to write, its performance will be much better -- the time cost of copying grows linearly with the number of rows.

Categories

python - How to append rows in a pandas dataframe in a for loop?

python - How to append rows in a pandas dataframe in a for loop?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags