Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - create a variable iterating over a column in a large dataset in pandas

I need to add a column named transition to a DataFrame. Within each keyInd, it should increase by 1 every time the value of V2010 changes.

Here is a sample of the dataframe:

keyInd V1016 V2010
110000016107-1 1 4
110000016107-1 2 4
110000016107-1 3 4
110000016107-1 4 4
110000016107-1 5 2
110000016107-2 1 1
110000016107-2 2 4
110000016107-2 3 3
110000016107-2 4 3
110000016107-2 5 2

Question from: https://stackoverflow.com/questions/66068143/create-a-variable-iterating-over-a-column-in-a-large-dataset-in-pandas

1 Answer

Try comparing each row with the next one using shift(-1), then set the last row of each group (its tail(1)) to np.nan, since it has no following row to compare against. Group by keyInd and apply the logic to each group. This avoids explicit row-wise looping.

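import numpy as np
import pandas as pd

def transition(x):
    x = x.copy()  # work on a copy of the group rather than a possible view
    # 1 where the next row's V2010 differs from the current row's, else 0
    t = np.where(x['V2010'] == x['V2010'].shift(-1), 0, 1)
    # running count of changes; float so the column can hold NaN
    x['transition'] = np.cumsum(t).astype('float')
    # the last row of each group has no next row to compare against
    x.iloc[-1, x.columns.get_loc('transition')] = np.nan
    return x

# group_keys=False keeps the original row index, as in the output below
dft = df.groupby('keyInd', group_keys=False).apply(transition)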

Output:

In [105]: dft
Out[105]:
           keyInd  V1016  V2010  transition
0  110000016107-1      1      4       0.000
1  110000016107-1      2      4       0.000
2  110000016107-1      3      4       0.000
3  110000016107-1      4      4       1.000
4  110000016107-1      5      2         NaN
5  110000016107-2      1      1       1.000
6  110000016107-2      2      4       2.000
7  110000016107-2      3      3       2.000
8  110000016107-2      4      3       3.000
9  110000016107-2      5      2         NaN
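
For a large dataset it may be worth skipping apply altogether, since it calls a Python function once per group. The following is only a sketch of that idea, not the answer's original method: it rebuilds the sample from the question and uses the grouped shift, cumsum and cumcount methods instead. The helper names nxt, changed, counts and is_last are made up for illustration. How groupby.apply treats the grouping column has also changed between pandas releases, which this variant avoids.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "keyInd": ["110000016107-1"] * 5 + ["110000016107-2"] * 5,
    "V1016": [1, 2, 3, 4, 5] * 2,
    "V2010": [4, 4, 4, 4, 2, 1, 4, 3, 3, 2],
})

# Next row's V2010 within the same keyInd (NaN on each group's last row)
nxt = df.groupby("keyInd")["V2010"].shift(-1)

# 1 where the next value differs from the current one, else 0
changed = df["V2010"].ne(nxt).astype(int)

# Running count of changes, restarted for every keyInd
counts = changed.groupby(df["keyInd"]).cumsum().astype(float)

# Mark the last row of each group; it has nothing to compare against
is_last = df.groupby("keyInd").cumcount(ascending=False).eq(0)

# Blank out the last row of each group, as the answer does
df["transition"] = counts.mask(is_last)
```

On the sample this should give the same transition column as the output above (0, 0, 0, 1, NaN for the first keyInd and 1, 2, 2, 3, NaN for the second). It assumes the rows are already in the intended order within each keyInd.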
