python - How can I update my cumulative column in a pandas dataframe

asked Jan 29, 2021 in Technique by 深蓝 (71.8m points)

I have a pandas dataframe named 'df' with 4 columns: date, game_name, total_registered, and newly_registered.

The total_registered column is cumulative.

Sample:

+------------+-----------+------------------+------------------+
|    date    | game_name | total_registered | newly_registered |
+------------+-----------+------------------+------------------+
| 2020-12-1  | abc       |               10 |                4 |
| 2020-12-2  | abc       |               14 |                3 |
| 2020-12-3  | abc       |                0 |                5 |
| 2020-12-1  | zzz       |               20 |                2 |
| 2020-12-2  | zzz       |               22 |               10 |
| 2020-12-3  | zzz       |                0 |                5 |
+------------+-----------+------------------+------------------+

As you can see, my cumulative column broke on the 2nd for 'abc' and on the 3rd for 'zzz'

To be safe, I would like to rebuild my cumulative column starting on the 2nd. How can I set each row's total_registered to the prior day's total_registered + newly_registered for the same game?

Desired Output:

+------------+-----------+------------------+------------------+
|    date    | game_name | total_registered | newly_registered |
+------------+-----------+------------------+------------------+
| 2020-12-1  | abc       |               10 |                4 |
| 2020-12-2  | abc       |               14 |                3 |
| 2020-12-3  | abc       |               17 |                5 |
| 2020-12-1  | zzz       |               20 |                2 |
| 2020-12-2  | zzz       |               22 |               10 |
| 2020-12-3  | zzz       |               32 |                5 |
+------------+-----------+------------------+------------------+

1 Answer

深蓝 · Answer 1 · 2021-01-29T04:31:00+0000

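Concept

1. For each game, take the first total_registered, relabel it as newly_registered, and give that synthetic row the epoch date (1970-01-01) so it sorts ahead of the real data.
2. concat() the synthetic rows with the actual data.
3. Recalculate total_registered per game using cumsum() and shift(), which now picks up the seed rows from step 1.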
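To see why the cumsum() and shift() step reproduces the totals, here is a minimal sketch for a single game. It assumes the seed value (the first total_registered, 10 for 'abc') has already been placed in front of that game's newly_registered values; the variable names are illustrative only.

```python
import pandas as pd

# Seed (first total_registered of 'abc') followed by its newly_registered values.
newly = pd.Series([10, 4, 3, 5])

# cumsum() gives the running total after each row; shift() moves it down one row,
# so every row ends up holding the total accumulated before that day.
totals = newly.cumsum().shift().fillna(0).astype(int)

# Drop the seed row; the rest lines up with 2020-12-1, 2020-12-2, 2020-12-3.
print(totals.iloc[1:].tolist())  # [10, 14, 17]
```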

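import io
import datetime as dt

import pandas as pd

# Parse the pipe-delimited sample; reset_index/drop discard the empty fields
# produced by the leading and trailing "|"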
df = pd.read_csv(io.StringIO("""    date    | game_name | total_registered | newly_registered |
| 2020-12-1  | abc       |               10 |                4 |
| 2020-12-2  | abc       |               14 |                3 |
| 2020-12-3  | abc       |                0 |                5 |
| 2020-12-1  | zzz       |               20 |                2 |
| 2020-12-2  | zzz       |               22 |               10 |
| 2020-12-3  | zzz       |                0 |                5 |"""),sep=r"\s*\|\s*", skipinitialspace=True, engine="python").reset_index(drop=True).drop(columns="Unnamed: 4")

df["date"] = pd.to_datetime(df["date"])

df = (pd.concat([df,
# synthesize first total_registered into a new row as newly_registered
(df.groupby("game_name").first()
 .reset_index()
 .assign(date=pd.to_datetime(dt.date(1970,1,1)),
        newly_registered=lambda dfa: dfa["total_registered"])
)
])
 .sort_values(["game_name","date"])
 # total_registered is now simple cumsum with a shift in group
 .assign(total_registered=lambda dfa: dfa.groupby("game_name")["newly_registered"]
         .transform(lambda x: x.cumsum().shift())
         .fillna(0)
         .astype(int))
 # remove synthesized rows
 .loc[lambda dfa: dfa["date"].dt.year > 1970]
)

output

      date game_name  total_registered  newly_registered
2020-12-01       abc                10                 4
2020-12-02       abc                14                 3
2020-12-03       abc                17                 5
2020-12-01       zzz                20                 2
2020-12-02       zzz                22                10
2020-12-03       zzz                32                 5
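For comparison, the same result can probably be reached without synthetic rows. The sketch below is an alternative, not the approach used in the answer above. Like the answer, it assumes the first total_registered of each game is correct, and it rebuilds the sample frame by hand so that it runs on its own. The names start and before are illustrative.

```python
import pandas as pd

# Sample data from the question, with the broken totals (0) on 2020-12-3.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-03"] * 2),
    "game_name": ["abc"] * 3 + ["zzz"] * 3,
    "total_registered": [10, 14, 0, 20, 22, 0],
    "newly_registered": [4, 3, 5, 2, 10, 5],
})

df = df.sort_values(["game_name", "date"]).reset_index(drop=True)
grouped = df.groupby("game_name")

# Assumed-trustworthy starting point: each game's first total_registered.
start = grouped["total_registered"].transform("first")

# Registrations accumulated before each row: running sum minus the row's own value.
before = grouped["newly_registered"].cumsum() - df["newly_registered"]

df["total_registered"] = start + before
print(df)
```

This keeps the original index and row count, at the cost of relying on the frame being sorted by date within each game before the cumulative sum is taken.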
