Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
950 views
in Technique[技术] by (71.8m points)

python - Pandas Groupby How to Show Zero Counts in DataFrame

I have the following Pandas dataframe:

Name   | EventSignupNo | Attended | Points
Smith  | 0145          | Y        | 20.24
Smith  | 0174          | Y        | 29.14
Smith  | 0239          | N        | 0
Adams  | 0145          | N        | 0
Adams  | 0174          | Y        | 33.43
Morgan | 0239          | Y        | 31.23
Morgan | 0244          | Y        | 23.15

and what I'd like is a count of the number of events attended and not attended per person, and the sum of their points, per person. So I do a groupby: df.groupby([Name, Attended]).agg({"Attended": "count", "Points": "sum"}).rename(columns = {"Attended: "Count"}).reset_index()

which would give me something like:

Name   | Attended | Count | Points
Smith  | Y        | 2     | 49.38
Smith  | N        | 1     | 0
Adams  | Y        | 1     | 33.43
Adams  | N        | 1     | 0
Morgan | Y        | 2     | 54.38

but I'd want something like:

Name   | Attended | Count | Points
Smith  | Y        | 2     | 49.38
Smith  | N        | 1     | 0
Adams  | Y        | 1     | 33.43
Adams  | N        | 1     | 0
Morgan | Y        | 2     | 54.38
Morgan | N        | 0     | 0

I tried playing around with pd.MultiIndex to try to fill the missing zero count, but to no avail. I've read the other similar questions but I'm having trouble dealing with the continuous Points column using MultiIndex. Any idea how to do this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You could do this with groupby + agg. For your exact output with Y and N at each level, you'd need reindex:

g = df.groupby(['Name', 'Attended'], sort=False).Points.agg(['count', 'sum'])

g
                 count    sum
Name   Attended              
Smith  Y             2  49.38
       N             1   0.00
Adams  N             1   0.00
       Y             1  33.43
Morgan Y             2  54.38
idx = pd.MultiIndex.from_product([g.index.levels[0], ['Y', 'N']])

idx
MultiIndex(levels=[['Adams', 'Morgan', 'Smith'], ['N', 'Y']],
           labels=[[2, 2, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0]])


g.reindex(idx, fill_value=0)

          count    sum
Smith  Y      2  49.38
       N      1   0.00
Adams  Y      1  33.43
       N      1   0.00
Morgan Y      2  54.38
       N      0   0.00

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...