Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.0k views
in Technique[技术] by (71.8m points)

grouping - Dask Dataframe groupby and aggregate for column

I had a pd.DataFrame that I converted to Dask.DataFrame for faster computations. My requirement is that I have to find out the 'Total Views' of a channel.

In pandas it would be, df.groupby(['ChannelTitle'])['VideoViewCount'].sum() but in dask the columns dtypes is object and groupby is taking these as string and not int(see image 2)

To handle above issue, I added two columns separating figure(115) and multiplier(6 for M, 3 for K) of views hoping to do an operation like ddf['new_views_f'] * (10**ddf['new_views_m']), but now I cannot find mul for two columns in dask.

Either I am missing something or complicating the requirement.

enter image description here

enter image description here

question from:https://stackoverflow.com/questions/66053231/dask-dataframe-groupby-and-aggregate-for-column

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

It does sound like you are complicating the requirement. For column multiplication, the regular pandas syntax will work (df['c'] = df['a'] * df['b']). In your case, it's possible to use pd.eval to get the actual numeric value for views:

import pandas as pd
import numpy as np
import dask.dataframe as dd
import random

df = pd.DataFrame(15*np.random.rand(15), columns=['views'])
df['views'] = df['views'].round(2).astype('str') + [random.choice(['K views', 'M views']) for _ in range(len(df))]
df['group'] = [random.choice([1,2,3]) for _ in range(len(df))]
ddf = dd.from_pandas(df, npartitions=2)

ddf['views_digits'] = ddf['views'].replace({'K views': '*1e3', 'M views': '*1e6'}, regex=True).map(pd.eval, meta=ddf['group'])
aggregate_df = ddf.groupby(['group']).agg({'views_digits': 'sum'}).compute()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...