I think need:
out = pd.DataFrame({ 'Date': pd.to_datetime(['2015-01-01','2015-05-01','2015-07-01','2015-10-01','2015-04-01','2015-12-01','2016-01-01','2016-02-01','2015-05-01', '2015-10-01']), 'Churn': ['Yes'] * 8 + ['No'] * 2 })
print (out)
Churn Date
0 Yes 2015-01-01
1 Yes 2015-05-01
2 Yes 2015-07-01
3 Yes 2015-10-01
4 Yes 2015-04-01
5 Yes 2015-12-01
6 Yes 2016-01-01
7 Yes 2016-02-01
8 No 2015-05-01
9 No 2015-10-01
df = (out.loc[out['Churn'] == 'Yes']
.groupby([out["Date"].dt.year,out["Date"].dt.quarter])["Churn"]
.count()
.rename_axis(('year','quarter'))
.reset_index(name='count'))
print(df)
year quarter count
0 2015 1 1
1 2015 2 2
2 2015 3 1
3 2015 4 2
4 2016 1 2
For separate DataFrames by years is possible create dictionary of DataFrames
:
dfs = dict(tuple(out.groupby(out['Date'].dt.year)))
print (dfs)
{2016: Churn Date
6 Yes 2016-01-01
7 Yes 2016-02-01, 2015: Churn Date
0 Yes 2015-01-01
1 Yes 2015-05-01
2 Yes 2015-07-01
3 Yes 2015-10-01
4 Yes 2015-04-01
5 Yes 2015-12-01
8 No 2015-05-01
9 No 2015-10-01}
print (dfs.keys())
dict_keys([2016, 2015])
print (dfs[2015])
Churn Date
0 Yes 2015-01-01
1 Yes 2015-05-01
2 Yes 2015-07-01
3 Yes 2015-10-01
4 Yes 2015-04-01
5 Yes 2015-12-01
8 No 2015-05-01
9 No 2015-10-01
Tenure column looks like this
out["tenure"].unique()
Out[14]:
array([ 8, 15, 32, 9, 48, 58, 10, 29, 1, 66, 24, 68, 4, 53, 6, 20, 52,
49, 71, 2, 65, 67, 27, 18, 47, 45, 43, 59, 13, 17, 72, 61, 34, 11,
35, 69, 63, 30, 19, 39, 3, 46, 54, 36, 12, 41, 50, 40, 28, 44, 51,
33, 21, 70, 23, 16, 56, 14, 62, 7, 25, 31, 60, 5, 42, 22, 37, 64,
57, 38, 26, 55])
It contains no of months, it seems like 1 to 72.
I need to split tenure column into "range".
For example, this column contains 1 to 72 numbers, need to range up to 4 range.
like 1 to 18 --> 1 range
19 to 36 --> 2nd range
37 to 54 --> 3rd range like that
here i found quarterlywise churn count and with that churn count later i found churn rate with churn count and total count.
quarterly_churn_yes = out.loc[out['Churn'] == 'Yes'].groupby([out["Date"].dt.year,out["Date"].dt.quarter]).count().rename_axis(('year','quarter'))
quarterly_churn_yes["Churn"]
quarterly_churn_rate = out.groupby(out["Date"].dt.quarter).apply(lambda x: quarterly_churn_yes["Churn"] / total_churn).sum()
print(quarterly_churn_rate)
Like this I need to find tenure wise 4 range to find churn count.