It is hard to implement this using the bitwise OR operator because pandas.DataFrame already implements it. If you don't mind replacing | with >>, you can try this:
import pandas as pd

def select(df, *args):
    # Keep only the named columns.
    cols = list(args)
    return df[cols]

def rename(df, **kwargs):
    # Each keyword is an old column name, its value the new name.
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df

class SinkInto(object):
    def __init__(self, function, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        self.function = function

    def __rrshift__(self, other):
        # Runs for `other >> self` when `other` does not handle `>>` itself.
        return self.function(other, *self.args, **self.kwargs)

    def __repr__(self):
        return "<SinkInto {} args={} kwargs={}>".format(
            self.function,
            self.args,
            self.kwargs
        )

df = pd.DataFrame({'one': [1., 2., 3., 4., 4.],
                   'two': [4., 3., 2., 1., 3.]})
Then you can do:
>>> df
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
4  4.0  3.0
>>> result = df >> SinkInto(select, 'one') \
...     >> SinkInto(rename, one='new_one')
>>> result
   new_one
0      1.0
1      2.0
2      3.0
3      4.0
4      4.0
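To see why >> works where | does not, here is a minimal sketch of Python's reflected-operator dispatch. The classes HasOr, Plain and Sink are hypothetical stand-ins, not pandas types: HasOr plays the role of a DataFrame (it defines | itself), and Sink plays the role of SinkInto. It uses no pandas, so it runs anywhere.

```python
class HasOr:
    """Stand-in for a type that, like pandas.DataFrame, defines `|` itself."""
    def __or__(self, other):
        return "HasOr.__or__ ran"


class Plain:
    """Stand-in for a type that defines neither `|` nor `>>`."""


class Sink:
    """Right-hand operand that only defines the reflected operators."""
    def __ror__(self, other):
        return "Sink.__ror__ ran"

    def __rrshift__(self, other):
        return "Sink.__rrshift__ ran"


# The left operand gets the first chance, so Sink.__ror__ is never consulted.
or_on_has_or = HasOr() | Sink()

# Plain has no __or__, so Python falls back to Sink.__ror__.
or_on_plain = Plain() | Sink()

# HasOr has no __rshift__, so Python falls back to Sink.__rrshift__.
shift_on_has_or = HasOr() >> Sink()

print(or_on_has_or)
print(or_on_plain)
print(shift_on_has_or)
```

The right-hand object only gets control when the left-hand type does not define the operator (or returns NotImplemented), which is why the answer hooks into >> instead of |.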
In Python 3 you can abuse unicode:
>>> print('\u01c1')
ǁ
>>> ǁ = SinkInto
>>> df >> ǁ(select, 'one') >> ǁ(rename, one='new_one')
   new_one
0      1.0
1      2.0
2      3.0
3      4.0
4      4.0
[update]
Thanks for your response. Would it be possible to make a separate class (like SinkInto) for each function to avoid having to pass the functions as an argument?
How about a decorator?
def pipe(original):
    class PipeInto(object):
        # Shared by every instance: the wrapped function.
        data = {'function': original}

        def __init__(self, *args, **kwargs):
            # Keep the call arguments on the instance rather than in the
            # shared class-level dict, so stages do not overwrite each other.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return self.data['function'](
                other,
                *self.args,
                **self.kwargs
            )

    return PipeInto

@pipe
def select(df, *args):
    cols = list(args)
    return df[cols]

@pipe
def rename(df, **kwargs):
    for name, value in kwargs.items():
        df = df.rename(columns={'%s' % name: '%s' % value})
    return df
Now you can decorate any function that takes a DataFrame as the first argument:
>>> df >> select('one') >> rename(one='first')
   first
0    1.0
1    2.0
2    3.0
3    4.0
4    4.0
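Nothing in the decorator is specific to pandas. As a rough sketch, the same idea works on plain lists; the functions take and scale below are hypothetical examples, and this variant stores the arguments on each instance so that stages built from the same function can be kept around and reused independently.

```python
def pipe(original):
    """Wrap `original` so `value >> wrapped(*args)` calls `original(value, *args)`."""
    class PipeInto:
        def __init__(self, *args, **kwargs):
            # Arguments live on the instance, one set per pipeline stage.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            # `other` is whatever sits on the left-hand side of `>>`.
            return original(other, *self.args, **self.kwargs)

    return PipeInto


@pipe
def take(rows, n):
    # Hypothetical example: keep the first n items.
    return rows[:n]


@pipe
def scale(rows, factor=1):
    # Hypothetical example: multiply every item by factor.
    return [row * factor for row in rows]


result = [1, 2, 3, 4] >> take(3) >> scale(factor=10)

# Two stages made from the same function keep their own arguments.
first_two = take(2)
first_three = take(3)
two = [1, 2, 3, 4] >> first_two
three = [1, 2, 3, 4] >> first_three

print(result, two, three)
```

This works because list, like DataFrame, does not define >> itself, so Python hands the operation to `__rrshift__` on the right-hand object.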
Python is awesome!
I know that languages like Ruby are "so expressive" that they encourage people to write every program as a new DSL, but this is kind of frowned upon in Python. Many Pythonistas consider overloading an operator for a different purpose to be sinful blasphemy.
[update]
User OHLáLá is not impressed:
The problem with this solution is when you are trying to call the function instead of piping. – OHLáLá
You can implement the dunder-call method:
    # Inside the PipeInto class:
    def __call__(self, df):
        return df >> self
And then:
>>> select('one')(df)
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
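For context, here is a small sketch of where `__call__` sits in the decorator and what it buys you. It works on strings instead of DataFrames so it runs without pandas; surround is a hypothetical example function.

```python
def pipe(original):
    class PipeInto:
        def __init__(self, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return original(other, *self.args, **self.kwargs)

        def __call__(self, value):
            # Calling a stage is the same as piping the value into it.
            return value >> self

    return PipeInto


@pipe
def surround(text, left, right):
    # Hypothetical example: wrap text in a pair of delimiters.
    return left + text + right


bracket = surround("[", "]")

piped = "x" >> bracket
called = bracket("x")

# A stage is now an ordinary callable, so it also works with map().
mapped = list(map(bracket, ["a", "b"]))

print(piped, called, mapped)
```

With `__call__` in place a configured stage can be handed to anything that expects a plain function, while the >> form keeps working.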
Looks like it is not easy to please OHLáLá:
In that case you need to call the object explicitly:
select('one')(df)
Is there a way to avoid that? – OHLáLá
Well, I can think of a solution, but there is a caveat: your original function must not take a second positional argument that is a pandas DataFrame (keyword arguments are OK). Let's add a __new__ method to our PipeInto class inside the decorator that tests whether the first argument is a DataFrame; if it is, we just call the original function with the arguments:
    # Inside the PipeInto class:
    def __new__(cls, *args, **kwargs):
        if args and isinstance(args[0], pd.DataFrame):
            return cls.data['function'](*args, **kwargs)
        return super().__new__(cls)
It seems to work, but there is probably some downside I was unable to spot.
>>> select(df, 'one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
>>> df >> select('one')
   one
0  1.0
1  2.0
2  3.0
3  4.0
4  4.0
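The caveat mentioned above is easy to reproduce. In this sketch Table is a hypothetical stand-in for pandas.DataFrame (a list subclass, so the example runs without pandas), and head and append_rows are made-up functions. Passing a Table as the first positional argument of a stage is mistaken for a direct call.

```python
class Table(list):
    """Hypothetical stand-in for pandas.DataFrame."""


def pipe(original):
    class PipeInto:
        def __new__(cls, *args, **kwargs):
            # A Table in first position is treated as a direct call.
            if args and isinstance(args[0], Table):
                return original(*args, **kwargs)
            return super().__new__(cls)

        def __init__(self, *args, **kwargs):
            # Skipped automatically when __new__ returned a non-PipeInto value.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            return original(other, *self.args, **self.kwargs)

    return PipeInto


@pipe
def head(table, n):
    return Table(table[:n])


@pipe
def append_rows(table, extra):
    return Table(list(table) + list(extra))


t = Table([1, 2, 3])

direct = head(t, 2)   # direct call
piped = t >> head(2)  # piped call

# Keyword form is fine: __new__ sees no positional arguments.
with_keyword = t >> append_rows(extra=Table([4]))

# Positional form trips the caveat: append_rows(Table([4])) is treated as a
# direct call with `extra` missing, so it raises before `>>` is evaluated.
try:
    t >> append_rows(Table([4]))
    caveat_hit = False
except TypeError:
    caveat_hit = True

print(direct, piped, with_keyword, caveat_hit)
```

So functions such as a merge or concat, where the second argument is itself a table, need that argument passed by keyword when used in a pipeline.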