Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Why does df.apply(tuple) work but not df.apply(list)?

Here's a dataframe:

    A  B  C
0   6  2 -5
1   2  5  2
2  10  3  1
3  -5  2  8
4   3  6  2

I can build a Series in which each value is a tuple of that row's values from the original df, using df.apply:

out = df.apply(tuple, 1)
print(out)

0    (6, 2, -5)
1     (2, 5, 2)
2    (10, 3, 1)
3    (-5, 2, 8)
4     (3, 6, 2)
dtype: object

But if I want a list of values per row instead of a tuple, the same approach doesn't work. It returns the original dataframe rather than a Series of lists:

out = df.apply(list, 1)
print(out)

    A  B  C
0   6  2 -5
1   2  5  2
2  10  3  1
3  -5  2  8
4   3  6  2

Instead, I need to do:

out = pd.Series(df.values.tolist())
print(out)

0    [6, 2, -5]
1     [2, 5, 2]
2    [10, 3, 1]
3    [-5, 2, 8]
4     [3, 6, 2]
dtype: object

Why can't I use df.apply(list, 1) to get what I want?


Appendix

Timings of some possible workarounds:

df_test = pd.concat([df] * 10000, axis=0)

%timeit pd.Series(df.values.tolist()) # original workaround
10000 loops, best of 3: 161 μs per loop

%timeit df.apply(tuple, axis=1).apply(list) # proposed by Alexander
1000 loops, best of 3: 615 μs per loop
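
For anyone without IPython's %timeit, the comparison can be roughly reproduced with the standard timeit module. This is only a sketch: the frame is rebuilt from the values shown in the question, n_copies is an assumed smaller size chosen so the run finishes quickly, and the helper names via_tolist and via_double_apply are made up for illustration. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample frame rebuilt from the values shown in the question.
df = pd.DataFrame(
    {"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]}
)

# Assumption: 1,000 copies rather than 10,000, to keep the run short.
n_copies = 1000
df_test = pd.concat([df] * n_copies, axis=0)


def via_tolist():
    # Original workaround: let NumPy build the nested lists.
    return pd.Series(df_test.values.tolist())


def via_double_apply():
    # Row-wise tuples first, then convert each tuple to a list.
    return df_test.apply(tuple, axis=1).apply(list)


for fn in (via_tolist, via_double_apply):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```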


1 Answer


The culprit is in pandas' implementation of apply. With func=tuple it works, but func=list raises an exception from within the compiled module lib.reduce:

ValueError: ('function does not reduce', 0)

pandas catches the exception but doesn't bother to handle it, so the error never reaches you.
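
To make the effect of a swallowed exception concrete, here is a toy model in plain Python. It is a hypothetical simplification and not pandas' actual code: fast_reduce and toy_apply are invented names, and the rule that the fast path rejects list results is an assumption based on the error message above.

```python
# Hypothetical, simplified model of the control flow; NOT pandas' real code.


def fast_reduce(rows, func):
    """Fast path: refuses any function whose result is a list (assumed rule)."""
    results = []
    for i, row in enumerate(rows):
        res = func(row)
        if isinstance(res, list):
            raise ValueError("function does not reduce", i)
        results.append(res)
    return results


def toy_apply(rows, func):
    try:
        return "fast", fast_reduce(rows, func)
    except Exception:
        pass  # too-broad except: the error is silently discarded
    # Fallback path: the caller gets a result built by different code.
    return "fallback", [func(row) for row in rows]


rows = [[6, 2, -5], [2, 5, 2]]
tuple_path, _ = toy_apply(rows, tuple)
list_path, _ = toy_apply(rows, list)
print(tuple_path, list_path)  # fast fallback
```

The caller cannot tell from the return value alone that an error occurred. That is why the list case yields a differently shaped result instead of a traceback.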

Even without the too-broad except clause, that's a bug in pandas. You might raise it on their tracker, but similar issues have been closed as some flavour of won't-fix or duplicate:

16321: weird behavior using apply() creating list based on current columns

15628: Dataframe.apply does not always return a Series when reduce=True

The latter issue was closed, then reopened and converted into a docs enhancement request some months ago, and it now seems to serve as a dumping ground for any related issues.

Presumably it's not a high priority because, as piRSquared commented (and one of the pandas maintainers commented the same), you're better off with a list comprehension:

pd.Series([list(x) for x in df.itertuples(index=False)])

Typically, apply is meant to be used with a NumPy ufunc or a similar function.
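
A self-contained sketch of both workarounds on the sample data from the question. The frame is rebuilt here from the printed values, and the variable names by_comprehension and by_tolist are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the values shown in the question.
df = pd.DataFrame(
    {"A": [6, 2, 10, -5, 3], "B": [2, 5, 3, 2, 6], "C": [-5, 2, 1, 8, 2]}
)

# List comprehension over row tuples, converting each row to a list.
by_comprehension = pd.Series(
    [list(x) for x in df.itertuples(index=False)], index=df.index
)

# The question's workaround: NumPy builds the nested lists in one call.
by_tolist = pd.Series(df.values.tolist(), index=df.index)

print(by_comprehension)
```

Both build a Series of dtype object holding one list per row. Passing index=df.index keeps the original row labels, which matters when the frame does not use the default integer index.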

