Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pandas .str.replace and case insensitivity

Making the replace case-insensitive does not seem to have any effect in the following example (I want to replace "jr." or "Jr." with "jr"):

In [0]: pd.Series('Jr. eng').str.replace('jr.', 'jr', regex=False, case=False)
Out[0]: 0    Jr. eng

Why? What am I misunderstanding?

He who fights dragons too long becomes a dragon himself; gaze into the abyss too long and the abyss gazes back…

1 Answer

The case argument is just a convenience: case=False is shorthand for passing flags=re.IGNORECASE. It has no effect when the replacement is not regex-based, i.e. when regex=False.
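To see the distinction outside pandas, here is a small sketch using only the standard library: Python's plain str.replace has no notion of case-insensitivity, whereas re.sub accepts flags=re.IGNORECASE, which is what case=False translates to on the regex path.

```python
import re

text = "Jr. eng"

# Plain (non-regex) replacement: str.replace always compares case-sensitively,
# so the lowercase "jr." never matches "Jr.".
literal = text.replace("jr.", "jr")

# Regex replacement: re.IGNORECASE lets "jr\." match "Jr." as well.
# The dot is escaped so that it only matches a literal period.
regex_based = re.sub(r"jr\.", "jr", text, flags=re.IGNORECASE)

print(literal)      # Jr. eng
print(regex_based)  # jr eng
```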

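So, with regex=True, these are your options (the commented-out variants rely on regex defaulting to True, which holds for older pandas; pandas 2.0 changed the default to False, so pass regex=True explicitly there):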

import re
import pandas as pd

pd.Series('Jr. eng').str.replace(r'jr\.', 'jr', regex=True, case=False)
# pd.Series('Jr. eng').str.replace(r'jr\.', 'jr', case=False)

0    jr eng
dtype: object

Or,

pd.Series('Jr. eng').str.replace(r'jr\.', 'jr', regex=True, flags=re.IGNORECASE)
# pd.Series('Jr. eng').str.replace(r'jr\.', 'jr', flags=re.IGNORECASE)

0    jr eng
dtype: object

You can also get cheeky and bypass both keyword arguments by embedding the case-insensitivity flag in the pattern itself, as the inline flag (?i):

pd.Series('Jr. eng').str.replace(r'(?i)jr\.', 'jr', regex=True)
0    jr eng
dtype: object
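The inline flag is a standard feature of Python's re module rather than anything pandas-specific, so its effect can be checked with re.sub directly. A minimal sketch comparing it with the explicit flag; note that a global inline flag such as (?i) should sit at the very start of the pattern (recent Python versions reject it elsewhere), while the scoped form (?i:...) limits case-insensitivity to one group.

```python
import re

text = "Jr. eng"

# Global inline flag at the start of the pattern: same effect as flags=re.IGNORECASE.
inline = re.sub(r"(?i)jr\.", "jr", text)
explicit = re.sub(r"jr\.", "jr", text, flags=re.IGNORECASE)

# Scoped inline flag: only "jr" inside the group is matched case-insensitively;
# the escaped period outside the group is matched literally.
scoped = re.sub(r"(?i:jr)\.", "jr", text)

print(inline, explicit, scoped, sep=" | ")  # jr eng | jr eng | jr eng
```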

Note
You need to escape the period (\.) in regex mode, because an unescaped dot is a metacharacter that matches any character, so r'jr.' would also match text such as "jrs". To escape metacharacters in a pattern dynamically, use re.escape.
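A short sketch of why the escape matters, and how re.escape helps when the text to match comes from a variable. The variable name needle is illustrative only.

```python
import re

text = "Jr. eng and jrs"

# Unescaped: "." matches any character, so "jrs" is rewritten too.
unescaped = re.sub(r"jr.", "jr", text, flags=re.IGNORECASE)

# Escaped with re.escape: the pattern becomes "jr\." and only a literal
# "jr." (in any letter case) is replaced.
needle = "jr."
escaped = re.sub(re.escape(needle), "jr", text, flags=re.IGNORECASE)

print(unescaped)  # jr eng and jr
print(escaped)    # jr eng and jrs
```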

For more information on flags and anchors, see this section of the docs and the re HOWTO.


The pandas source code (as of the version this answer was written against) makes it clear that the case argument is ignored when regex=False:

# Check whether repl is valid (GH 13438, GH 15055)
if not (is_string_like(repl) or callable(repl)):
    raise TypeError("repl must be a string or callable")

is_compiled_re = is_re(pat)
if regex:
    if is_compiled_re:
        if (case is not None) or (flags != 0):
            raise ValueError("case and flags cannot be set"
                             " when pat is a compiled regex")
    else:
        # not a compiled regex
        # set default case
        if case is None:
            case = True

        # add case flag, if provided
        if case is False:
            flags |= re.IGNORECASE
    if is_compiled_re or len(pat) > 1 or flags or callable(repl):
        n = n if n >= 0 else 0
        compiled = re.compile(pat, flags=flags)
        f = lambda x: compiled.sub(repl=repl, string=x, count=n)
    else:
        f = lambda x: x.replace(pat, repl, n)

Note that case is only read inside the if regex: branch, so it has no effect when regex=False.

In other words, in that version the only way to make the replacement case-insensitive is to set regex=True so that it is regex-based.
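The dispatch in the excerpt can be condensed into a small standalone function to make the behaviour concrete. This is a simplified, hypothetical re-creation: replace_like_old_pandas is not a pandas function, it drops the compiled-pattern and single-character special cases, and it assumes (as the excerpt's own non-regex fallback suggests) that the regex=False path is a plain str.replace.

```python
import re

def replace_like_old_pandas(s, pat, repl, regex=True, case=None, flags=0, n=-1):
    """Hypothetical, simplified mirror of the quoted pandas logic for one string."""
    if regex:
        # case only matters here: it is folded into the regex flags.
        if case is None:
            case = True
        if case is False:
            flags |= re.IGNORECASE
        compiled = re.compile(pat, flags=flags)
        # Pattern.sub uses count=0 for "replace all", so map n=-1 to 0.
        return compiled.sub(repl, s, count=n if n >= 0 else 0)
    # Assumed regex=False path: plain str.replace, which is always
    # case-sensitive; case is never consulted here.
    return s.replace(pat, repl, n)

print(replace_like_old_pandas("Jr. eng", "jr.", "jr", regex=False, case=False))  # Jr. eng
print(replace_like_old_pandas("Jr. eng", r"jr\.", "jr", regex=True, case=False))  # jr eng
```

The first call reproduces the behaviour from the question: case=False is silently dropped because the non-regex path never looks at it.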



...