I have reduced your problem to this:
my $text = 'M Y H A P P Y T E X T';
my $regex = '(?<!st)A';
print($text =~ m/$regex/i ? "true\n" : "false\n");
This happens because of the /i (case-insensitive) modifier. Under /i, certain character combinations such as "ss" or "st" can also match a single typographic ligature character, which makes the lookbehind variable-length. For instance, /August/i matches both AUGUST (6 characters) and auguﬆ (5 characters, the last one being U+FB06). Perl's lookbehind historically required a fixed length, so the pattern is rejected ("Variable length lookbehind not implemented").
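A minimal Perl sketch of that fold, assuming Perl 5.14 or later (the variable names are illustrative, not from the question). It builds the 5-character string "augu" followed by U+FB06 and checks it, together with the ordinary AUGUST, against /August/i:

```perl
use 5.014;    # enables strict, say and unicode_strings
use warnings;

# "augu" followed by U+FB06 LATIN SMALL LIGATURE ST: 5 characters, not 6
my $word = "augu\x{FB06}";

my $len       = length $word;
my $lig_match = $word    =~ /August/i ? 1 : 0;   # the pattern's "st" folds to the one-character ligature
my $asc_match = 'AUGUST' =~ /August/i ? 1 : 0;   # the ordinary 6-character match

say "length $len, ligature match $lig_match, AUGUST match $asc_match";
```

Because the same 6-character pattern can consume either 6 or 5 characters, Perl cannot give a lookbehind containing "st" a single fixed width under /i.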
However, if we remove the /i (case-insensitive) modifier, it works, because without case folding typographic ligatures are not matched.
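For comparison, a small sketch of the same checks without /i (again assuming Perl 5.14+; variable names are illustrative). The reduced lookbehind is exactly the two characters "st", and the ligature string no longer matches a lowercase pattern:

```perl
use 5.014;
use warnings;

my $text = 'M Y H A P P Y T E X T';

# Without /i the lookbehind is exactly the two characters "st".
my $reduced = $text =~ m/(?<!st)A/ ? 1 : 0;

# Without /i there is no case folding, so U+FB06 is never equated with "st".
my $ligature_match = "augu\x{FB06}" =~ /august/ ? 1 : 0;

say 'reduced regex: ', $reduced        ? 'true' : 'false';
say 'ligature:      ', $ligature_match ? 'true' : 'false';
```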
Solution: use the aa modifiers, i.e.:
/(?<!st)A/iaa
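A minimal sketch of the fix applied to the reduced example, assuming Perl 5.14 or later (where /a and /aa were introduced). With /aa the ASCII "st" may not fold to a non-ASCII ligature, so the lookbehind is back to a fixed two characters and the ligature string stops matching:

```perl
use 5.014;    # /a and /aa need Perl 5.14+
use warnings;

my $text = 'M Y H A P P Y T E X T';

# /iaa: still case-insensitive, but ASCII/non-ASCII folds are forbidden.
my $reduced = $text =~ m/(?<!st)A/iaa ? 1 : 0;

# The ligature word from the /August/i example no longer matches.
my $ligature_match = "augu\x{FB06}" =~ /August/iaa ? 1 : 0;

say 'reduced regex: ', $reduced        ? 'true' : 'false';
say 'ligature:      ', $ligature_match ? 'true' : 'false';
```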
Or in your regex:
my $text = 'M Y H A P P Y T E X T';
my $regex = '(?<!(Mon|Fri|Sun)day |August )abcd';
print($text =~ m/$regex/iaa ? "true\n" : "false\n");
From perlre:
To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"), specify the "a" twice, for example /aai or /aia. (The first occurrence of "a" restricts the \d, etc., and the second occurrence adds the "/i" restrictions.) But, note that code points outside the ASCII range will use Unicode rules for /i matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
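A short sketch of what that quote means in practice, assuming Perl 5.14+ (the variable names are illustrative). The KELVIN SIGN (U+212A) folds to ASCII "k" under plain /i but not under /iaa, while two non-ASCII letters (U+00E9 and U+00C9) still match case-insensitively under /iaa:

```perl
use 5.014;    # /a and /aa need Perl 5.14+
use warnings;

# Plain /i: ASCII "k" and the non-ASCII KELVIN SIGN (U+212A) fold together.
my $plain_i = 'k' =~ /\x{212A}/i ? 1 : 0;

# /iaa: that ASCII/non-ASCII pairing is forbidden.
my $iaa = 'k' =~ /\x{212A}/iaa ? 1 : 0;

# /iaa still uses Unicode rules between two non-ASCII code points:
# U+00E9 (e with acute) matches U+00C9 (E with acute).
my $non_ascii = "\x{E9}" =~ /\x{C9}/iaa ? 1 : 0;

say "k vs KELVIN SIGN, /i:   $plain_i";
say "k vs KELVIN SIGN, /iaa: $iaa";
say "e-acute vs E-acute, /iaa: $non_ascii";
```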
See a closely related discussion here