Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


perl - When is \G useful in a regex?

I am not clear on the use/need of the \G operator.
I read in the perldoc:

You use the \G anchor to start the next match on the same string where the last match left off.

I don't really understand this statement. When we use /g we usually move to the character after the last match anyway.
As the example shows:

$_ = "1122a44";
my @pairs = m/(\d\d)/g;   # qw( 11 22 44 )

Then it says:

If you use the \G anchor, you force the match after 22 to start with the a:

$_ = "1122a44";
my @pairs = m/\G(\d\d)/g;

The regular expression cannot match there since it does not find a digit, so the next match fails and the match operator returns the pairs it already found

I don't understand this either. "If you use the \G anchor, you force the match after 22 to start with the a." But without the \G, the match will be attempted at a anyway, right? So what does this sentence mean?
I see that in the example the only pairs printed are 11 and 22. So 44 is not tried.

The example also shows that with the /c option, a match attempted after the while loop finds 44.

To be honest, from all of this I cannot understand what this operator is useful for and when it should be applied.
Could someone please help me understand this, perhaps with a meaningful example?

Update
I think I did not understand this key sentence:

If you use the \G anchor, you force the match after 22 to start with the a. The regular expression cannot match there since it does not find a digit, so the next match fails and the match operator returns the pairs it already found.

This seems to mean that when the match fails, the regex does not make any further attempts, which is consistent with the examples in the answers.

Also:

After the match fails at the letter a, perl resets pos() and the next match on the same string starts at the beginning.


1 Answer


\G is an anchor; it indicates where the match is forced to start. When \G is present, the regex can't start matching at some arbitrary later point in the string; when \G is absent, it can.
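To make that concrete, here is a minimal sketch using the string from the question. The variable names are only illustrative; the two patterns are the ones quoted from perldoc.

```perl
use strict;
use warnings;

my $str = "1122a44";

# Without \G, each /g attempt is free to skip ahead to the next place
# where the pattern can match, so the "a" is simply stepped over.
my @unanchored = $str =~ /(\d\d)/g;

# With \G, each attempt must begin exactly where the previous match ended.
# After "22" that position holds the "a", so matching stops there.
my @anchored = $str =~ /\G(\d\d)/g;

print "unanchored: @unanchored\n";   # unanchored: 11 22 44
print "anchored: @anchored\n";       # anchored: 11 22
```

So \G does not change where the next attempt starts; it removes the engine's freedom to slide forward from that starting point.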

It is most useful in parsing a string into discrete parts, where you don't want to skip past other stuff. For instance:

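my $string = " a 1 # ";
while () {    # empty condition: loop until "last"
    # Each branch is anchored with \G, so it must match at the current pos().
    # /c keeps pos() in place when a branch fails, so the next branch can try.
    if ( $string =~ /\G\s+/gc ) {
        print "whitespace\n";
    }
    elsif ( $string =~ /\G[0-9]+/gc ) {
        print "integer\n";
    }
    elsif ( $string =~ /\G\w+/gc ) {
        print "word\n";
    }
    else {
        print "done\n";
        last;
    }
}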

Output with \G's:

whitespace
word
whitespace
integer
whitespace
done

Without \G:

whitespace
whitespace
whitespace
whitespace
done
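The /c modifier in the loop above matters as much as \G: without it, a failed match resets pos(), and the next attempt starts over at the beginning of the string. A rough sketch of that behavior, reusing the string from the question (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# /g alone: the failed attempt at the "a" resets pos().
my $plain = "1122a44";
1 while $plain =~ /\G\d\d/g;      # matches 11, then 22, then fails
my $plain_pos  = pos($plain);     # undef after the failure
my $plain_next = $plain =~ /(\d\d)/g ? $1 : undef;   # restarts at offset 0

# /gc: the failed attempt leaves pos() where the last success ended.
my $keep = "1122a44";
1 while $keep =~ /\G\d\d/gc;      # matches 11, then 22, then fails
my $keep_pos  = pos($keep);       # still 4, pointing at the "a"
my $keep_next = $keep =~ /(\d\d)/g ? $1 : undef;     # unanchored, skips the "a"

printf "g:  pos=%s next=%s\n", $plain_pos // "undef", $plain_next;  # g:  pos=undef next=11
printf "gc: pos=%s next=%s\n", $keep_pos  // "undef", $keep_next;   # gc: pos=4 next=44
```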

Note that I am demonstrating with scalar-context /g matching, but \G applies equally to list-context /g matching, and in fact the above code is trivially modifiable to use that:

my $string = " a 1 # ";
# Each successful match contributes three captures, only one of them defined.
my @matches = $string =~ /\G(?:(\s+)|([0-9]+)|(\w+))/g;
while ( my ($whitespace, $integer, $word) = splice @matches, 0, 3 ) {
    if ( defined $whitespace ) {
        print "whitespace\n";
    }
    elsif ( defined $integer ) {
        print "integer\n";
    }
    elsif ( defined $word ) {
        print "word\n";
    }
}
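A practical payoff of the \G plus /c combination is that pos() tells you exactly where tokenizing stopped, which is useful for error reporting. The following is only a sketch: tokenize() is a hypothetical helper I made up for this example, wrapping the same three patterns as the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the tokens found and the offset where
# matching stopped (end of string, or the first unrecognized character).
sub tokenize {
    my ($string) = @_;
    my @tokens;
    pos($string) = 0;    # start anchored matching at the beginning
    while (1) {
        if    ( $string =~ /\G(\s+)/gc )    { push @tokens, [ whitespace => $1 ] }
        elsif ( $string =~ /\G([0-9]+)/gc ) { push @tokens, [ integer    => $1 ] }
        elsif ( $string =~ /\G(\w+)/gc )    { push @tokens, [ word       => $1 ] }
        else                                { last }
    }
    # Because of /c, pos() survived the failed attempts above.
    return ( \@tokens, pos($string) );
}

my ( $tokens, $stopped_at ) = tokenize(" a 1 # ");
my $types = join " ", map { $_->[0] } @$tokens;

print "$types\n";                        # whitespace word whitespace integer whitespace
print "stopped at offset $stopped_at\n"; # stopped at offset 5
```

Offset 5 is the "#", the first character none of the patterns accepts. Without \G the loop would have silently skipped it instead.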
