Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - recursive regular expression to process nested strings enclosed by {| and |}

In a project I have text with patterns like this:

{| text {| text |} text |}
more text

I want to extract the first bracketed part. For this I use preg_match with a recursive pattern. The following code already works when the delimiters are plain { and }:

preg_match('/\{((?>[^\{\}]+)|(?R))*\}/x',$text,$matches);

But when I add the "|" symbol, I get an empty result and I don't know why:

preg_match('/\{\|((?>[^\{\}]+)|(?R))*\|\}/x',$text,$matches);

I can't use the first solution because the text can also contain something like { text }. Can somebody tell me what I am doing wrong here? Thanks.

1 Answer

Try this:

'/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/'
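For reference, here is a minimal sketch of how that pattern might be used with preg_match. It assumes the literal { | } characters in the delimiters are backslash-escaped, and the sample text and variable names are only illustrative:

```php
<?php
// Sample input modeled on the question (illustrative, not from a real project).
$text = "{| text {| text |} text |}\nmore text";

// (?s) lets the dot match newlines; (?R) recurses into the whole pattern,
// so a nested {| ... |} block is consumed as one unit.
$pattern = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

if (preg_match($pattern, $text, $matches)) {
    echo $matches[0], "\n"; // prints: {| text {| text |} text |}
}
```

$matches[0] holds the outermost block, including the nested one; "more text" is left out.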

In your original regex you use the character class [^{}] to match anything except a delimiter. That's fine when the delimiters are only one character, but yours are two characters. To not-match a multi-character sequence you need something like this:

(?:(?!\{\||\|\}).)++

The dot matches any character (including newlines, thanks to the (?s)), but only after the lookahead has determined that it's not the start of a {| or |} sequence. I also dropped your atomic group ((?>...)) and replaced it with a possessive quantifier (++) to reduce clutter. But you should definitely use one or the other in that part of the regex to prevent catastrophic backtracking.
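To see the difference on a text that also contains lone braces, here is a rough comparison of the two approaches. The first pattern is my reading of the question's attempt (with escaped delimiters, which is an assumption), and the sample text is made up for illustration:

```php
<?php
// Illustrative input: a {| ... |} block that contains ordinary { } braces.
$text = "{| a { b } c |} tail";

// Character-class approach (assumed form of the question's attempt).
// It fails on this text: [^{}] cannot step over the lone "{" and "}".
$charClass = '/\{\|((?>[^{}]+)|(?R))*\|\}/';

// Lookahead approach: a character is consumed only when it does not
// start "{|" or "|}", so lone braces count as ordinary text.
$tempered = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

echo preg_match($charClass, $text) ? "match" : "no match", "\n"; // prints: no match

if (preg_match($tempered, $text, $m)) {
    echo $m[0], "\n"; // prints: {| a { b } c |}
}
```

The character class rejects every brace, whether or not it belongs to a delimiter, while the lookahead only rejects the two-character sequences.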

