Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

grep - Regex to match post-alveolar consonants

I have a text file called a.txt that contains these lines:

fall#i#1    fall (as a fruit) (v.)  fall    jatuh   fall (as a fruit) (v.)  jatuh*  t??ampa?
dog#n#1 dog dog anjing  dog anjing  ?and??i
wing#n#1    wing    wing    sayap   wing    sayap   kopa?
fly#i#1 fly (v.)    fly (vb)    terbang fly (v.)    terbang toba?
mosquito#n#1    mosquito    mosquito    nyamuk  mosquito    nyamuk  ?amu?
flower#n#2  flower  flower  bunga (yg jadi buah), kuntum    flower  bunga*  bu?o
sky#n#1 sky sky langit  sky langit* ?la??t

First, I need a regular expression that matches lines with a word-final post-alveolar consonant such as [?ɡ?]. The output should look like this:

fall#i#1    fall (as a fruit) (v.)  fall    jatuh   fall (as a fruit) (v.)  jatuh*  t??ampa?
wing#n#1    wing    wing    sayap   wing    sayap   kopa?
fly#i#1 fly (v.)    fly (vb)    terbang fly (v.)    terbang toba?
mosquito#n#1    mosquito    mosquito    nyamuk  mosquito    nyamuk  ?amu?

Second, I need a regular expression that matches a post-alveolar consonant at the beginning of a word. The output should look like this:

dog#n#1 dog dog anjing  dog anjing  ?and??i
sky#n#1 sky sky langit  sky langit* ?la??t

Third, I need a regular expression that matches a post-alveolar consonant between vowels, producing this output:

flower#n#2  flower  flower  bunga (yg jadi buah), kuntum    flower  bunga*  bu?o
sky#n#1 sky sky langit  sky langit* ?la??t

I have been using this regex in the Ubuntu terminal to match all of them at once:

grep -P '[??ɡk]|[??ɡk]|[aiueo][??ɡk][aiueo]' a.txt

But I couldn't find a way to match them separately. I want one regex that matches a post-alveolar consonant only at the end of a word, another that matches it only at the beginning, and a third that matches it only between vowels. Can anyone please help me with that? Thanks.

Question from: https://stackoverflow.com/questions/65914231/regex-to-match-post-alveolar-consonants


1 Answer


The regular expressions you can use are:

grep -P '(*UCP)[?ɡ?]\b' file           # 1
grep -P '(*UCP)\b[?ɡ?]' file           # 2
grep -P '[ai?ueo][??ɡk][a?iueo]' file  # 3

Where

  • (*UCP)[?ɡ?]\b - matches ?, ɡ or ? followed by a word boundary, which is Unicode-aware thanks to the (*UCP) PCRE verb
  • (*UCP)\b[?ɡ?] - matches ?, ɡ or ? preceded by a word boundary, which is Unicode-aware thanks to the (*UCP) PCRE verb
  • [ai?ueo][??ɡk][a?iueo] - matches ?, ɡ, k or ? between the vowels a, i, ?, u, e and o (NOTE: i and ? are not the same letter!)
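To see the three patterns side by side, here is a minimal sketch. It assumes GNU grep built with PCRE support (-P) and an available C.UTF-8 locale. The IPA symbols in the question did not survive, so the consonant set [ʔŋɡ], the vowel ɨ and the sample words below are hypothetical stand-ins, not the original data.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with -P support and a C.UTF-8 locale.
export LC_ALL=C.UTF-8

# Hypothetical sample transcriptions; [ʔŋɡ] is an assumed consonant set.
sample='tabaŋ
ŋala
suŋo'

# (*UCP) makes \b treat letters such as ŋ as word characters.
final=$(printf '%s\n' "$sample" | grep -P '(*UCP)[ʔŋɡ]\b')           # word-final
initial=$(printf '%s\n' "$sample" | grep -P '(*UCP)\b[ʔŋɡ]')         # word-initial
medial=$(printf '%s\n' "$sample" | grep -P '[aiɨueo][ʔŋɡk][aiɨueo]') # between vowels

printf 'final:   %s\ninitial: %s\nmedial:  %s\n' "$final" "$initial" "$medial"
```

Each word lands in exactly one group: ŋ ends "tabaŋ", starts "ŋala", and sits between u and o in "suŋo".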

See a grep demo:

s='fall#i#1    fall (as a fruit) (v.)  fall    jatuh   fall (as a fruit) (v.)  jatuh*  t??ampa?
dog#n#1 dog dog anjing  dog anjing  ?and??i
wing#n#1    wing    wing    sayap   wing    sayap   kopa?
fly#i#1 fly (v.)    fly (vb)    terbang fly (v.)    terbang toba?
mosquito#n#1    mosquito    mosquito    nyamuk  mosquito    nyamuk  ?amu?
flower#n#2  flower  flower  bunga (yg jadi buah), kuntum    flower  bunga*  bu?o
sky#n#1 sky sky langit  sky langit* ?la??t'
grep -P '(*UCP)[?ɡ?]\b' <<< "$s"
echo "----"
grep -P '(*UCP)\b[?ɡ?]' <<< "$s"
echo "----"
grep -P '[ai?ueo][??ɡk][a?iueo]' <<< "$s"

Output:

fall#i#1    fall (as a fruit) (v.)  fall    jatuh   fall (as a fruit) (v.)  jatuh*  t??ampa?
wing#n#1    wing    wing    sayap   wing    sayap   kopa?
fly#i#1 fly (v.)    fly (vb)    terbang fly (v.)    terbang toba?
mosquito#n#1    mosquito    mosquito    nyamuk  mosquito    nyamuk  ?amu?
----
dog#n#1 dog dog anjing  dog anjing  ?and??i
sky#n#1 sky sky langit  sky langit* ?la??t
----
flower#n#2  flower  flower  bunga (yg jadi buah), kuntum    flower  bunga*  bu?o
sky#n#1 sky sky langit  sky langit* ?la??t
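If you want to check which part of a line triggered a match, grep's -o option prints only the matched text instead of the whole line. In this sketch, \w* is added around the consonant class in the first two patterns so that the whole word is printed. That addition is an assumption, not part of the answer above. As before, the sample line and the [ʔŋɡ] set are hypothetical, and GNU grep with -P plus a C.UTF-8 locale is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with -P support and a C.UTF-8 locale.
export LC_ALL=C.UTF-8

# Hypothetical sample line; [ʔŋɡ] is an assumed consonant set.
line='tabaŋ ŋala suŋo'

# -o prints only the matched part; \w* extends the match to the whole word.
final_word=$(printf '%s\n' "$line" | grep -oP '(*UCP)\w*[ʔŋɡ]\b')
initial_word=$(printf '%s\n' "$line" | grep -oP '(*UCP)\b[ʔŋɡ]\w*')
# No \w* here: only the vowel-consonant-vowel sequence itself is printed.
medial_seq=$(printf '%s\n' "$line" | grep -oP '[aiɨueo][ʔŋɡk][aiɨueo]')

echo "$final_word"
echo "$initial_word"
echo "$medial_seq"
```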
