regex - Filter lines that have only alphabets in first column

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I am working with the Google English 1-gram dataset, which looks like this:

C'ape   1804    1       1
C'ape   1821    1       1
C'ape   1826    1       1
C'ape   1838    2       2
C'ape   1844    1       1
C'ape   1869    1       1
C'ape   1874    1       1
C'ape   1878    2       2
C'ape   1879    1       1
C'ape   1880    1       1
CABMEL  1873    1       1
CABMEL  1874    1       1
CABMEL  1875    1       1
CABMEL  1879    1       1
CABMEL  1884    1       1
CABMEL  1890    1       1
CABMEL  1899    1       1
CABMEL  1901    1       1
CABMEL  1903    3       2
CABMEL  1910    2       2
CABMEL  1912    1       1
CABMEL  1915    1       1
CABMEL  1926    2       2
CABMEL  1927    3       2
CABMEL  1928    4       2
CABMEL  1930    2       2

Each row has at least 4 columns, and some rows have 5. The first column is a 1-gram (a string). I want to extract only the lines whose first column contains nothing but letters (upper or lower case). I think grep should be able to do it, but I cannot find the right regex. Is there a Unix utility that can easily get the job done? The columns are tab-delimited, I believe.

EDIT: For the sample above, the output should contain only the CABMEL lines.

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:26:41+0000

Using Perl:

# Match all lines that start with one or more letters (a-z or A-Z) followed by whitespace
perl -ne 'print if m/^[a-z]+\s/i' file

Using awk:

# Match lines whose first field contains only a-z or A-Z
awk '$1 ~ /^[a-zA-Z]+$/' file

Both will output:

CABMEL  1873    1       1
CABMEL  1874    1       1
CABMEL  1875    1       1
CABMEL  1879    1       1
CABMEL  1884    1       1
CABMEL  1890    1       1
CABMEL  1899    1       1
CABMEL  1901    1       1
CABMEL  1903    3       2
CABMEL  1910    2       2
CABMEL  1912    1       1
CABMEL  1915    1       1
CABMEL  1926    2       2
CABMEL  1927    3       2
CABMEL  1928    4       2
CABMEL  1930    2       2
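
Since the question asks about grep specifically, here is a rough equivalent. It is only a sketch: it assumes the columns really are tab-delimited and that "letters" means ASCII a-z/A-Z. The file name sample.tsv and the function name letters_only are made up for illustration.

```shell
# Build a tiny tab-delimited sample (sample.tsv is a hypothetical file name)
printf "C'ape\t1804\t1\t1\nCABMEL\t1873\t1\t1\nA1\t1900\t1\t1\n" > sample.tsv

# POSIX grep -E has no \t escape, so store a literal tab in a variable
tab=$(printf '\t')

# Hypothetical helper: print lines whose first tab-separated field
# consists of ASCII letters only. LC_ALL=C keeps [A-Za-z] limited to ASCII.
letters_only() {
    LC_ALL=C grep -E "^[A-Za-z]+${tab}" "$1"
}

# Prints only the CABMEL line of the sample
letters_only sample.tsv
```

The pattern anchors at the start of the line and requires a tab right after the run of letters, so a first field such as C'ape or A1 is rejected.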
