I am working with the Google English 1-gram dataset (link here). It looks like the following:
C'ape 1804 1 1
C'ape 1821 1 1
C'ape 1826 1 1
C'ape 1838 2 2
C'ape 1844 1 1
C'ape 1869 1 1
C'ape 1874 1 1
C'ape 1878 2 2
C'ape 1879 1 1
C'ape 1880 1 1
CABMEL 1873 1 1
CABMEL 1874 1 1
CABMEL 1875 1 1
CABMEL 1879 1 1
CABMEL 1884 1 1
CABMEL 1890 1 1
CABMEL 1899 1 1
CABMEL 1901 1 1
CABMEL 1903 3 2
CABMEL 1910 2 2
CABMEL 1912 1 1
CABMEL 1915 1 1
CABMEL 1926 2 2
CABMEL 1927 3 2
CABMEL 1928 4 2
CABMEL 1930 2 2
Each row has at least 4 columns, and some rows have 5. The first column is the 1-gram, a string. I want to extract only those lines whose first column contains letters only (upper-case or lower-case, nothing else). I think grep should do it, but I cannot find the correct regex for the job. Is there any Unix utility that can easily get this done?
I believe the columns are tab-delimited.
EDIT: For the sample above, the output should contain only the lines with CABMEL.
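
To make the intended filter concrete, here is a minimal sketch in Python rather than grep. It assumes the columns really are tab-delimited and that "letters" means ASCII A-Z and a-z only. The names `LETTERS_ONLY` and `keep_alpha_rows` are made up for illustration, and the sample rows are copied from the data above.

```python
import re

# Assumption: fields are separated by tabs. The first field must consist
# solely of ASCII letters and be followed immediately by a tab.
LETTERS_ONLY = re.compile(r"^[A-Za-z]+\t")


def keep_alpha_rows(lines):
    """Yield only the lines whose first tab-separated field is letters only."""
    for line in lines:
        if LETTERS_ONLY.match(line):
            yield line


# A few rows from the sample above, written with explicit tabs.
sample = [
    "C'ape\t1804\t1\t1",
    "CABMEL\t1873\t1\t1",
    "CABMEL\t1903\t3\t2",
]

kept = list(keep_alpha_rows(sample))
print("\n".join(kept))
```

On this sample only the two CABMEL rows survive, because the apostrophe in C'ape breaks the letters-only run before the first tab. The same idea may carry over to grep as something like `grep -E '^[A-Za-z]+[[:space:]]'`, though I have not confirmed that against the full dataset.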