For reasonably modern versions of sed, edit the standard input to yield the standard output with
$ echo 'τέχνη βιβλίο γη κήπος' | sed -E -e 's/[[:blank:]]+/\n/g'
τέχνη
βιβλίο
γη
κήπος
If your vocabulary words are in files named lesson1 and lesson2, redirect sed’s standard output to the file all-vocab with
sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab
What it means:
- The character class [[:blank:]] matches either a single space character or a single tab character.
- Use [[:space:]] instead to match any single whitespace character (commonly space, tab, newline, carriage return, form-feed, and vertical tab).
- The + quantifier means match one or more of the previous pattern.
- So [[:blank:]]+ is a sequence of one or more characters that are all space or tab.
- The \n in the replacement is the newline that you want.
- The /g modifier on the end means perform the substitution as many times as possible rather than just once.
- The -E option tells sed to use POSIX extended regex syntax and in particular for this case the + quantifier. Without -E, your sed command becomes sed -e 's/[[:blank:]]\+/\n/g'. (Note the use of \+ rather than simple +.)
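To see the + quantifier and the /g modifier working together, here is a small sketch that feeds the command a line containing mixed runs of spaces and tabs. It assumes a sed that accepts -E and understands \n in the replacement (GNU sed does; other implementations may not), and the function name split_words is made up for illustration.

```shell
# Hypothetical helper: put each blank-separated word on its own line.
# Assumes a sed that supports -E and \n in the replacement (e.g. GNU sed).
split_words() {
  sed -E -e 's/[[:blank:]]+/\n/g'
}

# Two spaces, then a tab followed by a space: each run becomes ONE newline.
printf 'alpha  beta\t gamma\n' | split_words
```

With GNU sed this prints alpha, beta, and gamma on three separate lines, with no empty lines in between, because + consumes each whole run of blanks before the replacement is made.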
Perl Compatible Regexes
For those familiar with Perl-compatible regexes and a sed that understands the Perl-style \s shorthand (GNU sed does, as an extension), use \s+ to match runs of at least one whitespace character, as in
sed -E -e 's/\s+/\n/g' old > new
or
sed -e 's/\s\+/\n/g' old > new
These commands read input from the file old and write the result to a file named new in the current directory.
Maximum portability, maximum cruftiness
Going back to almost any version of sed since Version 7 Unix, the command invocation is a bit more baroque.
$ echo 'τέχνη βιβλίο γη κήπος' | sed -e 's/[ ][ ]*/\
/g'
τέχνη
βιβλίο
γη
κήπος
Notes:
- Here we do not even assume the existence of the humble + quantifier and simulate it with a single space-or-tab ([ ], where the brackets hold a literal space and a literal tab character) followed by zero or more of them ([ ]*).
- Similarly, assuming sed does not understand \n for newline, we have to include it on the command line verbatim.
- The \ at the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line.
- Note: There must be no whitespace between the backslash and the newline it escapes. That is, the end of the first line must be exactly backslash followed by end-of-line.
- This error-prone process helps one appreciate why the world moved to visible characters, and you will want to exercise some care in trying out the command with copy-and-paste.
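One way to reduce the copy-and-paste risk is to let the shell hold the invisible characters in variables. The sketch below is a variation on the command above, not part of the original recipe. The names tab, nl, and split_words_portable are invented for illustration, and it assumes a POSIX shell with printf available.

```shell
# Hold the hard-to-type characters in variables (names are illustrative).
tab=$(printf '\t')   # a literal tab character
nl='
'                    # a literal newline character

# Double quotes let the shell expand $tab and $nl. Inside double quotes,
# \\ becomes a single backslash, so sed sees backslash-newline in the
# replacement, exactly as in the hand-typed two-line command.
split_words_portable() {
  sed -e "s/[ $tab][ $tab]*/\\$nl/g"
}

printf 'alpha  beta\t gamma\n' | split_words_portable
```

This prints alpha, beta, and gamma on separate lines. The trade-off is that the script now needs double quotes, so every backslash meant for sed has to be doubled.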
Note on backslashes and quoting
The commands above all used single quotes ('') rather than double quotes (""). Consider:
$ echo '\\' "\\"
\\ \
That is, the shell applies different escaping rules to single-quoted strings as compared with double-quoted strings. You typically want to protect all the backslashes common in regexes with single quotes.
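As a rough illustration of what that means for a sed script, the sketch below writes the same script once in single quotes and once in double quotes. The variable names single and double are made up for this example. It uses printf '%s\n' rather than echo because some echo implementations interpret backslashes themselves.

```shell
# The same sed script, quoted two ways (variable names are illustrative).
single='s/[[:blank:]]\+/\n/g'      # backslashes pass through untouched
double="s/[[:blank:]]\\+/\\n/g"    # each backslash must be doubled

# printf '%s\n' prints its argument verbatim.
printf '%s\n' "$single"
printf '%s\n' "$double"
```

Both lines print s/[[:blank:]]\+/\n/g, which shows that the double-quoted form needs twice as many backslashes to hand sed the same text.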