regex - How to compare and substitute strings in different lines in unix

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I want to compare and substitute strings that appear on different lines of a file in Unix.

For example, I have a file with two words on each line:

<a> <b>
<d> <e>
<b> <c>
<c> <e>

If the second word of a line matches the first word of any other line, the second word of that line should be replaced with the second word of the matched line. This should repeat until the second word of the line no longer matches the first word of any other line.

I need a result like this:

<a> <e>
<b> <e>
<c> <e>
<d> <e>

I am new to Unix and have no idea how to implement this. Can anyone suggest or explain how to do it?

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:57:46+0000

This is VERY clearly a case for a recursive descent solution:

$ cat tst.awk
function descend(node) {return (map[node] in map ? descend(map[node]) : map[node])}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

$ awk -f tst.awk file
<a> <e>
<b> <e>
<c> <e>
<d> <e>
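
Note that awk does not specify the order in which `for (key in map)` visits keys, so the lines may come out in a different order than shown above. A minimal sketch of a reproducible run, assuming the sample input is fed through a here-document instead of `file` and that piping through `sort` is acceptable (both are illustrative choices, not part of the original answer):

```shell
# Same logic as tst.awk, inlined; the here-document holds the question's sample input.
# LC_ALL=C sort makes the output order deterministic across awk implementations.
result=$(awk '
function descend(node) { return (map[node] in map ? descend(map[node]) : map[node]) }
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }
' <<'EOF' | LC_ALL=C sort
<a> <b>
<d> <e>
<b> <c>
<c> <e>
EOF
)

printf '%s\n' "$result"
```

Sorting is only suitable when the output order does not need to follow the input order.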

If infinite recursion in your input is a possibility, here's an approach that prints, as the 2nd field, the last node before the cycle starts, with a "*" appended so you know it's happening:

$ cat tst.awk
function descend(node,  child, descendant) {
    stack[node]
    child = map[node]
    if (child in map) {
        if (child in stack) {
            descendant = node "*"
        }
        else {
            descendant = descend(child)
        }
    }
    else {
        descendant = child
    }
    delete stack[node]
    return descendant
}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

$ cat file
<w> <w>
<x> <y>
<y> <z>
<z> <x>
<a> <b>
<d> <e>
<b> <c>
<c> <e>

$ awk -f tst.awk file
<w> <w>*
<x> <z>*
<y> <x>*
<z> <y>*
<a> <e>
<b> <e>
<c> <e>
<d> <e>

If you need the output order to match the input order and/or to print duplicate lines twice, change the bottom 2 lines of the script to:

{ keys[++numKeys] = $1; map[$1] = $2 }
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        print key, descend(key)
    }
}
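
For reference, the pieces above can be assembled into a single run. This is a sketch rather than a definitive version: it assumes the cycle-aware `descend` is combined with the order-preserving `END` block, and it feeds the 8-line sample through a here-document (an illustrative choice) instead of `file`:

```shell
result=$(awk '
# Follow map[] from node to the end of its chain; stack[] holds the nodes
# on the current path so that a cycle can be detected.
function descend(node,    child, descendant) {
    stack[node]
    child = map[node]
    if (child in map) {
        if (child in stack) {
            descendant = node "*"      # cycle: mark the last node before it repeats
        }
        else {
            descendant = descend(child)
        }
    }
    else {
        descendant = child             # child starts no further chain
    }
    delete stack[node]
    return descendant
}
{ keys[++numKeys] = $1; map[$1] = $2 }  # remember first fields in input order
END {
    for (keyNr = 1; keyNr <= numKeys; keyNr++) {
        key = keys[keyNr]
        print key, descend(key)
    }
}
' <<'EOF'
<w> <w>
<x> <y>
<y> <z>
<z> <x>
<a> <b>
<d> <e>
<b> <c>
<c> <e>
EOF
)

printf '%s\n' "$result"
```

With this variant the lines follow the input, so `<d> <e>` is printed before `<b> <e>` and `<c> <e>`.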
