I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines within each sequence. Here's an example of my file:
>accession1
ATGGCCCATG
GGATCCTAGC
>accession2
GATATCCATG
AAACGGCTTA
I'd like to convert it into this:
>accession1
ATGGCCCATGGGATCCTAGC
>accession2
GATATCCATGAAACGGCTTA
I found a potential solution on this site, which looks like this:
cat input.fasta | awk '{if (substr($0,1,1)==">"){if (p){print "\n";} print $0} else printf("%s",$0);p++;}END{print "\n"}' > joinedlineoutput.fasta
However, this places an extra line break between each entry, so the file looks like this:
>accession1
ATGGCCCATGGGATCCTAGC
{empty line}
>accession2
GATATCCATGAAACGGCTTA
I'm an awk noob, but I took a shot at modifying the command. My guess was that if (p){print "\n";} was the culprit: potentially print "\n" is adding two line breaks. I couldn't figure out how to add just one newline. This is probably something easy, but like I said, I'm a noob. Here was my (unsuccessful) solution:
awk '{if (substr($0,1,1)==">"){print "\n"$0} else printf("%s",$0);p++;}END{print "\n"}' input.fasta > joinedoutput.fasta
However, this adds an empty line at the beginning of the file because it's always printing a new line before it prints the first accession number:
{empty line}
>accession1
ATGGCCCATGGGATCCTAGC
>accession2
GATATCCATGAAACGGCTTA
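The guess about print adding two line breaks can be checked directly. This is a small sketch, assuming standard awk behaviour where print appends the output record separator (a newline by default) to whatever it is given, while printf writes only what its format string contains. The variable names print_count and printf_count are just for illustration.

```shell
# print "\n" writes the string "\n" plus the output record separator,
# so two newline characters end up in the output.
print_count=$(awk 'BEGIN { print "\n" }' | wc -l | tr -d ' ')

# printf "\n" writes exactly one newline character.
printf_count=$(awk 'BEGIN { printf "\n" }' | wc -l | tr -d ' ')

echo "print wrote $print_count newline(s)"
echo "printf wrote $printf_count newline(s)"
```

If that holds, every print "\n" in the commands above is contributing one newline more than intended.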
Anyone have a solution to get my file in the correct format? Thanks!
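For reference, here is a minimal sketch of one direction that might work. It assumes the file starts with a header line and every header is followed by at least one sequence line. It uses printf for the explicit newlines so that only one is written each time, and it skips the newline before the very first header by checking NR. The first command only recreates the example file from above.

```shell
# Recreate the example input from the question.
printf '>accession1\nATGGCCCATG\nGGATCCTAGC\n>accession2\nGATATCCATG\nAAACGGCTTA\n' > input.fasta

awk '
  /^>/ {
    if (NR > 1) printf "\n"   # close the previous sequence, skipped for the first header
    print                     # header line followed by exactly one newline
    next
  }
  { printf "%s", $0 }         # sequence line: append without a trailing newline
  END { if (NR) printf "\n" } # close the last sequence
' input.fasta > joinedoutput.fasta

cat joinedoutput.fasta
```

Files with Windows line endings or blank lines between records would need extra handling that this sketch does not attempt.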