Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
386 views
in Technique[技术] by (71.8m points)

python - parsing from colum in perl or bash

a file I am working with looks like this

NAMES   n0  n1  n2  n3  n4  n5  n6  n7
REGION  chr 1   100000
404 AAAAAAGA
992 TTTTTTTA
1146    CCCCGGCC
1727    CCCCCACC
1778    GCCCCCCC

would need to split the file based on the number in the column - create a new file for every 1000 units so the output would e be

file1
 NAMES  n0  n1  n2  n3  n4  n5  n6  n7
    REGION  chr 404 992
    404 AAAAAAGA
    992 TTTTTTTA

file2
 NAMES  n0  n1  n2  n3  n4  n5  n6  n7
     REGION chr 1146    1778
1146 CCCCGGCC
1727 CCCCCACC
1778 GCCCCCCC

so split the first colum every 1000 units (first is from 1 to 1000) file 2 is from 1000 to 2000 also the start an end positions would be changed in every file (line starting with REG) as the first number is the number in the first line of the file adn the other number is the number in the last line of hte file. The header needs to be present in all files. Is there a way to name the files from that systematically with file1, file2....? /t is used throughout all files to make space...

i tried

awk '
NR==1 {
   h = $0
   k = 1000
   f = "file"k/1000
   print > f
   getline
   print "REGION chr",k-999,k > f
   next
} 
$1 <=k {
   print > f
   next
} 
{
   k=1000*int(1+$1/1000)
   f="file"k/1000
   print h > f
   print "REGION chr",k-999,k > f
   print > f
}' file
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can use this awk command:

awk 'function print_vals() {
   fn="file" c;
   print hdr > fn;
   print "REGION  chr", sn, en >> fn;
   for (i in a)
      print a[i] >> fn;
} NR == 1 {
   hdr=$0;
   c=0;
   next
} NF==2 && $1 >= 1000*c {
   if (c)
      print_vals();
   delete a;
   i=0;
   c++;
   sn=$1;
} NF==2 {
   a[++i]=$0;
   en=$1;
} END {
   print print_vals();
 }' file

Verification:

cat file1
NAMES   n0  n1  n2  n3  n4  n5  n6  n7
REGION  chr 404 992
404 AAAAAAGA
992 TTTTTTTA

cat file2
NAMES   n0  n1  n2  n3  n4  n5  n6  n7
REGION  chr 1146 1778
1146    CCCCGGCC
1727    CCCCCACC
1778    GCCCCCCC

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...