A file I am working with looks like this:
NAMES n0 n1 n2 n3 n4 n5 n6 n7
REGION chr 1 100000
404 AAAAAAGA
992 TTTTTTTA
1146 CCCCGGCC
1727 CCCCCACC
1778 GCCCCCCC
I need to split the file based on the number in the first column, creating a new file for every 1000 units, so the output would be:
file1
NAMES n0 n1 n2 n3 n4 n5 n6 n7
REGION chr 404 992
404 AAAAAAGA
992 TTTTTTTA
file2
NAMES n0 n1 n2 n3 n4 n5 n6 n7
REGION chr 1146 1778
1146 CCCCGGCC
1727 CCCCCACC
1778 GCCCCCCC
So the first column is split every 1000 units: file1 covers positions 1 to 1000, file2 covers 1001 to 2000, and so on. The start and end positions on the REGION line also change in every file: the first number is the position on the first data line of that file, and the second number is the position on its last data line. The NAMES header needs to be present in all files. Is there a way to name the output files systematically as file1, file2, ...? Tabs (\t) are used as the field separator throughout all files.
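Taking the ranges as 1 to 1000, 1001 to 2000, and so on, the range of a position can be computed with integer division after subtracting 1. Without the `- 1`, a boundary position such as 1000 or 2000 would land in the next range. A small sketch; `bucket_of` is a hypothetical helper name, not something from the question. Note that this range number is not necessarily the file number: if some range has no rows at all, naming files by range would skip a name, so a separate counter is needed to get consecutive file1, file2, ...

```shell
#!/bin/sh
# bucket_of POS: hypothetical helper that prints the 1-based 1000-unit
# range a position belongs to: 1..1000 -> 1, 1001..2000 -> 2, ...
bucket_of() {
    awk -v pos="$1" 'BEGIN { print int((pos - 1) / 1000) + 1 }'
}

# Show where the boundary positions land (position, tab, range number).
for pos in 1 404 1000 1001 1778 2000 2001; do
    printf '%s\t%s\n' "$pos" "$(bucket_of "$pos")"
done
```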
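I tried:
awk '
NR==1 {
h = $0                            # save the NAMES header
k = 1000                          # upper bound of the current 1000-unit bucket
f = "file"k/1000
print > f
getline                           # read the original REGION line (it is not copied)
print "REGION chr",k-999,k > f    # writes the bucket bounds, not the first/last positions
next
}
$1 <=k {
print > f                         # position is still inside the current bucket
next
}
{
k=1000*int(1+$1/1000)             # start a new bucket; the file name comes from the bucket number
f="file"k/1000
print h > f
print "REGION chr",k-999,k > f
print > f
}' file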
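One way to get the requested output, sketched under some assumptions: the positions in column 1 are sorted ascending (as in the sample), the original REGION line is tab-separated with the chromosome name in field 2, and output files are only needed for 1000-unit ranges that actually contain rows. Instead of writing the REGION line as soon as a bucket starts, the script buffers the rows of the current bucket and writes the whole file when the bucket ends, because the last position is only known at that point. Files are named with a running counter, so they come out as file1, file2, ... even if a range is empty. The sample file name `input.txt`, the `size` variable and the `flush` function are illustrative names, not taken from the question.

```shell
#!/bin/sh
# Demo input copied from the question, tab-separated.
# input.txt is a placeholder name; use the real file instead.
printf 'NAMES\tn0\tn1\tn2\tn3\tn4\tn5\tn6\tn7\nREGION\tchr\t1\t100000\n404\tAAAAAAGA\n992\tTTTTTTTA\n1146\tCCCCGGCC\n1727\tCCCCCACC\n1778\tGCCCCCCC\n' > input.txt

awk -v size=1000 '
BEGIN { FS = OFS = "\t" }

# Write the buffered chunk to the next fileN: header, a REGION line with
# the first and last positions of the chunk, then the data lines.
function flush(   out, i) {
    if (cnt == 0) return
    out = "file" (++n)
    print header > out
    print "REGION", chrom, first, last > out
    for (i = 1; i <= cnt; i++) print buf[i] > out
    close(out)                     # keep the number of open files small
    cnt = 0
}

NR == 1 { header = $0; next }      # NAMES line, copied into every file
NR == 2 { chrom = $2; next }       # original REGION line: keep only the chromosome name

{
    b = int(($1 - 1) / size)       # 1..1000 -> 0, 1001..2000 -> 1, ...
    if (cnt > 0 && b != bucket) flush()
    if (cnt == 0) { bucket = b; first = $1 }
    last = $1
    buf[++cnt] = $0
}

END { flush() }
' input.txt
```

For the real data, drop the `printf` line and pass the original file in place of `input.txt`. Only one bucket is held in memory at a time, and calling `close()` after each file avoids running out of file descriptors when there are many output files. If the positions are not sorted, the data lines would need to be sorted numerically first, otherwise a range that reappears later produces an extra file.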