How to quickly find the "real" table header of a variable-width CSV file?
My goal is to process a batch of .csv files with Python. Each file is laid out like this:
row0: config1, val1
row1: config2, val2
row2: misc, val_a, val_b, val_c,
row3: misc2, val_a, val_b, val_c, val_d
row4: misc3,val_a, val_b
...
rowk: configk, valk
rowk+1: header1, header2, header3, ..., headern
rowk+2: item1, item2, item3, ..., itemn
....
rowk+m: item1, item2, item3, ..., itemn
,,,,
,,,,
footer, row1
footer, row2
In the layout above, the labels before each colon (row0:, row1:, etc.) are my own annotations to mark the rows; they are not part of the CSV file.
Rows row0 through rowk have variable length (each row can have a different number of columns). From rowk+1 through rowk+m, every row has the same fixed length (m rows, starting with the header). After several empty rows, there may be 2 or 3 footer rows of variable length.
The goal is to quickly locate the header row so that I can load the table as a DataFrame using pandas. I tried several methods but couldn't find a satisfactory one. Any suggestions are appreciated.
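To make the layout concrete, here is a minimal sketch of one possible heuristic, not a definitive solution: scan the file once, count the fields in each row, and treat the longest run of consecutive non-blank rows with the same field count as the table. It assumes the table has more rows than any same-width stretch of config or footer rows, and that no quoted field contains a line break. The function name `find_table_span` and the sample values are made up for illustration.

```python
import csv


def find_table_span(lines):
    """Return (header_index, n_rows) for the longest run of consecutive
    non-blank rows that share the same field count.
    n_rows counts the header row as well as the data rows."""
    best_start, best_len = 0, 0
    run_start, run_len, run_width = 0, 0, None
    for i, row in enumerate(csv.reader(lines)):
        # Rows such as ",,,," (or empty lines) count as blank separators.
        width = len(row) if any(cell.strip() for cell in row) else None
        if width is not None and width == run_width:
            run_len += 1
        else:
            # Start a new run; a blank row starts a run of length 0.
            run_start, run_width = i, width
            run_len = 1 if width is not None else 0
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start, best_len


# Hypothetical sample mirroring the layout described above.
sample = [
    "config1, val1",
    "config2, val2",
    "misc, val_a, val_b, val_c,",
    "misc2, val_a, val_b, val_c, val_d",
    "misc3,val_a, val_b",
    "configk, valk",
    "header1, header2, header3, header4",
    "a1, a2, a3, a4",
    "b1, b2, b3, b4",
    "c1, c2, c3, c4",
    ",,,,",
    ",,,,",
    "footer, row1",
    "footer, row2",
]
print(find_table_span(sample))  # -> (6, 4)
```

With a real file, an open file object can be passed in place of the list. The result could then probably be handed to `pandas.read_csv` as `skiprows=header_index` and `nrows=n_rows - 1`, so that only the header and the fixed-width rows are parsed. How `skiprows` treats blank lines before the header is worth checking against the pandas documentation.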
question from:
https://stackoverflow.com/questions/65831587/how-to-quickly-find-the-real-table-header-of-a-variable-width-csv-file-by-pyth