I have a list of names with birth and death years, mostly taken from obituaries:
Leonard Wilson 1867 - 1936
Mark Jonson 1892 - 1961
Alex Jean Kinshaw 1951 - 1993
Elizabeth Mae Martin 1934 - 1998
The data needs to be analysed for a study, so it has to be arranged in CSV format as a timeline running from 1850 up to 2015, delimited by "," and using "-" for null values:
Leonard Wilson,-,-,-,-,-,-,-,-,-,-,-,-,-,-,1867,1868,1869......1934,1935,1936,-,-,-,-,-,-,-,-,-,-,-,-
Mark Jonson,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,1892,1893,1894,1895,1896,1897......,1958,1959,1960,1961,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
....
# All the years in the middle need to be populated, please
In the above data, the years before a person was born are marked with '-', as are the years after their death (up to 2015). All the years in between need to be populated.
The Python/pandas code needs to detect the start and end years for each person, and then:
- fill the leading null values,
- fill in the years in between, and
- fill the trailing null values
Is there any way this can be achieved, given that I have over 30k rows of data?
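
To make the requirement concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes every input line has the form `<name> <birth> - <death>` with four-digit years. The function name `build_timeline`, the regex and the constants `START` and `END` are hypothetical names introduced only for illustration. The idea is to parse the two years per row, then compare them against the full 1850 to 2015 range in one vectorised step so that no per-row loop is needed.

```python
import numpy as np
import pandas as pd

START, END = 1850, 2015  # timeline bounds taken from the description above

# Assumed line format: "<name> <birth> - <death>", e.g. "Mark Jonson 1892 - 1961"
PATTERN = r"^(?P<name>.+?)\s+(?P<birth>\d{4})\s*-\s*(?P<death>\d{4})$"


def build_timeline(lines, start=START, end=END):
    """Return a DataFrame indexed by name with one column per year.

    A cell holds the year as a string if the person was alive, else "-".
    """
    # Parse name, birth and death; lines that do not match are dropped.
    parsed = pd.Series(list(lines), dtype="object").str.strip().str.extract(PATTERN)
    parsed = parsed.dropna()

    years = np.arange(start, end + 1)
    # Column vectors of shape (n, 1) so they broadcast against the year row.
    birth = parsed["birth"].astype(int).to_numpy()[:, None]
    death = parsed["death"].astype(int).to_numpy()[:, None]

    # Boolean grid: True where birth <= year <= death.
    alive = (years >= birth) & (years <= death)
    grid = np.where(alive, years.astype(str), "-")

    return pd.DataFrame(grid, index=parsed["name"].to_numpy(), columns=years)


sample = [
    "Leonard Wilson 1867 - 1936",
    "Mark Jonson 1892 - 1961",
    "Alex Jean Kinshaw 1951 - 1993",
    "Elizabeth Mae Martin 1934 - 1998",
]
timeline = build_timeline(sample)
csv_text = timeline.to_csv(header=False)  # one "name,-,-,...,1867,..." row per person
```

For the full data set, the lines could be read from a file and the result written with `timeline.to_csv("out.csv", header=False)`, where the file name is a placeholder. With 30k rows the grid has about 5 million cells, which should fit comfortably in memory.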