I have the following data in a csv file:
from StringIO import StringIO
import pandas as pd
the_data = """
ABC,2016-6-9 0:00,95,{'//PurpleCar': [115L], '//YellowCar': [403L], '//BlueCar': [16L], '//WhiteCar-XYZ': [0L]}
ABC,2016-6-10 0:00,0,{'//PurpleCar': [219L], '//YellowCar': [381L], '//BlueCar': [90L], '//WhiteCar-XYZ': [0L]}
ABC,2016-6-11 0:00,0,{'//PurpleCar': [817L], '//YellowCar': [21L], '//BlueCar': [31L], '//WhiteCar-XYZ': [0L]}
ABC,2016-6-12 0:00,0,{'//PurpleCar': [80L], '//YellowCar': [2011L], '//BlueCar': [8888L], '//WhiteCar-XYZ': [0L]}
ABC,2016-6-13 0:00,0,{'//PurpleCar': [32L], '//YellowCar': [15L], '//BlueCar': [4L], '//WhiteCar-XYZ': [0L]}
DEF,2016-6-16 0:00,0,{'//PurpleCar': [32L], '//BlackCar': [15L], '//PinkCar': [4L], '//NPO-GreenCar': [0L]}
DEF,2016-6-17 0:00,0,{'//PurpleCar': [32L], '//BlackCar': [15L], '//PinkCar': [4L], '//NPO-GreenCar': [0L]}
DEF,2016-6-18 0:00,0,{'//PurpleCar': [32L], '//BlackCar': [15L], '//PinkCar': [4L], '//NPO-GreenCar': [0L]}
DEF,2016-6-19 0:00,0,{'//PurpleCar': [32L], '//BlackCar': [15L], '//PinkCar': [4L], '//NPO-GreenCar': [0L]}
DEF,2016-6-20 0:00,0,{'//PurpleCar': [32L], '//BlackCar': [15L], '//PinkCar': [4L], '//NPO-GreenCar': [0L]}
"""
I read the file into a Pandas data frame, as follows:
df = pd.read_csv(StringIO(the_data), sep=',')
Then, I add a few column headers, as follows:
df.columns = ['Company',
'Date',
'Volume',
'Car1',
'Car2',
'Car3',
'Car4']
I see that the data is coming through as follows:
ABC,2016-6-9 0:00,95,{'//PurpleCar': [115L], '//YellowCar': [403L], '//BlueCar': [16L], '//WhiteCar-XYZ': [0L]
But, I'd like to see the data without any of the following:
a) the curly braces ("{"
) at the beginning and the curly brace ("}"
) at the end of the dictionary
b) the "L" after the numerical values
c) the square brackets ("["
and "]"
) surrounding the numerical values
d) the apostrophes surrounding the keys
Ideally, the data would be transformed as follows:
ABC,2016-6-9 0:00,95,//PurpleCar: 115, //YellowCar: 403, //BlueCar: 16, //WhiteCar-XYZ: 0
I tried this:
df['Car1'] = df['Car1'].str.strip(['{', '}', '[', 'L]'])
But, it doesn't work. It results in the 'Car1' column becoming NaN values.
Is it possible to transform the data frame such that each row of the data frame reads as follows?
ABC,2016-6-9 0:00,95,//PurpleCar: 115, //YellowCar: 403, //BlueCar: 16, //WhiteCar-XYZ: 0
Thanks!
UPDATE:
Using the following regular expression:
df['Car1'] = df['Car1'].str.replace(r'D+', '').astype('int')
Results in this:
ABC,2016-6-9 0:00,95, 115 , //YellowCar: 403, //BlueCar: 16, //WhiteCar-XYZ: 0
We lose '//PurpleCar' and are left with only the numeric value of 115. That's a good start, but it would be great if we can see the '//PurpleCar' key, too.
Any ideas?
UPDATE 2:
Based on the comments by piRSquared and HYRY, my goal is to be able to plot the numerical results. So, I would like to have the data frame look as follows:
Company Date PurpleCar YellowCar BlueCar WhiteCar
0 ABC 2016-6-9 0:00 115 403 16 0
1 ABC 2016-6-10 0:00 219 381 90 0
2 ABC 2016-6-11 0:00 817 21 31 0
3 ABC 2016-6-12 0:00 80 2011 8888 0
4 ABC 2016-6-13 0:00 32 15 4 0
5 DEF 2016-6-16 0:00 32 15 4 0
6 DEF 2016-6-17 0:00 32 15 4 0
7 DEF 2016-6-18 0:00 32 15 4 0
8 DEF 2016-6-19 0:00 32 15 4 0
9 DEF 2016-6-20 0:00 32 15 4 0
* UPDATE 3: *
The data originally posted had a small mistake. Here is the data:
the_data = """
ABC,2016-6-9 0:00,95,"{'//Purple': [115L], '//Yellow': [403L], '//Blue': [16L], '//White-XYZ': [0L]}"
ABC,2016-6-10 0:00,0,"{'//Purple': [219L], '//Yellow': [381L], '//Blue': [90L], '//White-XYZ': [0L]}"
ABC,2016-6-11 0:00,0,"{'//Purple': [817L], '//Yellow': [21L], '//Blue': [31L], '//White-XYZ': [0L]}"
ABC,2016-6-12 0:00,0,"{'//Purple': [80L], '//Yellow': [2011L], '//Blue': [8888L], '//White-XYZ': [0L]}"
ABC,2016-6-13 0:00,0,"{'//Purple': [32L], '//Yellow': [15L], '//Blue': [4L], '//White-XYZ': [0L]}"
DEF,2016-6-16 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [3L]}"
DEF,2016-6-17 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [0L]}"
DEF,2016-6-18 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [7L]}"
DEF,2016-6-19 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [14L]}"
DEF,2016-6-20 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [21L]}"
"""
The difference between this data and the original data is the apostrophes (")
before the opening curly brace ("{"
) and after the closing curly brace ("}"
).
See Question&Answers more detail:
os