Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Read string representation of 2D array from CSV column into a 2D numpy array

I have a pandas dataframe, for which one of the columns holds 2D numpy arrays corresponding to pixel data from grayscale images. These 2D numpy arrays have the shape (480, 640) or (490, 640). The dataframe has other columns containing other information. I then generate a csv file out of it through pandas' to_csv() function. Now my issue is: my 2D numpy arrays all appear as strings in my CSV, so how can I read them back and convert them into 2D numpy arrays again?

I know there are similar questions on StackOverflow, but I couldn't find any that really focuses on 2D numpy arrays. They seem to be mostly about 1D numpy arrays, and the solutions provided don't seem to work for my 2D arrays.

Any help is greatly appreciated.

UPDATE:

As requested, I am adding some code below to clarify what my problem is.

import sys

import cv2
import numpy as np

# Function to switch images to grayscale format
def grayscale(img):
  return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Iterating through my dataframe (called data), reading all image files, making them grayscale and then adding them to my collection.
grayscale_images = []
for index, row in data.iterrows():
  img_path = row['Image path']
  cv_image = cv2.imread(img_path)
  gray = grayscale(cv_image)
  grayscale_images.append(gray)

# Make numpy array elements show without truncation
np.set_printoptions(threshold=sys.maxsize)

# Adding a new column to the dataframe containing each image's numpy array corresponding to pixels
data['Image data'] = grayscale_images

So when I'm done doing that and other operations on other columns, I export my dataframe to CSV like this:

data.to_csv('new_dataset.csv', index=False)
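Why the np.set_printoptions(threshold=sys.maxsize) line matters: the CSV cell ends up holding the array's str() text (no commas, no array(...) wrapper), and str() follows NumPy's print options. By default, arrays with more than 1000 elements are summarized with '...', so without that setting an image-sized array would be written with most of its pixels missing. A minimal check, using a made-up all-zero image of the question's size:

```python
import sys

import numpy as np

# Stand-in for one grayscale image from the question (all zeros, made up).
img = np.zeros((480, 640), dtype=np.uint8)

# Default print options (threshold=1000 elements): str() summarizes with '...'.
summarized = str(img)

# Temporarily lifting the threshold, as np.set_printoptions(threshold=sys.maxsize)
# does globally in the question, makes str() include every element.
with np.printoptions(threshold=sys.maxsize):
    full = str(img)

print('...' in summarized)  # True
print('...' in full)        # False
```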

In a different Jupyter notebook, I try to read my CSV file and then extract my images' numpy arrays to feed them to a convolutional neural network as input, as part of supervised training.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
import re

data = pd.read_csv('new_dataset.csv')
# data.head() -- It looks fine here

# Config to make numpy arrays display in their entirety without truncation
np.set_printoptions(threshold=sys.maxsize)

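# Checking if I can extract a 2D numpy array for conversion from a cell.
# That's where I notice it's a string, and I'm having trouble turning it back into a 2D numpy array.
# Select by name: 'Image data' was appended as the last column, so it is not column 0.
image_arr = data.loc[0, 'Image data']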

But I'm stuck converting the string representation from my CSV file back into a 2D numpy array, specifically one with its original shape, e.g. (490, 640), as it was before I exported the dataframe to CSV.


Whoever fights dragons too long becomes a dragon; gaze too long into the abyss, and the abyss gazes back…

1 Answer

Construct a csv with array strings:

In [385]: arr = np.empty(1, object)                                             
In [386]: arr[0]=np.arange(12).reshape(3,4)                                     
In [387]: S = pd.Series(arr,name='x')                                           
In [388]: S                                                                     
Out[388]: 
0    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Name: x, dtype: object
In [389]: S.to_csv('series.csv')                                                
/usr/local/bin/ipython3:1: FutureWarning: The signature of `Series.to_csv` was aligned to that of `DataFrame.to_csv`, and argument 'header' will change its default value from False to True: please pass an explicit value to suppress this warning.
  #!/usr/bin/python3
In [390]: cat series.csv                                                        
0,"[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]"

Load it:

In [391]: df = pd.read_csv('series.csv',header=None)                            
In [392]: df                                                                    
Out[392]: 
   0                                                1
0  0  [[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]

In [394]: astr=df[1][0]                                                         
In [395]: astr                                                                  
Out[395]: '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'

Parse the string representation of the array:

In [396]: astr.split('\n')
Out[396]: ['[[ 0  1  2  3]', ' [ 4  5  6  7]', ' [ 8  9 10 11]]']

In [398]: astr.replace('[','').replace(']','').split('\n')
Out[398]: [' 0  1  2  3', '  4  5  6  7', '  8  9 10 11']
In [399]: [i.split() for i in _]                                                
Out[399]: [['0', '1', '2', '3'], ['4', '5', '6', '7'], ['8', '9', '10', '11']]
In [400]: np.array(_, int)                                                      
Out[400]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

There's no guarantee this is the prettiest or cleanest parsing, but it gives an idea of the work you have to do. I'm probably reinventing the wheel, but searching for a duplicate was taking too long.
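One caveat for image-sized data: NumPy's str() wraps long rows (the default line width is 75 characters), so a 640-pixel row spans several text lines and splitting on '\n' no longer yields one row per line. A minimal sketch that avoids this, assuming integer data written without '...' summarization: it counts rows from the '[' brackets and reshapes. parse_2d_array_string is a hypothetical name, not from the answer; the CSV round trip uses an in-memory buffer and read_csv's converters parameter to parse each cell while loading.

```python
import io

import numpy as np
import pandas as pd


def parse_2d_array_string(s, dtype=int):
    """Hypothetical helper: rebuild a 2D array from NumPy's str() text.

    Assumes a 2D numeric array whose text was written without '...'
    summarization.
    """
    # A 2D array's str() has one outer '[' plus one '[' per row, so the
    # row count survives even when a long row is wrapped over several lines.
    n_rows = s.count('[') - 1
    # Replace brackets with spaces and split on any whitespace, newlines included.
    values = s.replace('[', ' ').replace(']', ' ').split()
    return np.array(values, dtype=dtype).reshape(n_rows, -1)


# Round trip through CSV, with an in-memory buffer standing in for a file.
arrays = np.empty(2, dtype=object)            # object array, built as in the answer
arrays[0] = np.arange(12).reshape(3, 4)
arrays[1] = np.arange(100).reshape(2, 50)     # 50 columns: str() wraps each row
buf = io.StringIO()
pd.Series(arrays, name='x').to_csv(buf, header=True, index=False)
buf.seek(0)

# read_csv's `converters` argument applies the parser to each cell while loading.
df = pd.read_csv(buf, converters={'x': parse_2d_array_string})
print(df.loc[1, 'x'].shape)  # (2, 50)
```

For the question's file this would be something like pd.read_csv('new_dataset.csv', converters={'Image data': parse_2d_array_string}); for float data, pass dtype=float (e.g. through a lambda).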

If possible, avoid saving such a dataframe as CSV. The CSV format is meant for a clean 2D table: simple, consistent columns separated by a delimiter.
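If the dataframe only needs to be reloaded in Python, one alternative to CSV (not named in the answer) is pandas' DataFrame.to_pickle / pd.read_pickle, which stores the arrays as Python objects, so nothing has to be parsed back. A minimal sketch, assuming a toy dataframe with the question's column names and two all-zero images of the question's two shapes; the file names are placeholders. Pickle files are Python-specific and should only be loaded from trusted sources.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy stand-in for the question's dataframe (placeholder paths, all-zero images).
images = np.empty(2, dtype=object)
images[0] = np.zeros((480, 640), dtype=np.uint8)
images[1] = np.zeros((490, 640), dtype=np.uint8)
data = pd.DataFrame({'Image path': ['a.png', 'b.png']})
data['Image data'] = pd.Series(images)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new_dataset.pkl')
    # Pickle stores the array objects themselves: no str() text, no parsing.
    data.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.loc[1, 'Image data'].shape)  # (490, 640)
```

Saving the pixel data outside the dataframe with np.save or np.savez would be another route.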

And for the most part, avoid dataframes/series like this. A Series can have object dtype, and each object element can be complex, such as a list, dictionary, or array, but I don't think pandas has special functions to handle those cases.

numpy also has an object dtype (like my arr), but a list is often just as good, if not better. Constructing such an array can be tricky, math on it is hit or miss, and iterating over an object array is slower than iterating over a list.
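To illustrate why construction is tricky: np.array on a list of same-shaped 2D arrays stacks them into one 3D numeric array rather than an object array (and in recent NumPy versions, arrays of different shapes such as (480, 640) and (490, 640) raise a ValueError instead). The answer's own trick, preallocating with np.empty(n, object) and assigning each slot, avoids that. A minimal sketch with small made-up arrays:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

# np.array on a list of equal-shaped arrays stacks them into one 3D
# numeric array; it does not produce an object array of 2D arrays.
stacked = np.array([a, b])
print(stacked.shape, stacked.dtype)  # (2, 2, 3) float64

# The answer's approach: preallocate a 1D object array, then fill it
# element by element so each slot holds a whole 2D array.
objs = np.empty(2, dtype=object)
objs[0] = a
objs[1] = b
print(objs.shape, objs[0].shape)     # (2,) (2, 3)

# A plain list holds the same arrays with none of this ceremony.
as_list = [a, b]
```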

===

The re module might work as well, for example by replacing whitespace with commas:

In [408]: re.sub(r'\s+',',',astr)
Out[408]: '[[,0,1,2,3],[,4,5,6,7],[,8,9,10,11]]'

Still not quite right: there are leading commas (right after each '[') that will choke eval.
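One way to finish the re approach (a sketch, not the answer's code): first strip the whitespace right after each '[' and right before each ']', then turn the remaining whitespace into commas and hand the text to ast.literal_eval, swapped in here for eval because it only accepts Python literals. parse_with_regex is a hypothetical name, and the sketch assumes the same '...'-free str() text.

```python
import ast
import re

import numpy as np


def parse_with_regex(s):
    """Hypothetical helper finishing the re.sub idea with ast.literal_eval."""
    # Drop whitespace right after '[' (the source of the leading commas)
    # and right before ']' (padding that NumPy can add after float values).
    cleaned = re.sub(r'\[\s+', '[', s.strip())
    cleaned = re.sub(r'\s+\]', ']', cleaned)
    # Any remaining run of whitespace, newlines included, separates two values.
    cleaned = re.sub(r'\s+', ',', cleaned)
    # literal_eval builds nested lists without executing arbitrary code.
    return np.array(ast.literal_eval(cleaned))


astr = '[[ 0  1  2  3]\n [ 4  5  6  7]\n [ 8  9 10 11]]'
print(parse_with_regex(astr).shape)  # (3, 4)
```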



...