When using the image dataset input layer (with either lmdb
or leveldb
backend) caffe only supports one integer label per input image.
If you want to do regression, and use floating point labels, you should try and use the HDF5 data layer. See for example this question.
In python you can use h5py
package to create hdf5 files.
import h5py, os
import caffe
import numpy as np
SIZE = 224 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' )
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
sp = l.split(' ')
img = caffe.io.load_image( sp[0] )
img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
# you may apply other input transformations here...
# Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
# for example
# transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
X[i] = transposed_img
y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
L.write( 'train.h5' ) # list all h5 files you are going to use
Once you have all h5
files and the corresponding test files listing them you can add an HDF5 input layer to your train_val.prototxt
:
layer {
type: "HDF5Data"
top: "X" # same name as given in create_dataset!
top: "y"
hdf5_data_param {
source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
batch_size: 32
}
include { phase:TRAIN }
}
Clarification:
When I say "caffe only supports one integer label per input image" I do not mean that the leveldb/lmdb containers are limited, I meant the tools of caffe, specifically the convert_imageset
tool.
At closer inspection, it seems like caffe stores data of type Datum
in leveldb/lmdb and the "label" property of this type is defined as integer (see caffe.proto) thus when using caffe interface to leveldb/lmdb you are restricted to a single int32 label per image.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…