Hoping to get a hand with a weird CNN training issue.
I am training a ResNet classifier to predict 4 image classes from a dataset of ~10k images. The code is pretty simple. Here's the ResNet/CNN setup:
####################################
########### LOAD RESNET ############
####################################
import torch
from torch import nn, optim
from torchvision import models
#
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet50(pretrained=True)
# freeze the pretrained backbone
for param in model.parameters():
    param.requires_grad = False
# replace the final layer with a new trainable head
# (10 outputs here, although only 4 classes are used)
model.fc = nn.Sequential(nn.Linear(2048, 512),
                         nn.ReLU(),
                         #nn.Dropout(0.2),
                         nn.Linear(512, 10),
                         nn.LogSoftmax(dim=1))
#
criterion = nn.NLLLoss()
# only the new head is optimized
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)
# move model to gpu
model.to(device)
And here is the training stage (it splits the data into batches of 500 images and shuffles them), followed by some accuracy results after a few epochs:
trainloader, testloader, n_batches = make_trainloader(all_data,
                                                      vals,
                                                      batch_size=500,
                                                      randomize=True)
...
for inputs, labels in trainloader:
    ...
    inputs, labels = inputs.to(device), labels.to(device)
    ...
    # PREDICT;
    outputs = model(inputs)
    ...
epoch #: 12
Loss: 0.1689 Acc: 0.9400
labels: tensor([0, 0, 1, 0, 3, 0, 0, 2, 1, 2], device='cuda:0')
predictions: tensor([0, 0, 1, 0, 3, 0, 0, 2, 1, 2], device='cuda:0')
The weird thing is that I can't seem to predict well on single images, only on large batches of data with mixed classes. For example, if I provide 500 images from class 1, the predictions are random, but if I provide 500 images mixed from the 4 classes (much like during training), the predictions are great (just like during training).
It seems that either I'm confused about how to use the ResNet classifier on single images, even though it does seem to learn the individual labels of the input data (see the labels and predictions output above), or my classifier isn't learning single images but groups of images. I'm not sure which.
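For concreteness, single-image prediction with a PyTorch classifier might look like the sketch below. This is only an illustration under assumptions: `make_stand_in_model` and `predict_single` are hypothetical helpers that are not part of the code above, the tiny network stands in for the fine-tuned ResNet so the snippet runs without pretrained weights, and the input is a random tensor rather than a real preprocessed image.

```python
import torch
from torch import nn

# Hypothetical stand-in for the fine-tuned ResNet: a tiny CNN with a
# BatchNorm layer and a LogSoftmax head, so the sketch runs without weights.
def make_stand_in_model(num_classes: int = 4) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, num_classes),
        nn.LogSoftmax(dim=1),
    )

def predict_single(model: nn.Module, image: torch.Tensor) -> int:
    """Return the predicted class index for one image of shape (C, H, W)."""
    model.eval()                      # use running BatchNorm stats, disable dropout
    with torch.no_grad():             # no gradients are needed for inference
        batch = image.unsqueeze(0)    # add a batch dimension: (1, C, H, W)
        log_probs = model(batch)      # shape (1, num_classes)
    return int(log_probs.argmax(dim=1).item())

torch.manual_seed(0)
model = make_stand_in_model()
image = torch.randn(3, 32, 32)        # random stand-in for a preprocessed image
pred = predict_single(model, image)
print("predicted class:", pred)
```

The two points the sketch relies on are the extra batch dimension from `unsqueeze(0)` and the call to `model.eval()` before the forward pass, neither of which appears in the snippets shown in this post.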
Any help or direction is appreciated (I can provide more code, but didn't want to make the message too long). Here's the prediction code:
# Predict
randomize = False
# load data from above
inputs = test_data[:2000]
vals_inputs = test_vals[:2000]
print("test data size: ", vals_inputs.shape)
trainloader, testloader, n_batches = make_trainloader(inputs,
                                                      vals_inputs,
                                                      batch_size=500,
                                                      randomize=randomize)
for inputs, labels in trainloader:
    # load to device
    inputs, labels = inputs.to(device), labels.to(device)
    # PREDICT;
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    print("prediction: ", preds[:10])
    print("labels: ", labels[:10])
    ...
test data size: torch.Size([2000])
prediction: tensor([1, 1, 2, 1, 2, 3, 2, 3, 2, 3], device='cuda:0')
labels: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
0 Loss: 3.2936 Acc: 0.1420
prediction: tensor([1, 3, 3, 3, 3, 1, 2, 1, 1, 2], device='cuda:0')
labels: tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0')
0 Loss: 2.1462 Acc: 0.2780
prediction: tensor([3, 3, 1, 2, 0, 1, 2, 1, 3, 2], device='cuda:0')
labels: tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2], device='cuda:0')
0 Loss: 2.1975 Acc: 0.2560
In contrast, when I simply shuffle the data, the accuracy is very high:
# Predict
randomize = True
...
test data size: torch.Size([2000])
prediction: tensor([0, 0, 3, 2, 0, 2, 0, 3, 0, 2], device='cuda:0')
labels: tensor([0, 0, 3, 2, 0, 2, 0, 3, 0, 2], device='cuda:0')
0 Loss: 0.1500 Acc: 0.9580
prediction: tensor([0, 3, 3, 3, 0, 0, 3, 2, 3, 3], device='cuda:0')
labels: tensor([0, 2, 3, 0, 0, 0, 3, 2, 0, 3], device='cuda:0')
0 Loss: 0.1714 Acc: 0.9340
prediction: tensor([3, 3, 2, 2, 3, 1, 3, 0, 2, 2], device='cuda:0')
labels: tensor([3, 3, 2, 2, 3, 1, 3, 0, 2, 2], device='cuda:0')
0 Loss: 0.1655 Acc: 0.9400
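The results above suggest that the output for a given image depends on what else is in the batch. One possible cause, stated here as an assumption and not confirmed by the code shown, is that the BatchNorm layers are still in training mode at prediction time and therefore normalize with per-batch statistics. A minimal sketch of how to check for that kind of dependence follows. It uses a hypothetical small network and synthetic data (`net`, `sample`, and `probe_output` are illustrative names, not part of the code above) and compares the output for the same sample placed in a single-class batch and in a mixed batch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in network with one BatchNorm layer (ResNet-50 has many).
net = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 4))

# Synthetic "images": each class is centred on a different mean vector.
class_means = torch.randn(4, 16) * 3

def sample(cls: int, n: int) -> torch.Tensor:
    """Draw n synthetic feature vectors for class `cls`."""
    return class_means[cls] + torch.randn(n, 16)

probe = sample(0, 1)  # one class-0 sample whose output is tracked

# Same probe in row 0, surrounded by different neighbours.
single_class_batch = torch.cat([probe, sample(0, 63)])
mixed_batch = torch.cat([probe] + [sample(c, 21) for c in (1, 2, 3)])

def probe_output(batch: torch.Tensor) -> torch.Tensor:
    """Network output for row 0 (the probe) of the given batch."""
    with torch.no_grad():
        return net(batch)[0]

# Training mode: BatchNorm normalizes with the statistics of the current batch.
net.train()
train_diff = (probe_output(single_class_batch) - probe_output(mixed_batch)).abs().max().item()

# Evaluation mode: BatchNorm uses its stored running statistics instead.
net.eval()
eval_diff = (probe_output(single_class_batch) - probe_output(mixed_batch)).abs().max().item()

print("max output difference in train mode:", train_diff)
print("max output difference in eval mode: ", eval_diff)
```

If the same comparison on the real model shows a large difference before calling `model.eval()` and essentially none after, that would point to batch-dependent normalization rather than to the classifier learning "groups of images".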