Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


svm - One-class Support Vector Machine sensitivity drops when the number of training samples increases

I am using One-Class SVM for outlier detection. It appears that as the number of training samples increases, the sensitivity TP/(TP+FN) of the One-Class SVM detection result drops, while the classification rate and specificity both increase.

What's the best way to explain this relationship in terms of the hyperplane and support vectors?

Thanks



1 Answer


As you add training examples, your classifier becomes less able to detect true positives correctly.

This means that the new data does not fit well with the model you are training.

Here is a simple example.

Below you have two classes, and we can easily separate them using a linear kernel. The sensitivity of the blue class is 1.

[Figure: two classes, blue and yellow, separated by a linear decision boundary]

As I add more yellow training data near the decision boundary, the generated hyperplane can no longer fit the data as well as before.

As a consequence, there are now two misclassified blue data points. The sensitivity of the blue class is now 0.92.
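To make the number concrete, here is a minimal sketch of the sensitivity calculation TP/(TP+FN). The point counts are an assumption chosen for illustration (25 blue points, 2 of them misclassified), not values read from the plot, and the `sensitivity` helper is hypothetical.

```python
def sensitivity(y_true, y_pred, positive):
    """Return TP / (TP + FN) for the class labelled `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no samples of the positive class")
    return tp / (tp + fn)


# Assumed counts: 25 blue points, 2 of them predicted as yellow.
y_true = ["blue"] * 25 + ["yellow"] * 30
y_pred = ["blue"] * 23 + ["yellow"] * 2 + ["yellow"] * 30

blue_sensitivity = sensitivity(y_true, y_pred, positive="blue")
print(round(blue_sensitivity, 2))  # 0.92
```

With these assumed counts, 23 of the 25 blue points are still recognised, which gives 23/25 = 0.92.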

[Figure: the same two classes after adding yellow points near the decision boundary]

As the amount of training data increases, the support vectors generate a somewhat less optimal hyperplane. The extra data may have turned a linearly separable data set into one that is no longer linearly separable. In that case, trying a different kernel, such as the RBF kernel, can help.

EDIT: More information about the RBF kernel:

In this video you can see what happens with an RBF kernel. The same logic applies: if the training data is not easily separable in n dimensions, you will get worse results.
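The effect can be reproduced with a small experiment. The sketch below assumes scikit-learn and NumPy are installed and uses a made-up point layout (a grid of yellow points on the left, blue on the right, plus extra yellow points placed around the blue edge). The helper `blue_sensitivity` is hypothetical, and the exact numbers depend entirely on this assumed layout.

```python
import numpy as np
from sklearn.svm import SVC


def blue_sensitivity(model, X, y):
    """Fraction of blue (label 1) points that the model predicts as blue."""
    pred = model.predict(X)
    blue = y == 1
    return float(np.mean(pred[blue] == 1))


# Deterministic toy data: yellow (0) on the left, blue (1) on the right.
xs, ys = np.meshgrid(np.linspace(0.0, 2.0, 5), np.linspace(-2.0, 2.0, 5))
grid = np.column_stack([xs.ravel(), ys.ravel()])
X_yellow = grid + [-4.0, 0.0]  # x in [-4, -2]
X_blue = grid + [2.0, 0.0]     # x in [2, 4]
X_before = np.vstack([X_yellow, X_blue])
y_before = np.array([0] * len(X_yellow) + [1] * len(X_blue))

# Extra yellow points around the edge of the blue region (assumed layout).
X_extra = np.column_stack([np.linspace(1.5, 2.5, 20), np.linspace(-2.0, 2.0, 20)])
X_after = np.vstack([X_before, X_extra])
y_after = np.concatenate([y_before, np.zeros(len(X_extra), dtype=int)])

linear_before = SVC(kernel="linear", C=1.0).fit(X_before, y_before)
linear_after = SVC(kernel="linear", C=1.0).fit(X_after, y_after)
rbf_after = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_after, y_after)

sens_before = blue_sensitivity(linear_before, X_before, y_before)
sens_linear = blue_sensitivity(linear_after, X_after, y_after)
sens_rbf = blue_sensitivity(rbf_after, X_after, y_after)
print(sens_before, sens_linear, sens_rbf)
```

Before the extra points are added, the two grids are far apart and the linear model recovers every blue point. Comparing `sens_linear` and `sens_rbf` on your own data shows whether a non-linear kernel actually helps; it is not guaranteed to.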

You should try to select a better C using cross-validation.

In this paper, Figure 3 illustrates that the results can be worse if C is not properly selected:

More training data could hurt if we did not pick a proper C. We need to cross-validate on the correct C to produce good results
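One way to pick C by cross-validation is sketched below, assuming scikit-learn is available. The data is a synthetic stand-in from `make_classification` and the candidate values for C are arbitrary illustrative choices. Note that this uses the two-class `SVC`; scikit-learn's `OneClassSVM` has no `C` parameter and is tuned through `nu` instead, so treat this as an illustration of the idea rather than a drop-in for the one-class case.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data; replace with your own features and labels.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Candidate values for C (arbitrary choices for illustration).
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(
    SVC(kernel="rbf", gamma="scale"),
    param_grid,
    scoring="recall",  # recall of the positive class is the sensitivity
    cv=5,              # 5-fold cross-validation
)
search.fit(X, y)

best_C = search.best_params_["C"]
best_recall = search.best_score_
print(best_C, best_recall)
```

Scoring on recall makes the search favour the C that best preserves sensitivity; if specificity matters as well, a balanced metric such as `"balanced_accuracy"` may be a better choice.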

