Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
3.5k views
in Technique[技术] by (71.8m points)

r - Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor X has new levels

I did a logistic regression:

 EW <- glm(everwrk~age_p + r_maritl, data = NH11, family = "binomial")

Moreover, I want to predict everwrk for each level of r_maritl.

r_maritl has the following levels:

levels(NH11$r_maritl)
 "0 Under 14 years" 
 "1 Married - spouse in household" 
 "2 Married - spouse not in household"
 "3 Married - spouse in household unknown" 
 "4 Widowed"                               
 "5 Divorced"                             
 "6 Separated"                             
 "7 Never married"                        
 "8 Living with partner"  
 "9 Unknown marital status"  

So I did:

predEW <- with(NH11,
expand.grid(r_maritl = c( "0 Under 14 years", "1 Married - 
spouse in household", "2 Married - spouse not in household", "3 Married - 
spouse in household unknown", "4 Widowed", "5 Divorced", "6 Separated", "7 
Never married", "8 Living with partner", "9 Unknown marital status"),
age_p = mean(age_p,na.rm = TRUE)))

cbind(predEW, predict(EW, type = "response",
                        se.fit = TRUE, interval = "confidence",
                        newdata = predEW))

The Problem is I get the following response:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 0 Under 14 years, Married - spouse in household unknown

Sample Data:

str(NH11$age_p)
num [1:33014] 47 18 79 51 43 41 21 20 33 56 ...

str(NH11$everwrk)
Factor w/ 2 levels "2 No","1 Yes": NA NA 2 NA NA NA NA NA 2 2 ...

str(NH11$r_maritl)
Factor w/ 10 levels "0 Under 14 years",..: 6 8 5 7 2 2 8 8 8 2 ...
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
 
Best answer

tl;dr it looks like you have some levels in your factor that are not represented in your data, that get dropped from the factors used in the model. In hindsight this isn't terribly surprising, since you won't be able to predict responses for these levels. That said, it's mildly surprising that R doesn't do something nice for you like generate NA values automatically. You can solve this problem by using levels(droplevels(NH11$r_maritl)) in constructing your prediction frame, or equivalently EW$xlevels$r_maritl.

A reproducible example:

maritl_levels <- c( "0 Under 14 years", "1 Married - spouse in household", 
  "2 Married - spouse not in household", "3 Married - spouse in household unknown", 
  "4 Widowed", "5 Divorced", "6 Separated", "7 Never married", "8 Living with partner", 
 "9 Unknown marital status")
set.seed(101)
NH11 <- data.frame(everwrk=rbinom(1000,size=1,prob=0.5),
                 age_p=runif(1000,20,50),
                 r_maritl = sample(maritl_levels,size=1000,replace=TRUE))

Let's make a missing level:

NH11 <- subset(NH11,as.numeric(NH11$r_maritl) != 3)

Fit the model:

EW <- glm(everwrk~r_maritl+age_p,data=NH11,family=binomial)
predEW <- with(NH11,
  expand.grid(r_maritl=levels(r_maritl),age_p=mean(age_p,na.rm=TRUE)))
predict(EW,newdata=predEW)

Success!

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 2 Married - spouse not in household

predEW <- with(NH11,
           expand.grid(r_maritl=EW$xlevels$r_maritl,age_p=mean(age_p,na.rm=TRUE)))
predict(EW,newdata=predEW)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...