I wrote a function to draw forest plots of CIs from regression results.
I feed to the function a data.frame with predictor label ($label), estimates ($coef), low and high CIs ($ci.low, $ci.high), style ($style):
structure(list(label = structure(c(9L, 4L, 8L, 2L, 6L, 10L, 3L,
7L, 1L, 5L), .Label = c(" - frattura esposta", " - frattura esposta 2",
" - lembo di perone vs lembo corticoperiostale", " - lembo di perone vs lembo corticoperiostale 2",
" - sesso maschile vs femminile", " - sesso maschile vs femminile 2",
" - trauma bassa energia", " - trauma bassa energia 2",
"Tempo di guarigione 2:", "Tempo di guarigione:"), class = "factor"),
coef = c(NA, 0.812, 0.695, 1.4, 0.682, NA, 0.812, 0.695,
1.4, 0.682), ci.low = c(NA, 0.405, 0.31, 1.26, 0.0855, NA,
0.405, 0.31, 1.26, 0.0855), ci.high = c(NA, 1.82, 0.912,
2.94, 1.01, NA, 1.82, 0.912, 2.94, 1.01), style = structure(c(1L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("bold", "plain"
), class = "factor")), .Names = c("label", "coef", "ci.low",
"ci.high", "style"), class = "data.frame", row.names = c(NA,
-10L))
I wanted to display CIs around the estimates and if possible to group the predictors. For the first aim I flipped the axis and used error bars; for the latter I created rows in the data frame that has labels but not values. And it worked out:
First problem:
As you can see the grouping label is bold and doesn't have any data associated.
The style (normal or bold) is defined in the style column (I plan to automatize this). The problem is that this works only if all the labels are different (notice that I added "2" to every label in the first graph to make them different); rows with repeated labels are simply shown as empty space:
I removed the 2 from "trauma bassa energia" label and it disappeared. (also the style is messed).
I want to find a solution for grouping, even quite different from my implementation but without the problem of same names.
Second problem:
As you can see in both images, the lower CI bar crosses the zero, which being Odds Ratios (and given the numbers in the dataframe I used) it's impossible.
Here's my code:
forest.plot <- function(d, xlab = "Coefficients", ylab = "", exp = T, bars = T, lims = NULL){
require(ggplot2)
boundary <- 0
text.pos <- -1.5
if(is.null(lims)) lims <- c(min(d$ci.low, na.rm = T), max(d$ci.high, na.rm = T))
p <- ggplot(d, aes(x=label, y=coef), environment = environment()) +
coord_flip()
if (exp == T){
p <- p + scale_y_log10(labels = round)
boundary <- 1
if(xlab == 'Coefficients') xlab <- 'Odds Ratios'
}
p <- p + geom_hline(yintercept = boundary, lty=2, col = 'darkgray', lwd = 1)
if (bars == T) {
text.pos <- -2
p <- p +
geom_bar(aes(fill = coef > boundary), stat = "identity", width = .3) +
geom_errorbar(aes(ymin = ci.low, ymax = ci.high, lwd = .5), colour = "dodgerblue4", width = 0.05)
}
else p <- p + geom_errorbar(aes(colour = coef > boundary, ymin = ci.low, ymax = ci.high, width = .05, lwd = .5))
if (!is.null(d$style)) style <- d[['style']] else style <- rep('plain', nrow(d))
p <- p + geom_point(colour = 'dodgerblue4', aes(size = 2)) +
scale_x_discrete(limits=rev(d$label)) +
geom_text(aes(label = coef, vjust = text.pos)) +
theme_bw() +
theme(axis.text.x = element_text(color = 'gray30', size = 16),
axis.text.y = element_text(face = rev(style), color = 'gray30', size = 14, hjust=0, angle=0),
axis.title.x = element_text(size = 20, color = 'gray30', vjust = 0),
axis.ticks = element_blank(),
legend.position="none",
panel.border = element_blank()) +
geom_vline(xintercept = 0, lwd = 2) +
ylab(xlab) +
xlab(ylab)
return(p)
}
See Question&Answers more detail:
os