Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

r - What is the difference between geoms and stats in ggplot2?

Both geoms and stats can be used to make plots in the R package ggplot2, and they often give similar results (e.g., geom_area and stat_bin). They also often have slightly different arguments, e.g. in 2-D density plots:

geom_density_2d(mapping = NULL, data = NULL, stat = "density2d",
  position = "identity", ..., lineend = "butt", linejoin = "round",
  linemitre = 1, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

stat_density_2d(mapping = NULL, data = NULL, geom = "density_2d",
  position = "identity", ..., contour = TRUE, n = 100, h = NULL, na.rm =
  FALSE, show.legend = NA, inherit.aes = TRUE)

Are there any fundamental differences between the two types of objects?

1 Answer

This is only meant to supplement the accepted answer.

According to Hadley Wickham, the author of ggplot2, in his book 'ggplot2: Elegant Graphics for Data Analysis' (p. 91, Section 5.2, 'Building a plot layer by layer'):

You only need to set one of stat and geom: every geom has a default stat, and every stat has a default geom.

The accepted answer above explains well why the two are different. This is meant to explain why they are difficult to distinguish in practice -- whenever you use a geom layer, you are also implicitly using a stat layer (even if it is just the identity transformation); likewise, whenever you use a stat layer, you are also implicitly using a geom layer.
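As a rough illustration of that pairing, the sketch below builds one layer with geom_bar() and one with stat_count() and inspects which geom and stat each ends up with. It assumes a reasonably recent ggplot2 in which a layer object exposes `$geom` and `$stat` fields; those fields are internals rather than a documented interface, so treat this as a way to peek, not as something to rely on.

```r
library(ggplot2)

# Build two layers without adding them to a plot.
via_geom <- geom_bar()    # names the geom, leaves the stat at its default
via_stat <- stat_count()  # names the stat, leaves the geom at its default

# Inspect the geom and stat that each layer carries (internal fields).
print(class(via_geom$geom)[1])
print(class(via_geom$stat)[1])
print(class(via_stat$geom)[1])
print(class(via_stat$stat)[1])

# Compare the pairing chosen by the two wrappers.
same_geom <- identical(class(via_geom$geom)[1], class(via_stat$geom)[1])
same_stat <- identical(class(via_geom$stat)[1], class(via_stat$stat)[1])
print(c(same_geom = same_geom, same_stat = same_stat))
```

If the defaults are as documented (geom_bar() uses stat = "count", and stat_count() uses geom = "bar"), both comparisons should come out the same way for the two layers.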

If you are fine with the defaults for both, then stating both explicitly would be redundant. Even if you are not, you can override a default by passing it as a parameter to the layer function (i.e. you can change the default geom through the geom parameter of a stat_* function, and you can change the default stat through the stat parameter of a geom_* function). In the words of Hadley Wickham (same source as above):

You can pass params in ... (in which case stat and geom parameters are automatically teased apart)
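A small sketch of what "teased apart" can look like in practice: geom_histogram() is given one argument meant for the stat (bins) and one fixed aesthetic meant for the geom (colour), both through `...`. The `stat_params` and `aes_params` fields inspected here are internal to the layer object and are an assumption about current ggplot2 versions, not a documented interface.

```r
library(ggplot2)

# Both arguments travel through `...`; the layer decides who receives them.
lyr <- geom_histogram(bins = 10, colour = "white")

# Parameters routed to the stat (the binning step).
print(names(lyr$stat_params))

# Fixed aesthetics routed to the geom (the drawing step).
print(names(lyr$aes_params))

bins_to_stat   <- "bins" %in% names(lyr$stat_params)
colour_to_geom <- "colour" %in% names(lyr$aes_params)
print(c(bins_to_stat = bins_to_stat, colour_to_geom = colour_to_geom))
```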

This is kind of difficult to understand conceptually, which is why I had this question as well. In Section 4, 'A Hierarchy of Defaults', of his paper on the philosophy underlying ggplot2, Hadley Wickham explains the practical motivation for this default behavior: it simplifies code that would otherwise be unnecessarily long.

For example, without default specifications, and using the grammar of graphics alone, the code for a simple scatter plot might look like:

library(ggplot2)

# Every component spelled out: layer, scales and coordinate system.
ggplot() +
  layer(
    data = diamonds, mapping = aes(x = carat, y = price),
    geom = "point", stat = "identity", position = "identity"
  ) +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()

Using defaults for the scales and coordinates, we can write something instead like:

ggplot(data = diamonds, aes(x = carat, y = price)) +
  layer(
    geom = "point", stat = "identity", position = "identity"
  )

But this is still annoyingly long of course, since the values of stat and position are just "identity", which basically means 'do nothing' -- so why have to say that explicitly?

However, the layer() function does not have default values for stat or position -- they need to be specified explicitly in a call to the layer() function.
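To see this for yourself, the sketch below calls layer() with a geom but no stat and captures whatever condition is raised. The exact error message differs between ggplot2 versions, so the code only checks that an error occurs and prints the message it happens to get.

```r
library(ggplot2)

# Call layer() without a stat and capture the resulting condition.
result <- tryCatch(
  layer(
    data = diamonds, mapping = aes(x = carat, y = price),
    geom = "point", position = "identity"
  ),
  error = function(e) e
)

# Report whether the call failed, and with what message.
failed <- inherits(result, "error")
print(failed)
if (failed) print(conditionMessage(result))
```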

To get around this, Hadley wrote the geom_* and stat_* functions as wrappers around the layer() function that supply default values for both the geom and the stat. The difference between the stat_* and geom_* functions is which of the two is fixed by the function itself and cannot be changed: the stat or the geom.
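A minimal sketch of the wrapper idea, comparing an explicit layer() call with geom_point(). It assumes that layer objects expose `$geom`, `$stat` and `$position` (internal fields that may change between versions); the comparison is only on the class of each component.

```r
library(ggplot2)

# The long form: every component named explicitly.
explicit <- layer(geom = "point", stat = "identity", position = "identity")

# The wrapper: the same components filled in by defaults.
wrapped <- geom_point()

# First class name of each component of a layer.
components <- function(l) {
  c(geom = class(l$geom)[1],
    stat = class(l$stat)[1],
    position = class(l$position)[1])
}

print(components(explicit))
print(components(wrapped))
same_components <- identical(components(explicit), components(wrapped))
print(same_components)

# The scatter plot from the earlier examples, in its short form.
p <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
```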

So for the geom_* functions you can change the stat but not the geom, while for the stat_* functions you can change the geom but not the stat. As the documentation for layer() puts it:

A layer is a combination of data, stat and geom with a potential position adjustment. Usually layers are created using geom_* or stat_* calls but it can also be created directly using this function [the layer() function].

Source: http://ggplot2.tidyverse.org/reference/layer.html
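As a sketch of overriding a default from either side, the two calls below both pair the bar geom with the identity stat: one starts from geom_bar() and swaps the stat, the other starts from stat_identity() and swaps the geom. The toy data frame `df` is made up for illustration, and the `$geom` and `$stat` fields are internals that are assumed to exist in the installed ggplot2 version.

```r
library(ggplot2)

# Hypothetical toy data: heights are already computed, so no counting is needed.
df <- data.frame(group = c("a", "b", "c"), total = c(3, 7, 5))

# Start from the geom and replace its default stat ("count").
from_geom <- geom_bar(stat = "identity")

# Start from the stat and replace its default geom ("point").
from_stat <- stat_identity(geom = "bar")

pairing <- function(l) c(geom = class(l$geom)[1], stat = class(l$stat)[1])
print(pairing(from_geom))
print(pairing(from_stat))
same_pairing <- identical(pairing(from_geom), pairing(from_stat))
print(same_pairing)

# Either layer can be added to a plot in the usual way.
p1 <- ggplot(df, aes(x = group, y = total)) + from_geom
p2 <- ggplot(df, aes(x = group, y = total)) + from_stat
```

The two layers are not identical in every respect: each wrapper keeps its own default position adjustment, so stacked data may be drawn differently unless position is set explicitly.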

