Say you have a column that contains a categorical variable (a factor, in R terms) and another column with a numeric measurement. How can you quickly and confidently identify which categories differ from one another?
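The commonly used dataset ‘mpg’ from the ‘ggplot2’ package has several categorical columns, which makes it an easy place to start. Its column ‘cty’ gives each car's fuel economy in city driving, in miles per gallon, and the column ‘manufacturer’ is a categorical variable holding each car's manufacturer. A quick way to compare manufacturers is to draw boxplots of ‘cty’ ordered by median, then run a one-way ANOVA followed by Tukey's honest significant difference (HSD) test, which compares every pair of manufacturers while adjusting for multiple comparisons.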
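Before plotting or testing, it can help to see how many cars each manufacturer contributes and what the per-group medians look like, since small groups will produce wide intervals later on. A minimal sketch using only base R's `table()` and `tapply()` on the same ‘mpg’ data:

```r
library(ggplot2)  # provides the mpg dataset
data(mpg)

# Number of cars per manufacturer; small groups give less precise comparisons
counts = table(mpg$manufacturer)
print(counts)

# Median city mileage per manufacturer, sorted from highest to lowest
med = sort(tapply(mpg$cty, mpg$manufacturer, median), decreasing = TRUE)
print(med)
```

The sorted medians follow the same left-to-right order as a boxplot whose categories are reordered by decreasing median.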
library(ggplot2)
data(mpg)
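# Reorder manufacturers by decreasing median city mileage so the boxes read best to worst
bymedian = with(mpg, reorder(manufacturer, -cty, median))
# varwidth = TRUE draws box widths proportional to the square root of each group's size
boxplot(cty ~ bymedian, data = mpg, varwidth = TRUE, las = 2,
        xlab = "", ylab = "City miles per gallon (cty)")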
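# Store manufacturer as a factor so the model term is simply named "manufacturer"
mpg$manufacturer = factor(mpg$manufacturer)
# One-way ANOVA: does mean city mileage differ across manufacturers?
result = aov(cty ~ manufacturer, data = mpg)
# Tukey's HSD: every pairwise difference with family-wise adjusted p-values
TukeyHSD(result)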
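With 15 manufacturers, `TukeyHSD()` reports 105 pairwise comparisons, which is a lot to read through. A minimal sketch, assuming you only care about pairs whose adjusted p-value falls below a conventional 0.05 cutoff (the cutoff is an assumption, not something the test prescribes), is to check the overall ANOVA first and then filter the Tukey matrix:

```r
library(ggplot2)  # provides the mpg dataset
data(mpg)

mpg$manufacturer = factor(mpg$manufacturer)
result = aov(cty ~ manufacturer, data = mpg)

# Overall F test: is there evidence that any manufacturer's mean differs?
print(summary(result))

# TukeyHSD() returns a list with one matrix per factor term;
# its columns are diff, lwr, upr and "p adj"
tk = TukeyHSD(result)$manufacturer

# Keep pairs below the assumed 0.05 cutoff, largest differences first
sig = tk[tk[, "p adj"] < 0.05, , drop = FALSE]
sig = sig[order(-abs(sig[, "diff"])), , drop = FALSE]
print(round(sig, 3))
```

Base R can also draw these intervals with `plot(TukeyHSD(result))`; pairs whose 95% interval does not cross zero correspond to the ones this filter keeps.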