R – aov plus TukeyHSD to quickly identify differences between categories

Say you have a column that contains a categorical variable (a factor, in R terms). How can you quickly and confidently identify which categories differ from each other?

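The commonly used dataset ‘mpg’ from the package ‘ggplot2’ contains several categorical variables, which makes it an easy place to get started. Its column ‘cty’ is a car's fuel economy in city driving, in miles per gallon, and its column ‘manufacturer’ is a categorical variable giving each car's manufacturer. A boxplot gives a first visual comparison; aov() then fits a one-way ANOVA, which tests whether mean cty differs across manufacturers at all, and TukeyHSD() compares every pair of manufacturers, with p-values and confidence intervals adjusted for the multiple comparisons.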
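Before fitting anything, it can help to look at the two columns and the group sizes. This is a minimal sketch; the name `cars` is just an illustrative local copy of `mpg`, not something from ggplot2.

```r
library(ggplot2)

# work on a plain data.frame copy of the mpg data shipped with ggplot2
cars <- as.data.frame(mpg)

# 'cty' is numeric (city miles per gallon); 'manufacturer' is stored as character
str(cars[, c("cty", "manufacturer")])

# treat manufacturer as a factor so each manufacturer becomes one category
cars$manufacturer <- factor(cars$manufacturer)
nlevels(cars$manufacturer)   # number of categories
table(cars$manufacturer)     # cars per manufacturer; the group sizes are unequal

# median city mpg per manufacturer, highest first
sort(tapply(cars$cty, cars$manufacturer, median), decreasing = TRUE)
```

The unequal group sizes are worth noticing: they are why the boxplot uses `varwidth = TRUE`, and TukeyHSD() handles them with the Tukey-Kramer form of the interval.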

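library(ggplot2)
data(mpg)
# manufacturer is stored as character; make it a factor for aov() and TukeyHSD()
mpg$manufacturer <- factor(mpg$manufacturer)

# order manufacturers by descending median city mpg, then draw the boxplot
bymedian <- with(mpg, reorder(manufacturer, -cty, median))
boxplot(cty ~ bymedian, data = mpg, varwidth = TRUE)

# one-way ANOVA of cty on manufacturer, then Tukey's pairwise comparisons
result <- aov(cty ~ manufacturer, data = mpg)
summary(result)
TukeyHSD(result)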
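To see what TukeyHSD() reports for a single pair, the sketch below recomputes one row by hand from the fitted ANOVA: the difference in group means, a standard error built from the residual mean square and the two group sizes, and the studentized range distribution (ptukey/qtukey) with as many means as there are manufacturers. Honda versus audi is an arbitrary choice, and the variable names (`cars`, `fit`, `tk`, `manual`) are illustrative only; this is a check under those assumptions, not a replacement for TukeyHSD().

```r
library(ggplot2)

cars <- as.data.frame(mpg)
cars$manufacturer <- factor(cars$manufacturer)

fit <- aov(cty ~ manufacturer, data = cars)
tk  <- TukeyHSD(fit, conf.level = 0.95)$manufacturer

# ingredients: residual mean square, residual df, number of groups,
# group sizes and group means
mse     <- sum(residuals(fit)^2) / df.residual(fit)
dfr     <- df.residual(fit)
k       <- nlevels(cars$manufacturer)
n       <- lengths(split(cars$cty, cars$manufacturer))
mean_by <- sapply(split(cars$cty, cars$manufacturer), mean)

# one pair as an example: honda compared with audi (row "honda-audi")
d  <- mean_by[["honda"]] - mean_by[["audi"]]
se <- sqrt(mse / 2 * (1 / n[["honda"]] + 1 / n[["audi"]]))

# studentized range distribution: adjusted p-value and interval half-width
p    <- ptukey(abs(d) / se, nmeans = k, df = dfr, lower.tail = FALSE)
half <- qtukey(0.95, nmeans = k, df = dfr) * se

manual <- c(diff = d, lwr = d - half, upr = d + half, "p adj" = p)
rbind(manual = manual, TukeyHSD = tk["honda-audi", ])
```

The two rows should agree. Because every pair is judged against the same studentized range distribution for all 15 means, the intervals are wider than separate two-sample t intervals would be; that is the price of testing all pairs at once.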

(Figure: aov / TukeyHSD output for cty by manufacturer)

(Figure: boxplot of cty by manufacturer)
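With 15 manufacturers, TukeyHSD() prints 105 pairs, which is a lot to scan. A minimal sketch for pulling out only the pairs that differ, assuming the usual 0.05 cut-off on the adjusted p-value (the threshold and the names `pairs` and `sig` are illustrative choices):

```r
library(ggplot2)

cars <- as.data.frame(mpg)
cars$manufacturer <- factor(cars$manufacturer)

fit <- aov(cty ~ manufacturer, data = cars)
tk  <- TukeyHSD(fit)

# matrix with one row per pair of manufacturers: diff, lwr, upr, p adj
pairs <- tk$manufacturer
nrow(pairs)   # choose(15, 2) pairs

# keep pairs whose adjusted p-value is below 0.05, smallest first
sig <- pairs[pairs[, "p adj"] < 0.05, , drop = FALSE]
sig <- sig[order(sig[, "p adj"]), , drop = FALSE]
head(sig, 10)

# each horizontal segment is a 95% family-wise interval; segments that do
# not cross the vertical line at zero are the significant pairs above
plot(tk, las = 1)
```

Filtering on `p adj` and checking whether the interval excludes zero give the same answer here, since both come from the same studentized range distribution at the same 95% level.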
