R Workshop Day 3 (pdf)
Data manipulation with dplyr and the tidyverse
Practice
- Enhance the graph we generated for
femalesByState
by adding a reference line at the nationwide female percentage. To compute that, you will need to do some more work using summarize
on the femalesByState
dataset, to compute the total female population as well as the total population, then divide the two.
- Add a text next to the reference line from the previous part, indicating the value.
- Repeat a similar analysis by using the high school graduation percentage instead of the female percentage.
- Compute both black and female population percentages per state, then draw an scatterplot comparing these two variables.
- Also compute the nationwide black percentage and female percentage and add appropriate horizontal and vertical reference lines at those locations (preferably red).
- Add state labels for those states that are below the nationwide average in both.
- Arrange the states dataset in order of female percentage, then use
gf_line
to join all the points together (maybe with some alpha).
- Also include a computation of high school graduation percentages, and color the points (only the points) using that.