forked from anjiecao/tidyverse-tutorial
---
title: "Tidyverse Examples"
author: "Psych 251 Staff"
date: "10/2/2019"
output: html_document
---
```{r setup, include=FALSE}
library(tidyverse)
```
# Manipulating data with dplyr
Let's use `mtcars`, a built-in dataset of cars that records miles per gallon (`mpg`), number of cylinders (`cyl`), displacement (`disp`), gross horsepower (`hp`), and other variables.
```{r}
mtcars
```
**Exercise**: First, summarise the average miles/gallon (mpg) across the entire dataset.
```{r}
mtcars %>%
  summarise(mean = mean(mpg))
```
**Exercise**: Each car in this dataset has 4, 6, or 8 cylinders (`cyl`). Summarise the average mpg, broken down by the number of cylinders. Hint: You may want to "group" by `cyl` in order to do this.
```{r}
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(mpg))
```
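As a cross-check on the grouped summary above, the same per-cylinder means can be computed in base R with `tapply()`. This is only a sketch for comparison; the column name `mean_mpg` and the object names are chosen here for illustration and are not part of the original solution.

```{r}
library(dplyr)

# Grouped means via dplyr (same pipeline as above, with an illustrative column name)
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# The same means via base R: split mpg by cyl, then average each piece
base_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Compare the two results (both are ordered by cyl: 4, 6, 8)
all.equal(by_cyl$mean_mpg, as.numeric(base_means))
```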
**Exercise**: In addition to the means, add standard deviations to this summary (still grouped by cyl).
```{r}
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(mpg),
            sd = sd(mpg))
```
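A summary like the one above is often extended with group sizes and standard errors. The sketch below is one way to do that, assuming you want the standard error of the mean; the column names (`mean_mpg`, `sd_mpg`, `n`, `se_mpg`) are illustrative and not from the original exercise.

```{r}
library(dplyr)

mpg_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),
            sd_mpg = sd(mpg),
            n = n(),                      # number of cars in each cyl group
            se_mpg = sd_mpg / sqrt(n))    # reuses the columns computed just above
mpg_summary
```

Within a single `summarise()` call, later expressions can refer to columns created earlier in the same call, which is why `se_mpg` can be built from `sd_mpg` and `n`.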
**BONUS**: Let's visualize! Use ggplot2 (included in the tidyverse) to make a scatter plot of mpg by horsepower. If you are feeling extra fancy, you can add a smoothing line. (Hint: Google "geom_smooth() scatterplot".)
```{r}
ggplot(mtcars,
       aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth()
```
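Called with no arguments, `geom_smooth()` chooses a smoothing method for you (loess for a dataset as small as this one) and prints a message saying which one it used. If you would rather see a straight regression line, one option is to name the method explicitly. This is a sketch of an alternative, not the intended solution to the exercise; the object name `p` is illustrative.

```{r}
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  # Fit and draw a linear regression of mpg on hp instead of the default smoother
  geom_smooth(method = "lm", formula = y ~ x)
p
```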
# Reshaping with tidyr
## From long to wide and back again
We will first use `table3`, a table built into the `tidyr` package. Run `help(table3)` to read its documentation.
```{r}
table3
help(table3)
```
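`table3` is in long format, with one row per country per year (note that its `rate` column still stores cases and population together as a single string). Make this into wide data.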
```{r}
table3_wide <- table3 %>%
  spread(year, rate)
```
Now convert it back into long format.
```{r}
table3_long <- table3_wide %>%
  gather(year, rate, `1999`:`2000`)
```
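One thing to watch in this round trip: after `spread()`, the years are column names, so `gather()` returns `year` as a character column. The sketch below shows the `convert = TRUE` argument of `gather()`, which asks it to convert the key column to a more specific type. The object names `table3_chr` and `table3_num` are illustrative.

```{r}
library(tidyr)

table3_wide <- table3 %>%
  spread(year, rate)

# Without convert, the gathered key column comes back as character
table3_chr <- table3_wide %>%
  gather(year, rate, `1999`:`2000`)
class(table3_chr$year)

# With convert = TRUE, gather() runs type conversion on the key column
table3_num <- table3_wide %>%
  gather(year, rate, `1999`:`2000`, convert = TRUE)
class(table3_num$year)
```

Also note that `gather()` stacks the gathered columns one after another, so the rows come back ordered by year rather than by country.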
The newer functions `pivot_wider()` and `pivot_longer()` do the same reshaping. They supersede `spread()` and `gather()`, and their function and argument names are more descriptive, which makes them easier to use.
```{r}
table3_wide <- table3 %>%
  pivot_wider(names_from = year, values_from = rate)
table3_long <- table3_wide %>%
  pivot_longer(cols = `1999`:`2000`, names_to = "year", values_to = "rate")
```
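As with `gather()`, `pivot_longer()` builds `year` from column names, so it comes back as character. A minimal sketch of one fix, assuming a version of tidyr that supports the `names_transform` argument of `pivot_longer()`:

```{r}
library(tidyr)

table3_wide <- table3 %>%
  pivot_wider(names_from = year, values_from = rate)

table3_long <- table3_wide %>%
  pivot_longer(cols = `1999`:`2000`,
               names_to = "year",
               values_to = "rate",
               # Convert the new year column from character to integer
               names_transform = list(year = as.integer))

# Check the round trip against the original table
identical(table3_long$country, table3$country)
identical(table3_long$rate, table3$rate)
```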
## From wide to long without seeing the tidy version
These are pre-post data on children's arithmetic scores from an RCT (randomized controlled trial) in which children were assigned to either CNTL (control) or MA (mental abacus math intervention). They were tested twice, once in 2015 and once in 2016. The paper can be found at https://jnc.psychopen.eu/article/view/106.
```{r}
majic <- read_csv("data/majic.csv")
```
Make these tidy.
```{r}
# The older way, using gather()
majic_long <- majic %>%
  gather(year, score, `2015`, `2016`)
# The new way, using pivot_longer()
majic_long <- majic %>%
  pivot_longer(cols = c(`2015`, `2016`), names_to = "year", values_to = "score")
```
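To see why the long format is convenient, here is a small sketch on made-up data. Everything in `toy` is an assumption for illustration: the real `majic.csv` is not shown here, so the `subid` and `group` column names and all score values are invented. Only the `2015` and `2016` column names come from the code above.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical stand-in for majic.csv: invented ids, groups, and scores
toy <- tibble(
  subid = 1:4,
  group = c("CNTL", "CNTL", "MA", "MA"),
  `2015` = c(10, 12, 11, 13),
  `2016` = c(12, 14, 15, 17)
)

toy_long <- toy %>%
  pivot_longer(cols = c(`2015`, `2016`), names_to = "year", values_to = "score")

# In long format, a single group_by() covers both condition and year
toy_means <- toy_long %>%
  group_by(group, year) %>%
  summarise(mean_score = mean(score), .groups = "drop")
toy_means
```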
**OPTIONAL**: Convert these back to wide format.
```{r}
# The older way, using spread()
majic_wide <- majic_long %>%
  spread(year, score)
# The new way, using pivot_wider()
majic_wide <- majic_long %>%
  pivot_wider(names_from = year, values_from = score)
```
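Wide format is handy when you want to compare the two test years within each child. The sketch below computes a gain score on made-up data; `toy_long`, its `subid` and `group` columns, and all score values are invented for illustration and do not come from `majic.csv`.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical long-format data: two rows (2015 and 2016) per child
toy_long <- tibble(
  subid = rep(1:4, each = 2),
  group = rep(c("CNTL", "MA"), each = 4),
  year = rep(c("2015", "2016"), times = 4),
  score = c(10, 12, 12, 14, 11, 15, 13, 17)
)

toy_wide <- toy_long %>%
  pivot_wider(names_from = year, values_from = score) %>%
  # With one column per year, the change score is a simple subtraction
  mutate(gain = `2016` - `2015`)
toy_wide
```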