Automate All the Things

Erin M. Buchanan

2023-04-15

Outline

What is Markdown?

Flavors? Is this Ice Cream?

Why should I use it?

Why should I use programming markdown?

How it works

What do I need?

install.packages("tinytex")
tinytex::install_tinytex()

Other Packages

# tidyr (pivoting) and Hmisc (needed by mean_cl_normal) are also used later on
install.packages(c("rmarkdown", "knitr", "flextable", "dplyr", "tidyr",
                   "rio", "ggplot2", "ggthemes", "treemapify", "Hmisc"))
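
Re-running install.packages() reinstalls everything each time. A minimal sketch, in base R only, of checking which packages are still missing first; `missing_pkgs()` is a hypothetical helper written for this example, and the install call is left commented out:

```r
# packages listed on the slide
needed <- c("rmarkdown", "knitr", "flextable", "dplyr",
            "rio", "ggplot2", "ggthemes", "treemapify")

# hypothetical helper: which of these are not installed yet?
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_pkgs(needed)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to actually install
}
```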

Let’s Get Started!

Rmd Document Parts - YAML

---
title: "Untitled" 
author: "Erin M. Buchanan"
date: "2023-04-15"
output: html_document
---

Knitting
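
Knitting can also be started from the R console instead of the Knit button. A minimal sketch, assuming the rmarkdown package and pandoc are available (RStudio ships with pandoc); the temporary file and its contents are made up for illustration:

```r
# write a tiny R Markdown file to a temporary location (made-up content)
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Untitled\"",
  "output: html_document",
  "---",
  "",
  "## R Markdown",
  "",
  "Some narrative text."
), rmd_file)

# knit from the console; skipped when rmarkdown or pandoc is not available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(out_file) # path of the rendered .html file
}
```

The output format is read from the YAML header of the file, so the same call would produce a PDF or Word file if the header asked for one.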

HTML - YAML

output:
  html_document:
    toc: true
    toc_depth: 2

HTML - YAML

output:
  html_document:
    toc: true
    toc_float: true

output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false

HTML - YAML

output:
  html_document:
    number_sections: true

HTML - YAML

output:
  html_document:
    theme: united
    highlight: tango
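
The YAML options on these slides appear to correspond to arguments of the rmarkdown::html_document() function, so its help page (?html_document) is a good place to look up what else can be set. A small sketch under that assumption, combining the options shown so far into one format object:

```r
# only build the format object when rmarkdown is installed
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  fmt <- rmarkdown::html_document(
    toc = TRUE,             # table of contents
    toc_depth = 2,          # headers down to level 2
    toc_float = TRUE,       # float the table of contents on the left
    number_sections = TRUE, # numbered headers
    theme = "united",       # Bootstrap theme
    highlight = "tango"     # syntax highlighting style
  )
  print(class(fmt))
  # the object can be passed to render(), e.g. with a hypothetical file:
  # rmarkdown::render("my-file.Rmd", output_format = fmt)
}
```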

PDF - YAML

Word - YAML

output:
  word_document:
    reference_docx: my-styles.docx

Rmd Document Parts - Narrative

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
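
The template text stops just before its example chunk. A minimal sketch of what goes inside such a chunk, assuming the default RStudio template: in the .Rmd file the code sits between an opening line of three backticks followed by {r cars} and a closing line of three backticks.

```r
# body of the example chunk: summarize the built-in cars dataset
summary(cars)
```

When the document is knit, both this code and its printed summary appear in the output.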

Narrative - Headers/Blocks

# First-level header

## Second-level header

### Third-level header

Narrative - Text Styles

Narrative - Lists

- one item
- one item
- one item
    - one more item
    - one more item
    - one more item

Narrative - Citations

@Manual{R-base,
  title = {R: A Language and Environment for Statistical
    Computing},
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2017},
  url = {https://www.R-project.org/},
}
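
An entry like this does not have to be typed by hand. A minimal sketch using base R's citation() and toBibtex(); the temporary file name is only for illustration, and in a project the entries would go into the .bib file named in the YAML header:

```r
# citation() returns the recommended reference for R itself
r_cite <- citation()

# toBibtex() converts it into a BibTeX entry
bib <- toBibtex(r_cite)
print(bib)

# save the entry; a temporary file is used here for illustration
bib_file <- tempfile(fileext = ".bib")
writeLines(bib, bib_file)
```

The entry produced this way has no citation key, so one (such as R-base) has to be added after the opening brace. In the narrative, an entry with that key is cited as [@R-base], which is Pandoc's citation syntax.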

Narrative - Citations

---
output: html_document
bibliography: references.bib
csl: biomed-central.csl
---

Narrative - Tabbed Sections

## Quarterly Results {.tabset}

### By Product

(tab content)

### By Region

(tab content)

## Quarterly Results {.tabset .tabset-fade .tabset-pills}

Rmd Document Parts - Code Chunks

Code Chunks

Code Chunks - Options
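
Chunk options can be set once for the whole document rather than chunk by chunk. A minimal sketch using knitr::opts_chunk$set(), usually placed in the first chunk of the file; the particular values are only examples:

```r
library(knitr)

# defaults applied to every chunk that follows
opts_chunk$set(
  echo = TRUE,      # show the code in the output
  warning = FALSE,  # hide warnings
  message = FALSE,  # hide messages such as package start-up notes
  fig.width = 6,    # figure width in inches
  fig.height = 4    # figure height in inches
)

# check one of the defaults
opts_chunk$get("warning")
```

Individual chunks can still override these defaults in their own header.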

Figures

Tables

# load some data
data("mtcars")
# load some libraries 
library(flextable)
library(knitr)
library(dplyr)

Tables - Kable

# make a kable 
kable(mtcars[1:6, ], # data 
      caption = "mtcars Strikes Again") # table caption 
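mtcars Strikes Again

|                   | mpg  | cyl | disp | hp  | drat | wt    | qsec  | vs | am | gear | carb |
|-------------------|------|-----|------|-----|------|-------|-------|----|----|------|------|
| Mazda RX4         | 21.0 | 6   | 160  | 110 | 3.90 | 2.620 | 16.46 | 0  | 1  | 4    | 4    |
| Mazda RX4 Wag     | 21.0 | 6   | 160  | 110 | 3.90 | 2.875 | 17.02 | 0  | 1  | 4    | 4    |
| Datsun 710        | 22.8 | 4   | 108  | 93  | 3.85 | 2.320 | 18.61 | 1  | 1  | 4    | 1    |
| Hornet 4 Drive    | 21.4 | 6   | 258  | 110 | 3.08 | 3.215 | 19.44 | 1  | 0  | 3    | 1    |
| Hornet Sportabout | 18.7 | 8   | 360  | 175 | 3.15 | 3.440 | 17.02 | 0  | 0  | 3    | 2    |
| Valiant           | 18.1 | 6   | 225  | 105 | 2.76 | 3.460 | 20.22 | 1  | 0  | 3    | 1    |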
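
kable() has a few more arguments for tidying the printed table. A small sketch, assuming knitr is installed, that keeps three columns and rounds to one decimal place with the `digits` argument:

```r
library(knitr)
data("mtcars")

# fewer columns, rounded to one decimal place
tab <- kable(mtcars[1:6, c("mpg", "wt", "qsec")], # data
             digits = 1,                          # rounding
             caption = "mtcars Strikes Again")    # table caption
tab
```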

Tables - Flextable

# make a ft
flextable(mtcars[1:6, ]) %>% # data 
  set_caption("mtcars Strikes Again") # caption
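mtcars Strikes Again

| mpg  | cyl | disp | hp  | drat | wt    | qsec  | vs | am | gear | carb |
|------|-----|------|-----|------|-------|-------|----|----|------|------|
| 21.0 | 6   | 160  | 110 | 3.90 | 2.620 | 16.46 | 0  | 1  | 4    | 4    |
| 21.0 | 6   | 160  | 110 | 3.90 | 2.875 | 17.02 | 0  | 1  | 4    | 4    |
| 22.8 | 4   | 108  | 93  | 3.85 | 2.320 | 18.61 | 1  | 1  | 4    | 1    |
| 21.4 | 6   | 258  | 110 | 3.08 | 3.215 | 19.44 | 1  | 0  | 3    | 1    |
| 18.7 | 8   | 360  | 175 | 3.15 | 3.440 | 17.02 | 0  | 0  | 3    | 2    |
| 18.1 | 6   | 225  | 105 | 2.76 | 3.460 | 20.22 | 1  | 0  | 3    | 1    |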

# note that the caption is added differently: kable() takes a caption argument, flextable uses set_caption()

Referencing Figures/Tables Inline

Inline Code Chunks

# calculate the mean
M <- mean(mtcars$mpg)
# print it out
M
## [1] 20.09062
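
In the narrative, a value like this is pulled in with inline code such as `r round(M, 2)`, which knitr replaces with the result when the document is knit. A minimal sketch of what ends up in the sentence; the wording of the sentence is made up:

```r
data("mtcars")

# calculate the mean
M <- mean(mtcars$mpg)

# round it for reporting
M_rounded <- round(M, 2)

# what the knit sentence would read like
sentence <- paste0("The average fuel economy was ", M_rounded, " miles per gallon.")
print(sentence)
```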

Other Useful RStudio Features

Import Data

# library to import all the things 
library(rio)

# data with times and counts
DF <- import("trap_data_08-11-2022.csv")

# what's in the data 
flextable(head(DF))

| Grower      | Farm | Crop     | Variety | Trap | Pest | Latitude | Longitude |
|-------------|------|----------|---------|------|------|----------|-----------|
| A. Groskopf | AG   | Dry Bean | GN      | 1    | WBC  | 41.96022 | 103.6856  |
| A. Groskopf | AG   | Dry Bean | GN      | 2    | WBC  | 41.95665 | 103.6844  |
| A. Groskopf | AG   | Dry Bean | GN      | 3    | WBC  | 41.95672 | 103.6780  |
| A. Groskopf | AG   | Dry Bean | GN      | 4    | WBC  | 41.96020 | 103.6781  |
| J. Jenkins  | JJ   | Dry Bean | Pinto   | 1    | WBC  | 41.99753 | 103.7723  |
| J. Jenkins  | JJ   | Dry Bean | Pinto   | 2    | WBC  | 41.99753 | 103.7754  |

The remaining columns are daily counts, one column per date from 6/17/2022 through 8/14/2022. Values for each row, in the order they appear ("-" marks no count):

| Farm | Trap | Daily counts |
|------|------|--------------|
| AG | 1 | - - - 0 - 0 - 0 - - - 0 - 2 - - - - 1 - 1 - - - 18 - 48 - 52 - - - 678 - 325 - 266 95 92 64 57 58 60 - 76 15 24 12 6 9 4 3 3 1 1 2 |
| AG | 2 | - - - 0 - 0 - 0 - - - 0 - 0 - - - - 1 - 1 - - - 18 - 29 - 29 - - - 321 - 248 - 153 99 62 31 44 44 42 - 98 24 29 6 8 4 5 1 12 6 2 7 |
| AG | 3 | - - - 0 - 0 - 0 - - - 0 - 0 - - - - 0 - 0 - - - 36 - 26 - 93 - - - 469 - 221 - 226 45 46 19 26 46 12 - 25 11 6 9 8 12 6 5 4 2 2 3 |
| AG | 4 | - - - 0 - 0 - 0 - - - 0 - 1 - - - - 2 - 5 - - - 60 - 110 - 86 - - - 464 - 518 - 448 169 221 239 165 214 117 - 195 82 52 31 60 31 26 15 5 12 9 6 |
| JJ | 1 | - - - 0 - 0 - 0 - - - 0 - 0 - - - - 0 - 0 - - - 2 - 5 - 6 - - - 333 - 709 - 213 236 107 84 41 63 53 - 30 20 33 20 16 4 7 2 11 9 2 7 |
| JJ | 2 | - - - 0 - 0 - 0 - - - 0 - 0 - - - - 0 - 0 - - - 4 - 10 - 13 - - - 609 - 581 - 360 162 101 47 28 94 22 - 44 23 31 6 18 4 10 8 14 7 5 8 |
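
The imported data keeps column names such as 6/17/2022 as they are in the file. For comparison, a sketch in base R using a made-up three-line file: read.csv() rewrites such names unless check.names = FALSE, and the "-" placeholders make the count columns text rather than numbers.

```r
# made-up miniature of the trap file
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Farm,Trap,6/17/2022,6/18/2022",
  "AG,1,-,0",
  "AG,2,3,-"
), csv_file)

# default behaviour: names are made syntactically valid
mangled <- read.csv(csv_file)
names(mangled)

# keep the original date names
kept <- read.csv(csv_file, check.names = FALSE)
names(kept)

# the "-" entries force the column to be character
class(kept[["6/17/2022"]])
```

Names that are not valid R names need backticks when used in code, which is why the date columns are written with backticks in the later cleaning step.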

Plots

How ggplot2 works

# most important library load 
library(ggplot2)

How ggplot2 works

# library to pivot
library(tidyr)

# cleaning up the data into long format
# cleaning up columns to graph 
DF_long <- DF %>% 
  select(Farm, Variety, Trap, Latitude:`8/14/2022`) %>% 
  mutate(across(`6/17/2022`:`8/14/2022`, as.character)) %>% 
  pivot_longer(cols = -c(Farm, Variety, Trap, Latitude, Longitude), 
               names_to = "Date", 
               values_to = "Count") %>% 
  mutate(Count = as.numeric(Count)) %>% 
  filter(!is.na(Count)) %>% 
  mutate(Month = substr(Date, 1, 1)) %>% 
  mutate(Month = factor(Month, 
                        levels = c(6,7,8),
                        labels = c("June", "July", "August"))) %>% 
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Count = as.numeric(Count)`.
## Caused by warning:
## ! NAs introduced by coercion
# what's in the data 
flextable(head(DF_long))

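| Farm | Variety | Trap | Latitude | Longitude | Date       | Count | Month |
|------|---------|------|----------|-----------|------------|-------|-------|
| AG   | GN      | 1    | 41.96022 | 103.6856  | 2022-06-20 | 0     | June  |
| AG   | GN      | 1    | 41.96022 | 103.6856  | 2022-06-22 | 0     | June  |
| AG   | GN      | 1    | 41.96022 | 103.6856  | 2022-06-24 | 0     | June  |
| AG   | GN      | 1    | 41.96022 | 103.6856  | 2022-06-28 | 0     | June  |
| AG   | GN      | 1    | 41.96022 | 103.6856  | 2022-06-30 | 2     | June  |
| AG   | GN      | 1    | 41.96022 | 103.6856  | 2022-07-05 | 1     | July  |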
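
Two steps of the cleaning code are worth seeing in isolation. A base R sketch with made-up values: the "-" placeholders are what trigger the coercion warning (they become NA and are then filtered out), and taking the first character of the date only works for single-digit months, so a more general way to get the month is shown for comparison.

```r
# "-" cannot be converted to a number, so it becomes NA (with a warning)
raw_counts <- c("0", "-", "18")
counts <- suppressWarnings(as.numeric(raw_counts))
counts

# substr(Date, 1, 1) is fine for June to August, but not for October
substr(c("6/20/2022", "10/1/2022"), 1, 1)

# more general: parse the date first, then pull out the month number
dates <- as.Date(c("6/20/2022", "7/5/2022", "10/1/2022"),
                 format = "%m/%d/%Y")
month_num <- as.integer(format(dates, "%m"))
month_lab <- month.name[month_num]
month_lab
```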

Bar Graphs

# first layer is what's x, y, color 
ggplot(DF_long, # data
       aes(x = Trap, # x axis
           y = Count, # y axis 
           fill = Farm)) + # fill, color, shape
  # calculate the averages from the data and add bars 
  stat_summary(fun = mean, # what function do you want to calculate
               geom = "bar", # what "geom" do you want to graph
               position = "dodge") + # don't let the bars overlap
  # calculate the confidence intervals for the means 
  # (mean_cl_normal needs the Hmisc package installed)
  stat_summary(fun.data = mean_cl_normal, # confidence limits
               geom = "errorbar", # error bars or whiskers
               position = position_dodge(width = .9), # don't overlap
               width = .2) + # make them smaller than the bar 
  # make it not an ugly gray graph 
  theme_classic() + 
  # x axis label
  xlab("Trap Number") + 
  # y axis label 
  ylab("Average Count of Pest") + 
  # put the legend on the bottom
  theme(legend.position = "bottom")
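
The error bars come from mean_cl_normal, which, per the ggplot2 documentation, wraps a function from Hmisc that by default gives a 95% confidence interval based on the t distribution. A base R sketch of that calculation for one bar, using made-up counts:

```r
# made-up counts for one trap
x <- c(18, 48, 52, 95, 92, 64)

n <- length(x)
m <- mean(x)                     # height of the bar
se <- sd(x) / sqrt(n)            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # critical t value for 95%

# lower and upper ends of the error bar
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)
ci
```

The same interval is returned by t.test(x)$conf.int, which is a quick way to check the numbers.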

Line Graphs

# cool library of themes
library(ggthemes)
# first layer of data 
ggplot(DF_long, # data
       aes(x = Date, # x axis
           y = Count, # y axis
           color = Farm, 
           fill = Farm)) + # both color AND fill for this graph
  # add a smoothed (loess) trend line over time for each Farm 
  geom_smooth(method = "loess") + 
  # add the points to see the actual data
  geom_point() + 
  # add a silly theme
  theme_wsj() + 
  # notice how these labels are ignored: theme_wsj() hides axis titles 
  xlab("Month") + 
  ylab("Number of Pests")
## `geom_smooth()` using formula = 'y ~ x'

Stacked Bar Charts

# first layer of data
ggplot(DF_long %>% 
         group_by(Farm, Month) %>% 
         # one total per Farm and Month
         summarize(Count = sum(Count), .groups = "drop"), 
       aes(x = Month, # x axis
           y = Count, # y axis
           fill = Farm)) + # fill by Farm 
  # different silly theme
  theme_fivethirtyeight() + 
  # position = fill makes this a percent chart 
  geom_bar(position = "fill", 
           stat = "identity") +   
  # notice again, these labels are ignored because the theme hides axis titles 
  xlab("Month") + 
  ylab("Total Number of Pests")

# first layer of data
ggplot(DF_long %>% 
         group_by(Farm, Month) %>% 
         # one total per Farm and Month
         summarize(Count = sum(Count), .groups = "drop"), 
       aes(x = Month, # x axis
           y = Count, # y axis
           fill = Farm)) + # fill by Farm 
  # different silly theme
  theme_fivethirtyeight() + 
  # position = stack makes this a raw count 
  geom_bar(position = "stack", 
           stat = "identity") +   
  # notice again, these labels are ignored because the theme hides axis titles 
  xlab("Month") + 
  ylab("Total Number of Pests")
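
The only difference between the two charts is the position argument. A base R sketch, with made-up totals, of the numbers each version draws: "stack" uses the raw counts, so bar height is the monthly total, while "fill" rescales each month so the segments are shares that add up to 1.

```r
# made-up totals per Farm and Month
totals <- data.frame(
  Farm  = c("AG", "JJ", "AG", "JJ"),
  Month = c("June", "June", "July", "July"),
  Count = c(6, 2, 30, 20)
)

# position = "stack": bar height is the sum of the counts in each month
stack_height <- tapply(totals$Count, totals$Month, sum)
stack_height

# position = "fill": each count divided by its month total
totals$Share <- totals$Count / ave(totals$Count, totals$Month, FUN = sum)
totals
```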

Mosaic Charts

# add on library to ggplot
library(treemapify)
# get the data
DF_pts <- import("processed_data/pts_summary.csv")
# view the data
flextable(head(DF_pts))

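| country_map | n     | n_binned  | un_region_sub                   | un_region |
|-------------|-------|-----------|---------------------------------|-----------|
| AD          | 1     | < 100     | Southern Europe                 | Europe    |
| AE          | 20    | < 100     | Western Asia                    | Asia      |
| AL          | 3     | < 100     | Southern Europe                 | Europe    |
| AM          | 1,020 | 1000-1999 | Western Asia                    | Asia      |
| AR          | 285   | 100-999   | Latin America and the Caribbean | Americas  |
| AT          | 1,048 | 1000-1999 | Western Europe                  | Europe    |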

Mosaic Charts

# make the blank plot 
ggplot(DF_pts, # data frame
       aes(area = n, # how big should the box be
           fill = n_binned, # what should we color by
           # note mine are separate because binning helped visually 
           # usually those are the same 
           label = country_map, # label the boxes
           subgroup = un_region_sub)) + # group by second variable 
  geom_treemap() + # make a treemap 
  # add a border by subgroup
  geom_treemap_subgroup_border(colour = "white", size = 5) +
  # where to put the text and what color 
  geom_treemap_text(colour = "white", place = "centre",
                    size = 15, grow = FALSE) + 
  # make it journal friendly by picking grays 
  scale_fill_manual(name = "Sample Size",
                    values = c("#c8c8c8", "#969696", "#646464", "#323232")) 

Other ggplot2 addons I like

Fin

Buchanan, Erin Michelle, Kelly Cuccolo, Nicholas Alvaro Coles, Tom Heyman, Aishwarya Iyer, Neil Anthony Lewis, Kim Olivia Peters, et al. 2021. “Measuring the Semantic Priming Effect Across Many Languages,” December. https://doi.org/10.31219/osf.io/q4fjy.
Hutchison, Keith A., David A. Balota, James H. Neely, Michael J. Cortese, Emily R. Cohen-Shikora, Chi-Shing Tse, Melvin J. Yap, Jesse J. Bengson, Dale Niemeyer, and Erin Buchanan. 2013. “The Semantic Priming Project.” Behavior Research Methods 45 (4): 1099–1114. https://doi.org/10.3758/s13428-012-0304-z.

  1. This is a footnote