Automate All the Things

Erin M. Buchanan

2023-04-15

Outline

What and Why Markdown
The parts of a Markdown doc
Options to tweak documents
Options for output

What is Markdown?

“Markdown is a lightweight markup language for creating formatted text using a plain-text editor”
“Markdown is a lightweight markup language that you can use to add formatting elements to plain text text documents.”
“R Markdown provides an authoring framework for data science.”
In short, it’s a set of simple codes that help us transform text into formatted documents.

Flavors? Is this Ice Cream?

Markdown has multiple “flavors”
A good analogy is a language dialect
R Markdown is the flavor for R, GitHub uses a similar version, and so on

Why should I use it?

You can create many types of things in markdown:
- Websites
- Books
- Academic/scientific reports
- Presentations (like this one!)
- and more!

Why should I use it?

Portable: since it is plain text, you can open it in almost any editor
Platform independent: raw text works the same on any operating system
Lots of people use it, so the skill transfers to unexpected places (Reddit, Slack)

Why should I use programming markdown?

Things like R Markdown, Jupyter notebooks, Quarto, …
You can integrate text and code into the same document!
You can create reproducible reports
You can annotate your notes with reminders for later
Put everything in one place!
Have more control over documents
Copy-and-paste errors are minimized

How it works

Some Useful Rmd Links

What do I need?

R, RStudio (software)
LaTeX for PDF documents

install.packages("tinytex")
tinytex::install_tinytex()

Word or something like it (LibreOffice, OpenOffice) for DOC(X)
PowerPoint for PPT(X)

Other Packages

For today:

# tidyr is used to reshape data; Hmisc supplies mean_cl_normal() for error bars
install.packages(c("rmarkdown", "knitr", "flextable", "dplyr", "tidyr",
                   "rio", "ggplot2", "ggthemes", "treemapify", "Hmisc"))

Specific goals:
- papaja is great for APA journal articles, and rticles has many journal article templates
- bookdown and blogdown for building open-source textbooks and websites with Markdown
- officedown and officer for Word document formatting
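Reinstalling everything each time is slow. Here is a minimal sketch you can run from the console (not inside the Rmd) that installs only what is missing. A few parts are my own choices: `find_missing()` is a hypothetical helper, the list is today's packages plus tidyr and Hmisc (later slides use both), and the `interactive()` guard keeps it from running during a knit.

```r
# Packages used today, including tidyr and Hmisc for later slides
pkgs <- c("rmarkdown", "knitr", "flextable", "dplyr", "rio",
          "ggplot2", "ggthemes", "treemapify", "tidyr", "Hmisc")

# Hypothetical helper: return the packages in `pkgs` that are not installed
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

to_install <- find_missing(pkgs)

# Only install from an interactive console session, never while knitting
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```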

Let’s Get Started!

Let’s create a markdown template to show off the components
File > New File > R Markdown...

Let’s Get Started!

Select your desired output (start with HTML)
Enter a title
Pick a template if you want
Create!

Rmd Document Parts - YAML

YAML (“YAML Ain’t Markup Language”, originally “Yet Another Markup Language”)
Header of your document
The way you can control the type and options for the overall document

---
title: "Untitled" 
author: "Erin M. Buchanan"
date: "2023-04-15"
output: html_document
---

Rmd Document Parts - YAML

A note about YAMLs: it is very picky about indentation
The available YAML options depend on the output format you pick
Let’s try a few of them out
“Knit” the document to HTML to get an idea of what the template looks like when you start

Knitting

The most important thing to remember: when you hit Knit, the document runs in a fresh R session, so everything you have in your environment is ignored
- You start with a blank slate
- You must load the libraries in the markdown
- The order matters!
Other things learned the hard way
- Don’t put View() in a markdown
- Don’t install packages in a markdown
- Watch out for funky symbols
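You can see the blank slate for yourself. This rough sketch starts a separate R process with Rscript and asks whether an object from your console exists there. Rscript is my stand-in for what the Knit button does, since the button also renders in a new session. `my_data` is just an example name.

```r
# An object that exists only in the current interactive session
my_data <- head(mtcars)

# Path to the Rscript program that ships with this R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Ask a brand-new R process whether it can see `my_data`
out <- system2(rscript,
               args = c("-e", shQuote('writeLines(as.character(exists("my_data")))')),
               stdout = TRUE)

# The new session has never heard of my_data
out
```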

HTML - YAML

Add a table of contents

output:
  html_document:
    toc: true
    toc_depth: 2

HTML - YAML

Floating table of contents

output:
  html_document:
    toc: true
    toc_float: true

output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false

HTML - YAML

Number your sections

output:
  html_document:
    number_sections: true

HTML - YAML

theme: options include default, bootstrap, cerulean, cosmo, darkly, flatly, journal, lumen, paper, readable, sandstone, simplex, spacelab, united, and yeti
highlight: options include default, tango, pygments, kate, monochrome, espresso, zenburn, haddock, breezedark, and textmate
code_folding: hide or code_folding: show includes your code but lets readers collapse or expand each chunk (hide starts collapsed, show starts expanded)

output:
  html_document:
    theme: united
    highlight: tango
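The YAML options on these slides are arguments to `rmarkdown::html_document()`, so you can also set them from R. This is a sketch that assumes rmarkdown is installed. "report.Rmd" is a hypothetical file, so the `render()` call only runs if that file exists. Rendering also needs Pandoc, which comes with RStudio.

```r
library(rmarkdown)

# The YAML options above map onto arguments of html_document()
fmt <- html_document(
  toc = TRUE,
  toc_depth = 2,
  toc_float = TRUE,
  number_sections = TRUE,
  theme = "united",
  highlight = "tango",
  code_folding = "hide"
)

# "report.Rmd" is a hypothetical file name; render only if it exists
if (file.exists("report.Rmd")) {
  render("report.Rmd", output_format = fmt)
}
```

Setting options in R is handy when you want to render the same Rmd several ways from one script. The YAML header is still the simpler route for a single report.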

PDF - YAML

Many of the same features still work
Tables of contents, code highlighting
Options for the LaTeX engine that builds the PDF (latex_engine: pdflatex, xelatex, or lualatex), aka how it knits

Word - YAML

Many of the same features still work
Tables of contents, code highlighting
Best feature: using a reference document with your own custom styles

output:
  word_document:
    reference_docx: my-styles.docx

Rmd Document Parts - Narrative

Any section where you write raw text
This is where Markdown formatting goes
You can add LaTeX to create fancy equations and symbols $\beta$
You can use raw LaTeX or HTML that will be converted
And more!

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Rmd Document Parts - Narrative

A note about spacing: you have to leave a blank line between items if you want to make a new paragraph
What are the most common things we can add?
Text styling, links, citations and more

Narrative - Headers/Blocks

# First-level header

## Second-level header

### Third-level header

Narrative - Text Styles

Italic: _text_ or *text*
Bold: **text**
Subscript (M_Group): M~Group~
Superscript (R²): R^2^
Links: [text](link)
Images: ![alt text or image title](path/to/image)
- Or use the knitr package: knitr::include_graphics("path/to/image")
Footnotes¹: ^[This is a footnote.]

Narrative - Lists

You can make numbered or bulleted lists

- one item
- one item
- one item
    - one more item
    - one more item
    - one more item

Narrative - Citations

There are a few ways to enter citations, but they are generally written as @citationname
That citationname is the key for an entry stored in a separate .bib file, which might look like this (you would cite it as @R-base):

@Manual{R-base,
  title = {R: A Language and Environment for Statistical
    Computing},
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2017},
  url = {https://www.R-project.org/},
}
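You don't have to type entries like this by hand for R and its packages. In this sketch, base R's `citation()` prints the entry for R itself, and `knitr::write_bib()` writes entries to a .bib file with keys like R-base. The file name packages.bib is just an example.

```r
# Base R: print a BibTeX entry for R itself (the citation key is left blank)
print(utils::toBibtex(utils::citation()))

# knitr: write entries for R and packages to a .bib file;
# keys get an "R-" prefix, so base R becomes R-base like the entry above
knitr::write_bib(c("base", "knitr"), file = "packages.bib")

# Peek at the entry lines (type and key) that were written
grep("^@", readLines("packages.bib"), value = TRUE)
```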

Narrative - Citations

How can I create the bib file or organize the citations?
You can use a free/paid manager like Zotero or EndNote
You can export them from the journal website
You can let RStudio do it for you with the Visual editor

Narrative - Citations

You can also just start typing into the text in the Visual editor

Narrative - Citations

Here’s an example of how the citations print:
I think the Semantic Priming Project is cool (Hutchison et al. 2013).
So cool that Buchanan et al. (2021) is working on a super replication of it.
Wrap the citation in [] (e.g., [@citationname]) for a parenthetical citation; leave off the [] (e.g., @citationname) to integrate it into the sentence.

Narrative - Citations

By default, citations are formatted in Chicago author-date style, with the reference list at the end of the document
To change this, add a csl: line to the YAML that points to a .csl file
Put this file into the same folder as the markdown
You can find a ton of them at https://www.zotero.org/styles

---
output: html_document
bibliography: references.bib
csl: biomed-central.csl
---

Narrative - Tabbed Sections

I asked the internet what they thought was cool - tabsets!
Add the {.tabset} class to a header, and the subheaders under it become tabs

## Quarterly Results {.tabset}

### By Product

(tab content)

### By Region

(tab content)

Two other cool features: .tabset-fade fades tabs in and out, and .tabset-pills styles the tabs as pills

## Quarterly Results {.tabset .tabset-fade .tabset-pills}
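Tabs can also be generated from code. This sketch uses a hypothetical helper, `write_tabs()`, that prints one third-level header (one tab) per group. Put it in a chunk with results = "asis" under a header that has {.tabset}. The mtcars data stands in for real results.

```r
# Hypothetical helper: print one ### header (one tab) per group.
# Use inside a chunk with results = "asis", under a ## header with {.tabset}.
write_tabs <- function(data, group, value) {
  for (g in sort(unique(data[[group]]))) {
    rows <- data[[group]] == g
    # Blank lines around each header so Markdown sees a new block
    cat("\n\n### ", group, " = ", g, "\n\n", sep = "")
    cat("Mean ", value, ": ", round(mean(data[[value]][rows]), 2), "\n", sep = "")
  }
}

write_tabs(mtcars, group = "cyl", value = "mpg")
```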

Rmd Document Parts - Code Chunks

You can add code in the middle of narrative sections as a chunk or inline.
Chunks use three backticks (```) with the code language next to them: ```{r}
Inline code uses one backtick with the code language next to it: `r`
You can add them wherever you need
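To see what chunks and inline code turn into, this sketch knits a tiny Rmd held in a character vector with `knitr::knit(text = ...)`. That skips the Pandoc step, so the result is plain Markdown rather than HTML. The chunk name mpg-summary is made up for the example.

```r
library(knitr)

# A tiny Rmd: one inline expression and one named chunk with an option
rmd <- c(
  "The mean mpg is `r round(mean(mtcars$mpg), 2)`.",
  "",
  "```{r mpg-summary, echo = FALSE}",
  "summary(mtcars$mpg)",
  "```"
)

# knit() turns Rmd into Markdown; text = avoids writing a file
md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```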

Code Chunks

Code chunks don’t have to be just R
You can connect to other software engines like Python, SQL, Stan, Bash…
Chunks in other languages can share a session with R (e.g., Python through the reticulate package)
For example, I teach courses that use R and Python interactively in the same documents

Code Chunks

Code chunks can have names - this means you can reference them in the document
- Great for referencing figures and tables
- This also works nicely in the visual editor
The name goes right after the language: {r chunkname}
- You can use spaces, but I wouldn’t recommend it
- Learn from my pain: do not use _ in a chunk name, use - instead
- You don’t have to name them, but names help with navigation
- You can’t use the same name twice

Code Chunks - Options

Options go after the name of the chunk, inside the {}, separated by commas
These options control the behavior of that specific chunk
You can also set these options globally
- knitr::opts_chunk$set(echo = FALSE)

Code Chunks - Options

Use option = TRUE or option = FALSE
eval: Whether to evaluate a code chunk.
- Turning this off can help show code without running it
echo: Whether to print the source code in the output document
include: Whether to include anything from a code chunk in the output document.
- include = FALSE, this whole code chunk is excluded in the output
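This sketch again uses `knitr::knit(text = ...)`, this time on made-up chunks, to show what each of these three options keeps and drops. Note that include = FALSE still runs the code. It only hides the code and its output.

```r
library(knitr)

rmd <- c(
  "```{r show-only, eval = FALSE}",
  "x <- 1 + 1",
  "```",
  "",
  "```{r result-only, echo = FALSE}",
  "2 * 21",
  "```",
  "",
  "```{r hidden, include = FALSE}",
  "secret <- 'ran quietly'",
  "```",
  "",
  "The hidden chunk still ran: `r secret`."
)

md <- knit(text = rmd, quiet = TRUE)
# eval = FALSE: code shown, not run; echo = FALSE: result only;
# include = FALSE: nothing shown, but `secret` was still created
cat(md, sep = "\n")
```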

Code Chunks - Options

collapse: Whether to merge text output and source code into a single code block in the output.
- collapse = TRUE makes the output more compact
- Default collapse = FALSE means R expressions and their text output are separated into different blocks
warning, message, and error: Whether to show warnings, messages, and errors in the output document.
cache: Whether to enable caching (useful, but cached results can go stale if the code or data they depend on changes)

Code Chunks - Options

results:
- hide - text output is hidden
- asis - text output is written “as is”
Figures:
- fig.width and fig.height: The size of R plots in inches.
- out.width and out.height: The output size of R plots in the output document - you can use percentages here
- fig.align: The alignment of plots. It can be 'left', 'center', or 'right'.
- fig.cap: The figure caption.
Multiple Rmds:
- child: You can include an Rmd document in the main document.

Figures

Figures with a caption (fig.cap) are numbered automatically in some formats (e.g., PDF and bookdown outputs)
They usually print right at the spot you used the code block to make them (depends on the template)
We will do some ggplot2 figures after tables and show you cowplot to help organize multiple figures together

Tables

kable() in knitr and the kableExtra package are awesome for tables
flextable is a very flexible table maker that prints nicely across formats as well

# load some data
data("mtcars")
# load some libraries 
library(flextable)
library(knitr)
library(dplyr)

Tables - Kable

# make a kable 
kable(mtcars[1:6, ], # data 
      caption = "mtcars Strikes Again") # table caption

mtcars Strikes Again
	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.460	20.22	1	0	3	1

Tables - Flextable

# make a ft
flextable(mtcars[1:6, ]) %>% # data 
  set_caption("mtcars Strikes Again") # caption

mtcars Strikes Again
mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
18.1	6	225	105	2.76	3.460	20.22	1	0	3	1

# note that captions are added differently: kable() takes a caption argument, flextable() uses set_caption()

Referencing Figures/Tables Inline

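\@ref(type:label), where type is tab, fig, or eq and label is usually the chunk name
Example: Table \@ref(tab:flex-cars) points to the table made in a chunk named flex-cars
Only works in bookdown output formats (e.g., bookdown::html_document2), not all formats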

Inline Code Chunks

Inline code chunks allow you to add code output right in the middle of a sentence
Great for reporting exact numbers or other parameters from saved output

# calculate the mean
M <- mean(mtcars$mpg)
# print it out
M

## [1] 20.09062

The mean MPG was 20.090625 (`r M`)
You can also clean this up: 20.09
- `r format(M, digits = 2, nsmall = 2)`
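`format()` is one of several ways to tidy numbers for inline code. Here is a quick comparison using mtcars. Which one you pick is a style choice, but `round()` quietly drops trailing zeros, which can look odd in a report.

```r
M <- mean(mtcars$mpg)

# Three ways to get two decimal places for an inline `r ...` expression
format(M, digits = 2, nsmall = 2)  # character, keeps trailing zeros
round(M, 2)                        # numeric, trailing zeros are dropped
sprintf("%.2f", M)                 # character, always exactly two decimals

# Why it matters: round() loses the zero, sprintf() keeps it
round(20.1, 2)
sprintf("%.2f", 20.1)
```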

Other Useful RStudio Features

Outline
Bottom outline
Visual editor

Import Data

When you knit, R uses the Rmd file’s folder as the working directory, so data saved in that same folder can be read with just its file name
Look into RStudio Projects (.Rproj files) as well to make handling files easier

# library to import all the things 
library(rio)

# data with times and counts
DF <- import("trap_data_08-11-2022.csv")

# what's in the data 
flextable(head(DF))

Grower	Farm	Crop	Variety	Trap	Pest	Latitude	Longitude	6/17/2022	6/18/2022	6/19/2022	6/21/2022	6/23/2022	6/25/2022	6/26/2022	6/27/2022	6/29/2022	6/30/2022	7/1/2022	7/2/2022	7/3/2022	7/4/2022	7/5/2022	7/6/2022	7/7/2022	7/8/2022	7/9/2022	7/10/2022	7/11/2022	7/12/2022	7/13/2022	7/14/2022	7/15/2022	7/16/2022	7/17/2022	7/18/2022	7/19/2022	7/20/2022	7/21/2022	7/22/2022	7/23/2022	7/24/2022	7/25/2022	7/26/2022	7/27/2022	7/28/2022	7/29/2022	7/30/2022	7/31/2022	8/1/2022	8/2/2022	8/3/2022	8/4/2022	8/5/2022	8/6/2022	8/7/2022	8/8/2022	8/9/2022	8/10/2022	8/11/2022
A. Groskopf	AG	Dry Bean	GN	1	WBC	41.96022	103.6856	-	-	-	-	-	-	-	-	-	2	-	-	-	-	1	-	1	-	-	-	18	-	48	-	52	-	-	-	678	-	325	-	266	95	92	64	57	58	60	-	76	15	24	12	6	9	4	3	3	1	1	2
A. Groskopf	AG	Dry Bean	GN	2	WBC	41.95665	103.6844	-	-	-	-	-	-	-	-	-	0	-	-	-	-	1	-	1	-	-	-	18	-	29	-	29	-	-	-	321	-	248	-	153	99	62	31	44	44	42	-	98	24	29	6	8	4	5	1	12	6	2	7
A. Groskopf	AG	Dry Bean	GN	3	WBC	41.95672	103.6780	-	-	-	-	-	-	-	-	-	0	-	-	-	-	0	-	0	-	-	-	36	-	26	-	93	-	-	-	469	-	221	-	226	45	46	19	26	46	12	-	25	11	6	9	8	12	6	5	4	2	2	3
A. Groskopf	AG	Dry Bean	GN	4	WBC	41.96020	103.6781	-	-	-	-	-	-	-	-	-	1	-	-	-	-	2	-	5	-	-	-	60	-	110	-	86	-	-	-	464	-	518	-	448	169	221	239	165	214	117	-	195	82	52	31	60	31	26	15	5	12	9	6
J. Jenkins	JJ	Dry Bean	Pinto	1	WBC	41.99753	103.7723	-	-	-	-	-	-	-	-	-	0	-	-	-	-	0	-	0	-	-	-	2	-	5	-	6	-	-	-	333	-	709	-	213	236	107	84	41	63	53	-	30	20	33	20	16	4	7	2	11	9	2	7
J. Jenkins	JJ	Dry Bean	Pinto	2	WBC	41.99753	103.7754	-	-	-	-	-	-	-	-	-	0	-	-	-	-	0	-	0	-	-	-	4	-	10	-	13	-	-	-	609	-	581	-	360	162	101	47	28	94	22	-	44	23	31	6	18	4	10	8	14	7	5	8
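The "-" cells are why a later step warns "NAs introduced by coercion". This is a minimal base-R sketch, using a made-up CSV, that declares "-" as missing at import time. `rio::import()` passes extra arguments on to the underlying reader, so a similar NA setting may work there too. Check the rio documentation for the exact argument name.

```r
# A tiny made-up CSV shaped like the trap data:
# one column per date, "-" meaning "no reading that day"
csv_text <- "Farm,Trap,6/17/2022,6/30/2022
AG,1,-,2
AG,2,-,0"

DF_demo <- read.csv(text = csv_text,
                    check.names = FALSE,  # keep "6/30/2022" as the column name
                    na.strings = "-")     # treat "-" as missing
str(DF_demo)
```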

Plots

Bar Graphs
Line Graphs
Stacked Bar Charts
Mosaic Charts
Note: ggplot2 can be overwhelming!
- Make accidental-aRt
- Lots of googling

How `ggplot2` works

First you define the basic structure of a plot you want.
Next, you can add options to create new layers on the graph.
They used to say “imagine it’s a projector with transparency sheets” but … that reference is getting old!
Mostly, I just think: this layer plus this layer plus that layer (literally joined with +)

# most important library load 
library(ggplot2)
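Here is the "this plus this plus that" idea in code, using the built-in mtcars data rather than the trap data. A ggplot object can be built one + at a time and printed at the end.

```r
library(ggplot2)

# Step 1: the base plot sets data and aesthetics but draws no layers yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Step 2: each + adds a layer or a setting
p <- p + geom_point()
p <- p + geom_smooth(method = "lm", formula = y ~ x)
p <- p + theme_classic()

# Printing the object draws the plot
p
```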

How `ggplot2` works

Notably, ggplot2 generally wants data in long format
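- Wide: one row per “participant”, with repeated measurements spread across columns
- Long: one row per observation, with one column per variable
Data restructuring isn’t that much fun
The code below shows how I did this, but the details are beyond the scope of today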

# library to pivot
library(tidyr)

# cleaning up the data into long format
# cleaning up columns to graph 
DF_long <- DF %>% 
  # last_col() grabs every date column, however many the file has
  select(Farm, Variety, Trap, Latitude:last_col()) %>% 
  # give all date columns one type so they can be stacked
  mutate(across(-c(Farm, Variety, Trap, Latitude, Longitude), as.character)) %>% 
  pivot_longer(cols = -c(Farm, Variety, Trap, Latitude, Longitude), 
               names_to = "Date", 
               values_to = "Count") %>% 
  # "-" cells become NA here, which triggers the warning below
  mutate(Count = as.numeric(Count)) %>% 
  filter(!is.na(Count)) %>% 
  # the first character of "6/17/2022" is the month
  mutate(Month = substr(Date, 1, 1)) %>% 
  mutate(Month = factor(Month, 
                        levels = c(6,7,8),
                        labels = c("June", "July", "August"))) %>% 
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"))

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Count = as.numeric(Count)`.
## Caused by warning:
## ! NAs introduced by coercion

# what's in the data 
flextable(head(DF_long))

Farm	Variety	Trap	Latitude	Longitude	Date	Count	Month
AG	GN	1	41.96022	103.6856	2022-06-20	0	June
AG	GN	1	41.96022	103.6856	2022-06-22	0	June
AG	GN	1	41.96022	103.6856	2022-06-24	0	June
AG	GN	1	41.96022	103.6856	2022-06-28	0	June
AG	GN	1	41.96022	103.6856	2022-06-30	2	June
AG	GN	1	41.96022	103.6856	2022-07-05	1	July
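Here is a toy version of that reshape, with made-up counts for two traps, to show what wide and long look like without the rest of the trap data.

```r
library(tidyr)

# Wide: one row per trap, one column per date (made-up counts)
wide <- data.frame(Trap = c(1, 2),
                   `6/30/2022` = c(2, 0),
                   `7/5/2022` = c(1, 1),
                   check.names = FALSE)

# Long: one row per trap per date, one column per variable
long <- pivot_longer(wide,
                     cols = -Trap,
                     names_to = "Date",
                     values_to = "Count")
long
```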

Bar Graphs

Are there average differences in trap by farm?

# first layer is what's x, y, color 
ggplot(DF_long, # data
       aes(x = Trap, # x axis
           y = Count, # y axis 
           fill = Farm)) + # fill, color, shape
  # calculate the averages from the data and add bars 
  stat_summary(fun = mean, # what function do you want to calculate
               geom = "bar", # what "geom" do you want to graph
               position = "dodge") + # don't let the bars overlap
  # calculate the confidence intervals for the means (needs Hmisc installed)
  stat_summary(fun.data = mean_cl_normal, # confidence limits
               geom = "errorbar", # error bars or whiskers
               position = position_dodge(width = .9), # don't overlap
               width = .2) + # make them smaller than the bar 
  # make it not an ugly gray graph 
  theme_classic() + 
  # x axis label
  xlab("Trap Number") + 
  # y axis label 
  ylab("Average Count of Pest") + 
  # put the legend on the bottom
  theme(legend.position = "bottom")
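`stat_summary()` computes the summaries while plotting, which makes the numbers hard to check. This sketch uses mtcars (not the trap data) to show an alternative: compute the group means first with base R's `aggregate()`, then draw them with `geom_col()`. It is not what the slide above does, just another way to get similar bars.

```r
library(ggplot2)

# The means that stat_summary(fun = mean) would turn into bars, computed by hand
bar_means <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)
bar_means

# Dodged bars drawn straight from the precomputed means
p <- ggplot(bar_means, aes(x = factor(cyl), y = mpg, fill = factor(am))) +
  geom_col(position = "dodge") +
  theme_classic() +
  xlab("Cylinders") +
  ylab("Average MPG")
p
```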

Line Graphs

What are the values across time?

# cool library of themes
library(ggthemes)
# first layer of data 
ggplot(DF_long, # data
       aes(x = Date, # x axis
           y = Count, # y axis
           color = Farm, 
           fill = Farm)) + # both color AND fill for this graph
  # add a line that averages across time by Farm 
  geom_smooth(method = "loess") + 
  # add the points to see the actual data
  geom_point() + 
  # add a silly theme
  theme_wsj() + 
  # theme_wsj() hides axis titles, so these labels are ignored
  xlab("Month") + 
  ylab("Number of Pests")

## `geom_smooth()` using formula = 'y ~ x'

Stacked Bar Charts

Are there differences in farms by month?

# first layer of data
ggplot(DF_long %>% 
         group_by(Farm, Month) %>% 
         mutate(Count = sum(Count)) %>% 
         select(Month, Count, Farm) %>% 
         unique(), # manipulate the data by summing across months
       aes(x = Month, # x axis
           y = Count, # y axis
           fill = Farm)) + # fill by Farm 
  # different silly theme
  theme_fivethirtyeight() + 
  # position = fill makes this a percent chart 
  geom_bar(position = "fill", 
           stat = "identity") +   
  # theme_fivethirtyeight() also hides axis titles, so these are ignored
  xlab("Month") + 
  ylab("Total Number of Pests")

# first layer of data
ggplot(DF_long %>% 
         group_by(Farm, Month) %>% 
         mutate(Count = sum(Count)) %>% 
         select(Month, Count, Farm) %>% 
         unique(), # manipulate the data by summing across months
       aes(x = Month, # x axis
           y = Count, # y axis
           fill = Farm)) + # fill by Farm 
  # different silly theme
  theme_fivethirtyeight() + 
  # position = stack makes this a raw count 
  geom_bar(position = "stack", 
           stat = "identity") +   
  # theme_fivethirtyeight() also hides axis titles, so these are ignored
  xlab("Month") + 
  ylab("Total Number of Pests")
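position = "fill" turns counts into proportions within each bar. This sketch uses mtcars (not the trap data) and computes those proportions with base R's `prop.table()`, so you can check what the percent chart shows.

```r
library(ggplot2)

# Counts of cars by gear and cylinder (built-in mtcars data)
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)

# position = "fill" rescales each bar to 1; prop.table() does the same math
props <- prop.table(counts, margin = 1)  # each gear row sums to 1
round(props, 2)

# Data frame with columns gear, cyl, Freq for plotting
count_df <- as.data.frame(counts)
ggplot(count_df, aes(x = gear, y = Freq, fill = cyl)) +
  geom_col(position = "fill")
```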

Mosaic Charts

There are an unbelievable number of add-on packages for ggplot2
Since a lot of your data is categorical, consider treemaps or mosaic plots

# add on library to ggplot
library(treemapify)
# get the data
DF_pts <- import("processed_data/pts_summary.csv")
# view the data
flextable(head(DF_pts))

country_map	n	n_binned	un_region_sub	un_region
AD	1	< 100	Southern Europe	Europe
AE	20	< 100	Western Asia	Asia
AL	3	< 100	Southern Europe	Europe
AM	1,020	1000-1999	Western Asia	Asia
AR	285	100-999	Latin America and the Caribbean	Americas
AT	1,048	1000-1999	Western Europe	Europe

Mosaic Charts

# make the blank plot 
ggplot(DF_pts, # data frame
       aes(area = n, # how big should the box be
           fill = n_binned, # what should we color by
           # note mine are separate because binning helped visually 
           # usually those are the same 
           label = country_map, # label the boxes
           subgroup = un_region_sub)) + # group by second variable 
  geom_treemap() + # make a treemap 
  # add a border by subgroup
  geom_treemap_subgroup_border(colour = "white", size = 5) +
  # where to put the text and what color 
  geom_treemap_text(colour = "white", place = "centre",
                    size = 15, grow = FALSE) + 
  # make it journal friendly by picking grays 
  scale_fill_manual(name = "Sample Size",
                    values = c("#c8c8c8", "#969696", "#646464", "#323232"))
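The n_binned column already existed in this file, but you can create a binned column like it with `cut()`. This sketch uses made-up sample sizes. The breaks are inferred from the labels shown in the table above, and the top label "2000+" is an assumption. Keeping the result a factor also fixes the order of the four legend grays.

```r
# Made-up sample sizes; the real values come from pts_summary.csv
n <- c(1, 20, 285, 1020, 2500)

# Breaks inferred from the labels in the table; "2000+" is assumed
n_binned <- cut(n,
                breaks = c(0, 100, 1000, 2000, Inf),
                right = FALSE,  # intervals like [100, 1000)
                labels = c("< 100", "100-999", "1000-1999", "2000+"))
n_binned
```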

Other `ggplot2` addons I like

RColorBrewer
GGally
ggmap
gganimate
ggthemes
ggpubr

Fin

You have learned all the things!

Buchanan, Erin Michelle, Kelly Cuccolo, Nicholas Alvaro Coles, Tom Heyman, Aishwarya Iyer, Neil Anthony Lewis, Kim Olivia Peters, et al. 2021. “Measuring the Semantic Priming Effect Across Many Languages,” December. https://doi.org/10.31219/osf.io/q4fjy.

Hutchison, Keith A., David A. Balota, James H. Neely, Michael J. Cortese, Emily R. Cohen-Shikora, Chi-Shing Tse, Melvin J. Yap, Jesse J. Bengson, Dale Niemeyer, and Erin Buchanan. 2013. “The Semantic Priming Project.” Behavior Research Methods 45 (4): 1099–1114. https://doi.org/10.3758/s13428-012-0304-z.

¹ This is a footnote.
