Wonderland
1
Introduction to R
1.1
Intro to basics
1.1.1
How it works
1.1.2
Arithmetic with R
1.1.3
Variable assignment
1.1.4
Variable assignment (2)
1.1.5
Variable assignment (3)
1.1.6
Apples and oranges
1.1.7
Basic data types in R
1.1.8
What’s the data type?
1.2
Vectors
1.2.1
Create a vector
1.2.2
Create a vector (2)
1.2.3
Create a vector (3)
1.2.4
Naming a vector
1.2.5
Naming a vector (2)
1.2.6
Calculating total winnings
1.2.7
Calculating total winnings (2)
1.2.8
Calculating total winnings (3)
1.2.9
Comparing total winnings
1.2.10
Vector selection: the good times
1.2.11
Vector selection: the good times (2)
1.2.12
Vector selection: the good times (3)
1.2.13
Vector selection: the good times (4)
1.2.14
Selection by comparison - Step 1
1.2.15
Selection by comparison - Step 2
1.2.16
Advanced selection
1.3
Matrices
1.3.1
What’s a matrix?
1.3.2
Analyze matrices, you shall
1.3.3
Naming a matrix
1.3.4
Calculating the worldwide box office
1.3.5
Using dimnames()
1.3.6
Adding a column for the Worldwide box office
1.3.7
Adding a row
1.3.8
The total box office revenue for the entire saga
1.3.9
Selection of matrix elements
1.3.10
A little arithmetic with matrices
1.3.11
A little arithmetic with matrices (2)
1.4
Factors
1.4.1
What’s a factor and why would you use it?
1.4.2
What’s a factor and why would you use it? (2)
1.4.3
What’s a factor and why would you use it? (3)
1.4.4
Factor levels
1.4.5
Summarizing a factor
1.4.6
Battle of the sexes
1.4.7
Ordered factors
1.4.8
Ordered factors (2)
1.4.9
Comparing ordered factors
1.5
Data frames
1.5.1
What’s a data frame?
1.5.2
Quick, have a look at your data set
1.5.3
Have a look at the structure
1.5.4
Creating a data frame
1.5.5
Creating a data frame (2)
1.5.6
Selection of data frame elements
1.5.7
Selection of data frame elements (2)
1.5.8
Only planets with rings
1.5.9
Only planets with rings (2)
1.5.10
Only planets with rings but shorter
1.5.11
Sorting
1.5.12
Sorting your data frame
1.6
Lists
1.6.1
Lists, why would you need them?
1.6.2
Lists, why would you need them? (2)
1.6.3
Creating a list
1.6.4
Creating a named list
1.6.5
Creating a named list (2)
1.6.6
Selecting elements from a list
1.6.7
Creating a new list for another movie
1.7
Challenge
1.8
Solutions
2
Intermediate R
2.1
Conditionals and Control Flow
2.1.1
Video: Relational Operators
2.1.2
Equality
2.1.3
Greater and less than
2.1.4
Compare vectors
2.1.5
Compare matrices
2.1.6
Video: Logical Operators
2.1.7
& and |
2.1.8
& and | (2)
2.1.9
Question: Reverse the result: !
2.1.10
Blend it all together
2.1.11
Video: Conditional Statements
2.1.12
The if statement
2.1.13
Add an else
2.1.14
Customize further: else if
2.1.15
Question: Else if 2.0
2.1.16
Take control!
2.2
Loops
2.2.1
Video: While loop
2.2.2
Write a while loop
2.2.3
Throw in more conditionals
2.2.4
Stop the while loop: break
2.2.5
Build a while loop from scratch
2.2.6
Video: For loop
2.2.7
Loop over a list
2.2.8
Loop over a matrix
2.2.9
Mix it up with control flow
2.2.10
Next, you break it
2.2.11
Build a for loop from scratch
2.3
Functions
2.3.1
Video: Introduction to functions
2.3.2
Function documentation
2.3.3
Use a function
2.3.4
Use a function (2)
2.3.5
Use a function (3)
2.3.6
Functions inside functions
2.3.7
Question: Required, or optional?
2.3.8
Video: Writing functions
2.3.9
Write your own function
2.3.10
Write your own function (2)
2.3.11
Write your own function (3)
2.3.12
Question: Function scoping
2.3.13
Question: R passes arguments by value
2.3.14
R you functional?
2.3.15
R you functional? (2)
2.3.16
Video: R packages
2.3.17
Load an R Package
2.4
The apply family
2.4.1
Use lapply with a built-in R function
2.4.2
Use lapply with your own function
2.4.3
lapply and anonymous functions
2.4.4
Use lapply with additional arguments
2.4.5
Apply functions that return NULL
2.4.6
Video: sapply()
2.4.7
How to use sapply
2.4.8
sapply with your own function
2.4.9
sapply with function returning vector
2.4.10
sapply can’t simplify, now what?
2.4.11
sapply with functions that return NULL
2.4.12
Reverse engineering sapply
2.4.13
Video: vapply
2.4.14
Use vapply
2.4.15
Use vapply (2)
2.4.16
From sapply to vapply
2.5
Utilities
2.5.1
Video: Useful functions
2.5.2
Mathematical utilities
2.5.3
Find the error
2.5.4
Data Utilities
2.5.5
Find the error (2)
2.5.6
Beat Gauss using R
2.5.7
Video: Regular expressions
2.5.8
grepl & grep
2.5.9
grepl & grep (2)
2.5.10
sub & gsub
2.5.11
sub & gsub (2)
2.5.12
Video: Times & Dates
2.5.13
Right here, right now
2.5.14
Create and format dates
2.5.15
Create and format times
2.5.16
Calculations with Dates
2.5.17
Calculations with Times
2.5.18
Time is of the essence
3
R Markdown
3.1
Getting started with R Markdown
3.1.1
Video: Introduction to R Markdown
3.1.2
Creating your first R Markdown file
3.1.3
Adding code chunks to your file
3.1.4
Video: Adding and formatting text
3.1.5
Question: Formatting text
3.1.6
Adding sections to your report
3.1.7
Question: Including links and images
3.1.8
Video: The YAML header
3.1.9
Editing the YAML header
3.1.10
Formatting the date
3.2
Adding Analyses and Visualizations
3.2.1
Video: Analyzing the data
3.2.2
Filtering for a specific country
3.2.3
Filtering for a specific year
3.2.4
Referencing code results in the report
3.2.5
Video: Adding plots
3.2.6
Visualizing the Investment Annual Summary data
3.2.7
Visualizing all projects for one country
3.2.8
Visualizing all projects for one country and year
3.2.9
Video: Plot options
3.2.10
Setting chunk options globally
3.2.11
Setting chunk options locally
3.2.12
Adding figure captions
3.3
Improving the Report
3.3.1
Video: Organizing the report
3.3.2
Creating a bulleted list
3.3.3
Creating a numbered list
3.3.4
Adding a table
3.3.5
Video: Code chunk options
3.3.6
Question: Comparing code chunk options
3.3.7
Collapsing blocks in the knit report
3.3.8
Modifying the report using include and echo
3.3.9
Video: Warnings, messages, and errors
3.3.10
Excluding messages
3.3.11
Excluding warnings
3.4
Customizing the Report
3.4.1
Video: Adding a table of contents
3.4.2
Adding the table of contents
3.4.3
Specifying headers and number sectioning
3.4.4
Adding table of contents options
3.4.5
Video: Creating a report with a parameter
3.4.6
Adding a parameter to the report
3.4.7
Creating a new report using a parameter
3.4.8
Video: Multiple parameters
3.4.9
Adding multiple parameters to the report
3.4.10
Creating a new report using multiple parameters
3.4.11
Video: Customizing the report
3.4.12
Customizing the report style
3.4.13
Customizing the header and table of contents
3.4.14
Customizing the title, author, and date
3.4.15
Referencing the CSS file
3.4.16
Video: Congratulations!
4
Data Manipulation with dplyr
4.1
Transforming Data with dplyr
4.1.1
Video: The counties dataset
4.1.2
Question: Understanding your data
4.1.3
Selecting columns
4.1.4
Video: The filter and arrange verbs
4.1.5
Arranging observations
4.1.6
Filtering for conditions
4.1.7
Filtering and arranging
4.1.8
Video: Mutate
4.1.9
Calculating the number of government employees
4.1.10
Calculating the percentage of women in a county
4.1.11
Select, mutate, filter, and arrange
4.2
Aggregating Data
4.2.1
Video: The count verb
4.2.2
Counting by region
4.2.3
Counting citizens by state
4.2.4
Mutating and counting
4.2.5
Video: The group by, summarize and ungroup verbs
4.2.6
Summarizing by state
4.2.7
Summarizing by state and region
4.2.8
Video: The top_n verb
4.2.9
Selecting a county from each region
4.2.10
Finding the highest-income state in each region
4.2.11
Using summarize, top_n, and count together
4.3
Selecting and Transforming Data
4.3.1
Video: Selecting
4.3.2
Selecting columns
4.3.3
Select helpers
4.3.4
Video: The rename verb
4.3.5
Renaming a column after count
4.3.6
Renaming a column as part of a select
4.3.7
Video: The transmute verb
4.3.8
Question: Choosing among verbs
4.3.9
Using transmute
4.3.10
Question: Matching verbs to their definitions
4.3.11
Choosing among the four verbs
4.4
Case Study: The babynames Dataset
4.4.1
Video: The babynames data
4.4.2
Filtering and arranging for one year
4.4.3
Using top_n with babynames
4.4.4
Visualizing names with ggplot2
4.4.5
Video: Grouped mutates
4.4.6
Finding the year each name is most common
4.4.7
Adding the total and maximum for each name
4.4.8
Visualizing the normalized change in popularity
4.4.9
Video: Window functions
4.4.10
Using ratios to describe the frequency of a name
4.4.11
Biggest jumps in a name
4.4.12
Video: Congratulations!
4.5
Challenge
4.6
Solutions
4.6.1
Question 1
4.6.2
Question 2
4.6.3
Question 3
4.6.4
Question 4
4.6.5
Question 5
4.6.6
Question 6
4.6.7
Question 7
4.6.8
Question 8
4.6.9
Question 9
5
Cleaning Data in R
5.1
1: Common Data Problems
5.1.1
Video: Data type constraints
5.1.2
Question: Common data types
5.1.3
Converting data types
5.1.4
Trimming strings
5.1.5
Video: Range constraints
5.1.6
Ride duration constraints
5.1.7
Back to the future
5.1.8
Video: Uniqueness constraints
5.1.9
Full duplicates
5.1.10
Removing partial duplicates
5.1.11
Aggregating partial duplicates
5.2
2: Categorical and Text Data
5.2.1
Video: Checking membership
5.2.2
Question: Members only
5.2.3
Not a member
5.2.4
Video: Categorical data problems
5.2.5
Identifying inconsistency
5.2.6
Correcting inconsistency
5.2.7
Collapsing categories
5.2.8
Video: Cleaning text data
5.2.9
Detecting inconsistent text data
5.2.10
Replacing and removing
5.2.11
Invalid phone numbers
5.3
3: Advanced Data Problems
5.3.1
Date uniformity
5.3.2
Currency uniformity
5.3.3
Video: Cross field validation
5.3.4
Validating totals
5.3.5
Validating age
5.3.6
Video: Completeness
5.3.7
Question: Types of missingness
5.3.8
Visualizing missing data
5.3.9
Treating missing data
5.4
4: Record Linkage
5.4.1
Calculating distance
5.4.2
Small distance, small difference
5.4.3
Fixing typos with string distance
5.4.4
Video: Generating and comparing pairs
5.4.5
Exercise: Link or join?
5.4.6
Pair blocking
5.4.7
Comparing pairs
5.4.8
Video: Scoring and linking
5.4.9
Score then select or select then score?
5.4.10
Putting it together
5.4.11
Video: Congratulations!
6
Introduction to Data Visualization with ggplot2
7
Exploratory Data Analysis in R
7.1
Comics
7.1.1
Video: Exploring categorical data
7.1.2
Question: Bar chart expectations
7.1.3
Contingency table review
7.1.4
Dropping levels
7.1.5
Side-by-side barcharts
7.1.6
Question: Bar chart interpretation
7.1.7
Video: Counts vs. proportions
7.1.8
Question: Conditional proportions
7.1.9
Counts vs. proportions (2)
7.1.10
Video: Distribution of one variable
7.1.11
Marginal barchart
7.1.12
Conditional barchart
7.1.13
Improve piechart
7.2
Cars
7.2.1
Video: Exploring numerical data
7.2.2
Faceted histogram
7.2.3
Boxplots and density plots
7.2.4
Compare distribution via plots
7.2.5
Video: Distribution of one variable
7.2.6
Marginal and conditional histograms
7.2.7
Question: Marginal and conditional histograms interpretation
7.2.8
Three binwidths
7.2.9
Question: Three binwidths interpretation
7.2.10
Video: Box plots
7.2.11
Box plots for outliers
7.2.12
Plot selection
7.2.13
Video: Visualization in higher dimensions
7.2.14
3 variable plot
7.2.15
Question: Interpret 3 var plot
7.3
Gapminder
7.3.1
Video: Measures of center
7.3.2
Question: Choice of center measure
7.3.3
Calculate center measures
7.3.4
Video: Measures of variability
7.3.5
Calculate spread measures
7.3.6
Choose measures for center and spread
7.3.7
Video: Shape and transformations
7.3.8
Describe the shape
7.3.9
Transformations
7.3.10
Video: Outliers
7.3.11
Identify outliers
7.4
Email
7.4.1
Video: Introducing the data
7.4.2
Spam and num_char
7.4.3
Spam and num_char interpretation
7.4.4
Spam and !!!
7.4.5
Spam and !!! interpretation
7.4.6
Video: Check-in 1
7.4.7
Collapsing levels
7.4.8
Question: Image and spam interpretation
7.4.9
Data Integrity
7.4.10
Answering questions with chains
7.4.11
Video: Check-in 2
7.4.12
What’s in a number?
7.4.13
What’s in a number interpretation
7.4.14
Video: Conclusion
7.5
Challenge
7.6
Solutions 1: Data cleaning and summarizing with dplyr
7.6.1
Video: The United Nations Voting Dataset
7.6.2
Filtering rows
7.6.3
Adding a year column
7.6.4
Adding a country column
7.6.5
Video: Grouping and summarizing
7.6.6
Summarizing the full dataset
7.6.7
Summarizing by year
7.6.8
Summarizing by country
7.6.9
Video: Sorting and filtering summarized data
7.6.10
Sorting by percentage of “yes” votes
7.6.11
Filtering summarized output
7.7
Solutions 2: Visualization with ggplot2
7.7.1
Video: Visualization with ggplot2
7.7.2
Choosing an aesthetic
7.7.3
Other ggplot2 layers
7.7.4
Video: Visualizing by country
7.7.5
Summarizing by year and country
7.7.6
Plotting just the UK over time
7.7.7
Plotting multiple countries
7.7.8
Video: Faceting by country
7.7.9
Faceting the time series
7.7.10
Faceting with free y-axis
7.7.11
Choose your own countries
7.8
Solutions 3: Tidy modeling with broom
7.8.1
Video: Linear regression
8
Correlation and Regression in R
9
Intermediate Data Visualization with ggplot2
9.1
Statistics
9.1.1
Video: Stats with geoms
9.1.2
Smoothing
10
Tree-Based Models in R
11
End-of-semester project
R Programming for Business
Module 10
Tree-Based Models in R