class: center, title-slide # STAT 302: Lecture 0 ## Course Overview ### Peter Gao (adapted from slides by Bryan Martin) --- # Outline 1. Course Overview 2. Introductions 3. What is R? What is statistical computing? 4. R Basics 5. Short Lab 0 --- # Syllabus .middler[.large[ [Link to syllabus](https://peteragao.github.io/STAT302-AUT2021/syllabus.html) ]] --- # Introductions * What should we call you? * Why are you taking this class? * One goal for the school year --- # Collaboration * You may discuss problems, approaches, and solutions with your **classmates**. * You must credit anyone with whom you worked on each assignment. * All submitted work must be your own; you should not submit code or answers copied from any resource including your classmates. --- # Piazza Discussion * Worth up to 2% extra credit on your final grade! * Substantive and helpful questions and answers -- .pull-left[ ### Bad questions: * How do you do problem 2? * Here's my code and it's broken. How do I fix it? ] .pull-right[ ### Good questions: * Here's a snippet of code I used for problem 2: <br/>`formatted code snippet` <br/>It returned the following error: <br/>`formatted error message` <br/>Does anyone know why? I already tried... * I don't understand the concept from Slide 18 today. Could anyone elaborate on why... ] --- # Piazza Discussion * Worth up to 2% extra credit on your final grade! * Substantive and helpful questions and answers .pull-left[ ### Bad answers: * this is sooooo easy, here's my solution ] .pull-right[ ### Good answers: * This error message occurs because your variable is a string instead of a numeric. Have you tried checking... * I think you have a bug in line 3 of the code you posted. You have more left parentheses than right parentheses so the line is incomplete. ] --- # What is statistical computing? -- .middler[.large[statistics + computing]] --- # What is statistical computing? In this class we will discuss some basic computer science concepts, but we will emphasize skills for utilizing computers to aid in data analysis. -- Advances in computation have enabled advances at every step of the data analysis pipeline: * Data collection, storage, and sharing * Exploratory data analysis and visualization * Statistical inference and prediction * Simulation * Communication and distribution of results --- # Why R? R is a programming language designed for statistical analysis. -- * open-source * free * large and active community of developers and users * great analysis tools * great visualization tools -- * great user interface... --- # Why RStudio? RStudio is an integrated development environment (IDE) designed to make your life easier. -- * Organizes scripts, files, plots, code console, ... * Highlights syntax * Helpful interactive graphical interface * Will make an efficient, reproducible workflow *much* easier -- * R Markdown integration... --- # Why R Markdown? * Combine code, output, and writing * Self-contained analyses * Creates HTML, PDF, slides (like these!), webpages, ... -- * Required for your labs! --- class: inverse .middler[.huge[Part 1: Introduction to R Utilities]] --- # Operators ```r # Addition 6 + 3 ``` ``` ## [1] 9 ``` ```r # Subtraction 6 - 3 ``` ``` ## [1] 3 ``` ```r # Multiplication 6 * 3 ``` ``` ## [1] 18 ``` ```r # Division 6 / 3 ``` ``` ## [1] 2 ``` --- # Comparison Operators ```r # Greater than 6 > 3 ``` ``` ## [1] TRUE ``` ```r # Less than 6 < 3 ``` ``` ## [1] FALSE ``` ```r # Equal to 6 == 3 ``` ``` ## [1] FALSE ``` ```r 6 == 3 + 3 ``` ``` ## [1] TRUE ``` --- # Comparison Operators ```r # Not equal to 6 != 3 ``` ``` ## [1] TRUE ``` ```r 6 < 6 ``` ``` ## [1] FALSE ``` ```r # Less than or equal to 6 <= 6 ``` ``` ## [1] TRUE ``` --- # Logical Operators ```r # and (6 < 3) & (1 < 3) ``` -- ``` ## [1] FALSE ``` -- ```r # and (2 < 3) & (1 < 3) ``` -- ``` ## [1] TRUE ``` -- ```r # or (6 < 3) | (1 < 3) ``` -- ``` ## [1] TRUE ``` -- ```r # a bit harder... (6 < 3) | (1 < 3) & (6 < 3) ``` -- ``` ## [1] FALSE ``` --- # Object Types ```r class(7) ``` ``` ## [1] "numeric" ``` ```r class("7") ``` ``` ## [1] "character" ``` ```r is.numeric(7) ``` ``` ## [1] TRUE ``` ```r is.numeric("7") ``` ``` ## [1] FALSE ``` --- # Object Types ```r is.character(7) ``` ``` ## [1] FALSE ``` ```r is.character("7") ``` ``` ## [1] TRUE ``` ```r is.na(7) ``` ``` ## [1] FALSE ``` ```r is.na(0/0) ``` ``` ## [1] TRUE ``` --- # Object Types ```r as.character(7) ``` ``` ## [1] "7" ``` ```r as.numeric("7") ``` ``` ## [1] 7 ``` ```r as.numeric("7") + 3 == 10 ``` ``` ## [1] TRUE ``` ```r "7" + 3 == 10 ``` ``` ## Error in "7" + 3: non-numeric argument to binary operator ``` --- # Assigning Variables ```r x <- 7 x ``` ``` ## [1] 7 ``` ```r x + 3 ``` ``` ## [1] 10 ``` ```r x == 7 ``` ``` ## [1] TRUE ``` ```r as.character(x) ``` ``` ## [1] "7" ``` ```r y <- 3 x + y ``` ``` ## [1] 10 ``` --- # Workspaces ```r # List all defined objects ls() ``` ``` ## [1] "x" "y" ``` ```r # Remove an object rm("x") ls() ``` ``` ## [1] "y" ``` ```r x ``` ``` ## Error in eval(expr, envir, enclos): object 'x' not found ``` --- # Workspaces ```r x <- 7 ls() ``` ``` ## [1] "x" "y" ``` ```r # Use with caution! This erases everything! rm(list = ls()) ls() ``` ``` ## character(0) ``` --- layout:false class: inverse .middler[.huge[Part 2: Using RStudio and R Markdown]] --- # RStudio Interface By default... * *Top left*: Editor pane. Browse and edit scripts and data with tabs * *Top right*: List of objects in your Environment (recall `ls()`), code History * *Bottom left*: Console for running R code line-by-line (`>` prompt) * *Bottom right*: Files, plots, packages, help files --- # Editor * Your workflow should be contained here (**not** your console) * Primarily used for writing and editing .R scripts -- * Try opening a file now using *File > New File > R Script*, write two lines of simple code * Click `Run` in the bar above your script. What happens? * Click on one of the lines of code. Press `Ctrl`/`⌘` + `Enter`. What happens? -- .center[**Important:** Every part of your R workflow belongs in this window!] --- layout: true # Environment & History * If you didn't already, define a variable in your R Script and run it * What happens in your Environment tab? -- * Type `install.packages("palmerpenguins")` in your Console. * Now add `library(palmerpenguins)` and `data(penguins)` to your script and run it. * What happens if you click on this in your Environment tab? * Note: We will delve deeper into data later! -- * Remove one of your variables and see what happens. --- * Click on the History tab to see what it contains. Try searching! -- * Select a line from your history and click `To Source`. What happens? -- * Useful for adding lines that you tested in your Console to your scripts -- .pushdown[.center[**Summary:** Useful to quickly browse what you have defined in your environment]] --- layout: false layout: true # Console --- * The quick and easy way to run individual lines of code * Nothing you do here is saved as part of your workflow! -- * Useful for debugging, testing code, iterating a plot until you like it ... * Once you get what you were looking for, add it to your script files! * **Never** manipulate your data in the console. Your workflow should always be **reproducible!** --- ## Incomplete Code What if we start a command, but do not finish it? ```r > 5 - + ``` Two options: * Press `Esc` to exit and *not* execute the line * Complete the command --- layout: false # Files, Plots, Packages, Help * We will explore this tab more as we get into functions and visualization * Files is used to browse the files on your computer * Useful for opening files/data, moving files you are working with * *Use caution!* Changing files here is the same as changing them on your computer. If you delete something, it's gone! * Plots are used to display plots you create in R * Help is used to browse help files of functions. You can explore these by preceding a function name with `?`. Try `?sqrt` to see. * Packages shows all the packages you currently have installed (we will get more into this later!) --- class: inverse .middler[.huge[Brief Intermission: File Organization]] --- layout:true # File Names Matter --- .pull-left[ ## Bad * `newfinal2actualFINALnew.docx` * `asdfasdf.R` * `analy$i$ functions!.R` * `stuff.R` * Cluttered * Uninformative * Spaces * Special characters other than `_` and `-` ] .pull-right[ ## Good * `stat302_lab1.Rmd` * `analysis_functions.R` * `analysisFunctions.R` * `2020-01-08_labWriteup.Rmd` * Meaningful * Concise * camelCase or using `_` to distinguish words * Machine sortable ] --- ## Summary * Machine readable * Human readable * Plays well with default ordering -- * `01_draft.Rmd`, `02_draft.Rmd` , ... , `11_draft.Rmd` * `2018-05-05_resume.docx`, `2019-02-17_resume.docx`, `2020-01-08_resume.docx` --- layout: false layout: true # File Organization Matters --- Easier to start with best practice rather than fix things later! .middler[] --- 1. Somewhere on your computer, create the folder `STAT302` 2. Within that folder, create the subfolders `short_labs`<sup>1</sup>, `labs`, `projects` 3. Within your Short Labs folder, create a subfolder `short_lab_1`<sup>2</sup> 4. Put your both of short lab files from Monday into that folder 5. Within your Labs folder, create a folder for Lab 1 that follows the filename guide .footnote[[1] or `shortLabs`, `ShortLabs`, `Short_Labs`, ... (just follow the rules for file names!) [2] or `shortLab1`, `short_lab1`, ... ] -- .pushdown[May seem excessive for now, but this will come in handy when labs start including extra files such as data and figures!] --- layout: false # All done! For now... .middler[] --- layout: true # R Markdown --- Let's try making an R Markdown file: 1. Choose *File > New File > R Markdown...* 2. Make sure *HTML Output* is selected and click OK 3. Save the file in your new folder, call it `stat302_Lab1.Rmd` * *Hint:* Follow along, because this will become your Lab 1 submission! 4. Click the *Knit HTML* button * After it is done, browse to the file location using the `Files` tab. What do you notice? * Click *Open in Browser* to view the full HTML --- ## R Markdown Headers The header of .Rmd files is YAML (YAML Ain't Markup Language) code 5. Change `title` to "Lab 1" 6. Change `author` to your name in quotes 7. Change `date` to the due date in quotes -- Congrats! You have a functional .Rmd that will soon be your Lab 1 submissions! --- ## R Markdown Syntax (Thanks to Charles Lanfear, UW Sociology, for this very concise summary) --- .pull-left[ ## Output **bold/strong emphasis** *italic/normal emphasis* .forcehead[Header] ## Subheader ### Subsubheader ] .pull-right[ ## Syntax <pre> **bold/strong emphasis** *italic/normal emphasis* # Header ## Subheader ### Subsubheader </pre> ] --- .pull-left[ ## Output 1. Ordered lists 1. Are real easy 1. Even with sublists 1. Or when lazy with numbering * Unordered lists * Are also real easy + Also even with sublists [URLs are trivial](http://www.uw.edu)  ] .pull-right[ ## Syntax <div style="width:400px;overflow:auto"> <pre> 1. Ordered lists 1. Are real easy 1. Even with sublists 1. Or when lazy with numbering * Unordered lists * Are also real easy + Also even with sublists [URLs are trivial](http://www.uw.edu)  </div> </pre> ] --- .pull-left[ ## Output You can put some math `\(y= \left( \frac{2}{3} \right)^2\)` right up in there. `$$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$` Or a sentence with `code-looking font`. Or a block of code: ``` y <- 1:5 z <- y^2 ``` ] .pull-right[ ## Syntax <div style="width:400px;overflow:auto"> <pre> You can put some math $y= \left(\frac{2}{3} \right)^2$ right up in there `$$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$` Or a sentence with `code-looking font`. Or a block of code: ``` y <- 1:5 z <- y^2 ``` </pre> ] </div> --- ## Helpful Links * [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) * [R Markdown Cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) --- ## R Code within R Markdown As you saw in Short Lab 1, we can run and execute R code within R Markdown. To do so encase your code as follows. ```{r, eval = TRUE, echo = TRUE} # Your code goes here! ``` You can click the green triangle in the corner to evaluate that code chunk to preview the results without compiling the entire document --- ## Useful Code Chunk Parameters Parameters go into the opening brackets `{r}` and are separated by commas. Here are some you might find useful (checkout the guide links above for more): * `echo=FALSE`: Hide R code but keep results * `eval=FALSE`: Do not execute the R code * `include=FALSE`: Hides all output (useful to load packages at the beginning of your document) * `cache=TRUE`: Stores the results of the chunk, and only re-runs if the chunk is changed. Useful for files that take a while to compile * `fig.height=5, fig.width=5`: modify the dimensions of any plots that are generated in the chunk (units are in inches) --- ## In-Line R Code You can also include and execute R code directly in the text of your .Rmd! For example, say we define a variable ```r x <- 7 ``` If I want to reference this variable in text, I can do so directly by writing using ticks and starting with r. So if I type: The variable I want to reference is `r x`. what will appear is: The variable I want to reference is 7. --- ## In-Line R Code * This allows you to easily see where your values came from! * This prevents any typos in translating coding results to text! * This allows you to modify your analysis without needing to copy and paste updated results into your text!