Part 1. R Package Development and Documentation (25 points)

Instructions

For this portion of Project 3, you will develop a well-documented, well-tested, and well-explained R package.

This R package should include the following functions we’ve written throughout the class:

  • my_t.test
  • my_lm
  • my_knn_cv

Follow the instructions in Lecture Slides 9 to set up the skeleton of your package (5 points for proper setup). Your package should include the following:

  1. (10 points) Complete documentation for each of your functions and data. Each function must include @examples in the documentation.

  2. (15 points) A detailed vignette that demonstrates all of these functions, with examples, on the penguins data from the palmerpenguins package. You must add the penguins data to your own package, document it (with proper credit in the documentation!), and export it as the object my_penguins. Specifically, the vignette should have the following parts:

    1. A tutorial for my_t.test testing the hypothesis that the mean body_mass_g of Adelie penguins is equal to 4000 (where the alternative is that the mean is less than 4000). Carefully interpret the results using a p-value cut-off of \(\alpha = 0.05\).

    2. A tutorial for my_lm using flipper_length_mm as the independent variable and body_mass_g as the dependent variable. Carefully interpret the flipper_length_mm coefficient, describe the hypothesis test associated with it, and carefully interpret the results of that hypothesis test using a p-value cut-off of \(\alpha = 0.05\).

    3. A tutorial for my_knn_cv using my_penguins. Predict output class species using covariates bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g.

      • Use \(5\)-fold cross-validation (k_cv = 5).

      • Iterate over k_nn \(= 1, \ldots, 10\). For each value of k_nn, record the training misclassification rate and the CV misclassification rate (output from your function).

      • State which model you would choose based on the training misclassification rates and which model you would choose based on the CV misclassification rates. Discuss which model you would choose in practice and why.
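The iteration over k_nn described in the last tutorial can be sketched roughly as follows. This is only an illustration under stated assumptions: knn_cv_sketch is a hypothetical stand-in for your own my_knn_cv (its argument names and its return value, a list with elements class and cv_err, are assumed here and may differ from what you wrote in class), and the built-in iris data stands in for my_penguins so that the block runs without your package installed.

```r
# Sketch only: `knn_cv_sketch` is a hypothetical stand-in for my_knn_cv, and
# the built-in iris data stands in for my_penguins so the block runs as-is.
library(class)  # provides knn(); ships with standard R installations

knn_cv_sketch <- function(train, cl, k_nn, k_cv) {
  # Randomly assign each row to one of k_cv folds of near-equal size.
  fold <- sample(rep(seq_len(k_cv), length.out = nrow(train)))
  fold_err <- numeric(k_cv)
  for (i in seq_len(k_cv)) {
    in_test <- fold == i
    # Fit on the other folds, predict the held-out fold.
    pred <- knn(train[!in_test, , drop = FALSE],
                train[in_test, , drop = FALSE],
                cl[!in_test], k = k_nn)
    fold_err[i] <- mean(pred != cl[in_test])
  }
  # `class`: predictions for all rows, using the full data as training set.
  # `cv_err`: misclassification rate averaged over the k_cv folds.
  list(class = knn(train, train, cl, k = k_nn), cv_err = mean(fold_err))
}

set.seed(302)  # arbitrary seed so the fold assignment is reproducible
covariates <- iris[, c("Sepal.Length", "Sepal.Width",
                       "Petal.Length", "Petal.Width")]
species <- iris$Species

# One row per value of k_nn, filled in by the loop below.
results <- data.frame(k_nn = 1:10, train_err = NA_real_, cv_err = NA_real_)
for (k in results$k_nn) {
  fit <- knn_cv_sketch(covariates, species, k_nn = k, k_cv = 5)
  results$train_err[k] <- mean(fit$class != species)  # training rate
  results$cv_err[k] <- fit$cv_err                     # CV rate
}
results
```

In your vignette, the same loop would call my_knn_cv on the four my_penguins covariates instead. Note that class::knn does not accept missing values, so if your function builds on it, rows of my_penguins with missing covariates need to be removed first (for example with na.omit).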

Notes

  • Your package directory should be submitted as a .zip file to Canvas.
  • All code and documentation should follow the style guidelines outlined in class.

Part 2. Self-Assessment and Reflection (5 points)

Submit answers to the following questions as a .pdf file to Canvas. A brief paragraph for each will be enough.

  1. What was the hardest assignment for you and why? Did you learn anything from the experience? Is there anything you would change about the assignment to help your learning?

  2. What are two areas in which you think you did well this quarter? What are two areas in which you could have improved?

  3. Is there anything you are glad we covered this quarter? Is there anything you wish had been covered in this course? What programming and computing skills would you like to learn in the future?