For this portion of Project 3, you are being asked to develop a well-documented, well-tested, and well-explained R package.
This R package should include the following functions we’ve written throughout the class:
my_t.test
my_lm
my_knn_cv
Follow the instructions in Lecture Slides 9 to set up the skeleton of your package (5 points for proper setup). Your package should include the following:
(10 points) Complete documentation for each of your functions and data. Each function must include @examples in the documentation.
(15 points) A detailed vignette with examples demonstrating the use of all of these functions with the penguins data from the palmerpenguins package. You must add the penguins data to your own package, document it, and export it as the object my_penguins (with proper credit in the documentation!). Specifically, the vignette should have the following parts:
A tutorial for my_t.test testing the hypothesis that the mean body_mass_g of Adelie penguins is equal to 4000 (where the alternative is that the mean is less than 4000). Carefully interpret the results using a p-value cut-off of \(\alpha = 0.05\).
A tutorial for my_lm using flipper_length_mm as the independent variable and body_mass_g as the dependent variable. Carefully interpret the flipper_length_mm coefficient, describe the hypothesis test associated with the flipper_length_mm coefficient, and carefully interpret the results of the flipper_length_mm hypothesis test using a p-value cut-off of \(\alpha = 0.05\).
A tutorial for my_knn_cv using my_penguins. Predict output class species using covariates bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g.
Use \(5\)-fold cross-validation (k_cv = 5).
Iterate over k_nn \(= 1, \ldots, 10\). For each value of k_nn, record the training misclassification rate and the CV misclassification rate (output from your function).
State which model you would choose based on the training misclassification rates and which model you would choose based on the CV misclassification rates. Discuss which model you would choose in practice and why.
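The three analyses in the vignette can be prototyped with existing R functions before swapping in the package's own. The sketch below is only a rough reference, not the expected solution: it uses simulated stand-in data rather than the real penguins measurements, the species labels sp_a, sp_b, and sp_c and all variable names are hypothetical, and t.test, lm, and class::knn stand in for my_t.test, my_lm, and my_knn_cv, whose signatures are not given here.

```r
# Simulated stand-in data: NOT the real penguins measurements.
set.seed(302)
n_per   <- 50
species <- factor(rep(c("sp_a", "sp_b", "sp_c"), each = n_per))
grp     <- as.integer(species)
flipper <- rnorm(3 * n_per, mean = c(190, 196, 217)[grp], sd = 6)
bill    <- rnorm(3 * n_per, mean = c(39, 49, 47)[grp], sd = 3)
mass    <- -5800 + 50 * flipper + rnorm(3 * n_per, sd = 350)

# 1. One-sided, one-sample t-test: H0 mean = 4000 vs. Ha mean < 4000.
tt <- t.test(mass[species == "sp_a"], mu = 4000, alternative = "less")
reject_h0 <- tt$p.value < 0.05

# 2. Simple linear regression; the slope row of the coefficient table
#    holds the two-sided test of H0: slope = 0.
fit      <- lm(mass ~ flipper)
coef_tab <- summary(fit)$coefficients
slope_p  <- coef_tab["flipper", "Pr(>|t|)"]

# 3. k-nearest neighbours with 5-fold cross-validation.
x    <- cbind(flipper, bill, mass)
k_cv <- 5
fold <- sample(rep(seq_len(k_cv), length.out = nrow(x)))  # random fold labels

train_err <- numeric(10)
cv_err    <- numeric(10)
for (k_nn in 1:10) {
  # Training error: fit on all rows and predict those same rows.
  pred_all <- class::knn(train = x, test = x, cl = species, k = k_nn)
  train_err[k_nn] <- mean(as.character(pred_all) != as.character(species))

  # CV error: average the held-out error over the folds.
  fold_err <- numeric(k_cv)
  for (i in seq_len(k_cv)) {
    held <- fold == i
    pred <- class::knn(train = x[!held, , drop = FALSE],
                       test  = x[held, , drop = FALSE],
                       cl    = species[!held],
                       k     = k_nn)
    fold_err[i] <- mean(as.character(pred) != as.character(species[held]))
  }
  cv_err[k_nn] <- mean(fold_err)
}

results <- data.frame(k_nn = 1:10, train_err = train_err, cv_err = cv_err)
print(results)
```

With k_nn = 1 and no duplicated rows, every observation is its own nearest neighbour, so the training misclassification rate is exactly zero. Keep this in mind when comparing the two columns of results.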
Submit answers to the following questions as a .pdf file to Canvas. A brief paragraph for each will be enough.
What was the hardest assignment for you and why? Did you learn anything from the experience? Is there anything you would change about the assignment to help your learning?
What are two areas in which you think you did well this quarter? What are two areas in which you could have improved?
Is there anything you are glad we covered this quarter? Is there anything you wish had been covered in this course? What programming and computing skills would you like to learn in the future?