Lab 3

If you collaborated with anyone, you must include “Collaborated with: FIRSTNAME LASTNAME” at the top of your lab!

Part 1. t-test (10 points)

Write a function my_t.test() that performs a one sample t-test in R. Your code should not use the t.test function in it.

Your function should have the following parameters:

x a numeric vector of data.
alternative a character string specifying the alternative hypothesis. This should only accept "two.sided", "less", or "greater". Otherwise, your function should throw an informative error.
mu a number indicating the null hypothesis value of the mean.

Your function should return a list with elements:

test_stat: the numeric test statistic.
df: the degrees of freedom.
alternative: the value of the parameter alternative.
p_val: the numeric p-value.

You should use the following information:

To get the standard error in a one sample t-test, take the standard deviation (sd()) of your input and divide it by the square root of the sample size.
Use pt() to get the area under the curve for a t-distribution. Be sure to use the parameter lower.tail!
The degrees of freedom for this test (df within pt()) is equal to the sample size - 1.
The p-value should never be less than 0 or greater than 1.

(Hint: Be careful about whether you use lower.tail = TRUE or lower.tail = FALSE in the two-sided test! One safe option is to use lower.tail = FALSE with the absolute value (abs()) of your test statistic.)

To prove it works, load the data below (description at https://www.openintro.org/data/index.php?data=helium). THe air column represents the distance traveled by an air-filled ball whereas the helium column is the same for a helium-filled ball. Use this data for a two-sided t-test to test the hypothesis that the population mean of helium is different than 20 using both my_t.test() and t.test(). The results should match. Do the same for a one-sided t-test testing that the population mean of helium is greater than 20.

helium_data <- read.csv("https://www.openintro.org/data/csv/helium.csv")

Part 2. Linear model (10 points)

Write a function my_lm() that fits a linear model in R.

Your function should have the following parameters:

formula: a formula class object, similar to lm().
data: input data frame.

Your function should return a table similar to the coefficent table from summary() with rows for each coefficient (including the (Intercept)!) and columns for the Estimate, Std. Error, t value, and Pr(>|t|). There should be row and column names.

You may find the following information helpful:

Use model.matrix() to extract the model matrix \(\mathbf{X}\). It takes as input parameters a formula and data.
Use model.response() to extract the model response \(\mathbf{Y}\). It takes as input a model frame object.
Use model.frame() to extract a model frame object. It takes as input parameters a formula and data.
You can solve for linear regression coefficients using the formula \[ \hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\mathbf{Y}. \]
Recall matrix operations solve(), t(), and %*%.
The degrees of freedom (df) you should use for your tests are equal to your sample size minus the number of covariates (including intercept!).
You can estimate \(\sigma^2\) using the formula \[ \hat{\sigma}^2 = \sum_i \dfrac{(Y_i - \mathbf{X}_i \beta)^2}{\text{df}}. \]
You can estimate the standard error for your coefficients using the formula \[ se(\hat{\beta}_j) = \sqrt{\left[\hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}\right]_{[j,j]}}. \] Note that within the brackets of the right-hand side is a matrix and \([j,j]\) represents the element of the matrix located at the \(j\)^th row and \(j\)^th column indices.
Use diag() to extract diagonal components from a matrix.
Pr(>|t|) comes from the two-sided t test. \[ \begin{align} H_0: \beta_j &= 0\\ H_a: \beta_j &\neq 0 \end{align} \]
Use pt() to get the area under the curve for a t-distribution. Because the distribution is symmetric, you can multiply this value by \(2\) to get the two-sided test output.

(Hint: As before, be careful about whether you use lower.tail = TRUE or lower.tail = FALSE! One safe option is to use lower.tail = FALSE with the absolute value (abs()) of your test statistic. Make sure you never end up with a p-value greater than 1!)

To prove it works, Use the code below to read in data from a survey of 55 Duke University students about their study habits and grades. You can read more about this data at https://www.openintro.org/data/index.php?data=gpa.

grades_data <- read.csv("https://www.openintro.org/data/csv/gpa.csv")

Use this data to regress gpa upon studyweek using both my_lm() and lm(). The results of my_lm() should match the coefficient table from the summary of your lm() output.

Lab 3

STAT 302

Due Date Here

Part 1. t-test (10 points)

Part 2. Linear model (10 points)