If you collaborated with anyone, you must include “Collaborated with: FIRSTNAME LASTNAME” at the top of your lab!
Write a function my_t.test() that performs a one-sample t-test in R. Your code should not call the t.test() function.
Your function should have the following parameters:
- x: a numeric vector of data.
- alternative: a character string specifying the alternative hypothesis. This should only accept "two.sided", "less", or "greater". Otherwise, your function should throw an informative error.
- mu: a number indicating the null hypothesis value of the mean.
Your function should return a list with elements:
- test_stat: the numeric test statistic.
- df: the degrees of freedom.
- alternative: the value of the parameter alternative.
- p_val: the numeric p-value.
You should use the following information:
- The test statistic is the sample mean minus mu, divided by the standard error.
- To compute the standard error, take the standard deviation (sd()) of your input and divide it by the square root of the sample size.
- Use pt() to get the area under the curve for a t-distribution. Be sure to use the parameter lower.tail!
- The degrees of freedom (df within pt()) is equal to the sample size - 1.
(Hint: Be careful about whether you use lower.tail = TRUE or lower.tail = FALSE in the two-sided test! One safe option is to use lower.tail = FALSE with the absolute value (abs()) of your test statistic.)
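The pieces above can be sketched on a small made-up vector. The data values, the null value mu0, and every variable name below are assumptions chosen for illustration; this is a sketch of the individual steps, not a finished my_t.test().

```r
# Toy data, made up for illustration (not from the lab).
x <- c(4.1, 5.3, 6.0, 4.8, 5.9, 5.2)
mu0 <- 5  # hypothetical null value of the mean

n <- length(x)
se <- sd(x) / sqrt(n)              # standard error of the sample mean
test_stat <- (mean(x) - mu0) / se  # one-sample t statistic
df <- n - 1                        # degrees of freedom

# One-sided p-values: area below ("less") or above ("greater") the statistic.
p_less <- pt(test_stat, df, lower.tail = TRUE)
p_greater <- pt(test_stat, df, lower.tail = FALSE)

# Two-sided p-value: twice the upper-tail area beyond |test_stat|.
p_two_sided <- 2 * pt(abs(test_stat), df, lower.tail = FALSE)

# One possible way to reject an unsupported alternative with an informative error.
alternative <- "two.sided"
if (!alternative %in% c("two.sided", "less", "greater")) {
  stop("alternative must be \"two.sided\", \"less\", or \"greater\"")
}
```

Because the two one-sided areas cover the whole distribution, p_less + p_greater should equal 1, which makes a quick sanity check.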
To prove it works, load the data below (description at https://www.openintro.org/data/index.php?data=helium). The air column represents the distance traveled by an air-filled ball, whereas the helium column is the same for a helium-filled ball. Use this data for a two-sided t-test of the hypothesis that the population mean of helium is different from 20, using both my_t.test() and t.test(). The results should match. Do the same for a one-sided t-test of the hypothesis that the population mean of helium is greater than 20.
helium_data <- read.csv("https://www.openintro.org/data/csv/helium.csv")
Write a function my_lm() that fits a linear model in R.
Your function should have the following parameters:
- formula: a formula class object, similar to lm().
- data: input data frame.
Your function should return a table similar to the coefficient table from summary() with rows for each coefficient (including the (Intercept)!) and columns for the Estimate, Std. Error, t value, and Pr(>|t|). There should be row and column names.
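For reference, the table that my_lm() is meant to reproduce can be pulled out of a fitted model as a matrix. The sketch below uses R's built-in mtcars data and the formula mpg ~ wt purely as stand-ins; neither is part of this lab.

```r
# Stand-in example on the built-in mtcars data (not the lab's data).
fit <- lm(mpg ~ wt, data = mtcars)

# The coefficient table is stored as a numeric matrix in the summary object.
coef_table <- summary(fit)$coefficients

rownames(coef_table)  # one row per coefficient, including "(Intercept)"
colnames(coef_table)  # "Estimate", "Std. Error", "t value", "Pr(>|t|)"
print(coef_table)
```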
You may find the following information helpful:
- Use model.matrix() to extract the model matrix \(\mathbf{X}\). It takes as input parameters a formula and data.
- Use model.response() to extract the model response \(\mathbf{Y}\). It takes as input a model frame object.
- Use model.frame() to extract a model frame object. It takes as input parameters a formula and data.
- For the matrix algebra, use solve(), t(), and %*%.
- Use diag() to extract diagonal components from a matrix.
- Pr(>|t|) comes from the two-sided t test. \[
\begin{aligned}
H_0: \beta_j &= 0\\
H_a: \beta_j &\neq 0
\end{aligned}
\]
- Use pt() to get the area under the curve for a t-distribution. Because the distribution is symmetric, you can multiply this value by \(2\) to get the two-sided test output.
(Hint: As before, be careful about whether you use lower.tail = TRUE or lower.tail = FALSE! One safe option is to use lower.tail = FALSE with the absolute value (abs()) of your test statistic. Make sure you never end up with a p-value greater than 1!)
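The steps above can be sketched end to end with the textbook least-squares formulas: the estimate \(\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\), the residual variance \(\hat{\sigma}^2\) equal to the residual sum of squares divided by \(n - p\) (with \(p\) the number of coefficients), and standard errors taken from the diagonal of \(\hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}\). These formulas are standard results rather than text from this lab, and the built-in mtcars data, the formula mpg ~ wt, and all variable names are stand-ins.

```r
# Stand-in formula and data (built-in mtcars), not the lab's data.
form <- mpg ~ wt
dat <- mtcars

mf <- model.frame(form, data = dat)  # model frame object
X <- model.matrix(form, data = dat)  # model matrix, including the intercept column
Y <- model.response(mf)              # response vector

# Least-squares estimates: (X'X)^{-1} X'Y.
XtX_inv <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% Y

# Residual degrees of freedom and residual variance estimate.
df <- nrow(X) - ncol(X)
sigma2_hat <- sum((Y - X %*% beta_hat)^2) / df

# Standard errors from the diagonal of sigma^2 (X'X)^{-1}.
se <- sqrt(diag(sigma2_hat * XtX_inv))

# t statistics and two-sided p-values.
t_val <- as.vector(beta_hat) / se
p_val <- 2 * pt(abs(t_val), df, lower.tail = FALSE)

result <- cbind(Estimate = as.vector(beta_hat), "Std. Error" = se,
                "t value" = t_val, "Pr(>|t|)" = p_val)
rownames(result) <- colnames(X)
print(result)
```

Using abs() with lower.tail = FALSE before doubling keeps every p-value at or below 1, whatever the sign of the t statistic.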
To prove it works, use the code below to read in data from a survey of 55 Duke University students about their study habits and grades. You can read more about this data at https://www.openintro.org/data/index.php?data=gpa.
grades_data <- read.csv("https://www.openintro.org/data/csv/gpa.csv")
Use this data to regress gpa upon studyweek using both my_lm() and lm(). The results of my_lm() should match the coefficient table from the summary of your lm() output.