Creating Fake Data Sets To Explore Hypotheses

1. Expected pattern for a given hypothesis:

If a signaling molecule upregulates transcription of the gene encoding another protein, cells treated with the signaling molecule should contain more mRNA for that protein than untreated cells.

The data set represents relative mRNA levels for protein X, detected by RT-PCR, in cell populations with and without the signaling molecule.

2 & 3. Creating a Data Frame with Data that Supports the Hypothesis

groupWith <- rnorm(n=15, mean = 0.72 , sd = 0.1)
groupWithOut <- rnorm(n=15, mean = 0.40, sd = 0.12)

hist(groupWith)

hist(groupWithOut)

DataFrame <- data.frame(ID = 1:15, groupWith, groupWithOut)
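
Because rnorm() draws new values on every run, the histograms and every statistic computed later change each time the document is knit. A minimal sketch of one way to make the simulated data reproducible, assuming an arbitrary seed (the value 123 and the name groupWithAgain are illustrative and not part of the original analysis):

```r
# Fix the random number generator state so every run draws the same values.
# The seed 123 is an arbitrary choice, not taken from the original analysis.
set.seed(123)
groupWith    <- rnorm(n = 15, mean = 0.72, sd = 0.10)
groupWithOut <- rnorm(n = 15, mean = 0.40, sd = 0.12)

# Resetting the same seed reproduces the first draw exactly.
set.seed(123)
groupWithAgain <- rnorm(n = 15, mean = 0.72, sd = 0.10)
print(identical(groupWith, groupWithAgain))
```

Any integer works as the seed; what matters is that it is set once before the first random draw.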

4. Creating a Random Data Frame (for ANOVA)

This follows the ANOVA example used in class during Lecture 12.

n_group <- 2
n_name <- c("with", "without")
n_size <- c(15,15)
n_mean <- c(0.72,0.40)
n_sd <- c(0.1,0.12)

ID <- 1:sum(n_size)

proportions <- c(rnorm(n=n_size[1],mean=n_mean[1],sd=n_sd[1]),
             rnorm(n=n_size[2],mean=n_mean[2],sd=n_sd[2]))

trt_group <- rep(n_name,n_size)

ano_data <- data.frame(ID,trt_group,proportions)
head(ano_data)
##   ID trt_group proportions
## 1  1      with   0.5691867
## 2  2      with   0.7670769
## 3  3      with   0.6565821
## 4  4      with   0.6461398
## 5  5      with   0.5931066
## 6  6      with   0.6211351

5. Statistical Analysis & ggplot

ano_model <- aov(proportions~trt_group,data=ano_data)
print(ano_model)
## Call:
##    aov(formula = proportions ~ trt_group, data = ano_data)
## 
## Terms:
##                 trt_group Residuals
## Sum of Squares  0.5820412 0.3950596
## Deg. of Freedom         1        28
## 
## Residual standard error: 0.1187825
## Estimated effects may be unbalanced
z <- summary(ano_model)
print(z)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt_group    1 0.5820  0.5820   41.25 5.91e-07 ***
## Residuals   28 0.3951  0.0141                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
flat_out <- unlist(z)
ano_stats <- list(f_ratio = flat_out[7],
                  f_pval = flat_out[9])
print(ano_stats)
## $f_ratio
## F value1 
## 41.25239 
## 
## $f_pval
##      Pr(>F)1 
## 5.912164e-07
library(ggplot2)
ano_plot <- ggplot(ano_data) +
            aes(x=trt_group,y=proportions) +
            geom_boxplot()
print(ano_plot)
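
The F ratio and p-value above are pulled out of the flattened summary by position (elements 7 and 9), which depends on the layout of the ANOVA table. A hedged alternative sketch that reads the same two values by column name instead; the seed and the helper name aov_table are assumptions added for illustration, and the data are re-simulated here so the block runs on its own:

```r
# Re-simulate a data set with the same group settings as above.
set.seed(1)  # arbitrary seed, not from the original analysis
ano_data <- data.frame(
  trt_group   = rep(c("with", "without"), each = 15),
  proportions = c(rnorm(15, mean = 0.72, sd = 0.10),
                  rnorm(15, mean = 0.40, sd = 0.12))
)

z <- summary(aov(proportions ~ trt_group, data = ano_data))

# summary.aov returns a list whose first element is the ANOVA table.
# Row 1 is the treatment term, row 2 is the residuals.
aov_table <- z[[1]]
ano_stats <- list(f_ratio = aov_table[["F value"]][1],
                  f_pval  = aov_table[["Pr(>F)"]][1])
print(ano_stats)
```

Indexing by column name keeps working if the table gains extra rows or columns, whereas positional indices would silently shift.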

6. Adjusting Means and Effect Sizes for the Data Set

With a sample size of 15 per group and the standard deviations used above, a difference between group means (effect size) smaller than 0.12 won't reliably produce a significant result.
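
One way to check a threshold like this is to repeat the simulation many times for each candidate difference and record how often the ANOVA comes out significant. The sketch below is an assumption-laden illustration rather than the procedure used for the statement above: the function name sim_power, the 0.05 cutoff, the 500 repetitions, the seed, and the list of differences are all hypothetical choices.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Fraction of simulated experiments with p < alpha for a given
# difference between group means (diff), holding group size and SDs fixed.
sim_power <- function(diff, n = 15, base_mean = 0.40,
                      sds = c(0.10, 0.12), n_sims = 500, alpha = 0.05) {
  p_vals <- replicate(n_sims, {
    trt_group   <- rep(c("with", "without"), each = n)
    proportions <- c(rnorm(n, mean = base_mean + diff, sd = sds[1]),
                     rnorm(n, mean = base_mean,        sd = sds[2]))
    summary(aov(proportions ~ trt_group))[[1]][["Pr(>F)"]][1]
  })
  mean(p_vals < alpha)
}

# Candidate differences between group means (illustrative values).
diffs <- c(0.04, 0.08, 0.12, 0.16, 0.32)
power_by_diff <- sapply(diffs, sim_power)
print(data.frame(diff = diffs, prop_significant = power_by_diff))
```

What counts as "reliably" is a judgment call; a common convention is to require a significant result in at least 80% of simulated runs.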

7. Adjusting Sample Sizes:

With a difference of 0.32 between group means, you would need a sample size of at least 5 per group to consistently get a significant result.
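
As a rough cross-check on the sample size, power.t.test() from base R's stats package gives the analytic power of a two-sample t-test, which is equivalent to a one-way ANOVA when there are only two groups. This sketch assumes a single common standard deviation of 0.11 (the midpoint of the 0.10 and 0.12 used above) and a 0.05 significance level; both are assumptions, not values stated in the original analysis.

```r
# Candidate per-group sample sizes (illustrative range).
sizes <- 3:8

# Analytic power of a two-sample t-test for a mean difference of 0.32.
# sd = 0.11 is an assumed common SD; sig.level = 0.05 is an assumed cutoff.
power_by_n <- sapply(sizes, function(n) {
  power.t.test(n = n, delta = 0.32, sd = 0.11,
               sig.level = 0.05, type = "two.sample")$power
})

print(data.frame(n_per_group = sizes, power = round(power_by_n, 3)))
```

Because the simulated groups have slightly different standard deviations, the analytic values are only an approximation of what repeated simulation would show.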