Course Objectives
- When faced with data, how should we organize our reporting of results and interpret the results of others?
- A handful of mathematical models describe an astonishing range of measurable phenomena. We think of probability theory as our method for articulating how likely it is that observed data would be generated from theoretical assumptions.
- Our probability models make predictions that are determined by one or two parameters (often called the mean, the variance, the shape, or the rate of the probability distribution family). More than reporting a most likely parameter value, we would like to report intervals that express our uncertainty.
- There is often some default (or null) hypothesis about a given model. “The mean is zero.” “There is no difference between these groups.” Hypothesis testing gives us a framework to reject (or fail to reject) these default claims.
The objective of this course is to provide a practical overview of the statistical methods and models most likely to be encountered by scientists in practical research applications. Specific course topics include discrete and continuous probability distributions; sampling methods and descriptive statistics; the Central Limit Theorem and its applications; estimation methods; confidence intervals; hypothesis testing; and linear regression.
Mathematics Skill Checks
Calculus is listed as a prerequisite for this course, but students are understandably unsure which skills they should practice throughout the course. In this space I will provide some “skill check” sheets (and placeholders for future sheets) that you can use to make sure you feel comfortable with the necessary material.
Calculus: Integration through geometry and area
Calculus: Graphical understanding of derivatives and integrals
Set Theory: Notation
Set Theory and inequalities
Probability: First principles
Statistics Strategy Sheets
Homework is useful for introducing concepts one at a time, but in practice the section a problem appears in tells you which technique to use. Over half of the challenge in statistics is identifying which model or technique to use, and the textbook does not do a good job of challenging you in this regard.
What follows is a set of “strategy sheets” and flow charts. Feel free to study these, or even better, make your own! If you are not spending time challenging yourself with decision-tree thinking, then you will not be prepared for statistical thinking (and the tests!).
StatStrat-DiscreteRV
Modeling with discrete random variables. A two-page flowchart for deciding among the Binomial, Hypergeometric, Geometric, Negative Binomial, and Poisson distributions for your data model.
StatStrat-Arrival
If your data consists of arrival times or number of arrivals in a given window of time, you are likely going to want to choose among the Exponential, Gamma, and Poisson distributions.
StatStrat-BigPicture
Point estimation, confidence intervals, hypothesis testing: how do these all fit together?
StatStrat-AhaAlpha
Why is it that, in constructing confidence intervals, we went out of our way to make tables that report area to the right of a number? And why did we use the same Greek letter $\alpha$ to signify $100(1-\alpha)$% confidence and also the significance level of a hypothesis test? And why is this also about false positives? These two pages try to pull it together for your “Aha!” moment when it comes to $\alpha$.
StatStrat-z-t-chi
When should you use a $z$-score, versus a $t$-score, versus a chi-squared distribution? Here’s a little cheat sheet to walk you through it.
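To make the role of $\alpha$ and the $z$-versus-$t$ question concrete, here is a small simulation sketch using only the Python standard library. It is an illustration, not part of the strategy sheets: the function name `miss_rate`, the sample size, the number of trials, and the seed are all assumptions chosen for the example. It builds $z$-style intervals for the mean of standard normal data and counts how often the interval misses the true mean (a false positive when testing a true null hypothesis).

```python
import random
from statistics import NormalDist, mean, stdev

def miss_rate(n, alpha, trials, use_sample_sd, seed=0):
    """Fraction of z-style intervals that fail to cover the true mean (0)."""
    rng = random.Random(seed)
    # Critical value with area alpha/2 to its right under the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Either plug in the known sigma = 1 or estimate it from the sample.
        sd = stdev(xs) if use_sample_sd else 1.0
        half_width = z * sd / n ** 0.5
        if abs(mean(xs)) > half_width:
            misses += 1
    return misses / trials

alpha = 0.05
known = miss_rate(n=5, alpha=alpha, trials=4000, use_sample_sd=False)
estimated = miss_rate(n=5, alpha=alpha, trials=4000, use_sample_sd=True)
print(f"known sigma, z critical value:     miss rate = {known:.3f}")
print(f"estimated sigma, z critical value: miss rate = {estimated:.3f}")
```

With a known standard deviation the miss rate should land near $\alpha$. With a small sample and an estimated standard deviation, the $z$ critical value is too small and the intervals miss more often than $\alpha$, which is the gap the $t$ distribution is designed to close.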
Python Notebooks
Visualization is key to probabilistic and statistical thinking. Below is a set of Python notebooks that can help you walk through and visualize certain topics. Click on a link to open the notebook in Google Colab, where you can run the program by pressing the play button in each code block. If you want to make changes to the code (and you should!), then create your own blank notebook and copy and paste from these examples.
Discrete RVs: when do we use one distribution to approximate another?
Binomial approximation of the hypergeometric (when the population size is large relative to the sample). Poisson approximation of the binomial (when successes are rare: many trials, each with a small success probability). And what is the connection between the geometric and negative binomial distributions?
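As a quick check of these three relationships outside the notebook, the following sketch compares the probability mass functions directly using only the standard library. The helper names and the specific parameter values are assumptions made for illustration. It also assumes the convention that the geometric and negative binomial variables count the trial on which the first (or $r$-th) success occurs; some textbooks count failures instead.

```python
from math import comb, exp, factorial, isclose

def hypergeom_pmf(k, N, K, n):
    """P(k successes in n draws without replacement; K successes among N items)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def negbinom_pmf(x, r, p):
    """P(the r-th success occurs on trial x), for x = r, r + 1, ..."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

def geom_pmf(x, p):
    """P(the first success occurs on trial x), for x = 1, 2, ..."""
    return p * (1 - p) ** (x - 1)

# Large population: hypergeometric is close to binomial with p = K / N.
N, K, n = 10_000, 3_000, 10
gap_hyper = max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, K / N))
                for k in range(n + 1))

# Rare successes: binomial is close to Poisson with lam = n * p.
n_big, p_small = 1_000, 0.003
gap_poisson = max(abs(binom_pmf(k, n_big, p_small) - poisson_pmf(k, n_big * p_small))
                  for k in range(21))

# Geometric is the negative binomial with r = 1.
same = all(isclose(geom_pmf(x, 0.2), negbinom_pmf(x, 1, 0.2)) for x in range(1, 30))

print(f"max |hypergeometric - binomial| = {gap_hyper:.2e}")
print(f"max |binomial - Poisson|        = {gap_poisson:.2e}")
print(f"geometric equals negative binomial with r = 1: {same}")
```

Try shrinking `N` toward `n`, or raising `p_small`, to watch the approximations degrade.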
Practice Problems
As the exams approach, you will need more focused practice on problems where you do not know in advance which technique will be necessary. Feel free to use these problems. They may not perfectly align with the exams each semester, so be sure to pay attention to your instructor’s list of materials for each exam.
Important note: I will NOT provide solutions to these. This is part of the challenge! Work with partners and reach out to fellow students on discussion boards. The debate that stems from two people thinking they are right but ending up with different answers is some of the most fertile ground for learning. Providing solutions would deprive you of this great opportunity.