AI and fundamental stats

In my upcoming introductory stat class I am planning to include "critical evaluation of AI analyses" as a core learning objective. I have started exploring how ChatGPT 4.0 handles some of my favorite questions, and it turns out that it does rather well.


I flipped a coin 500 times and 300 times it came up heads. Is this substantial evidence that it is not a fair coin?

Visualization of the problem I posed to ChatGPT and that I introduce on the first day of class each year.

ChatGPT 4.0: To determine whether this is substantial evidence that the coin is not fair, we could conduct a hypothesis test.

The null hypothesis (H0) is that the coin is fair, meaning it has a 0.5 probability of landing heads. The alternative hypothesis (H1) is that the coin is not fair, meaning it has a probability of landing heads that is not 0.5.

If we model this as a binomial distribution, the expected value for 500 flips of a fair coin is 0.5*500 = 250 heads. You observed 300 heads, which is more than expected under H0.

However, to decide if this difference is statistically significant, you'd want to calculate a p-value, which is the probability of observing a result as extreme as you did (or more so) under the null hypothesis.

In general, a p-value of less than 0.05 is considered statistically significant in many fields.

In this case, you would find the standard deviation, or sqrt(n*p*(1-p)), where n is the number of flips (500) and p is the probability of heads (0.5). This comes out to sqrt(500*0.5*0.5) = 11.18.

The z-score, or (observed - expected)/standard deviation, is (300 - 250)/11.18 = 4.47. This z-score is quite large, indicating that your result is highly unlikely to have occurred by chance if the coin were fair. The corresponding p-value for a z-score of 4.47 is extremely small, way less than 0.05, providing strong evidence to reject the null hypothesis that the coin is fair.

Please note this calculation assumes a normal approximation of the binomial distribution, which is reasonable here due to the large number of flips. Also note that statistical significance doesn't imply practical significance.
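Students evaluating this answer can check both ChatGPT's arithmetic and its normal-approximation assumption directly. The minimal sketch below uses only Python's standard library, and the function names are hypothetical. The first function repeats the z-test with a two-sided p-value (since H1 is that the probability of heads is not 0.5); the second computes an exact binomial p-value by counting every outcome at least as far from 250 heads as the observed 300.

```python
import math

def z_test_fair_coin(heads: int, flips: int, p: float = 0.5) -> tuple[float, float]:
    """Return (z, two-sided p-value) using the normal approximation to the binomial."""
    expected = flips * p                    # 0.5*500 = 250 heads under H0
    sd = math.sqrt(flips * p * (1 - p))     # sqrt(n*p*(1-p)), about 11.18 here
    z = (heads - expected) / sd
    # Two-sided standard-normal tail area: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))

def exact_p_fair_coin(heads: int, flips: int) -> float:
    """Exact two-sided binomial p-value for H0: P(heads) = 0.5.

    Sums the number of flip sequences whose head count k is at least as far
    from flips/2 as the observed count; each sequence has probability 1/2**flips.
    """
    deviation = abs(heads - flips / 2)
    extreme = sum(math.comb(flips, k) for k in range(flips + 1)
                  if abs(k - flips / 2) >= deviation)
    return extreme / 2 ** flips  # exact integer count, one final division

z, p_normal = z_test_fair_coin(300, 500)
p_exact = exact_p_fair_coin(300, 500)
print(f"z = {z:.2f}, normal-approx p = {p_normal:.2e}, exact p = {p_exact:.2e}")
```

Both p-values come out far below 0.05, so ChatGPT's conclusion does not hinge on the normal approximation.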

Suppose that 10% of a population is infected with a disease. I have a test that is 90% accurate. If I test positive for the disease, what is the probability that I actually have the disease?

ChatGPT 4.0: To answer this question, we can use Bayes' theorem, which helps us update our beliefs based on new evidence. Here, the test being "90% accurate" means that it has both a 90% sensitivity (it correctly identifies 90% of infected individuals) and a 90% specificity (it correctly identifies 90% of non-infected individuals).

Let's denote:

  • P(D) = Probability of having the disease = 0.1 (10% of the population is infected)
  • P(~D) = Probability of not having the disease = 0.9
  • P(T|D) = Probability of testing positive given the disease = 0.9 (sensitivity)
  • P(T|~D) = Probability of testing positive given no disease = 0.1 (1 - specificity)

We want to find P(D|T), the probability of having the disease given a positive test. By Bayes' theorem:

P(D|T) = [P(T|D) * P(D)] / [P(T|D) * P(D) + P(T|~D) * P(~D)]

Plugging in the values:

P(D|T) = [0.9 * 0.1] / [(0.9 * 0.1) + (0.1 * 0.9)]

P(D|T) = 0.09 / (0.09 + 0.09)

P(D|T) = 0.09 / 0.18

P(D|T) = 0.5

So, if you test positive for the disease, the probability that you actually have the disease is 50%.
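ChatGPT's Bayes' theorem calculation can be reproduced with a small function. This is a minimal sketch; the function and parameter names are hypothetical, and, like ChatGPT, it assumes "90% accurate" means both 90% sensitivity and 90% specificity.

```python
def posterior_given_positive(prevalence: float, sensitivity: float,
                             specificity: float) -> float:
    """Return P(D|T), the probability of disease given one positive test, via Bayes' theorem."""
    true_positive = sensitivity * prevalence               # P(T|D) * P(D)
    false_positive = (1 - specificity) * (1 - prevalence)  # P(T|~D) * P(~D)
    return true_positive / (true_positive + false_positive)

# Values from the question: 10% prevalence; "90% accurate" read as 90% sensitivity and specificity
print(round(posterior_given_positive(0.1, 0.9, 0.9), 4))  # prints 0.5
```

Rounding only hides floating-point noise; the exact value is 0.09 / 0.18 = 0.5, as in ChatGPT's answer.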

How can a positive result on a 90% accurate test imply that I only have a 50% chance of actually having the disease? This illustrates the importance of understanding how general screening (with no corroborating evidence of positivity) skews the interpretation of test results.
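Counting people instead of probabilities shows where the 50% comes from and how strongly it depends on who gets tested. The sketch below assumes a hypothetical group of 1,000 tested people with the same 90% sensitivity and specificity; the 10% prevalence comes from the question, while the 1% and 50% values are illustrative assumptions meant to contrast broad screening with testing people who already have some corroborating evidence of disease.

```python
def positives_per_thousand(prevalence: float, sensitivity: float = 0.9,
                           specificity: float = 0.9) -> tuple[int, int]:
    """Return (true positives, false positives) among 1,000 tested people, rounded to whole people."""
    infected = 1000 * prevalence
    healthy = 1000 - infected
    true_pos = round(infected * sensitivity)          # sick people the test catches
    false_pos = round(healthy * (1 - specificity))    # healthy people wrongly flagged
    return true_pos, false_pos

# 0.10 is the prevalence in the question; 0.01 and 0.50 are illustrative assumptions
for prevalence in (0.01, 0.10, 0.50):
    tp, fp = positives_per_thousand(prevalence)
    print(f"prevalence {prevalence:.0%}: {tp} true vs {fp} false positives, "
          f"P(disease | positive) about {tp / (tp + fp):.0%}")
```

At 10% prevalence the counts are exact: 90 of the 100 infected people test positive, and so do 90 of the 900 healthy people, so only half of the positives are real.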
