At TMC14 last week, one huge theme across many of the sessions and "My Favorites" was getting students to talk about math. The example that stood out to me was Chris Luzniak (@PIspeak)'s first session on debating in math class. He focused on structured arguments: when students raised their hands to share, he asked them to make a claim (a statement) and then give a warrant (a reason). This builds a culture of listening, clearly stating a view, and backing that view up with sound logic. Chris uses mostly informal debate (more like a classroom discussion), but for the purposes of this new unit I think I want something closer to a prepared debate.
I am trying to envision a unit where hypothesis testing is taught in multiple ways and students, as their unit project, are left in teams to defend their preferred method for making decisions with data. I see two big questions that should be very approachable for my students. The first is whether to use simulation or a probability model to obtain a p-value:
- The AP Statistics curriculum, along with nearly every other stats course, uses a probability model (the normal curve) as the basis for inference. Because of the Central Limit Theorem, even the craziest of distributions have a very normal-looking sampling distribution if the sample size is large enough (often >30). This method of inference is clean and analytical (it could be worked out by hand). The downside is that, like all models, it rests on assumptions that are not always valid, especially with smaller samples.
- The preferred method of many programmers (including me) is to build a very simple model in software and repeat it thousands of times to generate a sampling distribution. This method is called bootstrapping. Even though programming may not be accessible to most students, tools with clean interfaces such as StatKey make it easy to perform the calculations. This method of inference is very easy to understand and explain (drawing values from a hat, just repeated many thousands of times, since computers are fast). It also does not require any assumptions about the size or shape of the sample's distribution.
- If you understand Bayesian hypothesis testing, you could have kids argue for the need to use likelihood ratios based on meaningful prior probabilities. Despite spending half the day researching it, I really don't understand it well enough to explain it to anyone else, so this option is out for me. If you want to learn, one awesome option is a free book called "Think Bayes," written by one of my college computing professors.
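To make the simulation option concrete, here is a minimal bootstrap sketch in Python. The can volumes are made up for illustration; StatKey does essentially the same thing behind its point-and-click interface.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical measurements (oz) from a small sample of soda cans.
sample = [11.8, 12.1, 11.6, 11.9, 12.0, 11.7, 11.9, 11.8, 12.2, 11.7]

# "Drawing values from a hat": resample from the data with replacement,
# same size as the original sample, and record each resample's mean.
boot_means = [
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(10_000)
]
boot_means.sort()

# The middle 90% of the bootstrap distribution gives a 90% interval.
lo = boot_means[int(0.05 * len(boot_means))]
hi = boot_means[int(0.95 * len(boot_means))]
print(f"sample mean = {statistics.mean(sample):.2f} oz")
print(f"90% bootstrap interval: ({lo:.2f}, {hi:.2f}) oz")
```

Notice that no distributional assumptions were needed anywhere; running more resamples just makes the interval estimate more stable.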
The second question is why we need p-values at all:
- The traditional hypothesis test starts with a default claim and seeks evidence suggesting it to be unlikely. The statistician picks a threshold, such as 5%, and tries to show that the probability of getting data at least as extreme as theirs, assuming the null hypothesis is true, falls below that threshold. The actual probability, the p-value, is often reported along with the decision to reject or fail to reject the null. This is what you see in nearly all science publications.
- Another viewpoint held by a minority group of stats folks (including me) is to not use p-values at all. Instead, a correctly computed confidence interval could be compared to a null hypothesis to see if the null mean is captured by the interval or not. For example, a soda pop can of Coke (covered all my regions there...) claims to have 12oz of liquid inside. I am trying to prove that it is less than 12oz in a one-sided test at the threshold of 5%. Instead of finding a p-value, I could create a confidence interval around my data. Since I want each tail of the interval to have 5%, the interval would be the middle 90%. If my 90% confidence interval was 11.5oz to 11.9oz, then I could conclude that I was being ripped off. The reason I like this approach, despite its apparent complexity, is that it doesn't just tell you that you are getting ripped off -- it tells you how much pop you can expect to get in the can.
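Under the normal model, these two decision rules line up. A sketch with made-up numbers (the claimed 12oz fill, an assumed known standard deviation) shows the one-sided test and the matching middle-90% interval reaching the same verdict:

```python
import math

# Hypothetical numbers: null claims mu = 12 oz; we measured n cans.
mu0 = 12.0
n = 36
xbar = 11.9   # observed sample mean (made up)
sigma = 0.3   # population sd, assumed known so a simple z-test applies

se = sigma / math.sqrt(n)

# Traditional route: standardize under the null and get a one-sided
# p-value from the standard normal CDF.
z = (xbar - mu0) / se
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))

# CI route: middle 90% (5% in each tail) around the sample mean.
z_star = 1.645
ci = (xbar - z_star * se, xbar + z_star * se)

print(f"p-value = {p_value:.4f}")
print(f"90% CI  = ({ci[0]:.2f}, {ci[1]:.2f})")
# Both rules agree: p_value < 0.05 exactly when the interval sits
# below 12 -- but the interval also says how much pop to expect.
```

With these numbers the p-value comes out around 0.02 and the whole interval lands below 12oz, so both approaches say "ripped off"; the interval just says it more usefully.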
The point is not for the teacher to get on their high horse about their own answers to each question. The fact is that intelligent people support each of these answers, so any of them could be defended in front of the class. To make a good defense, each group needs to understand their opponents' positions well enough to counter them. The normal-curve model will be fairly non-intuitive to my students, since I plan to introduce inference using bootstrapped confidence intervals in a unit on infographics. The normal-curve groups will need to understand enough about the Central Limit Theorem to argue why the normal curve is a good model, and know the assumptions required to use it. They will want to demonstrate how easy the calculation can be on a TI-83 and how easy the method is to explain with a sketch of the normal curve. The simulator groups will need to understand why every simulation yields a slightly different answer, and why increasing the number of bootstrapped samples makes the answer converge to a more precise value. They also need to understand why they don't have to check any assumptions (other than having an SRS) like the other groups do. There is a similarly large set of key ideas that must be well understood to properly debate the second question.
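The Central Limit Theorem claim the normal-curve groups rely on is easy to demonstrate in a few lines. This sketch (a deliberately skewed, made-up population, with samples of size 30) shows the sample means piling up tightly and symmetrically around the population mean even though the population itself is nothing like normal:

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

# A deliberately skewed population (exponential-like waiting times).
population = [random.expovariate(1.0) for _ in range(100_000)]

# Draw many samples of size 30 and record each sample's mean.
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(5_000)
]

# The sampling distribution of the mean is centered at the population
# mean with a much smaller spread -- the normal-looking pile the CLT
# promises, despite the badly skewed population.
print(f"population mean/sd:  {statistics.mean(population):.2f} / "
      f"{statistics.stdev(population):.2f}")
print(f"sample-mean mean/sd: {statistics.mean(sample_means):.2f} / "
      f"{statistics.stdev(sample_means):.2f}")
```

A histogram of `sample_means` next to one of `population` makes the argument visually, which is probably how a debate team would actually present it.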
The big graded tasks in the unit would probably be giving and receiving feedback in mini-debates between paired teams, a final debate between groups in front of the entire class, and a write-up after the debates picking a personal stance and communicating all that they learned in the process.
The unit would be kicked off with the big question, "How can you use data to make decisions?" From there, we would generate a list of topics we need to know more about to get ready for the debate. Some of these big ideas I would lecture on for the whole class, while less conceptual ones would be left for students to watch in my videos. I would also give short quizzes to make sure students were grasping the basics of hypothesis testing and the different approaches, so I could target struggling students and their groups for quicker intervention.
So...I need help. I'm sure there are a ton of holes in this concept or things I did not clarify that really matter. I would love to not only turn this into an awesome unit for my classes, but for everyone who would want to use it, so please poke holes as ruthlessly as if it were your own curriculum. Thanks!