Comments on Fail Early and Often: A half-baked unit idea for hypothesis testing

2014-08-07T23:35:44.631-07:00

Thanks David. As you look at your stats curriculum for this year, feel free to look at how I organize the course (http://failearlyandoften.blogspot.com/2014/07/how-i-organize-statistics.html) or my actual class resources (mrpethan.com). I hope they can spark some ideas even if the actual content isn't helpful for you.

As for the accessibility curve for students getting into any deep comparison or analysis, I agree with you. Instead of being a barrier, it sounds like a late-night coding project -- if I could convert the key components of Tim's program into JavaScript and get it online so students could play with parameters in the browser (and hopefully not crash their computers), they could benefit from 80% of the learning with 20% of the time and skills.

Thank you also for the link to Chris's video -- I'm not sure if I would want a stand-up informal debate like the video demonstrated, a more formal debate activity, or just a written justification comparing and contrasting methods.

2014-08-07T23:19:50.447-07:00

Your blog post evaluating bootstrap intervals is awesome -- I just commented with some more thoughts. Specifically relating to this, I struggle with how the randomization test for a single mean actually works. From playing with StatKey, my best guess was that it created a sampling distribution using resampling with replacement, and then shifted that distribution so the mean lined up with the null hypothesis. I hope I'm wrong because I don't get why this is a good thing to do (other than the fact that we do it in a similar way with traditional inference). If you can explain why I'm wrong or why being this way is okay, that would help me a lot. All of the two-sample randomization tests make sense, but the one-sample stuff is just weird to me.
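
To make the guess above concrete, here is a minimal Python sketch of a "shift, then resample" test for a single mean. It is an assumption about the procedure as described in this comment, not a confirmed description of what StatKey does internally; the function name and the data are made up for illustration. The usual rationale for the shift is that it builds a stand-in population where the null hypothesis is true but the spread and shape still look like the sample, so the resampled means show how far a sample mean tends to wander from the null value by chance alone.

```python
import random
import statistics


def shifted_bootstrap_p_value(sample, null_mean, reps=5000, seed=0):
    """Two-sided p-value for H0: population mean == null_mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    observed = statistics.fmean(sample)
    # Shift every value so the data are centered on the null mean.
    # The spread and shape of the sample are unchanged.
    shifted = [x - observed + null_mean for x in sample]
    extreme = 0
    for _ in range(reps):
        resample = rng.choices(shifted, k=n)  # resample with replacement
        # Count resampled means at least as far from the null as the observed mean.
        if abs(statistics.fmean(resample) - null_mean) >= abs(observed - null_mean):
            extreme += 1
    return extreme / reps


heart_rates = [72, 75, 68, 80, 77, 74, 71, 79, 73, 76]  # made-up data
print(shifted_bootstrap_p_value(heart_rates, null_mean=70))
```

Shifting the data first and then resampling gives the same distribution as resampling first and then shifting the resulting means, so this matches the "shift the distribution" description either way.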

2014-08-05T17:00:59.032-07:00

I just found your blog through the TMC14 website (I went to TMC13 but haven't been too active recently in the MTBoS) and I feel like I could not have found it at a better time! I’m currently in the process of revamping my stats course in a similar direction. I went that direction a couple of years ago using StatKey and Lock5, but the execution was flawed enough that I went back to the old way; now I think I’m more ready for it. So if you want to bounce ideas around on those topics, hit me up!

Anyway, I really like the idea of a debate, but one of my concerns would be that many of the general high level arguments for and against the different methods may not be accessible. For example, the examination of bootstrap CIs that bestcase linked to above would be a nontrivial coding exercise for many kids I think. Regardless, I think it is an EXCELLENT idea to have students defend not just their conclusion, but their choice of method. For example, when comparing class heart rates before and after running up and down the stairs (always a popular activity), I used to have them construct a CI for difference in means, but I’m sure the activity would be improved by giving them a choice of CI vs HT, simulation vs not, and defending that choice.

Anyway, Chris probably already showed this to you, but if not, I think it’s helpful to see even a short video of his debate protocols in the classroom. I went to a PD he ran in NYC and the classroom footage made the ideas much more concrete. http://www.pbslearningmedia.org/resource/mtc13.pd.math.deb/encouraging-debate/

2014-08-04T11:48:55.098-07:00

For some reason I do testing—using randomization—before doing the bootstrap. But I actually get to teach stats again in the Spring and I will think about your order. As to the bootstrap itself, and whether it "works," I've done a little experimenting. Very little. It's not much. But I posted it here: http://wp.me/pWRZj-jx.

The upshot is that with small samples and the three simple source distributions I looked at, you can't make good probability statements about whether a bootstrap interval includes the population statistic. That doesn't actually bother me, but it might bother you! Larger samples (N > 50, say) no problem.
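
A rough way to check this kind of claim is to simulate it. The sketch below is not the experiment from the linked post: the exponential source distribution, the percentile interval, the 90% level, and all function names are assumptions chosen for illustration. It draws many samples from a population with a known mean and counts how often the bootstrap interval actually contains that mean.

```python
import random
import statistics


def percentile_interval(sample, rng, reps=500, level=0.90):
    """Bootstrap percentile interval for the mean (one of several interval types)."""
    n = len(sample)
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps))
    lo = means[round(reps * (1 - level) / 2)]
    hi = means[round(reps * (1 + level) / 2) - 1]
    return lo, hi


def coverage(n, trials=200, level=0.90, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of an exponential distribution with rate 1 (assumed source)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        lo, hi = percentile_interval(sample, rng, level=level)
        if lo <= true_mean <= hi:
            hits += 1
    return hits / trials


for n in (5, 50):
    print(n, coverage(n))
```

If the interval "worked" as a 90% interval, the printed fractions would sit near 0.90; comparing the small-sample and large-sample results is the point of the exercise.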

Also, a terminology thing that may not be an issue: in computing and simulation, where you do things a jillion times, the word "bootstrap" is broad, covering all sorts of techniques that use randomness and Monte Carlo methods. In stats, I think we use "bootstrap" more narrowly to refer to the procedure where you sample n times with replacement from your sample in order to get a sample statistic (usually the mean) that you collect into a sampling distribution, with the goal of getting an interval estimate (as you describe) and NOT a P-value. A hypothesis-testing procedure where you get P-values (e.g., scrambling group assignments, looking at the difference of means, and doing that a jillion times) isn't called a "bootstrap," though: that's a randomization (or sometimes "permutation") test.
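
The distinction can be shown side by side. The following is a minimal sketch with made-up data and hypothetical function names: the first function resamples one sample with replacement to get an interval estimate, and the second scrambles group assignments to get a P-value. The percentile method is used for the interval as an assumption; other bootstrap intervals exist.

```python
import random
import statistics


def bootstrap_interval(sample, reps=2000, level=0.95, seed=0):
    """Bootstrap: resample WITH replacement from one sample -> interval estimate."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps))
    return means[round(reps * (1 - level) / 2)], means[round(reps * (1 + level) / 2) - 1]


def permutation_p_value(group_a, group_b, reps=2000, seed=0):
    """Randomization test: scramble group labels -> two-sided P-value."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # reassign values to groups at random, without replacement
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / reps


before = [68, 72, 75, 70, 66, 74, 71, 69]  # made-up resting heart rates
after = [88, 95, 90, 99, 85, 97, 92, 91]   # made-up rates after the stairs
print(bootstrap_interval(after))
print(permutation_p_value(before, after))
```

Note that the example treats the two lists as independent groups to keep the label-scrambling idea visible; paired before/after data would normally call for a different randomization scheme.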

2014-08-04T06:58:39.740-07:00

Great post! I think I have some useful comments I'm too rushed (and hungry) to make right now, but I gotta say THANKS A MILLION!!! for the link to Think Bayes! The first few pages tell me I have to read the rest ASAP.

2014-08-03T00:47:18.429-07:00

Thanks Bob. Great idea to post to AP list -- I will have to figure out how to do that, but I can't imagine it's too hard. I also looked up Ruth -- she actually teaches it the same way I did last year, so that shouldn't be hard to incorporate :).

As for having to teach p-values anew later, I agree. I guess I should clarify that I would not have students learn just one method; they would learn all of them and, in a debate setting, need to formally defend their chosen method as the best one. I will have to figure out more about how the AP readers decide what makes a well-constructed (but unusual) response, but I would want to assess their ability to present a conclusion in each format with all of them checking out as valid.

2014-08-02T06:16:11.905-07:00

So much going on in this post. I would suggest you post it to the AP Stats message board as well, and I am confident you will receive many measured responses.

The "big idea" for students in Stats is to reach a defensible conclusion through data. As you note, this can be done in many ways. As a reader, I see how many students have been taught to approach testing. A majority use a "canned" hypothesis test approach: memorized lines, hokey acronyms, without clear understanding of what is happening. But the fun in reading papers is seeing breakthroughs in pedagogy evident on the paper: well-constructed responses which demonstrate infusion of stats into the context. I continue to use the traditional p-value in my classroom, but have increased the expectation for clear communication of the meaning of the p-value, and have also worked to keep CI and P-value approaches from feeling like different worlds.

I agree with the confidence interval approach, as it provides information on an estimate, as well as some basis for a hypothesis test conclusion. The problem here is that we would then need to teach P-values as new when we get to scenarios which don't lend themselves to CIs, like chi-squared tests. Ruth Carver has done a lot of work with simulation tests and swears by them. Not sure how much she has available online, but try a Google search for her.