Excuse this possibly silly question but:
Why is optional stopping in sample collection a problem, given the optional stopping theorem? (Or is it still considered a problem?)
@kinozhao A couple of cites on the Bayesian hypothesis testing side:
@devezer Thanks! I guess my question was mostly: why is it a problem for frequentism at all? I know there are simulation results showing that optional stopping leads to inflated type 1 error rate (under NHST), but I was confused as to how the optional stopping theorem doesn't mean that this result is impossible.
After much, much discussion yesterday, I think the answer is that an NHST analysis on IID samples isn't a martingale. But I haven't quite figured out how. (My knowledge on martingales is fuzzy)
@kinozhao I don't know much about martingales either but the way I see it is that frequentist error control depends on certain assumptions. When conditional decisions are made, they change the sampling distribution and valid inference needs to be made under the conditional model. Unconditional inference will inflate errors. The section on Claim 2 in this paper is relevant I believe. https://royalsocietypublishing.org/doi/10.1098/rsos.200805
My problem is that the optional stopping theorem suggests that you can't win a fair bet with betting strategies that take into consideration all past information (but none future). E.g., the doubling strategy where you double your bet whenever you lose won't work.
I'm having a hard time seeing how optional stopping in study design isn't in a similar situation. Assuming IID, the researcher can't cheat the system without access to future information.
(fwiw I think my confusion is that IID doesn't guarantee the bet is fair, though I still don't yet quite understand how)
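A minimal sketch of the doubling strategy mentioned above, assuming a fair coin and a cap on the number of rounds. The names `doubling_game`, `simulate` and `max_rounds` are hypothetical; the cap is only there so the stopping time is bounded and the theorem applies. It suggests what the theorem does and does not constrain: average net winnings stay near zero, yet the share of games that end ahead is far above one half.

```python
import random


def doubling_game(max_rounds, rng):
    """Play a fair coin game, doubling the stake after every loss.

    Stops at the first win, or after max_rounds consecutive losses.
    Returns the net winnings of that single game.
    """
    stake = 1
    net = 0
    for _ in range(max_rounds):
        if rng.random() < 0.5:  # win: recover all losses plus one unit
            return net + stake
        net -= stake            # loss: pay the stake, then double it
        stake *= 2
    return net


def simulate(n_games=200_000, max_rounds=6, seed=0):
    """Return (mean net winnings, share of games that end ahead)."""
    rng = random.Random(seed)
    results = [doubling_game(max_rounds, rng) for _ in range(n_games)]
    mean_net = sum(results) / n_games
    share_ahead = sum(r > 0 for r in results) / n_games
    return mean_net, share_ahead


if __name__ == "__main__":
    mean_net, share_ahead = simulate()
    print(f"mean net winnings: {mean_net:.3f}")
    print(f"share of games ending ahead: {share_ahead:.3f}")
```

The theorem is a statement about an expectation, not about the probability of one particular event. A false-positive rate is a probability of that second kind.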
@kinozhao That's interesting and I'll need to think about this formulation some more to wrap my head around it. Will share if I get anywhere :)
@kinozhao I guess this may not directly address your question but may help with intuition a bit more. He (Christian Robert) has probably written more on this so that's something to look for. https://xianblog.wordpress.com/?s=Stopping+rule&submit=Search
@kinozhao gah wrong link. I meant the second one. https://xianblog.wordpress.com/2014/05/09/stopping-rule-impact/
@kinozhao @devezer is the error in how you are mapping “win” and “lose” onto the NHST question? if I took “win” to mean “true positive” wouldn’t your theorem imply precisely that I *can’t* “win”? Or did I totally misunderstand the question? (for what it’s worth, to me, the obvious problem had also always been that the distribution for the test statistic no longer applies because what you are generating is conditional errors)
@UlrikeHahn @kinozhao I think with Ulrike's comment, I have a better sense of what you're asking and I agree that it's still true that you can't win if winning is defined as accurately predicting a future occurrence. The reason that it becomes a problem is that using the wrong sampling distribution, you can pretend to have found an effect based on a certain cutoff and cheat the academic system rather than the "bet" itself. Because you still won't have won that game.
@UlrikeHahn @devezer No. I’m assuming the samples are drawn from a null distribution, which at 0.05 alpha should mean 5% false positives. This is how the one simulation study I found coded it.
“Winning” is entirely arbitrary. The phrase “can’t win in the long term” just means won’t deviate from the expectation (because the fair price of the gamble is defined by the expectation). So, in this case, we can define “winning” as committing exactly 5% false positives.
@kinozhao @devezer so, I think you would need to be able to show that the “sampling distribution” stays the same (via simulation or analytically) with optional stopping: I’m pretty sure it can’t be, because the point about optional stopping is that it was used selectively, ie *dependent on outcome*. So, ‘running a few more people until result turned significant’ reflects a distribution where I sample randomly, look at result, and then, conditional on outcome, sample again if need be.
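The "sample, look, sample again if need be" procedure described above can be sketched as a small simulation. Everything here is an assumption made for illustration, not the setup of any cited paper: N(0, 1) data under the null, a two-sided z-test with known variance, and the hypothetical parameters `n_min` and `n_max`. It compares one look at `n_max` with a look after every observation from `n_min` on, and also tracks the running sum at the moment of stopping.

```python
import math
import random

Z_CRIT = 1.959964  # two-sided 5% critical value for a z-test


def one_study(n_min, n_max, peek, rng):
    """Simulate one study under the null hypothesis (true mean 0, sd 1).

    With peek=False the test is run once, at n_max.
    With peek=True the test is run after every observation from n_min
    onward, and sampling stops at the first significant result.
    Returns (rejected, running sum of the data when sampling stopped).
    """
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        is_look = (n == n_max) or (peek and n >= n_min)
        if is_look and abs(total / math.sqrt(n)) > Z_CRIT:
            return True, total
    return False, total


def simulate(n_sims=10_000, n_min=10, n_max=100, seed=1):
    """Estimate the Type I error rate and mean final sum for both designs."""
    rng = random.Random(seed)
    out = {}
    for label, peek in (("fixed_n", False), ("optional_stopping", True)):
        runs = [one_study(n_min, n_max, peek, rng) for _ in range(n_sims)]
        out[label] = {
            "type1_rate": sum(rej for rej, _ in runs) / n_sims,
            "mean_final_sum": sum(s for _, s in runs) / n_sims,
        }
    return out


if __name__ == "__main__":
    for label, stats in simulate().items():
        print(label, stats)
```

In this sketch the running sum is the martingale, and its average at the stopping point stays near zero in both designs, as the theorem requires. The rejection rate is a different quantity: it is the probability that the standardized statistic crosses the cutoff at any look, and nothing in the theorem pins that to 5%.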
@UlrikeHahn @devezer Yeah it's a simulation I'm thinking of. It's a paper titled "When decision heuristics and science collide" by Yu et al. In the simulation they're holding the distribution constant.
I wonder if we are speaking at cross purposes: in my coin analogy, the probability at each toss remains .5 (and in that sense the distribution remains constant), but your actual probability of obtaining heads is different.
I'm trying to say that the effective distribution for whatever NHST statistic you are looking at will no longer match the assumed one...
@kinozhao see https://lakens.github.io/statistical_inferences/errorcontrol.html#optionalstopping, and then to do it right, which is called sequential analysis, see https://lakens.github.io/statistical_inferences/sequential.html
@kinozhao For work on optional stopping theorem and martingales specifically, see Peter Grunwald's work on safe testing https://arxiv.org/abs/1906.07801
@kinozhao so optional stopping is a problem when it is not done well. This is regrettably how most scientists do it in practice.
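For a rough idea of how a martingale can make optional stopping safe, here is a sketch of a likelihood-ratio monitor. It is not the construction from the safe testing paper linked above, only a simple illustration under stated assumptions: N(0, 1) data under the null, a single point alternative with a hypothetical effect size `DELTA`, and rejection when the ratio reaches `1 / ALPHA`. Under the null the ratio is a nonnegative martingale starting at 1, so Ville's inequality bounds the chance it ever reaches `1 / ALPHA` by `ALPHA`, however often you look.

```python
import math
import random

ALPHA = 0.05
DELTA = 0.3  # hypothetical effect size under the alternative (an assumption)


def rejects_under_null(n_max, rng):
    """Monitor a likelihood ratio after every observation drawn from N(0, 1).

    Returns True if the ratio reaches 1 / ALPHA at any point up to n_max.
    """
    log_lr = 0.0
    threshold = math.log(1.0 / ALPHA)
    for _ in range(n_max):
        x = rng.gauss(0.0, 1.0)
        # log of the density ratio N(x; DELTA, 1) / N(x; 0, 1)
        log_lr += DELTA * x - DELTA ** 2 / 2.0
        if log_lr >= threshold:
            return True
    return False


def type1_rate(n_sims=5_000, n_max=200, seed=2):
    """Estimate the false-positive rate with a look after every observation."""
    rng = random.Random(seed)
    hits = sum(rejects_under_null(n_max, rng) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    print(f"Type I error rate with continuous monitoring: {type1_rate():.3f}")
```

The difference from the p-value case is in what gets monitored: here the monitored statistic itself is the fair bet, so stopping whenever it looks good cannot push the error rate above `ALPHA`.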
Thanks! After much, much discussion yesterday, I think my confusion is that I assumed an NHST analysis on IID samples is a martingale (because it's IID, and I thought it would be analogous to CLT-style Monte Carlo resampling). My knowledge on martingales is pretty fuzzy because the concept of expectation is extremely difficult for me. Thanks for the references!
@kinozhao You might also like the work by the awesome Judith Ter Schure (I am biased as a co-supervisor). She has some accessible blog posts explaining martingales as betting.
@kinozhao Shouldn't be a problem in Bayesian stats. I guess problems may arise due to misapplications? Like when it's indexed on statistical significance in a frequentist setup, for example. https://statmodeling.stat.columbia.edu/2018/05/02/continuously-increased-number-animals-statistical-significance-reached-support-conclusions-think-not-bad-actually/