Why Ice Bath Studies Disagree

Ice bath research doesn’t disagree because the science is broken. It disagrees because the studies are answering different questions. Here’s how to read the contradictions — and make better decisions because of them.


One meta-analysis of 55 randomised controlled trials concludes that 10–15 minutes at 5–10°C is the best ice bath protocol for neuromuscular recovery. A separate meta-analysis, drawing on a comparably large evidence base, finds that water temperature and exposure duration are “rarely” significant moderators: dose barely matters. A well-designed RCT from 2014 tested multiple durations and temperatures and found minimal differences between any of them. All three are peer-reviewed. All three are credible. And if you’ve read them in sequence, you’ve probably concluded either that the science is broken or that nobody knows anything.

Neither is true. The studies disagree because they are answering different questions about different outcomes, in different populations, against different comparisons. Once you see that, the confusion resolves into something more useful than any single best ice bath protocol could offer: a framework for evaluating every recommendation you encounter.

What follows explains why the evidence contradicts itself, what each contradiction actually tells you, and how to make good decisions in the space between competing findings.

Three studies, three verdicts, one subject

To understand why the cold water immersion literature looks like a mess, examine three landmark findings side by side.

Glasgow 2014: “Dose doesn’t matter.” Paul Glasgow, a sports rehabilitation researcher at Ulster University, and colleagues randomised 50 participants across five conditions — varying CWI temperature, duration, and dosage — after eccentric exercise designed to induce muscle soreness. Their conclusion was blunt: altering dose parameters had minimal effect on delayed-onset muscle soreness outcomes. Taken at face value, this suggests you could sit in any cold water for any length of time and get roughly the same result. But 50 people across five groups means 10 per arm — a thin dataset from which to detect subtle dose-response differences.
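To make the “thin dataset” point concrete, here is a minimal Python sketch of a standard sample-size calculation. It assumes a simple two-arm comparison (say, one CWI dose against another), a two-sided alpha of 0.05, 80% power, and the normal approximation to the t-test. None of these settings come from Glasgow’s paper, which compared five groups, so treat the numbers as illustrative; the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Conventional design choices, assumed for illustration.
ALPHA = 0.05   # two-sided significance level
POWER = 0.80   # probability of detecting a true effect of the target size


def z_sum(alpha: float = ALPHA, power: float = POWER) -> float:
    """Return z_(1 - alpha/2) + z_(power) from the standard normal."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def min_detectable_d(n_per_arm: int) -> float:
    """Smallest standardised effect (Cohen's d) a two-arm comparison with
    n_per_arm participants in each arm detects with the assumed power."""
    return z_sum() * sqrt(2 / n_per_arm)


def n_per_arm_needed(d: float) -> int:
    """Participants per arm needed to detect effect size d, rounded up."""
    return ceil(2 * (z_sum() / d) ** 2)


if __name__ == "__main__":
    print(f"10 per arm: detectable d is about {min_detectable_d(10):.2f}")
    for d in (0.8, 0.5, 0.3):
        print(f"d = {d}: about {n_per_arm_needed(d)} participants per arm")
```

Under these assumptions, 10 participants per arm can only reliably detect a standardised effect of roughly d = 1.25, conventionally a very large effect. A moderate effect of d = 0.5 needs about 63 per arm, and the exact t-test needs slightly more than the normal approximation suggests. A null result at that sample size says little about subtle dose differences.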

Moore 2023: “Dose rarely moderates.” Nearly a decade later, Moore and colleagues (a team that included Shona Halson, a recovery physiologist and co-author of multiple CWI meta-analyses) published a large meta-regression comparing CWI against other recovery modalities: passive rest, active recovery, and compression garments. Their finding echoed Glasgow: water temperature and exposure duration were rarely significant moderators of CWI’s advantage over other techniques.

Two independent research groups, years apart, reaching the same conclusion. Case closed? Not quite. Moore 2023 was asking a specific question: does CWI beat other recovery methods, and does the dose change how much it wins by? That is not the same as asking whether different doses produce different results within CWI itself.

Wang 2025: “Dose clearly matters, depending on what you measure.” The 2025 network meta-analysis by Wang, Wang and Pan, exercise scientists writing in Frontiers in Physiology, pooled 55 RCTs and compared CWI protocols against control conditions rather than against other modalities. Their findings were strikingly specific. For creatine kinase reduction and neuromuscular recovery, 10–15 minutes at 5–10°C was most effective. For perceived muscle soreness, 10–15 minutes at a warmer 11–15°C performed best.

Dose matters, but the optimal dose shifts depending on which outcome you care about. Wang’s pooled dataset had the statistical power to detect differences that Glasgow’s 50-participant study could not. And because Wang compared different CWI doses against a passive control rather than against compression or active rest, the comparison was calibrated to reveal dose-response relationships that Moore’s design was never looking for.
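The power gain from pooling can be sketched with the simplest form of meta-analysis, inverse-variance fixed-effect pooling. Wang 2025 used a network model, which is more elaborate, so this Python sketch shows only the underlying mechanism. The trial values are hypothetical: each imaginary trial estimates the same dose difference of 0.3 standardised units with a standard error of 0.45, roughly what a 10-per-arm trial would produce.

```python
from math import sqrt


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling.

    Each trial is weighted by 1 / SE^2, so more precise trials count more.
    Returns (pooled_estimate, pooled_standard_error).
    """
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    return pooled, sqrt(1 / total_weight)


if __name__ == "__main__":
    # Hypothetical trials: each estimates a dose difference of 0.3 (SE 0.45).
    for k in (1, 5, 20, 55):
        _, se = pool_fixed_effect([0.3] * k, [0.45] * k)
        print(f"{k:>2} trials: pooled SE {se:.3f}, 95% CI 0.30 +/- {1.96 * se:.2f}")
```

A single hypothetical trial’s 95% interval (about ±0.88) easily spans zero, while 55 such trials pooled narrow it to about ±0.12. The catch is the fixed-effect assumption that every trial estimates the same quantity; when trials differ in population or exercise stimulus, a random-effects model widens the interval again.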

Machado and colleagues’ 2016 meta-analysis had already found a dose-response for soreness, with 11–15°C for 11–15 minutes performing best (consistent with Wang’s later finding), but it also reported the effect size: roughly 5% improvement. Statistically significant. Practically modest. Whether that justifies a specific protocol choice depends on what you consider meaningful.

Three studies. Three conclusions. All defensible within their own frames. The disagreement isn’t a failure of science. It’s a consequence of asking different questions, with different tools, at different scales. And the pattern runs deeper than these three papers.

Why the evidence keeps disagreeing

They measure different outcomes

Outcome choice is the single biggest source of apparent disagreement. Muscle soreness is a subjective rating on a visual analogue scale. Creatine kinase (CK) is a blood marker of muscle damage. Countermovement jump height is a functional performance measure. A protocol that reduces soreness may not reduce CK levels, and a protocol that restores jump performance may do neither.

Wang 2025 made this explicit: the optimal dose for soreness reduction (11–15°C) was not the optimal dose for CK clearance (5–10°C). Any study that picks one endpoint and generalises from it will seem to conflict with studies that chose a different one.

Measurement timing compounds the problem. A 2025 review by Cain and colleagues found that CWI increases markers of inflammation acutely but reduces stress markers at the 12-hour mark. The same intervention, measured at two different time points, produces what looks like two contradictory findings about the same protocol.

They use different exercise stimuli — and different comparators

Cold water immersion after a marathon is not the same intervention as cold water immersion after a single bout of bicep curls, even if the water temperature is identical. The preceding exercise changes what the body is recovering from, which changes what CWI is being asked to do. A 2022 meta-analysis by Moore and colleagues identified exercise type as one of the key variables in CWI efficacy, and Yu and colleagues in 2026 found that effective CWI parameters shift depending on the exercise modality. Endurance exercise, resistance exercise, and team-sport simulation all create different recovery demands, so studies testing CWI after different exercise types can produce different protocol recommendations without either being wrong.

Comparator choice is subtler but just as consequential. When Moore 2023 found that dose “rarely moderates,” the question was whether dose changed CWI’s margin over other recovery techniques: does 5°C CWI beat compression garments by more than 12°C CWI does? When Wang 2025 found clear dose-response effects, the comparison was between different CWI protocols and a passive control: does 5°C beat doing nothing by more than 12°C does? Each analysis draws on a different set of trials, with different comparators, outcomes, and participants. A modest dose effect that stands out against a passive control can disappear into the trial-to-trial variation of a meta-regression across mixed comparators, so both findings can be true simultaneously. The evidence isn’t contradicting itself. The studies are measuring different gaps.

Population and statistical design fill in the rest

Trained athletes and untrained university students do not respond to exercise or recovery in the same way. Much of the CWI literature relies on young, healthy, male university populations. Studies that include trained athletes tend to find smaller effects, partly because trained bodies are already more efficient at recovery and partly because a laboratory exercise protocol may not challenge them meaningfully. Female participants are nearly absent across the field. There are physiological reasons to expect dose-response curves to differ by sex, but not enough data to map them.

On the statistical side, a 50-person RCT, a traditional meta-analysis, a meta-regression, and a network meta-analysis are all doing different work, with different power and different assumptions. Network meta-analysis can compare protocols never directly tested against each other through indirect evidence chains: a powerful tool, but one whose conclusions depend on the comparability of the included trials. When two statistical methods disagree, the one drawing on more data usually deserves more weight, provided its assumptions are plausible. That is why Wang 2025’s 55-trial network meta-analysis is currently the most informative single piece of evidence available, even though its indirect comparisons carry their own uncertainty.
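The “indirect evidence chain” idea can be illustrated with the Bucher adjusted indirect comparison, the simplest building block of network meta-analysis. This Python sketch assumes two sets of trials that each compare a CWI protocol against the same passive control; the protocols, effect sizes, and standard errors are hypothetical and are not taken from Wang 2025 or any other study.

```python
from math import sqrt
from statistics import NormalDist


def indirect_comparison(d_ac: float, se_ac: float, d_bc: float, se_bc: float):
    """Bucher adjusted indirect comparison of A vs B through common comparator C.

    d_ac is the effect of A relative to C and d_bc the effect of B relative
    to C, on the same scale (e.g. standardised mean difference). Only valid
    if the A-vs-C and B-vs-C trials are comparable (transitivity).
    Returns (estimate, standard_error, two_sided_p_value).
    """
    d_ab = d_ac - d_bc
    se_ab = sqrt(se_ac ** 2 + se_bc ** 2)  # the two variances add
    z = d_ab / se_ab
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return d_ab, se_ab, p


if __name__ == "__main__":
    # Hypothetical pooled estimates, each against passive control (C):
    # A = a colder protocol, B = a warmer protocol.
    est, se, p = indirect_comparison(d_ac=0.60, se_ac=0.10, d_bc=0.40, se_bc=0.12)
    print(f"A vs B (indirect): {est:.2f} +/- {1.96 * se:.2f} (95% CI), p = {p:.2f}")
```

Both hypothetical protocols clearly beat passive control on their own, yet the indirect A-versus-B difference of 0.20 has a p-value of about 0.20, because the two uncertainties add. The subtraction is also only meaningful if the A trials and B trials are comparable: if one set recruited trained athletes and the other untrained students, the estimate mixes population differences with dose differences.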

The complication nobody wants to discuss

One finding in the CWI literature makes dose-response debates feel slightly beside the point, and most protocol articles ignore it entirely.

In 2014, James Broatch, an exercise physiologist at Victoria University, and colleagues ran a placebo-controlled trial in which participants either sat in cold water or sat in thermoneutral water they were told contained a “recovery-enhancing” additive. Participants in the placebo condition recovered as effectively as those in genuine CWI on several markers after high-intensity cycling.

What this implies is uncomfortable but important: if a substantial portion of CWI’s perceived benefits are psychologically mediated, dose-response studies may be mapping a perceptual curve as much as a physiological one. A person who believes colder water is more effective may report lower soreness from colder water not because vasoconstriction is greater, but because expectation modulates perception.

This doesn’t mean CWI is “just placebo.” The physiological mechanisms — reduced nerve conduction velocity, hydrostatic pressure effects from ice bath depth, altered blood flow — are well-documented. But in any study relying on subjective outcomes like perceived soreness, the boundary between physiological effect and expectation effect is blurry. Almost no CWI studies include a genuine placebo condition, which means the entire dose-response evidence base has this uncertainty embedded in it.

Broatch’s finding doesn’t invalidate the research. It does explain why dose-response relationships for subjective outcomes tend to be small and hard to replicate, and why objective markers like creatine kinase sometimes tell a different story from the participant’s pain rating.

What the real world looks like

If the research can’t agree on a single dose, you’d expect real-world practice to be messy. It is.

Across our installations, we see the same variability the evidence describes, shaped by factors the research rarely accounts for. The W Hotel runs its cold plunge at 6°C. Latitude Zero, a surf resort in the Mentawais, sits between 6°C and 8°C. NXT Fit, a performance training facility, runs separate pools at 4°C and 10°C for different use cases. Rekoop, a recovery studio, ranges from 4°C to 10°C depending on the client. Our app data mirrors this spread: roughly 30% of sessions are logged at 3–4°C, with the remaining 70% at warmer temperatures.

The contrast with the evidence is telling. The largest meta-analyses converge on 11–15°C as the best-supported range for soreness reduction, yet most hospitality and performance facilities run meaningfully colder. That isn’t ignorance; it’s a difference in what’s being optimised. A boutique recovery studio is optimising for perceived intensity and experiential distinctiveness as much as for creatine kinase clearance. A surf resort is offering sensation, ritual, a moment of contrast after warm equatorial water. What counts as best depends entirely on what you’re optimising for, and that’s the research dilemma expressed in commercial terms.

Making decisions when the evidence disagrees

No single answer is coming from this evidence base, because the question is more complex than any single-number protocol allows. But the decision process can be sharper than the data.

Start with the largest, most recent synthesis. Wang 2025 pooled 55 trials and disaggregated by outcome. For soreness: 10–15 minutes at 11–15°C. For muscle damage markers and neuromuscular performance: 10–15 minutes at 5–10°C. These are starting points, not commandments.

Then define what you’re actually optimising for. A team-sport athlete recovering between tournament matches may care more about perceived soreness than CK levels. A strength athlete in a hypertrophy block may want to avoid aggressive CWI altogether because of its potential to blunt adaptation. A hotel guest seeking sensory contrast after a sauna has an experiential goal that no dose-response curve was designed to measure. Almost every protocol recommendation skips this step: asking what outcome matters to you before looking at which protocol best serves it.
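The step this paragraph describes, naming the outcome before choosing the protocol, can be written as a small lookup. This Python sketch encodes only the starting ranges summarised above from Wang 2025 and the two non-evidence goals mentioned in this section; the goal labels and the `recommend` function are hypothetical names for illustration, not part of any app or tool.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StartingProtocol:
    """A starting range to adjust from, not a prescription."""
    minutes: tuple[int, int]  # immersion time, inclusive range
    temp_c: tuple[int, int]   # water temperature in degrees Celsius, inclusive range


# Outcome-specific starting points as this article summarises Wang 2025.
EVIDENCE_STARTS = {
    "soreness": StartingProtocol(minutes=(10, 15), temp_c=(11, 15)),
    "muscle_damage": StartingProtocol(minutes=(10, 15), temp_c=(5, 10)),
    "neuromuscular": StartingProtocol(minutes=(10, 15), temp_c=(5, 10)),
}

# Goals a dose-response curve does not settle; each value says why.
OTHER_GOALS = {
    "hypertrophy": "CWI may blunt adaptation; consider avoiding aggressive use",
    "experience": "experiential goal; no dose-response curve was designed for it",
}


def recommend(goal: str) -> Union[StartingProtocol, str]:
    """Return a starting protocol when the evidence addresses the goal,
    otherwise a short reason why no number is given."""
    if goal in EVIDENCE_STARTS:
        return EVIDENCE_STARTS[goal]
    if goal in OTHER_GOALS:
        return OTHER_GOALS[goal]
    raise ValueError(f"undefined goal {goal!r}: decide what you are optimising for first")


if __name__ == "__main__":
    for goal in ("soreness", "neuromuscular", "experience"):
        print(f"{goal}: {recommend(goal)}")
```

Raising an error for an undefined goal is a deliberate design choice: the function refuses to return a number until the outcome has been named.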

For specific temperature and duration ranges backed by the current evidence, with context for each goal, [our protocol guide breaks those down in detail](/ice-bath-protocol). You now understand why those numbers come with caveats — and why that understanding is worth more than the numbers themselves.

The contested dose isn’t a problem waiting for a better study. It’s what happens when a simple question meets a complex system. Anyone who grasps why the evidence disagrees will make better decisions about cold water immersion than someone who memorises the “right” number from whoever sounded most confident.