Reproducibility problems are mostly not tools problems. They're naming and memory problems. A future reviewer — or you, in six months — should be able to run your analysis without talking to you, and most of the cost of making that true is paid in small habits before the interesting work starts.
Name things as if you're reading them in six months
File names should carry intent: 2026-01_triage-cohort_v2.csv beats data_final_FINAL.csv. Column names should be legible to a collaborator without a lookup table. Variable names in scripts should say what they are, not what they were called in the original spreadsheet.
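A convention like this is easier to keep if it's checkable. Here is a minimal sketch of such a check; the regex and the `is_well_named` helper are illustrative assumptions, not something prescribed above.

```python
import re

# Illustrative pattern for names like 2026-01_triage-cohort_v2.csv:
# a YYYY-MM date, a lowercase hyphenated slug, and an explicit version suffix.
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}_[a-z0-9]+(-[a-z0-9]+)*_v\d+\.csv$")

def is_well_named(filename: str) -> bool:
    """Return True if the file name follows the dated, versioned convention."""
    return NAME_PATTERN.fullmatch(filename) is not None
```

Run it over a data directory once and the naming debt becomes a visible list instead of a vague worry.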
The return on naming is entirely in the future. That's why it feels optional in the moment. It's not.
Log what you decided and why
Each analysis gets a one-page README: the question, the data sources, the inclusion/exclusion decisions you made, and the version of anything that can change under you (code, dataset, variable definitions). When a reviewer asks “why did you drop those cases?” the answer is a line in the README, not a memory.
Decision logs feel like overhead until the first time one saves you a day of archaeology. After that, they feel cheap.
None of this is a methodology; it's a habit. Small, unglamorous, and compounding.