07. Regression Metrics, Scale, and Interpretation
Motivation
Regression results are often reported as bare numbers, for example:
- RMSE = 3.38
- MAE = 2.40
- R² = 0.737
These numbers look precise, but without context they are frequently meaningless.
This tutorial explains what regression metrics do and do not tell us, and how guarded resampling changes their interpretation.
Metrics encode assumptions
Regression metrics are not interchangeable.
Each metric encodes an implicit assumption about error structure.
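- RMSE squares errors before averaging, so large errors are penalized more heavily
- MAE weights every error in proportion to its absolute size
- R² measures variance explained relative to a baseline that always predicts the mean outcome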
Choosing a metric is a modeling decision, not a reporting formality.
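To make these assumptions concrete, the sketch below writes out the three metrics in plain Python. It is not fastml code: the function names and the example values are hypothetical, and R² is computed with the conventional 1 - SS_res / SS_tot definition. Some libraries report the squared correlation instead, so the R² value a given tool prints may differ.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: errors are squared before averaging."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: every error counts in proportion to its size."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, where the baseline always predicts the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values, for illustration only.
y_true = [20.0, 22.0, 25.0, 30.0, 35.0]
y_pred = [21.0, 21.0, 27.0, 29.0, 30.0]

print("RMSE:", round(rmse(y_true, y_pred), 3))
print("MAE: ", round(mae(y_true, y_pred), 3))
print("R2:  ", round(r_squared(y_true, y_pred), 3))
```

The single error of 5 units in the last observation raises RMSE well above MAE, which is the squared-error assumption at work.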
RMSE: Scale Dependence
RMSE is expressed in the same units as the outcome.
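For medv, the median home value recorded in thousands of dollars, this means:
- RMSE is measured in thousands of dollars,
- an RMSE of 3.38 corresponds to an error of roughly $3,380, which may be acceptable or unacceptable depending on context.
RMSE values cannot be compared across datasets unless the outcomes share the same units and scale.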
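A small sketch of the scale dependence, using made-up home values rather than the actual medv data: the same predictions are scored once in thousands of dollars and once in dollars. The helper `rmse` and all numbers are assumptions for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error in the units of the outcome."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical home values and predictions, in thousands of dollars.
y_true_k = [24.0, 21.5, 34.5, 33.0]
y_pred_k = [27.0, 20.5, 32.5, 35.0]

# The same values expressed in dollars.
y_true_usd = [1000 * v for v in y_true_k]
y_pred_usd = [1000 * v for v in y_pred_k]

rmse_k = rmse(y_true_k, y_pred_k)
rmse_usd = rmse(y_true_usd, y_pred_usd)

# Identical model quality, but the reported number differs by a factor of 1000.
print("RMSE (thousands of dollars):", round(rmse_k, 3))
print("RMSE (dollars):             ", round(rmse_usd, 1))
```

Nothing about the model changed between the two lines of output; only the unit of the outcome did.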
MAE vs RMSE
MAE and RMSE often move together, but they answer different questions.
- MAE reflects typical error magnitude.
- RMSE emphasizes large deviations.
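RMSE is never smaller than MAE on the same predictions. If RMSE is much larger than MAE, a few large errors dominate performance; if the two are close, errors are similar in size.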
This distinction matters in clinical and economic settings where extreme errors may carry disproportionate cost.
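The difference can be seen with two hypothetical error vectors that share the same MAE. The values are invented for illustration; only the contrast between the two patterns matters.

```python
import math

def mae_from_errors(errors):
    """Mean absolute error computed directly from residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse_from_errors(errors):
    """Root mean squared error computed directly from residuals."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Hypothetical residuals: evenly sized errors vs. one large miss.
errors_even = [2.0, -2.0, 2.0, -2.0]
errors_spiky = [0.0, 0.0, 0.0, 8.0]

for name, errs in [("even", errors_even), ("spiky", errors_spiky)]:
    m = mae_from_errors(errs)
    r = rmse_from_errors(errs)
    # The RMSE/MAE ratio grows as errors concentrate in a few observations.
    print(f"{name}: MAE={m}, RMSE={r}, ratio={r / m}")
```

Both patterns have an MAE of 2.0, but the RMSE is 2.0 for the even errors and 4.0 for the single large miss. Which pattern is worse depends on whether one large error costs more than several small ones.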
Why “Better RMSE” Is Often Meaningless
Statements such as:
“Model A has lower RMSE than Model B”
are incomplete unless accompanied by:
- variability estimates,
- the outcome scale,
- resampling details,
- practical relevance.
Guarded resampling ensures valid estimation, but interpretation remains a human responsibility.
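As a sketch of what "variability estimates" can look like, the snippet below summarizes per-fold RMSE for two models evaluated on the same folds. The fold values are hypothetical, not fastml output, and the variable names are placeholders. Because cross-validation folds share training data, the spread shown here is only a rough guide, not a formal test.

```python
import statistics

# Hypothetical per-fold RMSE for two models scored on the same 5 folds.
fold_rmse_a = [3.1, 3.6, 2.9, 3.8, 3.5]
fold_rmse_b = [3.3, 3.5, 3.2, 3.7, 3.8]

mean_a = statistics.mean(fold_rmse_a)
mean_b = statistics.mean(fold_rmse_b)
sd_a = statistics.stdev(fold_rmse_a)
sd_b = statistics.stdev(fold_rmse_b)

# Paired differences: negative values mean model A had lower RMSE in that fold.
diffs = [a - b for a, b in zip(fold_rmse_a, fold_rmse_b)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
folds_a_wins = sum(1 for d in diffs if d < 0)

print(f"Model A: mean RMSE {mean_a:.2f} (SD {sd_a:.2f})")
print(f"Model B: mean RMSE {mean_b:.2f} (SD {sd_b:.2f})")
print(f"Mean difference A - B: {mean_diff:.2f} (SD {sd_diff:.2f})")
print(f"Folds where A is better: {folds_a_wins} of {len(diffs)}")
```

In this invented example model A has the lower mean RMSE, yet the fold-to-fold spread of the difference is larger than the difference itself and model B wins in two of five folds. Whether a gap of about 0.1 in RMSE matters at all is a question about the outcome scale, not about the resampling.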
What fastml Guarantees — and What It Does Not
fastml guarantees that:
- metrics are computed on unseen data,
- preprocessing does not leak information,
- model comparisons are fair.
fastml does not guarantee that:
- metric differences are meaningful,
- one model should be preferred in practice,
- regression performance implies causal validity.
Responsible Regression Reporting
A defensible regression report should include:
- the outcome definition and units,
- the chosen metric(s) and rationale,
- the resampling scheme,
- aggregated performance estimates,
- variability across folds,
- a discussion of practical relevance.
Single-number summaries are insufficient.
Summary
Regression metrics are scale-dependent.
RMSE, MAE, and R² answer different questions.
Aggregated metrics hide variability.
Guarded resampling ensures validity, not interpretability.
Interpretation requires domain context and judgment.
R² and Its Limitations
R² is frequently misinterpreted.
An R² of 0.8 does not mean that:
- the model is “80% accurate”,
- predictions are close in absolute terms,
- the model is clinically useful.
R² depends on outcome variance: the same absolute errors yield a higher R² when the outcome is more spread out.
High R² can therefore coexist with large absolute errors.
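The dependence on outcome variance can be sketched with two hypothetical outcomes that receive exactly the same prediction errors. The data and function names are assumptions for illustration, and R² uses the conventional 1 - SS_res / SS_tot definition.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error in the units of the outcome."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with the mean of y_true as the baseline."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Identical hypothetical errors applied to a narrow and a wide outcome.
errors = [5.0, -5.0, 5.0, -5.0]
y_narrow = [40.0, 45.0, 55.0, 60.0]
y_wide = [10.0, 40.0, 60.0, 90.0]

pred_narrow = [y - e for y, e in zip(y_narrow, errors)]
pred_wide = [y - e for y, e in zip(y_wide, errors)]

r2_narrow = r_squared(y_narrow, pred_narrow)
r2_wide = r_squared(y_wide, pred_wide)

print("narrow outcome: RMSE", rmse(y_narrow, pred_narrow), "R2", round(r2_narrow, 3))
print("wide outcome:   RMSE", rmse(y_wide, pred_wide), "R2", round(r2_wide, 3))
```

Both cases have an RMSE of 5.0, but R² is 0.6 for the narrow outcome and about 0.97 for the wide one. The higher R² reflects a more variable outcome, not more accurate predictions.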
What comes next
08. Penalized Regression and High-Dimensional Data
Penalized regression and high-dimensional settings.