06. Regression and Continuous Outcomes

Motivation

Regression problems involve continuous outcomes rather than class labels.

This difference is not cosmetic.
It changes how performance is measured, how preprocessing affects results, and how evaluation errors arise.

Many applied workflows treat regression as a simpler case than classification.
In practice, regression is often more fragile under improper evaluation.

Why regression requires separate treatment

In regression:

  • outcomes have a natural scale
  • errors are measured in outcome units
  • preprocessing directly affects metric values
  • comparisons across datasets are rarely meaningful

Metrics such as RMSE and MAE are scale-dependent.
A “good” RMSE cannot be interpreted without context.

Leakage risks in regression workflows

Regression pipelines are particularly sensitive to leakage because:

  • scaling affects both predictors and outcome interpretation
  • imputation alters the outcome–predictor relationship
  • feature engineering often uses outcome-adjacent information
  • residual structure is easily distorted

As in earlier tutorials, leakage depends on when information is learned, not on which model is chosen.
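
A minimal sketch of the contrast, using rsample and recipes rather than fastml internals (fastml applies this guarding automatically). Here dat, y, and x are hypothetical placeholders for a data frame, a numeric outcome, and a predictor:

library(dplyr)
library(rsample)
library(recipes)

# Leaky ordering: scaling statistics are computed on the full data
# before resampling, so every fold has already "seen" its held-out rows.
leaky <- dat %>% mutate(x = as.numeric(scale(x)))

# Guarded ordering: normalization is a recipe step, re-estimated
# inside each analysis set during resampling.
rec   <- recipe(y ~ ., data = dat) %>%
  step_normalize(all_numeric_predictors())
folds <- vfold_cv(dat, v = 10)
# e.g. tune::fit_resamples(parsnip::linear_reg(), rec, folds) then
# estimates the recipe separately within each fold.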

Data

We use a medical dataset with a continuous outcome.

library(fastml)
library(mlbench)
library(dplyr)

data(BostonHousing, package = "mlbench")

reg_data <- BostonHousing %>%
  select(medv, everything())

head(reg_data)
  medv    crim zn indus chas   nox    rm  age    dis rad tax ptratio      b
1 24.0 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90
2 21.6 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90
3 34.7 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83
4 33.4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63
5 36.2 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90
6 28.7 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12
  lstat
1  4.98
2  9.14
3  4.03
4  2.94
5  5.33
6  5.21

The outcome medv represents median home value (in thousands of dollars) and is treated here as a continuous response for regression illustration.

Defining a regression task

The regression task is specified directly in the fastml() call.

fit <- fastml(
  data       = reg_data,
  label      = "medv",
  algorithms = c("linear_reg", "rand_forest")
)

This declaration ensures that:

  • regression-specific loss functions are used,
  • metrics are computed on continuous predictions,
  • evaluation respects guarded resampling.

Regression Metrics

Unlike classification metrics, regression metrics quantify the magnitude of prediction error.

Typical metrics include:

  • RMSE (root mean squared error),
  • MAE (mean absolute error),
  • R² (variance explained).
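
To make these definitions concrete, here is a minimal hand computation on toy vectors (illustrative values, not model output; the rsq line uses the traditional 1 − SS_res/SS_tot definition, whereas yardstick's default rsq is the squared correlation):

obs  <- c(24.0, 21.6, 34.7, 33.4)   # observed outcomes (toy values)
pred <- c(25.1, 20.9, 31.8, 34.0)   # predictions (toy values)

rmse <- sqrt(mean((obs - pred)^2))  # penalizes large errors quadratically
mae  <- mean(abs(obs - pred))       # average absolute error, in outcome units
rsq  <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)  # variance explained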

The aggregated resampling estimates from the fitted models can be inspected directly:

fit$resampling_results$`linear_reg (lm)`$aggregated
# A tibble: 3 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 mae     standard       3.38 
2 rmse    standard       4.80 
3 rsq     standard       0.737

Each metric answers a different question and encodes different assumptions about error structure.

Scale Dependence and Interpretation

RMSE is expressed in the same units as the outcome.

As a result:

  • RMSE values cannot be compared across datasets,
  • small numerical differences may be practically irrelevant,
  • preprocessing choices directly affect metric magnitude.

This makes contextual interpretation essential.
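
A concrete illustration: rescaling the outcome rescales RMSE by the same factor, with no change in the model's actual fit.

errors <- c(-2.1, 3.4, -1.2)     # toy residuals, outcome in thousands of dollars
sqrt(mean(errors^2))             # ~2.41, in thousands of dollars
sqrt(mean((errors * 1000)^2))    # ~2409, in dollars: same fit, different units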

Fold-Level Variability

As in previous tutorials, aggregated metrics hide variability.

fit$resampling_results$`linear_reg (lm)`$folds
# A tibble: 30 × 4
   fold  .metric .estimator .estimate
   <chr> <chr>   <chr>          <dbl>
 1 1     rmse    standard       4.17 
 2 1     rsq     standard       0.757
 3 1     mae     standard       3.09 
 4 2     rmse    standard       3.41 
 5 2     rsq     standard       0.822
 6 2     mae     standard       2.51 
 7 3     rmse    standard       5.96 
 8 3     rsq     standard       0.713
 9 3     mae     standard       3.66 
10 4     rmse    standard       4.59 
# ℹ 20 more rows

Fold-level results reveal:

  • instability due to limited sample size,
  • sensitivity to data partitioning,
  • overlap between competing models.

Regression performance often varies more across folds than classification accuracy.
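
A compact way to quantify this variability is to summarise the fold-level tibble shown above, for example with dplyr (already loaded):

fit$resampling_results$`linear_reg (lm)`$folds %>%
  group_by(.metric) %>%
  summarise(
    mean = mean(.estimate),
    sd   = sd(.estimate),
    min  = min(.estimate),
    max  = max(.estimate)
  )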

Model Comparison in Regression

Comparing regression models requires:

  • identical resampling splits,
  • identical preprocessing,
  • identical outcome scaling.

fastml enforces these conditions automatically.

Observed performance differences therefore reflect model behavior rather than evaluation artifacts.
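
Because the splits and preprocessing are shared, aggregated metrics can be bound into a single comparison table. A minimal sketch using purrr, assuming each element of resampling_results carries an aggregated tibble like the one shown earlier:

library(purrr)

# One row per metric per model, labelled by the list names
# (e.g. "linear_reg (lm)"; the rand_forest key depends on its engine).
map_dfr(fit$resampling_results, ~ .x$aggregated, .id = "model")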

What fastml Does Not Claim

fastml does not:

  • identify a “true” model,
  • guarantee optimal predictive accuracy,
  • justify causal interpretation of coefficients,
  • standardize outcomes for cross-study comparison.

Regression metrics describe predictive error under a specific evaluation design, nothing more.

Responsible Reporting

For regression analyses, a defensible report should include:

  • the outcome definition and scale,
  • the chosen evaluation metrics,
  • the resampling scheme,
  • aggregated performance estimates,
  • variability across folds.

Reporting a single RMSE without context is insufficient.
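
One simple anchor is the outcome's own spread: always predicting the mean of medv has an expected RMSE of roughly sd(medv), so the cross-validated RMSE of about 4.8 reported above can be judged against that baseline.

# RMSE of the trivial "always predict the mean" model is about sd(medv)
sd(reg_data$medv)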

Summary

Regression introduces scale-dependent evaluation.
Metrics are sensitive to preprocessing and leakage.
Guarded resampling is essential for valid regression evaluation.
Variability across folds is informative.

Interpretation requires domain context.

What comes next

07. Regression Metrics, Scale, and Interpretation
Regression metrics, scale, and why RMSE is often misleading.