Motivation
Time-to-event outcomes introduce a new layer of complexity that is absent in standard classification and regression.
In survival analysis:
outcomes are partially observed
follow-up times differ across individuals
censoring must be accounted for during evaluation
many common ML workflows fail silently
These properties make survival analysis especially vulnerable to evaluation errors and leakage.
Why survival analysis is different
In earlier tutorials, each observation contributed a fully observed outcome.
In survival settings, each observation consists of:
an observed follow-up time (time to event or time to censoring)
an event indicator (event observed vs censored)
Ignoring this structure leads to invalid evaluation.
In particular, treating survival outcomes as binary labels discards timing information and introduces bias.
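For intuition, this structure can be written down directly with the survival package: Surv() pairs each follow-up time with its event indicator. The times below are taken from the lung data used later in this tutorial.
library(survival)
# A survival outcome couples a time with an event indicator; censored
# times print with a trailing "+" (e.g. 1010+), observed events do not.
Surv(time = c(306, 455, 1010), event = c(1, 1, 0))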
Leakage risks unique to survival analysis
Survival workflows introduce additional leakage pathways:
using full follow-up information during preprocessing
computing time-dependent features globally
evaluating survival models without respecting censoring
mixing risk sets across resampling splits
These errors often go undetected and produce optimistic results.
The principles from C1–C3 apply here with greater force.
Data
We use a classical survival dataset with right censoring.
library(fastml)
library(censored)
library(survival)
library(dplyr)

data(lung, package = "survival")

surv_data <- lung %>%
  filter(!is.na(time), !is.na(status)) %>%
  mutate(
    status = ifelse(status == 2, 1, 0),  # 1 = event, 0 = censored
    sex = factor(sex, labels = c("male", "female"))
  )

head(surv_data)
  inst time status age  sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1    3  306      1  74 male       1       90       100     1175      NA
2    3  455      1  68 male       0       90        90     1225      15
3    3 1010      0  56 male       0       90        90       NA      15
4    5  210      1  57 male       1       90        60     1150      11
5    1  883      1  60 male       0      100        90       NA       0
6   12 1022      0  74 male       1       50        80      513       0
This dataset includes:
survival time (time),
event indicator (status),
baseline clinical covariates.
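Before defining a task, it is worth checking how many observations are events versus censored, since the number of observed events, not the total sample size, drives the information available to survival metrics. A minimal check:
table(surv_data$status)   # 0 = censored, 1 = event
summary(surv_data$time)   # distribution of follow-up times (days)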
Defining a survival task
In fastml, survival analysis is specified explicitly.
fit <- fastml(
  data = surv_data,
  label = c("time", "status"),
  algorithms = c("cox_ph", "survreg"),
  impute_method = "remove"
)
This declaration ensures that:
censoring is respected,
models are trained under survival-specific assumptions,
evaluation uses survival-appropriate metrics.
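For orientation, the "cox_ph" algorithm corresponds to the familiar Cox proportional hazards model. The comparison fit below uses the survival package directly, outside the fastml workflow, only to make that connection explicit; the covariates chosen here are an arbitrary illustrative subset.
# Standalone comparison fit (not part of the fastml workflow):
# a Cox proportional hazards model with Surv(time, status) as the response.
cox_fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = surv_data)
summary(cox_fit)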
Guarded Resampling for Survival Outcomes
Under guarded resampling:
preprocessing is learned within each training split,
risk sets are isolated by fold,
evaluation uses only information available at assessment time.
No observation contributes future information to training.
This is essential for valid survival evaluation.
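The sketch below illustrates the principle with the rsample package. It is not fastml's internal implementation; it only shows that any preprocessing statistic (here, a median that could be used for imputation) can be learned from each fold's analysis set alone rather than from the full data.
library(rsample)

folds <- vfold_cv(surv_data, v = 5)

# Learn the statistic inside each training split; the assessment rows of a
# fold never influence the value applied to them.
fold_medians <- vapply(folds$splits, function(split) {
  median(analysis(split)$meal.cal, na.rm = TRUE)
}, numeric(1))

fold_medians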
Survival-Specific Metrics
Survival models cannot be evaluated with standard classification metrics such as accuracy or ROC AUC.
Instead, fastml reports metrics such as:
concordance index (C-index),
time-dependent performance summaries,
integrated Brier score (when applicable).
fit$resampling_results$`cox_ph (survival)`$aggregated
# A tibble: 6 × 2
.metric .estimate
<chr> <dbl>
1 brier_t1 0.276
2 brier_t2 0.306
3 c_index 0.612
4 ibs 0.226
5 rmst_diff 68.6
6 uno_c 0.591
These metrics account for censoring and timing.
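As a point of reference, Harrell's C-index can also be computed directly from the comparison Cox model fitted earlier, using concordance() from the survival package; values near 0.5 indicate no discrimination and values near 1 indicate perfect ranking of risk.
# Concordance handles censoring by restricting attention to comparable pairs.
concordance(cox_fit)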
Variability across folds
As with earlier tutorials, fold-level variability matters.
fit$resampling_results$`cox_ph (survival)`$folds
# A tibble: 60 × 3
fold .metric .estimate
<chr> <chr> <dbl>
1 1 c_index 0.730
2 1 uno_c 0.724
3 1 ibs 0.187
4 1 rmst_diff 263.
5 1 brier_t1 0.231
6 1 brier_t2 0.169
7 2 c_index 0.415
8 2 uno_c 0.274
9 2 ibs 0.286
10 2 rmst_diff -145.
# ℹ 50 more rows
Survival metrics often exhibit higher variability due to:
censoring patterns,
limited numbers of events,
heterogeneous follow-up times.
This variability is informative, not pathological.
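Because the fold-level results are returned as a tibble with columns fold, .metric, and .estimate, the spread can be summarized per metric with a short dplyr pipeline:
fit$resampling_results$`cox_ph (survival)`$folds %>%
  group_by(.metric) %>%
  summarise(mean = mean(.estimate), sd = sd(.estimate), .groups = "drop")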
What fastml Does Not Allow Here
Consistent with C4 (What fastml Deliberately Does Not Allow), fastml prevents users from:
converting survival outcomes to binary labels,
evaluating survival models with classification metrics,
preprocessing time-to-event data globally,
detaching evaluation from resampling.
These restrictions are necessary for validity.
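To see why the first restriction matters, consider dichotomizing survival at a fixed horizon by hand, outside fastml. Anyone censored before the horizon has an unknown status at that time, so any binary label assigned to them is arbitrary; the sketch below counts how many observations would be silently mislabelled at a one-year horizon.
# Anti-pattern, shown only to illustrate the problem: a binary
# "death within 365 days" label treats subjects censored before day 365
# as survivors even though their one-year status was never observed.
surv_data %>%
  mutate(dead_1yr = as.integer(time <= 365 & status == 1)) %>%
  filter(status == 0, time < 365) %>%
  count()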
Interpretation Cautions
Survival metrics describe ranking or calibration under censoring, not absolute risk.
They should not be interpreted as:
probabilities of survival at specific times,
causal effects,
guarantees of clinical utility.
Guarded resampling ensures valid evaluation, not clinical validity.
Summary
Survival outcomes require task-specific evaluation.
Censoring introduces new leakage risks.
Guarded resampling is essential in survival analysis.
Variability is expected and informative.
fastml is designed to encourage survival-appropriate workflows by constraining how outcomes, preprocessing, resampling, and evaluation are coordinated under supported execution paths.