Motivation
Time-to-event outcomes introduce a new layer of complexity that is absent in standard classification and regression.
In survival analysis:
outcomes are partially observed
follow-up times differ across individuals
censoring must be accounted for during evaluation
many common ML workflows fail silently
These properties make survival analysis especially vulnerable to evaluation errors and leakage.
Why survival analysis is different
In earlier tutorials, each observation contributed a fully observed outcome.
In survival settings, each observation consists of:
an observed follow-up time (time to event or time to censoring)
an event indicator (event observed vs censored)
Ignoring this structure leads to invalid evaluation.
In particular, treating survival outcomes as binary labels discards timing information and introduces bias.
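For intuition, this structure can be written down directly with the survival package: Surv() pairs each follow-up time with its event indicator. The times below are taken from the lung data used later in this tutorial.
library(survival)
# A survival outcome couples a time with an event indicator; censored
# times print with a trailing "+" (e.g. 1010+), observed events do not.
Surv(time = c(306, 455, 1010), event = c(1, 1, 0))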
Leakage risks unique to survival analysis
Survival workflows introduce additional leakage pathways:
using full follow-up information during preprocessing
computing time-dependent features globally
evaluating survival models without respecting censoring
mixing risk sets across resampling splits
These errors often go undetected and produce optimistic results.
The principles from C1–C3 apply here with greater force.
Data
We use a classical survival dataset with right censoring.
library(fastml)
library(censored)
library(survival)
library(dplyr)

data(lung, package = "survival")

surv_data <- lung %>%
  filter(!is.na(time), !is.na(status)) %>%
  mutate(
    status = ifelse(status == 2, 1, 0),  # 1 = event, 0 = censored
    sex = factor(sex, labels = c("male", "female"))
  )

head(surv_data)
  inst time status age  sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1    3  306      1  74 male       1       90       100     1175      NA
2    3  455      1  68 male       0       90        90     1225      15
3    3 1010      0  56 male       0       90        90       NA      15
4    5  210      1  57 male       1       90        60     1150      11
5    1  883      1  60 male       0      100        90       NA       0
6   12 1022      0  74 male       1       50        80      513       0
This dataset includes:
survival time (time),
event indicator (status),
baseline clinical covariates.
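Before defining a task, it is worth checking how many observations are events versus censored, since the number of observed events, not the total sample size, drives the information available to survival metrics. A minimal check:
table(surv_data$status)   # 0 = censored, 1 = event
summary(surv_data$time)   # distribution of follow-up times (days)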
Defining a survival task
In fastml, survival analysis is specified explicitly.
fit <- fastml(
  data = surv_data,
  label = c("time", "status"),
  algorithms = c("cox_ph", "survreg"),
  impute_method = "remove"
)
This declaration ensures that:
censoring is respected,
models are trained under survival-specific assumptions,
evaluation uses survival-appropriate metrics.
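For orientation, the "cox_ph" algorithm corresponds to the familiar Cox proportional hazards model. The comparison fit below uses the survival package directly, outside the fastml workflow, only to make that connection explicit; the covariates chosen here are an arbitrary illustrative subset.
# Standalone comparison fit (not part of the fastml workflow):
# a Cox proportional hazards model with Surv(time, status) as the response.
cox_fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = surv_data)
summary(cox_fit)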
Guarded Resampling for Survival Outcomes
Under guarded resampling:
preprocessing is learned within each training split,
risk sets are isolated by fold,
evaluation uses only information available at assessment time.
No observation contributes future information to training.
This is essential for valid survival evaluation.
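The sketch below illustrates the principle with the rsample package. It is not fastml's internal implementation; it only shows that any preprocessing statistic (here, a median that could be used for imputation) can be learned from each fold's analysis set alone rather than from the full data.
library(rsample)

folds <- vfold_cv(surv_data, v = 5)

# Learn the statistic inside each training split; the assessment rows of a
# fold never influence the value applied to them.
fold_medians <- vapply(folds$splits, function(split) {
  median(analysis(split)$meal.cal, na.rm = TRUE)
}, numeric(1))

fold_medians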
Survival-Specific Metrics
Survival models cannot be evaluated with standard classification metrics such as accuracy or ROC AUC.
Instead, fastml reports metrics such as:
concordance index (C-index),
time-dependent performance summaries,
integrated Brier score (when applicable).
fit$resampling_results$`cox_ph (survival)`$aggregated
# A tibble: 6 × 2
.metric .estimate
<chr> <dbl>
1 brier_t1 0.276
2 brier_t2 0.306
3 c_index 0.612
4 ibs 0.226
5 rmst_diff 68.6
6 uno_c 0.591
These metrics account for censoring and timing.
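As a point of reference, Harrell's C-index can also be computed directly from the comparison Cox model fitted earlier, using concordance() from the survival package; values near 0.5 indicate no discrimination and values near 1 indicate perfect ranking of risk.
# Concordance handles censoring by restricting attention to comparable pairs.
concordance(cox_fit)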
Variability across folds
As with earlier tutorials, fold-level variability matters.
fit$resampling_results$`cox_ph (survival)`$folds
# A tibble: 60 × 3
fold .metric .estimate
<chr> <chr> <dbl>
1 1 c_index 0.730
2 1 uno_c 0.724
3 1 ibs 0.187
4 1 rmst_diff 263.
5 1 brier_t1 0.231
6 1 brier_t2 0.169
7 2 c_index 0.415
8 2 uno_c 0.274
9 2 ibs 0.286
10 2 rmst_diff -145.
# ℹ 50 more rows
Survival metrics often exhibit higher variability due to:
censoring patterns,
limited numbers of events,
heterogeneous follow-up times.
This variability is informative, not pathological.
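Because the fold-level results are returned as a tibble with columns fold, .metric, and .estimate, the spread can be summarized per metric with a short dplyr pipeline:
fit$resampling_results$`cox_ph (survival)`$folds %>%
  group_by(.metric) %>%
  summarise(mean = mean(.estimate), sd = sd(.estimate), .groups = "drop")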
What fastml Does Not Allow Here
Consistent with C4 (What fastml Deliberately Does Not Allow), fastml prevents users from:
converting survival outcomes to binary labels,
evaluating survival models with classification metrics,
preprocessing time-to-event data globally,
detaching evaluation from resampling.
These restrictions are necessary for validity.
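To see why the first restriction matters, consider dichotomizing survival at a fixed horizon by hand, outside fastml. Anyone censored before the horizon has an unknown status at that time, so any binary label assigned to them is arbitrary; the sketch below counts how many observations would be silently mislabelled at a one-year horizon.
# Anti-pattern, shown only to illustrate the problem: a binary
# "death within 365 days" label treats subjects censored before day 365
# as survivors even though their one-year status was never observed.
surv_data %>%
  mutate(dead_1yr = as.integer(time <= 365 & status == 1)) %>%
  filter(status == 0, time < 365) %>%
  count()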
Interpretation Cautions
Survival metrics describe ranking or calibration under censoring, not absolute risk.
They should not be interpreted as:
probabilities of survival at specific times,
causal effects,
guarantees of clinical utility.
Guarded resampling ensures valid evaluation, not clinical validity.
Summary
Survival outcomes require task-specific evaluation.
Censoring introduces new leakage risks.
Guarded resampling is essential in survival analysis.
Variability is expected and informative.
fastml is designed to encourage survival-appropriate workflows by constraining how outcomes, preprocessing, resampling, and evaluation are coordinated under supported execution paths.