Comparing machine learning models is a central goal of applied analysis.
However, many model comparisons are invalid—not because the models are wrong, but because they are evaluated under different data partitions, different preprocessing, or different sources of randomness.
As discussed in C3 — Guarded Resampling, fair comparison requires more than using the same metric.
What “fair comparison” means
A comparison is fair only if:
- all models see the same resampling splits
- preprocessing is learned independently within each split
- metrics are computed on identical assessment sets
- differences reflect models, not data partitioning
fastml enforces these conditions through its guarded resampling execution path, so models evaluated within a single fastml call can be compared fairly.
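To make these conditions concrete, the sketch below shows what a shared resampling plan looks like when assembled by hand with tidymodels. This is an illustration of the idea, not fastml's internal implementation; the object names, fold count, seed, preprocessing step, and metric choices are illustrative assumptions, not anything prescribed by fastml.

```r
# Minimal sketch (not fastml internals): create one set of folds and reuse it
# for every model, so differences in results cannot come from the partitioning.
library(tidymodels)

set.seed(123)                                           # illustrative seed
folds <- vfold_cv(breastCancer, v = 5, strata = Class)  # the single shared resampling plan

# Preprocessing is declared once but estimated inside each analysis set by
# fit_resamples(), so nothing is learned from the assessment data.
rec <- recipe(Class ~ ., data = breastCancer) |>
  step_dummy(all_nominal_predictors())

specs <- list(
  logistic_reg = logistic_reg() |> set_engine("glm"),
  rand_forest  = rand_forest(mode = "classification") |> set_engine("ranger"),
  xgboost      = boost_tree(mode = "classification") |> set_engine("xgboost")
)

# Every model is fit on the same analysis sets and scored on the same
# assessment sets, with the same metrics.
results <- lapply(specs, function(spec) {
  workflow() |>
    add_recipe(rec) |>
    add_model(spec) |>
    fit_resamples(resamples = folds, metrics = metric_set(roc_auc, accuracy))
})

lapply(results, collect_metrics)
```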
Data
We use a complete-case version of the BreastCancer dataset after removing the identifier column and excluding observations with missing values. This complete-case strategy is used for simplicity in this tutorial; in real applications, missingness should be handled explicitly (e.g., via guarded imputation) when it is nontrivial or informative.
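For reference, one way to build this complete-case dataset is sketched below, assuming the BreastCancer data shipped with the mlbench package, where the identifier column is named Id; the object name breastCancer matches the fitting call later in this section.

```r
# Sketch of the data preparation described above, assuming mlbench::BreastCancer.
library(mlbench)
data(BreastCancer, package = "mlbench")

breastCancer <- BreastCancer[, setdiff(names(BreastCancer), "Id")]  # drop identifier column
breastCancer <- breastCancer[complete.cases(breastCancer), ]        # keep complete cases only
```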
No hyperparameter tuning is performed; default engine settings are used intentionally. This isolates differences attributable to model structure rather than tuning effort, which is essential for a fair baseline comparison.
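To see what the default engine settings amount to for one of the compared models, parsnip's translate() prints the call that will be sent to the underlying engine. The snippet below assumes the usual tidymodels engine mapping (glm, ranger, xgboost) and is only a way to inspect defaults, not part of the fastml workflow.

```r
# Inspect the default engine arguments for one of the compared models
# (assumes the usual tidymodels engine mapping; shown for illustration only).
library(parsnip)

rand_forest(mode = "classification") |>
  set_engine("ranger") |>
  translate()
```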
```r
fit <- fastml(
  data = breastCancer,
  label = "Class",
  algorithms = c("logistic_reg", "rand_forest", "xgboost")
)
```
Shared resampling plan
All models are evaluated under a single, shared resampling plan.