Subset Analysis

Subset analysis lets you assess multivariate normality separately within each level of a factor. This is useful when the data structure or the experimental design calls for group-wise validation.


Example Data

library(MVN)

# Drop the 4th column (Petal.Width); keep Species as the grouping variable
iris_df <- iris[-4]
head(iris_df)
  Sepal.Length Sepal.Width Petal.Length Species
1          5.1         3.5          1.4  setosa
2          4.9         3.0          1.4  setosa
3          4.7         3.2          1.3  setosa
4          4.6         3.1          1.5  setosa
5          5.0         3.6          1.4  setosa
6          5.4         3.9          1.7  setosa

1. Running MVN by Group

Pass the name of the grouping column to the subset argument of mvn():

# Henze–Zirkler test by species
subset_res <- mvn(
  data       = iris_df,
  subset     = "Species",
  mvn_test   = "hz"
)
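
Conceptually, a group-wise analysis partitions the rows by the levels of the grouping factor and tests each partition on its own. The following is a minimal base-R sketch of that partitioning only; it makes no claim about how mvn() implements subset internally, and the names groups and group_dims are chosen here for illustration.

```r
# Illustration only: base-R partitioning, not MVN's internal code.
iris_df <- iris[-4]  # same data as above: drop Petal.Width

# One data frame of numeric columns per level of Species
groups <- split(iris_df[, names(iris_df) != "Species"], iris_df$Species)

# Rows and columns per group; each group is tested separately,
# so each needs enough observations relative to the number of variables
group_dims <- t(sapply(groups, dim))
colnames(group_dims) <- c("n_obs", "n_vars")
print(group_dims)
```

Checking the group sizes first is a cheap sanity step, because a multivariate test run on a very small group has little power to detect departures from normality.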

2. Viewing Group-Specific Results

Extract the multivariate normality test results for each group:

summary(subset_res, select = "mvn")
       Group          Test Statistic p.value      MVN
1     setosa Henze-Zirkler     0.524   0.831 ✓ Normal
2 versicolor Henze-Zirkler     0.714   0.326 ✓ Normal
3  virginica Henze-Zirkler     0.726   0.299 ✓ Normal

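The Henze-Zirkler test does not reject multivariate normality for any species at the 5% level (all p > 0.05), so the data are consistent with multivariate normality within each group. Group-wise analysis checks the assumption within each category rather than only on the pooled data.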


3. Group-Wise Diagnostics

You can also generate diagnostic plots for each subset by passing the subset_res object to plot():

# Mahalanobis Q–Q plots for each species
plot(
  subset_res,
  diagnostic = "multivariate",
  type       = "qq"
)
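
To make clear what a Mahalanobis Q-Q plot displays, the sketch below builds one by hand for a single species using base R only. It is an illustration under stated assumptions, not MVN's plotting code: the variable names (setosa, d2, theo) are chosen here, and the package's own plot may differ in scaling and styling.

```r
# Illustration only: a hand-built chi-square Q-Q plot for one group.
iris_df <- iris[-4]
setosa  <- as.matrix(iris_df[iris_df$Species == "setosa", 1:3])

# Squared Mahalanobis distance of each observation from the group centroid
d2 <- mahalanobis(setosa, center = colMeans(setosa), cov = cov(setosa))

# Under multivariate normality, d2 is approximately chi-square
# distributed with p degrees of freedom (p = number of variables)
p    <- ncol(setosa)
theo <- qchisq(ppoints(length(d2)), df = p)

plot(theo, sort(d2),
     xlab = "Chi-square quantile",
     ylab = "Squared Mahalanobis distance",
     main = "setosa")
abline(0, 1, lty = 2)  # points close to this line are consistent with normality
```

Points that bend away from the reference line in the upper tail usually indicate outlying observations or heavier-than-normal tails in that group.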


References

Korkmaz S, Goksuluk D, Zararsiz G. MVN: An R Package for Assessing Multivariate Normality. The R Journal. 2014;6(2):151–162. URL: https://journal.r-project.org/archive/2014-2/korkmaz-goksuluk-zararsiz.pdf