Multivariate Normality Tests

Multivariate normality is a foundational assumption for methods such as MANOVA, principal component analysis, and linear discriminant analysis. In this tutorial, we’ll apply and interpret the five main tests implemented in the MVN package, guided by the recommendations of Korkmaz et al. (2014).

Tip

The mvn() function also supports the use of bootstrap resampling for estimating p-values. By setting bootstrap = TRUE, you can obtain empirically derived p-values for the Mardia, Henze–Zirkler, Henze-Wagner, Royston and Doornik-Hansen tests. This is particularly useful when working with small sample sizes or when the assumptions underlying asymptotic distributions may not hold. The default setting is FALSE.


Example Data

We’ll use two numeric variables from the built-in iris dataset:

df <- iris[1:50, 1:2]
head(df)
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9

Henze–Zirkler Test

The Henze–Zirkler test is recommended for its balanced control of Type I error and good power properties under moderate sample sizes; a p-value below 0.05 suggests departure from multivariate normality based on a log-normalized distance metric.

hz_result <- mvn(data = df, mvn_test = "hz")
summary(hz_result, select = "mvn")
── Multivariate Normality Test Results ─────────────────────────────────────────
           Test Statistic p.value     Method      MVN
1 Henze-Zirkler     0.286   0.915 asymptotic ✓ Normal

Henze–Wagner Test

The Henze–Wagner test is a statistical method for assessing multivariate normality. It is based on a non-negative functional distance that quantifies the discrepancy between the empirical and theoretical characteristic functions. The test statistic follows an approximate asymptotic distribution under the null hypothesis of multivariate normality. It is particularly suited for datasets where the number of variables p is less than the number of observations n (i.e., p<n). A p-value below 0.05 provides evidence against the assumption of multivariate normality.

hw_result <- mvn(data = df, mvn_test = "hw")
summary(hw_result, select = "mvn")
── Multivariate Normality Test Results ─────────────────────────────────────────
          Test Statistic p.value     Method      MVN
1 Henze-Wagner     0.136   0.939 asymptotic ✓ Normal

Royston Test

The Royston test aggregates transformed univariate Shapiro–Wilk statistics into a joint chi‐square test and is noted for reliable performance in small-to-moderate samples.

ro_result <- mvn(data = df, mvn_test = "royston")
summary(ro_result, select = "mvn")
── Multivariate Normality Test Results ─────────────────────────────────────────
     Test Statistic p.value     Method      MVN
1 Royston     2.698   0.245 asymptotic ✓ Normal

Mardia Test

Mardia’s skewness and kurtosis measures provide insight into specific aspects of distributional shape; examining both skewness and kurtosis p-values helps diagnose the nature of non-normality, though neither may alone signal departure when sample size is limited.

mardia_result <- mvn(data = df, mvn_test = "mardia")
summary(mardia_result, select = "mvn")
── Multivariate Normality Test Results ─────────────────────────────────────────
             Test Statistic p.value     Method      MVN
1 Mardia Skewness     0.760   0.944 asymptotic ✓ Normal
2 Mardia Kurtosis     0.093   0.926 asymptotic ✓ Normal

Doornik–Hansen Test

The Doornik–Hansen approach applies transformations to approximate a chi‐square distribution of combined moment statistics.

dh_result <- mvn(data = df, mvn_test = "doornik_hansen")
summary(dh_result, select = "mvn")
── Multivariate Normality Test Results ─────────────────────────────────────────
            Test Statistic df p.value     Method          MVN
1 Doornik-Hansen     11.57  4   0.021 asymptotic ✗ Not normal

Energy Test

Based on a nonparametric energy distance metric, this test is sensitive to any general deviation from normality. It provides a robust alternative especially when data exhibit heavy tails or multimodality.

energy_result <- mvn(data = df, mvn_test = "energy")
summary(energy_result, select = "mvn")
── Multivariate Normality Test Results ─────────────────────────────────────────
         Test Statistic p.value    Method      MVN
1 E-Statistic     0.527   0.783 bootstrap ✓ Normal

Tip

No single test is universally best; Korkmaz et al. (2014) recommend combining multiple numerical tests with graphical diagnostics to make a more reliable decision on multivariate normality.


References

Korkmaz S, Goksuluk D, Zararsiz G. MVN: An R Package for Assessing Multivariate Normality. The R Journal. 2014;6(2):151–162. URL: https://journal.r-project.org/archive/2014-2/korkmaz-goksuluk-zararsiz.pdf