Data transformations can help correct skewness and stabilize variance. MVN supports simple transforms (log, square root, square) via the transform argument, as well as optimal Box–Cox transformation. Additionally, MVN provides support for a broader class of power transformations through the power_family argument. These include the standard Box–Cox, its extension that accommodates non-positive values (Box–Cox with negatives allowed), and the Yeo–Johnson transformation. Each method automatically estimates optimal lambda values for transformation and can be used in conjunction with multivariate normality tests to evaluate improvements in distributional properties.
Example Data and Baseline Assessment
# Load package and example datalibrary(MVN)df <- iris[1:50, 1:4]# Baseline Henze–Zirkler test without transformationresult <-mvn(data = df, mvn_test ="hz")summary(result, select ="mvn")
── Multivariate Normality Test Results ─────────────────────────────────────────
Test Statistic p.value Method MVN
1 Henze-Zirkler 0.949 0.05 asymptotic ✗ Not normal
The untransformed data are marginally on the threshold of normality (p = 0.05), indicating potential departure from multivariate normality.
1. Basic Transformations
Basic transformations can be used to address issues like skewness and heteroscedasticity before assessing multivariate normality. The mvn() function allows simple transformations using the transform argument. Supported options include natural logarithm (“log”), square root (“sqrt”), and squaring the values (“square”). These methods are quick to apply and can help improve the suitability of data for multivariate analysis, although their effectiveness may vary depending on the distribution of each variable.
1.1 Log Transformation
The log transformation applies the natural logarithm to all variables, which is especially useful when dealing with right-skewed data.
── Multivariate Normality Test Results ─────────────────────────────────────────
Test Statistic p.value Method MVN
1 Henze-Zirkler 0.789 0.379 asymptotic ✓ Normal
In this case, applying the log transformation yields a p-value of 0.379 from the Henze-Zirkler test, indicating that the transformed data conform well to multivariate normality.
1.2 Square-root Transformation
The square-root transformation offers a less aggressive correction than the log and is suitable for moderately skewed data.
── Multivariate Normality Test Results ─────────────────────────────────────────
Test Statistic p.value Method MVN
1 Henze-Zirkler 0.829 0.253 asymptotic ✓ Normal
After applying this transformation, the p-value is 0.253, which still supports the assumption of multivariate normality. This method retains more of the original data structure while improving distributional properties.
1.3 Square Transformation
The square transformation raises each value to the power of two. While mathematically valid, this operation tends to amplify skewness and outliers.
── Multivariate Normality Test Results ─────────────────────────────────────────
Test Statistic p.value Method MVN
1 Henze-Zirkler 1.278 <0.001 asymptotic ✗ Not normal
In the given example, squaring the data results in a p-value less than 0.001, indicating strong deviation from normality. As such, squaring is generally not advisable when the goal is to achieve or maintain multivariate normality.
2. Power Transformation
Power transformations are commonly applied in multivariate normality testing to stabilize variance and make the data more normally distributed. The mvn() function supports various power transformation families, allowing automatic selection of optimal lambda values to transform each variable accordingly. Below are three supported methods—Box-Cox, Box-Cox with negatives allowed, and Yeo-Johnson—each applied with Henze-Zirkler’s test to assess multivariate normality.
2.1. Box-Cox Transformation
The Box–Cox transformation is applied first. It estimates the optimal lambda values to transform the variables, assuming all data are strictly positive.
The Box–Cox with negatives allowed transformation extends the original Box-Cox method by enabling transformations on variables that may include non-positive values.
── Multivariate Normality Test Results ─────────────────────────────────────────
Test Statistic p.value Method MVN
1 Henze-Zirkler 0.802 0.334 asymptotic ✓ Normal
When applied to the same data, it results in a p-value of 0.334, similarly suggesting a good fit to multivariate normality. Like the standard Box-Cox, it also automatically estimates and applies suitable lambda values.
2.3. Yeo-Johnson Transformation
The Yeo-Johnson transformation is suitable for both positive and negative data and does not require shifting the variable range.
── Multivariate Normality Test Results ─────────────────────────────────────────
Test Statistic p.value Method MVN
1 Henze-Zirkler 1.925 <0.001 asymptotic ✗ Not normal
In this example, however, it yields a p-value less than 0.001, indicating strong evidence against multivariate normality even after transformation. This suggests that, for this specific dataset, Yeo-Johnson may not be as effective as the Box-Cox-based alternatives in achieving multivariate normality.
References
Korkmaz S, Goksuluk D, Zararsiz G. MVN: An R Package for Assessing Multivariate Normality. The R Journal. 2014;6(2):151–162. URL: https://journal.r-project.org/archive/2014-2/korkmaz-goksuluk-zararsiz.pdf