reference – MVN

Function Reference

`mvn()`

Purpose: Conduct multivariate normality testing, descriptive statistics, outlier detection, and transformations in one wrapper.

Usage:

mvn(
  data,
  subset = NULL,
  mvn_test = "hz",
  use_population = TRUE,
  tol = 1e-25,
  alpha = 0.05,
  scale = FALSE,
  descriptives = TRUE,
  transform = "none",
  R = 1000,
  univariate_test = "AD",
  multivariate_outlier_method = "none",
  box_cox_transform = FALSE,
  box_cox_transform_type = "optimal",
  show_new_data = FALSE,
  tidy = TRUE
)

data: Data frame or matrix of numeric variables.
subset: (Optional) Grouping variable name for subset analyses.
mvn_test: One of "hz", "royston", "mardia", "doornik_hansen", "energy".
use_population: Logical; if TRUE, use population covariance matrix (default is TRUE).
tol: Numeric tolerance for matrix inversion via solve() (default 1e-25).
alpha: Significance level for ARW outlier cutoff if multivariate_outlier_method = "adj" (default 0.05).
scale: Logical; if TRUE, standardizes the variables before analysis.
descriptives: Logical; compute descriptive statistics if TRUE (default TRUE).
transform: One of "none", "log", "sqrt", or "square". Applies marginal transformation before analysis.
R: Number of bootstrap replicates for the "energy" test. Default is 1000.
univariate_test: One of "SW", "CVM", "Lillie", "SF", "AD". Default is "AD".
multivariate_outlier_method: One of "none", "quan", "adj".
box_cox_transform: Logical; if TRUE, applies Box-Cox transformation (default FALSE).
box_cox_transform_type: Either "optimal" or "rounded" lambda for Box-Cox (default "optimal").
show_new_data: Logical; if TRUE, include data with outliers removed (default FALSE).
tidy: Logical; if TRUE, returns tidy-format results with a Group column (default TRUE).

`summary.mvn()`

Purpose: Provide a structured summary of results from an object of class mvn, including multivariate and univariate test results, descriptive statistics, and outliers (if applicable).

Usage:

summary(object, select = c("mvn", "univariate", "descriptives", "outliers", "new_data"))

object: An object of class mvn, as returned by the mvn() function.
select: Character vector specifying which components to display. Must include one or more of "mvn", "univariate", "descriptives", "outliers", or "new_data". Defaults to all.
…: Additional arguments (currently unused).

`plot.mvn()`

Purpose: Generate diagnostic plots for objects of class mvn, including multivariate Q-Q plots, kernel density plots (3D or contour), univariate plots (Q-Q, histograms, boxplots), and multivariate outlier plots.

Usage:

plot(x, diagnostic = c("multivariate", "univariate", "outlier"), type = NULL, interactive = FALSE)

x: An object of class mvn, as returned by the mvn() function.
…: Additional arguments passed to internal plotting functions:
- diagnostic: Type of diagnostic plot to display — one of "multivariate", "univariate", "outlier".
- type: Specific plot type (e.g., "qq", "boxplot", "persp").
- interactive: Logical; if TRUE, uses interactive plotting with plotly (only for univariate plots).

`hz()`

Purpose: Perform Henze-Zirkler’s test for multivariate normality using a log-normal approximation of the test statistic.

Usage:

hz(data, use_population = TRUE, tol = 1e-25)

data: A numeric data frame or matrix where rows are observations and columns are variables.
use_population: Logical; if TRUE, uses population covariance matrix. Default is TRUE.
tol: Tolerance value for matrix inversion (solve()); default is 1e-25.

`mardia()`

Purpose: Perform Mardia’s skewness and kurtosis tests for assessing multivariate normality.

Usage:

mardia(data, use_population = TRUE, tol = 1e-25)

data: A numeric matrix or data frame with observations in rows and variables in columns.
use_population: Logical; if TRUE, uses population covariance matrix. Default is TRUE.
tol: Tolerance value used during matrix inversion with solve(). Default is 1e-25.

`royston()`

Purpose: Perform Royston’s multivariate normality test by combining univariate Shapiro-Wilk or Shapiro-Francia statistics and adjusting for variable correlations.

Usage:

royston(data, tol = 1e-25)

data: A numeric matrix or data frame with observations in rows and variables in columns.
tol: Numeric tolerance used for matrix inversion via solve(). Default is 1e-25.

`doornik_hansen()`

Purpose: Perform the Doornik-Hansen omnibus test for multivariate normality using transformed data to combine skewness and kurtosis measures.

Usage:

doornik_hansen(data)

data: A numeric matrix or data frame with observations in rows and variables in columns.

`energy()`

Purpose: Perform the E-statistic test (Energy test) for multivariate normality using parametric bootstrap to estimate the null distribution.

Usage:

energy(data, R = 1000, seed = 123)

data: A numeric matrix or data frame with observations in rows and variables in columns.
R: Integer; number of bootstrap replicates. Default is 1000.
seed: Optional integer for setting random seed to ensure reproducibility.

`mv_outlier()`

Purpose: Identify multivariate outliers using robust Mahalanobis distances with either a quantile-based or ARW-adjusted cutoff. Optionally generates a Q-Q plot.

Usage:

mv_outlier(
  data,
  outlier = TRUE,
  qqplot = TRUE,
  alpha = 0.05,
  method = "quan",
  label = TRUE,
  title = "Chi-Square Q-Q Plot"
)

data: A numeric matrix or data frame with rows as observations and at least two numeric columns.
outlier: Logical; if TRUE, includes Mahalanobis distances and outlier flags in the output. Default is TRUE.
qqplot: Logical; if TRUE, generates a chi-square Q-Q plot for visualizing outliers. Default is TRUE.
alpha: Numeric; significance level used for ARW-adjusted cutoff. Default is 0.05.
method: Outlier detection method. Must be either "quan" or "adj". Default is "quan".
label: Logical; if TRUE and qqplot = TRUE, labels outliers in the plot. Default is TRUE.
title: Character string for plot title. Default is "Chi-Square Q-Q Plot".

`multivariate_diagnostic_plot()`

Purpose: Generate Mahalanobis Q-Q plots or kernel density visualizations for two numeric variables to assess multivariate normality or bivariate distribution shape.

Usage:

multivariate_diagnostic_plot(data, type = "qq", tol = 1e-25, use_population = TRUE)

data: A numeric vector, matrix, or data frame. Must contain exactly two numeric variables. Non-numeric columns are dropped; incomplete rows are removed.
type: One of "qq", "persp", or "contour". Default is "qq".
"qq": Mahalanobis Q-Q plot
"persp": 3D KDE surface (interactive)
"contour": 2D KDE contour (interactive)
tol: Tolerance value used during matrix inversion. Default is 1e-25.
use_population: Logical; if TRUE, uses population covariance matrix. Default is TRUE.

`univariate_diagnostic_plot()`

Purpose: Generate diagnostic plots for univariate or multivariate numeric data, including Q-Q plots, histograms with density overlays, boxplots, and scatterplot matrices.

Usage:

univariate_diagnostic_plot(data, type = "qq", title = NULL, interactive = FALSE)

data: A numeric vector, matrix, or data frame with observations in rows and variables in columns.
type: Character string specifying the type of plot to create. One of:
- "qq": Q-Q plots
"histogram": Histograms with density curves
"boxplot": Boxplots
"scatter": Scatterplot matrix
title: Optional character string specifying a custom plot title.
interactive: Logical; if TRUE, returns an interactive plotly plot.

`test_univariate_normality()`

Purpose: Perform a univariate normality test on each numeric variable in a vector, matrix, or data frame.

Usage:

test_univariate_normality(data, test = "SW")

data: A numeric vector, matrix, or data frame. Non-numeric columns are removed with a warning.
test: Character string specifying the test to apply. Options include:
- "SW": Shapiro-Wilk
"SF": Shapiro-Francia
"AD": Anderson-Darling
"CVM": Cramér-von Mises
"Lillie": Lilliefors

`descriptives()`

Purpose: Compute descriptive statistics for each numeric variable in a data frame, matrix, or vector.

Usage:

descriptives(data)

data: A numeric vector, matrix, or data frame with observations in rows and variables in columns.

`box_cox_transform()`

Purpose: Apply Box-Cox power transformation to each numeric variable in the input data using either estimated or rounded lambda values.

Usage:

box_cox_transform(data, type = "optimal")

data: A numeric vector, matrix, or data frame with observations in rows and variables in columns.
type: Character; either "optimal" (use estimated lambda) or "rounded" (use rounded lambda). Default is "optimal".

Function Reference

mvn()

summary.mvn()

plot.mvn()

hz()

mardia()

royston()

doornik_hansen()

energy()

mv_outlier()

multivariate_diagnostic_plot()

univariate_diagnostic_plot()

test_univariate_normality()

"Lillie": Lilliefors

descriptives()

box_cox_transform()

`mvn()`

`summary.mvn()`

`plot.mvn()`

`hz()`

`mardia()`

`royston()`

`doornik_hansen()`

`energy()`

`mv_outlier()`

`multivariate_diagnostic_plot()`

`univariate_diagnostic_plot()`

`test_univariate_normality()`

`"Lillie"`: Lilliefors

`descriptives()`

`box_cox_transform()`