Function Reference
mvn()
Purpose: Conduct multivariate normality testing, descriptive statistics, outlier detection, and transformations in one wrapper.
Usage:
mvn(
data,
subset = NULL,
mvn_test = "hz",
use_population = TRUE,
tol = 1e-25,
alpha = 0.05,
scale = FALSE,
descriptives = TRUE,
transform = "none",
R = 1000,
univariate_test = "AD",
multivariate_outlier_method = "none",
box_cox_transform = FALSE,
box_cox_transform_type = "optimal",
show_new_data = FALSE,
tidy = TRUE
)
- data: Data frame or matrix of numeric variables.
- subset: (Optional) Grouping variable name for subset analyses.
- mvn_test: One of "hz", "royston", "mardia", "doornik_hansen", "energy".
- use_population: Logical; if TRUE, use population covariance matrix (default is TRUE).
- tol: Numeric tolerance for matrix inversion via solve() (default 1e-25).
- alpha: Significance level for ARW outlier cutoff if multivariate_outlier_method = "adj" (default 0.05).
- scale: Logical; if TRUE, standardizes the variables before analysis.
- descriptives: Logical; compute descriptive statistics if TRUE (default TRUE).
- transform: One of "none", "log", "sqrt", or "square". Applies marginal transformation before analysis.
- R: Number of bootstrap replicates for the "energy" test. Default is 1000.
- univariate_test: One of "SW", "CVM", "Lillie", "SF", "AD". Default is "AD".
- multivariate_outlier_method: One of "none", "quan", "adj".
- box_cox_transform: Logical; if TRUE, applies Box-Cox transformation (default FALSE).
- box_cox_transform_type: Either "optimal" or "rounded" lambda for Box-Cox (default "optimal").
- show_new_data: Logical; if TRUE, include data with outliers removed (default FALSE).
- tidy: Logical; if TRUE, returns tidy-format results with a Group column (default TRUE).
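As a rough illustration of the signature above, the following sketch calls mvn() on the four numeric columns of R's built-in iris data and prints part of the result with summary(). It assumes the installed MVN version exposes exactly the snake_case arguments listed here (taken to be version 6.0 or later, which is an assumption); the call is skipped otherwise. The structure of the returned object is not inspected, since it is not described in this section.

```r
# Example data: the four numeric measurements from the built-in iris data set.
dat <- iris[, 1:4]

# Assumption: MVN >= 6.0 provides the argument names documented above.
if (requireNamespace("MVN", quietly = TRUE) &&
    utils::packageVersion("MVN") >= "6.0.0") {
  res <- MVN::mvn(
    data = dat,
    mvn_test = "hz",          # Henze-Zirkler multivariate test
    univariate_test = "AD",   # Anderson-Darling for each variable
    descriptives = TRUE
  )
  # Display only the multivariate and univariate test tables.
  print(summary(res, select = c("mvn", "univariate")))
} else {
  message("MVN (>= 6.0) is not installed; skipping the example.")
}
```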
summary.mvn()
Purpose: Provide a structured summary of results from an object of class mvn, including multivariate and univariate test results, descriptive statistics, and outliers (if applicable).
Usage:
summary(object, select = c("mvn", "univariate", "descriptives", "outliers", "new_data"))
- object: An object of class mvn, as returned by the mvn() function.
- select: Character vector specifying which components to display. Must include one or more of "mvn", "univariate", "descriptives", "outliers", or "new_data". Defaults to all.
- ...: Additional arguments (currently unused).
plot.mvn()
Purpose: Generate diagnostic plots for objects of class mvn, including multivariate Q-Q plots, kernel density plots (3D or contour), univariate plots (Q-Q, histograms, boxplots), and multivariate outlier plots.
Usage:
plot(x, diagnostic = c("multivariate", "univariate", "outlier"), type = NULL, interactive = FALSE)
- x: An object of class mvn, as returned by the mvn() function.
- ...: Additional arguments passed to internal plotting functions:
  - diagnostic: Type of diagnostic plot to display; one of "multivariate", "univariate", "outlier".
  - type: Specific plot type (e.g., "qq", "boxplot", "persp").
  - interactive: Logical; if TRUE, uses interactive plotting with plotly (only for univariate plots).
hz()
Purpose: Perform Henze-Zirkler’s test for multivariate normality using a log-normal approximation of the test statistic.
Usage:
hz(data, use_population = TRUE, tol = 1e-25)
- data: A numeric data frame or matrix where rows are observations and columns are variables.
- use_population: Logical; if TRUE, uses population covariance matrix. Default is TRUE.
- tol: Tolerance value for matrix inversion (solve()); default is 1e-25.
mardia()
Purpose: Perform Mardia’s skewness and kurtosis tests for assessing multivariate normality.
Usage:
mardia(data, use_population = TRUE, tol = 1e-25)
- data: A numeric matrix or data frame with observations in rows and variables in columns.
- use_population: Logical; if TRUE, uses population covariance matrix. Default is TRUE.
- tol: Tolerance value used during matrix inversion with solve(). Default is 1e-25.
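To make the roles of use_population and tol more concrete, here is a minimal base-R sketch of the textbook form of Mardia's skewness and kurtosis statistics. It is not the package's implementation: the function name mardia_sketch is hypothetical, "population covariance" is assumed to mean the divisor-n estimator, and any small-sample corrections are left out.

```r
# Textbook Mardia statistics in base R (illustrative sketch only).
mardia_sketch <- function(data, use_population = TRUE, tol = 1e-25) {
  x <- as.matrix(data)
  n <- nrow(x)
  p <- ncol(x)
  xc <- scale(x, center = TRUE, scale = FALSE)   # centre each column
  s <- stats::cov(x)                             # sample covariance (divisor n - 1)
  if (use_population) s <- s * (n - 1) / n       # assumption: divisor n
  d <- xc %*% solve(s, tol = tol) %*% t(xc)      # n x n matrix of scaled cross products

  b1 <- sum(d^3) / n^2         # multivariate skewness
  b2 <- sum(diag(d)^2) / n     # multivariate kurtosis

  skew_stat <- n * b1 / 6                                      # approx. chi-square
  skew_df <- p * (p + 1) * (p + 2) / 6
  kurt_stat <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # approx. N(0, 1)

  list(
    skewness = b1,
    kurtosis = b2,
    skew_df = skew_df,
    p_skewness = stats::pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
    p_kurtosis = 2 * stats::pnorm(abs(kurt_stat), lower.tail = FALSE)
  )
}

out <- mardia_sketch(iris[, 1:4])
str(out)
```

The tol value is passed straight to solve(), so a very small tolerance such as 1e-25 lets the inversion proceed even when the covariance matrix is close to singular.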
royston()
Purpose: Perform Royston’s multivariate normality test by combining univariate Shapiro-Wilk or Shapiro-Francia statistics and adjusting for variable correlations.
Usage:
royston(data, tol = 1e-25)
- data: A numeric matrix or data frame with observations in rows and variables in columns.
- tol: Numeric tolerance used for matrix inversion via solve(). Default is 1e-25.
doornik_hansen()
Purpose: Perform the Doornik-Hansen omnibus test for multivariate normality using transformed data to combine skewness and kurtosis measures.
Usage:
doornik_hansen(data)
- data: A numeric matrix or data frame with observations in rows and variables in columns.
energy()
Purpose: Perform the E-statistic test (Energy test) for multivariate normality using parametric bootstrap to estimate the null distribution.
Usage:
energy(data, R = 1000, seed = 123)
- data: A numeric matrix or data frame with observations in rows and variables in columns.
- R: Integer; number of bootstrap replicates. Default is 1000.
- seed: Optional integer for setting random seed to ensure reproducibility.
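The R and seed arguments are easier to reason about with a small picture of what a parametric bootstrap does. The sketch below is an assumption-laden illustration in base R: it does not compute the energy statistic. Instead it uses a hypothetical stand-in statistic, kurt_dev() (the absolute deviation of Mardia-type kurtosis from its expected value), purely to show how R simulated normal samples can approximate a null distribution and how a fixed seed makes the p-value reproducible.

```r
set.seed(123)                      # fixed seed, so the bootstrap is reproducible
x <- as.matrix(iris[, 1:4])
n <- nrow(x)
p <- ncol(x)

# Hypothetical stand-in statistic (NOT the energy statistic):
# |mean squared Mahalanobis distance^2 - p(p + 2)|, which is affine invariant.
kurt_dev <- function(m) {
  mc <- scale(m, center = TRUE, scale = FALSE)
  s <- stats::cov(m) * (nrow(m) - 1) / nrow(m)
  d2 <- rowSums((mc %*% solve(s)) * mc)   # squared Mahalanobis distances
  abs(mean(d2^2) - ncol(m) * (ncol(m) + 2))
}

R <- 200                           # number of bootstrap replicates (kept small here)
observed <- kurt_dev(x)

# Because the statistic is affine invariant, the null distribution can be
# simulated from a standard multivariate normal with the same n and p.
null_stats <- replicate(R, kurt_dev(matrix(stats::rnorm(n * p), n, p)))

# Bootstrap p-value with the usual +1 correction.
p_value <- (1 + sum(null_stats >= observed)) / (R + 1)
p_value
```

Larger values of R give a smoother estimate of the null distribution at a proportionally higher cost.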
mv_outlier()
Purpose: Identify multivariate outliers using robust Mahalanobis distances with either a quantile-based or ARW-adjusted cutoff. Optionally generates a Q-Q plot.
Usage:
mv_outlier(
data,
outlier = TRUE,
qqplot = TRUE,
alpha = 0.05,
method = "quan",
label = TRUE,
title = "Chi-Square Q-Q Plot"
)
- data: A numeric matrix or data frame with rows as observations and at least two numeric columns.
- outlier: Logical; if TRUE, includes Mahalanobis distances and outlier flags in the output. Default is TRUE.
- qqplot: Logical; if TRUE, generates a chi-square Q-Q plot for visualizing outliers. Default is TRUE.
- alpha: Numeric; significance level used for ARW-adjusted cutoff. Default is 0.05.
- method: Outlier detection method. Must be either "quan" or "adj". Default is "quan".
- label: Logical; if TRUE and qqplot = TRUE, labels outliers in the plot. Default is TRUE.
- title: Character string for plot title. Default is "Chi-Square Q-Q Plot".
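The idea behind a quantile-based cutoff on robust Mahalanobis distances can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: it uses MASS::cov.rob() with method = "mcd" as the robust estimator and the 0.975 chi-square quantile as the cutoff, both of which are choices made for this example. The ARW-adjusted cutoff ("adj") is not shown.

```r
set.seed(1)
# 100 clean bivariate normal points plus one planted outlier in the last row.
x <- rbind(matrix(stats::rnorm(200), ncol = 2), c(10, 10))

# Robust location and scatter (minimum covariance determinant).
rob <- MASS::cov.rob(x, method = "mcd")

# Squared robust Mahalanobis distance of every observation.
d2 <- stats::mahalanobis(x, center = rob$center, cov = rob$cov)

# Assumed quantile-based cutoff: 97.5% quantile of chi-square with p df.
cutoff <- stats::qchisq(0.975, df = ncol(x))
is_outlier <- d2 > cutoff

which(is_outlier)   # row indices flagged as outliers
```

Because the centre and scatter are estimated from the bulk of the data, a gross outlier such as the planted point cannot mask itself by inflating the covariance estimate.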
multivariate_diagnostic_plot()
Purpose: Generate Mahalanobis Q-Q plots or kernel density visualizations for two numeric variables to assess multivariate normality or bivariate distribution shape.
Usage:
multivariate_diagnostic_plot(data, type = "qq", tol = 1e-25, use_population = TRUE)
- data: A numeric vector, matrix, or data frame. Must contain exactly two numeric variables. Non-numeric columns are dropped; incomplete rows are removed.
- type: One of "qq", "persp", or "contour". Default is "qq".
  - "qq": Mahalanobis Q-Q plot
  - "persp": 3D KDE surface (interactive)
  - "contour": 2D KDE contour (interactive)
- tol: Tolerance value used during matrix inversion. Default is 1e-25.
- use_population: Logical; if TRUE, uses population covariance matrix. Default is TRUE.
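For readers who want to see what a Mahalanobis Q-Q plot consists of, the following base-R sketch builds one by hand for two iris variables. It is only an approximation of the idea: the divisor-n covariance is an assumption about what use_population means, and the plotting positions from ppoints() are a choice made for this example.

```r
x <- as.matrix(iris[, 1:2])        # exactly two numeric variables
n <- nrow(x)
p <- ncol(x)

# Assumed population covariance: sample covariance rescaled to divisor n.
s <- stats::cov(x) * (n - 1) / n

# Squared Mahalanobis distances; tol is forwarded to solve().
d2 <- stats::mahalanobis(x, center = colMeans(x), cov = s, tol = 1e-25)

# Pair the ordered distances with chi-square quantiles (p degrees of freedom).
qq <- data.frame(
  theoretical = stats::qchisq(stats::ppoints(n), df = p),
  observed = sort(d2)
)

plot(qq$theoretical, qq$observed,
     xlab = "Chi-square quantile",
     ylab = "Squared Mahalanobis distance",
     main = "Chi-Square Q-Q Plot")
abline(0, 1, lty = 2)              # reference line under multivariate normality
```

Points that follow the dashed line are consistent with multivariate normality; systematic curvature or isolated points in the upper right suggest departures or outliers.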
univariate_diagnostic_plot()
Purpose: Generate diagnostic plots for univariate or multivariate numeric data, including Q-Q plots, histograms with density overlays, boxplots, and scatterplot matrices.
Usage:
univariate_diagnostic_plot(data, type = "qq", title = NULL, interactive = FALSE)
- data: A numeric vector, matrix, or data frame with observations in rows and variables in columns.
- type: Character string specifying the type of plot to create. One of:
  - "qq": Q-Q plots
  - "histogram": Histograms with density curves
  - "boxplot": Boxplots
  - "scatter": Scatterplot matrix
- title: Optional character string specifying a custom plot title.
- interactive: Logical; if TRUE, returns an interactive plotly plot.
test_univariate_normality()
Purpose: Perform a univariate normality test on each numeric variable in a vector, matrix, or data frame.
Usage:
test_univariate_normality(data, test = "SW")
- data: A numeric vector, matrix, or data frame. Non-numeric columns are removed with a warning.
- test: Character string specifying the test to apply. Options include:
  - "SW": Shapiro-Wilk
  - "SF": Shapiro-Francia
  - "AD": Anderson-Darling
  - "CVM": Cramér-von Mises
  - "Lillie": Lilliefors
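Applying one univariate test to every numeric column can be sketched with base R alone. The example below runs stats::shapiro.test() (the "SW" option) on each iris measurement and collects the results in a data frame; the column names of that data frame are chosen for this illustration and are not necessarily what test_univariate_normality() returns.

```r
dat <- iris[, 1:4]

# Run the Shapiro-Wilk test on each column.
sw <- lapply(dat, stats::shapiro.test)

# Collect statistic and p-value per variable (column names are illustrative).
result <- data.frame(
  variable = names(sw),
  statistic = vapply(sw, function(t) unname(t$statistic), numeric(1)),
  p_value = vapply(sw, function(t) t$p.value, numeric(1)),
  row.names = NULL
)

result
```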
descriptives()
Purpose: Compute descriptive statistics for each numeric variable in a data frame, matrix, or vector.
Usage:
descriptives(data)
- data: A numeric vector, matrix, or data frame with observations in rows and variables in columns.
box_cox_transform()
Purpose: Apply Box-Cox power transformation to each numeric variable in the input data using either estimated or rounded lambda values.
Usage:
box_cox_transform(data, type = "optimal")
- data: A numeric vector, matrix, or data frame with observations in rows and variables in columns.
- type: Character; either "optimal" (use estimated lambda) or "rounded" (use rounded lambda). Default is "optimal".
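The difference between an estimated and a rounded lambda can be illustrated with a small base-R sketch. Everything here is an assumption made for the example: the helper names box_cox(), profile_loglik(), and estimate_lambda() are hypothetical, lambda is estimated by maximising the normal profile log-likelihood over [-3, 3], and "rounded" is taken to mean rounding to the nearest 0.5. The package may estimate and round lambda differently.

```r
# Box-Cox power transformation for strictly positive data.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda under a normal model for the transformed data.
profile_loglik <- function(lambda, y) {
  z <- box_cox(y, lambda)
  n <- length(y)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y))
}

# Estimate lambda by maximising the profile log-likelihood (search range assumed).
estimate_lambda <- function(y, interval = c(-3, 3)) {
  stats::optimize(profile_loglik, interval = interval, y = y, maximum = TRUE)$maximum
}

set.seed(42)
y <- stats::rlnorm(500)                        # log-normal data, so lambda near 0 is expected

lambda_opt <- estimate_lambda(y)               # "optimal": use the estimate as is
lambda_rounded <- round(lambda_opt * 2) / 2    # "rounded": nearest 0.5 (assumed rule)

y_new <- box_cox(y, lambda_rounded)
c(optimal = lambda_opt, rounded = lambda_rounded)
```

A rounded lambda trades a small loss of fit for a transformation that is easier to interpret, such as a plain log or square root.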