FlexMix Driver for Regularized Multivariate Normal Mixtures

This model driver implements the regularization method as introduced by Fraley and Raftery (2007) for univariate normal mixtures. Default parameters for the regularization are taken from that paper. We extend this to the multivariate case assuming independence between variables within components, i.e., we only implement the special case where the covariance matrix is diagonal. For more general applications of normal mixtures, see package mclust.

FLXMCregnorm(
  formula = . ~ .,
  zeta_p = NULL,
  kappa_p = 0.01,
  nu_p = 3,
  G = NULL
)

Arguments

formula: A formula which is interpreted relative to the formula specified in the call to flexmix::flexmix() using stats::update.formula(). Only the left-hand side (response) of the formula is used. Default is to use the original model formula specified in flexmix::flexmix().
zeta_p: Scale (hyperparameter for IG prior). If not given, the empirical variance divided by the square of the number of components is used, as per Fraley and Raftery (2007).
kappa_p: Shrinkage parameter (hyperparameter for the normal prior on the mean). Acts as if kappa_p observations located at the overall mean were added to each component.
nu_p: Degrees of freedom (hyperparameter for IG prior)
G: Number of components in the mixture model (not used if zeta_p is given)

Value

An object of class "FLXC".

Details

For the regularization, the conjugate prior distributions for the normal distribution are used:

Normal prior with parameter mu_p and sigma^2/kappa_p for the mean.
Inverse Gamma prior with parameters nu_p/2 and zeta_p^2/2 for the variance.
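
Written out per variable and component, the two priors above can be stated as follows. This is only a restatement in the notation of this page; the symbols mu and sigma^2 for the component mean and variance, and the shorthand IG for the inverse gamma distribution, are assumed here for illustration.

```latex
\mu \mid \sigma^2 \sim \mathcal{N}\!\left(\mu_p,\ \frac{\sigma^2}{\kappa_p}\right),
\qquad
\sigma^2 \sim \mathrm{IG}\!\left(\frac{\nu_p}{2},\ \frac{\zeta_p^2}{2}\right)
```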

mu_p is computed from the data as the overall mean of each variable, pooled across all components.
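
The shrinkage role of kappa_p can be illustrated with the standard conjugate-prior result for the mean: the estimate is a weighted average of the component mean and mu_p, with kappa_p acting as a number of pseudo-observations. The sketch below is not the code used inside FLXMCregnorm(); shrunk_mean() is a hypothetical helper, and the hard 0/1 weights stand in for the posterior weights that an EM step would supply.

```r
# Illustrative sketch only; not the internals of FLXMCregnorm().
# shrunk_mean() is a hypothetical helper name.
shrunk_mean <- function(y, w, kappa_p = 0.01) {
  mu_p <- colMeans(y)              # prior mean: overall mean of each variable
  n_k <- sum(w)                    # effective size of the component
  ybar_k <- colSums(w * y) / n_k   # weighted mean of each variable in the component
  # kappa_p behaves like kappa_p extra observations located at mu_p
  (n_k * ybar_k + kappa_p * mu_p) / (n_k + kappa_p)
}

y <- as.matrix(iris[, 1:4])
w <- as.numeric(iris$Species == "setosa")  # hard weights, for illustration only

shrunk_mean(y, w, kappa_p = 0.01)  # stays close to the setosa means
shrunk_mean(y, w, kappa_p = 1000)  # pulled towards the overall means
```

With the default kappa_p = 0.01 the prior contributes far less than one observation, so the estimate barely moves unless a component is nearly empty.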

A value for the scale hyperparameter zeta_p may be specified directly. Otherwise, the empirical variance divided by the square of the number of components is used, as per Fraley and Raftery (2007); in that case the number of components (argument G) needs to be specified.

References

Ernst, D, Ortega Menjivar, L, Scharl, T, Grün, B (2025). Ordinal Clustering with the flex-Scheme. Austrian Journal of Statistics. Submitted manuscript.
Fraley, C, Raftery, AE (2007). Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering. Journal of Classification, 24(2), 155-181.

Examples

library("flexmix")
library("flexord")
library("flexclust")

# example data
data("iris", package = "datasets")
my_iris <- subset(iris, select=setdiff(colnames(iris), "Species")) |>
    as.matrix()

# fit one model with a scale parameter similar to the default for 3 components
m1 <- stepFlexmix(my_iris~1,
                 model=FLXMCregnorm(zeta_p=c(0.23, 0.06, 1.04, 0.19)),
                 k=3)
#> 3 : * * *

summary(m1)
#> 
#> Call:
#> stepFlexmix(my_iris ~ 1, model = FLXMCregnorm(zeta_p = c(0.23, 
#>     0.06, 1.04, 0.19)), k = 3)
#> 
#>        prior size post>0 ratio
#> Comp.1 0.383   71    150 0.473
#> Comp.2 0.275   29    150 0.193
#> Comp.3 0.343   50    150 0.333
#> 
#> 'log Lik.' -683.2037 (df=26)
#> AIC: 1418.407   BIC: 1496.684 
#> 

# adjusted Rand index of clusters vs species
randIndex(clusters(m1), iris$Species)
#>       ARI 
#> 0.6311837 

# fit one model with the default scale parameter
m2 <- stepFlexmix(my_iris~1,
                 model=FLXMCregnorm(G=3),
                 k=3)
#> 3 : * * *

summary(m2)
#> 
#> Call:
#> stepFlexmix(my_iris ~ 1, model = FLXMCregnorm(G = 3), k = 3)
#> 
#>        prior size post>0  ratio
#> Comp.1 0.552   94    150 0.6267
#> Comp.2 0.107    6    146 0.0411
#> Comp.3 0.341   50    150 0.3333
#> 
#> 'log Lik.' -684.242 (df=26)
#> AIC: 1420.484   BIC: 1498.76 
#> 

# adjusted Rand index of clusters vs species
randIndex(clusters(m2), iris$Species)
#>       ARI 
#> 0.5596496 


# adjusted Rand index between both models (expected around 0.8; the value
# varies between runs because stepFlexmix() uses random initializations)
randIndex(clusters(m1), clusters(m2))
#>       ARI 
#> 0.6833932