This model driver can be used to cluster data using the beta-binomial distribution.

FLXMCregbetabinom(
  formula = . ~ .,
  size,
  alpha = 0,
  eps = sqrt(.Machine$double.eps)
)

Arguments

formula

A formula which is interpreted relative to the formula specified in the call to flexmix::flexmix() using stats::update.formula(). Only the left-hand side (response) of the formula is used. Default is to use the original model formula specified in flexmix::flexmix().

size

Number of trials (one or more).

alpha

A non-negative scalar acting as regularization parameter. Can be regarded as adding alpha observations equal to the population mean to each component.

eps

Lower threshold for the shape parameters a and b.
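
For orientation, a and b are the shape parameters of the beta distribution that is mixed over the binomial success probability. The following base-R sketch evaluates the resulting probability mass function. The helper name dbetabinom_sketch is hypothetical and not part of flexord, and the sketch does not show how the driver applies eps.

```r
# Hypothetical helper (not part of flexord): beta-binomial pmf,
# P(X = x) = choose(size, x) * B(x + a, size - x + b) / B(a, b),
# evaluated on the log scale for numerical stability.
dbetabinom_sketch <- function(x, size, a, b) {
  exp(lchoose(size, x) + lbeta(x + a, size - x + b) - lbeta(a, b))
}

size <- 4
x <- 0:size

# Probabilities over all possible counts 0..size
p <- dbetabinom_sketch(x, size, a = 2, b = 3)

# With a = b = 1 the mixing distribution is uniform and
# every count is equally likely
p_unif <- dbetabinom_sketch(x, size, a = 1, b = 1)

# For large a and b the pmf approaches the binomial
# with success probability a / (a + b)
p_big <- dbetabinom_sketch(x, size, a = 2e6, b = 3e6)
p_binom <- dbinom(x, size, prob = 0.4)
```

Small shape parameters put most of the mass on the extreme counts 0 and size, which is presumably why a lower threshold on a and b is useful.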

Value

An object of class "FLXC".

Details

Using a regularization parameter alpha greater than zero can be viewed as adding alpha observations equal to the population mean to each component. This can be used to avoid degenerate solutions (i.e., probabilities of 0 or 1). It also makes the clusters more similar to each other as alpha increases. For small values of alpha, however, this effect is mostly negligible.
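
The pseudo-observation view can be illustrated with a plain binomial proportion. This is only a sketch under simplifying assumptions: the helper reg_prob and the value of pop_prob are hypothetical, and the driver's actual update of the shape parameters a and b is not reproduced here.

```r
# Hypothetical helper (not part of flexord): success probability of one
# cluster after adding `alpha` pseudo-observations whose value equals the
# population mean (size * pop_prob successes in size trials each).
reg_prob <- function(x, size, alpha, pop_prob) {
  (sum(x) + alpha * size * pop_prob) / (length(x) * size + alpha * size)
}

size <- 4
x_cluster <- rep(0, 20)  # a cluster in which no success was observed
pop_prob <- 0.3          # assumed population-level success probability

# alpha = 0: unregularized estimate, sits exactly on the boundary
p0 <- reg_prob(x_cluster, size, alpha = 0, pop_prob)

# alpha = 1: pulled slightly away from the boundary
p1 <- reg_prob(x_cluster, size, alpha = 1, pop_prob)

# alpha = 100: dominated by the population mean
p100 <- reg_prob(x_cluster, size, alpha = 100, pop_prob)
```

A small alpha moves the estimate off the boundary while leaving it close to the cluster's own data. A large alpha pulls every cluster toward the same population value, which is the loss of separation described above.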

References

Ernst, D, Ortega Menjivar, L, Scharl, T, Grün, B (2025). Ordinal Clustering with the flex-Scheme. Austrian Journal of Statistics. Submitted manuscript.

Kondofersky, I (2008). Modellbasiertes Clustern mit der Beta-Binomialverteilung. Bachelor's thesis, Ludwig-Maximilians-Universität München.

Examples

library("flexmix")
#> Loading required package: lattice
library("flexord")
library("flexclust")
#> Loading required package: grid
#> Loading required package: modeltools
#> Loading required package: stats4

# Sample data
k <- 4     # nr of clusters
size <- 4  # nr of trials
N <- 100   # obs. per cluster

set.seed(0xdeaf)

# random probabilities per component
probs <- lapply(seq_len(k), \(ki) runif(10, 0.01, 0.99))

# sample data
dat <- lapply(probs, \(p) {
    lapply(p, \(p_i) {
        rbinom(N, size, p_i)
    }) |> do.call(cbind, args=_)
}) |> do.call(rbind, args=_)

true_clusters <- rep(seq_len(k), each = N)

# The sample data are drawn from a binomial distribution, but we fit a
# beta-binomial model. This is a slight misspecification; the
# beta-binomial can be seen as a generalization of the binomial.
m <- flexmix(dat~1, model=FLXMCregbetabinom(size=size, alpha=0),
             cluster = true_clusters)

# \donttest{
# Cluster without regularization
m1 <- stepFlexmix(dat~1, model=FLXMCregbetabinom(size=size, alpha=0), k=k)

# Cluster with regularization
m2 <- stepFlexmix(dat~1, model=FLXMCregbetabinom(size=size, alpha=1), k=k)

# Both models are mostly able to reconstruct the true clusters (ARI ~ 0.95)
# (it's a very easy clustering problem)
# Small values for the regularization don't seem to affect the ARI (much)
randIndex(clusters(m1), true_clusters)
randIndex(clusters(m2), true_clusters)
# }