Package 'RGMM' reference manual

Title:	Robust Mixture Model
Description:	Algorithms for estimating robustly the parameters of a Gaussian, Student, or Laplace Mixture Model.
Authors:	Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut]
Maintainer:	Antoine Godichon-Baggioni <[email protected]>
License:	GPL (>= 2)
Version:	2.1.0
Built:	2025-03-02 04:49:53 UTC
Source:	https://github.com/cran/RGMM

Robust Mixture Model

Description

This package provides functions for robust clustering with Gaussian, Student and Laplace Mixture Models. Function RobVar robustly computes the covariance of a numerical data set whose rows are realizations of Gaussian, Student or Laplace vectors. Function RobMM clusters a numerical data set, RMMplot produces graphs for Robust Mixture Models, and Gen_MM generates possibly contaminated mixtures of Gaussian, Student and Laplace vectors.

Author(s)

Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut]

Maintainer: Antoine Godichon-Baggioni <[email protected]>

References

Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.

Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.

Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.

Gen_MM

Description

Generate a sample of a Mixture Model

Usage

Gen_MM(nk=NA, df=3, mu=NA, Sigma=FALSE, delta=0,cont="Student",
                model="Gaussian", dfcont=1, mucont=FALSE, Sigmacont=FALSE,
                minU=-20, maxU=20)

Arguments

`nk`	An integer vector containing the desired number of observations for each class. The default is `nk=rep(500,3)`.
`df`	An integer greater than or equal to `3` specifying the degrees of freedom of the Student law if `model='Student'`. Default is `3`.
`mu`	A numeric matrix whose rows correspond to the centers of the classes. By default, `mu` is generated randomly.
`Sigma`	An array containing the variance of each class. See the example for more details.
`delta`	A scalar between `0` and `1` giving the proportion of contaminated data. Default is `0`.
`cont`	The kind of contamination chosen. Can be equal to `'Unif'` or `'Student'`.
`model`	A string character specifying the model chosen for the Mixture Model. Can be equal to `'Gaussian'` (default) or `'Student'`.
`dfcont`	A positive integer specifying the degrees of freedom of the contamination laws if `cont='Student'`. Default is `1`.
`mucont`	A numeric matrix whose rows correspond to the centers of the contamination laws. By default, `mucont` is chosen equal to `mu`.
`Sigmacont`	An array containing the variance of each contamination law. By default, `Sigmacont` is chosen equal to `Sigma`.
`minU`	A scalar giving the lower bound of the uniform law of the contamination if `cont='Unif'`.
`maxU`	A scalar giving the upper bound of the uniform law of the contamination if `cont='Unif'`.

Value

A list with:

`Z`	An integer vector specifying the true classification. If `delta` is nonzero, the contaminated data are treated as a class.
`C`	A `0-1` vector specifying the contaminated data.
`X`	A numerical matrix giving the generated data.
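
To make the roles of `delta`, `minU`, `maxU` and the returned `Z`, `C`, `X` concrete, here is a minimal base R sketch of a uniformly contaminated Gaussian mixture. It is not the code behind `Gen_MM`: the way contaminated rows are selected and the label given to the contamination class are assumptions made for illustration.

```r
# Sketch only: a hand-rolled generator, not the implementation of Gen_MM.
set.seed(1)
nk    <- c(50, 50, 50)                        # class sizes
p     <- 2                                    # dimension of the data
mu    <- rbind(c(-3, -3), c(0, 0), c(3, 3))   # one center per row
delta <- 0.1                                  # proportion of contaminated data
minU  <- -20
maxU  <- 20

n <- sum(nk)
Z <- rep(seq_along(nk), times = nk)           # true class labels
# Identity variance: class center plus standard Gaussian noise
X <- mu[Z, , drop = FALSE] + matrix(rnorm(n * p), nrow = n, ncol = p)

# Assumption: contaminated rows are picked uniformly at random
n_cont <- round(delta * n)
idx <- sample.int(n, n_cont)
C <- integer(n)
C[idx] <- 1L                                  # 0-1 contamination indicator
X[idx, ] <- matrix(runif(n_cont * p, minU, maxU), nrow = n_cont, ncol = p)
# Assumption: contaminated data get the extra label length(nk) + 1
Z[idx] <- length(nk) + 1L

out <- list(Z = Z, C = C, X = X)
```

The list `out` mirrors the three components documented above, so code written against it should need little change when switched to the output of `Gen_MM`.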

Examples

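p <- 3                  # dimension of the data
nk <- rep(50, p)        # three classes of 50 observations each
mu <- c()
for (i in seq_along(nk))
{
  Z <- rnorm(p)
  # each center lies on the sphere of radius length(nk)
  mu <- rbind(mu, length(nk) * Z / sqrt(sum(Z^2)))
}
# Sigma[i, , ] is the p x p variance matrix of class i
Sigma <- array(dim = c(length(nk), p, p))
for (i in seq_along(nk))
{
  Sigma[i, , ] <- diag(p)
}
ech <- Gen_MM(nk = nk, mu = mu, Sigma = Sigma)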

RMMplot

Description

A plot function for Robust Mixture Model

Usage

RMMplot(a,outliers=TRUE,
    graph=c('Two_Dim','Two_Dim_Uncertainty','ICL','BIC',
    'Profiles','Uncertainty'),bestresult=TRUE,K=FALSE)

Arguments

`a`	Output from `RobMM`.
`outliers`	A logical indicating whether there are outliers. If `TRUE`, two-dimensional plots and profile plots are drawn without the detected outliers. Default is `TRUE`.
`graph`	A string specifying the type of graph requested. Default is `c('Two_Dim','Two_Dim_Uncertainty','ICL','BIC', 'Profiles','Uncertainty')`.
`bestresult`	A logical indicating if the graphs must be done for the result chosen by the selected criterion. Default is `TRUE`.
`K`	A logical or positive integer giving the chosen number of clusters for which the graphs should be drawn.

Examples

## Not run: 
ech <- Gen_MM(mu = matrix(c(rep(-2,3),rep(2,3),rep(0,3)),byrow = TRUE,nrow=3))
 X <- ech$X
 res <- RobMM(X , nclust=3)
 RMMplot(res,graph=c('Two_Dim'))
 
## End(Not run)

RobMM

Description

Robust Mixture Model

Usage

RobMM(X, nclust=2:5, model="Gaussian", ninit=10,
               nitermax=50, niterEM=50, niterMC=50, df=3,
               epsvp=10^(-4), mc_sample_size=1000, LogLike=-Inf,
               init='genie', epsPi=10^-4, epsout=-20,scale='none',
               alpha=0.75, c=ncol(X), w=2, epsilon=10^(-8),
               criterion='BIC',methodMC="RobbinsMC", par=TRUE,
               methodMCM="Weiszfeld")

Arguments

`X`	A matrix giving the data.
`nclust`	A vector of positive integers giving the possible number of clusters.
`model`	The mixture model. Can be `'Gaussian'` (by default), `'Student'` and `'Laplace'`.
`ninit`	The number of random initializations. Default is `10`.
`nitermax`	The number of iterations for the Weiszfeld algorithm if `methodMCM='Weiszfeld'`.
`niterEM`	The number of iterations for the EM algorithm.
`niterMC`	The number of iterations for estimating robustly the variance of each class if `methodMC='FixMC'` or `methodMC='GradMC'`.
`df`	The degrees of freedom for the Student law if `model='Student'`.
`scale`	Run the algorithm on scaled data if `scale='robust'`.
`epsvp`	The minimum value that the estimated eigenvalues of the Median Covariation Matrix can take. Default is `10^-4`.
`mc_sample_size`	The number of data generated for the Monte-Carlo method for estimating robustly the variance.
`LogLike`	The initial log-likelihood to "beat". Default is `-Inf`.
`init`	The initialization method. Can be `F` if no non-random initialization of the algorithm is done, `'genie'` if the algorithm is initialized with the function `genie` of the package `genieclust`, or `'Mclust'` if the initialization is done with the function `hclass` of the package `mclust`.
`epsPi`	A scalar ensuring that the estimated probabilities of belonging to a class are uniformly lower bounded by a positive constant.
`epsout`	If the probability that an observation belongs to a class is smaller than `exp(epsout)`, this probability is replaced by `exp(epsout)` when calculating the log-likelihood. If the probability is too small for every class, the observation is considered an outlier. Default is `-20`.
`alpha`	A scalar between 1/2 and 1 used in the stepsequence for the Robbins-Monro method if `methodMC='RobbinsMC'`.
`c`	The constant in the stepsequence if `methodMC='RobbinsMC'` or `methodMC='GradMC'`.
`w`	The power for the weighted averaged Robbins-Monro algorithm if `methodMC='RobbinsMC'`.
`epsilon`	Stopping condition for the Weiszfeld algorithm.
`criterion`	The criterion for selecting the number of clusters. Can be `'ICL'` or `'BIC'`; the usage signature sets `'BIC'` as the default.
`methodMC`	The method chosen to estimate robustly the variance. Can be `'RobbinsMC'`, `'GradMC'` or `'FixMC'`.
`par`	A logical. If `TRUE`, parallelization of the algorithm is allowed.
`methodMCM`	The method chosen for estimating the Median Covariation Matrix. Can be `'Gmedian'` or `'Weiszfeld'`.
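
The arguments `nitermax` and `epsilon` control a Weiszfeld iteration, which computes a geometric median as discussed by Vardi and Zhang (2000). The following base R sketch shows the classical iteration on a plain data matrix. It is an illustration under simple assumptions (start at the coordinate-wise mean, a small guard against zero distances) and not the routine used inside `RobMM`; the function name `weiszfeld` is hypothetical.

```r
# Sketch of the classical Weiszfeld iteration for the geometric median.
# Not the package's internal code; names and safeguards are illustrative.
weiszfeld <- function(X, nitermax = 50, epsilon = 1e-8) {
  m <- colMeans(X)                            # starting point
  for (it in seq_len(nitermax)) {
    d <- sqrt(rowSums(sweep(X, 2, m)^2))      # distances to current estimate
    d <- pmax(d, 1e-12)                       # guard against division by zero
    w <- 1 / d
    m_new <- colSums(X * w) / sum(w)          # weighted mean of the rows
    converged <- sqrt(sum((m_new - m)^2)) < epsilon
    m <- m_new
    if (converged) break
  }
  m
}

set.seed(2)
# 100 standard Gaussian points plus one gross outlier
X_demo <- rbind(matrix(rnorm(200), ncol = 2), c(1000, 1000))
med_demo  <- weiszfeld(X_demo)
mean_demo <- colMeans(X_demo)
```

On this toy data the mean is dragged toward the outlier while the geometric median stays near the origin, which is the robustness property the package relies on.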

Value

A list with:

`bestresult`	A list giving all the results for the best clustering (chosen with respect to the selected criterion).
`allresults`	A list containing all the results.
`ICL`	The ICL criterion for all the number of classes selected.
`BIC`	The BIC criterion for all the number of classes selected.
`data`	The initial data.
`nclust`	A vector of positive integers giving the possible number of clusters.
`Kopt`	The number of clusters chosen by the selected criterion.

For the lists bestresult and allresults[[k]]:

`centers`	A matrix whose rows are the centers of the classes.
`Sigma`	A matrix containing the variances of all the classes.
`LogLike`	The final LogLikelihood.
`Pi`	A matrix giving the probabilities of each data to belong to each class.
`niter`	The number of iterations of the EM algorithm.
`initEM`	A vector giving the initialized clustering if `init='Mclust'` or `init='genie'`.
`prop`	A vector giving the proportion of each class.
`outliers`	A vector giving the detected outliers.
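
A matrix of membership probabilities such as `Pi` is typically turned into a hard clustering by taking, for each observation, the class with the largest probability. The sketch below uses a small hypothetical matrix rather than real `RobMM` output, and assumes that rows are observations and columns are classes. The uncertainty measure (one minus the largest probability) is a common convention and is not confirmed to be the one plotted by `RMMplot`.

```r
# Hypothetical membership probabilities: 4 observations (rows), 3 classes (columns).
# With real output one would use res$bestresult$Pi instead (layout assumed).
Pi <- rbind(c(0.90, 0.05, 0.05),
            c(0.10, 0.80, 0.10),
            c(0.20, 0.30, 0.50),
            c(0.34, 0.33, 0.33))

# Most probable class for each observation (first maximum wins on ties)
cluster <- max.col(Pi, ties.method = "first")

# Assumed uncertainty measure: 1 minus the largest membership probability
uncertainty <- 1 - apply(Pi, 1, max)
```

The last observation is assigned to class 1 but with high uncertainty, which is the kind of point an uncertainty plot is meant to highlight.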

References

Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.

Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.

Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.

Examples

## Not run: 
ech <- Gen_MM(mu = matrix(c(rep(-2,3),rep(2,3),rep(0,3)),byrow = TRUE,nrow=3))
 X <- ech$X
 res <- RobMM(X , nclust=3)
 RMMplot(res,graph=c('Two_Dim'))
 
## End(Not run)

RobVar

Description

Robust estimate of the variance

Usage

RobVar(X, c=2, alpha=0.75, model='Gaussian', methodMCM='Weiszfeld',
                methodMC='Robbins' , mc_sample_size=1000, init=rep(0, ncol(X)),
                init_cov=diag(ncol(X)),
                epsilon=10^(-8), w=2, df=3, niterMC=50,
                cgrad=2, niterWeisz=50, epsWeisz=10^-8, alphaMedian=0.75, cmedian=2)

Arguments

`X`	A numeric matrix whose rows correspond to observations.
`c`	A positive scalar giving the constant in the stepsequence of the Robbins-Monro or Gradient method if `methodMC='RobbinsMC'` or `methodMC='GradMC'`. Default is `2`.
`alpha`	A scalar between 1/2 and 1 giving the power in the stepsequence for the Robbins-Monro algorithm if `methodMC='RobbinsMC'`. Default is `0.75`.
`model`	A string character specifying the model: can be `'Gaussian'` (default), `'Student'` or `'Laplace'`.
`methodMCM`	A string character specifying the method to estimate the Median Covariation Matrix. Can be `'Gmedian'` or `'Weiszfeld'` (default).
`methodMC`	A string character specifying the method to estimate robustly the variance. Can be `'Robbins'` (default), `'Fix'` or `'Grad'`.
`mc_sample_size`	A positive integer giving the number of data simulated for the Monte-Carlo method. Default is `1000`.
`init`	A numeric vector giving the initialization for estimating the median.
`init_cov`	A numeric matrix giving an initialization for estimating the Median Covariation Matrix.
`epsilon`	A positive scalar giving a stopping condition for the algorithm.
`w`	A positive integer specifying the power for the weighted averaged Robbins-Monro algorithm if `methodMC='RobbinsMC'`.
`df`	An integer greater than or equal to `3` specifying the degrees of freedom for the Student law if `model='Student'`. See also `Gen_MM`. Default is `3`.
`niterMC`	An integer giving the number of iterations for iterative algorithms if the selected method is `'Grad'` or `'Fix'`. Default is `50`.
`cgrad`	A numeric vector with positive values giving the stepsequence of the gradient algorithm for estimating the variance if `methodMC='Grad'`. Its length has to be equal to `niter`.
`niterWeisz`	A positive integer giving the maximum number of iterations for the Weiszfeld algorithms if `methodMCM='Weiszfeld'`. Default is `50`.
`epsWeisz`	A stopping factor for the Weiszfeld algorithm.
`alphaMedian`	A scalar between 1/2 and 1 giving the power of the stepsequence of the gradient algorithm for estimating the Median Covariation Matrix if `methodMCM='Gmedian'`. Default is `0.75`.
`cmedian`	A positive scalar giving the constant in the stepsequence of the gradient algorithm for estimating the Median Covariation Matrix if `methodMCM='Gmedian'`. Default is `2`.
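
The pair `cmedian` and `alphaMedian` describes a step sequence of the form `cmedian * n^(-alphaMedian)`, as used in the averaged stochastic gradient algorithm of Cardot, Cenac and Zitt (2013). The base R sketch below applies that recursion to the geometric median of a data matrix, processing one row at a time. It is a simplified illustration with uniform averaging; the function name `asg_median` is hypothetical and the code is not taken from the package.

```r
# Sketch of an averaged stochastic gradient estimate of the geometric median.
# Step sequence: cmedian * n^(-alphaMedian). Illustrative, not package code.
asg_median <- function(X, cmedian = 2, alphaMedian = 0.75) {
  n <- nrow(X)
  m <- X[1, ]                   # current stochastic gradient iterate
  mbar <- m                     # running average of the iterates
  for (i in 2:n) {
    diff <- X[i, ] - m
    nrm <- sqrt(sum(diff^2))
    if (nrm > 0) {
      gamma <- cmedian * (i - 1)^(-alphaMedian)
      m <- m + gamma * diff / nrm   # unit-length step toward the new point
    }
    mbar <- mbar + (m - mbar) / i   # uniform average of m_1, ..., m_i
  }
  mbar
}

set.seed(42)
X_sim <- matrix(rnorm(4000, mean = 3), ncol = 2)   # 2000 points around (3, 3)
est <- asg_median(X_sim)
```

Each update moves the iterate by a step of fixed length `gamma` regardless of how far the new observation lies, which is why a single extreme point has bounded influence.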

Value

An object of class list with the following outputs:

`median`	The median of `X`.
`variance`	The robust variance of `X`.
`covmedian`	The Median Covariation Matrix of `X`.
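
Following Cardot and Godichon-Baggioni (2017), the Median Covariation Matrix can be read as the geometric median of the matrices `(X_i - m)(X_i - m)^T`, where `m` is the geometric median of the data. The base R sketch below computes it by running a Weiszfeld iteration on the vectorized matrices. It assumes at least two columns, uses hypothetical helper names (`geo_median`, `mcm`), and is a didactic illustration, not the estimator implemented in `RobVar`.

```r
# Sketch: Median Covariation Matrix as a geometric median of outer products.
# Helper names are hypothetical; this is not the RobVar implementation.
geo_median <- function(Y, niter = 50, eps = 1e-8) {
  m <- colMeans(Y)
  for (it in seq_len(niter)) {
    d <- pmax(sqrt(rowSums(sweep(Y, 2, m)^2)), 1e-12)  # avoid division by zero
    w <- 1 / d
    m_new <- colSums(Y * w) / sum(w)
    converged <- sqrt(sum((m_new - m)^2)) < eps
    m <- m_new
    if (converged) break
  }
  m
}

mcm <- function(X) {
  p <- ncol(X)
  m <- geo_median(X)                    # geometric median of the data
  Xc <- sweep(X, 2, m)                  # data centered at the median
  # Row i of Y is the vectorized p x p matrix (X_i - m)(X_i - m)^T
  Y <- t(apply(Xc, 1, function(x) as.vector(tcrossprod(x))))
  matrix(geo_median(Y), nrow = p, ncol = p)
}

set.seed(3)
X_mcm <- cbind(rnorm(500), 3 * rnorm(500))   # second coordinate more spread out
V <- mcm(X_mcm)
```

The resulting matrix is not the variance itself: for the models handled by the package, the variance has to be recovered from it in a further step, which this sketch does not attempt.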

References

Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.

Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.

Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.

Examples


n <- 2000
d <- 5
Sigma <- diag(1:d)
mean <- rep(0, d)
X <- mvtnorm::rmvnorm(n, mean, Sigma)
RVar <- RobVar(X)
