Title: | Robust Mixture Model |
---|---|
Description: | Algorithms for estimating robustly the parameters of a Gaussian, Student, or Laplace Mixture Model. |
Authors: | Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut] |
Maintainer: | Antoine Godichon-Baggioni <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.0 |
Built: | 2025-03-02 04:49:53 UTC |
Source: | https://github.com/cran/RGMM |
This package provides functions for robust clustering with Gaussian, Student, and Laplace Mixture Models. The function RobVar robustly estimates the covariance of a numerical data set whose observations are realizations of Gaussian, Student, or Laplace vectors. The function RobMM clusters a numerical data set, RMMplot produces graphs for Robust Mixture Models, and Gen_MM generates possibly contaminated mixtures of Gaussian, Student, and Laplace vectors.
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4), 1423-1426.
Generate a sample of a Mixture Model
Gen_MM(nk=NA, df=3, mu=NA, Sigma=FALSE, delta=0,cont="Student", model="Gaussian", dfcont=1, mucont=FALSE, Sigmacont=FALSE, minU=-20, maxU=20)
nk |
An integer vector containing the desired number of observations for each class. The default is |
df |
An integer greater than or equal to |
mu |
A numeric matrix whose rows correspond to the centers of the classes. By default, |
Sigma |
An array containing the variance of each class. See the example for more details. |
delta |
A positive scalar between |
cont |
The kind of contamination chosen. Can be equal to |
model |
A string character specifying the model chosen for the Mixture Model. Can be equal to |
dfcont |
A positive integer specifying the degrees of freedom of the contamination laws if |
mucont |
A numeric matrix whose rows correspond to the centers of the contamination laws. By default, |
Sigmacont |
An array containing the variance of each contamination law. By default, |
minU |
A scalar giving the lower bound of the uniform law of the contamination if |
maxU |
A scalar giving the upper bound of the uniform law of the contamination if |
A list with:
Z |
An integer vector specifying the true classification. If |
C |
A |
X |
A numerical matrix giving the generated data. |
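To make the idea of a contaminated mixture concrete, here is a minimal base-R sketch. It is not the code behind Gen_MM: it assumes that delta is read as the fraction of each class replaced by noise, that the contamination is uniform on [minU, maxU] in every coordinate, and that each class is a spherical Gaussian. The function name sketch_contaminated_mixture and the returned element contaminated are hypothetical.

```r
# Illustrative sketch only: NOT the code behind Gen_MM.
# Assumption: `delta` is the fraction of each class replaced by noise drawn
# uniformly on [minU, maxU] in every coordinate.
sketch_contaminated_mixture <- function(nk, mu, delta = 0, minU = -20, maxU = 20) {
  p <- ncol(mu)
  X <- matrix(NA_real_, nrow = 0, ncol = p)
  Z <- integer(0)
  contaminated <- logical(0)
  for (k in seq_along(nk)) {
    # Clean draws: spherical Gaussian centred at row k of mu
    Xk <- matrix(rnorm(nk[k] * p), nrow = nk[k], ncol = p)
    Xk <- sweep(Xk, 2, mu[k, ], "+")
    # Overwrite the first n_cont rows of the class with uniform noise
    n_cont <- round(delta * nk[k])
    flag <- seq_len(nk[k]) <= n_cont
    if (n_cont > 0) {
      Xk[flag, ] <- matrix(runif(n_cont * p, minU, maxU), nrow = n_cont, ncol = p)
    }
    X <- rbind(X, Xk)
    Z <- c(Z, rep(k, nk[k]))
    contaminated <- c(contaminated, flag)
  }
  list(Z = Z, contaminated = contaminated, X = X)
}

set.seed(1)
mu <- rbind(c(-2, -2, -2), c(2, 2, 2), c(0, 0, 0))
sim <- sketch_contaminated_mixture(nk = rep(50, 3), mu = mu, delta = 0.1)
table(sim$Z, sim$contaminated)
```

The table cross-tabulates class labels against the contamination flag, which is a quick way to check how many observations per class were replaced.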
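p <- 3
nk <- rep(50, p)
# One random center per class, placed on the sphere of radius length(nk)
mu <- c()
for (i in 1:length(nk)) {
  Z <- rnorm(3)
  mu <- rbind(mu, length(nk) * Z / sqrt(sum(Z^2)))
}
# Identity covariance for every class, stored as a length(nk) x p x p array
Sigma <- array(dim = c(length(nk), p, p))
for (i in 1:length(nk)) {
  Sigma[i, , ] <- diag(p)
}
ech <- Gen_MM(nk = nk, mu = mu, Sigma = Sigma)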
A plot function for Robust Mixture Model
RMMplot(a,outliers=TRUE, graph=c('Two_Dim','Two_Dim_Uncertainty','ICL','BIC', 'Profiles','Uncertainty'),bestresult=TRUE,K=FALSE)
a |
Output from |
outliers |
A logical indicating whether there are outliers. If so, two-dimensional plots and profile plots are drawn without the detected outliers. Default is |
graph |
A string specifying the type of graph requested.
Default is |
bestresult |
A logical indicating if the graphs must be done for the result chosen by the selected criterion. Default is |
K |
A logical or positive integer giving the chosen number of clusters for which the graphs should be drawn. |
## Not run:
ech <- Gen_MM(mu = matrix(c(rep(-2, 3), rep(2, 3), rep(0, 3)), byrow = TRUE, nrow = 3))
X <- ech$X
res <- RobMM(X, nclust = 3)
RMMplot(res, graph = c('Two_Dim'))
## End(Not run)
Robust Mixture Model
RobMM(X, nclust=2:5, model="Gaussian", ninit=10, nitermax=50, niterEM=50, niterMC=50, df=3, epsvp=10^(-4), mc_sample_size=1000, LogLike=-Inf, init='genie', epsPi=10^-4, epsout=-20,scale='none', alpha=0.75, c=ncol(X), w=2, epsilon=10^(-8), criterion='BIC',methodMC="RobbinsMC", par=TRUE, methodMCM="Weiszfeld")
X |
A matrix giving the data. |
nclust |
A vector of positive integers giving the possible number of clusters. |
model |
The mixture model. Can be |
ninit |
The number of random initializations. Default is |
nitermax |
The number of iterations for the Weiszfeld algorithm if |
niterEM |
The number of iterations for the EM algorithm. |
niterMC |
The number of iterations for estimating robustly the variance of each class if |
df |
The degrees of freedom for the Student law if |
scale |
Run the algorithm on scaled data if |
epsvp |
The minimum values the estimates of the eigenvalues of the Median Covariation Matrix can take. Default is |
mc_sample_size |
The number of data generated for the Monte-Carlo method for estimating robustly the variance. |
LogLike |
The initial log-likelihood to "beat". Default is |
init |
Can be |
epsPi |
A scalar ensuring that the estimated probabilities of belonging to a class are uniformly bounded below by a positive constant. |
epsout |
If the probability that an observation belongs to a class is smaller than |
alpha |
A scalar between 1/2 and 1 used in the stepsequence for the Robbins-Monro method if |
c |
The constant in the stepsequence if |
w |
The power for the weighted averaged Robbins-Monro algorithm if |
epsilon |
Stopping condition for the Weiszfeld algorithm. |
criterion |
The criterion for selecting the number of clusters. Can be |
methodMC |
The method chosen to estimate robustly the variance. Can be |
par |
Is equal to |
methodMCM |
The method chosen for estimating the Median Covariation Matrix. Can be |
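The arguments alpha and c above parameterize a step sequence for a Robbins-Monro recursion. As a rough illustration of how such a sequence is used, the base-R sketch below runs an averaged stochastic gradient estimate of the geometric median in the spirit of Cardot, Cenac and Zitt (2013). It is not RGMM code: the step form gamma_n = c * n^(-alpha), the uniform averaging (the weighted variant controlled by w is not shown), and the function name asg_median are assumptions made for this example.

```r
# Averaged stochastic gradient estimate of the geometric median.
# Illustrative sketch, not RGMM code.
# Assumption: the step sequence is gamma_n = c * n^(-alpha), 1/2 < alpha < 1.
asg_median <- function(X, c = ncol(X), alpha = 0.75, init = rep(0, ncol(X))) {
  m <- init        # Robbins-Monro iterate
  m_bar <- init    # running average of the iterates
  for (n in seq_len(nrow(X))) {
    d <- X[n, ] - m
    nrm <- sqrt(sum(d^2))
    if (nrm > 0) {
      m <- m + c * n^(-alpha) * d / nrm      # step of length gamma_n towards X[n, ]
    }
    m_bar <- m_bar + (m - m_bar) / (n + 1)   # uniform averaging of the iterates
  }
  m_bar
}

set.seed(42)
X <- cbind(rnorm(5000, mean = 1), rnorm(5000, mean = -2))
est <- asg_median(X)
est
```

Each observation is visited once, so the cost is linear in the sample size; a larger alpha makes the steps shrink faster, while c sets their overall scale.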
A list with:
bestresult |
A list giving all the results for the best clustering (chosen with respect to the selected criterion). |
allresults |
A list containing all the results. |
ICL |
The ICL criterion for each candidate number of classes. |
BIC |
The BIC criterion for each candidate number of classes. |
data |
The initial data. |
nclust |
A vector of positive integers giving the possible number of clusters. |
Kopt |
The number of clusters chosen by the selected criterion. |
For the lists bestresult and allresults[[k]]:
centers |
A matrix whose rows are the centers of the classes. |
Sigma |
A matrix containing the variances of all the classes. |
LogLike |
The final log-likelihood. |
Pi |
A matrix giving the probability that each observation belongs to each class. |
niter |
The number of iterations of the EM algorithm. |
initEM |
A vector giving the initialized clustering if |
prop |
A vector giving the proportion of each class. |
outliers |
A vector giving the detected outliers. |
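A membership matrix such as Pi can be turned into hard labels and a per-observation uncertainty score with a few lines of base R. The sketch below uses a small made-up matrix and assumes one row per observation and one column per class; the uncertainty defined here (one minus the largest membership probability) is a common convention and may differ from what the 'Uncertainty' graph of RMMplot displays.

```r
# Hypothetical membership matrix: one row per observation, one column per class.
Pi <- rbind(c(0.90, 0.05, 0.05),
            c(0.10, 0.70, 0.20),
            c(0.34, 0.33, 0.33),
            c(0.02, 0.08, 0.90))

labels <- max.col(Pi, ties.method = "first")  # most probable class per row
uncertainty <- 1 - apply(Pi, 1, max)          # 0 means a clear-cut assignment

data.frame(labels, uncertainty)
```

The third row shows why the uncertainty is worth inspecting: it receives a label like every other row, even though its three membership probabilities are nearly equal.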
See also Gen_MM, RMMplot and RobVar.
## Not run:
ech <- Gen_MM(mu = matrix(c(rep(-2, 3), rep(2, 3), rep(0, 3)), byrow = TRUE, nrow = 3))
X <- ech$X
res <- RobMM(X, nclust = 3)
RMMplot(res, graph = c('Two_Dim'))
## End(Not run)
Robust estimate of the variance
RobVar(X, c=2, alpha=0.75, model='Gaussian', methodMCM='Weiszfeld', methodMC='Robbins' , mc_sample_size=1000, init=rep(0, ncol(X)), init_cov=diag(ncol(X)), epsilon=10^(-8), w=2, df=3, niterMC=50, cgrad=2, niterWeisz=50, epsWeisz=10^-8, alphaMedian=0.75, cmedian=2)
X |
A numeric matrix whose rows correspond to observations. |
c |
A positive scalar giving the constant in the stepsequence of the Robbins-Monro or Gradient method if |
alpha |
A scalar between 1/2 and 1 giving the power in the stepsequence for the Robbins-Monro algorithm if |
model |
A string character specifying the model: can be |
methodMCM |
A string character specifying the method to estimate the Median Covariation Matrix. Can be |
methodMC |
A string character specifying the method to estimate robustly the variance. Can be |
mc_sample_size |
A positive integer giving the number of data simulated for the Monte-Carlo method. Default is |
init |
A numeric vector giving the initialization for estimating the median. |
init_cov |
A numeric matrix giving an initialization for estimating the Median Covariation Matrix. |
epsilon |
A positive scalar giving a stopping condition for the algorithm. |
w |
A positive integer specifying the power for the weighted averaged Robbins-Monro algorithm if |
df |
An integer greater than or equal to |
niterMC |
An integer giving the number of iterations for iterative algorithms if the selected method is |
cgrad |
A numeric vector with positive values giving the stepsequence of the gradient algorithm for estimating the variance if |
niterWeisz |
A positive integer giving the maximum number of iterations for the Weiszfeld algorithms if |
epsWeisz |
A stopping factor for the Weiszfeld algorithm. |
alphaMedian |
A scalar between 1/2 and 1 giving the power of the stepsequence of the gradient algorithm for estimating the Median Covariation Matrix if |
cmedian |
A positive scalar giving the constant in the stepsequence of the gradient algorithm for estimating the Median Covariation Matrix if |
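For readers unfamiliar with the two robust quantities these arguments refer to, the definitions below follow Cardot and Godichon-Baggioni (2017). They are stated here as background and are not transcribed from the package code; the symbols m, V, h and M are notation chosen for this note.

```latex
% Geometric median of a random vector X in R^d
m = \arg\min_{h \in \mathbb{R}^d}
    \mathbb{E}\left[ \lVert X - h \rVert - \lVert X \rVert \right]

% Median Covariation Matrix: the geometric median of (X - m)(X - m)^T
% with respect to the Frobenius norm
V = \arg\min_{M \in \mathbb{R}^{d \times d}}
    \mathbb{E}\left[ \lVert (X - m)(X - m)^{T} - M \rVert_F
                   - \lVert (X - m)(X - m)^{T} \rVert_F \right]
```

Subtracting the norm term inside each expectation does not change the minimizer; it only ensures the expectation is finite without moment assumptions on X.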
An object of class list with the following outputs:
median |
The median of |
variance |
The robust variance of |
median |
The Median Covariation Matrix of |
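The Weiszfeld algorithm named in methodMCM is a fixed-point iteration for the geometric median. The base-R sketch below shows the basic iteration on a small contaminated sample. It is not the RGMM implementation: the function name weiszfeld_median and its arguments niter and eps are hypothetical stand-ins for niterWeisz and epsWeisz, and points that coincide with the current iterate are simply skipped, which is a simplification of the treatment in Vardi and Zhang (2000).

```r
# Weiszfeld fixed-point iteration for the geometric median.
# Illustrative sketch, not the RGMM implementation.
weiszfeld_median <- function(X, init = colMeans(X), niter = 50, eps = 1e-8) {
  m <- init
  for (it in seq_len(niter)) {
    dists <- sqrt(rowSums(sweep(X, 2, m)^2))  # distance of every row to m
    keep <- dists > 0                         # skip points equal to m
    if (!any(keep)) break
    w <- 1 / dists[keep]
    # Weighted mean of the rows, with weights inversely proportional to distance
    m_new <- colSums(X[keep, , drop = FALSE] * w) / sum(w)
    if (sqrt(sum((m_new - m)^2)) < eps) {     # stop once the update stalls
      m <- m_new
      break
    }
    m <- m_new
  }
  m
}

set.seed(7)
X <- rbind(matrix(rnorm(400), ncol = 2),       # 200 clean points around the origin
           matrix(50, nrow = 20, ncol = 2))    # 20 gross outliers at (50, 50)
mean_est <- colMeans(X)             # dragged towards the outliers
median_est <- weiszfeld_median(X)   # stays close to the origin
rbind(mean_est, median_est)
```

Comparing the two rows of the output shows the motivation for median-based estimation: a small fraction of gross outliers shifts the mean substantially while the geometric median barely moves.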
n <- 2000
d <- 5
Sigma <- diag(1:d)
mean <- rep(0, d)
X <- mvtnorm::rmvnorm(n, mean, Sigma)
RVar <- RobVar(X)