Title: | Robust Multivariate Regression |
---|---|
Description: | Robust methods for estimating the parameters of multivariate Gaussian linear models. |
Authors: | Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut], Laure Sansonnet [aut] |
Maintainer: | Antoine Godichon-Baggioni <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-11-02 04:43:20 UTC |
Source: | https://github.com/cran/RobRegression |
This package focuses on robust multivariate Gaussian linear regression. The function Robust_Mahalanobis_regression provides robust estimates of the parameters of multivariate Gaussian linear models with the help of the Mahalanobis distance, using either a stochastic gradient algorithm or a fixed-point algorithm. It relies on the function Robust_Variance, which provides a robust estimation of the variance, including the case of low-rank matrices (see Godichon-Baggioni and Robin (2024) <doi:10.1007/s11222-023-10362-9>).
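As a quick orientation, the sketch below shows a typical workflow combining the two functions on simulated data; the sample sizes and the use of the residuals as input to Robust_Variance are illustrative choices, not package defaults.

```r
library(RobRegression)
library(mvtnorm)

# Simulate a multivariate Gaussian linear model (illustrative sizes)
set.seed(1)
n <- 2000; p <- 5; q <- 10
X <- rmvnorm(n, mean = rep(0, p))
beta <- matrix(rnorm(p * q), ncol = q)
Y <- X %*% beta + rmvnorm(n, mean = rep(0, q))

# Robust regression based on the Mahalanobis distance
res <- Robust_Mahalanobis_regression(X, Y, par = FALSE)
res$beta          # robust estimate of the (p,q) parameter matrix
res$lambda_opt    # selected penalty

# Robust estimation of the variance of the residuals (illustrative use)
rob_var <- Robust_Variance(Y - X %*% res$beta)
rob_var$Sigma     # robust estimate of the variance
```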
Package: | RobRegression |
Type: | Package |
Title: | Robust Multivariate Regression |
Version: | 0.1.0 |
Authors@R: | c(person("Antoine","Godichon-Baggioni", role = c("aut", "cre","cph"), email = "[email protected]"), person("Stéphane","Robin", role = "aut"), person("Laure","Sansonnet", role = "aut")) |
Description: | Robust methods for estimating the parameters of multivariate Gaussian linear models. |
License: | GPL (>= 2) |
Encoding: | UTF-8 |
Imports: | Rcpp, foreach, doParallel, mvtnorm, parallel, RSpectra, capushe, KneeArrower, fastmatrix, DescTools |
LinkingTo: | Rcpp, RcppArmadillo |
NeedsCompilation: | yes |
RoxygenNote: | 7.1.2 |
Packaged: | 2024-04-22 17:56:54 UTC; pug56 |
Author: | Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut], Laure Sansonnet [aut] |
Maintainer: | Antoine Godichon-Baggioni <[email protected]> |
Date/Publication: | 2024-04-23 09:00:02 UTC |
Repository: | https://godichon-baggioni.r-universe.dev |
RemoteUrl: | https://github.com/cran/RobRegression |
RemoteRef: | HEAD |
RemoteSha: | ed424e0300fb2384b611adf7e3eb49c185bba10d |
Index of help topics:
RobRegression-package            Robust Multivariate Regression
Robust_Mahalanobis_regression    Robust_Mahalanobis_regression
Robust_Variance                  Robust_Variance
Robust_regression                Robust_regression
Antoine Godichon-Baggioni [aut, cre, cph], Stéphane Robin [aut], Laure Sansonnet [aut]
Maintainer: Antoine Godichon-Baggioni <[email protected]>
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Godichon-Baggioni, A. and Robin, S. (2024). Recursive ridge regression using second-order stochastic algorithms. Computational Statistics & Data Analysis, 190, 107854.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
We propose here a function which provides a robust estimation of the parameters of multivariate Gaussian linear models of the form Y = X beta + epsilon, where epsilon is a zero-mean Gaussian vector with variance Sigma. In addition, one can also consider a low-rank variance of the form Sigma = C + sigma I_q, where sigma is a positive scalar and C is a matrix of rank d. More precisely, the aim is to minimize a functional based on the Mahalanobis distance associated with Sigma, sketched below.
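For concreteness, here is a sketch of the model and of the Mahalanobis-type criterion described above; the exact form of the functional is an assumption of this sketch, following the reference Godichon-Baggioni, Robin and Sansonnet (2023).

```latex
% Multivariate Gaussian linear model with a possibly low-rank variance
\[
  Y = \beta^{\top} X + \varepsilon, \qquad
  \varepsilon \sim \mathcal{N}_q(0, \Sigma), \qquad
  \Sigma = C + \sigma I_q, \quad \operatorname{rank}(C) = d .
\]
% Mahalanobis-type criterion minimized over beta (assumed form)
\[
  G(\beta) = \mathbb{E}\left[ \left\| \Sigma^{-1/2} \bigl( Y - \beta^{\top} X \bigr) \right\| \right],
\]
% where the norm is the Euclidean norm on R^q.
```

Both the fixed-point algorithm (method_regression='Offline') and the stochastic gradient algorithm target a minimizer of this criterion; an optional penalty is controlled by lambda, nlambda and ridge.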
Robust_Mahalanobis_regression(X, Y, alphaRM=0.66, alphareg=0.66, w=2, lambda=0,
                              creg='default', K=2:30, par=TRUE, epsilon=10^(-8),
                              method_regression='Offline', niter_regression=50,
                              cRM='default', mc_sample_size='default',
                              method_MCM='Weiszfeld', methodMC='Robbins',
                              niterMC=50, ridge=1, eps_vp=10^(-4), nlambda=50,
                              scale='none', tol=10^(-3))
X | A (n,p)-matrix whose rows are the explanatory data. |
Y | A (n,q)-matrix whose rows are the variables to be explained. |
method_regression | The method used for estimating the parameters. Default is 'Offline' (fixed-point algorithm). |
niter_regression | The maximum number of iterations of the regression if the fixed-point algorithm is used, i.e. if method_regression='Offline'. Default is 50. |
epsilon | Stopping condition for the fixed-point algorithm, i.e. if method_regression='Offline'. Default is 10^(-8). |
scale | Whether a scaling of the data is applied. Default is 'none'. |
ridge | The power of the penalty. Default is 1. |
lambda | A vector giving the different penalties to be considered (see the sketch after this argument list). Default is 0, i.e. no penalization. |
par | Logical indicating whether the computations are parallelized. Default is TRUE. |
nlambda | The number of penalties tested if lambda is not specified. Default is 50. |
alphaRM | A scalar between 1/2 and 1 used in the step sequence of the Robbins-Monro algorithm, i.e. if methodMC='Robbins'. Default is 0.66. |
alphareg | A scalar between 1/2 and 1 used in the step sequence of the stochastic gradient algorithm for the regression. Default is 0.66. |
w | The power for the weighted averaged algorithms. Default is 2. |
creg | The constant in the step sequence if the averaged stochastic gradient algorithm is used for the regression. Default is 'default'. |
K | A vector containing the possible values of d, the rank of the low-rank part of the variance. Default is 2:30. |
mc_sample_size | The number of samples generated for the Monte Carlo method used to robustly estimate the eigenvalues of the variance. Default is 'default'. |
method_MCM | The method used to estimate the Median Covariation Matrix. Default is 'Weiszfeld'. |
methodMC | The method used to robustly estimate the variance. Default is 'Robbins'. |
niterMC | The number of iterations for robustly estimating the variance of each class. Default is 50. |
eps_vp | The minimum value the estimates of the eigenvalues of the variance can take. Default is 10^(-4). |
cRM | The constant in the step sequence if the Robbins-Monro algorithm is used to robustly estimate the variance, i.e. if methodMC='Robbins'. Default is 'default'. |
tol | A scalar avoiding numerical problems if method_regression='Offline'. Default is 10^(-3). |
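As an illustration of the penalization-related arguments, the sketch below passes an explicit (arbitrary) grid of penalties and reads off the selected one; X and Y are as in the example further below.

```r
# Hypothetical call with an explicit penalty grid (values are illustrative)
fit_pen <- Robust_Mahalanobis_regression(X, Y,
                                         lambda = exp(seq(log(1e-3), log(1), length.out = 20)),
                                         ridge = 1, par = FALSE)
fit_pen$lambda_opt    # penalty selected among the supplied grid
fit_pen$criterion     # loss associated with each penalty of the grid
```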
A list with:
beta | A (p,q)-matrix giving the estimation of the parameters. |
Residual_Variance | The robust estimation of the variance of the residuals. |
criterion | A vector giving the loss for the different choices of lambda. |
all_beta | A list containing the different estimations of the parameters (with respect to the different choices of lambda). |
lambda_opt | A scalar giving the selected lambda. |
variance_results | A list giving the results on the variance of the noise obtained with the help of the function Robust_Variance. |
Details of the list variance_results:
Sigma | The robust estimation of the variance. |
invSigma | The robust estimation of the inverse of the variance. |
MCM | The Median Covariation Matrix. |
eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the variance, where d+1 is the optimal choice among K. |
MCM_eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the Median Covariation Matrix, where d+1 is the optimal choice among K. |
cap | The result given by capushe for selecting d if the length of K is larger than 10. |
reduction_results | A list containing the results for all possible values in K. |
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
See also Robust_Variance, Robust_regression and RobRegression-package.
library(RobRegression)

p <- 5
q <- 10
n <- 2000
mu <- rep(0, q)
# Noise variance with one dominant eigenvalue
Sigma <- diag(c(q, rep(0.1, q - 1)))
epsilon <- mvtnorm::rmvnorm(n = n, mean = mu, sigma = Sigma)
X <- mvtnorm::rmvnorm(n = n, mean = rep(0, p))
beta <- matrix(rnorm(p * q), ncol = q)
Y <- X %*% beta + epsilon
# Robust regression based on the Mahalanobis distance
Res_reg <- Robust_Mahalanobis_regression(X, Y, par = FALSE)
# Squared estimation error
sum((Res_reg$beta - beta)^2)
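Continuing the example above, the components documented in the value section can be inspected as follows (a minimal sketch; the exact numbers depend on the simulated data).

```r
# Inspect the fit obtained in the example above
Res_reg$lambda_opt              # selected penalty
Res_reg$criterion               # loss for the tested penalties
dim(Res_reg$beta)               # (p,q)-matrix of estimated parameters

# Results on the variance of the noise (same structure as the Robust_Variance output)
RobVar <- Res_reg$variance_results
RobVar$eigenvalues              # estimated main eigenvalues of the variance
sum((RobVar$Sigma - Sigma)^2)   # compare with the true noise variance Sigma
```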
This function gives robust estimates of the parameters of the multivariate linear regression, with the help of either the Euclidean distance or the Mahalanobis distance associated with some matrix Sigma. More precisely, the aim is to minimize the expected distance between Y and beta^T X, where the distance is Euclidean when Mahalanobis_distance=FALSE and Mahalanobis otherwise.
Robust_regression(X, Y, Mat_Mahalanobis=diag(rep(1,ncol(Y))), niter=50, lambda=0,
                  c='default', method='Offline', alpha=0.66, w=2, ridge=1,
                  nlambda=50,
                  init=matrix(runif(ncol(X)*ncol(Y))-0.5, nrow=ncol(X), ncol=ncol(Y)),
                  epsilon=10^(-8), Mahalanobis_distance=FALSE, par=TRUE,
                  scale='none', tol=10^(-3))
X | A (n,p)-matrix whose rows are the explanatory data. |
Y | A (n,q)-matrix whose rows are the variables to be explained. |
method | The method used for estimating the parameters. Default is 'Offline' (fixed-point algorithm). |
Mat_Mahalanobis | A (q,q)-matrix giving the matrix used for the Mahalanobis distance. Default is the identity matrix. |
Mahalanobis_distance | A logical telling whether the Mahalanobis distance is used. Default is FALSE. |
scale | Whether a scaling of the data is applied. Default is 'none'. |
niter | The maximum number of iterations if the fixed-point algorithm is used, i.e. if method='Offline'. Default is 50. |
init | A (p,q)-matrix which gives the initialization of the algorithm. Default is a matrix with entries drawn uniformly on (-0.5, 0.5). |
ridge | The power of the penalty. Default is 1. |
lambda | A vector giving the different penalties to be considered. Default is 0, i.e. no penalization. |
nlambda | The number of penalties tested if lambda is not specified. Default is 50. |
par | Logical indicating whether the computations are parallelized. Default is TRUE. |
c | The constant in the step sequence if the averaged stochastic gradient algorithm is used. Default is 'default'. |
alpha | A scalar between 1/2 and 1 used in the step sequence of the stochastic gradient algorithm. Default is 0.66. |
w | The power for the weighted averaged Robbins-Monro algorithm. Default is 2. |
epsilon | Stopping condition for the fixed-point algorithm, i.e. if method='Offline'. Default is 10^(-8). |
tol | A scalar avoiding numerical problems if method='Offline'. Default is 10^(-3). |
A list with:
beta | A (p,q)-matrix giving the estimation of the parameters. |
criterion | A vector giving the loss for the different choices of lambda. |
all_beta | A list containing the different estimations of the parameters (with respect to the different choices of lambda). |
lambda_opt | A scalar giving the selected lambda. |
Godichon-Baggioni, A., Robin, S. and Sansonnet, L. (2023). A robust multivariate linear regression based on the Mahalanobis distance.
See also Robust_Variance, Robust_Mahalanobis_regression and RobRegression-package.
library(RobRegression)

p <- 5
q <- 10
n <- 2000
mu <- rep(0, q)
# Standard Gaussian noise
epsilon <- mvtnorm::rmvnorm(n = n, mean = mu)
X <- mvtnorm::rmvnorm(n = n, mean = rep(0, p))
beta <- matrix(rnorm(p * q), ncol = q)
Y <- X %*% beta + epsilon
# Robust regression with the Euclidean distance
Res_reg <- Robust_regression(X, Y)
# Squared estimation error
sum((Res_reg$beta - beta)^2)
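For the Mahalanobis variant, a (q,q)-matrix must be supplied through Mat_Mahalanobis. The sketch below, which continues the example above, assumes that this matrix is the inverse of a robust variance estimate, as returned in the invSigma output of Robust_Variance; this pairing is an illustrative choice rather than a documented requirement.

```r
# Robust estimate of the noise variance from the residuals of the first fit
res_var <- Robust_Variance(Y - X %*% Res_reg$beta)

# Refit using the Mahalanobis distance induced by the estimated inverse variance
Res_maha <- Robust_regression(X, Y,
                              Mahalanobis_distance = TRUE,
                              Mat_Mahalanobis = res_var$invSigma)
sum((Res_maha$beta - beta)^2)
```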
The aim is to provide a robust estimation of the variance for Gaussian models with dimension reduction. More precisely, we consider a q-dimensional random vector whose variance can be written as Sigma = C + sigma I, where C is a matrix of rank d, with d possibly much smaller than q, sigma is a positive scalar, and I is the identity matrix.
Robust_Variance(X, K=ncol(X), par=TRUE, alphaRM=0.75, c='default', w=2,
                mc_sample_size='default', methodMC='Robbins', niterMC=50,
                method_MCM='Weiszfeld', eps_vp=10^(-6))
X | A matrix whose rows are the vectors whose variance we want to estimate. |
K | A vector containing the possible values of d. The 'good' d is chosen with the help of a penalty criterion if the length of K is larger than 10. Default is ncol(X). |
par | Logical indicating whether the computations are parallelized. Default is TRUE. |
mc_sample_size | The number of samples generated for the Monte Carlo method used to robustly estimate the eigenvalues of the variance. Default is 'default'. |
methodMC | The method used to robustly estimate the variance. Default is 'Robbins'. |
niterMC | The number of iterations for robustly estimating the variance of each class. Default is 50. |
method_MCM | The method used to estimate the Median Covariation Matrix. Default is 'Weiszfeld'. |
alphaRM | A scalar between 1/2 and 1 used in the step sequence of the Robbins-Monro method, i.e. if methodMC='Robbins'. Default is 0.75. |
c | The constant in the step sequence if methodMC='Robbins'. Default is 'default'. |
w | The power for the weighted averaged Robbins-Monro algorithm, i.e. if methodMC='Robbins'. Default is 2. |
eps_vp | The minimum value the estimates of the eigenvalues of the variance can take. Default is 10^(-6). |
A list with:
Sigma | The robust estimation of the variance. |
invSigma | The robust estimation of the inverse of the variance. |
MCM | The Median Covariation Matrix. |
eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the variance, where d+1 is the optimal choice among K. |
MCM_eigenvalues | A vector containing the estimation of the d+1 main eigenvalues of the Median Covariation Matrix, where d+1 is the optimal choice among K. |
cap | The result given by capushe for selecting d if the length of K is larger than 10. |
reduction_results | A list containing the results for all possible values in K. |
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
See also Robust_Mahalanobis_regression, Robust_regression and RobRegression-package.
library(RobRegression)

q <- 100
d <- 10
n <- 2000
# Low-rank variance plus a small full-rank component
Sigma <- diag(c(d:1, rep(0, q - d))) + diag(rep(0.1, q))
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)
# Robust estimation of the variance
RobVar <- Robust_Variance(X, K = q)
# Averaged squared estimation error
sum((RobVar$Sigma - Sigma)^2) / q
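To let the penalty criterion select the rank d, a grid of candidate values can be passed through K and the outputs documented above can be inspected; the sketch below reuses the simulated data from the example, with K = 2:30 as an arbitrary illustrative grid.

```r
# Rank selection over a grid of candidate values of d
RobVar_grid <- Robust_Variance(X, K = 2:30)

RobVar_grid$eigenvalues                             # estimated main eigenvalues of the variance
RobVar_grid$cap                                     # capushe output used to select d (length(K) > 10)
str(RobVar_grid$reduction_results, max.level = 1)   # results for each candidate value in K
sum((RobVar_grid$Sigma - Sigma)^2) / q              # compare with the true variance
```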