Diagnostics
wls
LinRegOutliers.OrdinaryLeastSquares.wls — Function
wls(X, y, wts)
Estimate a weighted least squares regression and return an OLS object holding the estimated parameters.
Arguments
X::AbstractMatrix{Float64}: Design matrix.
y::AbstractVector{Float64}: Response vector.
wts::AbstractVector{Float64}: Weights vector.
Examples
julia> X = hcat(ones(24), phones[:,"year"]);
julia> y = phones[:,"calls"];
julia> w = ones(24);
julia> w[15:20] .= 0.0;
julia> reg = wls(X, y, w)
julia> reg.betas
2-element Vector{Float64}:
-63.481644325290425
1.3040571939231453
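The docstring does not spell out the estimator. The textbook weighted least squares estimate solves the weighted normal equations (X'WX) b = X'Wy with W = diag(wts), and the following is a minimal sketch under that assumption. The helper `wls_sketch` and the data are hypothetical; the sketch returns only the coefficient vector, not an OLS object.

```julia
using LinearAlgebra

# Weighted least squares via the weighted normal equations (X'WX) b = X'Wy.
# `wls_sketch` is a hypothetical helper, not the package's `wls`.
function wls_sketch(X::AbstractMatrix{Float64}, y::AbstractVector{Float64},
                    wts::AbstractVector{Float64})
    W = Diagonal(wts)
    return (X' * W * X) \ (X' * W * y)
end

# Made-up data: five points on the line y = 1 + 2x plus one aberrant point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0, 40.0]
X = hcat(ones(6), x)

w = ones(6)
w[6] = 0.0                     # a zero weight removes the last observation
betas = wls_sketch(X, y, w)    # approximately [1.0, 2.0]
```

As in the example above, setting weights to zero is a simple way to exclude suspected outliers from the fit.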
dffit
LinRegOutliers.Diagnostics.dffit — Function
dffit(setting, i)
Calculate the effect of the ith observation on the linear regression fit.
Arguments
setting::RegressionSetting: A regression setting object.
i::Int: Index of the observation.
Examples
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> dffit(reg, 1)
2.3008326745719785
julia> dffit(reg, 15)
2.7880619386124295
julia> dffit(reg, 16)
3.1116532421969794
julia> dffit(reg, 17)
4.367981450347031
julia> dffit(reg, 21)
-5.81610150322166
References
Belsley, David A., Edwin Kuh, and Roy E. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons, 2005.
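One common definition of this diagnostic, assumed here, is the change in the ith fitted value when observation i is left out of the fit. The sketch below computes it by refitting and compares it with the closed form h_i e_i / (1 - h_i), where h_i is the leverage and e_i the ordinary residual. The name `dffit_sketch` and the data are hypothetical.

```julia
using LinearAlgebra

# Made-up data; the last point is deliberately aberrant.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
X = hcat(ones(6), x)

# Change in the i-th fitted value when observation i is dropped from the fit.
# `dffit_sketch` is a hypothetical name, not the package function.
function dffit_sketch(X, y, i)
    keep = setdiff(1:size(X, 1), i)
    beta_full = X \ y
    beta_loo = X[keep, :] \ y[keep]
    return dot(X[i, :], beta_full) - dot(X[i, :], beta_loo)
end

# Equivalent closed form: h_i * e_i / (1 - h_i), with h_i the leverage.
h = diag(X * inv(X' * X) * X')
e = y - X * (X \ y)
closed_form = h .* e ./ (1 .- h)
```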
dffits
LinRegOutliers.Diagnostics.dffits — Function
dffits(setting)
Calculate dffit for all observations.
Arguments
setting::RegressionSetting: A regression setting object.
Examples
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> dffits(reg)
24-element Vector{Float64}:
2.3008326745719785
1.2189579001467337
0.35535667547543426
-0.14458523141740898
-0.5558346324846752
-0.8441316814464983
-1.0329184407957257
-1.16600692151232
-1.2005633711667656
-1.2549187193476428
-1.3195581500053777
-1.42383876236147
-1.5917690629803474
-1.6582086833534504
2.7880619386124295
3.1116532421969794
4.367981450347031
5.927603041427858
8.442860517217582
12.370243663029527
-5.81610150322166
-10.089153963127842
-12.10803256546825
-14.67006851119936
References
Belsley, David A., Edwin Kuh, and Roy E. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons, 2005.
hatmatrix
LinRegOutliers.Diagnostics.hatmatrix — Function
hatmatrix(setting)
Calculate the n x n hat matrix for a given regression setting with n observations.
Arguments
setting::RegressionSetting: A regression setting object.
Examples
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> size(hatmatrix(reg))
(24, 24)
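For reference, the hat matrix of a design matrix X is usually defined as H = X (X'X)^{-1} X'. A minimal sketch with a made-up design matrix, checking the properties that make H useful as a leverage diagnostic:

```julia
using LinearAlgebra

# Made-up design matrix: an intercept column and one regressor.
X = hcat(ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Textbook hat matrix; it maps the response y to the fitted values X * beta.
H = X * inv(X' * X) * X'

leverages = diag(H)    # one leverage value per observation
trace_H = tr(H)        # equals the number of columns of X, up to rounding
```

H is symmetric and idempotent, and its diagonal entries (the leverages) feed into most of the other diagnostics on this page.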
studentizedResiduals
LinRegOutliers.Diagnostics.studentizedResiduals — Function
studentizedResiduals(setting)
Calculate Studentized residuals for a given regression setting.
Arguments
setting::RegressionSetting: A regression setting object.
Examples
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> studentizedResiduals(reg)
24-element Vector{Float64}:
0.2398783264505892
0.1463945666608097
0.04934549995087145
-0.023289236798461784
-0.10408303320973748
-0.18382934382804111
-0.2609395640240455
-0.33934473417314376
-0.3973205657179429
-0.46258080183149236
-0.5261488085924144
-0.5918396227060093
-0.6616423337899147
-0.6611792918262785
1.0277190922689816
1.0297863954540103
1.2712201589839855
1.4974523565936426
1.8386296155264197
2.316394853333409
-0.9368354141338643
-1.4009989983319822
-1.4541520919831887
-1.529459974327181
adjustedResiduals
LinRegOutliers.Diagnostics.adjustedResiduals — Function
adjustedResiduals(setting)
Calculate adjusted residuals for a given regression setting.
Arguments
setting::RegressionSetting: A regression setting object.
Examples
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> adjustedResiduals(reg)
24-element Vector{Float64}:
13.486773572526268
8.2307993473897
2.774371467851612
-1.3093999279776498
-5.851901346871404
-10.335509559699863
-14.670907823058053
-19.07911256736661
-22.338710565623828
-26.00786250934617
-29.58187157605512
-33.27523207616458
-37.19977737822219
-37.173743587631165
57.781855070799956
57.898085871534626
71.47231139524963
84.19185329435882
103.37399662263209
130.23557965295348
-52.6720662600165
-78.76891816539992
-81.75736547266746
-85.9914301855088
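The two residual types documented above are closely related under the usual textbook definitions, which are assumed here: an adjusted residual is e_i / sqrt(1 - h_i), and an (internally) Studentized residual divides that by the regression standard error s. A sketch on made-up data:

```julia
using LinearAlgebra

# Made-up data; the last point is deliberately aberrant.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
X = hcat(ones(6), x)

n, p = size(X)
h = diag(X * inv(X' * X) * X')      # leverages
e = y - X * (X \ y)                 # ordinary residuals
s = sqrt(sum(abs2, e) / (n - p))    # standard error of the regression

adjusted = e ./ sqrt.(1 .- h)       # residuals corrected for leverage
studentized = adjusted ./ s         # additionally scaled by s
```

Under these definitions the two vectors differ only by the constant factor s, so they rank the observations identically.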
jacknifedS
LinRegOutliers.Diagnostics.jacknifedS — Function
jacknifedS(setting, k)
Estimate the standard error of the regression with the kth observation dropped.
Arguments
setting::RegressionSetting: A regression setting object.
k::Int: Index of the omitted observation.
Examples
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> jacknifedS(reg, 2)
57.518441664761035
julia> jacknifedS(reg, 15)
56.14810222161477
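A leave-one-out standard error can be sketched by refitting without observation k and dividing the residual sum of squares by the remaining degrees of freedom. The divisor n - 1 - p is an assumption about the convention, and `jackknifed_s_sketch` and the data are hypothetical.

```julia
using LinearAlgebra

# Made-up data; the last point is deliberately aberrant.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
X = hcat(ones(6), x)

# Standard error of the regression refitted without observation k.
# Hypothetical helper; the degrees of freedom are (n - 1) - p.
function jackknifed_s_sketch(X, y, k)
    keep = setdiff(1:size(X, 1), k)
    Xk, yk = X[keep, :], y[keep]
    e = yk - Xk * (Xk \ yk)
    return sqrt(sum(abs2, e) / (length(yk) - size(Xk, 2)))
end

s_without_6 = jackknifed_s_sketch(X, y, 6)   # drops the aberrant point
s_without_2 = jackknifed_s_sketch(X, y, 2)   # drops a well-fitted point
```

Dropping an outlying observation shrinks the estimate noticeably, while dropping a well-fitted one does not; that contrast is what makes the quantity useful as a diagnostic.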
cooks
LinRegOutliers.Diagnostics.cooks — Function
cooks(setting)
Calculate Cook's distances for all observations in a regression setting.
Arguments
setting::RegressionSetting: A regression setting object.
Examples
julia> reg = createRegressionSetting(@formula(calls ~ year), phones);
julia> cooks(reg)
24-element Vector{Float64}:
0.005344774190779822
0.0017088194691033689
0.00016624914057962608
3.1644452583114795e-5
0.0005395058666404081
0.0014375008774859539
0.0024828140956511258
0.0036279720445167277
0.004357605989540906
0.005288503758364767
0.006313578057565415
0.0076561205696857254
0.009568574875389256
0.009970039008782357
0.02610396373381051
0.029272523880917646
0.05091236198400663
0.08176555044049343
0.14380266904640235
0.26721539425047447
0.051205153558783356
0.13401084683481085
0.16860324592350226
0.2172819114905912
References
Cook, R. Dennis. "Detection of influential observation in linear regression." Technometrics 19.1 (1977): 15-18.
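Cook's distance is commonly written as D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2), where e_i is the ordinary residual, h_i the leverage, p the number of coefficients, and s^2 the residual variance. The following sketch assumes that form and uses made-up data; it is not taken from the package source.

```julia
using LinearAlgebra

# Made-up data; the last point is deliberately aberrant.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
X = hcat(ones(6), x)

n, p = size(X)
h = diag(X * inv(X' * X) * X')     # leverages
e = y - X * (X \ y)                # ordinary residuals
s2 = sum(abs2, e) / (n - p)        # residual variance

# Cook's distance for every observation, in closed form.
cooks_d = (e .^ 2 ./ (p * s2)) .* h ./ (1 .- h) .^ 2
```

The closed form avoids n separate refits; it agrees with the leave-one-out definition based on the shift in the coefficient vector.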
cooksoutliers
LinRegOutliers.Diagnostics.cooksoutliers — Function
cooksoutliers(setting; alpha = 0.5)
Calculate Cook's distances for a given regression setting and report the potential outliers.
Arguments
setting::RegressionSetting: RegressionSetting object with a formula and dataset.
alpha::Float: Probability for the cutoff value. quantile(Fdist(p, n-p), alpha) is used as the cutoff. Default is 0.5.
Output
["distance"]: Cook's distances.
["cutoff"]: Quantile of the F distribution.
["potentials"]: Vector of indices of potential regression outliers.
mahalanobisSquaredMatrix
LinRegOutliers.Diagnostics.mahalanobisSquaredMatrix — Function
mahalanobisSquaredMatrix(data::DataFrame; meanvector=nothing, covmatrix=nothing)
Calculate Mahalanobis distances.
Arguments
data::DataFrame: A DataFrame object of the multivariate data.
meanvector::AbstractVector{Float64}: Optional mean vector of variables.
covmatrix::AbstractMatrix{Float64}: Optional covariance matrix of data.
References
Mahalanobis, Prasanta Chandra. "On the generalized distance in statistics." National Institute of Science of India, 1936.
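The docstring does not describe the shape of the return value. As a point of reference, the squared Mahalanobis distance of a row x from the mean mu under covariance S is (x - mu)' S^{-1} (x - mu). The sketch below computes one such value per row of a plain matrix (not a DataFrame) using the sample mean and covariance, which is an assumption about the defaults; the data are made up.

```julia
using LinearAlgebra, Statistics

# Made-up bivariate data, one row per observation.
data = [1.0 2.0; 2.0 1.5; 3.0 3.5; 4.0 3.0; 5.0 6.0]

mu = vec(mean(data, dims = 1))    # column means
S = cov(data)                     # sample covariance matrix (divisor n - 1)
centered = data .- mu'            # subtract the mean from every row

# Squared Mahalanobis distance of each row from the mean.
d2 = [dot(r, S \ r) for r in eachrow(centered)]
```

With the sample covariance, these squared distances always sum to (n - 1) times the number of variables, which is a handy sanity check.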
dfbeta
LinRegOutliers.Diagnostics.dfbeta — Function
dfbeta(setting, omittedIndex)
Apply DFBETA diagnostic for a given regression setting and observation index.
Arguments
setting::RegressionSetting: A regression setting object.
omittedIndex::Int: Index of the omitted observation.
Example
julia> setting = createRegressionSetting(@formula(calls ~ year), phones);
julia> dfbeta(setting, 1)
2-element Vector{Float64}:
9.643915678524024
-0.14686166007904422
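DFBETA is usually described as the change in the coefficient vector when one observation is omitted. The sketch below uses the sign convention "full fit minus leave-one-out fit", which is an assumption here, along with the hypothetical name `dfbeta_sketch` and made-up data.

```julia
using LinearAlgebra

# Made-up data; the last point is deliberately aberrant.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
X = hcat(ones(6), x)

# Coefficients from all data minus coefficients fitted without observation i.
# Hypothetical helper, not the package function.
function dfbeta_sketch(X, y, i)
    keep = setdiff(1:size(X, 1), i)
    return (X \ y) - (X[keep, :] \ y[keep])
end

shift = dfbeta_sketch(X, y, 6)    # one entry per regression coefficient
```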
dfbetas
LinRegOutliers.Diagnostics.dfbetas — Function
dfbetas(setting)
Apply the DFBETA diagnostic to all observations of a given regression setting.
Arguments
setting::RegressionSetting: A regression setting object.
See also: dfbeta
covratio
LinRegOutliers.Diagnostics.covratio — Function
covratio(setting, omittedIndex)
Apply covariance ratio diagnostic for a given regression setting and observation index.
Arguments
setting::RegressionSetting: A regression setting object.
omittedIndex::Int: Index of the omitted observation.
Example
julia> setting = createRegressionSetting(@formula(calls ~ year), phones);
julia> covratio(setting, 1)
1.2945913799871505
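The covariance ratio is commonly defined as the determinant of the coefficient covariance matrix estimated without observation i, divided by the determinant of the one estimated from all data. A minimal sketch under that assumption, with the hypothetical helper `covratio_sketch` and made-up data:

```julia
using LinearAlgebra

# Made-up data; the last point is deliberately aberrant.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
X = hcat(ones(6), x)

# det(s_(i)^2 (X_(i)'X_(i))^-1) / det(s^2 (X'X)^-1), by direct refitting.
# Hypothetical helper, not the package function.
function covratio_sketch(X, y, i)
    n, p = size(X)
    keep = setdiff(1:n, i)
    Xi, yi = X[keep, :], y[keep]
    s2 = sum(abs2, y - X * (X \ y)) / (n - p)            # full-data variance
    s2i = sum(abs2, yi - Xi * (Xi \ yi)) / (n - 1 - p)   # leave-one-out variance
    return det(s2i * inv(Xi' * Xi)) / det(s2 * inv(X' * X))
end

ratio = covratio_sketch(X, y, 1)
```

Values far from 1 indicate that the observation strongly changes the precision of the coefficient estimates.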
hadimeasure
LinRegOutliers.Diagnostics.hadimeasure — Function
hadimeasure(setting; c = 2.0)
Apply Hadi's regression diagnostic for a given regression setting.
Arguments
setting::RegressionSetting: A regression setting object.
c::Float64: Critical value, chosen between 2.0 and 3.0. The default is 2.0.
Example
julia> setting = createRegressionSetting(@formula(calls ~ year), phones);
julia> hadimeasure(setting)
References
Chatterjee, Samprit and Hadi, Ali. Regression Analysis by Example. 5th ed. N.p.: John Wiley & Sons, 2012.
diagnose
LinRegOutliers.Diagnostics.diagnose — Function
diagnose(setting; alpha = 0.5)
Diagnose a regression setting and report potential outliers using dffits, dfbetas, cooks, and hatmatrix.
Arguments
setting::RegressionSetting: A regression setting object.
alpha: Alpha value for the Cook's distance cutoff. See cooksoutliers.