MixedModelsSmallSample

Small-sample corrections for linear mixed models fitted by MixedModels.jl.

Linear Mixed Model

A linear mixed model expresses the $n$-vector of responses as

\[y = X\beta + Zu + \varepsilon, \quad u \sim \mathcal{N}(0, G(\theta)), \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n),\]

where $X$ is the $n \times p$ fixed-effects design matrix, $\beta$ is the $p$-vector of fixed-effects coefficients, $Z$ is the random-effects design matrix, $u$ is the vector of random effects with covariance $G(\theta)$, and $\varepsilon$ is the residual error.

The marginal variance of $y$ is

\[V(\theta) = ZG(\theta)Z^\top + \sigma^2 I_n.\]
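
The structure of $V(\theta)$ is easy to see numerically. The following NumPy sketch (Python used here for a self-contained illustration; it is not package code, and all parameter values are illustrative) builds $V$ for a toy random-intercept model with two groups of three observations:

```python
import numpy as np

n, n_groups = 6, 2
groups = np.repeat(np.arange(n_groups), n // n_groups)   # [0, 0, 0, 1, 1, 1]
Z = np.eye(n_groups)[groups]                             # n x 2 indicator matrix
tau2, sigma2 = 4.0, 1.0                                  # illustrative values
G = tau2 * np.eye(n_groups)

V = Z @ G @ Z.T + sigma2 * np.eye(n)                     # V = Z G Z' + sigma^2 I_n

# Observations in the same group share covariance tau2; the diagonal is
# tau2 + sigma2; observations in different groups are uncorrelated.
print(V[0, 1], V[0, 0], V[0, 3])  # -> 4.0 5.0 0.0
```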

The Small-Sample Problem

The generalized least squares estimator $\hat{\beta}$ has covariance

\[\Phi(\theta) = (X^\top V(\theta)^{-1} X)^{-1}.\]

Standard inference replaces $\theta$ with its REML estimate $\hat{\theta}$, yielding $\hat{\Phi} = \Phi(\hat{\theta})$, and uses asymptotic approximations (e.g., treating $t$-statistics as normally distributed). For small samples, this ignores:

  1. Bias in $\hat{\Phi}$ due to estimating $\theta$
  2. Uncertainty in $\hat{\theta}$ affecting the reference distribution

This package implements two corrections:

  • Kenward-Roger (KenwardRoger): Applies a bias correction to $\hat{\Phi}$ and uses a scaled F-distribution with approximate degrees of freedom.
  • Satterthwaite (Satterthwaite): Uses moment-matching to approximate degrees of freedom for the t/F reference distribution.

References

  • Kenward, M. G. and Roger, J. H. (1997). "Small sample inference for fixed effects from restricted maximum likelihood." Biometrics, 53(3), 983-997.
  • Satterthwaite, F. E. (1946). "An Approximate Distribution of Estimates of Variance Components." Biometrics Bulletin, 2(6), 110-114.
  • Fai, A. H.-T. and Cornelius, P. L. (1996). "Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments." Journal of Statistical Computation and Simulation, 54(4), 363-378.


Getting Started

We will look at a split-plot experiment, described in Chapter 10 of:

Goos, Peter, and Bradley Jones. Optimal design of experiments: a case study approach. John Wiley & Sons, 2011.

Split-plot experiments have hard-to-change factors and easy-to-change factors. Hard-to-change factors are not fully randomized: they are kept at the same setting for multiple consecutive observations. The dataset can thus be split into blocks, also called whole plots, within which the hard-to-change factors remain constant. Observations within a block are more similar to one another than observations from different blocks, so a random intercept per block is introduced to model this correlation. In the dataset below, FRH and RRH are hard-to-change, while YA and GC are easy-to-change. The WP column shows that there are 10 blocks in which FRH and RRH remain constant. EFFICIENCY is the response we want to model.

The code below loads the data.
#! format: off
using DataFrames
df = DataFrame(
  WP = [1,1,1,1,1,
        2,2,2,2,2,
        3,3,3,3,3,
        4,4,4,4,4,
        5,5,5,5,5,
        6,6,6,6,6,
        7,7,7,7,7,
        8,8,8,8,8,
        9,9,9,9,9,
        10,10,10,10,10],
  FRH = [-1,-1,-1,-1,-1,
         0,0,0,0,0,
         1,1,1,1,1,
         -1,-1,-1,-1,-1,
         0,0,0,0,0,
         0,0,0,0,0,
         1,1,1,1,1,
         -1,-1,-1,-1,-1,
         0,0,0,0,0,
         1,1,1,1,1],
  RRH = [-1,-1,-1,-1,-1,
         -1,-1,-1,-1,-1,
         -1,-1,-1,-1,-1,
         0,0,0,0,0,
         0,0,0,0,0,
         0,0,0,0,0,
         0,0,0,0,0,
         1,1,1,1,1,
         1,1,1,1,1,
         1,1,1,1,1],
  YA = [-1,1,0,1,-1,
        1,0,1,0,-1,
        1,1,-1,-1,0,
        -1,0,1,0,-1,
        0,1,0,-1,0,
        1,0,0,0,-1,
        0,1,0,0,-1,
        -1,1,0,1,-1,
        -1,0,0,0,1,
        -1,1,-1,1,0],
  GC = [-1,1,0,-1,1,
        1,0,-1,1,0,
        -1,1,-1,1,0,
        0,1,0,-1,1,
        -1,0,0,1,0,
        0,0,0,0,-1,
        -1, 1,0,1,0,
        -1,-1,0,1,1,
        0,-1,0,1,1,
        1,-1,-1,0,1],
  EFFICIENCY = [
                0.873,0.969,0.959,0.821,1.019,
                0.921,0.923,0.821,0.958,0.914,
                0.705,0.861,0.756,0.861,0.78,
                0.999,1.06,0.91,0.854,1.082,
                0.841,0.885,0.905,1.025,0.936,
                0.844,0.891,0.902,0.871,0.833,
                0.786,0.879,0.86,0.905,0.87,
                0.98,0.907,1.045,1.094,1.159,
                1.0,0.901,0.99,1.059,1.025,
                0.991,0.828,0.853,0.923,0.997
                ]
  )
50×6 DataFrame
 Row │ WP     FRH    RRH    YA     GC     EFFICIENCY
     │ Int64  Int64  Int64  Int64  Int64  Float64
─────┼───────────────────────────────────────────────
   1 │     1     -1     -1     -1     -1       0.873
   2 │     1     -1     -1      1      1       0.969
   3 │     1     -1     -1      0      0       0.959
   4 │     1     -1     -1      1     -1       0.821
   5 │     1     -1     -1     -1      1       1.019
   6 │     2      0     -1      1      1       0.921
   7 │     2      0     -1      0      0       0.923
   8 │     2      0     -1      1     -1       0.821
   9 │     2      0     -1      0      1       0.958
  10 │     2      0     -1     -1      0       0.914
  11 │     3      1     -1      1     -1       0.705
  12 │     3      1     -1      1      1       0.861
  13 │     3      1     -1     -1     -1       0.756
  14 │     3      1     -1     -1      1       0.861
  15 │     3      1     -1      0      0       0.78
  16 │     4     -1      0     -1      0       0.999
  17 │     4     -1      0      0      1       1.06
  18 │     4     -1      0      1      0       0.91
  19 │     4     -1      0      0     -1       0.854
  20 │     4     -1      0     -1      1       1.082
  21 │     5      0      0      0     -1       0.841
  22 │     5      0      0      1      0       0.885
  23 │     5      0      0      0      0       0.905
  24 │     5      0      0     -1      1       1.025
  25 │     5      0      0      0      0       0.936
  26 │     6      0      0      1      0       0.844
  27 │     6      0      0      0      0       0.891
  28 │     6      0      0      0      0       0.902
  29 │     6      0      0      0      0       0.871
  30 │     6      0      0     -1     -1       0.833
  31 │     7      1      0      0     -1       0.786
  32 │     7      1      0      1      1       0.879
  33 │     7      1      0      0      0       0.86
  34 │     7      1      0      0      1       0.905
  35 │     7      1      0     -1      0       0.87
  36 │     8     -1      1     -1     -1       0.98
  37 │     8     -1      1      1     -1       0.907
  38 │     8     -1      1      0      0       1.045
  39 │     8     -1      1      1      1       1.094
  40 │     8     -1      1     -1      1       1.159
  41 │     9      0      1     -1      0       1.0
  42 │     9      0      1      0     -1       0.901
  43 │     9      0      1      0      0       0.99
  44 │     9      0      1      0      1       1.059
  45 │     9      0      1      1      1       1.025
  46 │    10      1      1     -1      1       0.991
  47 │    10      1      1      1     -1       0.828
  48 │    10      1      1     -1     -1       0.853
  49 │    10      1      1      1      0       0.923
  50 │    10      1      1      0      1       0.997
df[1:10, :]
10×6 DataFrame
 Row │ WP     FRH    RRH    YA     GC     EFFICIENCY
     │ Int64  Int64  Int64  Int64  Int64  Float64
─────┼───────────────────────────────────────────────
   1 │     1     -1     -1     -1     -1       0.873
   2 │     1     -1     -1      1      1       0.969
   3 │     1     -1     -1      0      0       0.959
   4 │     1     -1     -1      1     -1       0.821
   5 │     1     -1     -1     -1      1       1.019
   6 │     2      0     -1      1      1       0.921
   7 │     2      0     -1      0      0       0.923
   8 │     2      0     -1      1     -1       0.821
   9 │     2      0     -1      0      1       0.958
  10 │     2      0     -1     -1      0       0.914
df[45:50, :]
6×6 DataFrame
 Row │ WP     FRH    RRH    YA     GC     EFFICIENCY
     │ Int64  Int64  Int64  Int64  Int64  Float64
─────┼───────────────────────────────────────────────
   1 │     9      0      1      1      1       1.025
   2 │    10      1      1     -1      1       0.991
   3 │    10      1      1      1     -1       0.828
   4 │    10      1      1     -1     -1       0.853
   5 │    10      1      1      1      0       0.923
   6 │    10      1      1      0      1       0.997

We work with a full quadratic model, where the intercept is allowed to vary between blocks. First, we get the output using only MixedModels. Here, the p-values for the main effects (first-order terms) are all below 1e-10.

using MixedModels
fm = @formula(
    EFFICIENCY ~
        1 +
        (1 | WP) +
        FRH +
        RRH +
        YA +
        GC +
        FRH & RRH +
        FRH & YA +
        FRH & GC +
        RRH & YA +
        RRH & GC +
        YA & GC +
        FRH & FRH +
        RRH & RRH +
        YA & YA +
        GC & GC
)
m = fit(MixedModel, fm, df; REML=true)
                   Est.      SE       z  p       σ_WP
(Intercept)      0.9114  0.0117   77.67  <1e-99  0.0182
FRH             -0.0609  0.0079   -7.70  <1e-13
RRH              0.0522  0.0079    6.60  <1e-10
YA              -0.0247  0.0027   -9.04  <1e-18
GC               0.0745  0.0027   27.54  <1e-99
FRH & RRH        0.0042  0.0097    0.44  0.6605
FRH & YA         0.0106  0.0033    3.24  0.0012
FRH & GC        -0.0111  0.0032   -3.46  0.0005
RRH & YA        -0.0015  0.0033   -0.47  0.6384
RRH & GC         0.0078  0.0032    2.44  0.0147
YA & GC         -0.0000  0.0033   -0.01  0.9939
FRH & FRH       -0.0075  0.0127   -0.58  0.5587
RRH & RRH        0.0291  0.0127    2.28  0.0224
YA & YA         -0.0075  0.0047   -1.60  0.1105
GC & GC         -0.0064  0.0048   -1.34  0.1812
Residual                                         0.0148

Now we look at the adjustments made in the Kenward-Roger approach. The p-values for the main effects of the hard-to-change factors are much larger, primarily because the degrees of freedom for these factors are very limited.

Similar results hold for the second-order terms. Terms involving only the hard-to-change factors have limited degrees of freedom, while interactions involving both a hard-to-change and an easy-to-change factor have degrees of freedom similar to those of terms involving only easy-to-change factors.

using MixedModelsSmallSample
kr = small_sample_adjust(m, KenwardRoger(; fim=ExpectedFIM()))
                  Coef.  Std. Error    DenDF         t    Pr(>|t|)
(Intercept)     0.91139    0.011734   4.2078    77.669   8.3744e-8
FRH           -0.060868   0.0079025   3.9751   -7.7023   0.0015681
RRH            0.052156   0.0079025   3.9751    6.5999   0.0027915
YA            -0.024719   0.0027352    31.03   -9.0372  3.3648e-10
GC             0.074543    0.002711   31.189    27.497  1.9693e-23
FRH & RRH     0.0042487   0.0096728   3.9657   0.43925     0.68336
FRH & YA       0.010584   0.0032635   31.025    3.2431   0.0028267
FRH & GC      -0.011065   0.0032021   31.073   -3.4556   0.0016105
RRH & YA     -0.0015368    0.003271   31.032  -0.46982     0.64177
RRH & GC      0.0078247   0.0032118   31.114    2.4362    0.020759
YA & GC      -2.5083e-5   0.0032853   31.383  -0.007635    0.99396
FRH & FRH    -0.0074549    0.012749   4.0712  -0.58477     0.58958
RRH & RRH      0.029096    0.012743   4.0641    2.2833    0.083427
YA & YA      -0.0074878   0.0047005    31.21    -1.593     0.12124
GC & GC      -0.0064216   0.0048079    31.11   -1.3356     0.19136

The standard errors on the estimates are also slightly different. However, for this specific experiment, the adjustment does not have a large effect. An example where there is a large difference in standard error will be added later.

Multiple random effects

using CSV
using DataFrames
using MixedModels

using MixedModelsSmallSample

df = DataFrame(MixedModels.dataset(:sleepstudy))
fm = @formula(reaction ~ 1 + days + (1 + days | subj))
m = fit(MixedModel, fm, df; REML=true)
                  Est.      SE      z  p       σ_subj
(Intercept)   251.4051  6.8246  36.84  <1e-99  24.7405
days           10.4673  1.5458   6.77  <1e-10   5.9221
Residual                                       25.5918
kr = small_sample_adjust(m, KenwardRoger(; fim=ExpectedFIM())) # or ObservedFIM()
kr = small_sample_adjust(m, KenwardRoger(; fim=ExpectedFIM())) # or ObservedFIM()
               Coef.  Std. Error  DenDF       t    Pr(>|t|)
(Intercept)   251.41      6.8246   17.0  36.838  1.1709e-17
days          10.467      1.5458   17.0  6.7715   3.2638e-6

More examples

Scalar Random Effects

Vector Random Effects

Similar software

How It Works

The computations follow the same high-level flow for both Kenward-Roger and Satterthwaite:

  1. Compute $V(\theta)$, $V(\theta)^{-1}$, and derivative matrices for either Unstructured or FactorAnalytic.
  2. Compute $W = I(\theta)^{-1}$ via vcov_varpar, using either ObservedFIM or ExpectedFIM.
  3. For KenwardRoger, compute the bias-corrected covariance of β (see MixedModelsSmallSample.compute_adjusted_covariance).
  4. Compute denominator degrees of freedom and (for KR) the F-scaling factor (see MixedModelsSmallSample.compute_dof).

API Reference

Main Functions

MixedModelsSmallSample.small_sample_adjust — Function
small_sample_adjust(m::MixedModel, method) -> LinearMixedModelAdjusted

Apply small-sample corrections to a fitted linear mixed model.

This is the main entry point for computing adjusted inference. The function:

  1. Validates that m is unweighted; KenwardRoger additionally requires a REML fit
  2. Computes $W = I(\theta)^{-1}$, the asymptotic covariance of $\hat{\theta}$ (see vcov_varpar)
  3. For KenwardRoger: computes the adjusted covariance $\Phi_A$ (see compute_adjusted_covariance)
  4. Computes denominator degrees of freedom $\nu_k$ for each fixed effect (see compute_dof)

Arguments

  • m::MixedModel: A fitted LinearMixedModel from MixedModels.jl (must use REML for KR; ML or REML for SW)
  • method: Either KenwardRoger or Satterthwaite

Returns

The returned object can be displayed to show adjusted coefficient tables with corrected standard errors (KR only) and degrees of freedom.

Example

using MixedModels, MixedModelsSmallSample
m = fit(MixedModel, @formula(y ~ x + (1|g)), data; REML=true)
kr = small_sample_adjust(m, KenwardRoger())
sw = small_sample_adjust(m, Satterthwaite())

MixedModelsSmallSample.small_sample_ftest — Function
small_sample_ftest(m::LinearMixedModelAdjusted, L) -> (ν, F*)

Compute an adjusted F-test for the null hypothesis $L^\top\beta = 0$.

Returns $(\nu, F^*)$, where $(\nu, \lambda)$ is obtained from compute_dof and $F^* = \lambda F$ with $F = \beta^\top M \beta / q$ and $M = L(L^\top \Phi L)^{-1} L^\top$ (using the unadjusted covariance $\Phi$).
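
This computation can be sketched numerically (a NumPy sketch, not package code; Phi, beta, L, and the scaling factor lam below are illustrative placeholder values):

```python
import numpy as np

beta = np.array([0.9, -0.06, 0.05])     # fixed-effects estimates (placeholder)
Phi = np.diag([1e-4, 6e-5, 6e-5])       # unadjusted covariance of beta (placeholder)
L = np.array([[0.0, 0.0],               # q = 2 constraints: beta_2 = beta_3 = 0
              [1.0, 0.0],
              [0.0, 1.0]])
q = L.shape[1]

# M = L (L' Phi L)^{-1} L',  F = beta' M beta / q,  F* = lambda * F
M = L @ np.linalg.inv(L.T @ Phi @ L) @ L.T
F = float(beta @ M @ beta) / q
lam = 0.95                              # placeholder Kenward-Roger scaling factor
F_star = lam * F
print(F, F_star)
```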

Returns

  • ν: Denominator degrees of freedom
  • F*: Scaled F-statistic

Under $H_0$, $F^* \sim F(q, \nu)$ approximately.

Example

using Distributions: FDist, cdf  # for the F-distribution p-value
kr = small_sample_adjust(model, KenwardRoger())
L = [0 1 0; 0 0 1]'  # Test β₂ = β₃ = 0
ν, Fstar = small_sample_ftest(kr, L)
pvalue = 1 - cdf(FDist(size(L, 2), ν), Fstar)
small_sample_ftest(m::MixedModel, L; method=KenwardRoger()) -> (ν, F*)

Convenience wrapper that first applies small_sample_adjust then computes the F-test.

Equivalent to:

m_adj = small_sample_adjust(m, method)
small_sample_ftest(m_adj, L)


Adjustment Methods

Options

MixedModelsSmallSample.ObservedFIM — Type
ObservedFIM

Use the observed (Hessian-based) Fisher Information Matrix for the variance parameters.

Let $V$, $V^{-1}$, $dV_i$ and $d^2V_{ij}$ come from VarianceDecomposition. Define

\[\Phi = (X^\top V^{-1} X)^{-1},\qquad P = V^{-1} - V^{-1}X\,\Phi\,X^\top V^{-1},\qquad r = y - X\hat{\beta}.\]

Then the observed information is

\[I^{\text{obs}}_{ij}(\theta) = -\frac{1}{2}\operatorname{tr}\left(P dV_i P dV_j\right) + r^{\top} V^{-1} dV_i P dV_j V^{-1} r + \frac{1}{2}\operatorname{tr}\left(P d^2V_{ij}\right) - \frac{1}{2} r^{\top} V^{-1} d^2V_{ij} V^{-1} r.\]

MixedModelsSmallSample.ExpectedFIM — Type
ExpectedFIM

Use the expected Fisher Information Matrix for the variance parameters.

Let $V^{-1}$, $dV_i$, $P_i$ and $Q_{ij}$ come from VarianceDecomposition, and define $\Phi = (X^\top V^{-1} X)^{-1}$.

Then the expected information used by vcov_varpar is

\[I^{\text{exp}}_{ij}(\theta) = \frac{1}{2}\operatorname{tr}\left(V^{-1} dV_i V^{-1} dV_j\right) - \operatorname{tr}(\Phi\,Q_{ij}) + \frac{1}{2}\operatorname{tr}(\Phi\,P_i\,\Phi\,P_j).\]

MixedModelsSmallSample.Unstructured — Type
Unstructured

Unstructured (variance component) parameterization.

In this parameterization, the marginal variance is linear in the variance parameters. Writing the model as $V(\theta) = \sigma^2 I_n + \sum_b Z_b G_b(\theta) Z_b^\top$, Unstructured parameterizes each block covariance $G_b(\theta)$ by its unique (co)variance components.

Gradient Structure

Because $V$ is linear in $\theta$, the derivatives are trivial: each $\partial V/\partial\theta_i$ is constant and $\partial^2 V/(\partial\theta_i\,\partial\theta_j) = 0$.

This implies the $R_{ij}$ terms used by the Kenward-Roger covariance correction vanish.
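
This linearity is easy to check numerically. A NumPy sketch (not package code) using a toy random-intercept structure, showing that the finite-difference derivative of $V$ with respect to a variance component is the same constant matrix at different parameter values:

```python
import numpy as np

n, n_groups = 6, 2
Z = np.eye(n_groups)[np.repeat(np.arange(n_groups), 3)]  # group indicators

def V(tau2, sigma2):
    # V(theta) = sigma^2 I_n + tau2 * Z Z' is linear in (tau2, sigma2)
    return tau2 * Z @ Z.T + sigma2 * np.eye(n)

h = 1e-6
# Finite-difference dV/dtau2 at two different parameter values: the
# derivative is the constant matrix Z Z', so the two estimates agree
# and the second derivative is zero.
d1 = (V(1.0 + h, 1.0) - V(1.0, 1.0)) / h
d2 = (V(5.0 + h, 1.0) - V(5.0, 1.0)) / h
print(np.allclose(d1, Z @ Z.T), np.allclose(d1, d2))  # -> True True
```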

MixedModelsSmallSample.FactorAnalytic — Type
FactorAnalytic

Factor analytic (Cholesky) parameterization.

For each random-effects block $b$, decompose the block covariance as $G_b = L_b L_b^\top$ with $L_b$ lower triangular. The variance parameters are $\sigma^2$ and the free elements of the $L_b$.

The marginal variance is

\[V = \sigma^2 I_n + \sum_b Z_b G_b Z_b^\top = \sigma^2 I_n + \sum_b Z_b L_b L_b^\top Z_b^\top.\]

For MixedModels.jl models, the internal random-effects factor is stored as a relative factor $\lambda_b$; we use the scaled factor $L_b = \sigma\,\lambda_b$.

Derivatives of V

The derivative of $V$ with respect to a Cholesky element $L_{b,ij}$ uses the chain rule through $G_b = L_b L_b^\top$:

\[\frac{\partial V}{\partial L_{ij}} = Z_b \frac{\partial G_b}{\partial L_{ij}} Z_b^\top, \qquad \frac{\partial G_b}{\partial L_{ij}} = E_{ij} L_b^\top + L_b E_{ij}^\top,\]

where $E_{ij}$ is the elementary matrix with a 1 at position $(i,j)$.

The second derivatives are non-zero. For indices within the same block,

\[\frac{\partial^2 G_b}{\partial L_{i_1 j_1}\,\partial L_{i_2 j_2}} = \delta_{j_1 j_2}\,(E_{i_1 i_2} + E_{i_2 i_1}),\]

and hence

\[\frac{\partial^2 V}{\partial L_{i_1 j_1}\,\partial L_{i_2 j_2}} = Z_b\,\frac{\partial^2 G_b}{\partial L_{i_1 j_1}\,\partial L_{i_2 j_2}}\,Z_b^\top.\]

Unlike Unstructured, this parameterization has non-zero second derivatives $\partial^2 V / \partial L_{ij} \partial L_{kl}$, which must be included in the Kenward-Roger bias correction.
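
The chain-rule formula for $\partial G_b / \partial L_{ij}$ can be verified against a finite difference. A NumPy sketch (not package code) with an illustrative 2×2 lower-triangular factor:

```python
import numpy as np

L = np.array([[2.0, 0.0],
              [0.5, 1.5]])          # illustrative lower-triangular factor

def G(Lmat):
    return Lmat @ Lmat.T            # G = L L'

i, j = 1, 0                         # perturb the (2,1) element of L (0-based (1,0))
E = np.zeros_like(L)
E[i, j] = 1.0                       # elementary matrix E_ij

analytic = E @ L.T + L @ E.T        # dG/dL_ij = E_ij L' + L E_ij'

h = 1e-7
Lp = L.copy()
Lp[i, j] += h
numeric = (G(Lp) - G(L)) / h        # finite-difference approximation
print(np.allclose(analytic, numeric, atol=1e-5))  # -> True
```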


Result Types

MixedModelsSmallSample.LinearMixedModelAdjusted — Type
LinearMixedModelAdjusted{T<:LinearMixedModel, O<:AbstractLinearMixedModelAdjustment}

A wrapper around a LinearMixedModel that holds the small-sample adjusted results.

Fields

  • m: The original LinearMixedModel.
  • adj: The adjustment option used (KenwardRoger or Satterthwaite).
  • varcovar_adjusted: The adjusted variance-covariance matrix of the fixed effects (corrected for KR, unadjusted for SW).
  • W: The asymptotic covariance matrix of the variance parameters.
  • P: The derived matrices $P_i$.
  • Q: The derived matrices $Q_{ij}$ (nothing for Satterthwaite).
  • R: The derived matrices $R_{ij}$ (nothing for Satterthwaite).
  • R_factor: Scaling factor applied to the $R_{ij}$ (0.0 for Satterthwaite).
  • ν: The denominator degrees of freedom for each fixed effect coefficient.

Internals

These types and functions are not exported but may be useful for understanding or extending the package.

Variance Decomposition

MixedModelsSmallSample.VarianceDecomposition — Type
VarianceDecomposition

Cached matrices derived from the marginal covariance V(θ) and its derivatives.

Let θs denote the variance parameters, and write dVs[i] = ∂V/∂θ_i and d2Vs[i,j] = ∂²V/(∂θ_i∂θ_j).

This struct stores V, Vinv, θs, dVs, d2Vs, and the derived matrices

\[P_i = -X^\top V^{-1} dV_i V^{-1} X, \qquad Q_{ij} = X^\top V^{-1} dV_i V^{-1} dV_j V^{-1} X, \qquad R_{ij} = X^\top V^{-1} d^2V_{ij} V^{-1} X,\]

and R_factor. R_factor is a parameterization-dependent scalar applied to R_{ij} in the Kenward-Roger covariance bias correction (see compute_adjusted_covariance).

The definition of θ (and thus the rest of the calculated quantities here) depends on the chosen parameterization; see Unstructured and FactorAnalytic.
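
As an illustration of these definitions, a NumPy sketch (not package code) computing $P$, $Q$, and $R$ for a toy random-intercept model with a single variance parameter, where $dV = ZZ^\top$ is constant and hence $d^2V = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_groups = 6, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # fixed-effects design
Z = np.eye(n_groups)[np.repeat(np.arange(n_groups), 3)]  # random-intercept design

V = 4.0 * Z @ Z.T + np.eye(n)        # V(theta) at illustrative theta
Vinv = np.linalg.inv(V)
dV = Z @ Z.T                         # dV/dtau2 (constant); d2V/dtau2^2 = 0

P = -X.T @ Vinv @ dV @ Vinv @ X      # P_i = -X' Vinv dV_i Vinv X
Q = X.T @ Vinv @ dV @ Vinv @ dV @ Vinv @ X
R = np.zeros_like(P)                 # R_ij = 0 since d2V = 0 here
print(P.shape, Q.shape)              # -> (2, 2) (2, 2)
```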


Covariance Adjustment Computation

MixedModelsSmallSample.compute_adjusted_covariance — Function
compute_adjusted_covariance(Φ, vd, W, method) -> Matrix

Compute the (possibly adjusted) covariance matrix of fixed effects.

For KenwardRoger, applies the bias correction:

\[\Phi_A = \hat{\Phi} + 2\hat{\Phi}\,B\,\hat{\Phi},\]

where the bias matrix $B$ accounts for the variability in estimating $\theta$:

\[B = \sum_{i,j} W_{ij} \bigl(Q_{ij} - P_i\hat{\Phi}P_j + R_{\text{factor}}\cdot R_{ij}\bigr).\]

Here $W$ is the asymptotic covariance of $\hat{\theta}$, and $P_i, Q_{ij}, R_{ij}$ are defined in VarianceDecomposition. The R_factor is -1/4 for FactorAnalytic parameterization and 1 for Unstructured.

For Satterthwaite, returns $\Phi$ unchanged (no bias correction).
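
The correction can be sketched as follows (a NumPy sketch, not package code; all matrices below are small illustrative placeholders, with a single variance parameter so the double sum has one term):

```python
import numpy as np

p = 2
Phi = np.array([[0.5, 0.1],
                [0.1, 0.4]])                   # unadjusted covariance of beta
W = np.array([[0.2]])                          # asymptotic variance of theta-hat
P = [np.array([[1.0, 0.2],
               [0.2, 0.8]])]                   # P_1 (one variance parameter)
Q = {(0, 0): np.array([[0.8, 0.3],
                       [0.3, 0.7]])}           # Q_11
R = {(0, 0): np.zeros((p, p))}                 # zero for Unstructured
R_factor = 1.0

# B = sum_ij W_ij (Q_ij - P_i Phi P_j + R_factor * R_ij)
B = np.zeros((p, p))
for i in range(len(P)):
    for j in range(len(P)):
        B += W[i, j] * (Q[(i, j)] - P[i] @ Phi @ P[j] + R_factor * R[(i, j)])

Phi_A = Phi + 2 * Phi @ B @ Phi                # bias-corrected covariance
print(np.allclose(Phi_A, Phi_A.T))             # symmetry is preserved -> True
```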


Degrees of Freedom Computation

MixedModelsSmallSample.compute_dof — Function
compute_dof(m::LinearMixedModelAdjusted, L) -> (ν, λ)

Compute degrees of freedom and F-statistic scaling factor for contrast L.

Dispatches to the method-specific implementation based on m.adj.

Returns

  • ν: Denominator degrees of freedom
  • λ: F-statistic scaling factor (1.0 for Satterthwaite)
compute_dof(::KenwardRoger, m, L)

Kenward-Roger denominator degrees of freedom and F-scaling factor.

For testing $H_0: L^\top\beta = 0$ with $q$ constraints, the standard Wald F-statistic $F = \beta^\top M \beta / q$ (where $M = L(L^\top\Phi L)^{-1}L^\top$) is scaled to $F^* = \lambda F$. The pair $(\nu, \lambda)$ is chosen so that $F^*$ approximately follows an $F(q, \nu)$ distribution.

Algorithm

  1. Compute moment quantities:

    \[A_1 = \sum_{i,j} W_{ij} \operatorname{tr}(M\Phi P_i \Phi) \operatorname{tr}(M\Phi P_j \Phi)\]

    \[A_2 = \sum_{i,j} W_{ij} \operatorname{tr}(M\Phi P_i \Phi\, M\Phi P_j \Phi)\]

  2. Compute intermediate terms:

    \[B = \frac{A_1 + 6A_2}{2q}, \quad g = \frac{(q+1)A_1 - (q+4)A_2}{(q+2)A_2}\]

    \[c_1 = \frac{g}{3q + 2(1-g)}, \quad c_2 = \frac{q-g}{3q + 2(1-g)}, \quad c_3 = \frac{q+2-g}{3q + 2(1-g)}\]

  3. Compute expectation and variance of the scaled statistic:

    \[E^* = \frac{1}{1 - A_2/q}, \quad V^* = \frac{2}{q} \cdot \frac{1 + c_1 B}{(1-c_2 B)^2 (1-c_3 B)}\]

  4. Match to F-distribution moments:

    \[\rho = \frac{V^*}{2(E^*)^2}, \quad \nu = 4 + \frac{q+2}{q\rho - 1}, \quad \lambda = \frac{\nu}{E^*(\nu - 2)}\]
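
Steps 1-4 can be sketched as a plain function of the moment quantities $A_1$, $A_2$ and the number of constraints $q$ (a Python sketch of the formulas above, not the package's implementation; the input values below are illustrative):

```python
def kr_dof(A1, A2, q):
    # Steps 2-4; the moment quantities A1, A2 are assumed precomputed.
    B = (A1 + 6 * A2) / (2 * q)
    g = ((q + 1) * A1 - (q + 4) * A2) / ((q + 2) * A2)
    denom = 3 * q + 2 * (1 - g)
    c1 = g / denom
    c2 = (q - g) / denom
    c3 = (q + 2 - g) / denom
    E_star = 1 / (1 - A2 / q)
    V_star = (2 / q) * (1 + c1 * B) / ((1 - c2 * B) ** 2 * (1 - c3 * B))
    rho = V_star / (2 * E_star ** 2)
    nu = 4 + (q + 2) / (q * rho - 1)
    lam = nu / (E_star * (nu - 2))
    return nu, lam

# As A1, A2 -> 0 (no uncertainty in theta-hat), nu grows without bound and
# lam -> 1, recovering the unadjusted asymptotic F-test.
nu, lam = kr_dof(0.05, 0.02, 2)
print(nu, lam)
```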

compute_dof(::Satterthwaite, m, L)

Satterthwaite denominator degrees of freedom for contrast L.

For a scalar contrast $c^\top\beta$, the variance is $v = c^\top\Phi\,c$. The Satterthwaite approximation matches the first two moments of $\hat{v}$ to a scaled chi-squared distribution, yielding:

\[\nu = \frac{2v^2}{(\nabla_\theta v)^\top W (\nabla_\theta v)},\]

where the gradient is $\nabla_\theta v = [-c^\top\Phi\,P_i\,\Phi\,c]_i$ and $W$ is the asymptotic covariance of $\hat{\theta}$.
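
A NumPy sketch of this formula (not package code; the contrast, covariance matrices, and $W$ below are illustrative placeholders):

```python
import numpy as np

c = np.array([0.0, 1.0])                        # contrast selecting beta_2
Phi = np.array([[0.5, 0.1],
                [0.1, 0.4]])                    # covariance of beta (placeholder)
P = [np.array([[1.0, 0.2],
               [0.2, 0.8]])]                    # P_i for one variance parameter
W = np.array([[0.02]])                          # covariance of theta-hat

v = float(c @ Phi @ c)                          # variance of the contrast
grad = np.array([-float(c @ Phi @ Pi @ Phi @ c) for Pi in P])
nu = 2 * v**2 / float(grad @ W @ grad)          # nu = 2 v^2 / (grad' W grad)
print(nu)
```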

Multivariate Contrasts

For a matrix $L$ (multiple constraints), the algorithm:

  1. Computes the eigendecomposition of $L^\top\Phi\,L$
  2. Transforms to orthogonal contrasts $\tilde{L} = L \cdot \text{eigenvectors}$
  3. Computes $\nu_i$ for each orthogonal contrast
  4. Combines using: $\nu = 2 E_Q / (E_Q - q)$ where $E_Q = \sum_i \nu_i/(\nu_i - 2)$
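
The combination rule in step 4 can be sketched as follows (not package code; assumes each $\nu_i > 2$):

```python
def combine_satterthwaite(nus):
    # nu = 2 E_Q / (E_Q - q), with E_Q = sum_i nu_i / (nu_i - 2)
    q = len(nus)
    E_Q = sum(nu / (nu - 2) for nu in nus)
    return 2 * E_Q / (E_Q - q)

# With identical per-contrast nu_i = nu, the rule returns nu itself:
print(combine_satterthwaite([10.0, 10.0]))  # -> 10.0
```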

Reference

Satterthwaite, F. E. (1946). "An Approximate Distribution of Estimates of Variance Components." Biometrics Bulletin, 2(6), 110-114.
