MixedModelsSmallSample

The confidence intervals and hypothesis tests for the fixed effects in MixedModels.jl are based on large-sample approximations. This package implements small-sample corrections for these intervals and tests, as described in:

Kenward, Michael G., and James H. Roger. "Small sample inference for fixed effects from restricted maximum likelihood." Biometrics (1997): 983-997.

and in:

Hrong-Tai Fai, Alex, and Paul L. Cornelius. "Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments." Journal of Statistical Computation and Simulation 54.4 (1996): 363-378.

Getting Started

We will look at a split plot experiment, described in chapter 10 of:

Goos, Peter, and Bradley Jones. Optimal design of experiments: a case study approach. John Wiley & Sons, 2011.

Split-plot experiments have hard-to-change factors and easy-to-change factors. Hard-to-change factors are not fully randomized: they are kept at the same setting for several consecutive observations. The dataset can thus be split into blocks, also called whole plots, within which the hard-to-change factors remain constant. Observations within a block are more similar to one another than to observations from other blocks, so a random intercept per block is introduced to account for this correlation. In the dataset below, FRH and RRH are hard-to-change, while YA and GC are easy-to-change. The WP column shows that there are 10 blocks in which FRH and RRH remain constant. EFFICIENCY is the response we want to model.

The following code constructs the dataset.
#! format: off
using DataFrames
df = DataFrame(
  WP = [1,1,1,1,1,
        2,2,2,2,2,
        3,3,3,3,3,
        4,4,4,4,4,
        5,5,5,5,5,
        6,6,6,6,6,
        7,7,7,7,7,
        8,8,8,8,8,
        9,9,9,9,9,
        10,10,10,10,10],
  FRH = [-1,-1,-1,-1,-1,
         0,0,0,0,0,
         1,1,1,1,1,
         -1,-1,-1,-1,-1,
         0,0,0,0,0,
         0,0,0,0,0,
         1,1,1,1,1,
         -1,-1,-1,-1,-1,
         0,0,0,0,0,
         1,1,1,1,1],
  RRH = [-1,-1,-1,-1,-1,
         -1,-1,-1,-1,-1,
         -1,-1,-1,-1,-1,
         0,0,0,0,0,
         0,0,0,0,0,
         0,0,0,0,0,
         0,0,0,0,0,
         1,1,1,1,1,
         1,1,1,1,1,
         1,1,1,1,1],
  YA = [-1,1,0,1,-1,
        1,0,1,0,-1,
        1,1,-1,-1,0,
        -1,0,1,0,-1,
        0,1,0,-1,0,
        1,0,0,0,-1,
        0,1,0,0,-1,
        -1,1,0,1,-1,
        -1,0,0,0,1,
        -1,1,-1,1,0],
  GC = [-1,1,0,-1,1,
        1,0,-1,1,0,
        -1,1,-1,1,0,
        0,1,0,-1,1,
        -1,0,0,1,0,
        0,0,0,0,-1,
        -1, 1,0,1,0,
        -1,-1,0,1,1,
        0,-1,0,1,1,
        1,-1,-1,0,1],
  EFFICIENCY = [
                0.873,0.969,0.959,0.821,1.019,
                0.921,0.923,0.821,0.958,0.914,
                0.705,0.861,0.756,0.861,0.78,
                0.999,1.06,0.91,0.854,1.082,
                0.841,0.885,0.905,1.025,0.936,
                0.844,0.891,0.902,0.871,0.833,
                0.786,0.879,0.86,0.905,0.87,
                0.98,0.907,1.045,1.094,1.159,
                1.0,0.901,0.99,1.059,1.025,
                0.991,0.828,0.853,0.923,0.997
                ]
  )
50×6 DataFrame
 Row │ WP     FRH    RRH    YA     GC     EFFICIENCY
     │ Int64  Int64  Int64  Int64  Int64  Float64
─────┼───────────────────────────────────────────────
   1 │     1     -1     -1     -1     -1       0.873
   2 │     1     -1     -1      1      1       0.969
   3 │     1     -1     -1      0      0       0.959
   4 │     1     -1     -1      1     -1       0.821
   5 │     1     -1     -1     -1      1       1.019
   6 │     2      0     -1      1      1       0.921
   7 │     2      0     -1      0      0       0.923
   8 │     2      0     -1      1     -1       0.821
   9 │     2      0     -1      0      1       0.958
  10 │     2      0     -1     -1      0       0.914
  11 │     3      1     -1      1     -1       0.705
  12 │     3      1     -1      1      1       0.861
  13 │     3      1     -1     -1     -1       0.756
  14 │     3      1     -1     -1      1       0.861
  15 │     3      1     -1      0      0       0.78
  16 │     4     -1      0     -1      0       0.999
  17 │     4     -1      0      0      1       1.06
  18 │     4     -1      0      1      0       0.91
  19 │     4     -1      0      0     -1       0.854
  20 │     4     -1      0     -1      1       1.082
  21 │     5      0      0      0     -1       0.841
  22 │     5      0      0      1      0       0.885
  23 │     5      0      0      0      0       0.905
  24 │     5      0      0     -1      1       1.025
  25 │     5      0      0      0      0       0.936
  26 │     6      0      0      1      0       0.844
  27 │     6      0      0      0      0       0.891
  28 │     6      0      0      0      0       0.902
  29 │     6      0      0      0      0       0.871
  30 │     6      0      0     -1     -1       0.833
  31 │     7      1      0      0     -1       0.786
  32 │     7      1      0      1      1       0.879
  33 │     7      1      0      0      0       0.86
  34 │     7      1      0      0      1       0.905
  35 │     7      1      0     -1      0       0.87
  36 │     8     -1      1     -1     -1       0.98
  37 │     8     -1      1      1     -1       0.907
  38 │     8     -1      1      0      0       1.045
  39 │     8     -1      1      1      1       1.094
  40 │     8     -1      1     -1      1       1.159
  41 │     9      0      1     -1      0       1.0
  42 │     9      0      1      0     -1       0.901
  43 │     9      0      1      0      0       0.99
  44 │     9      0      1      0      1       1.059
  45 │     9      0      1      1      1       1.025
  46 │    10      1      1     -1      1       0.991
  47 │    10      1      1      1     -1       0.828
  48 │    10      1      1     -1     -1       0.853
  49 │    10      1      1      1      0       0.923
  50 │    10      1      1      0      1       0.997
df[1:10, :]
10×6 DataFrame
 Row │ WP     FRH    RRH    YA     GC     EFFICIENCY
     │ Int64  Int64  Int64  Int64  Int64  Float64
─────┼───────────────────────────────────────────────
   1 │     1     -1     -1     -1     -1       0.873
   2 │     1     -1     -1      1      1       0.969
   3 │     1     -1     -1      0      0       0.959
   4 │     1     -1     -1      1     -1       0.821
   5 │     1     -1     -1     -1      1       1.019
   6 │     2      0     -1      1      1       0.921
   7 │     2      0     -1      0      0       0.923
   8 │     2      0     -1      1     -1       0.821
   9 │     2      0     -1      0      1       0.958
  10 │     2      0     -1     -1      0       0.914
df[45:50, :]
6×6 DataFrame
 Row │ WP     FRH    RRH    YA     GC     EFFICIENCY
     │ Int64  Int64  Int64  Int64  Int64  Float64
─────┼───────────────────────────────────────────────
   1 │     9      0      1      1      1       1.025
   2 │    10      1      1     -1      1       0.991
   3 │    10      1      1      1     -1       0.828
   4 │    10      1      1     -1     -1       0.853
   5 │    10      1      1      1      0       0.923
   6 │    10      1      1      0      1       0.997

We work with a full quadratic model, where the intercept is allowed to vary between blocks. First, we fit the model using only MixedModels. Here, the p-values for the main effects (first-order terms) are all below 1e-10.

using MixedModels
fm = @formula(
    EFFICIENCY ~
        1 +
        (1 | WP) +
        FRH +
        RRH +
        YA +
        GC +
        FRH & RRH +
        FRH & YA +
        FRH & GC +
        RRH & YA +
        RRH & GC +
        YA & GC +
        FRH & FRH +
        RRH & RRH +
        YA & YA +
        GC & GC
)
m = fit(MixedModel, fm, df; REML=true)
                Est.      SE       z       p  σ_WP
(Intercept)   0.9114  0.0117   77.67  <1e-99  0.0182
FRH          -0.0609  0.0079   -7.70  <1e-13
RRH           0.0522  0.0079    6.60  <1e-10
YA           -0.0247  0.0027   -9.04  <1e-18
GC            0.0745  0.0027   27.54  <1e-99
FRH & RRH     0.0042  0.0097    0.44  0.6605
FRH & YA      0.0106  0.0033    3.24  0.0012
FRH & GC     -0.0111  0.0032   -3.46  0.0005
RRH & YA     -0.0015  0.0033   -0.47  0.6384
RRH & GC      0.0078  0.0032    2.44  0.0147
YA & GC      -0.0000  0.0033   -0.01  0.9939
FRH & FRH    -0.0075  0.0127   -0.58  0.5587
RRH & RRH     0.0291  0.0127    2.28  0.0224
YA & YA      -0.0075  0.0047   -1.60  0.1105
GC & GC      -0.0064  0.0048   -1.34  0.1812
Residual                                      0.0148

Now we look at the adjustments made by the Kenward-Roger approach. The p-values for the main effects of the hard-to-change factors are much larger, primarily because the denominator degrees of freedom for these factors are very limited.

Similar results hold for the second-order terms: terms involving only hard-to-change factors have limited degrees of freedom, while interactions involving both a hard- and an easy-to-change factor have degrees of freedom similar to those of terms involving only easy-to-change factors.

using MixedModelsSmallSample
kr = adjust_KR(m; FIM_σ²=:expected)
                  Coef.  Std. Error   DenDF           t    Pr(>|t|)
(Intercept)     0.91139    0.011734  4.2078      77.669   8.3743e-8
FRH           -0.060868   0.0079025  3.9751     -7.7023   0.0015681
RRH            0.052156   0.0079025  3.9751      6.5999   0.0027915
YA            -0.024719   0.0027352   31.03     -9.0372  3.3648e-10
GC             0.074543    0.002711  31.189      27.497  1.9694e-23
FRH & RRH     0.0042487   0.0096728  3.9657     0.43925     0.68336
FRH & YA       0.010584   0.0032635  31.025      3.2431   0.0028267
FRH & GC      -0.011065   0.0032021  31.073     -3.4556   0.0016105
RRH & YA     -0.0015368    0.003271  31.032    -0.46982     0.64177
RRH & GC      0.0078247   0.0032118  31.114      2.4362    0.020759
YA & GC      -2.5084e-5   0.0032853  31.383  -0.0076352     0.99396
FRH & FRH    -0.0074549    0.012749  4.0712    -0.58477     0.58958
RRH & RRH      0.029096    0.012743  4.0641      2.2834    0.083427
YA & YA      -0.0074878   0.0047005   31.21      -1.593     0.12124
GC & GC      -0.0064216   0.0048079   31.11     -1.3356     0.19136

The standard errors of the estimates are also slightly different. However, for this specific experiment, the adjustment does not have a large effect. An example with a large difference in standard errors will be added later.
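The two sets of standard errors can be compared side by side. The sketch below reuses `m` and `kr` from above; the `coeftable(kr)` accessor is an assumption about the package's API and may need to be adapted.

```julia
using MixedModels  # provides `stderror` and `coeftable` for fitted mixed models

# Asymptotic (large-sample) standard errors from the REML fit above:
se_unadjusted = stderror(m)

# Hypothetical accessor: pull the Kenward-Roger adjusted standard errors
# (the second column of the adjusted coefficient table). Adapt this line
# to the actual API of MixedModelsSmallSample if it differs.
se_adjusted = coeftable(kr).cols[2]

# The KR correction typically inflates the standard errors, so these
# ratios are expected to be at or above 1:
ratio = se_adjusted ./ se_unadjusted
```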

Multiple random effects

using CSV
using DataFrames
using MixedModels

using MixedModelsSmallSample

df = DataFrame(MixedModels.dataset(:sleepstudy))
fm = @formula(reaction ~ 1 + days + zerocorr(1 + days | subj)) # zerocorr is necessary
m = fit(MixedModel, fm, df; REML=true)
                 Est.      SE      z       p   σ_subj
(Intercept)  251.4051  6.8854  36.51  <1e-99  25.0513
days          10.4673  1.5596   6.71  <1e-10   5.9882
Residual                                      25.5653
kr = adjust_KR(m; FIM_σ²=:expected) # :expected or :observed
               Coef.  Std. Error   DenDF       t   Pr(>|t|)
(Intercept)  251.41       6.8854  18.187  36.513  1.796e-18
days         10.467       1.5596  18.187  6.7117  2.5709e-6

Mathematical details

To quantify the uncertainty in the estimates of the variance components, either the observed or the expected Fisher information matrix can be used, by passing :observed or :expected to the keyword argument FIM_σ². The observed information matrix is the default.
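As a quick sketch (reusing the fitted model `m` from the split-plot example above), the two options can be requested explicitly; omitting the keyword is equivalent to the :observed variant:

```julia
using MixedModelsSmallSample

# Expected Fisher information matrix for the variance components:
kr_expected = adjust_KR(m; FIM_σ²=:expected)

# Observed Fisher information matrix (the default, per the note above):
kr_observed = adjust_KR(m; FIM_σ²=:observed)
```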

Limitations

Currently, correlated random effects are not supported; this is why zerocorr is required in the sleepstudy example above.
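For instance, with the sleepstudy data, the first formula below is supported while the second, which allows the random intercept and slope to be correlated, is not:

```julia
using MixedModels

# Supported: random intercept and slope constrained to be uncorrelated
fm_ok  = @formula(reaction ~ 1 + days + zerocorr(1 + days | subj))

# Not supported by adjust_KR: correlated random intercept and slope
fm_bad = @formula(reaction ~ 1 + days + (1 + days | subj))
```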
