線形混合モデルについて②

2月 07, 2019

線形混合モデルについて②

どうも, またしても修士1年の濱崎です.
本日は, 前回に引き続き, 線形混合モデル(linear mixed effects model) に関して解説していこうと思います.
前回：線形混合モデルについて①

2回目の今回は, 線形混合モデルの解法に関して解説してゆきます.

解法

基本的に, 制限付き最尤法 (REML; restricted maximum likelihood) と呼ばれる手法に基づき, 混合モデルが解かれることが多いです. これは, 変量効果の分散 $\sigma ^ 2 _ { \mathrm { u } }$ を推定することが目的の時に, $\sigma ^ 2 _ { \mathrm { u } }$ を不偏推定量 (unbiased estimator) となるように各パラメータを推定する手法です.
まず各分散成分 (variance components) の推定を行うのですが, この推定には色々な手法が提案されています. ~2000年代前半までは, $\mathbf{ u }$ を潜在変数 (latent variable) とみなして EMアルゴリズムを用いたり, あるいは Newton-Raphson法のような数値最適化手法により, 分散成分を推定することが多かったように思います¹. そういった中でも今回は, 2008年に Kang らによって発表された EMMA (efficient mixed model association) と呼ばれる手法 ² について紹介したいと思います.

EMMA（分散成分の推定）

まず, 線形混合モデルの対数尤度 (full log likelihood) は以下のように書き表せます.

$l _ { \mathrm { F } } ( \mathbf { y } ; \boldsymbol { \beta } , \sigma _ { \mathrm { u } } , \delta ) = \frac { 1 } { 2 } \left[ - n \log \left( 2 \pi \sigma _ { \mathrm { u } } ^ { 2 } \right) - \log | \mathbf { H } | - \frac { 1 } { \sigma _ { \mathrm { u } } ^ { 2 } } ( \mathbf { y } - \mathbf { X } \boldsymbol { \beta } ) ^ \text{T} \mathbf { H } ^ { - 1 } ( \mathbf { y } - \mathbf { X } \boldsymbol { \beta } ) \right],$

ここで, 表現型の分散共分散行列を $\mathbf { V }$ とすると,

$\mathbf { V } = \sigma _ { \mathrm { u } } ^ { 2 } \mathbf { Z } \mathbf { K } \mathbf { Z } ^ \mathrm {T} + \sigma _ { \mathrm { e } } ^ { 2 } \mathbf { I } _ { n }$

であり, 対数尤度の式の中の $\mathbf { H }$ は, $\delta = \frac { \sigma _ { \mathrm { e } } ^ { 2 } } { \sigma _ { \mathrm { u } } ^ { 2 } }$ として,

$\mathbf { H } = \frac { \mathbf { V } } { \sigma _ { \mathrm { u } } ^ { 2 } } = \mathbf { Z } \mathbf { K } \mathbf { Z } ^ \mathrm {T} + \delta \mathbf { I } _ { n }$

と表されます.
また REML にて最大化すべき制限付き対数尤度 (restricted log likelihood) は,

$l _ { \mathrm { R } } ( \mathbf { y } ; \sigma _ { \mathrm { u } } , \delta ) = l _ { \mathrm { F } } ( \mathbf { y } ; \hat { \boldsymbol { \beta } } , \sigma _ { \mathrm { u } } , \delta ) + \frac { 1 } { 2 } \left[ p\log \left( 2 \pi \sigma _ { \mathrm { u } } ^ { 2 } \right) + \log \left| \mathbf { X } ^ \mathrm {T} \mathbf { X } \right| - \log \left| \mathbf { X } ^ \mathrm {T} \mathbf { H } ^ { -1 } \mathbf { X } \right| \right]$

として表されます.
まず, これを $\boldsymbol { \beta }$ および $\sigma _ { \mathrm { u } } ^ { 2 }$ に関して最大化を行います（単純にそれぞれのパラメータで偏微分した時の $l ^ {\prime} _ { \mathrm { F / R} }=0$ の解を求めます）. これによりそれぞれの推定値は,

$\hat { \boldsymbol { \beta } } = \left( \mathbf { X } ^ \mathrm {T} \mathbf { H } ^ { -1 } \mathbf { X } \right) ^ { - 1 } \mathbf { X } ^ \mathrm {T} \mathbf { H } ^ { -1 } \mathbf { y },$

$\hat { \sigma } _ { \mathrm { u } } ^ { 2 } = \frac { \mathbf { R } } {df}$

のように表されます. ここで,

$\mathbf { R } = \left( \mathbf { y } - \mathbf { X } \hat { \boldsymbol { \beta } } \right) ^ \text{T} \mathbf { H } ^ { - 1 } \left( \mathbf { y } - \mathbf { X } \hat { \boldsymbol { \beta } } \right),$

$df = \left\{ \begin{array}{ll} n & ( \text {for full log likelihood } l _ { \mathrm { F } } )\\ n - p & ( \text {for restricted log likelihood } l _ { \mathrm { R } } ) \end{array} \right.$

です. （ちなみにここでの $\mathbf { R }$ は, 前回の $\mathbf { R }$ とは異なります… まぎらわしくてすみません.）
次に, これらの推定値を元の対数尤度関数に代入し直します. すると, 対数尤度関数は

$f _ { \mathrm { F } } ( \delta ) = l _ { \mathrm { F } } ( \mathbf { y } ; \hat { \boldsymbol { \beta } } , \hat { \sigma } _ { \mathrm { u } } , \delta ) = \frac { 1 } { 2 } \left[ - n \log \frac { 2 \pi \mathbf { R } } { n } - \log | \mathbf { H } | - n \right]$

となります.

さて, ここでいくつかの定義を行います. まず, $\mathbf { Z } \mathbf { K } \mathbf { Z } ^ \mathrm {T}$ の固有値を $\boldsymbol { \xi }$ , 固有値を対角成分に持つ対角行列を $\mathbf { D } _ { \mathrm { F } }$ , 固有ベクトルの行列 ( $n \times n$ 行列)を $\mathbf { U } _ { \mathrm { F } }$ と定義します. すなわち,

$\mathbf { H } = \mathbf { U } _ { \mathrm { F } } \mathbf { D } _ { \mathrm { F } } \mathbf { U } _ { \mathrm { F } } ^ \text{T} + \delta \mathbf { I } _ n = \mathbf { U } _ { \mathrm { F } } \text{ diag} \left( \xi _ { 1 } + \delta , \cdots , \xi _ { n } + \delta \right) \mathbf { U } _ { \mathrm { F } } ^ \text{T}$

さらに, 準備として以下の値を次に示すように定義します.

$\mathbf { S } = \mathbf { I } _ n - \mathbf { X } \left( \mathbf { X } ^ \text{T} \mathbf { X } \right) ^ { - 1 } \mathbf { X } ^ \text{T}$

また, $\mathbf { S } \mathbf { H } \mathbf { S }$ のスペクトル分解を以下のように行います.

$\mathbf { S } \mathbf { H } \mathbf { S } = \mathbf { S } \left( \mathbf { Z } \mathbf { K } \mathbf { Z } ^ \mathrm {T} + \delta \mathbf { I } _ n \right) \mathbf { S } = \mathbf { U } _ { \mathrm { R } } \text{ diag} \left( \lambda _ { 1 } + \delta , \cdots , \lambda _ { n - p } + \delta \right) \mathbf { U } _ { \mathrm { R } } ^ \text{T}$

ここで, $\mathbf { S }$ を掛けたおかげで $\mathbf { S } \mathbf { H } \mathbf { S }$ はランク落ちしているため, 固有値の数は $n - p$ 個, 固有値ベクトルの行列 $\mathbf { U } _ { \mathrm { R } }$ は $n \times n - p$ 行列になっています.
最後に, 次のように $n \times n$ 行列 $\mathbf { P }$ を定義します.

$\mathbf { P } = \mathbf { I } _ { n } - \mathbf { X } \left( \mathbf { X } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { X } \right) ^ { - 1 } \mathbf { X } ^ \text {T} \mathbf { H } ^ { - 1 }$

これらの定義を用いれば, 先ほどの $\mathbf { R }$ が以下のように書き表せます.
$\mathbf { R } = ( \mathbf { y } - \mathbf { X } \hat { \boldsymbol { \beta } } ) ^ \text {T} \mathbf { H } ^ { - 1 } ( \mathbf { y } - \mathbf { X } \hat { \boldsymbol { \beta } } ) = \mathbf { y } ^ \text {T} \left( \mathbf { I } _ { n } - \mathbf { X } \left( \mathbf { X } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { X } \right) ^ { - 1 } \mathbf { X } ^ \text {T} \mathbf { H } ^ { - 1 } \right) ^ \text {T} \mathbf { H } ^ { - 1 } \left( \mathbf { I } _ { n } - \mathbf { X } \left( \mathbf { X } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { X } \right) ^ { - 1 } \mathbf { X } ^ \text {T} \mathbf { H } ^ { - 1 } \right) \mathbf { y } = \mathbf { y } ^ \text {T} \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \mathbf { y }$
ここで, $\mathbf { P } \mathbf { S } = \mathbf { P }$ , $\mathbf { S } \mathbf { P } = \mathbf { S }$ , $\mathbf { S } \mathbf { P } ^ \text {T} = \mathbf { P } ^ \text {T}$ , $\mathbf { H } \mathbf { P } ^ \text {T} \mathbf { H } ^ { -1 } = \mathbf { P }$ , $\mathbf { H } ^ { -1 } \mathbf { P } \mathbf { H } = \mathbf { P } ^ \text {T}$ , $\mathbf { P } ^ { 2 } = \mathbf { P }$ であることを利用すれば,

$\left( \mathbf { S } \mathbf { H } \mathbf { S } \right) \left( \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \right) \left( \mathbf { S } \mathbf { H } \mathbf { S } \right) = \mathbf { S } \mathbf { H } \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \mathbf { H } \mathbf { S } = \mathbf { S } \mathbf { P } \mathbf { P } \mathbf { H } \mathbf { S } = \mathbf { S } \mathbf { H } \mathbf { S }$
かつ
$\left( \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \right) \left( \mathbf { S } \mathbf { H } \mathbf { S } \right) \left( \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \right) = \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \mathbf { H } \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } = \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \mathbf { P } \mathbf { P } = \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P }$
であることがわかります.
さらに,
$\left( \mathbf { S } \mathbf { H } \mathbf { S } \right) \left( \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \right) = \left( \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \right) \left( \mathbf { S } \mathbf { H } \mathbf { S } \right) = \mathbf { S }$
から, $\mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P }$ と $\mathbf { S } \mathbf { H } \mathbf { S }$ の積（可換）はエルミート行列（この場合, 実対称行列）になります.
したがって, 定義から $\mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P }$ は $\mathbf { S } \mathbf { H } \mathbf { S }$ の擬似逆行列（pseudo-inverse matrix, または, 一般化逆行列 : generalized inverse matrix） であることがわかります.

$\mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } = \left ( \mathbf { S } \mathbf { H } \mathbf { S } \right ) ^ { + } = \mathbf { U } _ { \mathrm { R } } \text{ diag} \left [ \left( \lambda _ { 1 } + \delta , \cdots , \lambda _ { n - p } + \delta \right) ^ { -1 } \right] \mathbf { U } _ { \mathrm { R } } ^ \text{T}$
なお, $( \cdot ) ^ { + }$ は擬似逆行列を表します.
よって, $\mathbf { U } _ { \mathrm { R } } ^ \text {T} \mathbf { y } = \boldsymbol { \eta }$ と定める（ $\boldsymbol { \eta }$ は $n - p \times 1$ ベクトル）と, $\mathbf { R }$ はさらに,

$\mathbf { R } = \mathbf { y } ^ \text {T} \left ( \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \right ) \mathbf { y } = \boldsymbol { \eta } ^ \text {T} \text{ diag} \left [ \left( \lambda _ { 1 } + \delta , \cdots , \lambda _ { n - p } + \delta \right) ^ { -1 } \right] \boldsymbol { \eta } = \sum _ { s = 1 } ^ { n - p } \frac { \eta _ { s } ^ { 2 } } { \lambda _ { s } + \delta }$

ここから, 対数尤度は次のように書き下せます.

$f _ { \mathrm { F } } ( \delta ) = - \frac { 1 } { 2 } \left[ n \log \frac { 2 \pi } { n } + n \log { \left ( \sum _ { s = 1 } ^ { n - p } \frac { \eta _ { s } ^ { 2 } } { \lambda _ { s } + \delta } \right ) } + \sum _ { i = 1 } ^ { n } \log { \left( \xi _ { i } + \delta \right) } + n \right]$

さらに, 制限付き対数尤度は, $\mathbf { B } ^ \text { T } \mathbf { y }$ の尤度を考えれば良いことがわかっています. ここで, 行列 $\mathbf { B }$ は $\mathbf { B } \mathbf { B } ^ \text { T } = \mathbf { S }$ かつ $\mathbf { B } ^ \text { T } \mathbf { B } = \mathbf { I }$ を満たす行列です. さて, これまでの結果から,

$\mathbf { S } = \left( \mathbf { S } \mathbf { H } \mathbf { S } \right) \left( \mathbf { P } ^ \text {T} \mathbf { H } ^ { - 1 } \mathbf { P } \right) = \left( \mathbf { S } \mathbf { H } \mathbf { S } \right) \left ( \mathbf { S } \mathbf { H } \mathbf { S } \right ) ^ { + } = \mathbf { U } _ { \mathrm { R } } \mathbf { U } _ { \mathrm { R } } ^ \text { T }$
であることがわかっており, さらに, $\mathbf { U } _ { \mathrm { R } } ^ \text { T } \mathbf { U } _ { \mathrm { R } } = \mathbf { I } _ { n - p }$ から, $\mathbf { B } = \mathbf { U } _ { \mathrm { R } }$ として良いことがわかります.
したがって, $\mathbf { U } _ { \mathrm { R } } ^ \text { T } \mathbf { y }$ の尤度を考えればよく, これは平均 $\mathbf { 0 }$ , 分散が

$\mathbf { U } _ { \mathrm { R } } ^ \text { T } \mathbf { V } \mathbf { U } _ { \mathrm { R } } = \sigma ^ { 2 } _ { \mathrm { u } } \mathbf { U } _ { \mathrm { R } } ^ \text { T } \mathbf { H } \mathbf { U } _ { \mathrm { R } } = \sigma ^ { 2 } _ { \mathrm { u } } \text{ diag} \left( \lambda _ { 1 } + \delta , \cdots , \lambda _ { n - p } + \delta \right)$
の多変量正規分布に従います. なぜなら,

$\mathbf { U } _ { \mathrm { R } } \mathbf { U } _ { \mathrm { R } } ^ \text { T } \mathbf { H } \mathbf { U } _ { \mathrm { R } } \mathbf { U } _ { \mathrm { R } } ^ \text { T } = \mathbf { S } \mathbf { H } \mathbf { S } = \mathbf { U } _ { \mathrm { R } } \text{ diag} \left( \lambda _ { 1 } + \delta , \cdots , \lambda _ { n - p } + \delta \right) \mathbf { U } _ { \mathrm { R } } ^ \text { T }$
が成り立つからです.
さて, この正規分布に $\hat { \sigma } _ { \mathrm { u } } ^ { 2 } = \mathbf { R } \text{ / } { n - p }$ を代入することで, 結果的に以下の形で制限付き尤度が書き下せます.

$f _ { \mathrm { R } } ( \delta ) = - \frac { 1 } { 2 } \left[ ( n - p ) \log \frac { 2 \pi } { n - p } + ( n - p ) \log \left( \sum _ { s = 1 } ^ { n - p } \frac { \eta _ { s } ^ { 2 } } { \lambda _ { s } + \delta } \right) + \sum _ { s = 1 } ^ { n - p } \log \left( \lambda _ { s } + \delta \right) + ( n - p ) \right]$

ここまで来ればあとは簡単です. MLなら $f _ { \mathrm { F } } ( \delta )$ , REMLなら $f _ { \mathrm { R } } ( \delta )$ を, 分散比 $\delta$ に関して, $\delta > 0$ という制約のもとで, 何らかの最適手法を用いて最大化すれば良いわけです.
これにより $\hat { \delta }$ が求まり, $\hat { \sigma } _ { \mathrm { u } } ^ { 2 }$ と掛け合わせることで, 誤差分散の推定値 $\hat { \sigma } _ { \mathrm { e } } ^ { 2 } = \hat { \delta } \hat { \sigma } _ { \mathrm { u } } ^ { 2 }$ を求めることができます. これで, EMMAによる分散成分の推定は終わりです。

そのほかの手法

分散成分の推定の手法としては, EMMAの他にも色々の手法が提案されてきています. これには, 近年ゲノムワイド関連解析 (GWAS: genome-wide association study) などのような混合モデルを何度も解く必要がある手法が登場し, その結果できるだけ効率よく混合モデルを解く手法が必要になってきたことが大きく関係していると考えられます. 具体的には, 2012年に発表されたGEMMAと呼ばれる手法³や, あるいは効率的なEMアルゴリズムを利用する手法⁴などが提案されています. あるいは, 計算的には特に有利というわけではないですが, ベイズ的な解釈を可能にする, Gibbs samplingを用いた手法⁵なども古くから用いられています（当然最尤法の枠組みからは外れるわけですが）. まあ色々な手法があるわけですが, 目的に応じて適切な手法を選ぶのが良いと思います.

というわけでめちゃくちゃ長くなってしまったので, 今回はここまでです. 数式が多かったので, 更新に結構時間がかかってしまいました.
次回は, この分散成分の推定値を用いて, 固定効果及び遺伝子型値をどう推定するかについて紹介します.

それではまた.

（文責：濱崎）

参考文献

Lindstrom, M.J. and Bates, D.M. (1988) Newton-Raphson and EM Algorithms for Linear Models for Repeated-Measures Data. J Am Stat Assoc. 83(404): 1014-1022. ↩︎
Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723. ↩︎
Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824. ↩︎
Diffey, S.M., Smith, A.B., Welsh, A.H. and Cullis, B.R. (2017) A new REML (parameter expanded) EM algorithm for linear mixed models. Aust New Zeal J Stat. 59(4): 433-448. ↩︎
Sorensen, D. and Gianola, D. (2002) Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Chapter 6 & Chapter 13. ↩︎

このブログを検索

生物測定学ブログ

注目

日本進化学会学会賞

線形混合モデルについて②

解法

EMMA（分散成分の推定）

そのほかの手法

参考文献

コメント

コメントを投稿

人気の投稿

Python を Windows (64 bit) で使うときの注意点

日本進化学会学会賞