kyungheee 2024. 11. 14. 19:26

MLE corresponds to the frequentist approach.

MLE

1. Likelihood

First, the likelihood is the probability that the data were generated from a particular distribution.

(It is also fine to view it as how well $\theta$ explains the data $X$.)

Accordingly, it is written as $L(\theta) = p(X \mid \theta)$.

 

๋ถ„ํฌ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ $\theta=(\mu, \sigma)$์ธ ์ •๊ทœ๋ถ„ํฌ๋ผ๊ณ  ๊ฐ€์ •ํ•˜๋ฉด, ํ•œ ๊ฐœ์˜ ๋ฐ์ดํ„ฐ $x_n$์ด ์ด ์ •๊ทœ๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅผ ํ™•๋ฅ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. 

$$p(x_n \mid \theta) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \{ -\frac{(x_n - \mu)^2}{2 \sigma^2} \}$$

 

๋ชจ๋“  ๋ฐ์ดํ„ฐ $X = \{x_1, \dots, x_n\}$์ด independentํ•˜๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ likelihood๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค.

$$p(X \mid \theta) = \prod_{n=1}^N p(x_n \mid \theta)$$
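For concreteness, here is a minimal NumPy sketch of this product likelihood (not from the source post; the data points and the two candidate parameter settings are made up purely for illustration):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), evaluated elementwise at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# A small made-up dataset, assumed only for illustration.
X = np.array([1.8, 2.1, 2.4, 1.9, 2.3])

# Likelihood p(X | theta) as a product of per-point densities (i.i.d. assumption).
for mu, sigma in [(2.0, 0.3), (0.0, 0.3)]:
    likelihood = np.prod(gaussian_pdf(X, mu, sigma))
    print(f"theta = (mu={mu}, sigma={sigma}) -> L(theta) = {likelihood:.3g}")
```

The parameter setting whose density sits closer to the observed points yields the much larger likelihood, which matches the reading of the likelihood as how well $\theta$ explains $X$.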

 

 

2. Log likelihood

We need to find the distribution parameters $\theta^*$ that maximize the likelihood.

ํ•˜์ง€๋งŒ, ์‹์ด ๊ณฑ์…ˆ์œผ๋กœ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ์–ด ๋ฏธ๋ถ„ํ•˜๊ธฐ ์‰ฝ์ง€ ์•Š๊ธฐ์— log์™€ -๋ฅผ ๋ถ™์—ฌ์„œ ๊ทธ ๊ฐ’์ด ์ตœ์†Œ๊ฐ€ ๋˜๋Š” ๊ฐ’์„ ๊ตฌํ•œ๋‹ค.

$$- \ln p(X \mid \theta) = - \sum_{n=1}^N \ln p(x_n \mid \theta)$$
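A small sketch with the same made-up data shows that the two forms agree; the sum of logs is also numerically safer, since a product of many small densities underflows quickly (again, this example is mine, not from the source post):

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """log N(x | mu, sigma^2), evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

X = np.array([1.8, 2.1, 2.4, 1.9, 2.3])  # same made-up data as above
mu, sigma = 2.0, 0.3

# Negative log likelihood: -sum_n ln p(x_n | theta).
nll = -np.sum(gaussian_logpdf(X, mu, sigma))

# The same quantity computed through the product form, for comparison.
neg_log_of_product = -np.log(np.prod(np.exp(gaussian_logpdf(X, mu, sigma))))

print(nll, neg_log_of_product)  # the two values match
```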

 

3. Maximum Likelihood Estimation

Now we will find the $\theta$ that maximizes the likelihood by minimizing the negative log likelihood.

$$
- \frac{\partial}{\partial \theta} \sum_{n=1}^N \ln p(x_n \mid \theta) = - \sum_{n=1}^N \frac{\frac{\partial}{\partial \theta} p(x_n \mid \theta)}{p(x_n \mid \theta)} \overset{!}{=} 0
$$

์ด ์‹์„ ๋งŒ์กฑ์‹œํ‚ค๋Š” $\theta$์„ ์ฐพ์œผ๋ฉด ์šฐ๋ฆฌ๋Š” likelihood๋ฅผ ์ตœ๋Œ€ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๋ฏธ๋ถ„ ๊ณผ์ •์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

\begin{align} \frac{\partial}{\partial \mu} E(\mu, \sigma) &= -\sum_{n=1}^N \frac{\frac{\partial}{\partial \mu} p(x_n \mid \mu, \sigma)}{p(x_n \mid \mu, \sigma)} \\ &= -\sum_{n=1}^N \frac{2 (x_n - \mu)}{2 \sigma^2} \\ &= -\frac{1}{\sigma^2} \sum_{n=1}^N (x_n - \mu) \\ &= -\frac{1}{\sigma^2} \left( \sum_{n=1}^N x_n - N\mu \right) \end{align}

 

๋”ฐ๋ผ์„œ ํ‰๊ท  $\mu$๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‚˜์˜จ๋‹ค.

$$\hat{\mu} = \frac{1}{N} \sum_{n=1}^N x_n$$

The variance $\sigma^2$ can also be found in the same way, by differentiating with respect to $\sigma$ (a sketch of this step is given below).
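This step is not written out in the source post, but a brief sketch of it, using the same $E(\mu, \sigma)$ as above, looks like this:

$$E(\mu, \sigma) = -\ln p(X \mid \theta) = \frac{N}{2} \ln (2 \pi \sigma^2) + \frac{1}{2 \sigma^2} \sum_{n=1}^N (x_n - \mu)^2$$

\begin{align} \frac{\partial}{\partial \sigma} E(\mu, \sigma) &= \frac{N}{\sigma} - \frac{1}{\sigma^3} \sum_{n=1}^N (x_n - \mu)^2 \overset{!}{=} 0 \\ \Rightarrow \quad \sigma^2 &= \frac{1}{N} \sum_{n=1}^N (x_n - \mu)^2 \end{align}

Plugging $\hat{\mu}$ in for $\mu$ gives the estimate below.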

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{n=1}^N (x_n - \hat{\mu})^2$$
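As a minimal numerical check (same made-up data; `scipy.optimize.minimize` is just one convenient off-the-shelf optimizer, not something used in the source post), the closed-form estimates agree with directly minimizing the negative log likelihood:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([1.8, 2.1, 2.4, 1.9, 2.3])
N = len(X)

# Closed-form MLE solutions derived above.
mu_hat = X.sum() / N                      # sample mean
var_hat = np.sum((X - mu_hat) ** 2) / N   # variance with a 1/N factor

# Numerical MLE: minimize the negative log likelihood over (mu, log sigma).
def nll(params):
    mu, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * N * np.log(2 * np.pi * sigma2) + np.sum((X - mu) ** 2) / (2 * sigma2)

res = minimize(nll, x0=np.array([0.0, 0.0]))
mu_num, var_num = res.x[0], np.exp(2 * res.x[1])

print(mu_hat, var_hat)   # closed form
print(mu_num, var_num)   # numerical optimum, should match closely
```

Note that the MLE variance divides by $N$ rather than $N-1$, so it differs slightly from the usual unbiased sample variance.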

 

Finding the parameter values $\theta = (\mu, \sigma)$ that maximize the likelihood in this way is called maximum likelihood estimation (MLE).

 

 

Source: https://process-mining.tistory.com/93

 
