
[Studying over the Break] Machine Learning. Information Theory #2 - maximum entropy distribution, conditional entropy, KL divergence, M-projection and I-projection, mutual information, cross entropy, transfer entropy

by UKHYUN22 2022. 8. 10.

 

๊ท ๋“ฑ๋ถ„ํฌ๋Š” Maximum Entropy Distributiton์„ ๊ฐ€์ง„๋‹ค.

 

[a,b]๋ผ๋Š” ์ œํ•œ๋œ ๋ฒ”์œ„์—์„œ ์ตœ๋Œ€์˜ ์•คํŠธ๋กœํ”ผ๋ฅผ ๊ฐ€์ง€๋Š” ๋ถ„ํฌ๋ฅผ ์ฐพ๋Š” ๊ณผ์ •์„ ์ง„ํ–‰ํ•ด๋ณด์ž. ์—”ํŠธ๋กœํ”ผ์˜ ๊ฒฝ์šฐ ์Œ์˜ ๋กœ๊ทธ๊ฐ’์„ ๊ฐ€์ง€๋ฉฐ (Degree of Surprise) ์—ฐ์†์ ์ธ ๊ฒฝ์šฐ ์œ„์™€ ๊ฐ™์ด ํ‘œํ˜„๋œ๋‹ค. ์ด๋•Œ Constraint๋Š” [a,b]์—์„œ ๋ชจ๋“  ํ™•๋ฅ ์„ ๋”ํ•˜๋ฉด 1์ด ๋œ๋‹ค๋Š” ์ ์ด๋‹ค. Lagrange ๊ณฑ์…ˆ์„ ์ง„ํ–‰ํ•˜๋ฉด ํ™•๋ฅ  ๊ฐ’์ด ๊ท ๋“ฑ๋ถ„ํฌ์—์„œ ๋„์ถœํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ’์„ ๊ฐ€์ง„๋‹ค. 

 

Marginal entropy uses the same entropy formula computed earlier; the sigma expresses the discrete case. Joint entropy covers the case where two or more random variables are combined, and conditional entropy is the entropy counterpart of the familiar conditional probability. These marginal, joint, and conditional entropies are related to one another, and the relation is proved on the next page.
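For reference, the standard discrete definitions behind those three terms (usual textbook notation):

```latex
H(X)        = -\sum_{x} p(x)\,\log p(x)                      % marginal entropy
H(X, Y)     = -\sum_{x}\sum_{y} p(x, y)\,\log p(x, y)        % joint entropy
H(Y \mid X) = -\sum_{x}\sum_{y} p(x, y)\,\log p(y \mid x)    % conditional entropy
```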

 

 

The proof proceeds by rewriting the familiar joint probability in terms of conditional probability. Looking at the yellow box, we start from the expression for joint entropy, expand the joint probability into a marginal and a conditional probability, and after rearranging we arrive at the following relation.
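Written out, the chain-rule derivation (this should reproduce what the yellow box shows) is:

```latex
H(X, Y) = -\sum_{x, y} p(x, y)\,\log p(x, y)
        = -\sum_{x, y} p(x, y)\,\log\bigl[\,p(x)\,p(y \mid x)\,\bigr]
        = -\sum_{x} p(x)\,\log p(x) \;-\; \sum_{x, y} p(x, y)\,\log p(y \mid x)
        = H(X) + H(Y \mid X)
```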

Relative Entropy

 

Relative ์•คํŠธ๋กœํ”ผ๋ผ๊ณ  ๋ถˆ๋ฆฌ๋ฉฐ ์ด๋Š” ๋‘ ํ™•๋ฅ  ๋ถ„ํฌ P์™€ Q์˜ ๋ฐœ์‚ฐ ์ •๋„๋ฅผ  ์•Œ ์ˆ˜  ์žˆ๋‹ค. ๋ณดํ†ต ๋‘ ํ™•๋ฅ ๋ถ„ํฌ์˜ ์ฐจ์ด๋ฅผ ๊ณ„์‚ฐํ•  ๋•Œ ์‚ฌ์šฉ์ด ๋˜๋ฉฐ ์–ด๋–ค ์ด์ƒ์ ์ธ ๋ถ„ํฌ๋ฅผ ๊ตฌํ•˜๊ณ ์ž ํ•  ๋•Œ ํ•ด๋‹น ๋ถ„ํฌ์— ๊ทผ์‚ฌํ•˜๋Š” ๋‹ค๋ฅธ ๋ถ„ํฌ๋ฅผ ์ด์šฉํ•˜์—ฌ Samplingํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐœ์ƒํ•˜๋Š” ์ฐจ์ด๋ฅผ ๊ณ„์‚ฐํ•  ๋•Œ ์‚ฌ์šฉ์ด ๋˜๊ธฐ๋„ ํ•œ๋‹ค. ์ง๊ด€์ ์œผ๋กœ ๋ณธ๋‹ค๋ฉด P์™€ Q๋ถ„ํฌ์˜ Cross Entropy๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ  ์—ฌ๊ธฐ์„œ P๋ถ„ํฌ์˜ Entropy๋ฅผ ๋นผ์ค€๋‹ค๋ฉด ๋‘ ํ™•๋ฅ  ๋ถ„ํฌ ๊ฐ„์˜ ์ฐจ์ด๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

[์ถœ์ฒ˜ ์ธ์šฉ, ์ฐธ๊ณ  : https://hyunw.kim/blog/2017/10/27/KL_divergence.html ]

 

ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  Empirical Approach๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ Q๋ฅผ ์ฐพ๋Š” ์ˆ˜์‹์„ ์‚ดํŽด๋ณด๋ฉด, Basian Theory์ฒ˜๋Ÿผ Posterior์˜ ํ™•๋ฅ ์„ ๊ตฌํ•  ๋•Œ Conversion ํ•˜์—ฌ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๊ณผ ์œ ์‚ฌํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ•œ๋‹ค. 


KL Divergence

 

KL divergence is asymmetric, so KL(p||q) and KL(q||p) take different values. For that reason, even though it intuitively looks like the distance between two distributions, it is not a true distance metric.
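A minimal sketch checking that asymmetry numerically (the distributions p and q here are hypothetical, chosen only for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Two arbitrary distributions over three outcomes (strictly positive so the log is defined).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

print(kl_divergence(p, q))  # ~1.10 nats
print(kl_divergence(q, p))  # ~1.00 nats -- a different value, so KL is not symmetric
```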