Friday, October 30, 2009

estimating mutual information

just read a very interesting article on estimating mutual information from random variable samples. pretty much everything else i've seen on this subject is based on either histograms or kdes, so improving those algorithms comes down to tuning histogram or kernel parameters. 'estimating mutual information' by a. kraskov, h. stögbauer, and p. grassberger takes a different approach, based on nearest neighbors. it seems to do pretty well with few data points and on nearly independent sets, and the paper also shows an interesting application to ica. certainly worth a try, especially if i'm comparing a number of approaches to mi estimation.

it points out that the two norms need not be the same, or even act on the same space, so i could use the rank or any other transform (log is a popular one) to spread out some data or otherwise emphasize some parts, reducing estimation error without changing the theoretical result.

the main results to implement are equations 8 and 9 (rough sketch below). be careful that the definitions of n_x and n_y are different in 8 and 9, though they could be counted simultaneously (on the same pass). compare fig 4 to fig 13. the exact value for I in the caption comes from eq 11. ref 34 goes into the uniqueness and robustness of the components that drop out of ica analysis. the paragraph under fig 17 has computed values from web-accessible data that could be used for testing code. it would be interesting to see whether component 1 in fig 19 reflects phase differences in component 2 due to propagation delay. the second term of the second line in eq a4 equals 0, which confirms that reparameterization does not change mutual information.

small values of k increase statistical errors, while large k increases systematic errors, so it's probably best to try several ks and compare the trends to fig 4. the digamma function is implemented in scipy.special as psi.
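to keep the eq 8 / eq 9 difference straight, here's a rough python sketch of how i read the two estimators, using scipy's kd-tree for the max-norm neighbor searches. this is my own sketch, not the authors' code: the strict-vs-inclusive boundary counting and the tiny radius fudge in the first function are my assumptions about how the paper's definitions translate to code, so double-check them against the text before trusting any numbers.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import psi

def ksg_mi_1(x, y, k=3):
    # eq 8: find the k-th neighbor distance eps(i) in the joint space under the
    # max norm, then count marginal neighbors strictly closer than eps(i)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # k+1 because the query point itself comes back at distance 0
    dists, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    eps = dists[:, k]
    xtree, ytree = cKDTree(x), cKDTree(y)
    # shrinking the radius by a tiny amount approximates the strict inequality;
    # the -1 drops the query point itself from the count
    nx = np.array([len(xtree.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(ytree.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    return psi(k) + psi(n) - np.mean(psi(nx + 1) + psi(ny + 1))

def ksg_mi_2(x, y, k=3):
    # eq 9: here n_x and n_y are counted inside the marginal projections of the
    # k-neighbor ball, with <= instead of <, which is the definitional
    # difference to watch out for between the two equations
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    _, idx = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    nbrs = idx[:, 1:k + 1]
    # largest marginal distance among each point's k joint-space neighbors
    eps_x = np.abs(x[nbrs] - x[:, None, :]).max(axis=(1, 2))
    eps_y = np.abs(y[nbrs] - y[:, None, :]).max(axis=(1, 2))
    xtree, ytree = cKDTree(x), cKDTree(y)
    nx = np.array([len(xtree.query_ball_point(x[i], eps_x[i], p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(ytree.query_ball_point(y[i], eps_y[i], p=np.inf)) - 1
                   for i in range(n)])
    return psi(k) - 1.0 / k + psi(n) - np.mean(psi(nx) + psi(ny))

# quick sanity check with correlated gaussians, where the exact value
# is -0.5*log(1 - r**2), about 0.83 nats for r = 0.9
rng = np.random.default_rng(0)
r = 0.9
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=2000)
print(ksg_mi_1(xy[:, 0], xy[:, 1], k=3))
print(ksg_mi_2(xy[:, 0], xy[:, 1], k=3))

both estimates should land near the gaussian closed form for a couple thousand samples, and sweeping k and watching the trend is the comparison to make against fig 4.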
