[cctbxbb] Model based outlier calculation in mmtbx.scaling.outlier_rejection
Peter Zwart
phzwart at gmail.com
Sun Sep 8 22:56:13 PDT 2013
The reason for the power of N might have to do with extreme value statistics.
Tomorrow I'll take a detailed look.
Sent from my iPhone
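
For readers following the thread, here is a minimal sketch of how the extreme-value reading could account for the power of N. It assumes, and this is an assumption rather than something confirmed in the thread, that 2*LLG for each of the N reflections is an independent draw from a chi-square distribution with one degree of freedom. Under that assumption the largest of the N values has cumulative distribution F(x)**N, where F(x) = erf(sqrt(x/2)), so the chance that the largest value exceeds 2*LLG is 1 - erf(sqrt(LLG))**N. The function names below are hypothetical and are not part of mmtbx.

```python
import math
import random


def chi2_1_cdf(x):
    """CDF of a chi-square distribution with one degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))


def extreme_value_p_value(llg, n_reflections):
    """Evaluate 1 - erf(sqrt(LLG))**N, the formula quoted in the thread.

    With x = 2*LLG this equals 1 - chi2_1_cdf(x)**N, i.e. the probability
    that the largest of N independent chi-square(1) values exceeds x.
    """
    if llg < 0:
        raise ValueError("LLG must be non-negative")
    return 1.0 - math.erf(math.sqrt(llg)) ** n_reflections


def simulated_p_value(llg, n_reflections, n_trials=2000, seed=0):
    """Monte Carlo estimate of the same probability.

    Assumption: independent reflections, each contributing the square of a
    standard normal deviate (a chi-square(1) variable).
    """
    rng = random.Random(seed)
    threshold = 2.0 * llg
    exceed = 0
    for _ in range(n_trials):
        largest = max(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_reflections))
        if largest > threshold:
            exceed += 1
    return exceed / n_trials


if __name__ == "__main__":
    llg, n = 6.0, 200
    print("formula   :", extreme_value_p_value(llg, n))
    print("simulation:", simulated_p_value(llg, n))
```

With N = 1 the expression reduces to the ordinary chi-square tail probability. For larger N the same LLG gives a larger p-value, since a large deviation somewhere among many reflections is less surprising than the same deviation in a single one.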
On Sep 8, 2013, at 18:33, Keitaro Yamashita <k.yamashita at spring8.or.jp> wrote:
> Dear all,
>
> Thank you for your replies. I have already looked at the Read (1999)
> paper, but I could not find a direct explanation of this implementation
> (or of what the message in the code describes).
>
> Thanks to advice from a friend, I now understand that the code performs
> something like a likelihood-ratio test. The square root is taken
> because the cumulative distribution function of the chi-square
> distribution with one degree of freedom is erf(sqrt(x/2)). However, I
> still do not understand why the result is raised to the power of N (**N).
> I would be grateful if you could explain the reason.
>
> Best regards,
> Keitaro
>
> 2013/9/6 Peter Zwart <PHZwart at lbl.gov>:
>> Hi,
>>
>> It has been a while since I wrote this and you could potentially be right
>> that I forgot to divide by the second derivative; I'll have a look.
>>
>> P
>>
>>
>> On 5 September 2013 02:31, Keitaro Yamashita <k.yamashita at spring8.or.jp>
>> wrote:
>>>
>>> Dear cctbx developers,
>>>
>>> I am interested in the implementation of model-based reflection
>>> outlier rejection. While reading the code in
>>> mmtbx/scaling/outlier_rejection.py (lines 244-351), I noticed a
>>> possible discrepancy between what log_message explains and what the
>>> code actually does. The log_message in the code says:
>>>
>>>> Outliers are rejected on the basis of the assumption that a scaled
>>>> log likelihood differnce 2(log[P(Fobs)]-log[P(Fmode)])/Q\" is
>>>> distributed
>>>> according to a Chi-square distribution (Q\" is equal to the second
>>>> derivative of the log likelihood function of the mode of the
>>>> distribution).
>>>> The outlier threshold of the p-value relates to the p-value of the
>>>> extreme value distribution of the chi-square distribution.
>>>
>>> whereas the actual p_value is calculated for each hkl as
>>> p_value = 1 - erf(sqrt(LLG))**N,
>>> where
>>> LLG = log p(F=Fbar | Fmodel) - log p(F=Fobs | Fmodel),
>>> and N is the number of reflections. Here, Fbar is the value of F that
>>> maximizes p(F | Fmodel). In any case, Q (the second
>>> derivative of p(F=Fbar | Fmodel)) is not used in the actual
>>> calculation.
>>>
>>> Could someone please explain the meaning of the actual calculation?
>>> Why is the square root taken, and why is the result of erf() raised to
>>> the power of N?
>>>
>>> Thank you very much,
>>> Keitaro
>>> _______________________________________________
>>> cctbxbb mailing list
>>> cctbxbb at phenix-online.org
>>> http://phenix-online.org/mailman/listinfo/cctbxbb
>>
>>
>>
>>
>> --
>> -----------------------------------------------------------------
>> P.H. Zwart
>> Research Scientist
>> Berkeley Center for Structural Biology
>> Lawrence Berkeley National Laboratories
>> 1 Cyclotron Road, Berkeley, CA-94703, USA
>> Cell: 510 289 9246
>> BCSB: http://bcsb.als.lbl.gov
>> PHENIX: http://www.phenix-online.org
>> SASTBX: http://sastbx.als.lbl.gov
>> -----------------------------------------------------------------
>>