[personal profile] avva
Here is another probability puzzle, this time harder than the previous one on the same topic.

A healthy 42-year-old woman went to the dentist to have a wisdom tooth removed. The tooth was extracted successfully around noon, and by evening she had died suddenly. The most likely cause of death is anaphylactic shock: a severe allergic reaction to some drug.

Three drugs are under suspicion. First, the patient took a penicillin tablet before the visit (because of a heart murmur). Second, the dentist wrote her a prescription for a painkiller, zomepirac, and told her to take it if the pain became severe. Third, the dentist used novocaine during the procedure. It is known that the patient bought the zomepirac, but not whether she took it at home (nobody knows how many tablets were in the bottle). The other two drugs were definitely in her body.

A medical statistician compared the available data, calculated the risks, and came to the following conclusion: if she really did take zomepirac, then with 95% probability her death was caused by zomepirac and not by the other drugs or by some other cause (you never know, take care of yourselves). In addition, survey data show that 60% of patients who buy zomepirac in the same circumstances do experience enough pain to take it.

Estimate the probability that it was zomepirac that killed the poor woman. Hint: the correct answer is not 57%.

(If you feel some data is missing, try making reasonable assumptions, and explain them.)

P.S. Be warned: the comments contain several correct answers.

Date: 2011-02-16 10:28 pm (UTC)
From: [identity profile] yurilax.livejournal.com
Could you spell out what your predicates mean?
I.e. what's Death, what's Z, etc.

The way I read it, 95% is not
Prob ("patient dies" | "patient takes Z"),

but rather

Prob ("patient died from Z" | "patient had died" & "patient had taken Z")

Of course, "patient had taken Z" is redundant for the 95%, but it isn't for the 5%. I.e. the 5% are
Prob ("patient died NOT from Z" | "patient had died" & "patient had taken Z")

I don't think it's valid to simply drop the "patient had taken Z" conditional and say that 5% is the general probability that patient had died from causes other than Z, given that she died.

But I'm probably wrong, given the strong consensus here. Please enlighten me, if you could. :)

Date: 2011-02-17 01:04 am (UTC)
From: [identity profile] avva.livejournal.com
It isn't valid, but it's a reasonable assumption that the different causes work independently, so dropping it in fact doesn't change the probability.

To be more explicit, and a little more careful than angerona in naming the predicates:

We need to get to P("died from Z" | "died") =

P("died from Z" & "died") / P("died") =

P("died from Z") / P("died"), because "died from Z" is a subset of "died".

So let's try to calculate P("died from Z") and P("died").

We're given P("died from Z" | "died" & "had taken Z") = 0.95. This, again because of subsets, equals P("died from Z") / P("died" & "had taken Z"). The denominator breaks down as P("died" & "had taken Z" & "died from Z") + P("died" & "had taken Z" & "died otherwise"), where the first summand is again simply P("died from Z"). So we have 0.95*(P("died from Z") + P("had taken Z" & "died otherwise")) = P("died from Z"), or in other words P("died from Z") = 19*P("had taken Z" & "died otherwise").

Now let us assume that events "died otherwise" (not from Z) and "had taken Z" are independent of each other. I think that's a reasonable assumption. It leads us to the following:

P("died otherwise") = P("died otherwise" | "had taken Z") = P("died otherwise" | "had not taken Z"). Expanding the last two,

P("died otherwise" & "had taken Z") / 0.6 = P("died otherwise" & "had not taken Z") / 0.4. Here I substituted known values of P("had taken Z") and P("had not taken Z"). It follows that P("died otherwise" & "had not taken Z") = 2/3*P("died otherwise" & "had taken Z").

Denote the probability P("died otherwise" & "had taken Z") by X.

Finally, consider that P("died") = P("died from Z") + P("died otherwise") = P("died from Z") + P("died otherwise" & "had taken Z") + P("died otherwise" & "had not taken Z") = 19X + X + 2/3*X = 62X/3, using previously established identities. And P("died from Z") is 19X. So the final answer is 19X/(62X/3) = 57/62.
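The arithmetic above can be checked with exact fractions. This is only a sketch of the same argument: the function name and the placeholder values for X are made up for illustration, and the independence of "died otherwise" and "had taken Z" is assumed exactly as in the derivation.

```python
from fractions import Fraction

def posterior_died_from_z(p_took, p_z_given_died_and_took, x):
    """Return P("died from Z" | "died") under the independence assumption.

    x stands for P("died otherwise" & "had taken Z"); it is unknown,
    and any positive placeholder should give the same result.
    """
    # From p * (P(dZ) + x) = P(dZ):  P(dZ) = p / (1 - p) * x  (19x for p = 0.95)
    p_died_from_z = p_z_given_died_and_took / (1 - p_z_given_died_and_took) * x
    # Independence: P(dO & not took) / P(not took) = P(dO & took) / P(took)
    p_other_not_took = (1 - p_took) / p_took * x  # (2/3) x for P(took) = 0.6
    p_died = p_died_from_z + x + p_other_not_took
    return p_died_from_z / p_died

# Two arbitrary placeholder values for X; the answer does not depend on them.
for x in (Fraction(1, 1000), Fraction(1, 7)):
    print(posterior_died_from_z(Fraction(3, 5), Fraction(19, 20), x))
```

Both iterations print the same fraction, which is the point of the remark that the answer comes out without knowing X.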

The reason this came out rather long is that we get to the right answer without knowing X. A much more compressed version of exactly the same argument is in alexeybobkov's comment here. Note that he uses the reasonable assumption I spelled out above implicitly, by defining x as P("died otherwise") without reference to taking/not taking Z, and then using it in the formula in places conditioned upon taking/not taking Z.

Date: 2011-02-17 04:32 am (UTC)
From: [personal profile] alexeybobkov
he uses the reasonable assumption I spelled out above implicitly

Of course I realized I was making an assumption, but I'm no mathematician... I can't formulate it that well. I preferred to leave that point unexplained.

Date: 2011-02-20 12:30 pm (UTC)
From: [identity profile] yurilax.livejournal.com
Hmm... If x is P("died otherwise"), shouldn't it be 1 for the case when the patient died without taking Z? Are we mixing up a priori and a posteriori probabilities?

I.e. Prob("patient WILL die from causes other than Z")
with
Prob("patient HAD died from causes other than Z")


On another point, considering only the population of dead patients who had taken Z, we are making an independence assumption for the causes, which implies that patients would not have died of the other cause, were it not for the first one. I.e. if the patient had NOT taken Z, he would have lived (and vice versa for the "other" causes).

But that in itself might be a stretch, because it's quite possible that Z is just a much faster killer than all the "other" causes, and were it not for Z, a large number of the patients who died from it would still have died from those slower causes. For example, because they are susceptible to anaphylactic shock in general, from a whole range of painkillers.

Date: 2011-02-20 01:33 pm (UTC)
From: [identity profile] avva.livejournal.com
Hmm... If x is P("died otherwise"), shouldn't it be 1 for the case when the patient died without taking Z?

Sure.

I.e. Prob("patient WILL die from causes other than Z")
with
Prob("patient HAD died from causes other than Z")


My analysis is timeless. There are three possible outcomes: the patient lives; the patient dies from Z; the patient dies "otherwise". I think you're confused because you're trying to interpret P(A|B) as something like "what's the probability of A *now that* B happened, as opposed to the probability of A *before* B happened?". But that requires you to assume something about the relative times of A and B, for example which is the "faster killer", which may be fairly meaningless in this case, and in any case isn't given in the data.

Instead, think of P(A|B) as the probability of A happening when we "restrict the universe" to require B to happen. B doesn't have to always happen before A, or after A; you're looking at it all "after the fact". You're the coroner standing at the body of the dead woman, not the patient trying to assess her chances at any given moment.
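A small sketch of "restricting the universe": conditioning is just filtering a table of outcomes and renormalizing. The counts per million below are hypothetical, chosen only so that they agree with the numbers used in this thread (60% take Z, 95% attribution among takers who died, and the same "died otherwise" rate for takers and non-takers); the helper name prob is also made up.

```python
from fractions import Fraction

# Hypothetical counts per million patients who bought Z (illustration only).
UNIVERSE = [
    {"took": True,  "outcome": "died from Z",    "count": 570},
    {"took": True,  "outcome": "died otherwise", "count": 30},
    {"took": True,  "outcome": "lived",          "count": 599_400},
    {"took": False, "outcome": "died otherwise", "count": 20},
    {"took": False, "outcome": "lived",          "count": 399_980},
]

def prob(event, given=lambda row: True):
    """P(event | given): keep rows where `given` holds, then take the share where `event` holds."""
    restricted = [row for row in UNIVERSE if given(row)]
    total = sum(row["count"] for row in restricted)
    hits = sum(row["count"] for row in restricted if event(row))
    return Fraction(hits, total)

def died(row):
    return row["outcome"] != "lived"

def died_from_z(row):
    return row["outcome"] == "died from Z"

def took(row):
    return row["took"]

# The statistician's figure: universe restricted to "died" and "took Z".
print(prob(died_from_z, given=lambda row: died(row) and took(row)))  # 19/20
# The coroner's question: universe restricted to "died" only.
print(prob(died_from_z, given=died))  # 57/62
```

Nothing in prob refers to the order in which events happened; it only selects rows, which is the "after the fact" view described above.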

On another point, considering only the population of dead patients who had taken Z, we are making an independence assumption for the causes, which implies that patients would not have died of the other cause, were it not for the first one. I.e. if the patient had NOT taken Z, he would have lived (and vice versa for the "other" causes).

I don't understand how you're getting to this; we aren't assuming anything like it, on the contrary, in our analysis some patients *will* die even if they don't take Z.

Maybe it'll be less confusing if you think of it in terms of elementary events? Consider a labelling scheme where T means "took Z", NT means "did not take Z", and Z/O/L mean died from Z/died otherwise/lived. Then you have 6 possible "outcomes":

TZ
TO
TL
NTZ
NTO
NTL

except NTZ is impossible, so really there are five of them. Suppose you draw a table with two columns, T and NT, and three rows, Z/O/L, and in each cell you put the fraction of people, out of a million random tries, who suffered that fate. Then the data you're given says that approximately

TZ+TO+TL = 0.6, NTO+NTL = 0.4
TZ/(TZ+TO) = 0.95

The assumption I was talking about says

TO/(TZ+TO+TL) = NTO/(NTO+NTL)

From these equations, you can derive TZ/(TZ+TO+NTO) = 57/62.
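The "million random tries" picture can also be simulated. The sketch below is a rough Monte Carlo check, not part of the original argument: the function name, the seed, and the death rates are hypothetical, and the rates are deliberately exaggerated (19% and 1%) so that a modest sample contains enough deaths. Only their ratio and the equal "otherwise" rate in both columns matter for the result.

```python
import random

def simulate(n, p_took=0.6, p_other=0.01, p_z_if_took=0.19, seed=0):
    """Draw n patients and count the five possible cells of the T/NT x Z/O/L table."""
    rng = random.Random(seed)
    counts = {"TZ": 0, "TO": 0, "TL": 0, "NTO": 0, "NTL": 0}
    for _ in range(n):
        took = rng.random() < p_took
        u = rng.random()
        if took:
            # Takers: died from Z, died otherwise, or lived (mutually exclusive).
            if u < p_z_if_took:
                counts["TZ"] += 1
            elif u < p_z_if_took + p_other:
                counts["TO"] += 1
            else:
                counts["TL"] += 1
        else:
            # Non-takers cannot die from Z; the "otherwise" rate is the same
            # as for takers, which is exactly the assumption in the thread.
            if u < p_other:
                counts["NTO"] += 1
            else:
                counts["NTL"] += 1
    return counts

counts = simulate(200_000)
estimate = counts["TZ"] / (counts["TZ"] + counts["TO"] + counts["NTO"])
print(counts)
print(estimate, "vs exact", 57 / 62)
```

With these made-up rates TZ/(TZ+TO) is 0.95 in expectation, so the estimate should land close to 57/62, up to sampling noise.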

Date: 2011-02-20 03:55 pm (UTC)
From: [identity profile] yurilax.livejournal.com
Thank you for bearing with me. :)

B doesn't have to always happen before A, or after A; you're looking at it all "after the fact". You're the coroner standing at the body of the dead woman, not the patient trying to assess her chances at any given moment.

I think you are quite right about my perspective. I am trying to understand this problem more from the point of view of a researcher trying to uncover not just "what killed her", but also "would she have survived had she not taken Z".

And that is why I cannot accept the assumption
TO/(TZ+TO+TL) = NTO/(NTO+NTL)
without experimental proof. That would involve comparing mortality rates between groups of people who take Z and those who don't, to see whether the rate is really 20 times higher in the former.

Because I think we are missing a substantial group of Z-takers who would have died anyway had they not taken Z, and who would hence make the proportion NTO/(NTO+NTL) bigger than TO/(TZ+TO+TL). My belief is based on the way drugs work (which is not timeless): if Z causes, say, suffocation within 5 hours in susceptible patients and they die from that, then the coroner will put Z down as the cause of death. But if Novocaine causes, say, heart failure within 10 hours and all patients take N, it wouldn't be valid to say that because only 5% of dead Z-takers die from N and not from Z, the same proportion must carry over to the non-Z-takers.
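One way to see how much this objection matters is to relax the assumption by a factor and recompute. In the sketch below, r is a hypothetical ratio, not something given in the problem: r = [NTO/(NTO+NTL)] / [TO/(TZ+TO+TL)], so r = 1 is the independence assumption used earlier and r > 1 means non-takers die "otherwise" more often than takers. The other two inputs are the same 60% and 95% as before.

```python
from fractions import Fraction

def posterior_with_ratio(r, p_took=Fraction(3, 5), p_attr=Fraction(19, 20)):
    """TZ/(TZ+TO+NTO) when the 'otherwise' death rate among non-takers is r times that among takers."""
    to = Fraction(1)                    # TO in arbitrary units; it cancels out
    tz = p_attr / (1 - p_attr) * to     # TZ = 19 * TO
    nto = Fraction(r) * (1 - p_took) / p_took * to  # NTO = r * (2/3) * TO
    return tz / (tz + to + nto)

# r values are illustrative only.
for r in (1, 2, 5):
    value = posterior_with_ratio(r)
    print(r, value, float(value))
```

For r = 1 this reproduces 57/62. Larger r lowers the answer, so how far the estimate moves depends on a ratio the problem statement does not supply.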
