Physics 3333 / CFB 3333 Risk and Probability
Click here for Professor Fisher's slides on probabilities and probabilistic fallacies. (password required)
Click here for Professor Fisher's slides on using probabilities in decision making. (password required)
Evaluating Risks and Probability
What Is Risk Anyway?
Risk is the probability that something undesirable will happen. Whatever it is, you won't like it. You wouldn't call the possibility of winning a million dollars a risk - you'd want that to happen.
The reason for evaluating risk is that practically everything is uncertain. Maybe it'll happen - maybe it won't. How to evaluate the probability?
A Quick Look at Probability
Probability is simply a way of describing how likely it is that some random event will occur. A probability is normally expressed as a number ranging from 0 to 1. A probability of 0 means that the event cannot happen; there is some law of physics (or something else) that prevents it absolutely. A probability of 1 means that the event will always occur. A number greater than 0 and less than 1 expresses how likely the event is. A probability of 0.01 (1%) means that the event is expected about 1 time in 100. A probability of 0.5 (50%) means that the event is expected about half the time; when flipping a coin, heads should occur in about half the throws. A probability of 0.9 (90%) means that the event is expected about 9 times out of 10.
The Basis for a Probability Statement
If you see a statement of a probability for something, it might be useful to know the basis for the number given. There are three possibilities.
- Degree of Belief
- Propensity
- Frequency
Degree of Belief: This applies to single events where there is no base of experience to draw on. Consider some statements (which we made up).
- This new surgical procedure has an 80% chance of success.
- My chance of winning the Cliburn piano competition is 60%.
- I bought a stock my brother-in-law recommended. There's a 90% chance it's going to really take off.
Now for an important principle - belief is not certainty unless it is supported by evidence.
Propensity: This is an analytical value based on knowledge of the mechanisms involved in the events. For example, take an ordinary six-sided die. If you know that it is well-made, with its center of mass precisely at the geometric center of the die and its shape a perfect cube, then any given face of the die should land on top with a probability of 1/6 (0.166666...). You can say this before you ever throw the die to test it.
Frequency: Here we're talking about actual frequency of occurrence. This is experience, or data. Let's consider a die again. Suppose you have a die and you don't know how it is made. There's only one way to evaluate the die - throw it several hundred times and record what happens. If, in fact, each of the faces appears about 1/6 of the time, you would conclude that the die is well-made and fair. On the other hand, if the die displays a shortage of sixes and a surplus of ones, it's probably loaded on the 6 side. You determine the probabilities by actually testing it.
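The frequency approach can be tried out on a computer. Here is a minimal sketch that "throws" a simulated die many times and records how often each face appears. The function name, the number of throws and the weights for the loaded die are all made up for illustration; a real loaded die would have to be tested by hand.

```python
import random
from collections import Counter

def estimate_face_probabilities(weights, throws=6000, seed=1):
    """Throw a simulated die `throws` times and return the observed frequency of each face."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    faces = [1, 2, 3, 4, 5, 6]
    results = rng.choices(faces, weights=weights, k=throws)
    counts = Counter(results)
    return {face: counts[face] / throws for face in faces}

# A fair die: every face equally likely.
fair = estimate_face_probabilities([1, 1, 1, 1, 1, 1])

# A hypothetical loaded die (made-up weights): surplus of ones, shortage of sixes.
loaded = estimate_face_probabilities([2, 1, 1, 1, 1, 0.5])

for face in range(1, 7):
    print(face, round(fair[face], 3), round(loaded[face], 3))
```

For the fair die each frequency should land near 1/6, but not exactly on it. That scatter is why you need several hundred throws before drawing a conclusion.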
Why Look at Probability Anyway?
We look at probabilities because practically everything is uncertain to some extent. Outcomes cannot be predicted exactly. There is one thing, however, that is reasonably certain (OK - highly probable), and that is that you don't like uncertainty. It is very hard to deal with uncertainty, particularly when risk is involved. You want positive answers. The only problem is that, in many cases, positive, certain answers are not available. You must deal with uncertainty, and probability is a tool for doing that.
Some Nitty-Gritty About Probability
We're now going to go into a bit of detail about probability. Follow along carefully.
The first thing is a little notation - a way to write probabilities. We use
p(A) = 0.6
to indicate that the probability of event A is 0.6. The probability of A
is written as p(A). The probability of B would be p(B). You see the idea.
Let's make up an example. Suppose you select any random SMU student.
What is the probability that the student is interested in going to Mustang
football games? You might guess a probability by finding out how many students
actually go to the games and dividing by the total number of students at SMU.
That would give you an estimate of the probability. You could write it as
p(interested in football) = 0.3 (we made up the number)
Let's keep going here. Suppose you select the random student and discover
that said student has a ticket for the next football game. Now what is the
probability that your random student is interested in the football games?
It's a LOT higher! You would write the probability like this:
p(interested in football|has game ticket) = 0.8 (made up the number)
This represents the probability that our random student is interested in
going to Mustang football games given that the student already has a
ticket for the next game. This is what's called conditional probability.
In general, we write something like
p(A|B) = 0.4
This represents the probability of A given that B is true. If B has some
effect on A, then p(A|B) will differ from p(A). If B has absolutely no effect
on A, then you will have
p(A|B) = p(A).
In that case A and B are independent. Now - back to football games. Suppose that your
random student mentions that they are taking a journalism class. What is
p(interested in football|taking journalism)?
If there is no connection (likely), then
p(interested in football|taking journalism) = p(interested in football).
Knowing that they are taking journalism doesn't give you any information
about their interest in football.
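If you had actual head counts, you could estimate these conditional probabilities by simple division. The sketch below assumes a hypothetical survey of 1000 students; every count in it is made up, chosen only so that the results match the made-up probabilities above.

```python
# Hypothetical survey of 1000 students; every count is made up for illustration.
total = 1000
interested = 300                  # interested in football
ticket = 200                      # hold a ticket for the next game
ticket_and_interested = 160       # hold a ticket AND are interested
journalism = 100                  # taking a journalism class
journalism_and_interested = 30    # taking journalism AND are interested

def conditional(count_a_and_b, count_b):
    """Estimate p(A|B) as (number with both A and B) / (number with B)."""
    return count_a_and_b / count_b

p_interested = interested / total
p_given_ticket = conditional(ticket_and_interested, ticket)
p_given_journalism = conditional(journalism_and_interested, journalism)

print("p(interested)                     =", p_interested)
print("p(interested | has game ticket)   =", p_given_ticket)
print("p(interested | taking journalism) =", p_given_journalism)
```

Having a ticket changes the probability (0.8 instead of 0.3), so the two are not independent. Taking journalism leaves it at 0.3, which is what independence looks like in the numbers.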
What's This Good For?
By now you're wondering when we'll get to the good stuff. OK - here it is.
One area where you will see risk mentioned a LOT is in health care and medicine. These risks will be described with some form of probability. You'll see such numbers accompanying reports of diseases, treatments and tests. You might think that such probabilities would be straightforward and understandable, but this is not the case. There are mathematically defensible ways to represent those probabilities that will mislead you if you don't know how to read them.
An Example
An example please, Professor. OK - here's one. There is a relatively serious disease called crudulosis (don't ask your doctor about this). There is a vaccine that helps, although it is not perfect; it reduces the risk of getting crudulosis by 50%. Sounds good, eh? But what does it really mean?
Let's look at the real statistics for crudulosis. Not everybody gets it; it occurs in about 1% of the general population (about 1 in 100). That means that the risk of catching it is 1% (guess most of us can resist it). The vaccine reduces the risk of getting crudulosis to 0.5%. This is an absolute risk reduction of 0.5 percentage points (from 1% to 0.5%). Now, a reduction of 0.5% certainly won't grab anyone's attention, so we try another tack. We divide the risk after vaccine by the risk before and get 0.5/1, which is 0.5, or 50%! The vaccine reduces the risk by half! Now that is an attention getter! The 50% number is a relative risk: it compares the two risks to each other and says nothing about how big either one is. If you read about some risk reduction expressed as a relative risk (and that's how most will be expressed), remember that you really don't know what the actual risks are unless you are given the absolute risk numbers.
There's one more number you likely will NOT see reported. It goes like this. Given that the vaccine sort of works, how many people must you vaccinate in order to prevent one case of crudulosis? We'll simply divide the absolute risk reduction into 1; we'll get 200 in this case. We must vaccinate 200 people to prevent one case of crudulosis. This number, 200 in this case, is known as the Number Needed to Treat. It would be very unusual if you saw this number.
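The three numbers in this example (absolute risk reduction, relative risk and Number Needed to Treat) all come from the same two risks. Here is a minimal sketch of the arithmetic; the function and variable names are ours, not standard ones.

```python
def risk_summary(risk_without, risk_with):
    """Return (absolute risk reduction, relative risk, number needed to treat)."""
    absolute_reduction = risk_without - risk_with   # difference between the two risks
    relative_risk = risk_with / risk_without        # risk after divided by risk before
    number_needed_to_treat = 1 / absolute_reduction
    return absolute_reduction, relative_risk, number_needed_to_treat

# Crudulosis: 1% risk without the vaccine, 0.5% with it.
arr, rr, nnt = risk_summary(0.01, 0.005)

print(f"absolute risk reduction: {arr:.1%}")
print(f"relative risk:           {rr:.0%}")
print(f"number needed to treat:  {nnt:.0f}")
```

Try it with a more common disease, say a risk of 20% cut to 10%. The relative risk is still 50%, but the Number Needed to Treat drops from 200 to 10. The relative number alone cannot tell those two situations apart.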
Does looking at it a different way change your evaluation of the vaccine?
Another Example
There's good news on the crudulosis front: someone has developed a test which can help determine if you are in the 1% likely to get the disease. The test isn't perfect (no test is), but it is useful. Here are the statistics for the test.
- p(positive|patient prone to crudulosis) = .995 (true positive)
- p(negative|patient prone to crudulosis) = .005 (false negative)
- p(positive|patient NOT prone to crudulosis) = .04 (false positive)
- p(negative|patient NOT prone to crudulosis) = .96 (true negative)
- base rate = 0.01 (1%) base rate of crudulosis
Now for the exercise. Suppose you read these numbers in the newspaper and decide to go for testing to see if you ought to get crudulosis vaccine. Sure enough, your test is positive. What is the probability that you actually should go get vaccinated? Write down your estimate.
Not so easy, is it? If you find it baffling, don't feel bad. Most doctors don't know how to do it either.
Representing This So You Can Understand It
The formal way of evaluating this is called Bayes' Theorem.
To show this we have to abbreviate a bit.
                                     (base)*p(positive|prone)
p(prone|positive) = ----------------------------------------------------------
                    (base)*p(positive|prone) + (1-base)*p(positive|not prone)
Ferocious, isn't it? Very few people will figure out how to evaluate this. There is hope, however. If you can learn to convert the statistics from probabilities to natural frequencies, they will make sense.
To do this, let's assume that 100 people are screened with the new test. What's going to happen? Look at the data for the crudulosis test above.
- Crudulosis is found in about 1% of the population.
- The test is essentially certain to detect someone who needs to be vaccinated.
- 4% of people who do NOT need vaccination will test positive anyway.
                   100 (sample to test)
                  /   \
                 /     \
     Positive   5       95   Negative
               / \        \
              /   \        \
             1     4        95
          (true   (false    (true
        positive) positive) negative)
In testing 100 people, we get 5 positives: 1 person who really is prone to crudulosis and 4 who are not. Now - given a positive test, what is the probability that the individual needs to be vaccinated? It's 1 in 5, or 20% (0.2). Surprised? What this (imperfect) test has done is raise the probability that an individual testing positive actually needs vaccination from 1% (the base rate) to 20%. It does not indicate need for vaccination with certainty.
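If you want to check that the head count agrees with Bayes' Theorem, here is a minimal sketch that does the calculation both ways using the test statistics given earlier. The function names are made up for this illustration.

```python
def natural_frequencies(n_people, base, p_pos_given_prone, p_pos_given_not_prone):
    """Expected number of true positives and false positives among n_people screened."""
    prone = n_people * base
    not_prone = n_people - prone
    true_positives = prone * p_pos_given_prone
    false_positives = not_prone * p_pos_given_not_prone
    return true_positives, false_positives

def bayes_posterior(base, p_pos_given_prone, p_pos_given_not_prone):
    """p(prone|positive) straight from the Bayes' Theorem formula."""
    numerator = base * p_pos_given_prone
    denominator = numerator + (1 - base) * p_pos_given_not_prone
    return numerator / denominator

true_pos, false_pos = natural_frequencies(100, 0.01, 0.995, 0.04)
print(round(true_pos), round(false_pos))   # 1 4

from_counts = true_pos / (true_pos + false_pos)
from_formula = bayes_posterior(0.01, 0.995, 0.04)
print(round(from_counts, 3), round(from_formula, 3))   # 0.201 0.201
```

Both routes give about 0.2. Counting people and evaluating the formula are the same calculation; the counts are just easier to follow.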
Once you convert the probabilities to natural frequencies, the whole thing is a lot clearer. You can also see that the false positive rate is very important. As that rate gets higher, the value of the test for screening decreases; you'll spend a lot of time and resources checking out false positives.
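To see how much the false positive rate matters, you can hold everything else fixed and vary it. In the sketch below the base rate (1%) and true positive rate (.995) are the crudulosis numbers; the list of false positive rates, other than 4%, is made up for comparison.

```python
def p_prone_given_positive(base, true_positive_rate, false_positive_rate):
    """Probability that a person who tests positive really is prone (Bayes' Theorem)."""
    true_pos = base * true_positive_rate
    false_pos = (1 - base) * false_positive_rate
    return true_pos / (true_pos + false_pos)

base = 0.01
true_positive_rate = 0.995

# 0.04 is the rate quoted for the crudulosis test; the others are hypothetical.
rates = [0.01, 0.04, 0.10, 0.20]
posteriors = [p_prone_given_positive(base, true_positive_rate, r) for r in rates]

for rate, post in zip(rates, posteriors):
    print(f"false positive rate {rate:.0%}: p(prone|positive) = {post:.2f}")
```

As the false positive rate climbs, a positive result tells you less and less, even though the test still catches nearly everyone who is prone.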
The Illusion of Certainty
We said earlier that most people don't like uncertainty. They want solid, positive answers, not probabilities. Unfortunately, life isn't always certain. That said, we need to note that there are times when certainty is claimed when, in fact, it does not exist. Sometimes DNA matching and HIV tests are claimed to be "absolutely certain" and always correct. This is not true. All of these tests have small false positive rates. There is an illusion of certainty which is not justified. Also - any premise based solely on belief and without evidence is NOT certain.
One more statistic you need to be aware of is the Number Needed to Treat (NNT).
After you have defined a "bad outcome," the NNT indicates how many people
you must treat/vaccinate/etc to prevent one bad outcome.
NNT is derived from the reduction in absolute risk.
NNT = 1/(absolute risk reduction)
In our crudulosis example above, the reduction in absolute risk achieved
by the vaccine is from 1% (.01) to 0.5% (.005). The reduction in absolute
risk is .01-.005, or .005. The NNT is then 1/.005, which is 200.
This means that you must vaccinate 200 people to prevent 1 case of crudulosis.