Physics 3333 / CFB 3333 Risk and Probability

Evaluating Risks and Probability

What Is Risk Anyway?

Risk is the probability that something undesirable will happen. Whatever it is, you won't like it. You wouldn't call the possibility of winning a million dollars a risk - you'd want that to happen.

The reason for evaluating risk is that practically everything is uncertain. Maybe it'll happen - maybe it won't. How to evaluate the probability?

A Quick Look at Probability

Probability is simply a way of describing how likely it is that some random event will occur. A probability is normally expressed as a number ranging from 0 to 1. A probability of 0 means that the event cannot happen; there is some law of physics (or something else) that prevents it absolutely. A probability of 1 means that the event will always occur. A number greater than 0 and less than 1 expresses how likely the event is. A probability of 0.01 (1%) means that the event is expected about 1 time in 100. A probability of 0.5 (50%) means that the event is expected about half the time; when flipping a coin, heads should come up in about half the throws. A probability of 0.9 (90%) means that the event is expected about 9 times out of 10.

The Basis for a Probability Statement

If you see a statement of a probability for something, it might be useful to know the basis for the number given. There are three possibilities.

Degree of Belief
Propensity
Frequency

Degree of Belief: This applies to single events where there is no base of experience to draw on. Consider some statements (which we made up).

This new surgical procedure has an 80% chance of success.
My chance of winning the Cliburn piano competition is 60%.
I bought a stock my brother-in-law recommended. There's a 90% chance it's going to really take off.

Since there's no base of experience, each of these is based solely on belief. The statements also cannot be proved wrong - there is always that probability that the event will not occur. If the surgery fails, that was the 20% chance of failure.

Now for an important principle - belief is not certainty unless it is supported by evidence.

Propensity: This is an analytical value based on knowledge of the mechanisms involved in the events. For example, take an ordinary six-sided die. If you know that it is well-made, with its center of mass precisely at the geometric center and its shape a perfect cube, then any given face of the die should land on top with a probability of 1/6 (0.166666...). You can say this before you ever throw the die to test it.

Frequency: Here we're talking about actual frequency of occurrence. This is experience, or data. Let's consider a die again. Suppose you have a die and you don't know how it is made. There's only one way to evaluate the die - throw it several hundred times and record what happens. If, in fact, each of the faces appears about 1/6 of the time, you would conclude that the die is well-made and fair. On the other hand, if the die displays a shortage of sixes and a surplus of ones, it's probably loaded on the 6 side. You determine the probabilities by actually testing it.
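
The frequency approach can be tried out on a computer before you wear out your wrist. What follows is only a sketch: the function names, the 6000 throws, the fixed seed and the weights for the "loaded" die are all assumptions made up for illustration, not measurements of any real die.

```python
import random
from collections import Counter


def roll_die(n_rolls, weights=None, seed=0):
    """Simulate n_rolls throws of a six-sided die.

    weights=None means every face is equally likely (a fair die).
    The seed is fixed only so that the run is repeatable.
    """
    rng = random.Random(seed)
    return rng.choices(range(1, 7), weights=weights, k=n_rolls)


def face_frequencies(rolls):
    """Return the fraction of throws that landed on each face 1..6."""
    counts = Counter(rolls)
    total = len(rolls)
    return {face: counts[face] / total for face in range(1, 7)}


if __name__ == "__main__":
    fair = face_frequencies(roll_die(6000))
    # Hypothetical loaded die: ones come up more often, sixes less often.
    loaded = face_frequencies(roll_die(6000, weights=[3, 2, 2, 2, 2, 1]))
    print("face  fair   loaded")
    for face in range(1, 7):
        print(face, round(fair[face], 3), round(loaded[face], 3))
```

For the fair die every column entry should come out near 1/6 (about 0.167). For the made-up loaded die the surplus of ones and the shortage of sixes shows up clearly in the frequencies, which is exactly the evidence you would look for with a real die.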

Why Look at Probability Anyway?

We look at probabilities because practically everything is uncertain to some extent. Outcomes cannot be predicted exactly. There is one thing, however, that is reasonably certain (OK - highly probable), and that is that you don't like uncertainty. It is very hard to deal with uncertainty, particularly when risk is involved. You want positive answers. The only problem is that, in many cases, positive, certain answers are not available. You must deal with uncertainty, and probability is a tool for doing that.

Some Nitty-Gritty About Probability

We're now going to go into a bit of detail about probability. Follow along carefully. The first thing is a little notation - a way to write probabilities. We use

p(A) = 0.6

to indicate that the probability of event A is 0.6. The probability of A is written as p(A). The probability of B would be p(B). You see the idea. Let's make up an example. Suppose you select any random SMU student. What is the probability that the student is interested in going to Mustang football games? You might guess a probability by finding out how many students actually go to the games and dividing by the total number of students at SMU. That would give you an estimate of the probability. You could write it as

p(interested in football) = 0.3 (we made up the number)

Let's keep going here. Suppose you select the random student and discover that said student has a ticket for the next football game. Now what is the probability that your random student is interested in the football games? It's a LOT higher! You would write the probability like this:

p(interested in football|has game ticket) = 0.8 (made up the number)

This represents the probability that our random student is interested in going to Mustang football games given that the student already has a ticket for the next game. This is what's called conditional probability.

p(A|B) = 0.4

This represents the probability of A given that B is true. If B has some effect on A, this number will differ from p(A). If B has absolutely no effect on A, then you will have

p(A|B) = p(A).

In that case A and B are independent. Now - back to football games. Suppose that your random student mentions that they are taking a journalism class. What is

p(interested in football|taking journalism)?

If there is no connection (likely), then

p(interested in football|taking journalism) = p(interested in football).

Knowing that they are taking journalism doesn't give you any information about their interest in football.
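
If you had real head counts, the conditional probabilities could be estimated by simple division. Here is a minimal sketch; every count in it is invented, chosen only so the results match the made-up probabilities used in this section, and the function name is our own.

```python
def conditional_probability(n_both, n_condition):
    """Estimate p(A|B) as (number with both A and B) / (number with B)."""
    if n_condition == 0:
        raise ValueError("no cases satisfy the condition")
    return n_both / n_condition


# Hypothetical survey of 1000 students (all counts invented).
total = 1000
interested = 300                  # interested in football
ticket = 250                      # hold a ticket for the next game
interested_and_ticket = 200
journalism = 100                  # taking a journalism class
interested_and_journalism = 30

p_interested = interested / total
p_given_ticket = conditional_probability(interested_and_ticket, ticket)
p_given_journalism = conditional_probability(interested_and_journalism, journalism)

print(p_interested)        # 0.3
print(p_given_ticket)      # 0.8
print(p_given_journalism)  # 0.3, same as p_interested
```

With these counts, holding a ticket moves the probability from 0.3 to 0.8, while taking journalism leaves it at 0.3. That unchanged number is what independence looks like in data.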

What's This Good For?

By now you're wondering when we'll get to the good stuff. OK - here it is.

One area where you will see risk mentioned a LOT is in health care and medicine. These risks will be described with some form of probability. You'll see them in reports about diseases, treatments and tests. You might think that such probabilities would be straightforward and understandable, but this is not the case. There are mathematically defensible ways to represent those probabilities that will mislead you if you don't know how to read them.

An Example

An example please, Professor. OK - here's one. There is a relatively serious disease called crudulosis (don't ask your doctor about this). There is a vaccine that helps, although it is not perfect; it reduces the risk of getting crudulosis by 50%. Sounds good, eh? But what does it really mean?

Let's look at the real statistics for crudulosis. Not everybody gets it; it occurs in about 1% of the general population (about 1 in 100). That means that the risk of catching it is 1% (guess most of us can resist it). The vaccine reduces the risk of getting crudulosis to 0.5%. This is an absolute risk reduction of 0.5 percentage points (from 1% to 0.5%). Now, a reduction of 0.5% certainly won't grab anyone's attention, so we try another tack. We divide the risk after vaccine by the risk before and get 0.5/1, which is 0.5, or 50%. The risk with the vaccine is half the risk without it, so the vaccine reduces the risk by 50%! Now that is an attention getter! The 50% number is a relative risk reduction. If you read about some risk reduction expressed in relative terms (and that's how most will be expressed), remember that you really don't know what the actual risks are unless you are given the absolute risk numbers.

There's one more number you likely will NOT see reported. It goes like this. Given that the vaccine sort of works, how many people must you vaccinate in order to prevent one case of crudulosis? We'll simply divide 1 by the absolute risk reduction: 1/0.005 is 200. We must vaccinate 200 people to prevent one case of crudulosis. This number, 200 in this case, is known as the Number Needed to Treat. It would be very unusual if you saw this number reported.

Does looking at it a different way change your evaluation of the vaccine?
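
The numbers in the vaccine example can be computed side by side. The sketch below is illustrative only: the function name and dictionary keys are our own, and the second "disease" (20% risk cut to 10%) is a hypothetical added purely to show that the same relative reduction can hide very different absolute numbers.

```python
def risk_summary(risk_untreated, risk_treated):
    """Summarize a treatment's effect from the two absolute risks."""
    arr = risk_untreated - risk_treated          # absolute risk reduction
    if arr <= 0:
        raise ValueError("treatment does not reduce the risk")
    relative_risk = risk_treated / risk_untreated
    return {
        "absolute_risk_reduction": arr,
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "number_needed_to_treat": 1 / arr,
    }


if __name__ == "__main__":
    cases = {
        "crudulosis (1% -> 0.5%)": (0.01, 0.005),
        "hypothetical disease (20% -> 10%)": (0.20, 0.10),
    }
    for label, (before, after) in cases.items():
        s = risk_summary(before, after)
        print(label)
        print("  absolute risk reduction:", round(s["absolute_risk_reduction"], 4))
        print("  relative risk reduction:", round(s["relative_risk_reduction"], 4))
        print("  number needed to treat: ", round(s["number_needed_to_treat"]))
```

Both cases report a 50% relative risk reduction, but the Number Needed to Treat is about 200 for crudulosis and about 10 for the hypothetical common disease. A headline that gives only the 50% cannot tell these apart.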

Another Example

There's good news on the crudulosis front: someone has developed a test which can help determine if you are in the 1% likely to get the disease. The test isn't perfect (no test is), but it is useful. Here are the statistics for the test.

p(positive|patient prone to crudulosis)     = 0.995 (true positive)
p(negative|patient prone to crudulosis)     = 0.005 (false negative)
p(positive|patient NOT prone to crudulosis) = 0.04  (false positive)
p(negative|patient NOT prone to crudulosis) = 0.96  (true negative)
base rate of crudulosis                     = 0.01  (1%)

Daunting, isn't it? Here's what it means. The true positive probability is very close to 1, which means that the test will, for all practical purposes, correctly identify anyone prone to getting crudulosis. The false positive probability is 0.04 (4%), which means that 4% of the time the test will come up positive on someone who is NOT prone to crudulosis. Remember this; the false positive rate is VERY important, and it is one number you will NOT likely see reported or correctly interpreted.

Now for the exercise. Suppose you read these numbers in the newspaper and decide to go for testing to see if you ought to get crudulosis vaccine. Sure enough, your test is positive. What is the probability that you actually should go get vaccinated? Write down your estimate.

Not so easy, is it? If you find it baffling, don't feel bad. Most doctors don't know how to do it either.

Representing This So You Can Understand It

The formal way of evaluating this is called Bayes' Theorem. To show this we have to abbreviate a bit.

                                 (base)*p(positive|prone)
p(prone|positive) = ----------------------------------------------------------
                    (base)*p(positive|prone) + (1-base)*p(positive|not prone)

Ferocious, isn't it? Very few people will figure out how to evaluate this. There is hope, however. If you can learn to convert the statistics from probabilities to natural frequencies, they will make sense.

To do this, let's assume that 100 people are screened with the new test. What's going to happen? Look at the data for the crudulosis test above.

Crudulosis is found in about 1% of the population.
The test is essentially certain to detect someone who needs to be vaccinated.
4% of people who do NOT need vaccination will test positive anyway.

If our sample of 100 is representative, we expect to have 1 person who needs the vaccine. For that person the test will, with a probability of 0.995, come out positive. Of the remaining 99, who do NOT need vaccination, the test will be positive for about 4 (4% of 99 is 3.96; these are the false positives).

                                100        (sample to test)
                              /     \
                            /         \
                Positive   5            95   Negative
                          / \            \
                        /    \            \
                      1       4            95
                  (true     (false        (true
                 positive)  positive)     negative)

In testing 100 people, we get 5 positives. Now - given a positive test, what is the probability that the individual needs to be vaccinated? It's 1 in 5, or 20% (0.2). Surprised? What this (imperfect) test has done is allow the probability that an individual testing positive actually needs vaccination to be increased from 1% (base rate) to 20%. It does not indicate need for vaccination with certainty.
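
The head count of 1 in 5 can be checked against Bayes' Theorem using the test statistics given earlier. This is a minimal sketch; the function and argument names are our own shorthand, not standard notation.

```python
def posterior_prone(base, p_pos_given_prone, p_pos_given_not_prone):
    """Bayes' Theorem: p(prone|positive) from the base rate and test rates."""
    true_pos = base * p_pos_given_prone                 # prone AND positive
    false_pos = (1 - base) * p_pos_given_not_prone      # not prone AND positive
    return true_pos / (true_pos + false_pos)


# Exact evaluation with the crudulosis test statistics.
exact = posterior_prone(base=0.01,
                        p_pos_given_prone=0.995,
                        p_pos_given_not_prone=0.04)

# Natural-frequency version: whole people out of 100 screened.
true_positives = 1      # the 1 prone person, who almost surely tests positive
false_positives = 4     # about 4% of the 99 people who are not prone
by_counting = true_positives / (true_positives + false_positives)

print(round(exact, 3))   # 0.201
print(by_counting)       # 0.2
```

The exact formula and the rounded head count agree to within a fraction of a percent, which is why the natural-frequency picture is a safe way to reason about a test like this one.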

Once you convert the probabilities to natural frequencies, the whole thing is a lot clearer. You can also see that the false positive rate is very important. As that rate gets higher, the value of the test for screening decreases; you'll spend a lot of time and resources checking out false positives.
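
To see how strongly the false positive rate matters, you can hold the base rate (0.01) and the true positive rate (0.995) fixed and vary only the false positive rate. The rates other than 0.04 in this sketch are hypothetical values chosen for comparison, and the function name is our own.

```python
def posterior_prone(base, p_pos_given_prone, p_pos_given_not_prone):
    """p(prone|positive) by Bayes' Theorem."""
    true_pos = base * p_pos_given_prone
    false_pos = (1 - base) * p_pos_given_not_prone
    return true_pos / (true_pos + false_pos)


BASE_RATE = 0.01
TRUE_POSITIVE_RATE = 0.995

# 0.04 is the rate from the crudulosis test; the others are hypothetical.
false_positive_rates = [0.01, 0.04, 0.10, 0.20]

results = [
    (fpr, posterior_prone(BASE_RATE, TRUE_POSITIVE_RATE, fpr))
    for fpr in false_positive_rates
]

for fpr, posterior in results:
    print(f"false positive rate {fpr:.2f} -> p(prone|positive) = {posterior:.3f}")
```

With a 1% false positive rate a positive result means roughly even odds of being prone; at 20% it means less than a 1 in 20 chance. The test's ability to catch prone patients never changed, only the number of false alarms mixed in with them.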

The Illusion of Certainty

We said earlier that most people don't like uncertainty. They want solid, positive answers, not probabilities. Unfortunately, life isn't always certain. That said, we need to note that there are times when certainty is claimed when, in fact, it does not exist. Sometimes DNA matching and HIV tests are claimed to be "absolutely certain" and always correct. This is not true. All of these tests have small but nonzero false positive rates. There is an illusion of certainty which is not justified. Also - any premise based solely on belief and without evidence is NOT certain.

One more statistic you need to be aware of is the Number Needed to Treat (NNT). After you have defined a "bad outcome," the NNT indicates how many people you must treat/vaccinate/etc to prevent one bad outcome. NNT is derived from the reduction in absolute risk.

NNT = 1/(absolute risk reduction)

In our crudulosis example above, the vaccine reduces the absolute risk from 1% (.01) to 0.5% (.005). The reduction in absolute risk is .01 - .005, or .005. The NNT is then 1/.005, which is 200. This means that you must vaccinate 200 people to prevent 1 case of crudulosis.
