While reading Super Thinking for October's book study, I came across Weaponized Lies, which Super Thinking cites. Below are the main keywords and related passages I jotted down while reading.
#Correlation vs. Causation p84
Plotting things that have nothing to do with each other
So many things happen in this world that coincidences are bound to occur. But that doesn't mean one thing caused the other. When two phenomena are related, whether or not one caused the other, statisticians call that relationship a correlation. The famous saying is, "correlation does not imply causation." Logic has two Latin names for the fallacies that ignore this rule.
1) 'After this, therefore because of this — post hoc, ergo propter hoc'
This is the logical fallacy of concluding that X 'caused' Y simply because Y happened after X. People usually brush their teeth before going to work, but brushing their teeth is not the 'cause' of going to work. If anything, the causation runs the other way: we brush our teeth because we are about to go to work.
2) 'With this, therefore because of this — cum hoc, ergo propter hoc'
This is the logical fallacy of concluding that one phenomenon caused another simply because the two occur together. To make this point, Harvard law student Tyler Vigen wrote a book and built a website collecting co-occurrences (correlations) that look meaningful but are pure coincidence.
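To see how easily unrelated quantities correlate, here is a minimal Python sketch. The two series are made up (hypothetical yearly counts with no causal link), and the hand-written `pearson` helper is just for illustration. A shared upward drift, such as population growth, is enough to produce a correlation near 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical yearly figures for two unrelated things that both drift upward,
# each with a little year-to-year wobble.
years = range(10)
ice_cream_shops = [100 + 5 * t + (3 if t % 2 else -3) for t in years]
library_cards = [2000 + 40 * t + (-25 if t % 3 == 0 else 10) for t in years]

r = pearson(ice_cream_shops, library_cards)
print(f"r = {r:.2f}")  # high, yet neither series causes the other
```

The correlation is real in the data; what it does not tell you is why. Here the only link is that both numbers grow with time.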
#Quantification p94
Differences that don't make a difference
Statistical data is also often used to find out how much two treatments differ: two fertilizers, two painkillers, two teaching methods, the salaries of two groups (say, men and women doing the same work), and so on. There are many reasons two treatments can appear to differ. There may be a real difference between them; the sample may contain confounding factors unrelated to the actual situation; there may be errors in the measurement process; or random variation may occasionally produce a highly improbable difference in one direction or the other. The investigator's goal is to find stable, reproducible differences and to distinguish them from experimental error.
But beware of the way the press uses the word "significant." To statisticians, that word doesn't mean "noteworthy." In statistics, "significant" means a result has passed one of hundreds of mathematical tests, such as a t-test, chi-square test, regression analysis, or principal component analysis. A test of statistical significance quantifies how easily the result can be explained by sheer chance. With very many observations, even a trivial difference can fall outside the range of what our model of variation and randomness can explain. Distinguishing what is noteworthy from what is not requires not only various tests but also human judgment.
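A minimal sketch of that last point, assuming a simple two-sample z-test (just one standard test among many, chosen for brevity) and made-up numbers: the same trivial difference goes from "not significant" to "significant" purely because the sample grows.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known, shared
    standard deviation and equal group sizes (a simple z-test)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided tail area of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: two painkillers whose average effect differs by 0.1 points
# on a scale with a standard deviation of 10 points (1% of an SD).
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_p_value(mean_diff=0.1, sd=10, n_per_group=n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>9,}: p = {p:.3g} ({verdict} at the 0.05 level)")
```

The difference never changes; only the sample size does. Whether 0.1 points matters to a patient is the human-judgment question that no test answers.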
#Precision vs. Accuracy p100
Precision and accuracy
We tend to believe that precise numbers are 'accurate,' but precision and accuracy aren't the same thing. If I say "a lot of people are buying electric cars these days," you'll think I'm guessing. But if I say "16.39% of recently sold cars are electric," you'll think I'm stating something I know for a fact. In such cases, however, you're confusing precision and accuracy.
I might have made up that number, or I may have surveyed only a few people near an electric-car dealership. Recall the Time magazine headline mentioned earlier, which claimed that more people have mobile phones than have toilets. The claim is not implausible, but it is distorted, and it is decidedly NOT what the UN study found. The UN reported that more people have 'access' to a mobile phone than have 'access' to a toilet, which is a different claim.
Dozens of people may share a single phone. The lack of sanitation is still a miserable reality, but the headline implies that if you actually counted, you'd find more cell phones than toilets in the world — and that's a claim the article's data can't support. 'Access' is one of the words you have to be careful with in statistical data. Saying that some people have access to medical services may simply mean they live near a medical facility. It may not mean that they can actually use it or pay for it. As we saw earlier, the cable channel C-SPAN is available in 100 million households, but that doesn't mean 100 million people watch it. I could claim that 90% of the world's population has 'access' to Weaponized Lies by proving that 90% of the world's population is within a 40km radius of an internet-accessible location, a railroad, a road, an airstrip, a port, or a dogsled trail.
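Returning to the 16.39% electric-car figure: the sketch below assumes a hypothetical survey of just 61 buyers, 10 of whom bought electric cars. That works out to exactly 16.39%, but a standard normal-approximation 95% margin of error shows how little the decimals are worth.

```python
import math

def proportion_with_margin(successes, sample_size, z=1.96):
    """Sample proportion and its approximate 95% margin of error
    (normal approximation; z = 1.96 for 95% confidence)."""
    p = successes / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Hypothetical survey: 10 of 61 buyers near one dealership chose an electric car.
p, margin = proportion_with_margin(10, 61)
print(f"Precise-looking: {p:.2%}")
print(f"Plausible range: {p - margin:.0%} to {p + margin:.0%}")
```

The margin covers only random sampling error. Surveying people near a dealership adds bias on top of that, which no margin of error can fix.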
Comparing two completely different things p101
One way to lie with statistics is to compare different things (datasets, populations, products) while pretending they are not different. As the old saying goes, you shouldn't compare apples and oranges. With dubious reasoning, you could argue that serving in the military during an armed conflict, such as the war then going on in Afghanistan, is safer than staying comfortably at home in the United States. Start with the fact that 3,482 American military personnel died on active duty in 2010. Out of 1,431,000 total military personnel, that's a death rate of 2.4 per 1,000. The death rate across the United States in 2010 was 8.2 per 1,000. So, the argument goes, serving in a war zone is more than three times safer than living in the United States. What on earth is going on? The two samples are not comparable: military personnel are mostly young, healthy people, while the general population also includes the elderly, infants, and the sick, who die at much higher rates. So the two rates shouldn't be compared directly.
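The arithmetic behind the misleading comparison is easy to reproduce. This sketch uses only the figures quoted above; the ratio it prints is arithmetically correct but meaningless, because crude rates ignore differences in age and health between the two populations.

```python
def rate_per_thousand(deaths, population):
    """Crude death rate: deaths per 1,000 people."""
    return deaths / population * 1000

military_rate = rate_per_thousand(3_482, 1_431_000)  # 2010 figures quoted above
us_rate = 8.2                                        # per 1,000, 2010, quoted above

print(f"Military:     {military_rate:.1f} per 1,000")
print(f"U.S. overall: {us_rate:.1f} per 1,000")
# Correct division, unfair comparison: the populations differ in age and health.
print(f"Naive ratio:  {us_rate / military_rate:.1f}x")
```

A fair comparison would put like against like, for example age-adjusted rates or civilians of similar age and health, which requires data not given here.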
#The Storytelling Animal
How do we know?
As storytelling animals and social animals, we are easily swayed by other people's opinions. We obtain information in three ways: we discover it directly, we absorb it implicitly, or someone tells us explicitly. A great deal of what we know about the world falls into this last category. That is, we heard or read it somewhere, so we know it only secondhand, and we rely on expert opinion.
I've never seen an oxygen atom or a water molecule, but I came to believe in the existence of those particles after reading the many publications describing the rigorous experiments conducted on them. Similarly, I've never personally verified that Americans landed on the moon, that the speed of light is about 300,000 km/s, that pasteurization actually kills bacteria, or that humans normally have 23 pairs of chromosomes. I haven't checked whether the elevator in my building was made and is being maintained properly, or whether my doctor actually went to medical school. We trust experts, certificates, licenses, encyclopedias, and textbooks. But we also have to trust ourselves and our own discernment and reasoning.
Cunning liars who want us to waste money or vote against our own interests will try to deceive us with fabricated information, or pose as experts while confusing us with baseless figures. They try to grab our attention with information that, on closer examination, turns out to be essentially meaningless. The solution to this problem is to analyze each kind of claim we encounter as we would analyze statistics and graphs.
#Conditional probability and Bayes' theorem
Conditional probability
When considering claims involving statistics, we often look at a subgroup but mistakenly think we are looking at the population as a whole. What is the probability that you have pneumonia? It probably isn't very high. But if we know more about you and your particular situation, we may be able to estimate that probability as higher or lower. Such a probability is called a 'conditional probability.'
Conditional probabilities have a special notation.
The probability that the waiter will bring you ketchup, given that you ordered a hamburger, is written as follows.
P(Ketchup | Hamburger)
Here the vertical bar '|' means 'given': the event written after it is the condition. The notation drops most of the wording of an everyday description so the formula stays compact. Likewise,
the probability that the waiter brings you ketchup given that you ordered a hamburger and asked for ketchup is written as follows.
P(Ketchup | Hamburger, asked for ketchup)
(Reference) Namuwiki | Conditional probability
A conditional probability is the probability of a different event occurring given that some event has occurred. The probability that event A occurs given event B is called "the conditional probability of A given B," written as P(A∣B), and read "P A given B" or "P A bar B." P(A∣B) can change under the influence of event B; in general, P(A∣B) and P(B∣A) are not the same.
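As a minimal sketch of the definition, P(A|B) can be computed from counts as P(A and B) / P(B). The diner orders below are invented for illustration. They also show that swapping the condition, P(B|A) instead of P(A|B), gives a different number.

```python
from fractions import Fraction

# Hypothetical diner log, one (ordered_hamburger, got_ketchup) pair per table.
orders = (
    [(True, True)] * 30      # hamburger, ketchup
    + [(True, False)] * 10   # hamburger, no ketchup
    + [(False, True)] * 20   # other dish, ketchup
    + [(False, False)] * 40  # other dish, no ketchup
)

def ordered_hamburger(row):
    return row[0]

def got_ketchup(row):
    return row[1]

def conditional(event, condition, rows):
    """P(event | condition) = P(event and condition) / P(condition), from counts."""
    given = [r for r in rows if condition(r)]
    both = [r for r in given if event(r)]
    return Fraction(len(both), len(given))

print("P(Ketchup | Hamburger) =", conditional(got_ketchup, ordered_hamburger, orders))  # 30/40 = 3/4
print("P(Hamburger | Ketchup) =", conditional(ordered_hamburger, got_ketchup, orders))  # 30/50 = 3/5
```

Both results share the same numerator (the 30 tables with a hamburger and ketchup); only the condition, and therefore the denominator, changes.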
The pitfall of conditional probability
Conditional probabilities are easy to misread, so even when the numbers are factually correct, readers can be misled and fall into a statistical trap. A famous example is the Monty Hall problem. The following example shows how the trap works.
"40% of people who died in car accidents were not wearing seatbelts. So if we flip that around, 60% of those who died in car accidents died despite wearing a seatbelt — so isn't a seatbelt actually more dangerous?"
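The trap in this example is that the statistic says 60% of those who died in car accidents were wearing seatbelts, that is, P(seatbelt | died) = 0.6, but it is misread as if it were about P(died | seatbelt), the probability of dying in an accident while wearing a seatbelt. Such errors arise because the two conditional probabilities P(A|B) and P(B|A) are different. To judge whether seatbelts are dangerous, you would need to compare P(died | seatbelt) with P(died | no seatbelt); since most people wear seatbelts, most victims will be belted even if seatbelts save lives.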
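To make the difference concrete, here is a sketch with made-up round numbers (not real crash statistics) chosen only to reproduce the 60/40 split in the quote. Because most occupants wear seatbelts, most victims are belted, even though wearing one sharply lowers each person's risk.

```python
# Hypothetical crash data (not real statistics): occupants and deaths by seatbelt use.
occupants = {"belted": 900_000, "unbelted": 100_000}
deaths = {"belted": 600, "unbelted": 400}

total_deaths = sum(deaths.values())

# P(belted | died): the share of victims who wore a seatbelt (the "60%" in the quote).
p_belted_given_died = deaths["belted"] / total_deaths

# P(died | belted) vs. P(died | unbelted): the risk each occupant actually faces.
p_died_given_belted = deaths["belted"] / occupants["belted"]
p_died_given_unbelted = deaths["unbelted"] / occupants["unbelted"]
risk_ratio = p_died_given_unbelted / p_died_given_belted

print(f"P(belted | died)   = {p_belted_given_died:.0%}")
print(f"P(died | belted)   = {p_died_given_belted:.4%}")
print(f"P(died | unbelted) = {p_died_given_unbelted:.4%}")
print(f"Unbelted occupants are {risk_ratio:.0f}x more likely to die")
```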
Bayes' theorem
When an event can be produced by one of two mutually exclusive causes, Bayes' theorem gives the probability that, given the event has occurred, it was produced by a particular one of those causes.
However, because it computes a posterior probability (a conditional probability: the chance of one event given that another event has already occurred), it comes with a constraint: you have to know the probabilities of the prior events (the prior probabilities). Recently, big data has been helping to ease this constraint.
For the blood-sample problem in the book, let P(G) be the prior probability that the suspect is guilty before we know the test result, and P(E) the probability of the evidence, namely that the blood sample matches. We want to know P(G|E). Bayes' theorem states P(A|B) = P(B|A) × P(A) / P(B); replacing A with G and B with E, we get the following.
P(G|E) = P(E|G) × P(G) / P(E)
When computing P(G|E) using the formula of Bayes' rule, it can help to use a table.
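Here is a sketch of both routes, using hypothetical numbers rather than the book's: a pool of 1,000 possible suspects with one culprit, so P(G) = 0.001; a test that always matches the guilty person, P(E|G) = 1; and a blood type shared by 2% of innocent people, P(E|not G) = 0.02. The `posterior` helper applies the formula, and the table of expected counts should give the same answer.

```python
def posterior(prior_g, p_e_given_g, p_e_given_not_g):
    """Bayes' rule: P(G|E) = P(E|G) * P(G) / P(E), where
    P(E) = P(E|G) * P(G) + P(E|not G) * P(not G)."""
    p_e = p_e_given_g * prior_g + p_e_given_not_g * (1 - prior_g)
    return p_e_given_g * prior_g / p_e

# Hypothetical inputs (not from the book).
prior_g, p_e_given_g, p_e_given_not_g = 0.001, 1.0, 0.02

# Table of expected counts in a pool of 1,000 people.
pool = 1_000
guilty = pool * prior_g                      # 1 person
innocent = pool - guilty                     # 999 people
match_guilty = guilty * p_e_given_g          # 1 match
match_innocent = innocent * p_e_given_not_g  # about 20 matches

print(f"{'':10}{'match':>10}{'no match':>10}")
print(f"{'guilty':10}{match_guilty:>10.2f}{guilty - match_guilty:>10.2f}")
print(f"{'innocent':10}{match_innocent:>10.2f}{innocent - match_innocent:>10.2f}")

from_table = match_guilty / (match_guilty + match_innocent)
print(f"P(G|E) from formula: {posterior(prior_g, p_e_given_g, p_e_given_not_g):.3f}")
print(f"P(G|E) from table:   {from_table:.3f}")
```

Under these assumptions, a match raises the probability of guilt from 0.1% to under 5%: about 1 of every 21 matching people is the culprit. The result is dominated by the prior, which is why the prior probabilities have to be known.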
(Reference) Namuwiki | Bayes' theorem in cognitive science
In psychology, neuroscience, cognitive science, and related fields, the view has emerged that Bayes' theorem may be the fundamental way humans think and make judgments. Cognitive scientists are divided on this view, because researchers who study mental processes from a Bayesian standpoint assume that the processes in the human brain and mind follow Bayes' theorem exactly. There is a big difference between simply claiming that humans learn about their environment and update their beliefs, and claiming that this process follows Bayes' theorem exactly. The Bayesian cognitive science position also differs from simply using Bayesian statistics to analyze data in various branches of cognitive science.
Psychologists and neuroscientists who take this view build models showing how neural information processing or behavior can be explained exactly by Bayes' theorem, and is therefore 'rational' from a Bayesian perspective.
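(Reference) Wikipedia (ko.wikipedia.org) | Bayes' theorem
In probability theory and statistics, Bayes' theorem is a theorem that expresses the relationship between the prior and posterior probabilities of two random variables. According to the Bayesian interpretation of probability…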
Knowing what we don't know
As you know, in this world there are 'known knowns.'
That is, there exist things we know we know.
We also know there are 'known unknowns.'
In other words, we know that there exist things we don't know.
But there are also unknown unknowns in this world — that is, things we don't know we don't know.
— U.S. Secretary of Defense Donald Rumsfeld
Indeed, it's a convoluted statement whose meaning is hard to grasp, and there's no reason to keep repeating the same words like that. The Secretary could have conveyed the meaning more clearly by saying, "In this world, there are things we know, things we know we don't know, and things we don't know we don't know." Of course, beyond these three, there's another category: things we do know but don't realize we know. You've probably had this experience: someone asked you a question, you answered, and then you thought to yourself, 'I'm not even sure how I came to know that.'
Either way, the heart of the statement is valid. As the Mark Twain and Josh Billings epigraph at the start of the book says, the things that will cause you enormous damage and trouble are the things you think you know but actually don't, and the things closely tied to the problem at hand that you don't even realize are relevant. To formulate appropriate scientific questions, we have to consider what we know and what we don't know. A properly stated scientific hypothesis is 'falsifiable.'
Real scientists know that they only learn something when things don't go as expected. To summarize the above:
1) Known knowns
In this world, there are things we know — like the distance from the Earth to the Sun. You may not be able to give the answer without looking it up, but you know that the answer is already known. This is what Rumsfeld called 'known knowns.'
2) Known unknowns
In this world, there are things we don't know — like the principle by which a neural signal leads to the feeling of joy. We know that we don't know the answer to such questions. This is what Rumsfeld called 'known unknowns.'
3) Unknown knowns
In this world, there are things we know but don't realize we know, or have forgotten that we know. What was your grandmother's maiden name? Who sat next to you in third grade? When you find the right cue to jog your memory, you realize that you knew the answer all along. Rumsfeld didn't mention this category, but these are 'unknown knowns.'
4) Unknown unknowns
In this world, there are things we don't know, and we don't even know that we don't know them. Suppose you bought a house. You probably hired an expert to inspect the roof and foundation, check for wood-destroying pests like termites, and so on, and received a report. But if you've never heard of radon, and your real-estate agent was more interested in closing the deal than in protecting your family's health, you wouldn't think to test for it. Yet many homes have high concentrations of radon, a known carcinogen. This is an 'unknown unknown.' Of course, now that you've read this paragraph, it no longer falls into that category for you. On closer inspection, which unknowns you are aware of depends on your expertise and experience. The pest inspector will tell you that he reports only on what he can see; that is, he knows there may be hidden damage in parts of your house he can't reach. He doesn't know the kind or extent of that potential damage, but he does know it may exist (a known unknown). If you accept his report without question and consider the matter closed, you fail to recognize that other damage may exist (an unknown unknown).
(Reference) Daniel Levitin | Speaker | TED (https://www.ted.com/speakers/daniel_levitin)
Daniel Levitin incorporates findings from neuroscience into everyday life.

