Upon Reflection: In praise of investigative reporting

Note: This is the second in a periodic series of personal reflections on journalism, news literacy, education and related topics by NLP’s founder and CEO Alan C. Miller. Columns will be posted here at 10 a.m. ET every other Thursday.

As news reports go, The New York Times’ lead story on Sept. 27 was a blockbuster: Donald Trump paid only $750 in federal income tax the year he won the presidency, $750 in federal income tax his first year in office, and “no income taxes at all in 10 of the previous 15 years — largely because he reported losing much more money than he made.”

I admired the way that Russ Buettner, Susanne Craig and Mike McIntire were able to make these assertions. The Times didn’t even feel the need to provide any attribution for those stunning findings in the story’s opening paragraphs.

That’s because this reporting was based on a trove of tax-return data for Trump and his companies that extended over more than two decades, as well as on other financial documents, legal filings and dozens of interviews. The three reporters, who collectively have decades of experience in unraveling complex financial and political dealings, have been investigating Trump’s finances for nearly four years.

Their efforts culminated in a classic piece of investigative journalism. It broke significant new ground on a subject of enormous public interest with authoritative, compelling and contextual reporting. It did not ask for the reader’s trust; instead, it earned it with detailed documentation. It pulled no punches in sharing its evidence-based findings — while also explaining what remains unknown about Trump’s assets.

The response was telling. A lawyer for the Trump Organization told the Times that the story was “inaccurate” but did not cite specifics. Trump called the report “fake news,” again without citing specific errors. In investigative journalism circles, this is called “a non-denial denial.”

I have a special appreciation for what it takes to do this kind of work. For most of my 29-year newspaper career, I was an investigative reporter. I considered it journalism’s highest calling. In some ways, even as newspapers fold and the number of journalists drops in the face of economic contraction, we are in a new golden age of investigative reporting. Yet amid all the attacks on journalism and the public’s declining trust in it, I believe that most people do not truly understand what it takes to do this work well — and the stakes and standards that lie at the heart of it.

In theory, all reporters should have the ability to do investigative work. Indeed, some of the most iconic investigations have arisen from resourceful beat reporting. But those who devote themselves exclusively to this kind of work — for instance, the members of Investigative Reporters & Editors — are often a different breed, marching to the beat of their own drummer.

They focus principally on corruption, waste, fraud, dishonesty and abuse, whether of human rights or of public trust. They tend to take longer and dig deeper, obtaining records (often through federal or state freedom of information requests), developing a network of inside and expert sources, and thoroughly mastering the subject at hand. Their work typically must endure a multi-layer editing process, often including a review by their publication’s lawyers.

Investigative reporters tend to regard whoever wields power as their primary target. Their north star is impact — to make a difference.

To succeed, their work must be beyond reproach. Most investigations contain hundreds of facts on which findings are based. The subject of such a report will look for any factual error, however inconsequential, to try to undermine the story’s credibility (“If they can’t get even that right, why would you trust their conclusions?”). The threat of a lawsuit, even prior to publication, may hang overhead as well.

In December 2002, following months of reporting, the Los Angeles Times published “The Vertical Vision,” a four-part series on the Marine Corps’ aviation program. My colleague Kevin Sack and I detailed how the Harrier jump jet, the first aircraft the Marines could call their own, had killed 45 pilots — including some of the Corps’ best — in 143 noncombat accidents since 1971, making it the most dangerous aircraft in the U.S. military for decades. We demonstrated that this was the first of three Marine aircraft that would prove to be deeply troubled, painting a portrait of an aviation program whose high cost in blood and treasure was not redeemed on the battlefield.

In our determination to be both fair and accurate, we took the unusual step of reading, word by word, a draft of the series to Marine Corps public affairs officials. In turn, as publication began, the Marines sent supporters a detailed description of what to expect and told them that if they could find any mistakes, however small, the Corps would pounce on those errors to discredit the entire report. (They found nothing. The series led to a congressional hearing and was awarded the 2003 Pulitzer Prize for National Reporting.)

Investigative reporters must resist a particular kind of confirmation bias: falling in love with the story, or with the thesis behind an investigation. The best reporters follow where the evidence leads, rather than seeking documentation to support a desired conclusion. Failing to do so has consequences, as I discovered in my first journalism job at the Times Union in Albany, New York.

I made a critical mistake on an investigative piece about the city’s Democratic machine when I saw and reported what I expected — and, yes, hoped — to see in a legal document. “Always guard against your own assumptions,” my editor, Harry Rosenfeld, admonished me. (His words held special sway since he had been Bob Woodward’s and Carl Bernstein’s boss at The Washington Post during the Watergate investigation.) That — plus the ensuing Page One correction — proved a powerful lesson for an ambitious young reporter.

(This fealty to accuracy, as well as to accountability in the face of factual errors, stands in stark contrast to other types of content that masquerade as journalism — such as conspiracy theories, whose web of alleged insider information, sinister plots and Byzantine clues can take on the aura of reportorial revelation. But these delusions, which require their followers to suspend belief in reality, fall apart under scrutiny because they cannot be independently documented by credible sources.)

The high expectations placed on investigative reporters also put them under considerable pressure. Producing a series — or even a single story — can take weeks or months, and that time is costly (especially amid tightening budgets). Sources can mislead. Tips don’t always pan out. And a newly discovered fact or document may undo those weeks or months of work by disproving or complicating an investigation’s underlying premise.

Yet overcoming such challenges makes the payoff all the more gratifying: landing an investigation that reveals wrongdoing, prompts public scrutiny, leads to reforms and has meaningful impact.

In early 2003, two months after our Harrier series appeared, a Marine Corps pilot stationed in Kuwait said this to a Los Angeles Times colleague: “Tell Alan Miller that he got it right.” As a result, he added, “Lives will be saved.”

Understanding COVID-19 data: Examining data behind racial disparities

This piece is part of a series, presented by our partner SAS, that explores the role of data in understanding the COVID-19 pandemic. SAS is a pioneer in the data management and analytics field. (Check out other posts in the series on our Get Smart About COVID-19 Misinformation page.)

by Mary Osborne

Are communities of color at greater risk for COVID-19? The question of COVID-19 racial disparities has circulated across media outlets since the start of the pandemic. Science tells us that viruses do not target individuals by race or ethnicity, and yet, this novel virus significantly impacts communities of color in disproportionate ways.

To understand why communities of color are disproportionately impacted by COVID-19, we must look beyond race alone and consider other risk factors that may draw dividing lines. By examining why certain populations are more severely impacted than others, we can begin to identify the underlying causes. To do that, we have to look at the data. Although the data is limited within many communities of color, there is enough to better understand the impact of COVID-19 in certain communities.

Data has demonstrated how a person’s age or underlying medical conditions can be the difference between surviving COVID-19 or succumbing to it. But are there other risk factors to be considered and could any of those factors be tied to racial inequalities?

Population, race and COVID-19

It’s no secret that minority populations have been greatly affected by the COVID-19 pandemic, often at rates that are disproportionate to those of white people. The Black Non-Hispanic population has been hit particularly hard. While they represent 13% of the population in the United States, Black Non-Hispanics comprise over 22% of COVID-19-related deaths.

The disparities in cases and deaths by race vary from state to state, shaped by each state’s demographic makeup. However, a troubling trend has emerged in one of the nation’s smallest populations — the American Indian/Alaska Native Non-Hispanic (AI/AN) group. AI/AN people make up around 1.5% of the U.S. population but have experienced almost 1% of total COVID-19 deaths. Let’s read that again: 1% of total COVID-19 deaths are attributed to the AI/AN population — an enormous toll for a community that is such a small percentage of the total population.

Percentage of population and percentage of COVID-19 deaths by race/ethnicity

Source: US Centers for Disease Control and Prevention

The U.S. Centers for Disease Control and Prevention (CDC) has reported that, compared with the White population, the AI/AN population has a case rate 2.8 times higher, a death rate 1.4 times higher and a hospitalization rate 5.3 times higher. That hospitalization rate is higher than that of any other population in the U.S.
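The kind of population-share comparison shown in the figures above can be sketched in a few lines. This is an illustrative calculation (not the CDC’s methodology), using the shares quoted earlier in this article for the Black Non-Hispanic population:

```python
def disparity_ratio(share_of_deaths: float, share_of_population: float) -> float:
    """Ratio of a group's share of COVID-19 deaths to its share of the
    population. A value above 1.0 means the group is over-represented
    among deaths relative to its size."""
    return share_of_deaths / share_of_population

# Figures quoted in this article: ~13% of the U.S. population,
# over 22% of COVID-19-related deaths.
black_non_hispanic = disparity_ratio(0.22, 0.13)
print(f"{black_non_hispanic:.2f}")  # ~1.69: deaths well out of proportion to population size
```

A ratio near 1.0 would mean deaths roughly track population share; anything well above it signals the kind of disproportionate impact the article describes.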

Percent of COVID deaths and percentage of population of American Indian/Alaska Native, Non-Hispanic population

Source: US Centers for Disease Control

Secondary risk factors and healthcare access

Yet the COVID-19 numbers we see in the AI/AN population aren’t dissimilar to those seen in other communities of color, likely because they share risk factors with other minority populations. Like members of the Black and Hispanic populations, many Native American families live in close quarters, sharing their homes with more than one generation or with extended family. The pressure is particularly acute for the AI/AN population because housing on reservation lands is limited, and growth in the Native American population over the last decade has strained housing resources. These types of living arrangements pose a higher risk of spreading diseases like COVID-19.

Within the AI/AN population, diabetes, obesity and hypertension have emerged as factors that increase risk of severe COVID-19 disease and the need for hospitalization. According to the American Diabetes Association, this disease is more prevalent in the AI/AN population than in any other racial or ethnic group. And AI/AN people are 50 percent more likely to suffer from obesity than Non-Hispanic white people. Hypertension is also common in this population, especially among people with diabetes.

Diabetes rate by race

Source: American Diabetes Association

Economic impacts

The long-term economic impacts of the virus are also disconcerting. AI/AN people have the highest poverty rate of any U.S. racial or ethnic group. Sociologist Beth Redbird of the Institute for Policy Research has found unemployment to be the most significant factor driving poverty in Native American populations. Given the current uncertainty in job markets and employment, an improvement in poverty rates is unlikely.

Poverty by Race

Source: U.S. Census Bureau/American Community Survey

Access to healthcare is another factor to consider. In some cases members of the AI/AN communities drive an hour or more to reach a medical provider. This is further complicated by a lack of transportation on most reservations. The Indian Health Service (IHS) is underfunded and lacks medical providers, equipment and facilities to handle critical patients. It runs 24 hospitals, which have fewer than 71 ventilators and just 33 ICU beds.

So, are there COVID-19 racial disparities?

We know that the color of one’s skin doesn’t make a person more susceptible to COVID-19. But what we’ve seen from the data is that AI/AN communities are disproportionately affected because of other contributing factors. These same factors amplify the risk of COVID-19 among all communities of color.

This pattern of impact isn’t unique to COVID-19 — other diseases behave in much the same way. Instead, COVID-19 has placed a necessary spotlight on these issues because of its devastating effects. The data reaffirms that more research is needed — regarding inequality of healthcare access and how certain populations are affected by viruses like COVID-19. We need to increase society’s diligence in understanding and addressing the unbalanced systems affecting communities of color. While the AI/AN population is often overlooked because of its small numbers, statistical insignificance doesn’t mean members of these communities are insignificant.

About SAS: Through innovative analytics software and services, SAS helps customers around the world transform data into intelligence.

Early news literacy ‘lessons’ benefit Washington Post reporter

When Shane Harris, an intelligence and national security reporter at The Washington Post, was in fourth grade in the mid-1980s, he received an informal introduction to news literacy.

As part of his language arts and English class, Harris and his classmates spent a week learning how to read the newspaper. They started with the front page, as the teacher explained how stories placed there were likely the most important of the day. Then they turned to the opinion pages, learning the difference between columns and news stories.

“That is basic news literacy,” Harris said recently, “and it kind of informed a lot of my behavior as a news consumer ever since.”

Harris has reported about intelligence and national security for two decades and is the author of two books: The Watchers: The Rise of America’s Surveillance State and @War: The Rise of the Military-Internet Complex. He’s also a firm believer in news literacy as a solution to today’s convoluted information landscape. So Harris quickly jumped on board when the Post’s executive editor, Marty Baron, told the newsroom about the News Literacy Project’s Newsroom to Classroom program, which connects journalists with students.

“I was just really pleased that as an institution, we were encouraging our reporters to go talk to young people about how to read the news and how to be smarter consumers about it,” Harris said.

Connecting Washington Post, classroom

Once he signed up, Harris, who lives in Washington, D.C., connected with Cade Elkins, a high school teacher in Huntington, West Virginia. Initially, the session was going to take place over Skype with Elkins’ students in the room to ask questions of the journalist. But then COVID-19 hit and the school went remote. Elkins solicited questions from his students and presented them to Harris during an insightful, hour-long video conversation.

You can watch it here.

Harris was not surprised that the students asked pointed questions. He said he always enjoys speaking with young people because they raise topics he’s not normally asked about in everyday conversations about his job.

“They tend to ask like really frank questions that oftentimes go to just core basic issues about what we do, how we do what we do,” Harris said. “And I was also really impressed that they were clearly discerning. They had skeptical questions, too, about how we report and what they read and what they see.”

Elkins and Harris talked at length about an issue that Harris sees as critically important to having an informed public that trusts standards-based reporting — demonstrating how quality journalism is done.

Lifting the curtain on journalism

“I think that sometimes we hide behind the mystique of what we do to kind of convey some level of authority to it of, ‘Just trust us, we’re professionals,’” Harris said. “Well, you know, to a degree, yes, we’re professionals, but that doesn’t mean we’re above explaining how we do things to people.”

That’s a big part of what the Newsroom to Classroom program — and our NewsLitCamp® — are all about. If students have a better understanding of how reporting works and that a story doesn’t start and end in a day but develops over time, Harris noted, they will be more likely to trust good reporting and believe in its importance.

Journalists also can gain trust by helping students — and the general public — understand that they are doing their jobs to inform readers and are not insiders armed with special information that they choose to disclose based on personal motivations.

“I think the more that we can convey to people that what we do is a job and that we’re moving through the world just like everybody else, it makes it seem, I hope, that what we’re doing is more trustworthy and certainly more transparent,” Harris said.

News literacy a must

In high school, Harris’ favorite class was civics. He loved learning about government. But a lack of civics knowledge and education is a problem in the U.S. Almost half of adults questioned about civics couldn’t name all three branches of the government, according to the 2020 Annenberg Constitution Day Civics Survey. Given that reality and the immensely challenging information landscape, Harris thinks news literacy is a must for students and the public.

“We’re talking about basic tools that you have to equip people with not just to be good citizens participating in a democracy, but to ensure that they’re not exploited and they’re not taken advantage of,” he said. “I feel really strongly that news literacy is an important part of that kind of curriculum.”

The Newsroom to Classroom program is part of the solution.

If you are a teacher signed up for Checkology®, reach out to a journalist in the directory. If you are a reporter interested in talking about your profession to students, learn more about the program here.

Upon Reflection: How to spot and avoid spreading fake news

Cartoonist Walt Kelly coined the phrase “we have met the enemy and he is us” for an anti-pollution Earth Day poster in 1970 and used it again in an Earth Day cartoon in 1971. In the accompanying illustration, we’ve taken the liberty to apply it to today’s online pollution.

Note: This is the first in a periodic series of personal reflections on journalism, news literacy, education and related topics by NLP’s founder and CEO Alan C. Miller. Columns will be posted here at 10 a.m. ET every other Thursday. This initial piece was published in the Chicago Tribune on Sept. 28:

It’s time that we recognize one of the great challenges confronting our democracy: We are at an inflection point where facts may no longer matter.

The notion of “alternative facts” is no longer so far-fetched. Emotions and opinions threaten to supplant evidence, and conspiracy theories and viral rumors can overwhelm reason. This is especially pernicious on social media — today’s no-holds-barred public square.

The corrosive threat of misinformation permeates every aspect of our civic life. It undercuts our ability to protect ourselves and others from COVID-19. It undermines trust in the news media and in our democratic institutions — and, in particular, the right of citizens to cast their ballots.

Indeed, with Election Day on Nov. 3 fast approaching, we’re being deluged with news reports, opinion columns and commentary, social media posts, images, videos and other communications about candidates, campaigns and the act of voting itself. But we don’t need to wait for the ballots to be counted to make one call: Much of what we’re reading, watching and hearing is not intended to inform us, or even persuade us. Instead, it’s created to misinform us, inflame us and divide us.

For the entire piece, please see Commentary: How to spot and avoid spreading fake news.

Free news literacy resources for the public

Since 2008, NLP has helped students across the U.S. and beyond learn to sort fact from fiction. Now, to meet the urgent need for news literacy among people of all ages, we are unveiling free tools and resources for the public. This includes a customized version of our signature e-learning platform, Checkology®.

This expansion of our mission comes in response to the growing crisis of false information in America.

“We believe misinformation and a lack of news literacy skills and knowledge pose an existential threat to our democracy,” said Alan C. Miller, NLP’s founder and CEO. “We recognize the critical need for people of all ages to have the ability to determine what news and information to trust and to understand the importance of a free press as informed and engaged participants in a democracy.”

News literacy lessons for all

We have developed a version of Checkology that provides the public with a comprehensive news literacy program. And it is now available at no cost. Launched in 2016, Checkology is widely used by educators to teach middle and high school students news literacy skills, habits and mindset.

This new public version includes foundational lessons, supplemental practice opportunities and fact-checking tools for reverse image searches, geolocation and more. In addition, it teaches users how to identify credible information, seek out reliable sources, understand media bias — as well as their own. It also helps users learn to apply critical thinking skills to differentiate fact-based content from falsehoods. And users gain an understanding of the importance of the First Amendment and the watchdog role of a free press.

Learn more by watching our video:

New podcast

And today, we launched the podcast Is that a fact?, featuring experts who address the question, “How can American democracy survive and thrive in our toxic information environment?” The first episode, featuring writer and professor Brendan Nyhan of Bright Line Watch, is available on our website and on various podcast platforms. Upcoming guests include Kara Swisher of Recode and The New York Times, Maria Ressa of Rappler and Michael Luo of The New Yorker.

The 10-episode season is hosted by Darragh Worland, NLP’s vice president of creative services. The show will include conversations with leading American thinkers, journalists, foreign policy experts, psychologists and authors. It will seek to help listeners understand how they can become part of the solution to the misinformation crisis. Future segments will drop every Wednesday.

Additional resources

Also, starting Tuesday, Sept. 22, we will publish a free weekly newsletter for the public called Get Smart About News. This publication is adapted from our popular free newsletter for educators, The Sift®. It will highlight and debunk timely examples of the most widespread conspiracy theories, hoaxes and rumors. Readers will find tips and tools to help navigate today’s complex information landscape. Get Smart About News will arrive in subscribers’ inboxes every Tuesday.

Finally, in 2019, we launched a free mobile app, Informable®. Updated in 2020 to address COVID-19 misinformation, Informable helps people of all ages practice four distinct news literacy skills in a game-like format using real-world examples.

PSAs to help voters learn to navigate election misinformation

NLP and The Open Mind Legacy Project (OMLP) released public service announcements today to educate voters on how to avoid being misinformed about the November elections. Comcast, The E.W. Scripps Company and public media stations will air the video and audio PSAs, which also will be featured in a paid and organic digital ad campaign on social media and other streaming platforms.

As the election approaches, misinformation and disinformation about the voting process by both domestic and foreign sources have the potential to undermine the democratic process. U.S. intelligence officials have issued warnings that other countries are already using such tactics to sow confusion and interfere in the election.

The initiative aims to prevent voters from being misled by false information, such as being told that they can vote by text or by phone, that the election is canceled or that polling places are closed or have been moved.

PSAs in English and Spanish

The PSAs include four 30-second and two 15-second videos in English and Spanish, as well as audio versions of the spots. They will debunk myths about voting, address the need for voters to break out of their filter bubbles and advise them to verify facts before sharing social media posts. The PSAs will drive viewers to a special webpage created to help the public understand how misinformation can influence elections. The page will include real-time examples of falsehoods, free resources for the public, blog posts with tips on understanding election-related data, downloadable graphics that show people how to identify misinformation, and quizzes and other tools to help build news literacy skills in the weeks leading up to the election.

The PSA campaign will focus on communities targeted in previous election-related misinformation campaigns that remain vulnerable to voter suppression tactics, including Black and Latinx populations. The effort is expected to reach millions of Americans.

You can watch and listen to the spots here. Anyone interested in airing them can download them or contact NLP for more information at www.newslit.org.

About The Open Mind Legacy Project

The Open Mind Legacy Project, a civic education and media nonprofit, produces The Open Mind, a weekly public affairs broadcast and daily podcast, supporting fact-based discourse, deliberative democracy and engagement of ideas.


Making sense of data: How to be savvy about data in the news

Making sense of data: How to be savvy about data in the news is the conclusion of a series, presented by our partner SAS, exploring the role of data in understanding our world. SAS is a pioneer in the data management and analytics field.

Data is one of the best ways to understand our world, but it can also be one of the most challenging things to get right. Over the course of this series we’ve looked at the power and the shortcomings of data and data visualizations. If you aren’t paying close attention, data can easily be used to mislead you. Concerns arise at each step of data collection, analysis and presentation. Data also can mislead in infographics and on social media, two increasingly popular ways people encounter it. But with this series and some practice, you can feel more confident about the data you encounter and consume.

As we all experience increased exposure to media and data-driven messaging, we need to stay vigilant in how we read and respond to those messages. So, here are the top six takeaways from this series on how to be a savvy data consumer:

Be a critical thinker

What is the author trying to convey with the data? Does the data match the argument? Is more information needed to better understand the data?

Go beyond the numbers and search for context

Where did the numbers come from? What exactly does each number represent? Gather all the information you can about how — and why — the data was collected to help you understand what it really means.

Ask and answer your own questions

What does the data make you curious about? Does it prompt you to want to learn more? Can you answer your questions with the data that’s been provided? Try doing your own analysis of the data and see what you find.

Check important chart elements

Double-check the most important elements of a chart to see whether the visualization may be misleading in any way. This includes checking the scale on both axes, as well as the legends and labels, to ensure they match what you expect.

Look past design

Is the visualization you’re looking at focused more on being beautiful or on accurately presenting information? Focus your attention on the underlying data and not the visual elements.

Be skeptical of flashy messaging

Big, bold titles and messages can be misleading. Confirm whether eye-catching content really matches the data presented. Even when you see subtle mentions of the author’s conclusions, it’s good to double-check those, too.

And keep in mind that data is all around us. It’s not only important to be a savvy consumer of data but also to give data the context it deserves when presenting your own findings. Continue to think critically about the data you see and be sure to take a closer look if something appears misleading. With these guidelines in mind, you have the power to use data to better understand and describe the world.

Test yourself: Take our data quiz!


About SAS: Through innovative analytics software and services, SAS helps customers around the world transform data into intelligence.

Making sense of data: Spotlight on data in social media

Making sense of data: Spotlight on data in social media is the sixth in a series, presented by our partner SAS, exploring the role of data in understanding our world. SAS is a pioneer in the data management and analytics field.

Like infographics, social media and other forms of user-generated content pose unique challenges regarding data. Many news outlets and journalists have checks in place to ensure that the information they report (especially information based on data) is as accurate as possible. No such checks exist for most individuals creating and sharing content — and social media gives user-generated content a wider reach and greater influence than ever before. Sharing content on social media poses additional challenges: it can be difficult to identify the original source of a post and, consequently, to judge its credibility. In this post we’ll examine common issues that arise with user-generated and shared social content.

Social media posts are often criticized for allowing users to share pieces of information without putting it in context. This is particularly problematic with data. While the numbers and charts may represent real information, the reader needs context to interpret information correctly. Readers may inadvertently share questionable charts or statistics because social media platforms make it easy to do so. Images of graphs and charts pulled from research articles can quickly be shared without citing the source. These images might then circulate widely without context.

Intended for sharing

Even thoughtfully designed charts and graphs taken out of context can be problematic, but what happens when content is designed specifically for social media? Authors of social content know that readers often prefer brief messages instead of a large amount of information. Knowing shorter content is more likely to quickly spread to a wider audience, authors may intentionally design their social posts with content that will catch the eye of the audience they wish to reach.

In earlier posts, we’ve discussed features that can make data and data visualizations hard to interpret correctly. Whether it’s logarithmic scales or truncated axes, we’ve seen how these may make data easier to read, but may unintentionally cause readers to draw inaccurate conclusions. While these mistakes are often unintentional, some users may intentionally manipulate visualizations to reinforce a specific point of view.

A fresh look

Figure 2. Global Life Expectancy (full axis)

Figure 1. Global Life Expectancy (truncated axis)

Let’s take another look at some of the data we examined before. These two charts show life expectancy in different countries. We demonstrated how truncating the axes allows you to see the differences in the data, but also might lead readers to view those differences as more significant than they really are. Now suppose this isn’t an honest mistake, and the author wanted to create content to convince you of something – perhaps that you should move to France to live longer. If that were the case, the author might create a chart like the one below.

bar graph

Figure 3. Life Expectancy Social Media Post

Does this seem like something you might see on your social media feed? This is the same data as in the previous charts, but it’s been heavily manipulated so that you focus on what the author wants you to see. First, it includes a large title with the conclusion the author wants you to draw – that moving to France will lead you to live a longer life. It doesn’t mention that this is solely the life expectancy of the country’s residents. A variety of cultural and lifestyle factors come into play, and these would not apply if you simply moved there. Also, removing gridlines and shrinking the font showing age makes it harder to read the scale of the data. The choice of colors also further conveys the purpose — implying that the U.S. is worse than France.
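The distortion a truncated axis creates can be quantified. Here’s a minimal sketch in Python, using illustrative life expectancy values (not the exact figures from the charts above), showing how the apparent size of the gap between two bars depends entirely on where the vertical axis starts:

```python
# How much a truncated axis exaggerates a difference.
# The life expectancy values below are illustrative, not the chart's data.
def bar_height_ratio(taller, shorter, axis_start=0.0):
    """Ratio of the drawn heights of two bars, given where the axis starts."""
    return (taller - axis_start) / (shorter - axis_start)

france, usa = 82.5, 78.9

full_axis = bar_height_ratio(france, usa, axis_start=0)
truncated = bar_height_ratio(france, usa, axis_start=78)

print(round(full_axis, 2))  # 1.05: the bars look nearly identical
print(round(truncated, 2))  # 5.0: France's bar is drawn five times taller
```

The underlying difference is about 5%, but starting the axis at 78 draws it as a fivefold gap.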

Manipulating data to make a point

Let’s take a look at another chart. Here’s data from the same source, presented two different ways.

line graph showing homicide death rate falling over forty years in the U.S.

Figure 4. Homicide Rate Line Chart

line graph showing homicides increasing in U.S. over a six year period

Figure 5. Homicide Rate Social Media Post

The first shows the death rate by homicide relative to the population. It charts the rate over 36 years and indicates a mostly steady decline. There are some bumps and dips, but overall, the trend is downward. However, look how the data can be manipulated to prove a specific point. The second graph uses a different unit: the raw number of homicides each year, with no adjustment for population growth. The scale of years also is manipulated slightly. The first half of the graph represents four years, while the second half represents only two years. However, this isn’t brought to the reader’s attention. The axes are truncated, and the width is even reduced to further exaggerate the angle of the line. Again, the chart has a bold title with a random (and in this case meaningless) statistic tossed in for good measure.
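To see how switching from a rate to a raw count changes the story, here’s a small sketch with invented numbers (not the actual homicide figures): when population grows faster than the count, the raw number can rise even while the rate falls.

```python
# Rate per 100,000 vs. raw count, using hypothetical numbers.
early_pop, early_homicides = 250_000_000, 20_000
late_pop, late_homicides = 320_000_000, 22_400

early_rate = early_homicides / early_pop * 100_000
late_rate = late_homicides / late_pop * 100_000

print(early_homicides, round(early_rate, 1))  # 20000 8.0
print(late_homicides, round(late_rate, 1))    # 22400 7.0
# The raw count rose 12%, yet the rate per 100,000 fell from 8.0 to 7.0.
```

Which number tells the truth? Both do, but they answer different questions, and an author can pick whichever supports the desired narrative.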

There are many ways to manipulate data to prove a specific point. These graphs are just two examples of the ways people may present data to make a particular point, especially when sharing on social media. If you see a graph like the one above, with a catchy headline and no additional context, take a careful look and see if you can find how the author may have manipulated information to emphasize a point.

Disregard suspect data

It is challenging and sometimes impossible to conduct further research on social media posts or find the source or context of data presented. If you can’t find a reputable source that provides this context, it’s best to ignore the information. Approach it as an opportunity to be a responsible consumer of data — don’t share or “like” such posts! If you’re particularly curious about a point being made, see if you can find your own data to back it up and perhaps create your own content in a thoughtful, context-driven way.

To learn more see Diving into charts and graphs in the news or our full list of related articles below.

Test yourself: Take our data quiz (here or below)!

Related articles:


About SAS: Through innovative analytics software and services, SAS helps customers around the world transform data into intelligence.

Making sense of data: Special look at issues with infographics

Making sense of data: Special look at issues with infographics is the fifth in a series, presented by our partner SAS, exploring the role of data in understanding our world. SAS is a pioneer in the data management and analytics field.

Infographics are one of the most visual ways to tell stories with data. They are designed to catch the reader’s eye, and they use visuals to provide a lot of information in a small amount of space. However, we’ve learned there are many ways data can mislead a reader, and those same issues often come up when using infographics. Let’s explore some of the special features of infographics that make these representations particularly challenging to interpret correctly.

Over-designing infographics

Infographics are intended to attractively display information and often incorporate several design elements. But sometimes those design elements overpower the data and distract the reader from the underlying information. The infographic below illustrates several of these common design elements.

infographic depicting causes of accidental deaths

Figure 1. Example Infographic

This infographic communicates facts regarding the causes of accidental deaths in the U.S. in 2013. The charts are designed to be visually appealing, and they are. However, the data itself could be communicated more clearly and concisely with traditional graphs and charts.

Let’s start with the pie chart. It shows the main causes of accidental death. There isn’t much of a difference between the rates of poisoning (30%), falls (26%) and motor vehicle accidents (23%). Yet, the enlarged wedges imply that poisoning deaths far exceed deaths in motor vehicle accidents. This type of design violates the principle of proportional ink.
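The principle of proportional ink says the amount of ink used (here, the wedge angle and area) should be proportional to the value it represents. A quick sketch of what honest wedge angles for these percentages would look like:

```python
# Wedge angles proportional to each share of a 360-degree pie.
shares = {"poisoning": 0.30, "falls": 0.26, "motor vehicle": 0.23}

for cause, share in shares.items():
    print(cause, round(share * 360, 1))
# poisoning 108.0, falls 93.6, motor vehicle 82.8
# Drawn honestly, the wedges differ far less than the enlarged design implies.
```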

The next graphic shows a rounded bar chart depicting motor vehicle deaths by age. Here again, this type of display alters the perception of the data. Because the inner circle has a smaller radius, the yellow bar appears to go farther than any other bar. But if you look at the actual data, it should be much smaller than the purple or blue bars. The manipulation of the graphic makes it appear more significant than it is.

Consider size and scale

The use of icons to represent data, as seen in the chart showing accidental deaths of infants, is a common infographic design. However, if the creators aren’t careful, the size and scale of the icons can make the data difficult to interpret accurately. In this graphic, the drowning icon is smaller in scale than the others, hiding how many more deaths are caused by drowning compared to other causes. This graphic would be much easier to interpret correctly if the scale of the icons were equal.

The next bar chart, showing drowning by age, uses a different design treatment — a 3D view. This makes the chart more appealing than a traditional bar chart, but again, the design makes it harder to read. The 3D view doesn’t display axes, so the only thing the reader can do is compare the different categories, not make judgments about the total numbers. The perspective on the chart is misleading as well. The scale of the first bar (ages 5-25) looks significantly larger than the last bar that’s farther away (65+), but the actual difference in these values is not substantial (710 and 554, respectively). This is another case where the design treatment of the graph makes it easy to draw incorrect conclusions.

The last graphic, accidental deaths by firearm, uses icons to communicate a statistic, but the graphic isn’t particularly helpful. Here each icon represents 10 people accidentally killed by firearms, but the graphic doesn’t help the reader understand the scale of deaths. It is simply there to add visual interest to a statistic.

Prioritizing design

In addition to the design issues above, important information may be left out of an infographic. The creator isn’t intentionally trying to mislead readers in this case; rather, he or she is prioritizing design because the additional data might detract from the overall product. This can include important features such as axes, labels and other elements that are critical for reading and understanding a chart. Consider the 3D bar chart in our example. Without an axis labeling the value of those bars, it is impossible to interpret the chart accurately.

Context is often one of the most important things infographics can lack. We’ve talked in detail about the importance of asking questions about the data you’re reading, such as “what did the survey ask?” and “where does this data come from?” That information is often present in longer articles able to describe the methodology and sources of data. Infographics may not have this context to help you answer critical questions. A good infographic will provide at least a reference to where the data comes from, or where you can find more information, but this isn’t always the case. You may see infographics with data from a variety of sources, which means it may have been collected using different methodologies or samples, making it challenging to piece together a cohesive picture.

Overall, infographics can be a nice way to engage readers with data and information. However, issues arise when the design of the infographic takes precedence over accurately communicating information. Creators of infographics should carefully focus on readability first, before considering design. And readers should recognize that the issues we have covered about interpreting data in reports, articles and surveys hold true when interpreting infographics, and may be especially challenging in this case.

Test yourself: Take our data quiz (here or below)!


Making sense of data: Evaluating claims made from data

Making sense of data: Evaluating claims made from data is the fourth in a series, presented by our partner SAS, exploring the role of data in understanding our world. SAS is a pioneer in the data management and analytics field.

Every day people use data to better understand the world. This helps them make decisions and measure impacts. But how do we take raw numbers and turn them into information that we can easily understand?

We make claims, or statements, about what we think the data tells us, and we often get our information from what the media report about data. Authors use data to support arguments or inform the public. However, the claims may depend heavily on how and what type of data was collected. We discussed some issues with data collection in the last post, but let’s go into more detail about issues that may arise when authors use data to build arguments or make claims.

Control or comparison groups 

People often use data to draw comparisons between different types of groups or behaviors. It’s important to pay attention to the questions asked and who is being asked. And, it’s equally important to ask what researchers are comparing their claims against.

Imagine we poll people who say green is their favorite color and find that 60% of them eat vegetables every day. We could claim that liking the color green means a person is more likely to eat vegetables. While the survey indicates that most people who like the color green eat vegetables daily, it’s hard to draw further conclusions without a comparison group.

If only 20% of people who don’t like the color green eat their vegetables, this might suggest an interesting relationship. However, it might also be true that 60% of people who don’t like green eat their vegetables – the exact same proportion. In this case, liking the color green has absolutely nothing to do with vegetable consumption. You might also find that 90% of people who don’t like the color green eat vegetables daily. This shows a negative relationship between liking the color green and eating vegetables daily.

As you can see, if an author presents only one number without comparison, interpreting the true value of the statement can be difficult. We need to compare that number against another group to get a better picture of its significance.


Figure 1. Comparison groups
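The comparison-group scenarios above reduce to simple arithmetic. Here’s a short sketch (the group sizes are hypothetical):

```python
# Conditional proportions for the vegetable example.
likes_green = 60 / 100        # 60% of green-lovers eat vegetables daily

# Three possible comparison groups of people who don't like green:
comparisons = {
    "no relationship": 60 / 100,
    "positive relationship": 20 / 100,
    "negative relationship": 90 / 100,
}

for label, dislikes_green in comparisons.items():
    gap = round(likes_green - dislikes_green, 2)
    print(label, gap)
# no relationship 0.0, positive relationship 0.4, negative relationship -0.3
```

The 60% figure alone is meaningless; only the gap between the groups tells you whether liking green is related to eating vegetables.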

Interventional studies

This is especially important when researchers conduct interventional studies in fields such as medicine, fitness and food to investigate a particular outcome. In these cases, researchers use a control group. This is a special population that serves as a comparison and does not receive whatever is being tested. Designing an appropriate control group can be tricky because things can change simply because the group is aware it is being observed.

Imagine we conduct a trial for a new weight loss drug. Participants are split in half. One group receives the new drug, and the other group receives nothing. Our goal is to determine which group loses the most weight. It may be that the people who receive nothing realize that their situation remains the same and that they are not receiving treatment. It’s unclear how this might affect their behavior. Similarly, the people taking the drug assume something might happen, and perhaps this affects their attitude and behavior. We can’t tell if the drug had the impact or if the act of taking a pill changed their behavior. For this reason, many studies include a placebo, a pill with no active ingredient, as part of the comparison. This works especially well if neither the researchers nor the participants know who gets the placebo and who gets the real drug.

More challenging studies

This can be much harder in studies that don’t involve medication. Perhaps my intervention is an exercise program. How can I design a fair control group? Do I give my control group advice to not exercise? Do I ask them to do exercise that is similar to my program but not exactly the same? Each of these decisions directly impacts the comparisons I might be able to make and how I will communicate results.

When reading such study data, it’s important to look at the actions of both the experimental group and the control group. Ask yourself if the comparison seems fair, or if it seems as though one group had an advantage or disadvantage. What other factors might explain the results you see?

Bias in research  

Finally, why the studies are being conducted and who is interested in the research are of particular concern in data and research studies. Specifically, organizations collecting data on their own products may be motivated to prove those products work or that people like them. They may unintentionally make decisions in designing their research questions or comparison groups that increase the chances  they get the results they’d like to see. Researchers who receive money from a specific organization may also make decisions that favor the group that funded their research. As we’ve already learned, there are many complicated decisions that go into collecting data that will inevitably impact the quality of the final result, so it’s important to critically analyze these choices.

While data allows us to measure and better understand our world, it is not a perfect representation of reality. There are many opportunities for bias or flaws when collecting data that can impact the quality of the results. Critical thinking is the best way to find these biases or flaws when reading about data. Ask how the data was collected and from whom. Look for who was included and who was left out. If there is something that seems impressive, ask “compared to what?” To gather those answers, you may have to do some digging of your own, but it will help you determine what data you can trust and recognize data that may be too biased or flawed and should be disregarded.

Test yourself: Take our data quiz (here or below)!


Making sense of data: Understanding complications with data collection 

Making sense of data: Understanding complications with data collection is the third in a series, presented by our partner  SAS, exploring the role of data in understanding our world. SAS is a pioneer in the data management and analytics field.  

As we have seen, statistics and visual representations of data can be misleading. But what happens when the data itself is misleading? And if data is supposed to be based on fact, you might wonder how data can be misleading. It comes down to the way it is collected. It is essential to have a strict process of collecting data before analyzing or presenting it. To ensure the data is accurate and as representative as possible, we must pay special attention to how data is collected.  

Here are some of the most important questions to consider when understanding how data is collected:  

  1. Who or what is represented in this data?  
  2. What questions are being asked?  

Sample selection and data collection

Without collecting data on an entire population, it’s nearly impossible to report it with complete accuracy because of sampling limitations. Suppose we want to better understand the eating habits of Americans. The only way to ensure we have an accurate picture of American eating habits is to monitor every single American, every second of the day, and record everything they eat. Since this is impossible, researchers will often use a sample, or a small portion of the population of interest. When the sample selected isn’t representative of the larger group, you get misleading data.

Consider how this might play out if someone was conducting a dietary study of Americans. In this case, the study asks 100 people about their eating habits.  But how are those people selected? Options are endless: 

  • Collect data from 100 friends. That’s a convenience sample, but most people’s friends are about their age and eat similar types of foods.
  • Gather data from a local restaurant or grocery store. Again, this might impact the type of data collected. For example, surveying people in a fast-food restaurant may give very different answers than surveying people in an upscale restaurant or a health food store.  
  • Conduct surveys at a non-food establishment, such as a library. This could be problematic, as librarygoers might eat differently than the rest of the population. But even more concerning, those librarygoers all come from the same area. The type of food people eat varies by locale. Those who live in cities likely eat different foods than those who live in rural areas. Food preferences can also vary depending on a person’s background or culture.

All of these approaches introduce confounding factors or other possible issues with the data. If we want a representative sample, we need to gather data from a cross section of age, gender, race, residence, income level, and so on. Finding such a representative sample can be incredibly difficult, and so it doesn’t often happen. Researchers typically report the population used in samples. This helps the reader understand who is reflected in the sample and the impact that might have on the results. As a consumer of data, it’s important to pay close attention to this piece of information. Ask yourself if the results presented by the researchers apply to the whole population or if those results only apply to the population sampled.


Figure 1. Biased Sample
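A weighted average makes the danger of a convenience sample concrete. The figures here are invented for illustration:

```python
# Hypothetical population: 90% general public averaging 2 servings of
# vegetables a day, 10% health-food-store shoppers averaging 4.
general_share, general_mean = 0.90, 2.0
shopper_share, shopper_mean = 0.10, 4.0

true_mean = general_share * general_mean + shopper_share * shopper_mean
print(round(true_mean, 1))  # 2.2 servings a day across the whole population

# Surveying only at the health food store captures just one subgroup:
convenience_mean = shopper_mean
print(convenience_mean)     # 4.0, nearly double the true average
```

The convenience sample isn’t wrong about the people it reached; it’s wrong about the population it claims to describe.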


Additionally, issues can arise when the way the data is collected, or the questions asked, tells only part of the story. We said before that the best way to see what people are eating is to consistently monitor what they do, but getting firsthand access to information like this is often impossible or unethical. Instead, researchers design studies or questions to gather similar information. Consider the following scenarios:

  1. Researchers ask participants to keep a food log for a week that details everything they eat and track total servings of fruits, vegetables, meat, etc. 
  2. Researchers ask participants “In general how many servings per day do you have of fruits, vegetables, meat, etc.?” 
  3. Researchers ask participants “What kind of foods do you usually eat?” 

Each of these scenarios is trying to answer the same question: What do people eat? But the information is being gathered in very different ways.  

Scenario 1 seems closest to our observation study, but there are some ways that the data may be biased. One concern is that people know they’re recording their foods, and this may lead them to eat differently for the duration of the study.  The data could also vary depending on the time of year. Many people make different food choices in the summer compared to the winter.   

Scenario 2 also presents problems. This question asks people to think more holistically but relies on memory and judgment. Individual estimates of what is typical may vary from what is actually eaten. People may intentionally or accidentally make themselves appear to be healthier eaters than they really are. It can also be difficult to accurately judge your own behavior.   

In Scenario 3, the question isn’t specific enough to gather good information. While people might report the amount of fruits and vegetables they eat, the question leaves room for general or unrelated answers, such as cuisine type (Italian, Mexican or others), or a preference to eat out or at home.  

people of different races discussing something on a tablet screen

Figure 2. Conducting a Survey


As you can see, the way questions are asked, and who is asked those questions, makes a big difference in the kind of information collected. Some questions are better than others. When interpreting data, see if you can find the questions asked by the researchers. Are they good questions? And are the results influenced by how the researcher asked them or how they gathered the data?  

Test yourself: Take our data quiz (here or below)!


Making sense of data: Exploring statistics in the media

Making sense of data: Exploring statistics in the media is the second in a series, presented by our partner SAS, exploring the role of data in understanding our world. SAS is a pioneer in the data management and analytics field.

“Numbers don’t lie” is a phrase we often hear to support the idea that something must be true if you can cite data or statistics about it. But even accurate numbers can paint a misleading picture, particularly if people don’t know what to look for. Several common ways to report metrics and statistics can easily mislead readers. Let’s explore how statistics can be misinterpreted.

Mean vs. Median 

Mean and median are two statistical concepts that often get muddled. Both are measures of central tendency, meaning the values are intended to represent the “middle” of the data. But this can be done in different ways, which is important to understand.

To find the mean, or average, add together the value for every member of a group and divide it by the total number of members. Say you and four friends were trying to figure out the average amount of money in your wallets. You’d count up the total amount of all your money and divide it by five. Median, on the other hand, represents the value in the very middle. In this case, you’d arrange each person’s money from the least amount to the greatest. The median is the amount in the middle. Two people have larger amounts, two people have smaller amounts. The person with the median amount of money is smack in the middle.
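The wallet example translates directly into code. Here’s a minimal sketch using Python’s standard statistics module (the dollar amounts are made up):

```python
from statistics import mean, median

# Money in five friends' wallets, in dollars (hypothetical amounts).
wallets = [5, 10, 20, 25, 40]
print(mean(wallets))    # 20: the total of 100 split five ways
print(median(wallets))  # 20: the middle value once sorted

# One friend with a very fat wallet skews the mean but not the median.
wallets = [5, 10, 20, 25, 940]
print(mean(wallets))    # 200
print(median(wallets))  # still 20
```

With evenly spread amounts, the two measures agree; one outlier pulls the mean far away while the median stays put.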

infographic showing median and mean hourly income

Figure 1. Median vs. Mean

As you can see in the image above, the average and the median are not always the same. In fact, with heavily skewed data, they can be quite different. When we talk about skewed data, we’re referring to data with a heavy concentration of values on one end and only a few values on the other end. This often happens when discussing income, which is easily skewed because most of us have middle- or low-income levels while only a small percentage have very large incomes. Take U.S. income data from 2014, for example.

bar graph showing distribution of household income in the U.S.

Figure 2. Distribution of Annual Household Income in the United States (2014)

Don’t be misled

The median income in the U.S. is approximately $33,000, while the mean (average) income is approximately $50,000. This data shows half the U.S. population makes $33,000 or less. However, if you add up all 2014 incomes across the population and divide by the number of earners, the average salary is $50,000. This average might lead you to believe that the “average American” is doing better than he or she actually is. In reality, people who make the median salary would love the 50% raise the average salary represents. The value reported can make a big difference in how people understand information.

Impact on decision-making

A solid understanding of these metrics is especially important when statistics influence decision-making. Consider how life expectancy is often reported. Though we rarely see the term “average” used, that is what the data show. The average age at death in the U.S. is around 79, but the median age at death is about 83. This difference has a big impact on decision-making, such as retirement planning. It is quite different to say, “the average person dies at 79” compared to “half of adults live 83 years or longer.” Our retirement funds would likely be more of a priority in the second instance than in the first.

So, which measure is the best one to give when explaining data? Whenever possible, the answer is both. Knowing both mean and median gives the reader a better understanding and clearer picture of the data at hand and helps them draw accurate conclusions. But having people apply this thinking is not always in the best interest of the person reporting the numbers. You’ll notice that lottery tickets often advertise the “average winnings,” which can be enticing, compared to the “median winnings,” which are usually $0.

Percent Change 

Changes, when given as a percentage, are another type of statistic commonly misused or misinterpreted, and can cause confusion.

One particularly confusing case arises when the value that is changing is, itself, a percentage. Suppose a local politician is performing “fairly well” with an overall approval rating of 50%. Then, this politician opposes a bill to fill all the potholes in town, and with that, the politician’s public approval plummets. If the politician’s approval rating dropped by 20 percent, that is 20% of the initial 50%, giving the politician a 40% approval rating. The narrative changes if the politician’s approval rating drops 20 percentage points. This brings the rating of 50% down to 30% — 20 points lower. The language sounds very similar but the resulting numbers are quite different.
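The gap between the two phrasings is easy to check in code. A quick sketch using the 50% approval rating from the example:

```python
approval = 50.0  # starting approval rating, in percent

# "Dropped by 20 percent": subtract 20% of the current value.
after_percent_drop = approval * (1 - 0.20)
print(after_percent_drop)  # 40.0

# "Dropped by 20 percentage points": subtract 20 outright.
after_point_drop = approval - 20
print(after_point_drop)    # 30.0
```

One wording costs the politician 10 points of approval; the other costs 20.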

Another common source of confusion arises when something increases over 100%, or doubles. Imagine we’re talking about a garden and how many more tomatoes were grown this year compared to last. If 100% more tomatoes were grown this year compared to last year, that means the yield doubled — 10 tomatoes became 20. What if the yield went from 10 tomatoes to 15? The yield increased by 50%, but the total yield is 150% of the previous year. Confusing? Yes! That’s why statistics can be difficult to interpret.

Avoiding confusion

Sometimes authors confuse the two and report that something increased 150% when it only increased 50%. Say the garden produced 25 tomatoes this year, an increase of 150%. But someone might interpret that to mean the garden had 150% of last year’s yield and only had 15 tomatoes. When reading someone else’s reporting, it can be hard to tell whether the correct percentage is being cited, since these terms are easily misused accidentally – or perhaps by design. If reporting an increase, an easy way to avoid this problem and help the reader fully understand the metric is to provide all the numbers up front. One could report, “the garden’s yield increased from 10 to 25 tomatoes. That’s a 150% increase!”

infographic showing increase in crop yield

Figure 3. Tomato yield example
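The tomato arithmetic can be sketched as:

```python
last_year = 10  # tomatoes

# "Increased BY 150%": add 150% of last year's yield on top.
grew_by = last_year + last_year * 1.50
print(grew_by)      # 25.0

# "Is 150% OF last year": multiply, without adding.
percent_of = last_year * 1.50
print(percent_of)   # 15.0

# Giving both raw numbers removes the ambiguity entirely.
pct_change = (25 - 10) / 10 * 100
print(pct_change)   # 150.0: "from 10 to 25 tomatoes, a 150% increase"
```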

Another tricky area when interpreting data involves the unit an author chooses to present. Sometimes, the same information can look very different depending on the unit presented. For example, during the COVID-19 pandemic we’ve seen different sources report new cases, total cases and cases per capita, all of which have very different patterns of behaviors across states and countries. Authors also might talk about total deaths, deaths per capita or deaths as a proportion of positive cases. This post provides more detail about the differences these numbers represent and how those differences in reporting can impact how we understand data.

A significant influence

The types of statistics and metrics authors use to communicate data can have a significant influence on how information is interpreted, especially if the author is not careful to fully explain the reasoning behind them. Authors can also make mistakes when presenting and interpreting statistics because of nuanced differences in language and calculations.

To ensure you understand data you encounter, carefully consider whether the metric used is the relevant one, given what the author is trying to communicate. If you don’t feel it is, try to find more information about what might be missing. If you are presenting your own data, the safest route is to carefully explain the metrics being used and the motivation behind those metrics. Working through this explanation helps you as well, providing the opportunity to confirm you’re reporting information in the best way possible.

Test yourself: Take our data quiz (here or below)!
