What Giant Data Breaches Mean for You

The security expert who created Have I Been Pwned? shares advice for protecting sensitive data

Illustration of a man with short blond hair, behind him is a cartoon-like character working at a computer and a bandit sitting atop a camera — Shideh Ghandeharizadeh

At some point in the past year, one or more hackers quietly breached a background-checking company called National Public Data (NPD) and exposed millions of U.S. Social Security numbers (SSNs), names, phone numbers, and e-mail and mailing addresses. The accuracy and significance of some of these data, much of which NPD had probably scraped from public governmental records, are questionable. The reality of the leak itself is not: in August, NPD acknowledged an incident involving “a third-party bad actor that was trying to hack into data in late December 2023,” according to a notice on the data-aggregating company’s website. The stolen information appears to have been put up for sale online beginning in the spring.

It is deeply unpleasant to imagine SSNs and other sensitive information circulating like so much digital plankton along the web’s darker currents. And what has spilled out can’t be recalled. So what does one do when this happens?

You can freeze your credit—preventing anyone from opening a new credit account in your name until the freeze is lifted—via the major reporting bureaus. (The three main agencies used in the U.S. are Equifax, Experian and TransUnion.) Such an incident offers a sobering reminder to practice good password hygiene going forward. Don’t reuse passwords—their complexity and uniqueness are powerful—and consider a manager such as 1Password. Americans can check whether their SSN was exposed via a tool such as Pentester’s NPD Breach Check. At the website Have I Been Pwned?, launched by Australia-based online security consultant Troy Hunt in 2013, visitors can see whether their e-mail addresses have been exposed in data breaches; an offshoot project similarly evaluates passwords.
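For readers curious how a password can be checked against breach data without handing the password over, here is a minimal sketch. It assumes the hash-prefix ("k-anonymity") approach that the Pwned Passwords service is understood to use: only the first five characters of the password's SHA-1 hash are sent, and the service returns every known hash suffix that shares that prefix, each with a count. The function names and the sample response text are hypothetical, and no network request is made here.

```python
import hashlib


def sha1_prefix_and_suffix(password: str) -> tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix: str, response_text: str) -> int:
    """Look for our suffix in a 'SUFFIX:COUNT' listing; return 0 if it is absent."""
    for line in response_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


# Only the prefix would ever leave the machine; the suffix is compared locally.
prefix, suffix = sha1_prefix_and_suffix("password")

# Hypothetical response body, made up for illustration (not real service data).
sample_response = "\n".join([
    "0018A45C4D1DEF81644B54AB7F969B88D65:3",
    f"{suffix}:12345",
])

times_seen = count_in_range_response(suffix, sample_response)
print(f"prefix sent: {prefix}, times seen in sample data: {times_seen}")
```

The design point is that the service only ever learns a five-character prefix shared by many different passwords, so it cannot tell which one was being checked.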

In its decade-plus of existence, Have I Been Pwned? has ballooned to include six billion unique e-mail addresses. Each account has been breached slightly more than twice on average. “I had no idea it would become big—I wouldn’t have given it such a stupid name,” Hunt says. To “pwn” (pronounced “pone” as a pun on “own”) someone is to utterly defeat them, per Internet slang that enjoyed a heyday in the early 2000s. It also means to take unauthorized control of someone’s computer hardware or, say, e-mail account.

As Hunt puts it, the risk of a data breach is simply a cold reality that comes with being online. If the Internet is the information superhighway, leaks are among the roadside wreckage. “It’s terrible that we have a road toll—and objectives toward zero are fantastic,” he says. “But while we hurtle around at 100 kilometers an hour in metal machines, this is what’s going to happen.” Hunt talked with Scientific American about how to make sense of the potentially alarming amount of data involved in massive breaches and what an increasingly online world means for our private information.

An edited transcript of the conversation follows.

Some major data breaches have been revealed this year, such as the AT&T breach, which exposed data from 73 million former and current customers. More recently, there was this National Public Data fiasco. One clear difference between those breaches is that NPD was this little-known data aggregator that sells services such as background checks—not a big, familiar corporation. Maybe there’s no blueprint for what a standard data breach is, but you tell me: Is this an unusual case?

I’m looking at the list of the big breaches in the Have I Been Pwned? database, and often they are from the likes of data aggregators. People don’t know who data aggregators are. Most of us, I think, are not too keen on the idea of data aggregation. We don’t like the idea of organizations siphoning up and monetizing our data—without our knowledge or informed consent. I’ll add that caveat.

When an incident like this happens, it’s quite frustrating for people because they’re going, “Who is this organization? Why do they have my data? What can I do?” And really, you don’t have any recourse.

There are a few standard suggestions for what to do after a breach, such as placing freezes with credit reporting bureaus and making sure your passwords are robust. Is there anything else people should be doing?

There’s no discrete thing you can do directly about this kind of incident. It’s not like when infidelity dating website Ashley Madison got breached, where you should change your password and probably have a conversation with your spouse or partner. In this case, it’s all the fundamental stuff you really should be doing anyway. You should have whatever freezes you have available on credit until you actually need to apply for it. For identity-theft monitoring services, you have to spend some money, but it’s not a bad idea. And then, of course, use strong, unique passwords and multifactor authentication.

Then just have that consciousness of “What are the things you should be looking out for that might indicate these data are being abused?”—phone calls from a bank, for example, asking about an application you’ve made that you have no idea about. The guidance doesn’t change because of this breach. It just reinforces it.

There were rumors of the NPD breach trickling around for a few months before it percolated into the mainstream media. When it first hit, some of the headlines described 2.9 billion hacked accounts, which was off base. The breach actually appears to consist of 2.9 billion or so rows of data. Also complicating matters, bad actors behind a leak or sale might not be trustworthy—they’ll boast and inflate file sizes or combine already exposed data from multiple breaches to make a leak look huge. How should we think critically about big, scary numbers in headlines about data breaches?

We’ve seen this so many times before. Over the summer we were seeing headlines of the biggest password dump ever, which had 10 billion records. But when bad actors include every word from the dictionary and every combination of it, does the average person have to worry about it? No!

Earlier this year there was another one going around. It’s called the “mother of all breaches.” That was 20 billion records or something. Well, it’s just that someone siphoned up a whole bunch of breaches and put them all together. Add one more, and now you’ve got the bigger mother of all breaches.

By the same token, the truth is always there in the data. The number of records in total is an important figure. But without the context of what that actually means, it’s hard to understand it. So I think a much fairer metric is how many people are impacted. And clearly, if it’s just U.S. Social Security numbers, it’s going to be in the low hundreds of millions at the absolute upper limit.
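The gap between "rows of data" and "people impacted" can be illustrated with a small sketch. It assumes, purely for illustration, that each leaked row carries an identifier such as an SSN and that the same person can appear in many rows (for example, once per past address). The field names and sample rows are hypothetical and are not taken from the actual leak.

```python
def summarize(rows: list[dict]) -> tuple[int, int]:
    """Return (total rows, distinct people), treating each SSN as one person."""
    unique_people = {row["ssn"] for row in rows}
    return len(rows), len(unique_people)


# Made-up records: the first person appears twice, under two addresses.
sample_rows = [
    {"ssn": "000-00-0001", "address": "1 Main St"},
    {"ssn": "000-00-0001", "address": "22 Oak Ave"},
    {"ssn": "000-00-0002", "address": "5 Elm Rd"},
]

total_rows, people = summarize(sample_rows)
print(f"{total_rows} rows, {people} distinct people")
```

A headline built on the first number would overstate the second, which is the distinction Hunt draws between record counts and the number of people affected.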

Do we know for sure that every American Social Security number is in the NPD breach?

No, we don’t know that for sure. Investigative journalist Brian Krebs has written some good stuff about this. There are lots of different places where these data might be published, and then they all get aggregated together. If you haven’t, for example, been arrested or ended up in a public record somewhere, then you may not be in there.

The thing that really frustrates me about this is that clearly NPD had a breach. I don’t think there’s any question about that anymore. And when you look at its disclosure notice, there’s basically nothing there. The company has really given us nothing of any substance.

[Editor’s note: Scientific American repeatedly e-mailed NPD to ask whether it had taken additional actions to contact affected individuals. The company did not respond. In a recorded message on its breach hotline, NPD says it “will try to notify you if there are further significant developments applicable to you.”]

You wrote about lackluster corporate disclosure in a June blog post entitled “The State of Data Breaches.” People might be surprised to know that data-breach laws can have exceptions related to notification. In Florida, where NPD is based, if a security breach involves more than 500,000 people, what is legally sufficient is a notice “in print and to broadcast media” and “a conspicuous notice on the Internet website of the covered entity.” How would you improve disclosure?

Disclosure doesn’t necessarily mean letting the individuals know. It’s usually disclosure to the regulator unless we’re talking about sensitive personal information—health data, for example. The carve-out in Australia is that there’s got to be a likelihood of causing serious harm. The Florida situation, like you just mentioned, is a notice. California’s got the California Consumer Privacy Act, but I believe even under that, companies can still decide whether or not to notify individuals in the vast majority of cases.

There are all of these people who get pissed when they don’t get told—like, really, really pissed, understandably! And I’m sitting here at Have I Been Pwned? going, “Well, I’ve got your data. I can let you know.” But it shouldn’t be my job, right? I should be completely redundant because organizations should notify people early.

You’ve worked on Have I Been Pwned? for more than a decade. But let’s look ahead. Where are data breaches headed in the next 10 years?

If you think about the factors that lead to data breaches or amplify data breaches, we’ll have more people. We’ll have more systems. We’ll have more devices that have data; we’ve seen a lot of Internet of Things–related data breaches. There are data collected from CloudPets toys—teddy bears that talk—in Have I Been Pwned? We’re also going down a path where we’re seeing a lot of breaches—such as all the ones from this year related to cloud data storage company Snowflake—where we’re so dependent on external services that a flaw or a practice on behalf of threat actors can get reapplied over and over and over again to everyone using that particular platform. So all these factors combine to amplify the problems we’re having now. In summary, I think it’s getting worse.