Appendices
Hello, and welcome first-time readers! Before you get started on the 2024 DBIR, it might be a good idea to take a look at this appendix first. We have been doing this report for a while now, and we appreciate that the verbiage we use can be a bit obtuse at times. We use very deliberate naming conventions, terms and definitions and spend a lot of time making sure we are consistent throughout the report. Hopefully this section will help make all of those more familiar.
Appendix A: How to read this report
VERIS Framework resources
The terms “threat actions,” “threat actors” and “varieties” will be referenced often. These are part of the Vocabulary for Event Recording and Incident Sharing (VERIS), a framework designed to allow for a consistent, unequivocal collection of security incident details. Here is how they should be interpreted:
Threat actor: Who is behind the event? This could be the external “bad guy” who launches a phishing campaign or an employee who leaves sensitive documents in their seat back pocket.
Threat action: What tactics (actions) were used to affect an asset? VERIS uses seven primary categories of threat actions: Malware, Hacking, Social, Misuse, Physical, Error and Environmental. Examples at a high level are hacking a server, installing malware or influencing human behavior through a social attack.
Variety: More specific enumerations of higher-level categories—e.g., classifying the external “bad guy” as an organized criminal group or recording a hacking action as SQL injection or brute force.
Learn more here:
https://github.com/vz-risk/dbir/tree/gh-pages/2024—includes DBIR facts, figures and figure data
https://verisframework.org—features information on the framework with examples and enumeration listings
https://github.com/vz-risk/veris—hosts the VERIS schema itself, along with enumeration listings and examples
Incident vs. breach
We talk a lot about incidents and breaches and we use the following definitions:
Incident: A security event that compromises the integrity, confidentiality or availability of an information asset.
Breach: An incident that results in the confirmed disclosure—not just potential exposure—of data to an unauthorized party. A Distributed Denial of Service (DDoS) attack, for instance, is most often an incident rather than a breach, since no data is exfiltrated. That doesn’t make it any less serious.
Industry labels
We align with the North American Industry Classification System (NAICS) standard to categorize the victim organizations in our corpus. The standard uses two- to six-digit codes to classify businesses and organizations. Our analysis is typically done at the two-digit level, and we will specify NAICS codes along with an industry label. For example, a chart labeled Financial (52) does not indicate a value of 52; “52” is the NAICS code for the Finance and Insurance sector, and the shorter “Financial” label is used for brevity within the figures. Detailed information on the codes and the classification system is available from the U.S. Census Bureau at https://www.census.gov/naics/.
Being confident of our data
Starting in 2019 with slanted bar charts, the DBIR has tried to make the point that the only certain thing about information security is that nothing is certain. Even with all the data we have, we’ll never know anything with absolute certainty. However, instead of throwing our hands up and complaining that it is impossible to measure anything in a data-poor environment or, worse yet, just plain making stuff up, we get to work. This year, you’ll continue to see the team representing uncertainty throughout the report figures.
The examples shown in Figures 80, 81, 82 and 83 all convey the range of realities that could credibly be true. Whether it be the slant of the bar chart, the threads of the spaghetti chart, the dots of the dot plot or the color of the pictogram plot, all convey the uncertainty of our industry in their own special way.
The slanted bar chart will be familiar to returning readers. The slant on the bar chart represents the uncertainty of that data point to a 95% confidence level (which is standard for statistical testing).
In layman’s terms, if the slanted areas of two (or more) bars overlap, you can’t really say one is bigger than the other without angering the math gods.
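If you want to see the arithmetic behind the slant, the short sketch below (illustrative only, not our production tooling) computes a 95% confidence interval for a single observed proportion using the Wilson score interval, which is one reasonable choice; the slanted region of a bar corresponds to an interval of this kind.

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a proportion (z = 1.96)."""
    if n == 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# e.g., 120 of 400 breaches showing a given action variety
low, high = wilson_ci(120, 400)
print(f"30% observed; 95% CI roughly {low:.1%} to {high:.1%}")
# If this interval overlaps another bar's interval, we can't confidently
# say which of the two is larger.
```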
Much like the slanted bar chart, the spaghetti chart represents the same concept: the possible values that exist within the confidence interval; however, it’s slightly more involved because we have the added element of time. The individual threads represent a sample of all possible connections between the points that exist within each observation’s confidence interval. As you can see, some of the threads are looser than others, indicating a wider confidence interval and a smaller sample size.
The dot plot is another returning champion, and the trick to understanding this chart is to remember that the dots represent organizations. If, for instance, there are 200 dots (like in Figure 82), each dot represents 0.5% of organizations. This is a much better way of understanding how something is distributed among organizations and provides considerably more information than an average or a median. We added more colors and callouts to those in an attempt to make them even more informative.
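For the curious, the construction behind a quantile dot plot is simple enough to sketch. The example below is illustrative only (the metric and the numbers are made up): taking 200 evenly spaced quantiles of a per-organization distribution yields dots that each stand for 0.5% of organizations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-organization metric (e.g., records lost), heavily skewed
per_org = rng.lognormal(mean=8, sigma=2, size=5_000)

# 200 dots, so each dot represents 0.5% of organizations
dots = np.quantile(per_org, np.linspace(0.0025, 0.9975, 200))

# The dots preserve the shape of the distribution in a way that a single
# mean or median cannot
print(f"mean={per_org.mean():,.0f}  median={np.median(per_org):,.0f}")
print(f"first dot={dots[0]:,.0f}  last dot={dots[-1]:,.0f}")
```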
The pictogram plot, our relative newcomer, attempts to capture uncertainty in a similar way to slanted bar charts but is more suited for a single proportion.
We hope they make your journey through this complex dataset even smoother than in previous years.
Appendix B: Methodology
One of the things readers value most about this report is the level of rigor and integrity we employ when collecting, analyzing and presenting data. Knowing our readership cares about such things and consumes this information with a keen eye helps keep us honest. Detailing our methods is an important part of that honesty.
First, we make mistakes. A column transposed here, a number not updated there. We’re likely to discover a few things to fix. When we do, we’ll list them on our corrections page: https://verizon.com/business/resources/reports/dbir/2024/corrections.
Second, science comes in two flavors: creative exploration and causal hypothesis testing. The DBIR is squarely in the former. While we may not be perfect, we believe we provide the best obtainable version of the truth (to a given level of confidence and under the influence of biases acknowledged below). However, proving causality is best left to randomized control trials. The best we can do is correlation. And while correlation is not causation, they are often related to some extent and often useful.
Non-committal disclaimer
We would like to reiterate that we make no claim that the findings of this report are representative of all data breaches in all organizations at all times. Even though we believe the combined records from all our contributors more closely reflect reality than any of them in isolation, it is still a sample. And although we believe many of the findings presented in this report to be appropriate for generalization (and our conviction in this grows as we gather more data and compare it to that of others), bias exists.
The DBIR process
Our overall process remains intact and largely unchanged from previous years.104 All incidents included in this report were reviewed and converted (if necessary) into the VERIS framework to create a common, anonymous aggregate dataset. If you are unfamiliar with the VERIS framework, it is short for Vocabulary for Event Recording and Incident Sharing, it is free to use, and links to VERIS resources appear throughout this report.
The collection method and conversion techniques differed between contributors. In general, three basic methods (expounded below) were used to accomplish this:
Direct recording of paid external forensic investigations and related intelligence operations conducted by Verizon using the VERIS Webapp
Direct recording by partners using VERIS
Converting partners’ existing schema into VERIS
All contributors received instruction to omit any information that might identify organizations or individuals involved.
Some source spreadsheets are converted to our standard spreadsheet format through automated mapping to ensure consistent conversion. Reviewed spreadsheets and VERIS Webapp JavaScript Object Notation (JSON) are ingested by an automated workflow that converts the incidents and breaches within into the VERIS JSON format as necessary, adds missing enumerations, and then validates the record against business logic and the VERIS schema. The automated workflow subsets the data and analyzes the results. Based on the results of this exploratory analysis, the validation logs from the workflow and discussions with the partners providing the data, the data is cleaned and reanalyzed. This process runs nightly for roughly two months as data is collected and analyzed.
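The workflow itself is internal, but the validation step it describes looks conceptually like the sketch below, which assumes the Python jsonschema package and a hypothetical local copy of the VERIS schema.

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

# Hypothetical file names; the real pipeline and its paths are internal
with open("veris-schema.json") as f:
    veris_schema = json.load(f)

def validate_incident(path: str) -> bool:
    """Validate one VERIS JSON incident record against the schema."""
    with open(path) as f:
        incident = json.load(f)
    try:
        validate(instance=incident, schema=veris_schema)
        return True
    except ValidationError as err:
        # In the real workflow, failures like this end up in validation logs
        # that are reviewed with the contributing partner before reanalysis.
        print(f"{path}: {err.message}")
        return False
```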
104 As does this sentence
Incident data
Our data is non-exclusively multinomial, meaning that a single feature, such as “Action,” can have multiple values (i.e., “Social,” “Malware” and “Hacking”). This means that percentages do not necessarily add up to 100%. For example, if there are five botnet breaches, the sample size is five. However, since each botnet used phishing, installed keyloggers and used stolen credentials, there would be five Social actions, five Hacking actions and five Malware actions, adding up to 300%. This is normal, expected and handled correctly in our analysis and tooling.
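To make that arithmetic concrete, here is a small illustrative sketch (not our actual tooling) that reproduces the five-botnet example:

```python
from collections import Counter

# Five hypothetical botnet breaches; each involved all three action categories
breaches = [{"actions": {"Social", "Hacking", "Malware"}} for _ in range(5)]

n = len(breaches)  # the sample size is 5, not 15
counts = Counter(action for b in breaches for action in b["actions"])

for action, count in sorted(counts.items()):
    print(f"{action}: {count}/{n} = {count / n:.0%}")
# Each category prints 100%, so the column of percentages totals 300%
```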
Another important point is that when looking at the findings, “unknown” is equivalent to “unmeasured.” Which is to say that if a record (or collection of records) contains elements that have been marked as “unknown” (whether it is something as basic as the number of records involved in the incident or as complex as what specific capabilities a piece of malware contained), it means that we cannot make statements about that particular element as it stands in the record—we cannot measure where we have too little information. Because they are unmeasured, they are not counted in sample sizes. The enumeration “Other,” however, is counted because it means that the value was known but not part of VERIS (or not one of the other bars if found in a bar chart). Finally, “Not Applicable” (normally “n/a”) may be counted or not counted depending on the claim being analyzed.
This year we have made liberal use of confidence intervals to allow us to analyze smaller sample sizes. We have adopted a few rules to help minimize bias in reading such data. Here we define “small sample” as less than 30 samples.
Sample sizes smaller than five are too small to analyze.
We won’t talk about count or percentage for small samples. This goes for figures too and is why some figures lack the dot for the median frequency.
For small samples, we may talk about the value being in some range or values being greater/less than each other. These all follow the confidence interval approaches listed above.
Incident eligibility
For a potential entry to be eligible for the incident/breach corpus, a couple of requirements must be met. The entry must be a confirmed security incident defined as a loss of confidentiality, integrity or availability. In addition to meeting the baseline definition of “security incident,” the entry is assessed for quality. We create a subset of incidents (more on subsets later) that pass our quality filter. The details of what is a “quality” incident are:
The incident must have at least seven enumerations (e.g., threat actor variety, threat action category, variety of integrity loss, et al.) across 34 fields OR be a DDoS attack. Exceptions are given to confirmed data breaches with fewer than seven enumerations.
The incident must have at least one known VERIS threat action category (Hacking, Malware, etc.).
In addition to having the level of details necessary to pass the quality filter, the incident must be within the time frame of analysis (November 1, 2022, to October 31, 2023, for this report). The 2023 caseload is the primary analytical focus of the report, but the entire range of data is referenced throughout, notably in trending graphs. We also exclude incidents and breaches affecting individuals that cannot be tied to an organizational attribute loss. If your friend’s laptop was hit with Trickbot, it would not be included in this report.
Lastly, for something to be eligible for inclusion into the DBIR, we have to know about it, which brings us to several potential biases we will discuss below.
Acknowledgment and analysis of bias
Many breaches go unreported (though our sample does contain many of those). Many more are as yet unknown by the victim (and thereby unknown to us). Therefore, until we (or someone) can conduct an exhaustive census of every breach that happens in the entire world each year (our study population), we must use sampling. Unfortunately, this process introduces bias.
The first type of bias is random bias introduced by sampling. This year, our maximum confidence is +/- 0.5% for incidents and +/- 0.8% for breaches, which is related to our sample size. Any subset with a smaller sample size is going to have a wider confidence margin. We’ve expressed this confidence in the complementary cumulative density (slanted) bar charts, hypothetical outcome plot (spaghetti) line charts and quantile dot plots.
The second source of bias is sampling bias. We strive for “the best obtainable version of the truth” by collecting breaches from a wide variety of contributors. Still, it is clear that we conduct biased sampling. For instance, some breaches, such as those publicly disclosed, are more likely to enter our corpus, while others, such as classified breaches, are less likely.
The four figures above are an attempt to visualize potential sampling bias. Each radial axis is a VERIS enumeration, and we have stacked bar charts representing our data contributors. Ideally, we want the distribution of sources to be roughly equal on the stacked bar charts along all axes. Axes represented by only a single source are more likely to be biased. However, contributions are inherently thick tailed, with a few contributors providing a lot of data and a lot of contributors providing a few records within a certain area. Still, we see that most axes have multiple large contributors, with small contributors adding appreciably to the total incidents along that axis.
You’ll notice rather large contributions on many of the axes. While we’d generally be concerned about this, they represent contributions aggregating several other sources, not actual single contributions. It also occurs along most axes, limiting the bias introduced by that grouping of indirect contributors.
The third source of bias is confirmation bias. Because we use our entire dataset for exploratory analysis, we cannot test specific hypotheses. Until we develop a collection method for data breaches beyond a sample of convenience, this is probably the best that can be done.
As stated above, we attempt to mitigate these biases by collecting data from diverse contributors. We follow a consistent multiple-review process, and when we hear hooves, we think horses, not zebras.105 We also try to review findings with subject matter experts in the specific areas ahead of release.
105 A unique finding is more likely to be something mundane, such as a data collection issue, than an unexpected result.
Data subsets
We already mentioned the subset of incidents that passed our quality requirements, but as part of our analysis, there are other instances where we define subsets of data. These subsets consist of legitimate incidents that would eclipse smaller trends if left in. These are removed and analyzed separately, though may not be written about if no relevant findings were, well, found. This year we have two subsets of legitimate incidents that are not analyzed as part of the overall corpus:
We separately analyzed a subset of web servers that were identified as secondary targets (such as taking over a website to spread malware).
We separately analyzed botnet-related incidents.
Both subsets have also been separated out in the same manner for the last seven years.
Finally, we create some subsets to help further our analysis. In particular, a single subset is used for all analysis within the DBIR unless otherwise stated. It includes only quality incidents as described above and excludes the aforementioned two subsets.
Non-incident data
Since the 2015 issue, the DBIR has included data that requires analysis but does not fit into our usual categories of “incident” or “breach.” Examples of non-incident data include malware, patching, phishing and DDoS. The sample sizes for non-incident data tend to be much larger than the incident data but come from fewer sources. We make every effort to normalize the data (for example, weighting records by the number contributed from each organization so all organizations are represented equally). We also attempt to combine multiple partners with similar data to conduct the analysis wherever possible. Once analysis is complete, we try to discuss our findings with the relevant partner or partners so as to validate them against their knowledge of the data.
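As an illustration of that weighting step (a sketch with made-up column names and numbers, not our production pipeline), giving each contributor’s records a weight inverse to that contributor’s record count lets every contributor count equally in the aggregate:

```python
import pandas as pd

# Hypothetical non-incident records, e.g., phishing simulation results
df = pd.DataFrame({
    "contributor": ["A"] * 900 + ["B"] * 100,
    "clicked": [True] * 90 + [False] * 810 + [True] * 30 + [False] * 70,
})

# Unweighted, contributor A (10% click rate) dominates the estimate
print(df["clicked"].mean())  # 0.12

# Weight each record so every contributor carries the same total weight
df["weight"] = 1 / df.groupby("contributor")["contributor"].transform("size")
weighted = (df["clicked"] * df["weight"]).sum() / df["weight"].sum()
print(weighted)  # 0.20, the average of the two contributors' 10% and 30% rates
```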
Appendix C: U.S. Secret Service
By Assistant Director Brian Lambert and Assistant Special Agent in Charge Krzysztof Bossowski, United States Secret Service
Combating Cybercrime Amid Technological Change
The U.S. Secret Service worked to combat fraud through traditional methods while identifying new threats driven by emerging technology in 2023. Ransomware continued to feature prominently in data breaches impacting U.S. companies. Meanwhile, transnational cybercriminals were increasingly successful in finding innovative ways to enable their fraud schemes. Artificial Intelligence (AI) captured the world’s attention and imagination, and cybercriminals were among the early adopters. The Secret Service investigated numerous cybercriminals experimenting with these new generative tools to commit fraud. In response, the agency also partnered with the same technology companies these fraudsters relied upon for their schemes. This proved a valuable strategy to detect scams and hold bad actors accountable.
The Secret Service is built on a foundation of protecting the integrity of our nation’s financial system. The agency was created in 1865 to address a surge in counterfeiting following the Civil War. Today, the agency continues to fight counterfeiting while also battling computer fraud and abuse, bank fraud, payment card fraud, identity theft, financial extortion, wire fraud, and more. Additionally, the Secret Service is charged with providing investigative assistance to local law enforcement and the National Center for Missing & Exploited Children. The continued success of the Secret Service’s investigative mission depends on partnerships with law enforcement agencies and private sector experts. The Secret Service operates a network of Cyber Fraud Task Forces (CFTF) throughout the country, which fosters these interactions with our partners. Long-term partnerships are the best mechanism to prevent and mitigate cybercrime.
The use of ransomware to exploit businesses again played a significant role in major data breaches. The criminal organizations behind these attacks heavily leveraged the crime-as-a-service business model, including threatening to publish stolen data. The Secret Service, alongside its law enforcement and private sector partners, fought against these criminals. The team approach foiled several ransomware campaigns and protected a number of targeted American companies and organizations. Agents also infiltrated these criminal organizations and developed tangible information for IT administrators. This enabled IT teams to implement countermeasures to protect their corporate infrastructure, significantly reducing data breaches and financial losses. Industry reports on ransomware show mixed trends in the prevalence and revenue generated through ransomware scams in 2023. Our work continues as we strive to end the profitability of such schemes.
Generative AI remains a hot topic. ChatGPT became a technological hit in January 2023 with 100 million registered active users. Legitimate customers used the AI tool to write papers and answer questions. But within weeks, criminals also leveraged AI tools in fraud and extortion schemes. For example, a Secret Service investigation led to the arrest of a group of individuals who used AI-powered translation tools. These individuals did not speak English or have any advanced computer skills. Yet, these bad actors used the new tools to create transnational romance and extortion plots to defraud victims of millions of dollars. The victims in these cases were not aware the translation was taking place or even that they were interacting with someone in a foreign country.
To stay ahead of the criminal element, the Secret Service is increasingly partnering with technology companies to ensure new technology aids in preventing—rather than enabling—crime. This includes measures that companies can implement to detect misuse of their tools and explore how these technologies can appropriately aid investigations. For example, our research teams and investigators increasingly face difficulty analyzing large digital data sets. However, new data analytic techniques can significantly improve our ability to detect and address illicit activity. These new techniques were used successfully in investigating a large-scale fraud scheme impacting the state of California. Within a few weeks of work on this case, investigators identified patterns in the fraud schemes that resulted in Secret Service agents arresting five criminals withdrawing tens of thousands of dollars from ATMs using information stolen from California-based users of Electronic Benefit Transfer (EBT) cards.106 This case demonstrated how new data tools aid in analysis and have the potential to quickly detect and address illicit activity in both the public and private sectors.
Whether battling ransomware, credit card fraud, or protecting minors from online child predators, the Secret Service works to stay on the cutting edge of technology. New technology enables criminals and investigators alike, and our private sector and law enforcement partnerships are the key to detecting and preventing illicit activity. Our network of Cyber Fraud Task Forces will continue to foster regular interaction with our partners to promote the prevention and mitigation of cybercrime with the critical goal of protecting America’s financial interests. Working together, we can identify and implement ways to use technology effectively to prevent crime.
Appendix D: Using the VERIS Community Database (VCDB) to Estimate Risk
By HALOCK Security Labs and the Center for Internet Security (CIS)
The VCDB was a leap forward in incident sharing. For CIS and HALOCK it’s been a solid foundation for risk analysis. One of the biggest challenges in conducting risk assessments is estimating the likelihood that an incident will occur. The VCDB contains a lot of structured incident data, so we were sure we could use it to somehow help us solve that challenge.
When we started exploring the VCDB together, it held about 7,500 incident records—each with about 2,500 data points—telling us how each incident occurred. But that’s almost 19 million data points! How could we shape that data to help the CIS community estimate risks?
We experimented and discovered many useful aggregations that brought shape and meaning to the mass of recorded incidents. By focusing on the attack varieties in the recordset, we could see how commonly (or uncommonly) certain attacks were used. Shifting our attention to attack vectors or vulnerabilities helped us understand how certain weaknesses have contributed to incidents. Aggregating data based on industries (right down to the NAICS codes) showed how attack methods are correlated to the distribution of assets that are common in types of organizations.
We realized that the data could be shaped to answer more complex questions, like what industries are more or less susceptible to which kinds of attacks, or what attack methods are most or least commonly associated with which asset classes. If you were patient and skilled you could also find out what kinds of attacks trended higher or lower year-over-year, or which assets and methods are most frequently correlated with each other in attacks.
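If you want to try these aggregations yourself, the sketch below counts (industry, action category, action variety) combinations across a local clone of the VCDB. The directory layout and field paths follow the public VERIS schema as we understand it (victim.industry, action.<category>.variety); confirm them against the snapshot you download.

```python
import glob
import json
from collections import Counter

variety_by_industry = Counter()

# Assumes a local clone of https://github.com/vz-risk/VCDB
for path in glob.glob("VCDB/data/json/validated/*.json"):
    with open(path) as f:
        incident = json.load(f)
    # Two-digit NAICS code of the victim
    industry = str(incident.get("victim", {}).get("industry", ""))[:2]
    for category, details in incident.get("action", {}).items():
        if not isinstance(details, dict):
            continue
        for variety in details.get("variety", []):
            variety_by_industry[(industry, category, variety)] += 1

# The ten most common industry/category/variety combinations
for key, count in variety_by_industry.most_common(10):
    print(key, count)
```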
If your heart rate went up while reading that previous paragraph, then you’re our kind of people. But as much fun as we were having, we had to focus on our purpose: find the simplest way to model risk probability for the widest population.
We settled on a simple correlation between the VCDB data and the CIS Controls when we noticed how commonly certain asset classes were exploited in attacks. Because the CIS Controls safeguards are associated with asset classes and the VCDB shows the assets involved in each incident, we could tie the VCDB incidents to the CIS safeguards that would help prevent types of attacks. We were then able to bake that into our risk assessment method, CIS RAM,107 to help enterprises estimate the likelihood portion of their risk analysis. The more commonly an asset appeared in incident records, the more likely it would be the cause of an eventual incident, unless its corresponding safeguards were strong. This insight became our “Expectancy” score to automatically estimate risk likelihood.
These two diagrams illustrate that Expectancy correlation. Figure 88 depicts a correlation between the commonality of an asset in the VCDB and the maturity of a CIS Controls safeguard that would protect that asset. A low asset commonality matched with a high maturity control would make the expectancy score low (in this illustration, ‘2’ out of ‘5’).
Conversely, Figure 89 shows how a high Expectancy score would result from a high asset commonality and a low control maturity.
If we stated this correlation in plain language, we would say that the more commonly an asset is compromised, the more capable our controls for that asset should be.
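CIS RAM’s actual scoring tables are not reproduced here, but a purely illustrative sketch of that relationship, with a formula and names of our own choosing rather than CIS RAM’s, might look like this:

```python
def expectancy(asset_commonality: int, control_maturity: int) -> int:
    """Illustrative only: higher asset commonality raises the score and
    higher control maturity lowers it. Both inputs are on a 1-5 scale;
    CIS RAM's real scoring method may differ."""
    score = asset_commonality - control_maturity + 3  # center the 1-5 range
    return max(1, min(5, score))

# Low commonality paired with a mature control -> low Expectancy
print(expectancy(2, 4))  # 1
# High commonality paired with a weak control -> high Expectancy
print(expectancy(5, 1))  # 5
```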
But no risk analysis is complete without also considering the impact of an incident. CIS RAM uses additional methods to help enterprises estimate impact scores, so when paired with the Expectancy scores, they have evidence-based risk analysis. And in the spirit of the VCDB community, CIS RAM could freely provide that analysis to anyone who needs it.
Risk analysts might wonder about our use of the word “expectancy” rather than “likelihood” or “probability.” This was a careful choice driven by what the VCDB can tell us.
The word “probability” is best suited for statistical analysis that results in a calculated percentage range or value within a time period (e.g., “between a 12% and 22% chance” or “12% probability in a year”). “Likelihood” is typically used more colloquially or for less rigorous estimation processes (“very likely,” “not likely,” etc.) but still implies a time period or frequency.
The Expectancy score, however, does not consider a time frame. It says that we accept that an incident of some kind will occur, and that the higher the Expectancy score, the more we expect that asset and control to be involved. The lower the Expectancy score, the less we expect the asset and control to be involved.
This helps each enterprise prioritize the improvement of safeguards that could reduce risk the most.
Our correlation is not the only way that organizations can use the VCDB to estimate the likelihood of attacks. Even CIS and HALOCK use different aggregations of the data, given our different purposes. Consider how you would manage your cyber security program if you knew what attack methods were most common in your industry, or what attack methods correspond to what assets, or what was trending higher over time.
Take time to explore the VCDB for your risk analysis uses. You’ll be impressed with what you find.
The VERIS Community Database
https://verisframework.org/vcdb.html
Appendix E: Contributing organizations
A
Akamai Technologies
Ankura
Apura Cyber Intelligence
B
Balbix
bit-x-bit
Bitsight
BlackBerry
C
Censys, Inc.
Center for Internet Security (CIS)
Cequence Security
CERT Division of Carnegie Mellon University’s Software Engineering Institute
CERT – European Union (CERT-EU)
CERT Polska
Check Point Software Technologies Ltd.
Chubb
City of London Police
Coalition
Coveware
Cowbell Cyber Inc.
CrowdStrike
Cyber Security Agency of Singapore
Cybersecurity and Infrastructure Security Agency (CISA)
CyberSecurity Malaysia, an agency under the Ministry of Communications and Multimedia (KKMM)
Cybersixgill
CYBIR
Cyentia Institute
D
Defense Counterintelligence and Security Agency (DCSA)
DomainTools
E
Edgescan
Emergence Insurance
EUROCONTROL
EVIDEN
F
Federal Bureau of Investigation – Internet Crime Complaint Center (FBI IC3)
G
Global Resilience Federation
GreyNoise
H
Halcyon
HALOCK Security Labs
I
Information Commissioner’s Office (ICO)
Irish Reporting and Information Security Service (IRISS-CERT)
Ivanti
J
JPCERT/CC
K
K-12 Security Information Exchange (K-12 SIX)
Kaspersky
KnowBe4
KordaMentha
L
Legal Services Information Sharing and Analysis Organization (LS-ISAO)
M
Maritime Transportation System ISAC (MTS-ISAC)
Mimecast
mnemonic
N
National Crime Agency
National Cyber-Forensics & Training Alliance (NCFTA)
National Fraud Intelligence Bureau
NetDiligence®
NETSCOUT
O
Okta
OpenText Cybersecurity
P
Palo Alto Networks
Q
Qualys
R
Recorded Future, Inc.
Resilience
ReversingLabs
S
S21sec by Thales
Securin, Inc.
SecurityTrails, a Recorded Future Company
Shadowserver Foundation
Shodan
Sistemas Aplicativos
Sophos
Swisscom
U
U.S. Secret Service
V
VERIS Community Database
Verizon Cyber Risk Programs
Verizon Cyber Security Consulting
Verizon DDoS Defense
Verizon Network Operations and Engineering
Verizon Threat Research Advisory Center (VTRAC)
Vestige Digital Investigations
W
WatchGuard Technologies, Inc.
Z
Zscaler