A new report finds that the volume of sensitive data organizations store in non-production environments, such as development, testing, analytics, and AI/ML, is rising. Executives are increasingly worried about protecting that data, and the adoption of new AI products is compounding the problem.
The “Delphix 2024 State of Data Compliance and Security Report” found that 74% of organizations that handle sensitive data increased the amount stored in non-production environments over the past year. In addition, 91% are worried about this expanded footprint, which raises their risk of breaches and non-compliance penalties.
Organizations are holding more consumer data as their customer bases move online and digital transformation efforts continue. IDC projects that the global datasphere will grow to 163 zettabytes by 2025, ten times the 16.1 zettabytes generated in 2016.
As a result, the amount of stored sensitive data, such as personally identifiable information, protected health information, and financial details, is climbing too.
Sensitive data is typically created and stored in production environments, such as CRM or ERP systems, which have tight controls and limited access. However, standard IT operations often copy that data multiple times into non-production environments, giving far more employees access to it and raising the risk of a breach.
The report’s findings are drawn from a survey of 250 senior employees at organizations with at least 5,000 staff that handle sensitive consumer data. The survey was conducted by software provider Perforce.
SEE: National Public Data Breach: 2.7bn Records Leaked on Dark Web
Over 50% of Corporations Have Faced Data Breaches
More than half of the respondents admitted to experiencing a breach involving sensitive data stored in non-production environments.
Other evidence suggests the situation is worsening: an Apple study found a 20% increase in data breaches between 2022 and 2023, and 61% of Americans say their personal data has been breached or compromised at some point.
According to the Perforce report, 42% of the surveyed organizations have experienced ransomware attacks. This type of malware is a growing threat worldwide; a recent Malwarebytes study found that global ransomware attacks rose 33% over the past year.
Part of the problem is that global supply chains are growing longer and more complex, increasing the number of potential entry points for attackers. A report from the Identity Theft Resource Center found that the number of organizations affected by supply chain attacks surged by more than 2,600 percentage points between 2018 and 2023. Ransomware payouts also exceeded $1 billion (£790 million) for the first time in 2023, making these attacks increasingly lucrative for criminals.
AI: The Primary Contributor to Insecure Consumer Data
With the integration of AI into business workflows increasing, monitoring data flows is becoming a formidable challenge.
AI systems often require sensitive consumer data for training and operation, and their complex algorithms and potential integrations with external systems introduce new attack vectors that are difficult to control. Indeed, 60% of respondents named AI and ML as the primary drivers of the growing volume of sensitive data in non-production environments.
“AI environments may have weaker governance and protection compared to production environments,” noted the report’s authors, highlighting their susceptibility to compromise.
As a result, 85% of business decision-makers are concerned about regulatory non-compliance in AI environments. While most AI-specific regulations are still in their infancy, the GDPR already requires that personal data used in AI applications be processed lawfully and transparently, and several state-level laws in the U.S. also apply.
SEE: AI Executive Order: White House Releases 90-Day Progress Report
The E.U. AI Act, which came into force in August, imposed strict rules on the use of AI for facial recognition and added safeguards for general-purpose AI systems. Companies that violate it face fines ranging from €35 million ($38 million) or 7% of global annual turnover down to €7.5 million ($8.1 million) or 1.5% of turnover, depending on the infringement and the company’s size. Similar AI-focused regulations are expected to emerge in other regions soon.
Over 80% of respondents to the Perforce study expressed concern about issues related to sensitive data in AI environments, including the use of poor-quality data in AI models, the re-identification of personal data, and the theft of model training data, which can include intellectual property and trade secrets.
Firms Are Alarmed by the Financial Implications of Insecure Data
One of the main reasons large enterprises worry about data security is the prospect of hefty fines for non-compliance. With a growing body of regulations governing consumer data, such as GDPR and HIPAA, keeping up with evolving and complex requirements is a challenge in itself.
Several regulations, such as GDPR, set penalties as a proportion of annual revenue, which exposes larger organizations to bigger fines. According to the Perforce report, 43% of
However, the cost of a data breach extends beyond the fine itself, as part of the revenue loss comes from halted operations. A recent Splunk study found that the leading cause of downtime incidents was cybersecurity-related human error, such as clicking a phishing link.
Unplanned downtime costs the world’s largest enterprises around $400 billion a year, driven by lost revenue, reduced shareholder value, stalled productivity, and reputational damage. Ransomware damages alone are expected to exceed $265 billion by 2031.
According to IBM, the average cost of a data breach in 2024 is $4.88 million, a 10% increase over 2023. The same report found that 40% of breaches involved data stored across multiple environments, such as public cloud and on-premises systems; these breaches cost more than $5 million on average and took the longest to detect and contain, underscoring business leaders’ concerns about data proliferation.
SEE: Nearly 10 Billion Passwords Leaked in Biggest Compilation of All Time
Securing Data in Non-Production Environments Is Resource-Intensive
There are various ways to protect data stored in non-production environments, such as masking sensitive values. However, according to the Perforce report, organizations are reluctant to adopt these measures for several reasons, including the perceived complexity and time involved and concerns that they could slow the business down:
- Almost one-third worry that it may hinder software development since securely replicating production databases to non-production environments could take weeks.
- 36% believe that masked data may be unrealistic and, therefore, affect software quality.
- 38% believe that stringent security measures could hamper the company’s ability to track and comply with regulations.
The report also found that 86% of organizations permit exceptions to data compliance in non-production environments to avoid the difficulties of secure storage. These exceptions include using a limited dataset, minimizing the data, or obtaining consent from data subjects.
Ways to Secure Sensitive Data in Non-Production Environments
The Perforce team outlined the top four strategies for businesses to safeguard their sensitive data in non-production environments:
- Static data masking: Permanently replacing sensitive values with fictitious yet realistic equivalents (see the sketch after this list).
- Data loss prevention (DLP): A security strategy that identifies potential breaches and theft and seeks to prevent them.
- Data encryption: Reversibly converting data into an unreadable form that only authorized users can decode.
- Strict access control: A policy that classifies users by roles and other attributes and grants dataset access based on those classifications.
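To make the first technique concrete, below is a minimal sketch of static data masking in Python. It is illustrative only and not taken from the Perforce report: the column names, substitute values, and CSV file paths are hypothetical. The masked value is derived from a hash of the original, so the same input always maps to the same fictitious output, which keeps joins between masked tables consistent.

```python
import csv
import hashlib
import random

# Illustrative pools of fictitious but plausible replacement values.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
LAST_NAMES = ["Smith", "Lee", "Garcia", "Patel", "Nguyen"]


def _seeded_rng(value: str) -> random.Random:
    """Derive a deterministic RNG from the original value so the same
    input always produces the same masked output."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def mask_name(value: str) -> str:
    rng = _seeded_rng(value)
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def mask_email(value: str) -> str:
    rng = _seeded_rng(value)
    return f"user{rng.randint(100000, 999999)}@example.com"


# Hypothetical sensitive columns mapped to their masking functions.
MASKERS = {"full_name": mask_name, "email": mask_email}


def mask_csv(src_path: str, dst_path: str) -> None:
    """Write a masked copy of src_path; non-sensitive columns pass through."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for column, masker in MASKERS.items():
                if column in row and row[column]:
                    row[column] = masker(row[column])
            writer.writerow(row)


if __name__ == "__main__":
    # Placeholder file names for a production extract and its masked copy.
    mask_csv("customers_prod.csv", "customers_masked.csv")
```

Because the masking happens before the data is copied out of production, the non-production copy never contains the real values, which is what distinguishes static masking from encryption or access controls that still leave the original data in place.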
The authors emphasized: “Protecting sensitive data can be challenging in general. The intricacies of AI/ML only add to that complexity.
“Tools specialized in safeguarding sensitive data in non-production environments – such as development, testing, and analytics – are well-placed to assist in the protection of your AI environment.”
