Securing Trained Models in Confidential Federated Learning

This post is part of a series on privacy-preserving federated learning. The series is a collaboration between NIST and the UK government's Responsible Technology Adoption Unit (RTA), formerly known as the Centre for Data Ethics and Innovation. Learn more and read all of the posts published to date at NIST's Privacy Engineering Collaboration Space or RTA's blog.

Earlier posts in this series described techniques for protecting input privacy in privacy-preserving federated learning with horizontally and vertically partitioned data. To build a complete privacy-preserving federated learning system, those techniques must be combined with an approach for output privacy, which limits how much can be learned about individuals in the training data after the model has been trained.

As described earlier in the series, in our discussion of privacy attacks in federated learning, trained models can reveal significant information about their training data, including entire images and snippets of text.

Training with Differential Privacy

The strongest form of output privacy is differential privacy, a formal privacy framework with broad applications; see NIST's blog series on differential privacy for more background, especially the post on differentially private machine learning.

Differentially private machine learning techniques add random noise during training to defend against privacy attacks. The noise prevents the model from memorizing specific details of the training data, so that the training data cannot be extracted from the model later. For example, training with differential privacy defends against attacks like the one by Carlini et al., which extracted sensitive training data, including social security numbers, from trained language models.
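To make this concrete, here is a minimal sketch of a DP-SGD-style training step in PyTorch: each example's gradient is clipped and Gaussian noise is added before the parameters are updated. The model, loss, clipping norm, and noise multiplier are illustrative assumptions on our part, not values from this series.

```python
import torch
from torch import nn

def dp_sgd_step(model, loss_fn, xb, yb, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD-style step: clip each example's gradient, then add Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-example gradients, clipped so no single example has too much influence.
    for x, y in zip(xb, yb):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (float(norm) + 1e-12))
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)

    # Add noise calibrated to the clipping norm, then average and update the weights.
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.normal(0.0, noise_multiplier * clip_norm, size=p.shape)
            p.add_(-(lr / len(xb)) * (s + noise))

# Toy usage: a small classifier on random data (all values are placeholders).
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
xb, yb = torch.randn(32, 10), torch.randint(0, 2, (32,))
dp_sgd_step(model, nn.CrossEntropyLoss(), xb, yb)
```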

Integrating Differential Privacy into Privacy-Preserving Federated Learning

In centralized training, where the training data is collected on a central server, the server can both perform the training and add the noise required for differential privacy. In privacy-preserving federated learning, it is harder to decide who should add the noise and how.

Figure: FedAvg with differential privacy, for privacy-preserving federated learning on horizontally partitioned data. Modifications to the FedAvg approach are highlighted in red; they add random noise to each update so that the aggregated noise samples are sufficient to ensure differential privacy for the trained global model. (Credit: NIST)

For privacy-preserving federated learning with horizontally partitioned data, the approach introduced by Kairouz et al. is a variation of the FedAvg method outlined in our fourth post, shown in the figure above. Each participant performs local training and adds a small amount of random noise to their model update before it is aggregated with the others. If every participant adds noise correctly, the aggregated model contains enough noise to ensure differential privacy. This approach provides output privacy even against a malicious aggregator. The Scarlet Pets team used a modified version of this technique in their winning entry in the UK-US PETs Prize Challenges.
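The sketch below illustrates one round of this idea with a toy linear model and synthetic data: each client clips its update and adds Gaussian noise locally before the server sees anything. The helper names, noise scale, and clipping norm are assumptions for illustration and do not reproduce the exact mechanism of Kairouz et al.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_locally(global_weights, local_data, lr=0.1):
    # Toy stand-in for local training: one least-squares gradient step.
    X, y = local_data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return -lr * grad                      # the model update (delta), not new weights

def noisy_local_update(global_weights, local_data, clip_norm=1.0, noise_scale=0.1):
    # Each client clips its own update (bounding sensitivity) and adds Gaussian
    # noise locally, before anything is shared with the aggregator.
    update = train_locally(global_weights, local_data)
    update = update * min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    return update + rng.normal(0.0, noise_scale * clip_norm, size=update.shape)

def server_aggregate(global_weights, noisy_updates):
    # The aggregator only ever sees noisy updates; their average carries enough
    # combined noise to protect the trained global model.
    return global_weights + np.mean(noisy_updates, axis=0)

# One round with three clients holding synthetic horizontally partitioned data.
d = 5
weights = np.zeros(d)
clients = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(3)]
weights = server_aggregate(weights, [noisy_local_update(weights, c) for c in clients])
```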

For vertically partitioned data, achieving differential privacy is more complicated. The noise required for differential privacy cannot be added before entity alignment, because it would prevent the data attributes from being aligned correctly. Instead, the noise must be added after entity alignment, either by a trusted party or by using techniques like homomorphic encryption or multiparty computation.
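The toy sketch below illustrates the ordering constraint, with a plaintext join standing in for a protected entity-alignment step and a trusted party adding Gaussian noise only after the records are aligned. The column names, clipping bounds, and noise scale are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Two parties hold different columns for an overlapping set of individuals.
party_a = pd.DataFrame({"id": [1, 2, 3, 4], "income": [40.0, 55.0, 70.0, 30.0]})
party_b = pd.DataFrame({"id": [2, 3, 4, 5], "purchased": [0, 1, 1, 0]})

# Entity alignment: a plaintext join here; in practice this step would itself be
# protected (e.g., with private set intersection).
aligned = party_a.merge(party_b, on="id")

def noisy_mean(values, lower, upper, noise_scale=5.0):
    # Noise is added only AFTER alignment, by a trusted party (or inside HE/MPC);
    # perturbing the attributes beforehand would prevent records from being
    # matched correctly across the two parties.
    clipped = np.clip(values, lower, upper)        # bound each record's influence
    return clipped.mean() + rng.normal(0.0, noise_scale)

print(noisy_mean(aligned["income"].to_numpy(), lower=0.0, upper=100.0))
```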

Training Highly Accurate Models with Differential Privacy

The random noise required for differential privacy can hurt model accuracy. More noise generally means stronger privacy but lower accuracy; this tension is commonly referred to as the privacy-utility tradeoff.
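As a rough illustration of this tradeoff, the snippet below uses the classical calibration for the Gaussian mechanism (our choice of example, not a formula from this series) to show how much noise different privacy budgets require: smaller values of epsilon demand a larger noise standard deviation.

```python
import math

def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    # Classical Gaussian-mechanism calibration (valid for 0 < epsilon <= 1):
    # sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

# Smaller epsilon (stronger privacy) requires more noise, and typically lower accuracy.
for eps in (0.1, 0.5, 1.0):
    print(f"epsilon={eps}: sigma={gaussian_sigma(eps, delta=1e-5):.2f}")
```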

For simpler machine learning models such as linear regression, logistic regression, and decision trees, navigating this tradeoff is fairly easy: the approach described above is usually enough to train highly accurate models with differential privacy. Both the PPMLHuskies and Scarlet Pets teams in the UK-US PETs Prize Challenges used approaches like this one to train highly accurate differentially private models.

For neural networks and deep learning, the sheer size of the model makes it challenging to use differential privacy during training: larger models require more noise to protect privacy, which can significantly reduce model accuracy. Models of this kind were not part of the UK-US PETs Prize Challenges, but they are increasingly important across generative AI applications, including large language models.

Recent results show that models pre-trained on publicly available data (without differential privacy) and then fine-tuned with differential privacy can achieve accuracy comparable to models trained without differential privacy. For example, Li et al. showed that pre-trained language models fine-tuned with differential privacy can reach nearly the same accuracy as models fine-tuned without it. These results suggest that privacy-preserving federated learning can achieve both privacy and utility in domains where publicly available data is suitable for pre-training, such as language and image models.
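A minimal sketch of this recipe, assuming the Opacus library for DP-SGD, appears below: a frozen stand-in backbone plays the role of a model pre-trained on public data, and only a small task head is fine-tuned with differential privacy on synthetic "private" data. All hyperparameters here are illustrative assumptions, not values from Li et al.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine  # assumes the Opacus library is installed

torch.manual_seed(0)

# Stand-in for a backbone pre-trained on public data; frozen, so no DP is applied to it.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(16, 2)              # small task-specific head, fine-tuned with DP

# Pre-compute features from the frozen backbone on (synthetic) private data.
x_priv = torch.randn(256, 32)
y_priv = torch.randint(0, 2, (256,))
with torch.no_grad():
    feats = backbone(x_priv)
loader = DataLoader(TensorDataset(feats, y_priv), batch_size=32, shuffle=True)

optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
privacy_engine = PrivacyEngine()
head, optimizer, loader = privacy_engine.make_private(
    module=head, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,   # DP-SGD noise and clipping
)

loss_fn = nn.CrossEntropyLoss()
for _ in range(3):                   # a few epochs of private fine-tuning
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(head(xb), yb).backward()
        optimizer.step()

print("approximate privacy budget spent:", privacy_engine.get_epsilon(delta=1e-5))
```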

This strategy does not provide any privacy protection for the public data used in pre-training, so that data must be used in accordance with applicable privacy and intellectual property rights (the legal and ethical questions this raises are outside the scope of this blog series).

Upcoming

In our next post, we will explore the challenges of deploying privacy-preserving federated learning in real-world scenarios.
