Taming the Wild West of Machine Learning: Practical Model Signing with Sigstore
In partnership with NVIDIA and HiddenLayer, and as part of the Open Source Security Foundation, we are excited to announce the first stable release of our model signing library. Using digital signatures, such as those provided by Sigstore, we allow users to verify that the model used by an application is exactly the model that was created by its developers. In this blog post we will illustrate why this release matters from Google's point of view.
With the advent of LLMs, the machine learning field has entered an era of rapid evolution. We have seen a quick succession of impressive results, culminating in weekly releases of applications that integrate ML models to perform tasks ranging from customer support and software development to security-critical functions.
However, this has also introduced a new wave of security risks. Model and data poisoning, prompt injection, prompt leaking, and prompt evasion are just some of the threats that have recently made headlines. Less visible is the risk posed by the ML supply chain itself: because models are an opaque collection of weights (sometimes bundled with arbitrary code), an attacker can tamper with them and cause significant harm to the users of those models. Users, developers, and practitioners therefore need to answer a critical question in their risk assessment: "can I trust this model?"
Since its launch, Google's Secure AI Framework (SAIF) has provided guidance and technical solutions for building trustworthy AI applications. A first step toward trusting a model is letting users verify its integrity and provenance, to prevent tampering at every stage from training to deployment, using cryptographic signing.
The AI supply chain
To understand why the model signing project is needed, let's look at how AI-powered applications are built, with an eye toward the points where malicious tampering can occur.
Applications built on top of large AI models are typically developed in at least three distinct stages. First, a foundation model is trained on large datasets. Next, a specialized ML team fine-tunes the model to improve its performance on the application's specific tasks. Finally, the fine-tuned model is embedded into the application.
Building an application powered by large language models involves three main stages.
These three stages are usually handled by different teams, and possibly even by different companies, given the specialized skills required at each one. Models move between stages via model hubs, which serve as repositories for storing and sharing models. Kaggle and HuggingFace are popular open source options, although internal model hubs can also be used.
Splitting the process into separate stages creates multiple opportunities for a malicious insider (or an external attacker who has compromised internal systems) to tamper with the model. Tampering can range from subtle changes to the model's weights that alter its behavior, to injecting architectural backdoors: entirely new behaviors that only activate under specific trigger conditions. It is also possible to exploit the serialization format and inject arbitrary code execution into the model as stored on disk; our AI supply chain integrity whitepaper goes into more detail on how common model serialization libraries can be abused. The diagram below summarizes the risks across the ML supply chain for building a single model, as discussed in the whitepaper.

A diagram of the supply chain for building a single model, showing some of the supply chain risks (oval labels) and how model signing can defend against them (check marks).
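One of these risks deserves a concrete illustration: the serialization attack. The following is a minimal, self-contained sketch (not taken from the whitepaper; the class name and echoed command are purely illustrative) showing how a pickle-based model file can execute attacker-controlled code the moment it is loaded.

```python
import os
import pickle


class MaliciousPayload:
    """Illustrative only: an object whose deserialization runs a command."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object; here it instructs
        # the loader to call os.system, so code runs as a side effect of loading.
        return (os.system, ("echo 'code executed while loading the model file'",))


# The attacker writes the payload into what looks like an ordinary model file...
tampered_model_bytes = pickle.dumps(MaliciousPayload())

# ...and the victim triggers the command simply by deserializing it.
pickle.loads(tampered_model_bytes)
```

Signing does not make such formats safe by itself, but verifying a signature before loading ensures that only files produced by a trusted party ever reach the deserializer.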
The diagram shows several places where the model could be compromised. Most of these can be prevented by signing the model during training and verifying its integrity before every use: signature verification should happen when the model is uploaded to a model hub, when the model is selected for deployment into an application (either embedded or via remote APIs), and when the model is used as an intermediary in another training run. Assuming the training infrastructure is trustworthy and has not been compromised, this approach guarantees that every user of the model can trust it.
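As a simplified sketch of this "verify before use" gate, the snippet below stands in for full signature verification with a plain SHA-256 digest comparison; the model path and expected digest are hypothetical placeholders.

```python
import hashlib
from pathlib import Path

# Hypothetical values: in practice the expected digest would come from a
# signed, verified statement rather than being hard-coded.
MODEL_PATH = Path("models/finetuned-model.bin")
EXPECTED_SHA256 = "0123456789abcdef..."  # placeholder digest


def file_sha256(path: Path) -> str:
    """Stream the file through SHA-256 so large models never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_model_if_verified(path: Path, expected: str) -> bytes:
    """Refuse to hand the model to a deserializer unless its content matches expectations."""
    actual = file_sha256(path)
    if actual != expected:
        raise RuntimeError(f"Model integrity check failed for {path}: {actual}")
    return path.read_bytes()


model_bytes = load_model_if_verified(MODEL_PATH, EXPECTED_SHA256)
```

In a real deployment the expected value would itself be carried in a signature verified against the publisher's identity, and the same check would run at hub upload, at deployment, and before any further fine-tuning.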
Sigstore for Machine Learning models
The idea of signing models is inspired by code signing, a critical step in traditional software development. A signed binary artifact helps users identify its producer and prevents tampering after publication. However, the average developer would rather not manage signing keys and rotate them whenever they are compromised.
These challenges are addressed by using Sigstore, a collection of tools and services that make code signing secure and easy. By binding an OpenID Connect token to a workload or developer identity, Sigstore removes the need to manage or rotate long-lived secrets. Furthermore, signing is made transparent: signatures over malicious artifacts can be audited in a public transparency log that anyone can inspect. This prevents split-view attacks, ensuring that all users see the same model. These features are why we recommend Sigstore's signing mechanism as the default approach for signing ML models.
Today, together with the open source community, we are releasing the stable v1.0 version of our model signing library as a Python package supporting both Sigstore and traditional signing methods. Designed to handle the sheer scale of ML models (usually much larger than traditional software artifacts), the library signs models represented as a directory tree. The package provides CLI utilities so that users can sign and verify signatures for individual models. It can also be used as a library, which we plan to incorporate directly into model hub upload flows as well as into ML frameworks.
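The library's exact API is best taken from its documentation; as a rough illustration of the underlying idea of signing a model stored as a directory tree, here is a minimal sketch that hashes every file into a single manifest, which is the small artifact that would then be signed. The directory name and manifest layout are chosen here for illustration only.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path: Path) -> str:
    """SHA-256 of a single file, streamed to cope with multi-gigabyte weight shards."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(model_dir: Path) -> dict:
    """Map every file under the model directory to its digest, in a stable order."""
    files = sorted(p for p in model_dir.rglob("*") if p.is_file())
    return {str(p.relative_to(model_dir)): hash_file(p) for p in files}


# Hypothetical model directory (config, tokenizer, shards of weights, ...).
manifest = build_manifest(Path("my-finetuned-model"))

# The serialized manifest is what actually gets signed (e.g. via Sigstore);
# verifying the signature and re-hashing the files proves that no file in the
# tree was modified.
print(json.dumps(manifest, indent=2))
```

Hashing each file separately rather than one large archive makes it cheap to pinpoint exactly which file changed when verification fails.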
Future goals
We see model signing as laying the foundation of trust in the ML ecosystem. We envision extending this approach to also cover datasets and other ML-related artifacts. Beyond that, we plan to build on top of signatures, toward fully tamper-proof metadata records that can be read by both humans and machines. This has the potential to automate a significant fraction of the work needed for incident response in case of a compromise in the ML world. Ideally, an ML developer would not need to change any training code at all, with the framework itself handling model signing and verification transparently.
If you are interested in the future of this project, join the OpenSSF meetings attached to it. To shape the future of building tamper-proof ML, join the Coalition for Secure AI (CoSAI), where we are planning to build the entire trust ecosystem together with the open source community. In collaboration with several industry partners, we are starting a special interest group under CoSAI to define the future of ML signing and extend it to tamper-proof ML metadata, such as model cards and evaluation results.

