Key points from the report:
- Information enriched creation (IER) lets organizations build tailored, effective, and economical applications on top of confidential data. However, research reveals notable security concerns, such as exposed vector stores and LLM-hosting platforms, which can lead to data leaks, unauthorized access, and potential system manipulation if not adequately protected.
- Security issues such as data validation bugs and denial-of-service attacks are widespread across IER components. The problem is amplified by their rapid development pace, which makes vulnerabilities difficult to track and fix.
- Research identified 80 exposed llama.cpp servers, 57 of which lacked authentication. These exposed servers were located primarily in the United States, followed by China, Germany, and France, showing global adoption with varying security practices.
- Beyond authentication, organizations should deploy TLS encryption and enforce zero-trust networking to ensure that generative AI systems and their components are protected against unauthorized access and manipulation.
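The authentication-plus-TLS recommendation above can be sketched as a reverse-proxy configuration. This is an illustrative fragment only, assuming nginx as the proxy and Ollama's default port 11434 as the backend; the hostname, certificate paths, and credential file are placeholders.

```nginx
# Illustrative sketch: TLS plus basic authentication in front of an
# otherwise-unauthenticated local LLM server. Assumes Ollama's default
# port 11434; server_name, certificate paths, and the htpasswd file
# are placeholders to adapt.
server {
    listen 443 ssl;
    server_name llm.internal.example;

    ssl_certificate     /etc/nginx/tls/llm.crt;
    ssl_certificate_key /etc/nginx/tls/llm.key;

    auth_basic           "LLM access";
    auth_basic_user_file /etc/nginx/htpasswd;

    location / {
        # Only proxied, authenticated traffic reaches the LLM server.
        proxy_pass http://127.0.0.1:11434;
    }
}
```

In a zero-trust setup, the backend would additionally bind only to localhost (as above) so the model server is never directly reachable from the network.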
“Innovate rapidly and boldly” appears to be the prevailing mantra in the AI domain. Since the launch of ChatGPT in 2022, seemingly everyone has embraced this trend. While some fields are content to leverage OpenAI’s services, many organizations have distinct requirements. According to Nick Turley, head of product at OpenAI, LLMs serve as a “calculator for words,” opening up numerous opportunities for organizations. Making effective use of this “word calculator,” however, requires some engineering effort. While more advanced agentic AI systems are still maturing, the current technology of choice is information enriched creation (IER).
IER has specific prerequisites. It relies on a database of text fragments and a retrieval mechanism. Typically, a vector repository is used for this purpose, storing each text fragment alongside a numerical representation (an embedding) used to locate the most relevant segments. With these components and an appropriate prompt, it is often possible to answer questions or draft new texts grounded in private data sources, tailored to specific needs. Indeed, IER is powerful enough that the largest large language models (LLMs) are not always necessary. To cut costs and improve response times, smaller and lighter models can be hosted on in-house servers.
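The retrieval-then-prompt flow described above can be sketched in a few lines. This is a toy illustration: a word-overlap score stands in for real embedding similarity, and all fragment texts and function names are invented for the example.

```python
# Minimal sketch of the IER retrieval step. A bag-of-words overlap score
# stands in for the dense numeric embeddings a real vector repository uses;
# everything here is illustrative, not a production design.

def embed(text: str) -> set[str]:
    # Real systems map text to a numeric vector; a word set stands in here.
    return set(text.lower().split())

def retrieve(query: str, fragments: list[str], k: int = 2) -> list[str]:
    # Rank fragments by similarity to the query and keep the top k.
    scored = sorted(fragments,
                    key=lambda f: len(embed(query) & embed(f)),
                    reverse=True)
    return scored[:k]

fragments = [
    "Invoices are archived for seven years.",
    "The cafeteria opens at eight.",
    "Archived invoices live on the finance share.",
]

# Retrieve the most relevant fragments, then build the prompt the LLM sees.
context = retrieve("where are archived invoices stored", fragments)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

A real deployment replaces `embed` with an embedding model and `fragments` with a vector repository query, but the shape of the pipeline is the same: retrieve relevant text, then hand it to the LLM inside the prompt.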
By analogy, the vector repository functions like a highly capable librarian who not only selects the relevant books but also highlights the pertinent excerpts. The LLM then acts as the researcher, using those highlighted passages to compose documents or answer questions. Together, they form an IER application.
Vector repositories are not entirely new, but they have seen a revival in the past couple of years. Numerous hosted solutions such as Pinecone exist, alongside self-hosted alternatives such as ChromaDB or Weaviate (https://weaviate.io). These platforms let developers find text segments similar to an input text, such as a question that needs answering.
Self-hosting LLMs demands considerable memory and a capable GPU, requirements that cloud providers can meet. For individuals with capable laptops or PCs, LM Studio is a popular choice. In enterprise settings, llama.cpp and Ollama are frequently the first picks. These projects have evolved at a pace rarely seen before, and have picked up their share of bugs along the way.
Several of these flaws in IER components are conventional data validation bugs, such as CVE-2024-37032 and CVE-2024-39720. Others enable denial of service (CVE-2024-39720 and CVE-2024-39721) or disclose whether files exist (CVE-2024-39719 and CVE-2024-39722). This list is not exhaustive. llama.cpp has fewer documented issues, though CVE-2024-42479 was discovered this year, and CVE-2024-34359 affects the Python library built on llama.cpp. llama.cpp’s limited coverage may stem from its rapid release pace: since its start in March 2023, more than 2,500 releases have shipped, roughly four per day. With such a fast-moving target, identifying vulnerabilities becomes a challenging task.
In contrast, Ollama follows a more measured update cycle, with 96 releases since July 2023, averaging about one per week. By comparison, Linux is updated every few months, while Windows introduces new “Moments” quarterly.
ChromaDB, the vector repository, emerged in October 2022 and ships updates approximately every two weeks; remarkably, no CVEs are directly linked to it. Weaviate, another vector repository, has had security vulnerabilities (CVE-2023-38976 and CVE-2024-45846) when integrated with MindsDB. Weaviate has existed since 2019, making it the elder statesman of this technology stack, and it still maintains a weekly update schedule. These rapid update cycles pay off in the prompt resolution of discovered bugs, curtailing their exposure time.
LLMs on their own may fall short of fulfilling all requirements, and their improvements are becoming incremental as public data sources dwindle. The future likely lies in agentic AI, combining LLMs, memory, tools, and workflows into more sophisticated AI-centric systems, as championed by Andrew Ng. Essentially, this represents a new software architecture in which LLMs and vector repositories retain pivotal roles. Yet along this trajectory, organizations risk exposing vulnerable systems if they neglect security measures.
Fearing that, in their rush, many developers would expose these systems to the internet, we investigated the online presence of selected IER components in November 2024. The focus was on the four primary components of IER systems: llama.cpp and Ollama for LLM hosting, and ChromaDB and Weaviate for vector repository services.
Exposed llama.cpp Findings
llama.cpp hosts a single LLM model and operates as a REST service, with clients sending POST requests over HTTP. Our measurements fluctuated over time, but the latest count found 80 exposed servers, 57 of which showed no apparent authentication. These figures are likely conservative: more servers may be exposed yet not readily detectable. The models hosted on these llama.cpp servers were predominantly Llama 3 derivatives, followed by Mistral models. Some were known jailbroken models, but the majority were unfamiliar, presumably fine-tuned for specific purposes.
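To make the exposure concrete, the shape of such a POST request can be sketched as follows. The `/completion` endpoint and default port 8080 follow llama.cpp's bundled HTTP server; the host address and prompt are placeholders, and the request is deliberately built but not sent. On a server with no authentication, anyone who can reach the port can issue a request of this form.

```python
# Sketch of a request to llama.cpp's HTTP server. The /completion path and
# port 8080 are the server's defaults; the host and prompt are placeholders.
# The request is constructed but intentionally never sent.
import json
from urllib import request

def build_completion_request(base_url: str, prompt: str,
                             n_predict: int = 64) -> request.Request:
    """Build (but do not send) a POST to the /completion endpoint."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict})
    return request.Request(
        url=f"{base_url}/completion",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("http://127.0.0.1:8080",
                               "Summarize our internal Q3 notes:")
```

Sending `req` with `urllib.request.urlopen` would return the model's completion as JSON, which is precisely why an internet-facing server without authentication or TLS is a data-leak and abuse risk.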
