From cos(x+y) to GenAI Hallucinations: Why Zero Trust Needs a “Progressive Refinement Loop”


1. A Schoolbook Identity Hidden Inside a 1 km Circular Field
The other day, my son, Syon, was learning the angle-addition identity for cos(x+y) and asked the question he always asks: where am I ever going to use this?
Physics is one answer. Engineering is another. But there is a stranger answer too, and we will come to it later.
Imagine a circular field of radius 1 km. You are standing at the center, and you want to reach a point on the boundary at an angle of x+y north of east. If the field is open, you can simply walk straight to the point.
But now imagine the city lays down a tilted street grid across the field. The main streets run at angle x, and the cross streets are perpendicular to them. You can no longer walk straight. You are forced to reach the same point by walking only along those two allowed directions.

Figure 1. The same endpoint on a 1 km circular field: first by a direct walk, then by a forced two-step walk on a tilted street grid.

The proof becomes simple if we compare only one quantity: how far east we end up.
Way 1: the direct walk. If we walk straight to the point at angle x+y on the unit circle, the eastward part of that 1 km walk is just the point’s x-coordinate: cos(x+y).
Way 2: the forced street grid walk. Relative to the tilted street direction, the same target point is now only at angle y. So the two forced legs have lengths cos y along the tilted street and sin y along the perpendicular street.
Eastward contribution of the first leg. A walk of length cos y along a road tilted by angle x contributes cos x cos y in the eastward direction.
Eastward contribution of the second leg. A walk of length sin y along the perpendicular street points partly backward relative to east, so its eastward contribution is −sin x sin y.
Put them together. Both routes reach the exact same endpoint, so their eastward distances must match.
cos(x+y) = cos x cos y − sin x sin y
That schoolbook identity is really a statement about two different ways of reaching the same place. And that is a useful mental model for the rest of this blog: the same underlying reality can be represented in more than one geometry.
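The two-route argument is easy to spot-check numerically. A minimal Python sketch (the angle values are arbitrary):

```python
import math

x, y = 0.7, 0.4  # arbitrary angles in radians

# Way 1: the direct walk's eastward distance.
direct = math.cos(x + y)

# Way 2: the forced street-grid walk's eastward distance.
grid = math.cos(x) * math.cos(y) - math.sin(x) * math.sin(y)

assert math.isclose(direct, grid)  # both routes end equally far east
```

Any pair of angles works, because the two expressions describe the same endpoint.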
2. When Words Become Vectors, Our Trig Identities Return
Modern AI does not begin with meanings. It begins with vectors.
A word like king, queen, man, or woman is first turned into an embedding: a point in a high-dimensional space. Once that happens, the model is no longer working with words in the ordinary sense. It is working with directions, distances, alignments, and differences between vectors.

Figure 2. Words are converted into embeddings: vectors in a high-dimensional space. (3Blue1Brown, Transformers, the tech behind LLMs | Deep Learning Chapter 5, https://www.3blue1brown.com/lessons/gpt)

Once words are mapped into a vector space, nearby meanings often land near one another. A model does not need the exact same sentence every time. It can use geometric closeness to connect related ideas.

Figure 3. Nearby embeddings often correspond to nearby meanings. (3Blue1Brown, Transformers, the tech behind LLMs | Deep Learning Chapter 5, https://www.3blue1brown.com/lessons/gpt)

And once we are talking about vectors, cosine returns in a new role. In Section 1, cosine measured the eastward part of a walk. Here, cosine measures alignment between vectors:
cos θ = (v · w) / (||v|| ||w||)
So even before we get to analogies, the same geometric instinct is already back in the story.
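The alignment formula needs nothing beyond a dot product and two norms; a pure-Python sketch:

```python
import math

def cosine_similarity(v, w):
    """cos θ = (v · w) / (||v|| ||w||): 1 = same direction, 0 = perpendicular."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)

print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 3))  # 0.707: vectors 45° apart
```

Real embedding libraries compute the same quantity over hundreds of dimensions; the geometry is identical.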

Figure 4. Cosine now measures how strongly two vectors align. (3Blue1Brown, Transformers, the tech behind LLMs | Deep Learning Chapter 5, https://www.3blue1brown.com/lessons/gpt)

But the geometry becomes more interesting still. Meaning is not only stored in where a word sits. It is also stored in the move from one concept to another.
The move from man to woman can itself be treated as a vector:
r⃗ = E(woman) − E(man)
Now suppose the model has learned that this move is conceptually parallel to the move from king to queen. Then it can borrow that same geometric move and try it somewhere else. That is the idea behind the famous embedding relation:
E(queen) − E(king) ≈ E(woman) − E(man)
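A toy version of that borrowed move, with hand-picked 2-D vectors standing in for real learned embeddings (which have hundreds of dimensions and are not hand-picked):

```python
# Hand-picked 2-D stand-ins for real embeddings (illustrative only).
E = {
    "man":   (1.0, 1.0),
    "woman": (1.0, 2.0),
    "king":  (3.0, 1.0),
}

# The man -> woman move as a vector: r = E(woman) - E(man).
r = (E["woman"][0] - E["man"][0], E["woman"][1] - E["man"][1])

# Borrow that same move and apply it starting from king.
queen_guess = (E["king"][0] + r[0], E["king"][1] + r[1])
print(queen_guess)  # (3.0, 2.0): roughly where "queen" should land
```

The point is only the mechanics: a relation is a vector, and applying a relation is vector addition.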

Figure 5. The move from king to queen is treated as similar to the move from man to woman. (3Blue1Brown, Transformers, the tech behind LLMs | Deep Learning Chapter 5, https://www.3blue1brown.com/lessons/gpt)

This is where the old trigonometric picture quietly returns.
King and Queen define the plane. Let x be the global angle of Queen from the horizontal axis, and let y be the extra angle from Queen to King. So in the global frame, King sits at angle x+y.
Now bring in the borrowed relation vector r. Let θ be the angle that r makes with the Queen direction. Then its component along the Queen direction is r cos θ, and its perpendicular component is r sin θ.
So the local coordinates become:
along the Queen direction: cos y + r cos θ
perpendicular direction: sin y + r sin θ
Now project those local coordinates back into the global East–West and North–South axes.
Eastward distance:
E = (cos y + r cos θ) cos x − (sin y + r sin θ) sin x
  = (cos x cos y − sin x sin y) + r (cos x cos θ − sin x sin θ)
  = cos(x+y) + r cos(x+θ)
Northward distance:
N = (cos y + r cos θ) sin x + (sin y + r sin θ) cos x
  = (sin x cos y + cos x sin y) + r (sin x cos θ + cos x sin θ)
  = sin(x+y) + r sin(x+θ)
So the old identities come back naturally. The Queen direction contributes the x term, the King offset contributes the y term, and the borrowed relation vector contributes a second cosine–sine pair through its own angle θ, since it is not exactly perpendicular.
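Both projections collapse to the angle-addition forms, which is easy to verify numerically (sample values are arbitrary):

```python
import math

x, y, r, theta = 0.5, 0.3, 0.2, 1.1  # arbitrary sample values

# Local coordinates in the Queen frame.
along = math.cos(y) + r * math.cos(theta)
perp = math.sin(y) + r * math.sin(theta)

# Project back onto the global East and North axes.
E = along * math.cos(x) - perp * math.sin(x)
N = along * math.sin(x) + perp * math.cos(x)

# Compare against the collapsed angle-addition forms.
assert math.isclose(E, math.cos(x + y) + r * math.cos(x + theta))
assert math.isclose(N, math.sin(x + y) + r * math.sin(x + theta))
```
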
The old classroom identities
cos(x + y) = cos x cos y − sin x sin y
sin(x + y) = sin x cos y + cos x sin y
are not just about triangles on a homework sheet. They are one natural language for describing what happens when meaning becomes geometry, relations become vectors, and a borrowed move is broken into a useful component and a sideways error.
Once words become vectors, it is no longer surprising that trigonometry returns. It was already waiting there. And once you see that, it becomes less surprising that a real AI system solving a modular addition problem would rediscover the same identity for itself.
3. How AI Rediscovers Our Trig Identity
By now, trigonometry showing up inside AI should feel less strange. The more interesting surprise is how directly it shows up.
In mechanistic-interpretability work on grokking, researchers studying a small transformer trained on modular addition found that the model did not just memorize answers. It built wave-like internal structure, learned sine- and cosine-like components, and then combined them using the same identity we just met on the circular field:
cos(x+y) = cos x cos y − sin x sin y
That is a striking result. The model was not handed a trigonometry lesson. It was trained only to predict the right answer for modular addition.
And yet, after Section 2, it also feels a little less mysterious.
Once a model represents information geometrically, compares patterns through alignment, and learns reusable structure in vector space, trigonometric identities stop looking like random classroom artifacts. They start to look like one natural way geometry expresses itself.
Modular addition is exactly the kind of problem where that hidden geometry has room to emerge. Addition modulo n lives on a circle, like a clock with n ticks. So, the model does not have to learn “addition” first in the schoolbook sense. It can learn wave-like components for x and y, combine them, and recover the diagonal x+y structure through the old angle-addition identity.
In plain English: the model first learns rotations and then uses those rotations to learn addition.
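That "rotations first, addition second" idea can be sketched directly: place each residue at an angle on the unit circle, compose rotations by complex multiplication (which is the cos(x+y)/sin(x+y) identity in disguise), and read the sum back off the resulting angle. The clock size of 12 is an arbitrary choice.

```python
import cmath
import math

n = 12  # modulus: a clock with 12 ticks (arbitrary choice)

def to_rotation(k):
    """Place residue k at angle 2*pi*k/n on the unit circle."""
    return cmath.exp(2j * math.pi * k / n)

def add_mod(a, b):
    # Multiplying unit complex numbers adds their angles --
    # the angle-addition identities in disguise.
    angle = cmath.phase(to_rotation(a) * to_rotation(b))
    return round(angle * n / (2 * math.pi)) % n

print(add_mod(7, 9))  # 4, i.e. (7 + 9) mod 12
```

This is a hand-written analogue, not the learned circuit itself; the grokking result is that the trained model converges on essentially this representation on its own.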

Figure 6. Welch Labs visualization of grokking modular addition: wave-like components combine through the same cos(x+y) = cos x cos y − sin x sin y identity. (Welch Labs, The most complex model we actually understand, https://www.youtube.com/watch?v=D8GOeCFFby4)

This is what makes the result feel so beautiful. The model has rediscovered a real piece of geometry. And it hints at a broader pattern: when a system has to reason about circular or periodic structure, the language of sine, cosine, and angle-addition identities can quietly reappear — even inside AI. But that does not mean geometry alone guarantees truth. And that is where the next part of the story begins.
4. Why Geometry Alone Can Still Hallucinate
Geometry is a big part of why modern AI works so well. Embeddings place related words, sentences, and concepts near one another in high-dimensional space, and cosine-style similarity helps the model judge what is semantically nearby.
But semantic nearness is not the same thing as truth.
That gap is one of the roots of hallucination.
A useful geometric taxonomy separates three different failure modes. Type I drifts away from the provided grounding. Type II wanders into semantically foreign territory. Type III is the dangerous one for our discussion: it stays inside the correct conceptual frame and still gets the fact wrong.

Figure 7. Type III stays inside the plausible semantic region while still being wrong. (Javier Marín, A Geometric Taxonomy of Hallucinations in LLMs, arXiv:2602.13224, https://arxiv.org/abs/2602.13224)

This is exactly the kind of failure practitioners see all the time. “The firewall blocked the attack” and “The firewall allowed the attack” live in almost the same conceptual neighborhood. “Saturn is the 6th planet” and “Saturn is the 7th planet” live in the same topic, the same frame, and almost the same wording. A retrieval system can even bring back a document chunk about a 2022 vulnerability for a question about a 2026 issue because the surrounding language looks semantically similar.
In every case, the answer is not nonsense. It is plausible, fluent, and near the target in semantic space — yet still wrong.
That is the limit of geometry alone. Type I and Type II often move away in ways geometry can help detect. Type III does not. It stays inside the plausible region. That is why cosine similarity and embedding geometry can be excellent signals for relevance, but they cannot, by themselves, be the final truth mechanism.
For low-stakes assistance, that may be acceptable. For high-stakes policy generation, it is not. In Zero Trust microsegmentation, a plausible answer can still break an application, preserve unnecessary lateral paths, or, worse, block legitimate paths. That is why "vanilla LLM" generation cannot be the last step. It must be followed by verification.
5. From “Vanilla” LLM Generation to a Progressive Refinement Loop
Once we accept that semantic geometry alone cannot be the final authority, the architecture has to change.
A useful example comes from mathematics. In the Aletheia paper, the system is not treated as a single-shot answer engine. It is organized into three subagents — a Generator, a Verifier, and a Reviser — that keep interacting until the Verifier accepts the solution or the attempt is abandoned. The paper motivates this loop directly from the limits of current models: research-level mathematics exposes superficial understanding and hallucination in ways that cannot be solved by fluent generation alone.
The paper is also careful about how it talks about autonomy. It does not collapse everything into “AI solved it.” It introduces autonomy levels and human-AI interaction cards to document contribution and novelty more responsibly. That is a useful lesson beyond mathematics: once AI enters a high-stakes workflow, the surrounding artifacts and human checkpoints matter as much as the generated candidate itself.

Figure 8. Aletheia's generator–verifier–reviser pattern: generate, verify, revise, or restart. (Luong et al., Towards Autonomous Mathematics Research, Aletheia, arXiv:2602.10177v3, https://arxiv.org/html/2602.10177v3)

That same architectural lesson matters in Zero Trust microsegmentation, but the domain-specific checks are different. Here the problem is not “does this proof hold?” It is “does this policy both preserve legitimate traffic and reduce breach impact?”
“In Zero Trust, plausible is not enough. The candidate policy has to be checked against business reality.”
That is why the Xshield loop begins not with policy writing, but with an attack scenario and a breach impact analyzer. The analyzer produces impact assessment artifacts and a reduction target.

Figure 9. Xshield’s progressive refinement loop: analyze, propose, evaluate, restart, refine, or implement.

From there, the system enters the progressive refinement loop: it proposes a candidate sub-segment with a policy layer and then evaluates it. If the proposal breaks legitimate traffic, it fails syntactically and must go back for revision. If it preserves traffic but makes only partial progress on breach-impact reduction, it is not discarded; it loops back as semantic progress and asks for proposal refinement (additional sub-segments). Only when the reduction target is safely achieved does the loop move to implementation.
That is the important parallel. Aletheia shows the general pattern: generation is separated from acceptance, and revision is built into the system. Xshield applies the same idea to Zero Trust policy generation, but with domain-specific evaluators tied to application behavior and breach reduction rather than mathematical correctness.
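The shape of such a loop can be sketched schematically. Every function here is a hypothetical placeholder, not Xshield's or Aletheia's actual API; the toy evaluators merely stand in for flow simulation and breach-impact analysis.

```python
# Toy stand-ins for the real evaluators (illustrative placeholders only).
def propose_subsegment(policy):
    return len(policy) + 1  # toy generator: next sub-segment id

def breaks_traffic(candidate):
    return False  # toy check: assume every candidate preserves legitimate flows

def impact_reduction(policy):
    return 10 * len(policy)  # toy metric: each sub-segment cuts impact by 10%

def refine_policy(target_reduction, max_rounds=10):
    policy = []  # accumulated sub-segments with their policy layers
    for _ in range(max_rounds):
        candidate = propose_subsegment(policy)            # generate
        if breaks_traffic(candidate):                     # syntactic failure
            continue                                      # go back and revise
        policy.append(candidate)                          # keep partial progress
        if impact_reduction(policy) >= target_reduction:  # evaluate
            return policy                                 # safe to implement
    return None  # target not reached: hand off to a human checkpoint

print(refine_policy(target_reduction=30))  # [1, 2, 3]
```

The key structural point survives the toy details: generation is never the last step, partial progress is retained rather than discarded, and failure to converge escalates to a human rather than shipping a plausible-but-unverified policy.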
6. From cos(x+y) to Zero Trust
So where does cos(x+y) end up in real life?
It shows up in a school proof about reaching the same eastward distance by two different geometric routes. It shows up in the hidden layers of a neural network learning modular addition. And it shows up in the high-dimensional vector spaces where modern AI judges what is nearby, relevant, and semantically aligned.
But the same geometry that gives AI much of its power to generalize also creates one of its deepest limits. Cosine similarity can tell a model which concepts are nearby. It cannot, by itself, guarantee what is factually true, operationally safe, or correct for a live environment.
That is why, in Zero Trust microsegmentation, geometry alone is not enough. What matters is the loop around it: analyzers, artifacts, simulation, evaluation, refinement, and clear human checkpoints before implementation.
If this is a problem you are actively wrestling with, I will be discussing these ideas further at the RSAC 2026 Conference in a short session at the ColorTokens booth:
Stop Trusting “Vanilla” LLMs: AI-Designed Microsegmentation That Won’t Break Your Business
Tuesday, March 24, 2026
11:00 – 11:30 AM PDT
Booth #1933, South Expo Hall, Moscone Center
We will walk through the Xshield architecture in more detail and show how a progressive refinement loop can help automate Zero Trust policy design without breaking legitimate business traffic.
If you would like to explore how this could work in your own environment, feel free to contact us.

The post From cos(x+y) to GenAI Hallucinations: Why Zero Trust Needs a “Progressive Refinement Loop” appeared first on ColorTokens.

*** This is a Security Bloggers Network syndicated blog from ColorTokens authored by Satyam Tyagi. Read the original post at: https://colortokens.com/blogs/microsegmentation-ai-hallucinations-rsac-2026/
