
For digital marketers and SEO professionals, focusing on rankings alone is no longer enough. Understanding the infrastructure that powers large language models (LLMs) is the key to future-proofing visibility in this new search ecosystem.
Let’s break down how these models process content, where Retrieval-Augmented Generation (RAG) fits in, and what it all means for the shift from traditional SEO to Generative Engine Optimisation (GEO).
How LLMs actually process content
LLMs will feel like a different beast if you’re used to optimising for traditional search engines. While search engines crawl and index, LLMs read, understand, and generate.
Step one: tokenisation
LLMs don’t read like we do. They break down content into bite-sized bits called “tokens” – think parts of words, punctuation, or spaces. For example, “optimisation” might be tokenised as “optim” + “isation”.
Why does this matter? Because tokenisation powers everything. It feeds into how meaning is constructed, how context is retained, and how long a model can “hold a thought.”
The standard process:
Break the input into digestible chunks
Convert those chunks into tokens
Turn tokens into numbers (vectors, for the techy folk)
Feed these numbers into the neural network for processing
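To make that concrete, here's a quick sketch in Python using OpenAI's open-source tiktoken library. The encoding choice and the exact splits are illustrative – every model tokenises a little differently:

```python
import tiktoken

# cl100k_base is the encoding used by GPT-4-era models; it's just an example
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("optimisation")
print(tokens)                             # a short list of integer token IDs
print([enc.decode([t]) for t in tokens])  # the word split into sub-word pieces
```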
Beyond keywords: semantic understanding
Traditional search hinges on keywords. LLMs? They’re all about meaning. These models map words and phrases into a massive multi-dimensional space using numerical representations called embeddings. Words and phrases that “mean” similar things are clustered close together.
So, while “automobile purchase guidance” and “buying advice for cars” may look nothing alike to an SEO tool, they’re near neighbours to an LLM.
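You can see that neighbourliness for yourself with an embedding model. This sketch uses the open-source sentence-transformers library; the model name is just an example, and the exact similarity score will vary:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model
a = model.encode("automobile purchase guidance")
b = model.encode("buying advice for cars")

# A cosine similarity close to 1 means the phrases sit near each other
# in the embedding space, despite sharing almost no keywords.
print(util.cos_sim(a, b))
```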
This means keyword stuffing is yesterday’s news. Content must now show depth, clarity and context across a topic – not just repeat the right phrase.
The context window
LLMs operate within a “context window” – a limit on how much text they can consider in one go. Depending on the model, this might be anywhere from 8,000 to 128,000 tokens or more.
What’s the implication? Structure matters. If your key messages are buried deep or scattered, they might get missed entirely. So, front-load the good stuff, use clear headings, and build a logical content flow even in snippets.
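If you want a rough feel for token budgets, a check like this can help. The 8,000-token window and the prompt overhead figure are assumptions for illustration – real limits vary by model:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # illustrative tokeniser choice
CONTEXT_WINDOW = 8_000                      # assumed limit; varies by model

def fits_in_window(text: str, prompt_overhead: int = 1_000) -> bool:
    """True if the text plus prompt overhead fits in one context window."""
    return len(enc.encode(text)) + prompt_overhead <= CONTEXT_WINDOW

article = "Key takeaway first, supporting detail after. " * 200
print(fits_in_window(article))  # long pages can easily fail this check
```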
Enter RAG: Retrieval-Augmented Generation
RAG is the tech that brings real-time information into AI conversations. It powers tools like ChatGPT with web browsing, Perplexity, and Google’s AI Overviews, helping LLMs cite up-to-date information from the web instead of relying solely on training data.
How RAG works
A typical RAG system includes:
Retriever: Searches the web or database for relevant info
Generator: Uses an LLM to respond using that info
Augmentation Layer: Weaves it all together into a coherent answer
Here’s the flow:
Understand the query
Retrieve external info
Assemble context
Engineer the prompt
Generate the response
Link citations back to sources
Each platform handles this differently, but the end goal is the same: accurate, timely, contextual answers.
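In code, the flow looks something like the sketch below. The keyword-overlap retriever and the stubbed generate() call are toy stand-ins for illustration, not any specific platform's implementation:

```python
CORPUS = [
    "GEO focuses on being cited by generative engines.",
    "Traditional SEO optimises pages for ranked blue links.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    q = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def generate(prompt: str) -> str:
    """Stub for the LLM call; a real system would send the prompt to a model."""
    return f"(model answer grounded in the sources below)\n{prompt}"

def answer_with_rag(query: str) -> str:
    sources = retrieve(query)                                             # retrieve external info
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))  # assemble context
    prompt = f"Answer with citations [n].\nSources:\n{context}\nQ: {query}"  # engineer the prompt
    return generate(prompt)                                               # generate + cite

print(answer_with_rag("What is GEO"))
```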
The technical hurdles
RAG isn’t flawless. The biggest technical challenges include:
Sifting truly relevant info from the noise
Condensing retrieved text to fit within the model’s limits
Reconciling contradictions between live data and pre-trained knowledge
Accurately citing sources
Balancing freshness vs authority
From a content creation perspective, understanding these hurdles helps you create content that’s more likely to be surfaced and cited by AI systems.
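The condensing step, for instance, often comes down to simple token budgeting: keep the highest-ranked snippets until the budget runs out. A minimal sketch, again using tiktoken as an illustrative tokeniser:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def truncate_to_budget(snippets: list[str], budget: int = 3_000) -> list[str]:
    """Keep snippets in ranked order until the token budget is spent."""
    kept, used = [], 0
    for snippet in snippets:
        cost = len(enc.encode(snippet))
        if used + cost > budget:
            break
        kept.append(snippet)
        used += cost
    return kept
```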
SEO vs GEO: what’s technically different?
This isn’t just a new flavour of SEO – it’s a different operating system. Here’s how traditional search and LLM-driven search diverge on a technical level.
Crawling and indexing
Traditional SEO: Relies on crawlers like Googlebot to read and index everything from HTML to JavaScript.
GEO: Most AI crawlers don’t fully render JavaScript. If your key content is client-side only, it might never be seen.
Keep essential information in static HTML – if it isn’t there at page load, it may as well not exist as far as the AI is concerned.
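A simple sanity check: fetch the raw HTML the way most AI crawlers do and search for your key message. The URL and phrase here are placeholders for your own page and copy:

```python
import requests

# Raw HTML only - no JavaScript is executed, mimicking most AI crawlers
html = requests.get("https://example.com/services", timeout=10).text

# If this prints False, the copy only exists after client-side rendering
print("award-winning SEO agency" in html)
```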
Structured data & schema
Traditional SEO: Schema.org markup (like JSON-LD) helps trigger rich results in the SERP.
GEO: LLMs might ignore schema but will understand well-written, semantically clear copy.
That means clear headings, ordered lists, and logical grouping aren’t just nice to have – they’re vital for LLM comprehension.
Authority signals
Traditional SEO: Backlinks, domain authority, site speed, mobile usability
GEO: Mentions, context, and relationships matter more than link juice
LLMs look for how your brand or topic fits into broader semantic ecosystems. Who are you associated with? What context surrounds your name?
Query handling
Traditional SEO: Match queries to ranked pages
GEO: Interpret conversational intent and generate answers directly
Your content isn’t showing up as a blue link anymore. It’s being paraphrased, summarised, and cited – or not.
GEO implementation strategies: what you can actually do
Here’s how to technically optimise for generative search systems.
1. Map entity relationships
Think beyond keywords. Define relationships between your brand and:
Related topics
Industry challenges
Specific tools, features or use cases
Competitive comparisons
Contextualise your offering within an ecosystem. Don’t just describe what you do – explain how it connects to your audience’s problems.
2. Engineer for citation
Make your content citation-friendly:
Include unique research or stats
Format insights clearly with proper attribution
Highlight definitional or explanatory paragraphs
Use structured but readable formats
Aim to be the source that LLMs want to quote. That’s how you show up in AI search results.
3. Prioritise preprocessing
Make it easy for AI crawlers to process your content:
Avoid JavaScript-dependent elements for key content
Use semantic HTML elements: headings, bullets, blockquotes
Create clear visual and logical content hierarchies
Maintain a healthy text-to-code ratio for technical pages
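That last point is easy to measure. Here's a rough text-to-code ratio check using BeautifulSoup – what counts as a “healthy” ratio is a judgment call, not a published standard:

```python
from bs4 import BeautifulSoup

def text_to_code_ratio(html: str) -> float:
    """Share of the raw HTML that is actually visible text."""
    visible = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    return len(visible) / max(len(html), 1)

html = "<html><body><h1>GEO guide</h1><p>Front-load key points.</p></body></html>"
print(f"{text_to_code_ratio(html):.0%}")
```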
4. Manage context windows
Structure content with LLM constraints in mind:
Front-load essential information
Build standalone sections with complete ideas
Use repetition (not redundancy) to reinforce concepts
Maintain consistent terminology for clarity
In short, say what you mean early and clearly. Don’t leave your best insights buried at the bottom.
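One reason standalone sections matter: retrieval pipelines typically split your page into chunks and judge each chunk on its own. This heading-based splitter is a simplified sketch of that behaviour, not any specific system's chunker:

```python
def chunk_by_heading(text: str) -> list[str]:
    """Treat each '## ' heading as the start of a self-contained chunk."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

page = "## Intro\nKey point first.\n## Details\nSupporting evidence."
print(chunk_by_heading(page))  # two standalone chunks, each readable alone
```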
What does Passion Digital make of all of this?
The move from SEO to GEO isn’t just technical; it’s transformational. LLMs are changing not just what gets surfaced in search but also how it’s evaluated and delivered.
If SEO is about getting found, GEO is about getting referenced.
By understanding how LLMs think, what they struggle with, and how they choose what to cite, we can create content that performs brilliantly in both traditional and generative search environments.
This isn’t a “one or the other” situation. It’s a case of expanding our toolkit. At Passion Digital, we combine performance and imagination to help brands stay visible, relevant, and ready for whatever comes next.
Because the future of search isn’t coming – it’s already here.