Five AI engineering patterns from SUGCON Europe 2026
A post-event summary of Think Fresh Digital’s SUGCON Europe 2026 session on building an AI code analysis engine.

SUGCON Europe 2026 brought the Sitecore community together in London for two days of practical sessions and technical discussion. For Think Fresh Digital, it was also an opportunity to share something we have been working on closely: a production-grade AI migration analyser for teams moving from Sitecore JSS to the Sitecore Content SDK.
Our session was called "Beyond the chatbot: five AI engineering patterns from building a Sitecore code analysis engine". The core argument was simple: the interesting part of enterprise AI is rarely the model call itself. The harder problem is everything around it: retrieval, orchestration, validation, observability, reporting and regression testing.
This article summarises the session for anyone who could not attend, and records the engineering patterns we think are worth reusing in other AI-enabled developer tools.
Why this topic matters now
Sitecore’s own Content SDK documentation describes the SDK as a major upgrade from previous JavaScript Services SDK implementations for SitecoreAI, reducing the size and complexity of starter applications and removing redundant functionality that is not needed for SitecoreAI implementations.
That was the immediate technical backdrop for the session, but it was not the whole point of the talk. We were not arguing that every automated code migration needs a custom AI platform, or that this is the only sensible way to migrate from JSS to the Content SDK. There are other valid approaches, including using today’s AI coding tools with the Sitecore documentation and the Content SDK starter kit as context.
The reason we built the analyser was to explore the problem from a different angle: what does it take to make AI-assisted code analysis repeatable, evidence-backed and usable across multiple repositories, customers and delivery contexts?
The patterns we covered (retrieval, structured outputs, orchestration, routing and evals) apply to a much wider class of AI-enabled developer tools.
Our answer to that question was not to build an agent that rewrites the application automatically. That is deliberately out of scope for this phase. Instead, we built an analyser that scans a local project, sends files to a backend, retrieves relevant documentation, produces structured per-file findings, and aggregates those findings into executive and technical reports.
In other words: analysis first, automation later.
The system we presented
At a high level, the analyser follows this flow:
- The CLI scans the local project.
- Each relevant file is sent to the backend.
- The backend embeds the file context and retrieves relevant Sitecore documentation and custom knowledge-base content.
- The model analyses the file using the retrieved context.
- The response is forced into a strict JSON schema.
- Findings are stored and aggregated into repository-level recommendations.
- The system produces an executive summary and a technical markdown report that developers can use in their IDE workflow.
The important design point is that the system is not treated as a single prompt. It is treated as a pipeline with trust boundaries. Retrieval can fail. Model output can drift. Schemas can change. Queues can back up. Reports can overstate certainty. A production-grade implementation has to expect those failure modes and design around them.

Pattern 1: RAG and smart chunking
The first pattern was retrieval augmented generation. We chose RAG because code in isolation is not enough. A model can inspect a file and make a plausible recommendation, but without the right documentation it is much harder to know whether that recommendation is current, relevant or defensible.
The goal was to bind the right evidence to the right file. A migration report that says “change this file” is less useful than a report that says “change this file because this documented migration rule applies here”.
The main engineering challenge is chunking. If documentation is split too coarsely, retrieval brings back broad pages with too much noise. If it is split too aggressively, the model loses the surrounding meaning. The same applies to code. A component, route handler or configuration file often needs to be interpreted as a unit, not as arbitrary token windows.
Our preferred approach is structure-aware chunking: documentation chunks retain headings, product/version metadata and source references; code chunks preserve useful context such as file path, framework conventions and neighbouring imports.
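As a rough illustration of what structure-aware chunking can look like in practice, the sketch below splits a documentation page on its headings and keeps product, version and source metadata on every chunk. The type and field names are illustrative, not the analyser's actual schema.

// Minimal sketch of a structure-aware documentation chunk.
// Field names are illustrative, not the analyser's real schema.
type DocumentationChunk = {
  id: string;
  text: string;               // the chunk body that gets embedded
  headingPath: string[];      // e.g. ["Migration guide", "Rendering changes"]
  product: string;            // e.g. "content-sdk"
  version: string;            // e.g. "1.x"
  sourceUrl: string;          // citation target for the final report
  sequence: number;           // position within the source page, used for sibling expansion
};

// Split a page on its headings rather than on fixed token windows,
// so each chunk keeps the context of the section it came from.
function chunkByHeadings(page: { url: string; product: string; version: string; markdown: string }): DocumentationChunk[] {
  const sections = page.markdown.split(/\n(?=#{1,3} )/);
  return sections.map((section, index) => {
    const heading = section.split("\n")[0].replace(/^#+\s*/, "");
    return {
      id: `${page.url}#${index}`,
      text: section.trim(),
      headingPath: [heading],
      product: page.product,
      version: page.version,
      sourceUrl: page.url,
      sequence: index
    };
  });
}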
This still has failure modes. The retriever can return the wrong Sitecore version, a weakly related documentation chunk, or only part of the relevant context. That is why retrieval quality has to be observable. We use metadata filters, source ranking, reranking, confidence scoring and fallback behaviour so that weak evidence is surfaced rather than hidden.
The alternatives are valid in different circumstances. Fine-tuning can help with repeated patterns and tone, but it is less attractive when documentation freshness and citation traceability matter.
async function retrieveMigrationContext(request) {
  // 1. Embed the file/query so we can search by meaning,
  //    not only by exact keywords.
  const queryVector = await embeddingModel.embed(request.queryText);

  // 2. Filter retrieval to the current migration scenario.
  //    This reduces the risk of pulling guidance for the wrong
  //    product, version or file type.
  const filters = {
    product: request.product,
    fromVersion: request.fromVersion,
    toVersion: request.toVersion,
    scope: request.fileName ? ["global", `file:${request.fileName}`] : ["global"]
  };

  // 3. Search the migration knowledge base for relevant chunks.
  //    These chunks come from official docs and curated migration notes.
  const rawResults = await vectorSearch.search({
    vector: queryVector,
    filters,
    nearestNeighbours: 5
  });

  // 4. Remove weak matches and duplicate chunks.
  //    Weak retrieval is safer to expose than hide.
  const primaryResults = deduplicateByChunk(rawResults)
    .filter(chunk => chunk.score >= 0.6)
    .sort((a, b) => b.score - a.score)
    .slice(0, 10);

  // 5. Fetch neighbouring chunks from the same source.
  //    This protects examples or explanations that were split
  //    across chunk boundaries.
  const expandedResults = await includeSiblingChunks(primaryResults, {
    before: 1,
    after: 1
  });

  // 6. Keep the final context small and relevant before sending
  //    it into the analysis prompt.
  return expandedResults
    .sort((a, b) => b.score - a.score)
    .slice(0, 15);
}

Pattern 2: Structured generation
The second pattern was structured generation. For this type of tool, prose alone is not enough. The backend needs predictable fields: file path, finding type, severity, confidence, evidence, suggested change, estimated effort and citations.
That means the model output is treated as a contract. The model can generate the content, but the application decides whether that content is valid.
In our implementation, LangChain gives us a useful abstraction for structured output. Where the underlying model supports native structured output, we can lean on that capability. Where it does not, LangChain can use tool-calling style strategies to request a structured object that conforms to a schema.
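As a simplified sketch of that abstraction, a LangChain chat model can be bound to a Zod schema with withStructuredOutput. The model name and import path here are illustrative, and MigrationFindingSchema refers to the schema shown a little further down.

import { ChatOpenAI } from "@langchain/openai";
import { MigrationFindingSchema } from "./migration-finding-schema"; // hypothetical path

// Sketch only: bind the chat model to the finding schema. Where the provider
// supports native structured output LangChain uses it; otherwise it falls
// back to a tool-calling style strategy to request a conforming object.
const model = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
const structuredModel = model.withStructuredOutput(MigrationFindingSchema);

export async function analyseFileWithSchema(analysisPrompt: string) {
  // The result is already shaped by the schema, but we still treat it as
  // untrusted until the post-response validation layer has accepted it.
  return structuredModel.invoke(analysisPrompt);
}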
That does not remove the need for validation. Some models still return malformed JSON, wrap output in markdown code fences, omit required fields or drift away from the agreed shape. For those cases, we keep a post-response validation layer: strip unsafe wrappers, parse the response, validate against the schema, and only then allow the finding into the rest of the pipeline.
import { z } from "zod";

export const MigrationFindingSchema = z.object({
  filePath: z.string(),
  findingType: z.enum([
    "deprecated-package",
    "configuration-change",
    "rendering-pattern",
    "tracking-or-personalisation",
    "manual-review"
  ]),
  severity: z.enum(["low", "medium", "high"]),
  confidence: z.number().min(0).max(1),
  summary: z.string(),
  evidence: z.array(z.object({
    sourceTitle: z.string(),
    sourceUrl: z.string().url(),
    quoteOrReference: z.string()
  })),
  suggestedAction: z.string(),
  estimatedEffort: z.enum(["small", "medium", "large", "unknown"]),
  requiresHumanReview: z.boolean()
});

The practical lesson is that structured generation is a safety boundary. If a recommendation has no evidence, if the severity is outside the allowed range, or if the model invents a field the application does not understand, the system should fail safely rather than quietly producing a polished but unreliable report.
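The post-response validation layer mentioned above can be as simple as the sketch below: strip wrappers, parse, validate against the schema, and reject anything that does not conform. This is a simplified illustration rather than the production implementation.

import { MigrationFindingSchema } from "./migration-finding-schema"; // hypothetical path

export function parseModelFinding(rawResponse: string) {
  // 1. Strip markdown code fences and surrounding whitespace the model
  //    may have wrapped around the JSON.
  const cleaned = rawResponse.replace(/```(?:json)?/g, "").trim();

  // 2. Parse. A malformed response is rejected, not guessed at.
  let candidate: unknown;
  try {
    candidate = JSON.parse(cleaned);
  } catch {
    return { ok: false as const, reason: "invalid-json" };
  }

  // 3. Validate against the schema before the finding enters the pipeline.
  const result = MigrationFindingSchema.safeParse(candidate);
  if (!result.success) {
    return { ok: false as const, reason: "schema-violation", issues: result.error.issues };
  }

  return { ok: true as const, finding: result.data };
}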
Pattern 3: Async orchestration
The third pattern was async orchestration. A synchronous HTTP request might work for a demo repo. It does not hold up well when the repository contains hundreds or thousands of files, each with its own retrieval and model invocation step.
Moving file analysis into a queue-based workflow gives the system backpressure, retry behaviour and operational visibility. It also allows the CLI to submit work, receive a job ID and poll for status, rather than holding open a long-running request.
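On the client side, the pattern is deliberately simple: submit the work, keep the job ID and poll. The endpoints, payloads and timings below are illustrative rather than the analyser's actual API.

type JobStatus = {
  state: "queued" | "running" | "completed" | "failed";
  reportUrl?: string;
};

async function submitAndWait(apiBaseUrl: string, files: { path: string; content: string }[]) {
  // Submit the job and get back an identifier instead of holding a long-running request open.
  const submitResponse = await fetch(`${apiBaseUrl}/jobs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ files })
  });
  const { jobId } = (await submitResponse.json()) as { jobId: string };

  // Poll with a fixed delay; a real client would add a timeout and backoff.
  while (true) {
    const statusResponse = await fetch(`${apiBaseUrl}/jobs/${jobId}`);
    const status = (await statusResponse.json()) as JobStatus;
    if (status.state === "completed" || status.state === "failed") {
      return status;
    }
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}

On the backend, each queued file message is then handled by a worker along the lines of the handler below.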
type AnalyseFileMessage = {
  jobId: string;
  correlationId: string;
  filePath: string;
  contentHash: string;
  repositorySnapshotId: string;
  attempt: number;
};

async function handleAnalyseFile(message: AnalyseFileMessage) {
  // Idempotency guard: if this file, at this content hash, has already been
  // analysed for this job, reuse the stored finding rather than paying for
  // another retrieval and model call.
  const existing = await findingsStore.findByHash(
    message.jobId,
    message.filePath,
    message.contentHash
  );
  if (existing) {
    return existing;
  }

  // Retrieve documentation context, analyse the file, and validate the
  // model output against the schema before anything is persisted.
  const retrievalContext = await retriever.getRelevantContext(message);
  const rawFinding = await modelRouter.analyseFile(message, retrievalContext);
  const finding = MigrationFindingSchema.parse(rawFinding);

  await findingsStore.save({
    ...finding,
    jobId: message.jobId,
    correlationId: message.correlationId
  });
}

Queues bring their own responsibilities. Duplicate messages, partial completion, poison messages and premature finalisation all have to be handled deliberately. That is why the workflow uses idempotency keys, content hashes, correlation IDs, retry limits and dead-letter handling.
We also looked at alternatives to Azure Storage Queues. Azure Durable Functions is particularly relevant for this kind of workload because it supports a native fan-out/fan-in orchestration pattern.
In this context, fan-out/fan-in means taking one parent job, splitting it into many parallel child tasks, waiting for those child tasks to finish, and then running the next step once. For a code analyser, that maps naturally to “analyse every file independently, then aggregate the results into a final report”.
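For illustration, an orchestrator for this shape of workload could look roughly like the sketch below. The activity names are hypothetical, and the exact registration syntax depends on the durable-functions package version and Functions programming model in use.

import * as df from "durable-functions";

// Sketch of fan-out/fan-in: one parent orchestration, many parallel
// per-file activities, then a single aggregation step.
const analyseRepository = df.orchestrator(function* (context) {
  const files: string[] = yield context.df.callActivity("GetFileList", context.df.getInput());

  // Fan out: schedule one analysis activity per file.
  const tasks = files.map(file => context.df.callActivity("AnalyseFile", file));
  const findings = yield context.df.Task.all(tasks);

  // Fan in: aggregate once every per-file analysis has completed.
  return yield context.df.callActivity("AggregateFindings", findings);
});

export default analyseRepository;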
That is architecturally cleaner than asking the CLI to decide when the backend is ready to finalise a job. It also makes the job lifecycle more explicitly owned by the backend. The trade-off is that Durable Functions introduces a different programming model and a more opinionated orchestration runtime. For our current implementation, Storage Queues were the simpler path. For a future version, Durable Functions is a strong candidate for simplifying finalisation and making fan-out/fan-in behaviour first-class.
Parallel HTTP from the CLI is another possible approach. It is simple at first, but it pushes retry logic, lifecycle coordination and finalisation responsibility into the client. That was not the direction we wanted for a repeatable tool.
Pattern 4: Model router
The fourth pattern was the model router. Model choice changes faster than workflow logic. Context windows, pricing, latency, structured-output support and quality vary across vendors and versions. Hard-coding a single model into every analysis path makes the system harder to change safely.
On Azure, the model router is deployed like another model deployment. Once deployed, it can be configured to optimise routing for quality, cost or a balanced profile. The application calls the deployment; the routing decision sits behind that endpoint rather than being scattered through application code.
That separation matters. We want provider and model choice to be policy, not business logic. The analyser should not need to know that one provider is preferred for a short configuration file while another is better for a complex rendering file with multiple dependencies.
This is especially relevant when analysing a codebase because each file is fundamentally different. A small config file, a package manifest, a rendering component and a custom integration file do not necessarily need the same model. Smaller, more efficient models may be sufficient for simple files and can reduce latency because there is less model capacity involved. More capable models can be reserved for files with higher complexity, more ambiguity or a greater risk of expensive mistakes.
The risk is that different models can produce different recommendations for the same input, or that a cost-optimised route silently reduces analysis quality. That is why routing needs to be paired with model/version logging and automated evals. A router gives flexibility, but it does not remove accountability for output quality.
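Calling the router looks the same as calling any other Azure OpenAI deployment; the sketch below is indicative only (the deployment name and API version are placeholders), and it shows where the model logging mentioned above would hook in.

import { AzureOpenAI } from "openai";

// The application targets the router deployment, not an individual model.
// Deployment name and API version here are placeholders.
const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  apiKey: process.env.AZURE_OPENAI_API_KEY!,
  apiVersion: "2024-10-21"
});

export async function analyseWithRouter(systemPrompt: string, fileContext: string) {
  const completion = await client.chat.completions.create({
    model: "model-router",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: fileContext }
    ]
  });

  // Log the model the response reports, so routing decisions can be
  // traced when evals flag a change in output quality.
  console.info("analysis served by model:", completion.model);

  return completion.choices[0].message.content;
}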
Pattern 5: Automated evals
The final pattern was automated evaluation. We described this as “unit tests for AI”, with one important caveat: they are not unit tests in the deterministic software sense. They are regression checks that help detect whether prompt, retrieval, model or schema changes have made the system worse.
The foundation is a golden set: a collection of representative inputs with known expected outcomes. For this analyser, that means sample files, expected findings, expected citation behaviour, known edge cases and examples where the correct answer is to downgrade confidence or request human review.
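The shape of a golden-set entry does not need to be complicated. The sketch below is illustrative; the point is that each entry pairs a real input with the outcome an expert reviewer has already agreed is correct.

// Illustrative golden-set entry; field names are not the analyser's real schema.
type GoldenSetEntry = {
  id: string;
  description: string;
  inputFilePath: string;          // sample file fed into the analyser
  expectedFindingTypes: string[]; // e.g. ["deprecated-package", "rendering-pattern"]
  expectedCitations: string[];    // documentation sources the findings should reference
  expectHumanReview: boolean;     // true where the correct answer is to escalate
  groundTruth: string;            // prose description used by the correctness evaluator
};

export const goldenSet: GoldenSetEntry[] = [
  {
    id: "jss-config-file",
    description: "JSS configuration file that should surface configuration-change findings",
    inputFilePath: "samples/sitecore-config-example.js",
    expectedFindingTypes: ["configuration-change"],
    expectedCitations: ["https://doc.sitecore.com/..."],
    expectHumanReview: false,
    groundTruth: "The analysis should identify the configuration entries that change when moving to the Content SDK setup."
  }
];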
Azure AI Foundry provides general-purpose evaluators for LLM quality, safety and reliability. Those are useful starting points, particularly when experimenting in the portal and deciding what “good” looks like. But generic evaluators cannot fully understand whether a Sitecore migration recommendation is correct for a specific file. For domain correctness, we still need custom evaluators that measure whether the recommendation matches the expected migration behaviour.
Our preferred workflow is to start in the Azure portal because it gives fast feedback while the team is still shaping the evaluation approach. Once the golden set and evaluator definitions are useful, the next step is to move them into CI/CD so prompt changes, retrieval changes and model changes can be checked before they reach users.
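Once the evaluators live in CI/CD, the gate itself can stay small: run the golden set through the analyser and the correctness evaluator, and fail the build when any score falls below an agreed threshold. The helpers below (runAnalyser, runCorrectnessEvaluator) are hypothetical stand-ins for the real invocations.

import { goldenSet } from "./golden-set";                               // the sketch above
import { runAnalyser, runCorrectnessEvaluator } from "./eval-helpers";  // hypothetical helpers

const MINIMUM_SCORE = 4; // agreed threshold on the evaluator's 1-5 scale

async function runRegressionGate() {
  const failures: string[] = [];

  for (const entry of goldenSet) {
    const findings = await runAnalyser(entry.inputFilePath);
    const { score, reason } = await runCorrectnessEvaluator({
      response: JSON.stringify(findings),
      ground_truth: entry.groundTruth
    });

    if (score < MINIMUM_SCORE) {
      failures.push(`${entry.id}: score ${score} (${reason})`);
    }
  }

  if (failures.length > 0) {
    console.error("Migration eval regressions detected:\n" + failures.join("\n"));
    process.exit(1);
  }
}

runRegressionGate();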
The goal is not to prove perfection. It is to detect drift before users do. If a newer model starts omitting citations, if a prompt change makes recommendations less specific, or if a retrieval change starts pulling the wrong documentation version, the evals should make that visible.
Human review still matters, especially for high-risk migration advice. Automated evals do not replace expert judgement; they protect the system from quiet regressions between expert reviews.
This evaluator checks domain correctness rather than generic language quality. It compares the analyser’s JSON findings against a known ground truth answer and scores whether the expected migration changes were identified, whether the recommendations are accurate, and whether the advice is specific enough to be useful.
This is the kind of custom evaluator needed when general-purpose LLM quality checks are not enough. A generic evaluator can tell us whether an answer is coherent; this evaluator checks whether the migration advice is actually correct for the file being analysed.
---
name: Migration Correctness Evaluator
description: Evaluates whether the analyser correctly identified required migration changes
model:
  api: chat
  parameters:
    temperature: 0
inputs:
  response:
    type: string
  ground_truth:
    type: string
outputs:
  score:
    type: int
---
system:
You are an expert evaluator for Sitecore JSS to Content SDK migration analysis.
You will receive:
1. A migration analysis response (JSON with findings)
2. A ground truth description of what the analysis should contain
Score the response from 1 to 5:
- 5: All expected migration findings are present, no incorrect recommendations, advice is actionable and specific
- 4: Most expected findings present (>=80%), no incorrect recommendations
- 3: At least half of expected findings present, no incorrect recommendations
- 2: Some expected findings present but significant gaps, or contains incorrect recommendations
- 1: Most expected findings missing, or multiple incorrect recommendations
Evaluate by comparing the response findings against the ground truth description:
1. Does the response identify the key migration changes described in the ground truth?
2. Are the recommendations consistent with what the ground truth expects?
3. Does the response include any recommendations that contradict the ground truth?
4. Are the recommendations specific and actionable (not vague)?
Return ONLY a JSON object: {"score": <1-5>, "reason": "<brief explanation>"}
user:
Response: {{response}}
Ground Truth: {{ground_truth}}
What we wanted people to take away
The talk was not about claiming that AI can replace migration expertise. It was about showing how AI can be placed inside an engineered workflow that makes expert review more repeatable.
The strongest pattern across the session was this: enterprise AI is increasingly orchestration. Calling a model is the smallest part of the system. The reliability comes from the surrounding controls: retrieval quality, source versioning, schema contracts, idempotent processing, observability, fallback behaviour and regression testing.
Our three main takeaways were:
- Calling models is easy; coordinating work around them is harder. The useful engineering work happens in the pipeline, not just the prompt.
- These systems are probabilistic, so design the boundaries as if failure is normal. Validate outputs, surface uncertainty and avoid pretending the model is deterministic.
- The goal is repeatable behaviour within acceptable boundaries. That means measurable quality gates, traceable recommendations and clear escalation to human review.
What this means for Sitecore teams
For teams still running JSS implementations, the practical message is not that they must build a system like this before they can migrate. Today’s AI coding tools are capable enough that, for a one-off migration, a team may reasonably point a tool such as Claude Code at the Sitecore documentation and the Content SDK starter kit, run the work in a branch, and put the result through human review.
In one internal test, we used the analyser to produce a combined migration artefact: the general migration guidance, plus the file-by-file findings from the target repository. We then gave that artefact to Claude Code. For a reasonably standard implementation with a set of components, the analysis took a few minutes and Claude Code completed the migration changes for review in 22 minutes. That is not a universal benchmark, but it is a useful example of how analysis and code-generation workflows can work together.
If you have one small repository and a capable engineering team, you may not need a repeatable analysis engine. There are many valid ways to approach the migration.
What we built is different: a repeatable tool that can analyse codebases across different levels of complexity, produce consistent evidence-backed findings, and do that over and over again across multiple customers and clients.
That was the main point of our SUGCON session. We started with the Sitecore Content SDK migration problem, approached it from a different angle, and used it to explore five engineering patterns that apply far beyond Sitecore codebases.
The result is a set of practical patterns for making AI-assisted engineering work more repeatable, auditable and safe enough to use.

SUGCON Europe 2026 session FAQs
What did the session cover?
The session covered five AI engineering patterns learned while building a Sitecore Content SDK migration analyser: RAG and smart chunking, structured generation, async orchestration, model routing and automated evals.

Is the analyser publicly available?
Yes. The CLI part of the analyser is available in a public GitHub repository at think-fresh-digital/content-sdk-migration-cli. The backend is not public yet, but we are working on making more of the implementation available. If you want to try the system, get in touch for an API key using the contact form or email connect@thinkfresh.digital.

Does the analyser make the code changes automatically?
No. The current version focuses on analysis, documentation-backed findings and reporting. It does not directly apply code changes itself. However, one useful output is a larger migration artefact that combines the general migration guidance with the file-by-file analysis. That artefact can then be given to an LLM coding tool as context for a branch-based migration, ready for human review.

Why does the analyser use RAG?
RAG allows the analyser to retrieve relevant Sitecore documentation and custom knowledge-base content at analysis time. Code alone is not enough: the system needs the right documentation context so it can bind the right evidence to the right file and make recommendations easier to review.

Why are findings forced into strict JSON?
Strict JSON makes findings easier to validate, aggregate and turn into reports. In practice, this still needs post-response validation because not every model supports native structured output equally well. The system should parse, validate and safely reject or repair output before it reaches the reporting layer.

What are automated evals?
Automated evals are regression checks for AI behaviour. In this context, they use golden sets of representative inputs with known expected outcomes to test schema validity, citation behaviour, recommendation correctness and whether the system surfaces uncertainty when evidence is weak.
