From Declarative Agent to Source-Grounded Legal Copilot

When teams first introduce a Microsoft 365 declarative agent into an existing system, the goal is usually straightforward: expose backend capabilities through a Copilot-friendly interface.

That’s useful—but not yet transformative.

The real shift happens when the agent stops being just a thin wrapper over an API and starts driving interaction based on grounded evidence. Instead of returning only an answer, the system begins suggesting the next valid step—and does so deterministically, based on real legal sources.

This article walks through that transition from an engineering perspective.


The problem with “one-shot” Copilot interactions

A typical declarative agent interaction looks like this:

  • The user asks a question
  • The backend returns an answer
  • Sources are displayed
  • The conversation stops

At that point, the burden shifts back to the user: What should I ask next?

In domains like legal or policy workflows, this is a serious limitation. The system already has the context, the sources, and the structure—but the interaction model doesn’t leverage it.

The challenge becomes:

How do we guide the user forward without letting the model improvise or hallucinate the next step?


Design goals

The solution was built around a few strict constraints:

  • Only generate follow-up prompts when they can be grounded in real legal sources
  • Keep the API contract stable (no Copilot-specific hacks)
  • Make prompts directly actionable in the UI
  • Avoid redundant or repeated suggestions

This leads to a key principle:

The system should not invent the next step. It should derive it from evidence.


Architecture overview

The implementation builds on top of an existing “rich answer” pipeline.

High-level flow:

  1. Backend returns an answer (text + sources)
  2. A parser extracts structured data from the response
  3. Source metadata is analyzed
  4. A prompt builder evaluates whether grounded suggestions can be generated
  5. If conditions are met → prompts are emitted
  6. UI renders them as clickable actions

This keeps the system data-first:

  • Backend = logic + structure
  • UI = rendering only

No hidden heuristics in the frontend.


Stable API contract

Instead of introducing a new response shape, the system extends the existing one:

public sealed record CopilotSuggestedPrompt(string Text, string Kind);

public sealed record CopilotRichAnswerResponse(
    string Answer,
    CopilotSectionReference? SectionReference,
    IReadOnlyList<CopilotSource> Sources,
    IReadOnlyList<CopilotSuggestedPrompt> SuggestedPrompts);
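The two supporting types aren't shown in the post; a plausible minimal shape (the field names here are assumptions, not the actual contract) would be:

```csharp
// Hypothetical shapes for the supporting records; the real fields may differ.
public sealed record CopilotSource(string Title, string Url, string Lang);
public sealed record CopilotSectionReference(string Celex, string? Article);
```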

This is important for a few reasons:

  • Works for Copilot, plugins, and tests alike
  • Keeps responsibilities clean
  • Enables testability without UI

The endpoint remains standard and connector-friendly, while simply returning richer structured data.
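As a sketch, the connector-facing side can stay as small as a single minimal-API mapping. The route, request type, and service interface below are illustrative assumptions, not the actual implementation:

```csharp
// Illustrative minimal-API wiring; RichAnswerRequest and IRichAnswerService
// are hypothetical names standing in for the real contract.
app.MapPost("/api/copilot/rich-answer",
    async (RichAnswerRequest request, IRichAnswerService service, CancellationToken ct) =>
        Results.Ok(await service.GetRichAnswerAsync(request.Question, ct)));
```

The point is that nothing Copilot-specific leaks into the route: the connector sees an ordinary POST returning richer structured data.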


Parse once, reuse everywhere

A subtle but important design decision:

Parse the answer once—and use that structured result everywhere.

var parsed = ParseSourcesWithMeta(sourcesBlock);
var sources = parsed.Select(p => p.Source).ToList();
var sectionReference = BuildSectionReferenceFromSources(parsed);
var suggestedPrompts = await CopilotSuggestedPromptBuilder.BuildAsync(
    answer,
    originalQuestion,
    sectionReference?.Celex,
    parsed.Select(p => p.Lang).ToList(),
    getExtractionResultAsync,
    ct);

This ensures:

  • No duplicate parsing logic
  • No UI-specific interpretation layer
  • Full consistency between answer, sources, and suggestions
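The shape of each parsed entry isn't shown in the post, but its usage above (`.Source`, `.Lang`, and the CELEX lookup) suggests something like the following record. This is purely an inferred sketch:

```csharp
// Hypothetical per-source parse result, inferred from how it is consumed above.
public sealed record ParsedSource(
    CopilotSource Source, // rendered in the Sources list
    string Lang,          // "EN", "SL", ... drives prompt language selection
    string? Celex);       // EUR-Lex identifier, when the source carries one
```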

Grounding follow-up prompts in legal evidence

Here’s the core innovation.

Instead of asking an LLM:

“What’s a good next question?”

The system asks:

  • Is this answer tied to a known regulation?
  • Does it include a defined legal term?
  • Can that term be traced back to a source?
  • Is the user already asking about it?

Only if all checks pass → generate a prompt.

if (!IsRegulation(celex))
    return Array.Empty<CopilotSuggestedPrompt>();

var extraction = await getExtractionResultAsync(lang, ct);
if (!extraction.Success || extraction.Terms.Count == 0)
    return Array.Empty<CopilotSuggestedPrompt>();

var matched = FindMatchedTerms(answerText, extraction.Terms);
if (matched.Count == 0)
    return Array.Empty<CopilotSuggestedPrompt>();
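FindMatchedTerms itself isn't shown; a minimal sketch, assuming a term only counts as matched on a literal, case-insensitive occurrence of its canonical form in the answer:

```csharp
// Hypothetical matcher; DefinedTerm is assumed to expose a Canonical string.
public sealed record DefinedTerm(string Canonical);

private static List<DefinedTerm> FindMatchedTerms(
    string answerText, IReadOnlyList<DefinedTerm> terms) =>
    terms
        .Where(t => answerText.Contains(t.Canonical, StringComparison.OrdinalIgnoreCase))
        .ToList();
```

Deliberately dumb on purpose: no fuzzy matching means no false grounding.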

Generated prompt example:

var text = lang == "SL"
    ? $"Kaj pomeni \"{canonical}\" po členu 3 Uredbe (ES) št. 1333/2008?"
    : $"What does \"{canonical}\" mean under Article 3 of Regulation (EC) No 1333/2008?";

This is not a suggestion.

It is a validated next step.


Extracting legal terms from the source

Instead of maintaining a static list of definitions, the system extracts them directly from EUR-Lex HTML.

private const string ConsolidatedExternalId = "02008R1333-20250731";

var html = await loadHtmlAsync(lang, ct);
if (string.IsNullOrWhiteSpace(html))
{
    return new ExtractionResult(
        Success: false,
        FailureReason: "source_file_not_found",
        Terms: Array.Empty<DefinedTerm>());
}

Extraction is intentionally strict and regex-based:

private static readonly Regex EnDefinitionRegex = new(
    "^\\s*[\"“](?<term>[^\"”]{3,180})[\"”]\\s+(?:means|shall\\s+mean)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);
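Applied line-by-line over the definitions article, the regex yields the term list. A sketch of that loop (the method name and the `DefinedTerm` record are assumed, not taken from the post):

```csharp
// Sketch: run the definition regex over each line of the definitions article.
// Reuses EnDefinitionRegex from above; DefinedTerm is a hypothetical record
// carrying the canonical term text.
private static IReadOnlyList<DefinedTerm> ExtractDefinedTerms(string articleText) =>
    articleText
        .Split('\n')
        .Select(line => EnDefinitionRegex.Match(line))
        .Where(m => m.Success)
        .Select(m => new DefinedTerm(m.Groups["term"].Value.Trim()))
        .ToList();
```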

Why this matters:

  • No manual drift from legal sources
  • Transparent failure modes
  • Fully deterministic behavior

UI: rendering instead of reasoning

The frontend (Adaptive Card) simply renders what it receives:

{
  "type": "Action.Submit",
  "title": "${suggestedPrompts[0].text}",
  "data": {
    "msteams": {
      "type": "imBack",
      "value": "${suggestedPrompts[0].text}"
    }
  }
}

This is critical:

The UI does not generate logic. It only exposes it.


Why this works better than generic suggestions

Most AI-driven “next prompt” systems fail in regulated domains because they:

  • Suggest plausible but unsupported questions
  • Repeat user intent
  • Drift into unrelated legal areas
  • Change language unexpectedly

This implementation avoids that by enforcing strict conditions:

A prompt exists only if:

  • The source corpus is known
  • The concept exists in the answer
  • The concept is extracted from source text
  • The language is deterministic
  • The prompt is not redundant

That makes it production-safe.
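The redundancy condition is the only one of the five with no snippet earlier in the post. A plausible guard, assuming it compares the user's original question against the candidate term:

```csharp
// Hypothetical redundancy guard: skip the prompt when the user's question
// already asks what the term means (English or Slovenian phrasing).
private static bool IsRedundant(string originalQuestion, string canonicalTerm) =>
    originalQuestion.Contains(canonicalTerm, StringComparison.OrdinalIgnoreCase)
    && (originalQuestion.Contains("mean", StringComparison.OrdinalIgnoreCase)
        || originalQuestion.Contains("pomeni", StringComparison.OrdinalIgnoreCase));
```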


Tests matter more than the feature

The behavior is locked with unit tests:

rich.SuggestedPrompts.Should().ContainSingle(p =>
    p.Kind == "defined_term" &&
    p.Text == "Kaj pomeni \"živilo z zmanjšano energijsko vrednostjo\" po členu 3 Uredbe (ES) št. 1333/2008?");

And equally important:

  • If the user already asks for a definition → no prompt

This avoids the most common UX failure: repetition.
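That negative case can be locked down the same way. A hypothetical test, reusing the BuildAsync signature from the snippets above (the surrounding variables are assumed fixtures):

```csharp
// Hypothetical negative test: a question that already asks for the definition
// must not get the same definition suggested back.
var prompts = await CopilotSuggestedPromptBuilder.BuildAsync(
    answer,
    "What does \"food additive\" mean under Article 3 of Regulation (EC) No 1333/2008?",
    celex,
    langs,
    getExtractionResultAsync,
    ct);

prompts.Should().BeEmpty();
```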


What actually changed

From a user perspective:

  • Answers still look the same
  • Sources are still visible
  • But now → next steps are clickable

From an engineering perspective:

  • Backend owns interaction logic
  • UI stays simple
  • Legal sources drive the flow
  • The system becomes guided, not just reactive

Key takeaways

  • Declarative agents become powerful when they return interaction-ready data, not just text
  • In regulated domains, prompts must be source-grounded, not generated freely
  • Stable API contracts enable safe evolution
  • UI should render decisions, not make them

Final thought

The biggest improvement didn’t come from making the system more generative.

It came from making it more constrained.

The strongest Copilot experiences are built by limiting model freedom and maximizing the evidence path.

That’s all folks!

Cheers!
Gašper Rupnik
