From Declarative Agent to Source-Grounded Legal Copilot

When teams first introduce a Microsoft 365 declarative agent into an existing system, the goal is usually straightforward: expose backend capabilities through a Copilot-friendly interface.

That’s useful—but not yet transformative.

The real shift happens when the agent stops being just a thin wrapper over an API and starts driving interaction based on grounded evidence. Instead of returning only an answer, the system begins suggesting the next valid step—and does so deterministically, based on real legal sources.

This article walks through that transition from an engineering perspective.


The problem with “one-shot” Copilot interactions

A typical declarative agent interaction looks like this:

  • The user asks a question
  • The backend returns an answer
  • Sources are displayed
  • The conversation stops

At that point, the burden shifts back to the user: What should I ask next?

In domains like legal or policy workflows, this is a serious limitation. The system already has the context, the sources, and the structure—but the interaction model doesn’t leverage it.

The challenge becomes:

How do we guide the user forward without letting the model improvise or hallucinate the next step?


Design goals

The solution was built around a few strict constraints:

  • Only generate follow-up prompts when they can be grounded in real legal sources
  • Keep the API contract stable (no Copilot-specific hacks)
  • Make prompts directly actionable in the UI
  • Avoid redundant or repeated suggestions

This leads to a key principle:

The system should not invent the next step. It should derive it from evidence.


Architecture overview

The implementation builds on top of an existing “rich answer” pipeline.

High-level flow:

  1. Backend returns an answer (text + sources)
  2. A parser extracts structured data from the response
  3. Source metadata is analyzed
  4. A prompt builder evaluates whether grounded suggestions can be generated
  5. If conditions are met → prompts are emitted
  6. UI renders them as clickable actions

This keeps the system data-first:

  • Backend = logic + structure
  • UI = rendering only

No hidden heuristics in the frontend.


Stable API contract

Instead of introducing a new response shape, the system extends the existing one:

public sealed record CopilotSuggestedPrompt(string Text, string Kind);

public sealed record CopilotRichAnswerResponse(
    string Answer,
    CopilotSectionReference? SectionReference,
    IReadOnlyList<CopilotSource> Sources,
    IReadOnlyList<CopilotSuggestedPrompt> SuggestedPrompts);
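The two supporting types aren't shown in the post; a plausible minimal shape (the field names here are assumptions, not the actual contract) would be:

```csharp
// Hypothetical shapes for the supporting records; the real fields may differ.
public sealed record CopilotSource(string Title, string Url, string Lang);
public sealed record CopilotSectionReference(string Celex, string? Article);
```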

This is important for a few reasons:

  • Works for Copilot, plugins, and tests alike
  • Keeps responsibilities clean
  • Enables testability without UI

The endpoint remains standard and connector-friendly, while simply returning richer structured data.
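As a sketch, the connector-facing side can stay as small as a single minimal-API mapping. The route, request type, and service interface below are illustrative assumptions, not the actual implementation:

```csharp
// Illustrative minimal-API wiring; RichAnswerRequest and IRichAnswerService
// are hypothetical names standing in for the real contract.
app.MapPost("/api/copilot/rich-answer",
    async (RichAnswerRequest request, IRichAnswerService service, CancellationToken ct) =>
        Results.Ok(await service.GetRichAnswerAsync(request.Question, ct)));
```

The point is that nothing Copilot-specific leaks into the route: the connector sees an ordinary POST returning richer structured data.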


Parse once, reuse everywhere

A subtle but important design decision:

Parse the answer once—and use that structured result everywhere.

var parsed = ParseSourcesWithMeta(sourcesBlock);
var sources = parsed.Select(p => p.Source).ToList();
var sectionReference = BuildSectionReferenceFromSources(parsed);
var suggestedPrompts = await CopilotSuggestedPromptBuilder.BuildAsync(
    answer,
    originalQuestion,
    sectionReference?.Celex,
    parsed.Select(p => p.Lang).ToList(),
    getExtractionResultAsync,
    ct);

This ensures:

  • No duplicate parsing logic
  • No UI-specific interpretation layer
  • Full consistency between answer, sources, and suggestions
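The shape of each parsed entry isn't shown in the post, but its usage above (`.Source`, `.Lang`, and the CELEX lookup) suggests something like the following record. This is purely an inferred sketch:

```csharp
// Hypothetical per-source parse result, inferred from how it is consumed above.
public sealed record ParsedSource(
    CopilotSource Source, // rendered in the Sources list
    string Lang,          // "EN", "SL", ... drives prompt language selection
    string? Celex);       // EUR-Lex identifier, when the source carries one
```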

Grounding follow-up prompts in legal evidence

Here’s the core innovation.

Instead of asking an LLM:

“What’s a good next question?”

The system asks:

  • Is this answer tied to a known regulation?
  • Does it include a defined legal term?
  • Can that term be traced back to a source?
  • Is the user already asking about it?

Only if all checks pass → generate a prompt.

if (!IsRegulation(celex))
    return Array.Empty<CopilotSuggestedPrompt>();

var extraction = await getExtractionResultAsync(lang, ct);
if (!extraction.Success || extraction.Terms.Count == 0)
    return Array.Empty<CopilotSuggestedPrompt>();

var matched = FindMatchedTerms(answerText, extraction.Terms);
if (matched.Count == 0)
    return Array.Empty<CopilotSuggestedPrompt>();
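FindMatchedTerms itself isn't shown; a minimal sketch, assuming a term only counts as matched on a literal, case-insensitive occurrence of its canonical form in the answer:

```csharp
// Hypothetical matcher; DefinedTerm is assumed to expose a Canonical string.
public sealed record DefinedTerm(string Canonical);

private static List<DefinedTerm> FindMatchedTerms(
    string answerText, IReadOnlyList<DefinedTerm> terms) =>
    terms
        .Where(t => answerText.Contains(t.Canonical, StringComparison.OrdinalIgnoreCase))
        .ToList();
```

Deliberately dumb on purpose: no fuzzy matching means no false grounding.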

Generated prompt example:

var text = lang == "SL"
    ? $"Kaj pomeni \"{canonical}\" po členu 3 Uredbe (ES) št. 1333/2008?"
    : $"What does \"{canonical}\" mean under Article 3 of Regulation (EC) No 1333/2008?";

This is not a suggestion.

It is a validated next step.


Extracting legal terms from the source

Instead of maintaining a static list of definitions, the system extracts them directly from EUR-Lex HTML.

private const string ConsolidatedExternalId = "02008R1333-20250731";

var html = await loadHtmlAsync(lang, ct);
if (string.IsNullOrWhiteSpace(html))
{
    return new ExtractionResult(
        Success: false,
        FailureReason: "source_file_not_found",
        Terms: Array.Empty<DefinedTerm>());
}

Extraction is intentionally strict and regex-based:

private static readonly Regex EnDefinitionRegex = new(
    "^\\s*[\"“](?<term>[^\"”]{3,180})[\"”]\\s+(?:means|shall\\s+mean)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);
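Applied line-by-line over the definitions article, the regex yields the term list. A sketch of that loop (the method name and the `DefinedTerm` record are assumed, not taken from the post):

```csharp
// Sketch: run the definition regex over each line of the definitions article.
// Reuses EnDefinitionRegex from above; DefinedTerm is a hypothetical record
// carrying the canonical term text.
private static IReadOnlyList<DefinedTerm> ExtractDefinedTerms(string articleText) =>
    articleText
        .Split('\n')
        .Select(line => EnDefinitionRegex.Match(line))
        .Where(m => m.Success)
        .Select(m => new DefinedTerm(m.Groups["term"].Value.Trim()))
        .ToList();
```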

Why this matters:

  • No manual drift from legal sources
  • Transparent failure modes
  • Fully deterministic behavior

UI: rendering instead of reasoning

The frontend (Adaptive Card) simply renders what it receives:

{
  "type": "Action.Submit",
  "title": "${suggestedPrompts[0].text}",
  "data": {
    "msteams": {
      "type": "imBack",
      "value": "${suggestedPrompts[0].text}"
    }
  }
}

This is critical:

The UI does not generate logic. It only exposes it.


Why this works better than generic suggestions

Most AI-driven “next prompt” systems fail in regulated domains because they:

  • Suggest plausible but unsupported questions
  • Repeat user intent
  • Drift into unrelated legal areas
  • Change language unexpectedly

This implementation avoids that by enforcing strict conditions:

A prompt exists only if:

  • The source corpus is known
  • The concept exists in the answer
  • The concept is extracted from source text
  • The language is deterministic
  • The prompt is not redundant

That makes it production-safe.
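The redundancy condition is the only one of the five with no snippet earlier in the post. A plausible guard, assuming it compares the user's original question against the candidate term:

```csharp
// Hypothetical redundancy guard: skip the prompt when the user's question
// already asks what the term means (English or Slovenian phrasing).
private static bool IsRedundant(string originalQuestion, string canonicalTerm) =>
    originalQuestion.Contains(canonicalTerm, StringComparison.OrdinalIgnoreCase)
    && (originalQuestion.Contains("mean", StringComparison.OrdinalIgnoreCase)
        || originalQuestion.Contains("pomeni", StringComparison.OrdinalIgnoreCase));
```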


Tests matter more than the feature

The behavior is locked with unit tests:

rich.SuggestedPrompts.Should().ContainSingle(p =>
    p.Kind == "defined_term" &&
    p.Text == "Kaj pomeni \"živilo z zmanjšano energijsko vrednostjo\" po členu 3 Uredbe (ES) št. 1333/2008?");

And equally important:

  • If the user already asks for a definition → no prompt

This avoids the most common UX failure: repetition.
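That negative case can be locked down the same way. A hypothetical test, reusing the BuildAsync signature from the snippets above (the surrounding variables are assumed fixtures):

```csharp
// Hypothetical negative test: a question that already asks for the definition
// must not get the same definition suggested back.
var prompts = await CopilotSuggestedPromptBuilder.BuildAsync(
    answer,
    "What does \"food additive\" mean under Article 3 of Regulation (EC) No 1333/2008?",
    celex,
    langs,
    getExtractionResultAsync,
    ct);

prompts.Should().BeEmpty();
```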


What actually changed

From a user perspective:

  • Answers still look the same
  • Sources are still visible
  • But now → next steps are clickable

From an engineering perspective:

  • Backend owns interaction logic
  • UI stays simple
  • Legal sources drive the flow
  • The system becomes guided, not just reactive

Key takeaways

  • Declarative agents become powerful when they return interaction-ready data, not just text
  • In regulated domains, prompts must be source-grounded, not generated freely
  • Stable API contracts enable safe evolution
  • UI should render decisions, not make them

Final thought

The biggest improvement didn’t come from making the system more generative.

It came from making it more constrained.

The strongest Copilot experiences are built by limiting model freedom and maximizing the evidence path.

That’s all folks!

Cheers!
Gašper Rupnik
