Building a Custom Engine Agent with Vertex AI for Microsoft 365 Copilot

How to design a backend-owned Copilot agent and integrate Google Vertex AI cleanly across cloud boundaries.


Introduction

Extending Microsoft 365 Copilot often starts with declarative agents or lightweight plugins. That works well—until you need full control.

In this project, the goal was different:

  • keep orchestration inside a backend we fully own
  • integrate an external model provider (Google Vertex AI)
  • expose the result through Microsoft 365 surfaces like Teams and Copilot

This led to a Custom Engine Agent architecture, where:

Microsoft handles the channel.
Your backend owns the behavior.

This article focuses on two key areas:

  • how to structure a Custom Engine Agent properly
  • how to integrate Vertex AI as a first-class backend component

Why a Custom Engine Agent?

Instead of a declarative setup, the system is built as a backend-driven agent that can:

  • accept activities from Teams or Copilot
  • orchestrate prompt preparation in C#
  • call Vertex AI directly
  • persist generated assets and metadata
  • expose public download URLs
  • evolve independently of Microsoft 365 packaging

The key architectural decision:

Keep Microsoft 365 at the boundary — everything else is regular backend code.


High-Level Architecture

Copilot / Teams
    ↓
Custom Engine Agent host (Microsoft Agents SDK)
    ↓
Agent (activity → domain translation)
    ↓
Engine (use case logic)
    ↓
Orchestrator (prompt preparation)
    ↓
Vertex AI client
    ↓
Storage (assets + metadata)
    ↓
Response (image + URL)

Key design rule

  • Host → thin
  • Agent → channel-aware
  • Engine → business logic
  • Vertex client → provider integration

Project Structure

src/
  agents/
    AgentHost/
  infra/
    AppHost/
    ServiceDefaults/
  libs/
    Contracts/
    Engine/
    Infrastructure/
    Integrations.VertexAI/
    Orchestration/
    Prompts/
    Storage/
tests/
tools/

This separation enables:

  • independent backend testing
  • fast debugging
  • clear ownership of responsibilities

The Custom Engine Agent Host

var builder = WebApplication.CreateBuilder(args);
var requireAuth = builder.Configuration.GetValue<bool>("AgentSdk:EnableAuthentication");

builder.AddServiceDefaults();
builder.Services.AddHttpClient();
builder.Services.AddControllers();
builder.Services.AddProblemDetails();

// Core services, the agent itself, and the A2A adapter.
builder.Services.AddCore(builder.Configuration, builder.Environment.ContentRootPath);
builder.AddAgentApplicationOptions();
builder.AddAgent<ImageAgent>();
builder.Services.AddSingleton<IStorage, MemoryStorage>();
builder.Services.AddA2AAdapter();
builder.Services.AddAgentAuthentication(builder.Configuration, requireAuth);

var app = builder.Build();

if (requireAuth)
{
    app.UseAuthentication();
    app.UseAuthorization();
}

app.MapAgentApplicationEndpoints(requireAuth);
app.MapA2AEndpoints(requireAuth);
app.Run();

The host also exposes HTTP endpoints like:

  • /images/generate
  • /images/edit
  • /assets/...

This allows testing without Teams, which is critical.
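For reference, such an endpoint can be wired straight to the engine with a minimal API route. This is a sketch, not the repo's actual controller: it assumes the `IImageGenerationEngine` registration shown earlier and binds `GenerateImageRequest` from the request body.

```csharp
// Hypothetical sketch: /images/generate mapped directly to the engine.
// Lets you exercise the full pipeline with curl, no Teams tenant needed.
app.MapPost("/images/generate", async (
    GenerateImageRequest request,
    IImageGenerationEngine engine,
    CancellationToken ct) =>
{
    var result = await engine.GenerateImageAsync(request, ct);
    return Results.Ok(result);
});
```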


Translating Activities into Backend Logic

public sealed class ImageAgent : AgentApplication
{
    private readonly IImageGenerationEngine _engine;

    public ImageAgent(AgentApplicationOptions options, IImageGenerationEngine engine)
        : base(options)
    {
        _engine = engine;
        OnActivity(ActivityTypes.Message, OnMessageAsync);
    }

    private async Task OnMessageAsync(ITurnContext context, ITurnState state, CancellationToken ct)
    {
        var message = context.Activity.Text?.Trim();
        if (string.IsNullOrWhiteSpace(message))
        {
            await context.SendActivityAsync("Provide a prompt.", cancellationToken: ct);
            return;
        }

        var result = await _engine.GenerateImageAsync(
            new GenerateImageRequest(message), ct);

        await SendImageAsync(context, result, ct);
    }
}

The agent should stay thin. All real logic belongs in the backend.


The Engine Layer

public sealed class ImageGenerationEngine(
    IPromptWorkflowOrchestrator orchestrator,
    IVertexAiImageClient vertexClient,
    IAssetStorage storage) : IImageGenerationEngine
{
    public async Task<GenerateImageResponse> GenerateImageAsync(
        GenerateImageRequest request,
        CancellationToken ct = default)
    {
        ValidatePrompt(request.Prompt);

        var prepared = await orchestrator.PrepareGenerateAsync(request, ct);
        var generated = await vertexClient.GenerateAsync(prepared, ct);
        var stored = await storage.SaveAsync(generated, ct);

        return new GenerateImageResponse(
            stored.AssetId,
            stored.AssetUri,
            stored.FileName,
            stored.ContentType);
    }
}

This is where the system becomes a real application, not just a bot.
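The contract the agent depends on is not shown in the excerpt above; inferred from the calls, it presumably looks something like this (names and shapes assumed from usage, with the records living in the Contracts library):

```csharp
// Assumed shape of the engine contract, inferred from the calls above.
public interface IImageGenerationEngine
{
    Task<GenerateImageResponse> GenerateImageAsync(
        GenerateImageRequest request,
        CancellationToken ct = default);
}

// Request/response records shared between host, agent, and engine.
public sealed record GenerateImageRequest(string Prompt, string? Style = null);

public sealed record GenerateImageResponse(
    string AssetId,
    string AssetUri,
    string FileName,
    string ContentType);
```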


Prompt Orchestration

public sealed class PromptWorkflowOrchestrator : IPromptWorkflowOrchestrator
{
    public Task<PreparedPrompt> PrepareGenerateAsync(
        GenerateImageRequest request,
        CancellationToken ct = default)
    {
        var prompt = $"Create an image for prompt '{request.Prompt}'";
        var metadata = new Dictionary<string, string>
        {
            ["operation"] = "generate",
            ["style"] = request.Style ?? "default"
        };

        return Task.FromResult(new PreparedPrompt(prompt, metadata));
    }
}

This layer enables:

  • prompt policies
  • safety rules
  • future workflows
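As an illustration of a prompt policy, a blocked-terms check could run at the top of PrepareGenerateAsync before the provider prompt is built. Everything here is hypothetical: the term list, the helper, and the exception type are examples, not code from the repo.

```csharp
// Hypothetical safety rule: reject prompts containing blocked terms
// before they ever reach the provider. Term list is illustrative only.
private static readonly string[] BlockedTerms = { "violence", "gore" };

private static void EnforcePromptPolicy(string prompt)
{
    foreach (var term in BlockedTerms)
    {
        if (prompt.Contains(term, StringComparison.OrdinalIgnoreCase))
        {
            throw new InvalidOperationException(
                $"Prompt rejected by policy: contains '{term}'.");
        }
    }
}
```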

Deep Dive: Vertex AI Integration

This is the most critical boundary in the system.

Microsoft handles transport.
Vertex AI handles generation.
Your backend owns everything in between.


Design Goals

  • explicit integration (no hidden SDK magic)
  • full control over payloads
  • provider-agnostic interface
  • testable in isolation
  • cloud-neutral

Client Interface

public interface IVertexAiImageClient
{
    Task<GeneratedAsset> GenerateAsync(
        PreparedPrompt prompt,
        CancellationToken ct = default);

    Task<GeneratedAsset> EditAsync(
        EditImageRequest request,
        CancellationToken ct = default);
}
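Because this interface is all the engine ever sees, "testable in isolation" falls out for free. A sketch of a stub client for unit tests (the class name and canned values are illustrative):

```csharp
// Illustrative stub: lets engine tests run with no Google credentials
// and no network, returning a tiny canned asset.
public sealed class FakeVertexAiImageClient : IVertexAiImageClient
{
    public Task<GeneratedAsset> GenerateAsync(
        PreparedPrompt prompt, CancellationToken ct = default) =>
        Task.FromResult(new GeneratedAsset
        {
            Content = new byte[] { 0x89, 0x50, 0x4E, 0x47 }, // PNG magic bytes
            FileName = "fake.png",
            ContentType = "image/png",
            Provider = "Fake",
            SourcePrompt = prompt.Value,
            Operation = "generate"
        });

    public Task<GeneratedAsset> EditAsync(
        EditImageRequest request, CancellationToken ct = default) =>
        throw new NotSupportedException();
}
```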

Endpoint Construction

private string GetGenerateEndpoint()
{
    return $"https://{_options.Region}-aiplatform.googleapis.com/v1" +
           $"/projects/{_options.ProjectId}/locations/{_options.Region}" +
           $"/publishers/google/models/{_options.Model}:predict";
}

Request Payload

public Task<GeneratedAsset> GenerateAsync(
    PreparedPrompt prompt,
    CancellationToken ct = default)
{
    var aspectRatio = prompt.Metadata.TryGetValue("aspectRatio", out var value)
        ? value
        : "1:1";

    var payload = new
    {
        instances = new[]
        {
            new
            {
                prompt = prompt.Value,
                negativePrompt = "low quality, blurry"
            }
        },
        parameters = new
        {
            sampleCount = 1,
            aspectRatio,
            seed = 1234 // fixed seed keeps output deterministic; drop it for variety
        }
    };

    return SendPredictRequestAsync(
        GetGenerateEndpoint(),
        payload,
        "generate",
        prompt.Value,
        ct);
}

Sending Requests

private async Task<GeneratedAsset> SendPredictRequestAsync(
    string endpoint,
    object payload,
    string operation,
    string sourcePrompt,
    CancellationToken ct)
{
    var credential = await CreateCredentialAsync(ct);
    var token = await credential.GetAccessTokenForRequestAsync(cancellationToken: ct);

    using var request = new HttpRequestMessage(HttpMethod.Post, endpoint);
    request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);

    var json = JsonSerializer.Serialize(payload);
    request.Content = new StringContent(json, Encoding.UTF8, "application/json");

    using var response = await _httpClient.SendAsync(request, ct);
    var responseContent = await response.Content.ReadAsStringAsync(ct);

    if (!response.IsSuccessStatusCode)
    {
        throw new HttpRequestException(
            $"Vertex AI returned {(int)response.StatusCode}: {responseContent}");
    }

    return MapResponse(responseContent, operation, sourcePrompt);
}

Response Mapping

private GeneratedAsset MapResponse(string json, string operation, string sourcePrompt)
{
    using var doc = JsonDocument.Parse(json);

    var base64 = doc.RootElement
        .GetProperty("predictions")[0]
        .GetProperty("bytesBase64Encoded")
        .GetString()
        ?? throw new InvalidOperationException("Vertex AI response contained no image data.");

    var bytes = Convert.FromBase64String(base64);

    return new GeneratedAsset
    {
        Content = bytes,
        FileName = $"{Guid.NewGuid()}.png",
        ContentType = "image/png",
        Provider = "VertexAI",
        SourcePrompt = sourcePrompt,
        Operation = operation
    };
}

Authentication

private async Task<GoogleCredential> CreateCredentialAsync(CancellationToken ct)
{
    GoogleCredential credential;

    if (!string.IsNullOrWhiteSpace(_options.CredentialsJson))
    {
        credential = GoogleCredential.FromJson(_options.CredentialsJson);
    }
    else if (!string.IsNullOrWhiteSpace(_options.CredentialsJsonBase64))
    {
        var json = Encoding.UTF8.GetString(
            Convert.FromBase64String(_options.CredentialsJsonBase64));
        credential = GoogleCredential.FromJson(json);
    }
    else
    {
        credential = await GoogleCredential.GetApplicationDefaultAsync(ct);
    }

    // Service-account tokens must be scoped before calling Vertex AI.
    return credential.CreateScoped("https://www.googleapis.com/auth/cloud-platform");
}

Cross-Cloud Lesson

Locally, Application Default Credentials just work:

gcloud auth application-default login

✔ Works

On Azure there is no gcloud login and no GCP metadata server, so the same Application Default Credentials lookup finds nothing:

❌ Fails


Solution

Ship the service-account key through configuration instead, Base64-encoded so it survives environment-variable handling:

VertexAi__CredentialsJsonBase64=...
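The value itself is just the service-account key file run through Base64, mirroring the decode in CreateCredentialAsync. A one-off helper to produce it (the file path is a placeholder, not a path from the repo):

```csharp
// One-off helper: emit the setting value for a downloaded key file.
var json = File.ReadAllText("service-account.json");
var value = Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
Console.WriteLine($"VertexAi__CredentialsJsonBase64={value}");
```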

Storage Layer

public async Task<StoredAsset> SaveAsync(GeneratedAsset asset, CancellationToken ct = default)
{
    var id = Guid.NewGuid().ToString("n");
    var directory = Path.Combine(_root, id);
    Directory.CreateDirectory(directory);

    var filePath = Path.Combine(directory, asset.FileName);
    await File.WriteAllBytesAsync(filePath, asset.Content, ct);

    return new StoredAsset(id, $"/assets/{id}/{asset.FileName}");
}

Public URLs

public string ResolveAssetUrl(string uri)
{
    return $"{PublicOrigin}{uri}";
}

Deployment with Aspire

var builder = DistributedApplication.CreateBuilder(args);

builder.AddProject<Projects.AgentHost>("agent-host")
    .WithExternalHttpEndpoints()
    .WithHttpHealthCheck("/health");

builder.Build().Run();

Debugging Reality

At one point:

  • Playground ✅
  • Backend ✅
  • Teams ❌

Error

The tenant admin disabled this bot

Root Cause

Broken Azure Bot identity.


Fix

  • new App Registration
  • new Azure Bot
  • same backend

➡️ everything worked instantly


Key Takeaways

  • Custom Engine Agent = backend-first architecture
  • Vertex AI = explicit integration layer
  • Separate everything aggressively
  • Debug by layers, not assumptions

Final Thoughts

This approach shifts the model:

From:

“Copilot calls AI”

To:

“Copilot calls your system, which uses AI”

That difference is what enables:

  • control
  • reliability
  • extensibility

That’s all folks!

Cheers!
Gašper Rupnik

{End.}
