Monday, 6 January 2025

Building an Agent for Microsoft 365 Copilot

Microsoft 365 Copilot is a powerful, general-purpose assistant that greatly improves personal productivity, helping users manage tasks like emails, calendars, document searches, and more.

However, the true potential of M365 Copilot lies in its extensibility. By building specialized "vertical" agents on top of M365 Copilot, you can unlock team productivity as well as automate business processes. These custom agents not only help in individual productivity, but also help in building workflows across groups of people.

Agents for Microsoft 365 Copilot leverage the same robust foundation—its orchestrator, foundation models, and trusted AI services—that powers M365 Copilot itself. This ensures consistency, reliability, and security at scale.

So in this post, let's take a look at how to build Agents on top of Microsoft 365 Copilot. 

We will be building a Declarative Agent with the help of the Teams toolkit. Before we start, we need the following prerequisites:

A Microsoft 365 Copilot license

Teams Toolkit Visual Studio Code extension

Enable side loading of Teams apps

Once everything is in place, we will go to Visual Studio Code 

In the Teams Toolkit extension, To create an agent, we will click on "Create new app"

Then, click on "Agent"


When the agent is created, we see a bunch of files getting created as part of the scaffolding.  So let's take a look the difference moving pieces of the agent:

manifest.json

If you have been doing M365 Apps (and Teams apps) for a while, you are familiar with this file. This is the file which represents our Agent in the M365 App catalog. It contains the various details like name, description and capabilities of the app.

However, you will notice the new property in this file which is "copilotAgents" This property will be pointing to a file containing the description of our new declarative agent. So let's look at how that file looks next:

declarativeAgent.json

Lots of interesting things are happening here. 

 
First the "instructions" property is pointing to a file which will contain the "System prompt" of our agent. We will have a look at this file later. 

Next, the "conversation_starters" property contains ready to go prompts which the user can ask the agent. Our agent will be trained to respond to these prompts. This is so that the user is properly onboarded when they land on our agent.

Finally there will be the actions and capabilities properties: 

Actions property contains connections to external APIs which the agent can invoke.

Capabilities property contains the different out of the box "Tools" which we want to allow in our agent. E.g. SharePoint and OneDrive search, image creation, code interpreter to generate charts etc 

We will talk about both these properties in more details in subsequent blog posts.

instruction.txt

And finally we have the instructions file where we can specify the system prompt for the agent. Here, we can guide the agent and assign it personality. We can make it aware of the tools and capabilities it has available and when the use them. We can provide one shot or few shot examples to "train" the agent on responding to users.

Once all files are in place, you can click "Provision" from the Teams toolkit extension:

And our new agent will be ready!

Here is how the agent will provide the conversation starters when we first lang on it:


Simple conversation using Web search capability:


Conversation using Web search and Code Interpreter capabilities:



Hope this was helpful! We will explore M365 Copilot Agents more in subsequent blog posts.

Monday, 9 December 2024

Search SharePoint and OneDrive files in natural language with OpenAI function calling and Microsoft Graph Search API

By now, we have seen "Chat with your documents" functionality being introduced in many Microsoft 365 applications. It is typically built by combining Large Language Models (LLMs) and vector databases. 

To make the documents "chat ready", they have to be converted to embeddings and stored in vector databases like Azure AI Search. However, indexing the documents and keeping the index in sync are not trivial tasks. There are many moving pieces involved. Also, many times there is no need for "similarity search" or "vector search" where the search is made based on meaning of the query. 

In such cases, a simple "keyword" search can do the trick. The advantage of using keyword search in Microsoft 365 applications is that the Microsoft Search indexes are already available as part of the service. APIs like the Microsoft Graph Search API and the SharePoint Search REST API give us "ready to consume" endpoints which can be used to query documents across SharePoint and OneDrive. Keeping these search indexes in sync with the changes in the documents is also handled by the Microsoft 365 service itself.

So in this post, let's have a look at how we can combine OpenAI's gpt-4o Large Language Model with Microsoft Graph Search API to query SharePoint and OneDrive documents in natural language. 

On a high level we will be using OpenAI function calling to achieve this. Our steps are going to be:

1. Define an OpenAI function and make it available to the LLM.  


2. During the course of the chat, if the LLM thinks that to respond to the user, it needs to call our function, it will respond with the function name along with the parameters.

3. Call the Microsoft Graph Search API based on the parameters provided by the LLM.

4. Send the results returned from the Microsoft Graph back to the LLM to generate a response in natural language.

So let's see how to achieve this. In this code I have used the following nuget packages:

https://www.nuget.org/packages/Azure.AI.OpenAI/2.1.0

https://www.nuget.org/packages/Microsoft.Graph/5.64.0

The first thing we will look at is our OpenAI function definition:

In this function we are informing the LLM that if needs to search any files as part of providing the responses, it can call this function. The function name will be returned in the response and the relevant parameter will be provided as well. Now let's see how our orchestrator function looks:

There is a lot to unpack here as this function is the one which does the heavy lifting. This code is responsible for handling the chat with OpenAI, calling the MS Graph and also responding back to the user based on the response from the Graph. 

Next, let's have a look at the code which calls the Microsoft Graph based on the parameters provided by the LLM. 

Before executing this code, you will need to have created an App registration. Here is how to do that: https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app 

Since we are calling the Microsoft Graph /search endpoint with delegated permissions, the app registration will need a minimum of the User.Read and Files.Read.All permissions granted. https://learn.microsoft.com/en-us/graph/api/search-query?view=graph-rest-1.0&tabs=http

This code get the parameters sent from the LLM and uses the Microsoft Graph .NET SDK to call the /search endpoint and fetch the files based on the searchQuery properties. Once the files are returned, their summary value is concatenated into a string and returned to the orchestrator function so that it can be sent again to the LLM. 

Finally, lets have a look at our CallOpenAI function which is responsible for talking to the Open AI chat api.
 
This code defines the Open AI function which will be included in our Chat API calls. Also, the user's search query is sent to the API to determine if the function needs to be called. This function is also called again after the response from the Microsoft Graph is fetched. At that time, this function contains the details fetched from the Graph to generate an output in natural language. This way, we can use Open AI function calling together with Microsoft Graph API to search files in SharePoint and OneDrive.

Hope this helps!

Tuesday, 5 November 2024

Working with OpenAI Assistants: Using code interpreter to generate charts

This is the fourth post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the code interpreter tool which allows us to generate charts based on some data. This is very powerful for scenarios where you have to do data analysis on JSON, csv or Microsoft Excel files and generate charts and reports based on them.

See the following posts for the entire series:

Working with the OpenAI Assistants API: Create a simple assistant

Working with the OpenAI Assistants API: Using file search 

Working with the OpenAI Assistants API: Chat with Excel files using Code interpreter 

Working with the OpenAI Assistants API: Using code interpreter to generate charts (this post) 

The Code Interpreter tool has access to a sandboxed python code execution environment within the Assistants API. This can provide very useful as the Assistants API can iteratively run code against the files provided to it and generate charts!

So in this post, let's see how we can generate charts based on an excel file with the code interpreter tool. The excel file we will be querying will be the same one we used in the last post. It contains details of customers like their name and the licenses purchased of a fictional product by them:

To generate charts using the Code interpreter, we have to use the following moving pieces: 

  • First, we need to upload the excel file using the Open AI File client 
  • Then, we need to connect the uploaded file to the Code Interpreter tool in either an assistant or a thread which would enable the assistant to generate a chart on the document.
For the demo code, we will be using the Azure OpenAI service for working with the OpenAI gpt-4o model and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as Azure.AI.OpenAI.Assistants nuget packages.

And this is the file generated by the code interpreter tool:

As you can see the code interpreter tool takes a few passes at the data. It tries to understand the document before generating the chart. This is a really powerful feature and the possibilities are endless! 

Hope this helps.

Monday, 4 November 2024

Working with OpenAI Assistants: Chat with Excel files using Code interpreter

This is the third post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the code interpreter tool which allows us to upload files to the Assistants API and write python code against them. This is very powerful for scenarios where you have to do data analysis on csv or Microsoft Excel files and generate charts and reports on them.

See the following posts for the entire series:

Working with the OpenAI Assistants: Create a simple assistant

Working with the OpenAI Assistants: Using file search 

Working with the OpenAI Assistants: Chat with Excel files using code interpreter (this post) 

Working with OpenAI Assistants: Using code interpreter to generate charts

The Retrieval Augmented Generation (RAG) pattern, which was discussed in previous posts, works great for text based files like Microsoft Word and PDF documents. However, when it comes to structured data files like csv or excel, it comes out short. An this where the Code Interpreter tool can come in very handy. It can repetitively run python code on documents until it is confident that the user's question has been answered.

So in this post, let's see how we can query an excel file with the code interpreter tool. The excel file we will be querying will contain details of customers like their name and the licenses purchased of a fictional product by them:

To upload and analyse documents using the Code interpreter, we have to use the following moving pieces: 

  • First, we need to upload files using the Open AI File client 
  • Then, we need to connect the uploaded file to the Code Interpreter tool in either an assistant or a thread which would enable the assistant to answer questions based on the document.
For the demo code, we will be using the Azure OpenAI service for working with the OpenAI gpt-4o model and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as Azure.AI.OpenAI.Assistants nuget packages.

As you can see the code interpreter tool takes a few passes at the data. It tries to understand the document before answering the question. This is a really powerful feature and the possibilities are endless! 

Hope this helps.

Monday, 14 October 2024

Working with OpenAI Assistants: Using file search

This is the second post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the file search capabilities which allows us to upload files to the Assistants API and chat with them. See the following posts for the entire series:

Working with OpenAI Assistants: Create a simple assistant

Working with OpenAI Assistants: Using file search (this post)

Working with OpenAI Assistants: Chat with Excel files using code interpreter

Working with OpenAI Assistants: Using code interpreter to generate charts

The file search API uses the Retrieval Augmented Generation (RAG) pattern which has been made popular recently. The added advantage of using the Assistants API for this is that the API manages document chunking, vectorizing and indexing for us. Whereas without the Assistants API we would have to use a separate service like Azure AI Search and manage the document indexing ourselves. 

To upload and chat with documents using the Assistants API, we have to use the following moving pieces: 

  • First, we need to create a Vector Store in the Assistants API.
  • Then, we need to upload files using the Open AI File client and add them to the vector store.
  • Finally, we need to connect the vector store to either an assistant or a thread which would enable to assistant to answer questions based on the document.

For the demo code, we will be using the Azure OpenAI service for working with the OpenAI gpt-4o model and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as Azure.AI.OpenAI.Assistants nuget packages.

Limitations


As per OpenAI docs, there are some limitations for the file search tool:

  • Each vector store can hold up to 10,000 files.
  • The maximum file size of a file which can be uploaded is 512 MB. Each file should contain no more than 5,000,000 tokens per file (computed automatically when you attach a file).

When querying for the documents in the vector store, we have to be aware of the following things which are not possible right now. However, the OpenAI team are working on this and some of these features will be available soon:

  • Support for deterministic pre-search filtering using custom metadata.
  • Support for parsing images within documents (including images of charts, graphs, tables etc.)
  • Support for retrievals over structured file formats (like csv or jsonl).
  • Better support for summarization — the tool today is optimized for search queries.

Current supported files types can be found in the OpenAI docs

Hope this helps!

Monday, 7 October 2024

Working with OpenAI Assistants: Create a simple assistant

With OpenAI's recently released Assistants API, building AI bots becomes a lot easier. Using the API, an assistant can leverage custom instructions, files and tools (previously called functions) and answer user questions based on them.

Before the Assistants API, building such assistants was possible but for a lot of things, we had to use our own services e.g. vector storage for file search, database for maintaining chat history etc.

The Assistants API gives us a handy wrapper on top of all these disparate services and a single endpoint to work with. So in this series of posts, let's have a look at what the Assistants API can do.

Working with OpenAI Assistants: Create a simple assistant (this post)

Working with OpenAI Assistants: Using file search

Working with OpenAI Assistants: Chat with Excel files using code interpreter

Working with OpenAI Assistants: Using code interpreter to generate charts

The first thing we are going to do is build a simple assistant which has a "SharePoint Tutor" personality. It will be used to answer questions for users who are learning to use SharePoint. Before deep diving into the code, lets understand the different moving pieces of the Assistants API: 

An assistant is a container in which all operations between the AI and the user are managed.

A thread is a list of messages which were exchanged between the user and AI. The thread is also responsible for maintaining the conversation history.

A run is a single invocation of an assistant based on the history in the thread as well as the tools available to the assistant. After a run is executed, new messages are generated and added to the thread.

For the demo code, we will be using the Azure OpenAI service for working with the OpenAI gpt-4o model and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as Azure.AI.OpenAI.Assistants nuget packages.

This was a simple assistant creation just to get us familiar with the Assitants API. In the next posts, we will dive deeper into the API and explore the more advanced concepts. Stay tuned!

Monday, 23 September 2024

Using gpt-4o vision to understand images

OpenAI released gpt-4o recently, which is the new flagship model that can reason across audio, vision, and text in real time. It's a single model which can be provided with multiple types of input (multi modal) and it can understand and respond based on all of them. 

The model is also available on Azure OpenAI and today we are going to have a look at how to work with images using the vision capabilities of gpt-4o. We will be providing it with images directly as part of the chat and asking it to analyse the images before responding. Let's see how it works:

We will be using the Azure OpenAI service for working with the OpenAI gpt-4o and since we will be using .NET code, we will need the Azure OpenAI .NET SDK v2:

1. Basic image analysis

First, let's start with a simple scenario of sending an image to the model and asking it to describe it.


2. Answer questions based on details in images

Next, let's give a slightly more complex image of  some ingredients and ask it to create a recipe:

Image source: allrecipes.com

3. Compare images

This one is my favourite, let's give it 2 images and ask it to compare them against each other. This can be useful in scenarios where there is a single "standard" image and we need to determine if another image adheres to the standard.

4. Binary data

If the URL of the image is not accessible anonymously, then we can also give the model binary data of the image:


5. Data URI


We can also use Data URI's instead of direct URLs



6. Limitations

As per OpenAI docs, there are some limitations of the vision model that we should be aware of:

Medical images: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.

Non-English: The model may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.

Small text: Enlarge text within the image to improve readability, but avoid cropping important details.

Rotation: The model may misinterpret rotated / upside-down text or images.

Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.

Spatial reasoning: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.

Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.

Image shape: The model struggles with panoramic and fisheye images.

Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.

Counting: May give approximate counts for objects in images.

CAPTCHAS: For safety reasons, we have implemented a system to block the submission of CAPTCHAs.


Overall, I do think the ability to combine text and image input as part of of the same chat is a game changer! This could unlock a lot of scenarios which were not possible just with a single mode of input. Very excited to see what is next!

Hope you found the post useful!