Monday, 9 December 2024

Search SharePoint and OneDrive files in natural language with OpenAI function calling and Microsoft Graph Search API

By now, we have seen "Chat with your documents" functionality being introduced in many Microsoft 365 applications. It is typically built by combining Large Language Models (LLMs) and vector databases. 

To make the documents "chat ready", they have to be converted to embeddings and stored in a vector database like Azure AI Search. However, indexing the documents and keeping the index in sync are not trivial tasks; there are many moving pieces involved. Also, in many cases there is no need for "similarity search" or "vector search", where results are matched based on the meaning of the query.

In such cases, a simple "keyword" search can do the trick. The advantage of using keyword search in Microsoft 365 applications is that the Microsoft Search indexes are already available as part of the service. APIs like the Microsoft Graph Search API and the SharePoint Search REST API give us "ready to consume" endpoints which can be used to query documents across SharePoint and OneDrive. Keeping these search indexes in sync with the changes in the documents is also handled by the Microsoft 365 service itself.

So in this post, let's have a look at how we can combine OpenAI's gpt-4o Large Language Model with the Microsoft Graph Search API to query SharePoint and OneDrive documents in natural language.

At a high level, we will be using OpenAI function calling to achieve this. Our steps are going to be:

1. Define an OpenAI function and make it available to the LLM.

2. During the chat, if the LLM decides that it needs to call our function in order to respond to the user, it will respond with the function name along with the required parameters.

3. Call the Microsoft Graph Search API based on the parameters provided by the LLM.

4. Send the results returned from the Microsoft Graph back to the LLM to generate a response in natural language.

So let's see how to achieve this. In this code, I have used the following NuGet packages:

https://www.nuget.org/packages/Azure.AI.OpenAI/2.1.0

https://www.nuget.org/packages/Microsoft.Graph/5.64.0

The first thing we will look at is our OpenAI function definition:

"functions": [{
"name": "search_microsoft365_documents",
"description": "Search the Microsfot 365 documents from user's SharePoint and OneDrive",
"parameters": {
"type": "object",
"required": ["searchQuery"],
"properties": {
"searchQuery": {
"type": "string",
"description": "the text to search in the documents to get the required information"
}
}
}
}
]
With this function definition, we are informing the LLM that if it needs to search any files as part of providing a response, it can call this function. The function name will then be returned in the chat response, along with the relevant parameter value. Now let's see what our orchestrator function looks like:

static async Task Main(string[] args)
{
    string endpoint = "<azure-openai-endpoint>";
    string key = "<azure-openai-key>";
    string deploymentName = "gpt-4o";
    var azureOpenAIClient = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));

    //Get the user's question from the console.
    Console.WriteLine("What would you like to search?: ");
    string userQuestion = Console.ReadLine();

    //1. Call the OpenAI Chat API with the user's question.
    var chatCompletionResponse = await CallOpenAIAPI(userQuestion, deploymentName, azureOpenAIClient);

    //2. Check if the Chat API decided that, to answer the question, a function call to the MS Graph needs to be made.
    if (chatCompletionResponse.Value.FinishReason == ChatFinishReason.ToolCalls)
    {
        string functionName = chatCompletionResponse.Value.ToolCalls[0].FunctionName;
        BinaryData functionArguments = chatCompletionResponse.Value.ToolCalls[0].FunctionArguments;
        string toolCallId = chatCompletionResponse.Value.ToolCalls[0].Id;
        Console.WriteLine($"Function Name: {functionName}, Params: {functionArguments}");

        if (functionName == "search_microsoft365_documents")
        {
            //3. If the MS Graph function call needs to be made, the Chat API will also provide the parameters to pass to the function.
            var searchParams = JsonSerializer.Deserialize<M365SearchQueryParams>(functionArguments);

            //4. Call the MS Graph with the parameters provided by the Chat API.
            var functionResponse = await ExecuteMicrosoft365SearchWithGraph(searchParams.searchQuery);
            Console.WriteLine($"Graph Response: {functionResponse}");

            //5. Call the Chat API again with the function response.
            var functionMessages = new List<OpenAI.Chat.ChatMessage>
            {
                new AssistantChatMessage(new List<ChatToolCall>() { ChatToolCall.CreateFunctionToolCall(toolCallId, functionName, functionArguments) }),
                new ToolChatMessage(toolCallId, functionResponse)
            };
            chatCompletionResponse = await CallOpenAIAPI(userQuestion, deploymentName, azureOpenAIClient, functionMessages);

            //6. Print the final response from the Chat API.
            Console.WriteLine("------------------");
            Console.WriteLine(chatCompletionResponse.Value.Content[0].Text);
        }
    }
    else
    {
        //If the LLM decided that a function call is not needed, print the final response from the Chat API.
        Console.WriteLine(chatCompletionResponse.Value.Content[0].Text);
    }
}

There is a lot to unpack here as this function is the one which does the heavy lifting. This code is responsible for handling the chat with OpenAI, calling the MS Graph and also responding back to the user based on the response from the Graph. 
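One small piece not shown in the post is the M365SearchQueryParams type used to deserialize the function arguments. A minimal definition matching the function schema defined earlier could look like this:

//Simple DTO used to deserialize the function arguments returned by the model.
//It matches the "searchQuery" property declared in the function definition above.
public class M365SearchQueryParams
{
    public string searchQuery { get; set; }
}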

Next, let's have a look at the code which calls the Microsoft Graph based on the parameters provided by the LLM. 

Before executing this code, you will need to have created an App registration. Here is how to do that: https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app 

Since we are calling the Microsoft Graph /search endpoint with delegated permissions, the app registration will need a minimum of the User.Read and Files.Read.All permissions granted. https://learn.microsoft.com/en-us/graph/api/search-query?view=graph-rest-1.0&tabs=http
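The GetGraphClient helper used in the code below is not shown in the original post. One possible implementation, using the Azure.Identity InteractiveBrowserCredential for delegated access (the tenant ID, client ID and redirect URI are placeholders that come from your app registration), could look like this:

//One possible implementation of the GetGraphClient helper used below.
//Requires the Azure.Identity package; the tenant ID and client ID are placeholders.
private static GraphServiceClient GetGraphClient(string[] scopes)
{
    var options = new InteractiveBrowserCredentialOptions
    {
        TenantId = "<tenant-id>",
        ClientId = "<client-id>",
        RedirectUri = new Uri("http://localhost")
    };

    //Interactive sign-in so the /search query runs with delegated permissions.
    var credential = new InteractiveBrowserCredential(options);
    return new GraphServiceClient(credential, scopes);
}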

private static async Task<string> ExecuteMicrosoft365SearchWithGraph(string searchQuery)
{
    // To initialize your graphClient, see https://learn.microsoft.com/en-us/graph/sdks/create-client?from=snippets&tabs=csharp
    GraphServiceClient graphClient = GetGraphClient(["User.Read", "Files.Read.All"]);

    var requestBody = new QueryPostRequestBody
    {
        Requests = new List<SearchRequest>
        {
            new SearchRequest
            {
                EntityTypes = new List<EntityType?>
                {
                    EntityType.DriveItem,
                },
                Query = new SearchQuery
                {
                    QueryString = searchQuery,
                },
                From = 0,
                Size = 25,
            },
        },
    };

    var searchResults = await graphClient.Search.Query.PostAsQueryPostResponseAsync(requestBody);
    var result = string.Empty;

    foreach (var hit in searchResults.Value[0].HitsContainers[0].Hits)
    {
        var driveItem = hit.Resource as DriveItem;
        if (driveItem != null)
        {
            //Using the summary of the search result. In production, it might be the case that the summary is not enough.
            //In that situation, the solution could be to fetch the file contents through another graph call.
            result += hit.Summary;
        }
    }
    return result;
}
This code takes the parameters sent by the LLM and uses the Microsoft Graph .NET SDK to call the /search endpoint, fetching files based on the searchQuery property. Once the files are returned, their summary values are concatenated into a single string and returned to the orchestrator function so that it can be sent back to the LLM.
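As the comment in the code notes, the summary of a search hit may not always contain enough information. In that case, one option is to download the full file content with another Graph call. Here is a rough, hypothetical sketch; it assumes the drive ID and item ID are populated on the DriveItem returned by the search hit, and it reads the content as plain text:

//Hypothetical helper (not part of the original post): downloads the full file content for a
//search hit when the summary alone is not enough. Works best for plain text files; Office
//formats would need additional text extraction before being sent to the LLM.
private static async Task<string> GetFileContent(GraphServiceClient graphClient, DriveItem driveItem)
{
    using Stream contentStream = await graphClient
        .Drives[driveItem.ParentReference.DriveId]
        .Items[driveItem.Id]
        .Content
        .GetAsync();

    using var reader = new StreamReader(contentStream);
    return await reader.ReadToEndAsync();
}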

Finally, let's have a look at our CallOpenAIAPI function, which is responsible for talking to the OpenAI Chat API.
 
private static async Task<ClientResult<ChatCompletion>> CallOpenAIAPI(string userQuestion, string modelDeploymentName, AzureOpenAIClient azureOpenAIClient, IList<OpenAI.Chat.ChatMessage> functionMessages = null)
{
    var chatCompletionOptions = new ChatCompletionOptions();
    var messages = new List<OpenAI.Chat.ChatMessage>
    {
        new SystemChatMessage("You are a search assistant that helps find information. Only use the functions and parameters you have been provided with."),
        new UserChatMessage(userQuestion)
    };

    //On the second call, add the tool call and tool response messages to the conversation.
    if (functionMessages != null)
    {
        foreach (var functionMessage in functionMessages)
        {
            messages.Add(functionMessage);
        }
    }

    //Make the function definition available to the model as a tool.
    chatCompletionOptions.Tools.Add(ChatTool.CreateFunctionTool(
        functionName: "search_microsoft365_documents",
        functionDescription: "Search the Microsoft 365 documents from the user's SharePoint and OneDrive.",
        functionParameters: BinaryData.FromString("{\"type\": \"object\",\"required\": [\"searchQuery\"],\"properties\": {\"searchQuery\": {\"type\": \"string\",\"description\": \"the text to search in the documents to get the required information\"}}}")
    ));

    var chatCompletionResponse = await azureOpenAIClient.GetChatClient(modelDeploymentName).CompleteChatAsync(messages, chatCompletionOptions);
    return chatCompletionResponse;
}
This code defines the OpenAI function (tool) which is included in every Chat API call, and sends the user's question to the API so the model can decide whether the function needs to be called. The same method is called again after the response from the Microsoft Graph has been fetched; at that point, the messages include the details returned by the Graph so the model can generate an answer in natural language. This way, we can use OpenAI function calling together with the Microsoft Graph Search API to search files in SharePoint and OneDrive.

Hope this helps!

Tuesday, 5 November 2024

Working with OpenAI Assistants: Using code interpreter to generate charts

This is the fourth post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the code interpreter tool which allows us to generate charts based on some data. This is very powerful for scenarios where you have to do data analysis on JSON, csv or Microsoft Excel files and generate charts and reports based on them.

See the following posts for the entire series:

Working with the OpenAI Assistants API: Create a simple assistant

Working with the OpenAI Assistants API: Using file search 

Working with the OpenAI Assistants API: Chat with Excel files using Code interpreter 

Working with the OpenAI Assistants API: Using code interpreter to generate charts (this post) 

The Code Interpreter tool has access to a sandboxed Python code execution environment within the Assistants API. This can prove very useful, as the Assistants API can iteratively run code against the files provided to it and generate charts!

So in this post, let's see how we can generate charts based on an Excel file with the code interpreter tool. We will be querying the same Excel file we used in the last post. It contains customer details such as names and the number of licenses of a fictional product purchased by each customer:

To generate charts using the Code interpreter, we have to use the following moving pieces: 

  • First, we need to upload the Excel file using the OpenAI File client.
  • Then, we need to connect the uploaded file to the Code Interpreter tool in either an assistant or a thread, which enables the assistant to generate a chart based on the document.

For the demo code, we will be using the Azure OpenAI service to work with the OpenAI gpt-4o model, and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as the Azure.AI.OpenAI.Assistants NuGet packages.

string endpoint = "https://<myopenaiservice>.openai.azure.com/";
string key = "<my-open-ai-service-key>";
string deploymentName = "gpt-4o";
var azureClient = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));
OpenAIFileClient fileClient = azureClient.GetOpenAIFileClient();
AssistantClient assistantClient = azureClient.GetAssistantClient();
OpenAIFile infoFile = await fileClient.UploadFileAsync("C:\\Users\\vardh\\Documents\\Customers.xlsx", FileUploadPurpose.Assistants);
AssistantCreationOptions assistantOptions = new()
{
Name = "CodeInterpreterProMAX",
Instructions =
"You are an assistant that looks up sales data and helps visualize the information based"
+ " on user queries. When asked to generate a graph, chart, or other visualization, use"
+ " the code interpreter tool to do so.",
Tools =
{
new CodeInterpreterToolDefinition()
},
ToolResources = new()
{
CodeInterpreter = new()
{
FileIds = { infoFile.Id },
}
}
};
Assistant assistant = assistantClient.CreateAssistant(deploymentName, assistantOptions);
ThreadCreationOptions threadOptions = new()
{
InitialMessages = { "Can you plot a bar graph for all customers and their purchases?" },
};
ThreadRun threadRun = assistantClient.CreateThreadAndRun(assistant.Id, threadOptions);
do
{
Thread.Sleep(TimeSpan.FromSeconds(1));
Console.WriteLine($"Thread run status: {threadRun.Status}");
threadRun = assistantClient.GetRun(threadRun.ThreadId, threadRun.Id);
} while (!threadRun.Status.IsTerminal);
CollectionResult<ThreadMessage> messages = assistantClient.GetMessages(threadRun.ThreadId, new MessageCollectionOptions() { Order = MessageCollectionOrder.Ascending });
foreach (ThreadMessage message in messages)
{
Console.Write($"[{message.Role.ToString().ToUpper()}]: ");
foreach (MessageContent contentItem in message.Content)
{
if (!string.IsNullOrEmpty(contentItem.Text))
{
Console.WriteLine($"{contentItem.Text}");
}
if (!string.IsNullOrEmpty(contentItem.ImageFileId))
{
OpenAIFile imageInfo = fileClient.GetFile(contentItem.ImageFileId);
BinaryData imageBytes = fileClient.DownloadFile(contentItem.ImageFileId);
using FileStream stream = File.OpenWrite($"{imageInfo.Filename}.png");
imageBytes.ToStream().CopyTo(stream);
Console.WriteLine($"<image: {imageInfo.Filename}.png>");
}
}
}

[USER]: Can you plot a bar graph for all customers and their purchases?

[ASSISTANT] : Sure, let's first inspect the contents of the uploaded file to understand its structure and extract the necessary data.

[ASSISTANT] : The dataset contains the following columns:

  • Customer Name
  • Email
  • Licenses Purchased
  • Total Amount Paid

To create a bar graph of all customers and their purchases, we will use the "Customer Name" and "Licenses Purchased" columns.

Let's proceed with creating the bar graph.

[ASSISTANT] : <image: ea8db0b2-2f7a-420e-9c77-c081b7bd0132.png>

Here is the bar graph showing the number of licenses purchased by each customer. If you need any further analysis or additional visualizations, please let me know!


And this is the file generated by the code interpreter tool:

As you can see, the code interpreter tool takes a few passes at the data. It tries to understand the document before generating the chart. This is a really powerful feature and the possibilities are endless!

Hope this helps.

Monday, 4 November 2024

Working with OpenAI Assistants: Chat with Excel files using Code interpreter

This is the third post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the code interpreter tool which allows us to upload files to the Assistants API and write python code against them. This is very powerful for scenarios where you have to do data analysis on csv or Microsoft Excel files and generate charts and reports on them.

See the following posts for the entire series:

Working with the OpenAI Assistants: Create a simple assistant

Working with the OpenAI Assistants: Using file search 

Working with the OpenAI Assistants: Chat with Excel files using code interpreter (this post) 

Working with OpenAI Assistants: Using code interpreter to generate charts

The Retrieval Augmented Generation (RAG) pattern, which was discussed in previous posts, works great for text-based files like Microsoft Word and PDF documents. However, when it comes to structured data files like CSV or Excel, it falls short. And this is where the Code Interpreter tool can come in very handy. It can iteratively run Python code on documents until it is confident that the user's question has been answered.

So in this post, let's see how we can query an Excel file with the code interpreter tool. The Excel file we will be querying contains customer details such as names and the number of licenses of a fictional product purchased by each customer:

To upload and analyse documents using the Code interpreter, we have to use the following moving pieces: 

  • First, we need to upload the files using the OpenAI File client.
  • Then, we need to connect the uploaded file to the Code Interpreter tool in either an assistant or a thread, which enables the assistant to answer questions based on the document.

For the demo code, we will be using the Azure OpenAI service to work with the OpenAI gpt-4o model, and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as the Azure.AI.OpenAI.Assistants NuGet packages.

string endpoint = "https://<myopenaiservice>.openai.azure.com/";
string key = "<my-open-ai-service-key>";
string deploymentName = "gpt-4o";
var azureClient = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));
OpenAIFileClient fileClient = azureClient.GetOpenAIFileClient();
AssistantClient assistantClient = azureClient.GetAssistantClient();
OpenAIFile infoFile = await fileClient.UploadFileAsync("C:\\Customers.xlsx", FileUploadPurpose.Assistants);
AssistantCreationOptions assistantOptions = new()
{
Name = "CodeInterpreterProMAX",
Instructions =
"You are an assistant that looks up sales data and helps visualize the information based"
+ " on user queries. When asked to generate a graph, chart, or other visualization, use"
+ " the code interpreter tool to do so.",
Tools =
{
new CodeInterpreterToolDefinition()
},
ToolResources = new()
{
CodeInterpreter = new()
{
FileIds = { infoFile.Id },
}
}
};
Assistant assistant = assistantClient.CreateAssistant(deploymentName, assistantOptions);
ThreadCreationOptions threadOptions = new()
{
InitialMessages = { "Which customer has purchased the most licenses?" },
};
ThreadRun threadRun = assistantClient.CreateThreadAndRun(assistant.Id, threadOptions);
do
{
Thread.Sleep(TimeSpan.FromSeconds(1));
Console.WriteLine($"Thread run status: {threadRun.Status}");
threadRun = assistantClient.GetRun(threadRun.ThreadId, threadRun.Id);
} while (!threadRun.Status.IsTerminal);
CollectionResult<ThreadMessage> messages = assistantClient.GetMessages(threadRun.ThreadId, new MessageCollectionOptions() { Order = MessageCollectionOrder.Ascending });
foreach (ThreadMessage message in messages)
{
Console.Write($"[{message.Role.ToString().ToUpper()}]: ");
foreach (MessageContent contentItem in message.Content)
{
if (!string.IsNullOrEmpty(contentItem.Text))
{
Console.WriteLine($"{contentItem.Text}");
}
}
}

[USER]: Which customer has purchased the most licenses?

[ASSISTANT]: Let's first take a look at the content of the uploaded file to understand its structure. I'll load the file and inspect the first few rows of data.

[ASSISTANT]: The data contains information about customers, including their names, email addresses, the number of licenses they purchased, and the total amount they paid.

To identify the customer who has purchased the most licenses, we will find the customer with the maximum value in the "Licenses Purchased" column.

[ASSISTANT]: The customer who has purchased the most licenses is Alice Johnson, with a total of 25 licenses purchased.


As you can see, the code interpreter tool takes a few passes at the data. It tries to understand the document before answering the question. This is a really powerful feature and the possibilities are endless!

Hope this helps.

Monday, 14 October 2024

Working with OpenAI Assistants: Using file search

This is the second post in the series where we explore the OpenAI Assistants API. In this post, we will be looking at the file search capabilities which allows us to upload files to the Assistants API and chat with them. See the following posts for the entire series:

Working with OpenAI Assistants: Create a simple assistant

Working with OpenAI Assistants: Using file search (this post)

Working with OpenAI Assistants: Chat with Excel files using code interpreter

Working with OpenAI Assistants: Using code interpreter to generate charts

The file search API uses the Retrieval Augmented Generation (RAG) pattern, which has become popular recently. The added advantage of using the Assistants API for this is that the API manages document chunking, vectorizing and indexing for us. Without the Assistants API, we would have to use a separate service like Azure AI Search and manage the document indexing ourselves.

To upload and chat with documents using the Assistants API, we have to use the following moving pieces: 

  • First, we need to create a Vector Store in the Assistants API.
  • Then, we need to upload the files using the OpenAI File client and add them to the vector store.
  • Finally, we need to connect the vector store to either an assistant or a thread, which enables the assistant to answer questions based on the documents.

For the demo code, we will be using the Azure OpenAI service for working with the OpenAI gpt-4o model and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as Azure.AI.OpenAI.Assistants nuget packages.

string endpoint = "https://<myopenaiservice>.openai.azure.com/";
string key = "<my-open-ai-service-key>";
string deploymentName = "gpt-4o";
var azureClient = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));
OpenAIFileClient fileClient = azureClient.GetOpenAIFileClient();
AssistantClient assistantClient = azureClient.GetAssistantClient();
VectorStoreClient vectorClient = azureClient.GetVectorStoreClient();
var vectorStore = vectorClient.CreateVectorStore(true, new VectorStoreCreationOptions()
{
Name = "focusworks_ai_vector_store",
//Make the documents expire after 3 days of inactivity.
ExpirationPolicy = new VectorStoreExpirationPolicy() {
Anchor = VectorStoreExpirationAnchor.LastActiveAt,
Days = 3
}
});
//Create and upload sample document
using Stream document = BinaryData.FromBytes(@"Focusworks AI is a versatile productivity tool designed to streamline your workflow and enhance collaboration within Microsoft Teams. With its internet-connected ChatGPT bot, you can engage in insightful conversations on any topic, leveraging a rich knowledge base to gain valuable insights. It also empowers you to create stunning AI-powered images effortlessly, simply by describing what you envision in your own words.
One of the standout features of Focusworks AI is its ability to interact with your data. You can upload documents, ask questions, and have a dynamic conversation with your information, uncovering details and insights you might have missed. The AI is also tailored to help you craft more effective Teams messages, improving communication quality and ensuring your ideas are clearly conveyed. Additionally, it can summarize both your personal and group chats, making it easy to extract key points and stay updated.
Sharing your generated content and insights with colleagues is made seamless through Focusworks AI. You can post directly to Teams channels and group chats, ensuring everyone stays informed. The intuitive dashboard allows you to view all your recently created content and quickly access the relevant chats or channels, keeping your workflow organized and efficient. With Focusworks AI, you can eliminate information overload and enjoy a more productive work environment. Try the app for free and conveniently upgrade to a subscription if it elevates your workflow!"u8.ToArray()).ToStream();
OpenAIFile infoFile = await fileClient.UploadFileAsync(document, "focusworks_ai.txt", FileUploadPurpose.Assistants);
await vectorClient.AddFileToVectorStoreAsync(vectorStore.VectorStoreId, infoFile.Id, true);
AssistantCreationOptions assistantOptions = new()
{
Name = "FileSearchPro",
Instructions =
@"You are FileSearchPro, an intelligent assistant designed to help users locate information within their uploaded files. Your primary function is to search through these documents and provide accurate, concise answers to users' questions. You understand various file types and can extract relevant data, ensuring users get the information they need quickly and efficiently.
Key Features:
Efficiently search all uploaded documents to extract precise information.
Provide clear, straightforward answers directly from the file contents.
Maintain confidentiality and security of all user data.
Offer guidance on effective search queries if needed.
Always strive to deliver accurate and helpful information, enhancing users' ability to access and utilize their stored documents effectively.",
Tools =
{
new FileSearchToolDefinition(),
},
ToolResources = new() //Files can be specified at the assistant level.
{
FileSearch = new()
{
VectorStoreIds = { vectorStore.VectorStoreId },
}
}
};
Assistant assistant = assistantClient.CreateAssistant(deploymentName, assistantOptions);
ThreadCreationOptions threadOptions = new()
{
InitialMessages = { "What is Focusworks AI?" },
//Files can also be specified at the thread level.
//ToolResources = new()
//{
// FileSearch = new()
// {
// VectorStoreIds = { vectorStore.VectorStoreId },
// }
//}
};
ThreadRun threadRun = assistantClient.CreateThreadAndRun(assistant.Id, threadOptions);
do
{
Thread.Sleep(TimeSpan.FromSeconds(1));
threadRun = assistantClient.GetRun(threadRun.ThreadId, threadRun.Id);
} while (!threadRun.Status.IsTerminal);
CollectionResult<ThreadMessage> messages = assistantClient.GetMessages(threadRun.ThreadId, new MessageCollectionOptions() { Order = MessageCollectionOrder.Ascending });
foreach (ThreadMessage message in messages)
{
Console.Write($"[{message.Role.ToString().ToUpper()}]: ");
foreach (MessageContent contentItem in message.Content)
{
if (!string.IsNullOrEmpty(contentItem.Text))
{
Console.WriteLine($"{contentItem.Text}");
if (contentItem.TextAnnotations.Count > 0)
{
Console.WriteLine();
}
// Include annotations, if any.
foreach (TextAnnotation annotation in contentItem.TextAnnotations)
{
if (!string.IsNullOrEmpty(annotation.InputFileId))
{
Console.WriteLine($"* File citation, file ID: {annotation.InputFileId}");
}
if (!string.IsNullOrEmpty(annotation.OutputFileId))
{
Console.WriteLine($"* File output, new file ID: {annotation.OutputFileId}");
}
}
}
}
}

[USER]: What is Focusworks AI?

[ASSISTANT]: Focusworks AI is a productivity tool designed to enhance collaboration and streamline workflows within Microsoft Teams. It features an internet-connected ChatGPT bot that allows users to engage in insightful conversations and gain valuable insights from a rich knowledge base. The tool can create AI-powered images by simply describing users' visions. A key feature of Focusworks AI is its ability to interact with users' data, enabling document uploads and dynamic conversations to uncover insights. It also improves communication by helping craft more effective Teams messages and by summarizing both personal and group chats.

  • File citation, file ID: assistant-VVwKBdUixwPyk6RuOEnJpixh

Limitations


As per OpenAI docs, there are some limitations for the file search tool:

  • Each vector store can hold up to 10,000 files.
  • The maximum size of a file which can be uploaded is 512 MB, and each file should contain no more than 5,000,000 tokens (computed automatically when you attach a file).

When querying the documents in the vector store, we also have to be aware of the following things which are not possible right now (the OpenAI team is working on these, and some of the features should be available soon):

  • Support for deterministic pre-search filtering using custom metadata.
  • Support for parsing images within documents (including images of charts, graphs, tables etc.)
  • Support for retrievals over structured file formats (like csv or jsonl).
  • Better support for summarization — the tool today is optimized for search queries.

The currently supported file types can be found in the OpenAI docs.

Hope this helps!

Monday, 7 October 2024

Working with OpenAI Assistants: Create a simple assistant

With OpenAI's recently released Assistants API, building AI bots becomes a lot easier. Using the API, an assistant can leverage custom instructions, files and tools (previously called functions) and answer user questions based on them.

Before the Assistants API, building such assistants was possible, but for a lot of things we had to use our own services, e.g. vector storage for file search, a database for maintaining chat history, etc.

The Assistants API gives us a handy wrapper on top of all these disparate services and a single endpoint to work with. So in this series of posts, let's have a look at what the Assistants API can do.

Working with OpenAI Assistants: Create a simple assistant (this post)

Working with OpenAI Assistants: Using file search

Working with OpenAI Assistants: Chat with Excel files using code interpreter

Working with OpenAI Assistants: Using code interpreter to generate charts

The first thing we are going to do is build a simple assistant which has a "SharePoint Tutor" personality. It will be used to answer questions for users who are learning to use SharePoint. Before deep diving into the code, let's understand the different moving pieces of the Assistants API:

An assistant is a container in which all operations between the AI and the user are managed.

A thread is a list of messages which were exchanged between the user and AI. The thread is also responsible for maintaining the conversation history.

A run is a single invocation of an assistant based on the history in the thread as well as the tools available to the assistant. After a run is executed, new messages are generated and added to the thread.

For the demo code, we will be using the Azure OpenAI service for working with the OpenAI gpt-4o model and since we will be using .NET code, we will need the Azure OpenAI .NET SDK as well as Azure.AI.OpenAI.Assistants nuget packages.

string endpoint = "https://<myopenaiservice>.openai.azure.com/";
string key = "<my-open-ai-service-key>";
string deploymentName = "gpt-4o";
//We are using the Azure OpenAI Service so create an Azure OpenAI client
var azureClient = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));
AssistantClient assistantClient = azureClient.GetAssistantClient();
//Create assistant
AssistantCreationOptions assistantOptions = new()
{
Name = "SharePoint Tutor",
Instructions =
"You are SharePoint Tutor, an expert assistant in guiding users through SharePoint. " +
"Your role is to provide clear, step-by-step instructions, troubleshooting tips, and best practices for managing and navigating SharePoint. " +
"Always respond in a friendly, approachable tone, and aim to make even complex tasks easy to understand. " +
"When users ask for help with specific tasks, provide detailed instructions. If they encounter issues, offer practical solutions and guidance. " +
"Ensure your responses are concise yet informative, and be ready to suggest tips that improve the user's overall SharePoint experience."
};
Assistant assistant = assistantClient.CreateAssistant(deploymentName, assistantOptions);
//Create thread
ThreadCreationOptions threadOptions = new()
{
InitialMessages = { "What is a webpart?" }
};
//Create a run
ThreadRun threadRun = assistantClient.CreateThreadAndRun(assistant.Id, threadOptions, new RunCreationOptions() { });
do
{
Thread.Sleep(TimeSpan.FromSeconds(1));
threadRun = assistantClient.GetRun(threadRun.ThreadId, threadRun.Id);
} while (!threadRun.Status.IsTerminal);
//Get messages from a thread. This includes the most recent created message as well.
CollectionResult<ThreadMessage> messages = assistantClient.GetMessages(threadRun.ThreadId, new MessageCollectionOptions() { Order = MessageCollectionOrder.Ascending });
foreach (ThreadMessage message in messages)
{
Console.Write($"[{message.Role.ToString().ToUpper()}]: ");
foreach (MessageContent contentItem in message.Content)
{
if (!string.IsNullOrEmpty(contentItem.Text))
{
Console.WriteLine($"{contentItem.Text}");
if (contentItem.TextAnnotations.Count > 0)
{
Console.WriteLine();
}
}
}
Console.WriteLine();
}

[USER]: What is a webpart?

[ASSISTANT]: A web part is a versatile and reusable building block in SharePoint that allows you to add dynamic content and functionality to your SharePoint pages. You can think of a web part as a widget that you can place on a page to display information or provide certain functionality, such as showing a calendar, a list of documents, news headlines, or even embedding video content.

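Because the thread maintains the conversation history, a follow-up question can be added to the same thread and answered with a new run. Here is a rough sketch; the exact method and conversion shapes may vary slightly between preview versions of the Assistants SDK:

//Add a follow-up question to the same thread. The earlier messages remain in the thread,
//so the assistant answers with the previous exchange as context.
assistantClient.CreateMessage(threadRun.ThreadId, MessageRole.User, ["How is a web part different from a page?"]);
//Start a new run on the same thread with the same assistant.
ThreadRun followUpRun = assistantClient.CreateRun(threadRun.ThreadId, assistant.Id);
do
{
    Thread.Sleep(TimeSpan.FromSeconds(1));
    followUpRun = assistantClient.GetRun(followUpRun.ThreadId, followUpRun.Id);
} while (!followUpRun.Status.IsTerminal);
//The new answer can then be read with assistantClient.GetMessages, as shown above.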
This was a simple assistant creation just to get us familiar with the Assistants API. In the next posts, we will dive deeper into the API and explore the more advanced concepts. Stay tuned!

Monday, 23 September 2024

Using gpt-4o vision to understand images

OpenAI released gpt-4o recently, which is the new flagship model that can reason across audio, vision, and text in real time. It's a single, multimodal model which can be provided with multiple types of input, and it can understand and respond based on all of them.

The model is also available on Azure OpenAI and today we are going to have a look at how to work with images using the vision capabilities of gpt-4o. We will be providing it with images directly as part of the chat and asking it to analyse the images before responding. Let's see how it works:

We will be using the Azure OpenAI service to work with the OpenAI gpt-4o model, and since we will be using .NET code, we will need the Azure OpenAI .NET SDK v2:

1. Basic image analysis

First, let's start with a simple scenario of sending an image to the model and asking it to describe it.

using Azure.AI.OpenAI;
using Azure;
using OpenAI.Chat;
using System.Text.Json;
namespace OpenAI.SDK.Test
{
internal class Program
{
static async Task Main(string[] args)
{
string endpoint = "https://<myopenaiservice>.openai.azure.com/";
string key = "<my-open-ai-service-key>";
string deploymentName = "gpt-4o";
var openAiClient = new AzureOpenAIClient(
new Uri(endpoint),
new AzureKeyCredential(key),
new AzureOpenAIClientOptions(AzureOpenAIClientOptions.ServiceVersion.V2024_06_01));
var chatClient = openAiClient.GetChatClient(deploymentName);
List<ChatMessage> messages = [
new UserChatMessage(
ChatMessageContentPart.CreateImageMessageContentPart(
new Uri("https://upload.wikimedia.org/wikipedia/commons/thumb/6/68/Orange_tabby_cat_sitting_on_fallen_leaves-Hisashi-01A.jpg/360px-Orange_tabby_cat_sitting_on_fallen_leaves-Hisashi-01A.jpg"),
ImageChatMessageContentPartDetail.High)
),
new UserChatMessage("Describe the image to me")
];
ChatCompletion chatCompletion = chatClient.CompleteChat(messages);
Console.WriteLine($"[ASSISTANT]: {chatCompletion}");
}
}
}

[ASSISTANT]: The image shows a ginger and white cat sitting on a ground covered with dry leaves. The cat has a white chest and paws, with a ginger coat on its back and head. Its ears are perked up, and it appears to be looking intently at something. The background is out of focus, highlighting the cat as the main subject.



2. Answer questions based on details in images

Next, let's give it a slightly more complex image of some ingredients and ask it to create a recipe:

Image source: allrecipes.com
var chatClient = openAiClient.GetChatClient(deploymentName);
List<ChatMessage> messages = [
new UserChatMessage(
ChatMessageContentPart.CreateImageMessageContentPart(
new Uri("https://www.allrecipes.com/thmb/HbnN9fkzDBmzI83sbxOhtbfEQUE=/750x0/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/AR-15022-veggie-pizza-DDMFS-4x3-step-01-a32ad6054e974ecd9f79c8627bc9e811.jpg"),
ImageChatMessageContentPartDetail.High)
),
new UserChatMessage("What can I cook with these ingredients?"),
];
ChatCompletion chatCompletion = chatClient.CompleteChat(messages);
Console.WriteLine($"[ASSISTANT]: {chatCompletion}");

[ASSISTANT]: With these ingredients, you can make a vegetable pizza on a crescent roll crust. Here's a simple recipe:

Ingredients

  • Crescent roll dough (premade)
  • Cream cheese
  • Sour cream
  • Grated carrots
  • Broccoli florets
  • Chopped red bell pepper
  • Sliced radishes
  • Diced onion
  • Sliced celery
  • Ranch dressing mix (or powdered seasoning)

Instructions

  1. Prepare the Crust:

    • Preheat your oven according to the crescent roll package instructions.
    • Roll out the crescent dough onto a baking sheet, pressing the seams together to form a crust.
    • Bake until golden brown. Let it cool completely.
  2. Make the Spread:

    • In a mixing bowl, combine the cream cheese, sour cream, and ranch dressing mix until smooth.
  3. Assemble the Pizza:

    • Spread the cream cheese mixture evenly over the cooled crust.
    • Top with chopped broccoli, grated carrots, diced red bell pepper, sliced radishes, diced onion, and sliced celery.
  4. Serve:

    • Cut into squares and serve chilled.

This is a refreshing and colorful appetizer or snack!


3. Compare images

This one is my favourite: let's give it two images and ask it to compare them against each other. This can be useful in scenarios where there is a single "standard" image and we need to determine whether another image adheres to the standard.

var chatClient = openAiClient.GetChatClient(deploymentName);
List<ChatMessage> messages = [
new UserChatMessage(
ChatMessageContentPart.CreateImageMessageContentPart(
new Uri("https://dalleprodsec.blob.core.windows.net/private/images/generated_00.png"),
ImageChatMessageContentPartDetail.High)
),
new UserChatMessage(
ChatMessageContentPart.CreateImageMessageContentPart(
new Uri("https://dalleprodsec.blob.core.windows.net/private/images/generated_01.png"),
ImageChatMessageContentPartDetail.High)
),
new UserChatMessage("Using the first image as a reference, what should change in the second image so that both images are similar?"),
];
ChatCompletion chatCompletion = chatClient.CompleteChat(messages);
Console.WriteLine($"[ASSISTANT]: {chatCompletion}");

[ASSISTANT]: To make the second image more similar to the first image, consider the following changes:

  1. Noodle Type: Use spaghetti instead of rotini pasta.

  2. Ingredients: Incorporate cherry tomatoes and black olives, as seen in the first image, and reduce the variety of vegetables.

  3. Color Scheme: Emphasize more red and yellow colors from the tomatoes and peppers, similar to the first image.

  4. Garnish and Seasoning: Add fresh herbs like basil and a sprinkle of grated cheese or visible seasoning such as pepper.

  5. Presentation: Arrange the dish more concentrically for a spiral design, as in the first image.


4. Binary data

If the URL of the image is not accessible anonymously, then we can also give the model binary data of the image:


var chatClient = openAiClient.GetChatClient(deploymentName);
List<ChatMessage> messages = [
new UserChatMessage(
ChatMessageContentPart.CreateImageMessageContentPart(BinaryData.FromStream(File.OpenRead("C:\\images\\ROBOT ASTRONAUT .png")), "image/png",
ImageChatMessageContentPartDetail.High)
),
new UserChatMessage("What is in this image?")
];
ChatCompletion chatCompletion = chatClient.CompleteChat(messages);
Console.WriteLine($"[ASSISTANT]: {chatCompletion}");

[ASSISTANT]: The image depicts a futuristic humanoid robot standing in an alien landscape. The robot has a detailed, intricate design with visible mechanical components and a spacesuit-like exterior. In the background, a colorful cosmic scene with stars, a luminous nebula, and a distant planet can be seen, creating a sci-fi atmosphere.


5. Data URI


We can also use data URIs instead of direct URLs:



var chatClient = openAiClient.GetChatClient(deploymentName);
string dataURI = "data:image/jpeg;base64,<long-data-uri-of-image>";
//convert data uri to binary data
byte[] binaryData = Convert.FromBase64String(dataURI.Split(',')[1]);
List<ChatMessage> messages = [
new UserChatMessage(
ChatMessageContentPart.CreateImageMessageContentPart(BinaryData.FromBytes(binaryData), "image/jpeg",
ImageChatMessageContentPartDetail.High)
),
new UserChatMessage("What is in this image?")
];
ChatCompletion chatCompletion = chatClient.CompleteChat(messages);
Console.WriteLine($"[ASSISTANT]: {chatCompletion}");

[ASSISTANT]: The image depicts a dramatic scene of a dragon with red scales and glowing eyes emerging from the clouds. Sunlight beams illuminate the creature, highlighting its sharp features and wings. The setting gives a mystical and powerful atmosphere.


6. Limitations

As per OpenAI docs, there are some limitations of the vision model that we should be aware of:

Medical images: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.

Non-English: The model may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.

Small text: Enlarge text within the image to improve readability, but avoid cropping important details.

Rotation: The model may misinterpret rotated / upside-down text or images.

Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.

Spatial reasoning: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.

Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.

Image shape: The model struggles with panoramic and fisheye images.

Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.

Counting: May give approximate counts for objects in images.

CAPTCHAs: For safety reasons, OpenAI has implemented a system that blocks the submission of CAPTCHAs.


Overall, I do think the ability to combine text and image input as part of the same chat is a game changer! This could unlock a lot of scenarios which were not possible with just a single mode of input. Very excited to see what comes next!

Hope you found the post useful!


Sunday, 10 March 2024

Create a Microsoft 365 Copilot plugin: Extend Microsoft 365 Copilot's knowledge

Microsoft 365 Copilot is an enterprise AI tool that is already grounded in your Microsoft 365 data. If you want to "talk" to data such as your emails, Teams chats or SharePoint documents, then all of it is already available as part of its "knowledge".

However, not all the data you want to work with will live in Microsoft 365. There will be instances when you want to use Copilot's AI on data residing in external systems. So how do we extend the knowledge of Microsoft 365 Copilot with real time data coming from external systems? The answer is by using plugins! Plugins not only help us do Retrieval Augmented Generation (RAG) with Copilot, but they also provide a framework for writing data to external systems. 

To know more about the different Microsoft 365 Copilot extensibility options, please have a look here: https://learn.microsoft.com/en-us/microsoft-365-copilot/extensibility/decision-guide

So in this post, let's have a look at how to build a plugin which talks to an external API and then infuses the real time knowledge into Copilot's AI. At the time of this writing, there is nothing more volatile than Cryptocurrency prices! So, I will be using a cryptocurrency price API and enhance Microsoft 365 Copilot's knowledge with real time Bitcoin and Ethereum rates!


So let's see the different moving parts of the plugin. We will be using a Microsoft Teams message extension built on the Bot Framework as a base for our plugin:  

1) App manifest

This is by far the most important part of the plugin. The name and description (both short and full) are what tell Copilot about the nature of the plugin and when to invoke it to get external data. We have to be very descriptive and clear about the features of the plugin here, as this is what Copilot will use to determine whether the plugin should be invoked. The parameter descriptions are used to tell Copilot how to create the parameters required by the plugin based on the conversation.

{
"$schema": "https://developer.microsoft.com/en-us/json-schemas/teams/v1.16/MicrosoftTeams.schema.json",
"manifestVersion": "1.16",
"version": "1.0.0",
"id": "${{TEAMS_APP_ID}}",
"packageName": "com.microsoft.teams.extension",
"name": {
"short": "Cryptocurrency",
"full": "Cryptocurrency prices"
},
"description": {
"short": "Get the latest cryptocurrency prices",
"full": "Get the latest cryptocurrency prices and share them in a conversation. Search cryptocurrencies by name."
},
"composeExtensions": [
{
"botId": "${{BOT_ID}}",
"commands": [
{
"id": "getPrice",
"description": "Get cryptocurrency price by name",
"title": "Get Cryptocurrentcy Price",
"type": "query",
"initialRun": false,
"fetchTask": false,
"context": [
"commandBox",
"compose",
"message"
],
"parameters": [
{
"name": "cryptocurrencyName",
"title": "Cryptocurrency Name",
"description": "The name of the cryptocurrency. Value should either be bitcoin or ethereum. Output: bitcoin or ethereum",
"inputType": "text"
}
]
}
]
}
]
//Other properties removed for brevity
}

2) Teams messaging extension code

This function does the heavy lifting in our code. It is called by Copilot with the parameters specified in the app manifest. Based on those parameters, we can fetch external data and return it as adaptive cards.

protected override async Task<MessagingExtensionResponse> OnTeamsMessagingExtensionQueryAsync(ITurnContext<IInvokeActivity> turnContext, MessagingExtensionQuery query, CancellationToken cancellationToken)
{
var templateJson = await System.IO.File.ReadAllTextAsync(_adaptiveCardFilePath, cancellationToken);
var template = new AdaptiveCards.Templating.AdaptiveCardTemplate(templateJson);
var text = query?.Parameters?[0]?.Value as string ?? string.Empty;
var rate = await FindRate(text);
var attachments = new List<MessagingExtensionAttachment>();
string resultCard = template.Expand(rate);
var previewCard = new HeroCard
{
Title = rate.id,
Subtitle = rate.rateUsd
}.ToAttachment();
var attachment = new MessagingExtensionAttachment
{
Content = JsonConvert.DeserializeObject(resultCard),
ContentType = "application/vnd.microsoft.card.adaptive",
Preview = previewCard,
};
attachments.Add(attachment);
return new MessagingExtensionResponse
{
ComposeExtension = new MessagingExtensionResult
{
Type = "result",
AttachmentLayout = "list",
Attachments = attachments
}
};
}

3) Talk to the external system (Cryptocurrency API)

This is a helper function which talks to the cryptocurrency API and returns the rates.

private async Task<Rate> FindRate(string currency)
{
var httpClient = new HttpClient();
var response = await httpClient.GetStringAsync($"https://api.coincap.io/v2/rates/{currency}");
var obj = JObject.Parse(response);
var rateData = JsonConvert.DeserializeObject<Rate>(obj["data"].ToString());
rateData.rateUsd = decimal.Parse(rateData.rateUsd).ToString("C", CultureInfo.GetCultureInfo(1033));
return rateData;
}
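The Rate model that FindRate deserializes into is not shown in the post. A minimal class matching the fields used above (the remaining fields follow the coincap /v2/rates response shape and are an assumption) could look like this:

//Minimal model for the coincap /v2/rates response. Only id and rateUsd are used above;
//the other fields are included based on the API's documented response shape (assumed).
public class Rate
{
    public string id { get; set; }
    public string symbol { get; set; }
    public string currencySymbol { get; set; }
    public string type { get; set; }
    public string rateUsd { get; set; }
}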

Hope you found this post useful! 

The code for this solution is available on GitHub: https://github.com/vman/M365CopilotPlugin

Thursday, 15 February 2024

Generate images using Azure OpenAI DALL·E 3 in SPFx

Dall E 3 is the latest AI image generation model from OpenAI. It is leaps and bounds ahead of the previous model, Dall E 2. Having explored both, I find that the image quality as well as the adherence to text prompts is much better with Dall E 3. It is now available as a preview in the Azure OpenAI Service as well.

Given all this, it is safe to say if you are working on the Microsoft stack and want to generate images with AI, using the Azure OpenAI Dall E 3 model would be the recommended option.

In this post, let's explore the image generation API for Dall E 3 and also how to use it from a SharePoint Framework (SPFx) solution. The full code of the solution is available on GitHub: https://github.com/vman/Augmentech.OpenAI

First, let's build the web API which will wrap the Azure OpenAI API to create images. This will be a simple ASP.NET Core Web API which accepts a text prompt and returns the generated image URL to the client.

To run this code, we will need the following NuGet package: https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.13/

[HttpGet]
public async Task<IActionResult> Get(string imagePrompt)
{
string deploymentName = Configuration["AzureOpenAI:ImageModelName"];
Response<ImageGenerations> imageGenerations = await _openAIClient.GetImageGenerationsAsync(
new ImageGenerationOptions()
{
DeploymentName = deploymentName,
Prompt = imagePrompt,
Size = ImageSize.Size1024x1024
});
Uri imageUri = imageGenerations.Value.Data[0].Url;
string revisedPrompt = imageGenerations.Value.Data[0].RevisedPrompt;
return Ok(new
{
Url = imageUri.ToString(),
RevisedPrompt = revisedPrompt
});
}
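The controller above receives _openAIClient through dependency injection, which is not shown in the post. A minimal Program.cs sketch for wiring it up could look like the following; the AzureOpenAI:Endpoint and AzureOpenAI:Key configuration keys are assumptions chosen to match the AzureOpenAI:ImageModelName lookup used above, and the permissive CORS policy is only meant for local development with the SPFx workbench:

// Program.cs (sketch)
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();

//Register the Azure OpenAI client used by the controller. The Endpoint and Key
//configuration keys are assumptions and should match your appsettings.json.
builder.Services.AddSingleton(_ => new OpenAIClient(
    new Uri(builder.Configuration["AzureOpenAI:Endpoint"]),
    new AzureKeyCredential(builder.Configuration["AzureOpenAI:Key"])));

//The SPFx web part calls this API from the browser, so allow cross-origin requests during development.
builder.Services.AddCors(options => options.AddDefaultPolicy(policy =>
    policy.AllowAnyOrigin().AllowAnyHeader().AllowAnyMethod()));

var app = builder.Build();
app.UseCors();
app.MapControllers();
app.Run();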

Now, for calling the API, we will use a standard React-based SPFx web part. The web part will use Fluent UI controls to grab the text prompt from the user and send it to our API.

import * as React from "react";
import styles from "./ImageCreator.module.scss";
import type { IImageCreatorProps } from "./IImageCreatorProps";
import { PrimaryButton, TextField, Image, Stack, Label } from "@fluentui/react";
import { HttpClient } from "@microsoft/sp-http";
const ImageCreator: React.FunctionComponent<IImageCreatorProps> = (props) => {
const [imagePromptValue, setImagePromptValue] = React.useState("");
const [imageSrc, setImageSrc] = React.useState("");
const [revisedPrompt, setRevisedPrompt] = React.useState("");
const _generateImageClicked = async () : Promise<void> => {
const response = await props.wpContext.httpClient.get(`http://localhost:5236/dalle3?imagePrompt=${imagePromptValue}`, HttpClient.configurations.v1);
const responseJson = await response.json();
setImageSrc(responseJson.url);
setRevisedPrompt(responseJson.revisedPrompt);
};
const onChangeFirstTextFieldValue = React.useCallback((event: React.FormEvent<HTMLInputElement | HTMLTextAreaElement>, newValue?: string) => {
setImagePromptValue(newValue || "");
},[]);
return (
<section className={`${styles.imageCreator}`}>
<Stack tokens={{ childrenGap: 5 }}>
<Stack horizontal tokens={{ childrenGap: 5 }}>
<Stack.Item grow={3}>
<TextField value={imagePromptValue} onChange={onChangeFirstTextFieldValue} />
</Stack.Item>
<PrimaryButton text="Generate Image" onClick={_generateImageClicked}/>
</Stack>
<Label>{revisedPrompt}</Label>
<Image src={imageSrc} />
</Stack>
</section>
);
};
export default ImageCreator;
Hope this helps!

Thursday, 25 January 2024

Get structured JSON data back from GPT-4-Turbo

With the latest gpt-4-turbo model out recently, there is one very helpful feature which came with it: The JSON mode option. Using JSON mode, we are able to predictably get responses back from OpenAI in structured JSON format. 

This can help immensely when building APIs using Large Language Models (LLMs). Even though the model could previously be instructed to return JSON in its system prompt, there was no guarantee that it would return valid JSON. With the JSON mode option, we can specify the required format and the model will return data according to it.

To know more about JSON mode, have a look at the official OpenAI docs: https://platform.openai.com/docs/guides/text-generation/json-mode

Now let's look at some code to see how this works in action:

I am using the Azure OpenAI service to host the gpt-4-turbo model and I am also using the v1.0.0.-beta.12 version of the Azure OpenAI .NET SDK found on NuGet here:

https://www.nuget.org/packages/Azure.AI.OpenAI/1.0.0-beta.12

using Azure;
using Azure.AI.OpenAI;
using System.Text.Json;
namespace OpenAIJsonMode
{
internal class Program
{
static async Task Main(string[] args)
{
Uri azureOpenAIResourceUri = new("https://<your-azure-openai-service>.openai.azure.com/");
AzureKeyCredential azureOpenAIApiKey = new("<your-azure-openai-key>");
string deploymentName = "gpt-4-1106-preview"; //Deployment name
OpenAIClient openAIclient = new(azureOpenAIResourceUri, azureOpenAIApiKey);
var completionOptions = new ChatCompletionsOptions()
{
DeploymentName = deploymentName,
Temperature = 1,
MaxTokens = 500,
ResponseFormat = ChatCompletionsResponseFormat.JsonObject,
Messages = {
new ChatRequestSystemMessage("You are a data extraction assistant. " +
"I will give you some text. You will extract and return the names of major cities in the world from the text. " +
"Your response should only contain the names in json format. " +
"The format of the json should be: {\"cities\": [\"<city1>\", \"<city2>\", \"<city3>\"]}}"),
new ChatRequestUserMessage($"TEXT: In the dynamic tapestry of our planet, bustling metropolises emerge as beacons of innovation, culture, and diversity. From the energetic streets of New York, where towering skyscrapers paint the horizon, to the romantic canals of Venice, where gondolas glide through winding waterways, each city possesses a distinct charm. The vibrant streets of Tokyo beckon with neon lights and technological marvels, while in the heart of the desert, Dubai's futuristic skyline dazzles with its architectural wonders. Along the banks of the River Thames, London's historic landmarks stand as a testament to centuries of rich heritage, while the aromatic streets of Marrakech offer an immersive sensory experience. Whether strolling through the vibrant markets of Istanbul or wandering the enchanting streets of Paris, these cities become the tapestry upon which the stories of countless lives are woven. They are the nodes that connect us, the vibrant hubs where dreams are born, and the melting pots of cultures that shape the world.")
}
};
var completionResponse = await openAIclient.GetChatCompletionsAsync(completionOptions);
//LLM response is in JSON format {"cities": ["New York", "Venice", "Tokyo", "Dubai", "London", "Marrakech", "Istanbul", "Paris"]}
var cityResponse = JsonSerializer.Deserialize<CityResponse>(completionResponse.Value.Choices[0].Message.Content);
Console.WriteLine($"The cities mentioned in the text are: {string.Join(", ", cityResponse.cities)}");
}
}
public class CityResponse
{
public List<string> cities { get; set; }
}
}

What is happening in the code is that, in the system message, we are instructing the LLM to analyse the text provided by the user, extract the names of the cities mentioned in it, and return them in the specified JSON format.

Also important is the ResponseFormat = ChatCompletionsResponseFormat.JsonObject option, where we explicitly specify that the response should be returned as JSON.

Next, we provide the actual text to parse in the user message.

Once we get the data back in expected JSON schema, we are able to convert it to objects which can be used in code.

And as expected, we get the following output:

The cities mentioned in the text are: New York, Venice, Tokyo, Dubai, London, Marrakech, Istanbul, Paris

Hope this helps!