This is the second post in a series exploring the OpenAI Assistants API. In this post, we look at the file search capability, which lets us upload files to the Assistants API and chat with them. See the following posts for the entire series:
- Working with OpenAI Assistants: Create a simple assistant
- Working with OpenAI Assistants: Using file search (this post)
- Working with OpenAI Assistants: Chat with Excel files using code interpreter
- Working with OpenAI Assistants: Using code interpreter to generate charts
The file search tool uses the Retrieval Augmented Generation (RAG) pattern, which has become popular recently. The added advantage of using the Assistants API for this is that it manages document chunking, vectorizing and indexing for us. Without the Assistants API, we would have to use a separate service like Azure AI Search and manage the document indexing ourselves.
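The Assistants API does the chunking for us, but to make the idea concrete, here is a minimal, illustrative sketch of fixed-size chunking with overlap. It is not how the service actually implements it: it treats whitespace-separated words as a stand-in for tokens, and `ChunkWithOverlap`, `chunkSize` and `overlap` are hypothetical names. (OpenAI's documentation describes the service's default as 800-token chunks with a 400-token overlap.)

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: splits text into overlapping windows of words.
// Words stand in for tokens here (an assumption); the real service
// tokenizes with the model's tokenizer and chooses its own sizes.
static List<string> ChunkWithOverlap(string text, int chunkSize, int overlap)
{
    if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
    if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

    string[] words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    int step = chunkSize - overlap; // how far the window advances each time

    for (int start = 0; start < words.Length; start += step)
    {
        int count = Math.Min(chunkSize, words.Length - start);
        chunks.Add(string.Join(" ", words, start, count));
        if (start + count >= words.Length) break; // this window reached the end
    }
    return chunks;
}

var sample = "one two three four five six seven eight nine ten";
foreach (string chunk in ChunkWithOverlap(sample, chunkSize: 4, overlap: 2))
    Console.WriteLine(chunk);
```

Each chunk repeats the last two words of the previous one, so a sentence that straddles a chunk boundary still appears intact in at least one chunk. That overlap is why chunked retrieval tends to keep context together.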
To upload and chat with documents using the Assistants API, we need the following moving pieces:
- First, we create a vector store in the Assistants API.
- Then, we upload files using the OpenAI file client and add them to the vector store.
- Finally, we connect the vector store to either an assistant or a thread, which enables the assistant to answer questions based on the documents.
// The Assistants and vector store APIs are marked experimental in the OpenAI .NET SDK.
#pragma warning disable OPENAI001

using System.ClientModel;
using Azure.AI.OpenAI;
using OpenAI.Assistants;
using OpenAI.Files;
using OpenAI.VectorStores;

string endpoint = "https://<myopenaiservice>.openai.azure.com/";
string key = "<my-open-ai-service-key>";
string deploymentName = "gpt-4o";

var azureClient = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));
OpenAIFileClient fileClient = azureClient.GetOpenAIFileClient();
AssistantClient assistantClient = azureClient.GetAssistantClient();
VectorStoreClient vectorClient = azureClient.GetVectorStoreClient();

// Create the vector store and wait for the operation to complete.
var vectorStore = vectorClient.CreateVectorStore(true, new VectorStoreCreationOptions()
{
    Name = "focusworks_ai_vector_store",
    // Expire the vector store after 3 days of inactivity.
    ExpirationPolicy = new VectorStoreExpirationPolicy()
    {
        Anchor = VectorStoreExpirationAnchor.LastActiveAt,
        Days = 3
    }
});

// Create and upload a sample document.
using Stream document = BinaryData.FromBytes(@"Focusworks AI is a versatile productivity tool designed to streamline your workflow and enhance collaboration within Microsoft Teams. With its internet-connected ChatGPT bot, you can engage in insightful conversations on any topic, leveraging a rich knowledge base to gain valuable insights. It also empowers you to create stunning AI-powered images effortlessly, simply by describing what you envision in your own words.
One of the standout features of Focusworks AI is its ability to interact with your data. You can upload documents, ask questions, and have a dynamic conversation with your information, uncovering details and insights you might have missed. The AI is also tailored to help you craft more effective Teams messages, improving communication quality and ensuring your ideas are clearly conveyed. Additionally, it can summarize both your personal and group chats, making it easy to extract key points and stay updated.
Sharing your generated content and insights with colleagues is made seamless through Focusworks AI. You can post directly to Teams channels and group chats, ensuring everyone stays informed. The intuitive dashboard allows you to view all your recently created content and quickly access the relevant chats or channels, keeping your workflow organized and efficient. With Focusworks AI, you can eliminate information overload and enjoy a more productive work environment. Try the app for free and conveniently upgrade to a subscription if it elevates your workflow!"u8.ToArray()).ToStream();
OpenAIFile infoFile = await fileClient.UploadFileAsync(document, "focusworks_ai.txt", FileUploadPurpose.Assistants);

// Add the uploaded file to the vector store and wait for the operation to complete.
await vectorClient.AddFileToVectorStoreAsync(vectorStore.VectorStoreId, infoFile.Id, true);

AssistantCreationOptions assistantOptions = new()
{
    Name = "FileSearchPro",
    Instructions =
        @"You are FileSearchPro, an intelligent assistant designed to help users locate information within their uploaded files. Your primary function is to search through these documents and provide accurate, concise answers to users' questions. You understand various file types and can extract relevant data, ensuring users get the information they need quickly and efficiently.
Key Features:
Efficiently search all uploaded documents to extract precise information.
Provide clear, straightforward answers directly from the file contents.
Maintain confidentiality and security of all user data.
Offer guidance on effective search queries if needed.
Always strive to deliver accurate and helpful information, enhancing users' ability to access and utilize their stored documents effectively.",
    Tools =
    {
        new FileSearchToolDefinition(),
    },
    // Vector stores can be attached at the assistant level.
    ToolResources = new()
    {
        FileSearch = new()
        {
            VectorStoreIds = { vectorStore.VectorStoreId },
        }
    }
};

Assistant assistant = assistantClient.CreateAssistant(deploymentName, assistantOptions);

ThreadCreationOptions threadOptions = new()
{
    InitialMessages = { "What is Focusworks AI?" },
    // Vector stores can also be attached at the thread level instead.
    //ToolResources = new()
    //{
    //    FileSearch = new()
    //    {
    //        VectorStoreIds = { vectorStore.VectorStoreId },
    //    }
    //}
};

ThreadRun threadRun = assistantClient.CreateThreadAndRun(assistant.Id, threadOptions);

// Poll the run once per second until it reaches a terminal state.
do
{
    await Task.Delay(TimeSpan.FromSeconds(1));
    threadRun = assistantClient.GetRun(threadRun.ThreadId, threadRun.Id);
} while (!threadRun.Status.IsTerminal);

// Print the conversation in chronological order.
CollectionResult<ThreadMessage> messages = assistantClient.GetMessages(threadRun.ThreadId, new MessageCollectionOptions() { Order = MessageCollectionOrder.Ascending });
foreach (ThreadMessage message in messages)
{
    Console.Write($"[{message.Role.ToString().ToUpper()}]: ");
    foreach (MessageContent contentItem in message.Content)
    {
        if (!string.IsNullOrEmpty(contentItem.Text))
        {
            Console.WriteLine($"{contentItem.Text}");
            if (contentItem.TextAnnotations.Count > 0)
            {
                Console.WriteLine();
            }
            // Include annotations (file citations and generated files), if any.
            foreach (TextAnnotation annotation in contentItem.TextAnnotations)
            {
                if (!string.IsNullOrEmpty(annotation.InputFileId))
                {
                    Console.WriteLine($"* File citation, file ID: {annotation.InputFileId}");
                }
                if (!string.IsNullOrEmpty(annotation.OutputFileId))
                {
                    Console.WriteLine($"* File output, new file ID: {annotation.OutputFileId}");
                }
            }
        }
    }
}
[USER]: What is Focusworks AI?
[ASSISTANT]: Focusworks AI is a productivity tool designed to enhance collaboration and streamline workflows within Microsoft Teams. It features an internet-connected ChatGPT bot that allows users to engage in insightful conversations and gain valuable insights from a rich knowledge base. The tool can create AI-powered images by simply describing users' visions. A key feature of Focusworks AI is its ability to interact with users' data, enabling document uploads and dynamic conversations to uncover insights. It also improves communication by helping craft more effective Teams messages and by summarizing both personal and group chats.
* File citation, file ID: assistant-VVwKBdUixwPyk6RuOEnJpixh
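The polling loop in the code above waits a fixed second between checks and has no upper bound on how long it keeps waiting. As a sketch of one alternative, here is a hypothetical helper (`PollUntilAsync` is not part of the OpenAI SDK) that polls with exponential backoff and gives up after a timeout. To keep it self-contained, a simulated status source stands in for the real run status.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK API): calls `poll` until `isDone` returns true,
// doubling the delay between calls up to `maxDelay`, and throws a
// TimeoutException once waiting any longer would exceed `timeout`.
static async Task<T> PollUntilAsync<T>(
    Func<CancellationToken, Task<T>> poll,
    Func<T, bool> isDone,
    TimeSpan initialDelay,
    TimeSpan maxDelay,
    TimeSpan timeout,
    CancellationToken cancellationToken = default)
{
    var stopwatch = Stopwatch.StartNew();
    TimeSpan delay = initialDelay;
    while (true)
    {
        T result = await poll(cancellationToken);
        if (isDone(result)) return result;

        if (stopwatch.Elapsed + delay > timeout)
            throw new TimeoutException($"Condition not met within {timeout}.");

        await Task.Delay(delay, cancellationToken);
        // Exponential backoff, capped at maxDelay.
        delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
    }
}

// Simulated status source: reports "completed" on the third call.
int calls = 0;
string finalStatus = await PollUntilAsync(
    _ => Task.FromResult(++calls >= 3 ? "completed" : "in_progress"),
    status => status == "completed",
    initialDelay: TimeSpan.FromMilliseconds(10),
    maxDelay: TimeSpan.FromMilliseconds(40),
    timeout: TimeSpan.FromSeconds(5));

Console.WriteLine($"{finalStatus} after {calls} polls");
```

Against the real service, the `poll` delegate would be something like `async _ => (await assistantClient.GetRunAsync(threadRun.ThreadId, threadRun.Id)).Value` and `isDone` would be `run => run.Status.IsTerminal`. Backoff keeps short runs responsive while reducing the number of requests for long ones.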
Limitations
- Each vector store can hold up to 10,000 files.
- The maximum size of an uploaded file is 512 MB, and each file can contain no more than 5,000,000 tokens (computed automatically when you attach a file).
- No support for deterministic pre-search filtering using custom metadata.
- No support for parsing images within documents (including images of charts, graphs, tables, etc.).
- No support for retrieval over structured file formats (like CSV or JSONL).
- Limited support for summarization: the tool is currently optimized for search queries.
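Rather than discovering the size and count limits through a failed upload, a batch can be checked locally first. Below is a hypothetical helper (`ValidateUpload` is not an SDK API) that checks the file-size and file-count limits listed above. It assumes 512 MB means 512 × 1024 × 1024 bytes and skips the token limit, which would require the model's tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Limits from the list above. Assumption: 512 MB means 512 * 1024 * 1024 bytes.
const long MaxFileBytes = 512L * 1024 * 1024;
const int MaxFilesPerStore = 10_000;

// Hypothetical pre-upload check (not part of the OpenAI SDK). Returns the
// problems found; an empty list means the batch is within the size and count
// limits. The 5,000,000-token limit is not checked here.
List<string> ValidateUpload(IReadOnlyList<(string Name, long SizeBytes)> files, int filesAlreadyInStore)
{
    var problems = new List<string>();

    int totalFiles = filesAlreadyInStore + files.Count;
    if (totalFiles > MaxFilesPerStore)
        problems.Add($"The vector store would hold {totalFiles} files; the limit is {MaxFilesPerStore}.");

    foreach (var (name, size) in files)
    {
        if (size > MaxFileBytes)
            problems.Add($"{name} is {size} bytes; the limit is {MaxFileBytes} bytes.");
    }
    return problems;
}

// Example batch with made-up file names and sizes.
var batch = new List<(string Name, long SizeBytes)>
{
    ("small.txt", 2_048),
    ("too_big.pdf", 600L * 1024 * 1024),
};
foreach (string problem in ValidateUpload(batch, filesAlreadyInStore: 0))
    Console.WriteLine(problem);
```

Here only `too_big.pdf` is reported. Returning a list of problems instead of throwing on the first one lets you show every issue in a batch at once.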