
First AI Assistant with LlamaIndex and Azure AI

In today's digital era, many individuals use personal AI assistants to simplify their daily tasks, acquire creative ideas, or access information.

If your data is not publicly available on the internet, a personal AI assistant cannot answer questions about it unless you provide the context in the prompt. For large volumes of data, even that workaround fails, because LLMs have limited context windows.

To overcome this limitation, we'll show you how to create a simple personal AI assistant built on a Retrieval-Augmented Generation (RAG) system using LlamaIndex, combining the capabilities of LLMs with your private data.

The application utilizes various Azure AI components to power the LLM and manage data retrieval.

This article will help you understand and implement the following:

  1. Configuration and Initialization: Setting up API keys, endpoints, and versions for Azure OpenAI and Azure Search.
  2. Initialization of Azure AI Components: Initializing the language model and embedding model with AzureOpenAI.
  3. Azure Search Vector Store Setup: Initializing a client to interact with Azure AI Search Index and setting up the vector store.
  4. Data Loading and Indexing: Creating a function to load and index data from a specified directory.
  5. Setting up the Chat Engine: Building a chat engine from the indexed data and enabling different chat modes for user interaction.

 
Key Azure components utilized in this application:

  • Azure OpenAI: Leveraged for the language model.
  • Azure Search: Used for data indexing and retrieval.
  • Azure AI Services: For text embeddings and related operations.

1.  Installation

 Install the necessary libraries using pip with the following commands:
 
pip install llama-index
pip install llama-index-vector-stores-azureaisearch
pip install azure-search-documents
pip install llama-index-embeddings-azure-openai
pip install llama-index-llms-azure-openai
 
Import the installed libraries:
 
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.core.settings import Settings
from llama_index.core.node_parser import SentenceSplitter
from azure.search.documents.indexes import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
from llama_index.vector_stores.azureaisearch import (
    AzureAISearchVectorStore,
    IndexManagement,
)
 
 

2.  Configuration and Initialization

First, you need to configure Azure OpenAI by setting your API key, endpoint, and version. This step ensures that your application can communicate with the Azure OpenAI services.

# Configuration for Azure OpenAI
api_key = "your_azure_openai_api_key_here"
azure_endpoint = "your_azure_openai_endpoint_here"
api_version = "2024-02-15-preview"
 
Similarly, configure Azure Search by setting the API key and endpoint. This configuration allows your application to interact with Azure Search for indexing and retrieving data.
 
# Configuration for Azure Search Service
search_service_api_key = "your_azure_search_service_admin_key_here"
search_service_endpoint = "your_azure_search_service_endpoint_here"
search_creds = AzureKeyCredential(search_service_api_key)
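
If you plan to share or deploy this script, hardcoded credentials are risky. Here is a minimal alternative sketch that reads them from environment variables instead (the variable names below are our own convention, not required by Azure):

import os

# Read the secrets from environment variables instead of hardcoding them.
# The variable names are illustrative; pick any convention you like.
api_key = os.environ["AZURE_OPENAI_API_KEY"]
azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_version = os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-15-preview")

search_service_api_key = os.environ["AZURE_SEARCH_ADMIN_KEY"]
search_service_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
search_creds = AzureKeyCredential(search_service_api_key)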
 
 


3. Initialization of Azure AI Components

Initialize the language model and embedding model with the specified configurations. We will use the gpt-35-turbo model from AzureOpenAI to generate responses, and text-embedding-ada-002 from AzureOpenAIEmbedding to convert text into numerical representations. In addition to the previously defined API key, Azure endpoint, and API version, note that Azure exposes models through named deployments, so we also pass the deployment name via the deployment_name parameter (replace the placeholders with your own deployment names).

# Initialize the AzureOpenAI language model
llm = AzureOpenAI(
    model="gpt-35-turbo",
    deployment_name="your_llm_deployment_name_here",  # name of your Azure OpenAI deployment
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version
)

# Initialize the embedding model
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="your_embedding_deployment_name_here",  # name of your Azure OpenAI deployment
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version
)
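
Before wiring the models into the rest of the pipeline, it can be worth a quick, optional smoke test to confirm the credentials and deployments work (the prompt text is arbitrary):

# Optional smoke test: confirm both models are reachable before indexing anything.
print(llm.complete("Say hello in one word."))
print(len(embed_model.get_text_embedding("hello")))  # 1536 for text-embedding-ada-002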

 

4. Azure Search Vector Store Setup

Initializing a client to access the Azure AI Search index: a SearchIndexClient is created with the endpoint and credentials configured above. This client allows you to manage and query your search index.

Azure AI Search vector store initialization: next, we set up the AzureAISearchVectorStore, configuring parameters such as the field keys, the embedding dimensionality, and the search settings. index_name is the name of the search index the client interacts with. This store is responsible for storing and retrieving the vector embeddings in your search index.

index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=search_creds
)    

index_name = "llamaindex-vector-demo"

vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
)
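
Note that embedding_dimensionality must match the output size of the embedding model; text-embedding-ada-002 produces 1536-dimensional vectors. If you later swap in a different embedding model, one way to avoid a mismatch is to derive the value from the model itself, for example:

# Derive the dimensionality from the embedding model itself so the index
# definition stays consistent if the model is ever swapped out.
embedding_dimensionality = len(embed_model.get_text_embedding("sample text"))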

 

Define global settings

Before loading the data, we define our global settings. LlamaIndex v0.10 introduced the Settings object, which only needs to be defined once and can be used globally in our downstream code. Here we are configuring our LLM, which responds to prompts and queries, and our embedding model, responsible for converting text to numerical representations.

Settings.llm = llm
Settings.embed_model = embed_model
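
The Settings object can hold other pipeline-wide defaults as well. Since we imported SentenceSplitter above, we could, for example, also control how documents are chunked before embedding (the chunk sizes below are illustrative, not tuned values):

# Optional: control how documents are chunked before embedding. Smaller chunks
# give more precise retrieval; larger chunks keep more context per hit.
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)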

 

5. Data Loading and Indexing

 

The load_and_index_data(path) function takes a path as input; by default it points to a directory named "data" located next to the script. Put all the documents you want to use for retrieval into a directory and pass its path to this function.

docs = SimpleDirectoryReader(input_dir=path, recursive=True).load_data() reads the directory at the given path and loads the documents into the docs variable; recursive=True also includes subdirectories. SimpleDirectoryReader supports many file types, such as csv, docx, ipynb, md, mp3, pdf, png, and more.

The StorageContext is set up with the previously initialized vector store.

Finally, we create a vector store index from the loaded documents and the storage_context.

def load_and_index_data(path="./data"):
    docs = SimpleDirectoryReader(input_dir=path, recursive=True).load_data()
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
    return index
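
Note that every run of load_and_index_data() re-reads and re-embeds all documents. Once the Azure AI Search index has been populated, a possible shortcut is to attach to it directly, sketched below:

# If the Azure AI Search index is already populated, attach to it directly
# instead of re-reading and re-embedding all documents.
def load_existing_index():
    return VectorStoreIndex.from_vector_store(vector_store=vector_store)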

 


6. Setting up the Chat Engine

Here we call the load_and_index_data() function we just implemented to create a vector store index. Next, we build a chat engine from the index and set the chat mode to "condense_question". This mode generates a standalone question from the conversation context and the last message, then queries the query engine with the generated question.

index = load_and_index_data()

chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

 

You can also try out other chat modes:

  • "context" mode is a simple chat mode built on top of a retriever. For each interaction it retrieves text from the index using the input and sets the retrieved text as context in the system prompt. Finally, it returns an answer to the user.
  • "condense_plus_context" is a multi-step chat mode built on top of a retriever that combines condense question and context mode. For each interaction it generates a standalone question from the conversation and latest user message. It then builds a context for the standalone question and passes it along with prompt and user input to the LLM to generate a response.

7. CLI Interaction

Finally, we start a conversation loop in which the user can enter queries and the chatbot responds until the user types "exit".

while True:
    prompt = input("User: ")
    if prompt == "exit":
        break
    response = chat_engine.chat(prompt)
    print(f"Chatbot: {response}")                    

 

You should now be able to run the application and start communicating about the data you provided. The application will keep asking for your input until you type "exit" to stop. 


Examples

To test our application, we put two PDF files, containing information about coffee and tea downloaded from Wikipedia, into the data folder. After running the application and waiting a while for everything to be set up, we could start asking questions via the CLI. Here are some examples with the chatbot's respective answers. The LLM also rewrites the user input for querying, as can be seen in the "Querying with:" output.

Example 1:

[Screenshot: LlamaIndex Example 1]

Example 2:

Here we can see the benefit of condense_question mode: the LLM remembers the last question and combines it with the current input to generate the query and return a response.

[Screenshot: LlamaIndex Example 2]

Example 3:

[Screenshot: LlamaIndex Example 3]

Example 4:

[Screenshot: LlamaIndex Example 4]

 


Conclusion

By following the steps outlined in this article, you have learned how to configure and initialize Azure OpenAI and Azure Search, set up an Azure AI Search vector store, embed and index your data, and build a chat engine on top of it.
 
These steps enable you to create a chatbot that can handle large volumes of private data efficiently, making use of various Azure AI components.
 
Key components used in this application include Azure OpenAI for the language model, Azure Search for data indexing and retrieval, and Azure AI Services for text embeddings and related operations. With the necessary libraries installed and the Azure services configured, you have a robust foundation for an intelligent chatbot.
 
By implementing this RAG-based chatbot system, you can leverage LLMs to interact with your private data efficiently, overcoming the limitations of context size and enhancing the chatbot's capabilities. This foundation opens the door for further customization and improvement to meet specific needs and use cases, making your chatbot a powerful tool for various applications.
 
With these insights and practical steps, you are now equipped to create and deploy your own chatbot using LlamaIndex and Azure AI, empowering you to harness the full potential of modern AI technologies.