September 09, 2023

Building LLM-based Application using Langchain and OpenAI

Alex Semeniuchenko

Lead of AI Engineering

14 min

How Brocoders ran an R&D project to fulfill our client's requirements for a chat application that seamlessly integrates with their large knowledge database.

Initial Statement

C.I.A.Services is a reputable full-service company catering to homeowners associations in the USA. With approximately 150 clients, the company effectively manages over 50,000 properties in the Houston and San Antonio areas.

The client asked Brocoders to carry out R&D on an AI chatbot capable of addressing inquiries about property management and community association management services. This chatbot would leverage the association's documents, videos, and frequently asked questions (FAQs) to provide accurate responses.

The desired outcome is to have the chatbot deliver precise and relevant answers to questions posed by C.I.A.Services employees. This will enable the company's team members to efficiently access information, enhance customer service, and expedite their workflow.

However, there were certain challenges that we needed to overcome:

  • Various types of data: PDF and Word documents, videos, and FAQs
  • Varying quality of the documents containing the relevant information
  • Building not just a question-answer chat, but one that understands custom questions and preserves the context of the conversation

After gathering all the requirements from the client, we started our R&D process.

Initial solution concept

The initial concept revolves around leveraging the Documents Indexing Processor (GPT Index) to establish a connection with the GPT/LLM model.

Process:

  1. Building the Knowledge Database
  • Configure the Documents Indexing Processor to establish a connection with the client's data sources and ingest the documents.
  • Implement the GPT model to collaborate with the Documents Indexing Processor.
  2. Developing a Web Interface for the Chatbot
  • Create a user-friendly web interface for the chatbot that captures user details and comprehends user inquiries.
  3. Providing Automated Updates
  • Offer a streamlined and automated method to update the AI model based on the client's documents and instructions.

Building knowledge database

We began creating a knowledge database using the client's documents. This database serves as a crucial resource for the chatbot's functionality.

Acquiring Documents

To gather all the documents in one place, we could either upload them to our own database, using a JSON file with all the links that we were provided with, or work directly from the client's server. We uploaded all the documents to our own server to have more control over the data.

Converting all information to text and analyzing its quality

First of all, we needed to convert all the information into text. To extract text from the documents and make it accessible for further analysis, we employed OCR technology. Specifically, we utilized the Tesseract OCR engine, a popular open-source solution, to convert document images and scanned pages into machine-readable text. We also leveraged Tesseract to evaluate the quality of each document. This assessment helps identify issues introduced during the OCR process, such as incorrect character recognition or formatting discrepancies. Document quality assessment is crucial to ensure the reliability and accuracy of the knowledge database. It also helps us find areas for improvement in the overall result: we improve one part of the whole process (for example, OCR accuracy) and see how it impacts the final outcome.
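
Below is a minimal sketch of this step, assuming pytesseract and pdf2image are installed (with the Tesseract and Poppler binaries available); the file name is hypothetical, and averaging Tesseract's per-word confidence scores is one simple way to get a rough quality signal.

```python
from pdf2image import convert_from_path  # renders PDF pages to images
import pytesseract

# Hypothetical document name; in practice we iterated over the whole corpus.
pages = convert_from_path("association_bylaws.pdf", dpi=300)

for i, page in enumerate(pages):
    # Plain text extraction for the knowledge database.
    text = pytesseract.image_to_string(page)

    # Per-word confidence scores, averaged as a rough quality signal.
    data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)
    scores = [float(c) for c in data["conf"] if float(c) >= 0]
    avg_conf = sum(scores) / len(scores) if scores else 0.0
    print(f"page {i}: {len(text)} chars, mean OCR confidence {avg_conf:.1f}")
```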

This database, containing text-based information, serves as the foundation for the subsequent stages of our research.

Adding embeddings

In the field of natural language processing (NLP), embeddings serve as a technique to transform textual data into a numerical format that can be understood and processed by machine learning algorithms. For our knowledge database, we utilized OpenAI API to generate embeddings. OpenAI's text embeddings are designed to measure the relationship between different text strings.
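
As an illustration, here is roughly how embeddings are generated with the pre-1.0 OpenAI Python SDK that was current at the time; the model name and sample strings are our own examples:

```python
import numpy as np
import openai  # pre-1.0 SDK interface

openai.api_key = "sk-..."  # your API key

resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=[
        "What is the due date for payments?",
        "Assessments are due on the first day of each month.",
    ],
)
a, b = (np.array(item["embedding"]) for item in resp["data"])

# Cosine similarity measures how related the two strings are.
similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {similarity:.3f}")
```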

These embeddings find applications in various areas, including:

  • Search: Text strings can be ranked based on their relevance to a query, enabling effective search functionality.
  • Clustering: Text strings can be grouped together based on their similarity, facilitating clustering analysis.
  • Recommendations: By considering the relatedness of text strings, items with similar content can be recommended to users.
  • Anomaly detection: Outliers that exhibit minimal relatedness to other text strings can be identified, aiding in anomaly detection tasks.
  • Diversity measurement: Similarity distributions of text strings can be analyzed to assess the diversity within a dataset.
  • Classification: Text strings can be classified based on their similarity to predefined labels, enabling effective classification tasks.

Developing a Context-Aware Chatbot: Exploring Potential and Challenges with LlamaIndex and LangChain

Our custom data source was ready, and now we had to connect it to a large language model. For this we chose LlamaIndex, a flexible and straightforward data framework that provides the key tools to augment LLM applications with data: data ingestion, data indexing, and a query interface.
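
In the 2023-era LlamaIndex API, the basic flow looked roughly like this (the directory path is a placeholder, and the library's interfaces have since changed):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Ingestion: load every document from a local folder (placeholder path).
documents = SimpleDirectoryReader("./knowledge_base").load_data()

# Indexing: embed the documents and store them in a vector index.
index = VectorStoreIndex.from_documents(documents)

# Query interface: ask questions against the indexed documents.
query_engine = index.as_query_engine()
print(query_engine.query("What is the due date for payments?"))
```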

Initially, our approach involved utilizing the conversation context encapsulation capabilities offered by LlamaIndex. However, we encountered some challenges, because LlamaIndex was undergoing frequent updates. Additionally, the tools LlamaIndex provided for chat context handling essentially acted as a wrapper around another powerful technology known as LangChain. Therefore, to optimize our approach, we decided to delve deeper into the core technology, LangChain.

LangChain is an open-source development framework specifically designed for applications utilizing large language models (LLMs). It provides various components that serve as abstractions, enabling more efficient and programmatic utilization of LLMs.

These components include:

  • Models: Such as ChatGPT or other large language models.
  • Prompts: These encompass prompt templates and output parsers.
  • Indexes: Ingesting external data, including document loaders and vector stores.
  • Chains: Combining different components to create end-to-end use cases. For instance, a simple chain could involve Prompt + LLM + Output Parser.
  • Agents: Facilitating the utilization of external tools by LLMs.

By exploring the potential of LangChain, we aim to enhance the abilities of LLMs and develop chatbot solutions that excel at maintaining conversation context and delivering more advanced functionalities.

Indexes

Indexes in LangChain are used to structure documents for optimal interaction with large language models (LLMs). The indexing module provides utility functions and examples for working with different types of indexes.

The most common use of indexes in LangChain is for retrieval, where relevant documents are fetched based on a user's query. LangChain primarily supports vector databases as the main index type.

The indexing components in LangChain include document loaders, text splitters, vector stores, and retrievers. Document loaders retrieve documents from various sources, text splitters break text into smaller chunks, vector stores are the main index type relying on embeddings, and retrievers fetch relevant documents for use with language models.
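
To make the pipeline concrete, here is a sketch of how these four components fit together in the LangChain API of that period (the file name, chunk sizes, and k are illustrative choices, not values from the project):

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Document loader: read a source document (placeholder file name).
docs = PyPDFLoader("association_bylaws.pdf").load()

# Text splitter: break the text into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Vector store: embed the chunks and index them for similarity search.
db = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Retriever: fetch the most relevant chunks for a query.
retriever = db.as_retriever(search_kwargs={"k": 4})
```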

During testing, we experimented with different index types, including tree, vector, and graph indexes. The vector index showed the most promising results, leading us to focus on its utilization.

By leveraging LangChain's indexing capabilities, we optimized our chatbot's performance, ensuring efficient retrieval and use of relevant information.

Below is some information regarding various index types and our conclusions after testing them.

  1. Vector Index

The vector store index stores each Node and a corresponding embedding in a Vector Store.

Our conclusion: The Vector Index is primarily used for the retrieval of specific context. It is particularly effective when the query contains a piece of context that can be found in the document text. From our perspective, the Vector Index is the best index available: it outperforms other indices in retrieving information and works well with all types of questions, with the exception of summarization. Its most significant advantage is the rapid retrieval of the required information. We experimented with different vector databases, specifically the one that LlamaIndex uses by default and another one called Chroma, aiming to identify which database offered the most efficient storage and retrieval of data.

  2. Tree Index

The tree index builds a hierarchical tree from a set of Nodes (which become leaf nodes in this tree).

Our conclusion: The Tree Index is best suited for summarizing documents or sets of documents. It can also be used for simple retrieval, but the Vector Index is more efficient for this purpose. The main drawback of the Tree Index is its relatively long response time, making it more suitable for summarization tasks.

  3. List Index

The list index simply stores Nodes as a sequential chain.

Our conclusion: Similar to the Tree Index, the List Index is also best suited for summarizing documents or sets of documents. However, it differs in its underlying implementation. From our standpoint, the Tree Index outperforms the List Index in terms of speed and accuracy in finding the right information; the List Index was found to be slower and less efficient.

  4. Graph Indices

Graph Indices are a type of index where we have a root index, such as a List Index, that contains other indices. This type of index is useful when we have a multitude of different indices and we want to combine them into one. All sub-indices should have a good description so that the appropriate index can be chosen for use. There are three types of Graph Indices:

  • List Index with Vector Indices. In theory, splitting one large index into smaller indices should reduce response time, but this was not observed with this type of graph.
  • Tree Index with Vector Indices. This is similar to the List Index with Vector Indices, but it operates slightly faster.
  • Simple Keyword Index with Vector Indices. This type of index works best for split indices. It significantly speeds up our response time compared to one large index. However, all sub-indices should have a very good description with a lot of keywords, as this type of graph will select the sub-index that has the most keywords in its description.

Vector stores

We used vector stores to store and search information via the embeddings we generated in the previous step. A VectorStore serves as a storage facility for these embeddings, allowing efficient search based on semantic similarity.
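
For illustration, a semantic search against a persisted Chroma store in the 2023-era LangChain API might look like this (the directory path and query are placeholders):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Reopen a previously persisted store (placeholder directory).
db = Chroma(persist_directory="./chroma_db", embedding_function=OpenAIEmbeddings())

# Retrieve the chunks most semantically similar to the query.
hits = db.similarity_search_with_score("What is the due date for payments?", k=4)
for doc, score in hits:
    print(f"{score:.3f}  {doc.page_content[:80]}")
```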

Chains

Having confirmed the effectiveness of our vector-based search in retrieving relevant documents, our next objective is to feed them into the LLM to generate more detailed and human-like responses.

To accomplish this, we can utilize the chain component of LangChain, which allows us to create sequences of modular components for specific use cases. In the context of interacting with indexes and combining our own data with LLMs, LangChain provides index-related chains. One prominent example is question answering over your own documents, where the goal is to integrate your indexed data with the LLM. To achieve this, LangChain supports four common methods or chains:

  1. Stuffing

The simplest approach involves including all relevant data as context in the prompt passed to the LLM. This is implemented as the StuffDocumentsChain in LangChain.
Pros: Only requires a single call to the LLM, allowing access to all the data at once.
Cons: Limited by the context length of the LLM, making it unsuitable for large documents or multiple documents exceeding the context length.

  2. Map Reduce

This method entails running an initial prompt on each chunk of data and generating individual outputs. Subsequently, a separate prompt is used to combine these initial outputs.

Pros: Can handle larger and more numerous documents than the Stuffing approach. Calls to the LLM for individual documents are independent and can be parallelized.
Cons: Requires multiple calls to the LLM, and some information may be lost during the final combination.

  3. Refine

The refine method involves running an initial prompt on the first chunk of data and generating output. This output, along with the subsequent document, is then passed to the LLM to refine the response based on the new information.

Pros: Allows for pulling in more relevant context and may retain more information compared to the Map Reduce method.
Cons: Requires multiple calls to the LLM, and the calls are not independent, preventing parallelization. The order of the documents may also impact the results.

  4. Map-Rerank

This approach entails running an initial prompt on each chunk of data, considering not only task completion but also assigning a score indicating the certainty of the answer. The responses are then ranked based on these scores, and the highest-scoring answer is returned.

Pros: Similar advantages to Map Reduce, with fewer calls to the LLM.
Cons: Cannot combine information between documents, making it most suitable for scenarios with a single, straightforward answer within a single document.

By leveraging the Stuffing method, we can effectively process and integrate our indexed data with LLMs to generate more comprehensive and contextually appropriate responses.
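
In LangChain, these four methods are selected through the chain_type argument of the question-answering chain loader. A minimal sketch, assuming the retriever built in the indexing step above:

```python
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# chain_type can be "stuff", "map_reduce", "refine", or "map_rerank".
chain = load_qa_chain(llm, chain_type="stuff")

question = "What is the due date for payments?"
docs = retriever.get_relevant_documents(question)  # retriever from the indexing sketch
print(chain.run(input_documents=docs, question=question))
```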

We also built our chain on top of the RetrievalQA and Conversational Retrieval QA chains provided by LangChain.
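
Here is a sketch of both chains in the LangChain API of the time; the memory configuration shown is one common choice, not necessarily the one we shipped:

```python
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# One-shot question answering over the retriever built earlier.
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

# Conversational variant that carries the chat history between turns.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chat = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)

print(chat({"question": "What is the due date for payments?"})["answer"])
```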

Agents

In certain applications, the required chain of calls to LLMs or other tools may not be predetermined but rather dependent on the user's input. In such cases, an "agent" is employed, which has access to a range of tools. Based on the user's input, the agent determines which tools, if any, should be called.

During our exploration, we examined various agent implementations and the tools provided by LangChain. Unfortunately, none proved to be a satisfactory fit for our specific requirements.

Nevertheless, we discovered that LangChain offers the flexibility to create our own agent and tool implementations. With this in mind, we embarked on devising an implementation tailored to our needs. However, we encountered an error within LangChain: our tool, built using an index generated by LlamaIndex, was not recognized as a valid tool. The underlying cause of this error remains elusive at this time.

We have shifted our strategy towards constructing a tool exclusively using LangChain's tools, without relying on LlamaIndex. We speculate that this approach may yield positive results, as LangChain's index exhibits slight differences compared to that of LlamaIndex. This divergence might potentially address the error we encountered in our previous attempts.
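
A pure-LangChain version of such a tool might look like the following sketch; the tool name and description are hypothetical, and qa is the RetrievalQA chain from the earlier sketch:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Wrap the retrieval QA chain as a tool the agent can decide to call.
tools = [
    Tool(
        name="association_docs",  # hypothetical name
        func=qa.run,              # qa: the RetrievalQA chain built earlier
        description="Answers questions about HOA documents, payments, and community rules.",
    )
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("What is the due date for payments?")
```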

Prompt templates

A PromptValue represents the final value passed to the model. Typically, this value is not hardcoded but dynamically generated based on a combination of user input, non-static information from multiple sources, and a fixed template string. The component responsible for creating the PromptValue is called a PromptTemplate. It exposes a method that takes input variables and returns the corresponding PromptValue.
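
For example, a template of the kind we experimented with (the wording here is illustrative, not our production prompt):

```python
from langchain.prompts import PromptTemplate

template = """Answer the question using only the context below.
If the answer is not in the context, reply exactly with "No answer found".

Context: {context}
Question: {question}
Answer:"""

prompt = PromptTemplate(input_variables=["context", "question"], template=template)

# format() fills in the variables and returns the final prompt value.
print(prompt.format(
    context="Assessments are due on the first day of each month.",
    question="What is the due date for payments?",
))
```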

Prompt Experimentation
We conducted experiments with various prompts to optimize the chatbot's responses. Our goal was to engineer prompts that would elicit the most relevant and useful responses from the chatbot.

LLM Testing
We evaluated different versions of LLMs, including text-davinci-003, gpt-3.5-turbo, and gpt-4. The text-davinci-003 model performed modestly, while the differences between gpt-3.5-turbo and gpt-4 results were not significant.

Input Parameter Testing
We explored various input parameters for the LLM, such as the number of input tokens and output tokens, to assess their impact on the chatbot's performance.

Key Parameters:

  • Prompt: This refers to the input text that you want the AI to respond to. It can be a question, statement, or any other text you want the model to process.
  • Max tokens: This parameter sets the maximum length of the generated response, specifying the maximum number of tokens (text chunks) the model should produce.
  • Temperature: This parameter controls the level of randomness in the model's output. Higher values (close to 1.0) make the output more diverse and creative, while lower values (close to 0.0) make it more deterministic and focused.
  • Top-p / top-k: These parameters are used for techniques like nucleus sampling or top-k sampling, which probabilistically pick the next token in the sequence to add diversity to the model's output.
  • Frequency penalty: This parameter discourages the use of common phrases or responses. Higher values promote more original output, while lower values allow more common phrases.
  • Presence penalty: This parameter encourages the model to introduce new topics in its output. Higher values lead to more diverse topics, while lower values keep the output focused on the input topic.
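
Here is how these parameters map onto a call in the pre-1.0 OpenAI Python SDK that was current at the time; the values shown are examples, not our tuned settings:

```python
import openai  # pre-1.0 SDK interface

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the due date for payments?"}],
    max_tokens=256,         # upper bound on the answer length
    temperature=0.2,        # low randomness for factual answers
    top_p=1.0,              # nucleus sampling threshold
    frequency_penalty=0.0,  # > 0 discourages repeated phrases
    presence_penalty=0.0,   # > 0 encourages new topics
)
print(response["choices"][0]["message"]["content"])
```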

Compare/Contrast Queries Technique

This technique decomposes complex questions into sub-questions so that the chatbot can collect all the necessary data from our documents. Sometimes it decomposes a question well and asks the right sub-questions, but sometimes it does not. The main issue with this technique is the response time, which is excessively long even for small sets of data.
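
One way to reproduce this behavior with the 2023-era LlamaIndex API is the sub-question query engine sketched below (the directory path, tool name, and description are placeholders):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

documents = SimpleDirectoryReader("./knowledge_base").load_data()
index = VectorStoreIndex.from_documents(documents)

# The engine decomposes a complex question into sub-questions and routes
# each one to the tool whose description matches best.
tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(name="hoa_docs", description="HOA governing documents"),
    )
]
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
print(engine.query("Compare the payment rules of the two communities."))
```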

Enhancing AI Performance through Question Decomposition and Context Retrieval Optimization

Our ongoing investigations have led us to hypothesize that some inaccurate responses from OpenAI's models could potentially stem from sub-optimal context retrieval by LangChain. In light of this, we have embarked on implementing and testing custom retrieval processing to verify and potentially rectify this issue.

In our quest for enhanced accuracy, we have initiated an innovative research project focused on generating synonymous questions to more accurately capture the appropriate context from the index. Intriguingly, our preliminary testing revealed a subjective accuracy score of 0.7. We observed that slight variations in question phrasing can lead to significant differences in the accuracy of the response.

For instance:
Question: "What is the due date for payments?" - produced no answer.
Question: "What are the due rules for payments?" - produced the required answer.

This pattern was also observed with a different set of questions:
Question: "For how long should confidential data be kept?" - produced the required answer.
Question: "For how long should confidential information be kept?" - produced no answer.

In light of these findings, we explored further by rephrasing questions that initially did not yield an answer. For example, we inputted the question, "For how long should confidential information be kept?" and obtained a list of synonymous questions that yielded appropriate responses. Some of these synonymous questions include:
"What is the required duration for retaining sensitive data?"
"Is there a specified time frame for maintaining private information?"
"What is the time period for the storage of classified information?"
"Does the document specify how long confidential details should be preserved?"
"For what duration is secret information intended to be kept?"
"How long does the document suggest to hold proprietary data?"
"Is there a recommended period for keeping confidential information secure?"

This research indicates that we may be able to engineer prompts to indicate whether a response is successful (true/false) and possibly even the source of the response (no answer/from context/from LLM knowledge). This functionality could allow us to filter responses and feed several questions until the required answer is achieved, potentially enhancing the performance and utility of our AI systems.
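
A sketch of that filter-and-retry loop, assuming a qa_chain whose prompt replies "No answer found" when the context lacks the answer (a prompt convention of ours, not a library feature) and the pre-1.0 OpenAI SDK for generating paraphrases:

```python
import openai  # pre-1.0 SDK interface

def paraphrase(question: str, n: int = 5) -> list[str]:
    """Ask the LLM for n synonymous rephrasings of a question."""
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Rephrase the following question {n} different ways, one per line:\n{question}",
        }],
    )
    lines = resp["choices"][0]["message"]["content"].splitlines()
    return [line.strip() for line in lines if line.strip()]

def ask_until_answered(qa_chain, question: str) -> str:
    """Try the original question, then paraphrases, until an answer appears."""
    for candidate in [question, *paraphrase(question)]:
        answer = qa_chain.run(candidate)
        if "no answer" not in answer.lower():
            return answer
    return "No answer found"
```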

Developing a Web Interface for the Chatbot

Finally, we can move on to the main part: how to show this to the customer. To do this, we had to prepare the backend and the frontend.

  • Python web server: make a Python web server and refactor the index server endpoints (4) with a community query param.
  • Database and backend development: build a database and backend using Node.js for the web server; integrate with the LangChain service and provide endpoints for the front-end, including an HTTP client module, public endpoints, and file upload.
  • Frontend development: design the frontend part, including the chat feature, front-end configuration (linting etc.), and the change-community refactor (localStorage).
Thank you for reading! Leave us your feedback!