Together AI provides the fastest cloud platform for building and running generative AI. Today we are launching the Together Embeddings endpoint. As part of a series of blog posts about the Together Embeddings endpoint release, we are excited to announce that you can build your own powerful RAG-based application right from the Together platform with LlamaIndex.
What is Retrieval Augmented Generation (RAG)?
Retrieval Augmented Generation (RAG) (original paper, Lewis et al.) leverages both generative models and retrieval models for knowledge-intensive tasks. It improves generative AI applications by supplying up-to-date information and domain-specific data from external data sources at response-generation time, reducing the risk of hallucinations and significantly improving accuracy.
Building a RAG system is cost- and data-efficient: it does not require the technical expertise needed to train a model from scratch, while still providing the advantages mentioned above. Note that you can still fine-tune an embedding or generative model to improve the quality of your RAG solution even further! Check out the Together fine-tuning API to get started.
To build a RAG system, you first need to create a vector store by indexing your source documents with an embedding model of your choice. LlamaIndex provides libraries to load and transform documents. After this step, you will create a VectorStoreIndex for your document objects with vector embeddings, and store them in a vector store. LlamaIndex supports numerous vector stores; see the complete list of supported vector stores here. Then, at query time, you will retrieve relevant information from the vector store, augment your original query with it, and use an LLM to generate the final output.
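Conceptually, the retrieve-then-augment step boils down to placing the retrieved chunks into the prompt alongside the user's question. Here is a minimal, framework-free sketch of that step (the chunk text and prompt template below are illustrative, not part of any LlamaIndex API):

```python
def build_augmented_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved context with the user's query into a single LLM prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Illustrative usage: in a real system, the chunks would come from the
# vector store's retriever rather than being hard-coded.
chunks = ["RedPajama-Data-v2 is an open dataset with 30 trillion tokens."]
prompt = build_augmented_prompt("What is RedPajama-Data-v2?", chunks)
```

In practice, LlamaIndex handles retrieval, prompt construction, and generation for you behind a single query-engine call, as shown later in this post.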
Below you will find an example of how you can incorporate a new article into your RAG application using the Together API and LlamaIndex, so that a generative model can respond with the correct information.
First, install the llama-index package from Pip. See the installation documentation for different ways to install.
Set the environment variables for the API keys. You can find the Together API key under the settings tab in Together Playground.
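For example, from Python (the key below is a placeholder; substitute your own):

```python
import os

# Replace with your own key from the settings tab in the Together Playground.
os.environ["TOGETHER_API_KEY"] = "your-api-key-here"
```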
Now we will provide some of our recent blog posts, including the RedPajama-Data-v2 announcement, and ask "What is RedPajama-Data-v2?" with the retrieved information to the Mixtral-8x7B-Instruct-v0.1 model, which was trained before the blog post was released. We will use "togethercomputer/m2-bert-80M-8k-retrieval" for embeddings.
The answer reflects the correct and recent information included in the blog post! If we instead run an LLM completion alone with the same query, "What is RedPajama-Data-v2? Describe it in a simple sentence.", it returns a less informative response.
The above example demonstrates how to build a RAG system using Together and LlamaIndex. By leveraging these tools, you can create a generative application that provides accurate and up-to-date responses by retrieving relevant data from your vector store.
As you continue to explore the capabilities of Together APIs and LlamaIndex, we encourage you to experiment with different use cases and applications. We are excited to see the innovative solutions that you will build using these powerful tools.
Thank you for following along with this tutorial!