Together AI provides the fastest cloud platform for building and running generative AI. Today we are launching the Together Embeddings endpoint. As part of a series of blog posts about the Together Embeddings endpoint release, we are excited to announce that you can build your own powerful RAG-based application right from the Together platform with MongoDB’s Atlas Vector Search.
What is Retrieval Augmented Generation (RAG)?
Retrieval Augmented Generation (RAG) (original paper, Lewis et al.), leverages both generative models and retrieval models for knowledge-intensive tasks. It improves Generative AI applications by providing up-to-date information and domain-specific data from external data sources during response generation, reducing the risk of hallucinations and significantly improving performance and accuracy.
Building a RAG system can be cost and data efficient without requiring technical expertise to train a model while keeping other advantages mentioned above. Note that you can still fine-tune an embedding or generative model to improve the quality of your RAG solution even further! Check out Together fine-tuning API to start.
To use RAG, you first populate a vector database using an embedding model of your choice. For example, this database may contain recent knowledge, private documents, or domain specific information that can later guide your generative model to generate a correct answer. Once you have a vector database, you will retrieve relevant data examples to your query using the same embedding model you used to create the vector database with vector search. Lastly, you augment the retrieved information to your prompt, and obtain the final output from a generative model.
Below, we will walk you through step-by-step how to do this with a sample Airbnb listing review dataset in the Python environment.
Implementing RAG with Together and MongoDB Atlas
Step 1: Setting up
If you don’t have an account already, first sign up for Together AI by clicking here. Once you sign up, you will be granted with $25 free credits, so that you can try out various models and products we provide. Visit our Playground and experience 100+ generative AI models and for more details about how to use our API, visit our Documentation page.
To access Together API for this tutorial, you will be using your private Together API key. Find your key on the API key page.
Additionally, set up a MongoDB Atlas account by visiting the Register page if you don’t have an account yet. To access your database, you will need your MongoDB URI. Find your URI by clicking “Connect” > “Drivers” > “3. Add your connection string … ” :
Now install the following python packages `pip install requests pymongo together`. On your main script, define your private key variables with the API key and URI you found above:
Step 2: Set up the embedding creation function
Now we will define an embedding function using Together’s Embeddings REST API:
Choose your embedding model from the list of available models. Copy the model string for API to provide in your script. In this example, we will use the retrieval fine-tuned M2-BERT 8K model. Also, define the vector database field name, and choose the number of documents you will process to generate embeddings and save them. The more documents you use in the retrieval step, the more accurate your final output will be. However, you may hit the rate limit for a heavy use case. Consider switching to the Paid Account to avoid the rate limit.
Confirm that your embedding dimension meets the expected value. The expected value can be found on the available Models page.
Step 3: Create and store embeddings
In this tutorial, we are going to use the Airbnb listing review sample dataset in your Atlas database. You can see all other sample datasets in the documentation or under “Collections”. To have its embeddings contain important information, we will select a set of keys to extract and concatenate their values as an input string. Below, we are generating embeddings for one document at a time, but you can also send a list of document strings at once.
Step 4: Create a vector search index in Atlas
In this step, we will go to your Atlas page and follow this instruction to create the Atlas Vector Search Index. When you are selecting a database, select “sample_airbnb” > “listingsAndReviews”.Select “sample_airbnb” > “listingsAndReviews”. Provide an index name – here we used “SemanticSearch”. Lastly, add a json config for your embeddings:
Note that the field name should be the same as vector_database_field_name and the embedding dimension should be equal to the actual dimension. In this example, we are using dotProduct. For other options, see the Atlas Vector Search documentation.
Click “Next” and then “Create Search Index” button on the review page.
Step 5: Retrieve
Now we will retrieve a number of documents from the sample database that may contain relevant information to the user query. For the definitions of fields that the $vectorSearch takes, see the documentation.
The output for this example will look like this:
Step 6: Augment and Generate
In this last step, we will augment the retrieved data and provide it to a generative model of our choice to generate the final output on our prompt.
First, the prompt we will use in this example is:
Next, we will combine the retrieved data as one string to add to the prompt. Since it contains some not useful information such as _id and embedding_together_m2-bert-8k-retrieval, we will filter them out.
Finally, choose a generative model and run the inference using Together Inference API. You can find the full list of available models here. Consider selecting a model that can accept your input length. Additionally, we will check the default prompt format and stop sequences, and update the prompt before running the inference.
Run the inference (see more information about the parameters in our documentation):
The example output looks like this:
You can see “March 2019 availability! Oceanview on Sugar Beach!” is indeed in the retrieved list! To further check if this satisfies the query, let’s take a look at the listing of this place from the original database:
It’s close to the proximity to town instead of particular restaurants, but overall, it satisfies most of the requirements in our query!
This tutorial showed how to use Together Inference to generate embeddings and language responses. We also demonstrated how to use MongoDB’s Atlas Vector Search to store embeddings and perform the semantic search to retrieve relevant data examples for your natural language query. Combining these learnings, we presented an example of building a RAG application with a sample Airbnb listing data and how the generative AI model can recommend a place that meets our criteria while adhering to factual information.
This is one example, and we are looking forward to seeing many amazing applications that will be built using Together APIs and MongoDB’s Atlas Vector Search!