Automatic RAG Dataset Creation and Evaluation with Giskard & RAGAS
This article presents a comprehensive guide on how to automatically create and evaluate RAG datasets for large language models. The workflow leverages several powerful tools, including LangChain, Gemini, RAGAS, Giskard, and LangSmith. It is designed to help you quickly evaluate Retrieval-Augmented Generation (RAG) systems without the need to manually curate large datasets.
In this guide, you'll explore the following:
- How to automatically generate realistic question and answer pairs with Giskard (a sketch follows this list).
- How to evaluate the RAG system with RAGAS, computing key metrics such as Context Precision, Answer Similarity, and Faithfulness (see the second sketch below).
- How to monitor and track your system's performance at runtime with LangSmith (see the final sketch below).
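To make the first step concrete, here is a minimal sketch of test-set generation with Giskard's RAG toolkit. It assumes your documents are already chunked into a pandas DataFrame with a `text` column and that an LLM client is configured for Giskard (it defaults to OpenAI); the sample rows, question count, and file name are illustrative.

```python
import pandas as pd
from giskard.rag import KnowledgeBase, generate_testset

# Source documents already split into chunks, one per row.
# These two rows are placeholder data.
df = pd.DataFrame({
    "text": [
        "RAG combines a retriever over a document store with an LLM generator.",
        "Context Precision measures how relevant the retrieved chunks are.",
    ]
})

knowledge_base = KnowledgeBase(df)

# Giskard synthesizes realistic question/reference-answer pairs
# from the knowledge base, so no manual labeling is required.
testset = generate_testset(
    knowledge_base,
    num_questions=10,  # illustrative; scale up for real evaluations
    agent_description="A chatbot answering questions about RAG pipelines",
)

testset.save("testset.jsonl")  # reload later with QATestset.load(...)
```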
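For the evaluation step, here is a sketch of scoring with RAGAS using the classic (pre-0.2) metrics API. It assumes your results are collected into the four columns RAGAS expects (`question`, `answer`, `contexts`, `ground_truth`) and that an LLM and embedding model are configured (RAGAS defaults to OpenAI via `OPENAI_API_KEY`); the sample row is made up.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_similarity, context_precision, faithfulness

# One made-up row: the Giskard question, your RAG system's answer,
# the retrieved contexts, and the Giskard reference answer.
data = {
    "question": ["What does the retriever return?"],
    "answer": ["It returns the top-k chunks most similar to the query."],
    "contexts": [["The retriever embeds the query and returns the top-k chunks."]],
    "ground_truth": ["The top-k document chunks most similar to the query."],
}

results = evaluate(
    Dataset.from_dict(data),
    metrics=[context_precision, answer_similarity, faithfulness],
)
print(results)  # per-metric scores between 0 and 1
```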
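Finally, runtime monitoring with LangSmith is enabled through environment variables; any LangChain chain invoked afterwards is traced automatically. The project name and the `answer` function below are hypothetical, included only to illustrate the `@traceable` decorator for code outside LangChain.

```python
import os

# LangSmith reads these at runtime; the project name is illustrative.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "rag-evaluation"

from langsmith import traceable

# Hypothetical wrapper: @traceable sends a trace for each call,
# so inputs, outputs, and latencies show up in the LangSmith UI.
@traceable(name="rag_answer")
def answer(question: str) -> str:
    return "placeholder for your retrieval + generation call"

answer("How does the retriever rank chunks?")
```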
By using this automated approach, you eliminate the need to manually create a labeled dataset, streamlining the process for testing and refining RAG systems. The pipeline is open-source and easy to integrate into your existing projects.