RAGAS
Supercharge Your LLM Application Evaluations 🚀
Release History
v0.4.3 (7 fixes, 6 features): This release introduces advanced prompt optimization via DSPyOptimizer, adds system prompt support for several LLM wrappers, and includes several bug fixes related to caching and CI configuration.
v0.4.2 (Breaking, 9 fixes, 13 features): This release focuses heavily on migrating core metrics to the new collections API structure and introduces caching support for metrics and embeddings. Several bug fixes address issues related to instructor modes, type validation, and Claude workflow tokens.
v0.4.1 (Breaking, 2 fixes, 6 features): This release focuses heavily on migrating core metrics (ToolCallAccuracy, ToolCallF1, TopicAdherence, AgentGoalAccuracy, Rubrics) to the collections API for better structure. It also introduces a breaking change by renaming `embed_text` to `aembed_text` in AnswerRelevancy.
v0.4.0 (Breaking, 9 fixes, 5 features): This release introduces major architectural updates, migrating numerous metrics to a modular BasePrompt system and enhancing LLM provider support via instructor.from_provider and dual adapter capabilities. It also includes several bug fixes related to LangChain integration and LLM detection.
v0.3.9 (Breaking, 5 fixes, 9 features): This release focuses heavily on migrating core metrics to a new structure, removing deprecated metrics like 'aspect critic', and introducing new features like synthetic data traceability metadata. Several documentation fixes and minor bug fixes related to OpenAI models were also implemented.
v0.3.8 (Breaking, 5 fixes, 6 features): This release focuses heavily on internal refactoring, migrating core functionalities like semantic similarity and simple criteria to collections, and merging LLM factory methods. Several bugs related to async handling and specific synthesizers were also fixed.
v0.3.7 (4 fixes, 4 features): This release focuses on migrating several core metrics (BLEU, string metrics, answer similarity) to collections, improving robustness in query distribution, and adding new configuration options for LLM wrappers. Internal code quality and documentation were also enhanced.
v0.3.6 (15 fixes, 10 features): This release introduces several new features, including CHRF score support, enhanced input flexibility for metrics, and OCI Gen AI integration. Numerous bug fixes address issues related to asyncio, metric calculations, and dependency compatibility.
v0.3.5 (3 fixes, 4 features): This release focuses on improving core functionality, including better async execution and knowledge graph optimization, alongside several bug fixes and documentation updates.
v0.3.5rc2: No release notes provided.
v0.3.5rc1 (2 fixes, 4 features): This release focuses on improving asynchronous operations, optimizing knowledge graph handling for large datasets, and fixing a TypeError in metric calculations. It also introduces telemetry collection.
v0.3.4 (2 fixes, 1 feature): This release focuses on performance improvements, documentation updates, and minor bug fixes, including optimizing cluster finding and fixing batching issues with LangChain.
v0.3.3 (Breaking, 19 fixes, 11 features): This release focuses heavily on internal restructuring, moving modules like `tracing`, `prompts`, `dataset`, and experimental features into the main package structure while retiring the `ragas.experimental` namespace. Numerous bug fixes address CI, LLM compatibility (especially the OpenAI O1 series), and metric stability.
v0.3.3rc1 (Breaking, 20 fixes, 11 features): This release focuses heavily on internal restructuring, migrating modules like `tracing`, `prompts`, `dataset`, and experimental metrics out of experimental namespaces and into the main package structure. It also includes numerous bug fixes, performance optimizations (including a 50% speedup for factual correctness), and improved LLM compatibility.
v0.3.2 (Breaking, 3 fixes, 3 features): This release moves key features like `experiment` and the CLI from experimental to the main package, adds prompt saving/loading capabilities, and removes the simulation feature.
v0.3.2rc3: No release notes provided.
v0.3.2-rc2 (1 fix): This release primarily addresses fixes related to PyPI requirements and absolute image paths.
v0.3.2-rc1 (Breaking, 2 fixes, 4 features): This release moves key features like `experiment` and the CLI from experimental to the main package, removes simulation functionality, and adds support for Python 3.13.
v0.3.1 (4 fixes, 1 feature): This release introduces a new Google Drive backend for dataset storage and includes several documentation and example improvements, alongside minor configuration fixes.
v0.3.0 (Breaking, 6 fixes, 10 features): This release introduces major features like LlamaIndex agentic integration, a new CLI, and security enhancements including a fix for CVE-2025-45691. It also includes significant internal refactoring, notably the removal of the Project structure.
v0.3.0-rc2: No release notes provided.
v0.3.0-rc1: No release notes provided.
v0.2.15 (1 fix, 4 features): This release introduces new integrations with AWS Bedrock, LlamaStack, and Griptape, alongside enhancements to validation logic and documentation updates. A key documentation change involves renaming AWS Bedrock references to Amazon Bedrock.
v0.2.14 (8 fixes, 6 features): This release introduces new features like HTTP request-response logging and multi-turn conversation evaluation, alongside numerous bug fixes across various metrics and synthesizers. It also includes documentation updates and new integrations.
v0.2.13 (Breaking, 3 fixes, 2 features): This release focuses on bug fixes, prompt improvements, and enhancements to integrations like langgraph, alongside removing an unnecessary argument from ToolCallAccuracy initialization.
v0.2.12 (3 fixes, 2 features): This release introduces Bedrock token parser support and an optional parameter for the BLEU score, alongside several bug fixes for TP/FP calculations and the output parser.
v0.2.11 (5 fixes, 6 features): This release introduces new features like Swarm integration and the ability to specify an experiment name during evaluation. It also includes several bug fixes related to metrics and dependency management, alongside numerous documentation updates.
Common Errors
BadRequestError (6 reports): BadRequestError in ragas often arises from malformed requests sent to the underlying LLM service, such as exceeding token limits or providing incompatible input formats. To fix this, ensure your prompts and input data adhere to the specific LLM's requirements, including length limitations. Adjust configurations like `max_tokens` or reformat your input to be compatible with the LLM's expected structure.
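One way to keep prompts under a length limit is a pre-flight truncation guard. The sketch below uses an illustrative 8192-token budget and a rough 4-characters-per-token heuristic; both numbers are assumptions, not values from any particular provider, so substitute your model's real limit (and a real tokenizer for accurate counts).

```python
# Assumed budget and ratio for illustration only; use your provider's
# documented token limit and a proper tokenizer in practice.
MAX_TOKENS = 8192
CHARS_PER_TOKEN = 4

def truncate_to_budget(text: str, max_tokens: int = MAX_TOKENS) -> str:
    """Trim text so its estimated token count stays within the budget."""
    budget = max_tokens * CHARS_PER_TOKEN
    return text if len(text) <= budget else text[:budget]
```

Running retrieved contexts through a guard like this before evaluation avoids one common source of BadRequestError at the cost of losing the tail of very long documents.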
ModuleNotFoundError (3 reports): A ModuleNotFoundError in ragas usually indicates that the ragas package, or a specific sub-module, is not installed or that the installed version is outdated. Resolve this by first ensuring ragas is installed with `pip install ragas`, then upgrading to the latest version with `pip install --upgrade ragas` to pick up the missing modules. If you use a virtual environment, activate it before installing.
RagasOutputParserException (3 reports): RagasOutputParserException typically arises when the LLM's output doesn't conform to the format (e.g., JSON) expected by ragas metrics, or when the output is incomplete due to issues like timeouts. Address this by refining your prompt to explicitly instruct the LLM to emit the desired format, and by implementing robust error handling with retries and timeout configuration. Enforcing a JSON schema and validating the output before parsing can also reduce parsing errors.
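The retry advice above can be sketched as a small wrapper that re-invokes the model until its output parses as JSON. Here `call_llm` is a hypothetical zero-argument callable standing in for your actual client call; this is a generic pattern, not ragas's internal parser.

```python
import json

def parse_with_retries(call_llm, max_retries: int = 3) -> dict:
    """Re-invoke the model until its raw string output parses as JSON.

    call_llm is a placeholder for your own LLM client call.
    """
    last_err = None
    for _ in range(max_retries):
        raw = call_llm()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err  # remember the last parse failure for reporting
    raise ValueError(
        f"unparsable model output after {max_retries} attempts"
    ) from last_err
```

In practice you would also feed the parse error back into the next prompt ("your previous answer was not valid JSON: ...") so the model can self-correct.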
InstructorRetryException (2 reports): InstructorRetryException in ragas usually stems from rate limits or temporary unavailability of the LLM service used by Instructor. Implement retry logic with exponential backoff within your LLM service calls, and ensure you're adhering to the specific API rate limits outlined in your LLM provider's (e.g., OpenAI) documentation. Consider increasing timeout values or reducing batch sizes if rate limits are still consistently hit after implementing retries.
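Exponential backoff can be sketched as a generic retry helper; the attempt count, base delay, and jitter range below are assumptions to tune against your provider's rate limits, and `fn` stands in for whatever call is being throttled.

```python
import random
import time

def with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,)):
    """Retry fn with exponential backoff plus a small random jitter.

    Delays grow as base_delay * 2**attempt, so later retries wait longer.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Libraries like tenacity package the same pattern with more options, but the core idea is just this loop.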
OutputParserException (2 reports): OutputParserException in ragas usually occurs when the LLM's output format doesn't match the format the output parser expects (e.g., expecting "text" but receiving "statements"). To fix it, review the parser's expected format in the ragas metric definition (often a Pydantic class) and adjust the LLM prompt instructions and parsing logic so the model consistently returns the required structure. Handle cases where an intermediate output is incorrect, and add error handling with simple repair mechanisms so unexpected formats fail gracefully instead of crashing the run.
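A common, cheap repair mechanism is stripping the markdown code fences some models wrap around JSON before handing the string to the parser. This is a generic sketch of that idea, not code from ragas itself.

```python
import json

def repair_and_parse(raw: str) -> dict:
    """Best-effort repair before parsing: remove markdown code fences
    (``` or ```json) that models sometimes wrap around JSON output."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")       # drop the fence backticks
        if cleaned.lstrip().startswith("json"):
            cleaned = cleaned.lstrip()[4:]  # drop the "json" language tag
    return json.loads(cleaned)
```

Validating the parsed dict against the metric's expected schema (e.g., with a Pydantic model) is the natural next step after a repair like this.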
NewConnectionError (1 report): NewConnectionError in ragas usually arises when the evaluation tries to connect to an external service (like MLflow or OpenAI) and fails due to network issues or the service being unavailable. Ensure the target service (e.g., an MLflow server) is running and accessible from your environment by checking its status and network connectivity; additionally, verify your environment has the necessary permissions to access external resources. If using a proxy, configure the `HTTP_PROXY` and `HTTPS_PROXY` environment variables accordingly.
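The connectivity check above can be done up front with a quick TCP probe, so a long evaluation fails fast with a clear message instead of deep inside a metric call. The host and port are whatever your setup uses (5000 is MLflow's default tracking-server port).

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Quick TCP reachability check for a dependency such as an MLflow
    tracking server; run it before starting an evaluation."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, timeouts
        return False
```

Typical usage: `assert can_reach("localhost", 5000), "MLflow server unreachable"` at the top of an evaluation script. Note this only proves the port accepts connections, not that the service behind it is healthy.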
Related AI & LLMs Packages
AutoGPT: The vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Ollama: Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
LangChain: 🦜🔗 The platform for reliable agents.
ComfyUI: The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
llama.cpp: LLM inference in C/C++.
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.