Changelog

v0.14.16

📦 llamaindex
✨ 13 features · 🐛 23 fixes · ⚡ 1 deprecation · 🔧 14 symbols

Summary

This release introduces new rate-limiting features, multimodal reranking, and several security and stability fixes across core components and integrations. Key improvements include better async handling and fixes for tool calling and schema introspection.

Migration Steps

  1. When using asynchronous operations that previously relied on `asyncio_module`, update usage to call `get_asyncio_module()` instead.

✨ New Features

  • Add token-bucket rate limiter for LLM and embedding API calls in `llama-index-core`.
  • Introduce Multimodal LLMReranker in `llama-index-core`.
  • Add optional `embed_model` to `SemanticDoubleMergingSplitterNodeParser` in `llama-index-core`.
  • Add SlidingWindowRateLimiter for strict per-minute caps in `llama-index-core`.
  • Extend vector store metadata filters in `llama-index-core`.
  • Add Neo4j user agent in `llama-index-graph-stores-neo4j`.
  • Add `apoc_sample` parameter for large database schema introspection in `llama-index-graph-stores-neo4j`.
  • Add extra span processors to register within the otel tracer in `llama-index-observability-otel`.
  • Allow passing a custom tracer provider in `llama-index-observability-otel`.
  • Add inheritance for external context in `llama-index-observability-otel`.
  • Add ModelsLab LLM integration (`llama-index-llms-modelslab`).
  • Support `gpt-5-chat` in `llama-index-llms-openai`.
  • Support `reasoning_content` in OpenAI Chat Completions in `llama-index-llms-openai`.
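The two limiter styles above differ in shape: a token bucket permits short bursts up to a capacity while enforcing an average rate, whereas a sliding window enforces a strict per-minute cap. As a minimal stdlib sketch of the token-bucket idea only (this is not the `llama-index-core` API; the class and method names here are illustrative):

```python
import time


class TokenBucket:
    """Illustrative token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def acquire(self, tokens: float = 1.0) -> None:
        """Block until `tokens` are available, then consume them."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return
            # Sleep just long enough for the deficit to refill.
            time.sleep((tokens - self.tokens) / self.rate)


# Allow bursts of up to 5 calls, averaging ~5 calls/second thereafter.
bucket = TokenBucket(rate=5.0, capacity=5.0)
for _ in range(3):
    bucket.acquire()  # would wrap each LLM or embedding API call
```

A sliding-window limiter would instead track call timestamps and refuse (or delay) any call once the count inside the trailing window hits the cap, with no burst allowance beyond it.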

🐛 Bug Fixes

  • Fix Chonkie init documentation in `llama-index-core` and `llama-index-node-parser-chonkie`.
  • Fix passing `tool_choice` through `FunctionCallingProgram` in `llama-index-core`.
  • Preserve `doc_id` in `legacy_json_to_doc` in `llama-index-core`.
  • Fix async retry backoff to avoid blocking the event loop in `llama-index-core`.
  • Fix additionalProperties in auto-generated KG schema models in `llama-index-core`.
  • Fix respecting `db_schema` when a custom async_engine is provided in `llama-index-core`.
  • Replace blocking `run_async_tasks` with `asyncio.gather` in `llama-index-core`.
  • Fix `FunctionTool` not respecting pydantic `Field` defaults in `llama-index-core`.
  • Fix `MarkdownElementNodeParser` to extract code blocks in `llama-index-core`.
  • Fix partial-failure handling in `SubQuestionQueryEngine` in `llama-index-core`.
  • Fix bounds check to prevent infinite loop in `ChatMemoryBuffer.get()` in `llama-index-core`.
  • Ensure the streaming flag is reset on exception in `CondenseQuestionChatEngine` in `llama-index-core`.
  • Fix passing run id correctly in `llama-index-core`.
  • Raise `ValueError` when `model` is passed instead of `model_name` in `BedrockEmbedding` in `llama-index-embeddings-bedrock`.
  • Respect `Retry-After` header in OpenAI retry decorator in `llama-index-embeddings-openai` and `llama-index-llms-openai`.
  • Properly manage async client lifecycle to prevent unclosed sessions in `llama-index-llms-azure-inference`.
  • Improve handling of `reasoningContent` in responses from Converse & ConverseStream requests in `llama-index-llms-bedrock-converse`.
  • Fix OpenAI tool call after thinking in `llama-index-llms-openai`.
  • Fix forwarding `allow_parallel_tool_calls` for OpenAI chat completions in `llama-index-llms-openai`.
  • Fix using constrained decoding for `OpenAIResponses.structured_predict` in `llama-index-llms-openai`.
  • Fix OpenAI tool calls in `llama-index-llms-openai`.
  • Fix stripping parallel_tool_calls for reasoning models in `llama-index-llms-openai`.
  • Fix Mistral package Python requirement in `llama-index-llms-mistralai`.
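Several of the async fixes above share one theme: never block the event loop. A hedged stdlib sketch of that pattern (function names are illustrative, not llama-index internals), combining non-blocking exponential backoff with `asyncio.gather`:

```python
import asyncio


async def call_with_retry(coro_fn, retries: int = 3, base_delay: float = 0.05):
    """Retry an async call with exponential backoff that never blocks the loop."""
    for attempt in range(retries):
        try:
            return await coro_fn()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            # asyncio.sleep yields control to the event loop; a time.sleep here
            # would freeze every other task -- the bug class fixed in this release.
            await asyncio.sleep(base_delay * 2 ** attempt)


async def main():
    state = {"fails": 2}

    async def flaky():
        # Simulated transient failure: the first two calls raise, then succeed.
        if state["fails"] > 0:
            state["fails"] -= 1
            raise ConnectionError("transient")
        return "ok"

    # Sequential, blocking task running replaced by concurrent gather.
    return await asyncio.gather(call_with_retry(flaky), call_with_retry(flaky))


print(asyncio.run(main()))  # -> ['ok', 'ok']
```

Because `gather` preserves argument order in its results, callers get deterministic output even though the retries interleave on the loop.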

Affected Symbols

⚡ Deprecations

  • Deprecate `asyncio_module` in favor of `get_asyncio_module` in `llama-index-core`.