Change8

Unstructured

Data & ML

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Latest: 0.18.28 (36 releases, 2 breaking changes)

Release History

0.18.28
Jan 9, 2026

0.18.27 (2 fixes)
Jan 8, 2026

This release focuses on performance optimizations across several internal functions and resolves a bug related to partially extracted elements. A critical dependency upgrade to pdfminer-six addresses a performance regression.

0.18.26 (2 fixes)
Jan 5, 2026

Version 0.18.26 pins deltalake to resolve ARM64 build issues, while 0.18.25 includes a security update by relaxing constraints and bumping pdfminer.six and urllib3.

0.18.24 (1 fix, 1 feature)
Dec 30, 2025

This release includes an optimization for OCR extraction and a critical security update by bumping dependencies.

0.18.22 (1 fix)
Dec 10, 2025

This patch release primarily addresses a security vulnerability by updating the fonttools dependency.

0.18.21 (1 fix)
Nov 24, 2025

This release primarily focuses on security by updating the "unstructured-inference" dependency to version 1.1.2 to address known CVEs.

0.18.20 (2 features)
Nov 15, 2025

This release improves the VoyageAI integration, adds support for voyage-context-3, and enhances metadata flagging for extracted elements.

0.18.18 (4 fixes)
Nov 7, 2025

This set of releases focuses heavily on security by addressing multiple CVEs through dependency bumps and fixing a critical path traversal vulnerability in MSG attachment handling. Additionally, performance was improved in hash ID assignment.
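
As a rough illustration of the class of bug involved, a path traversal check for attachment filenames might look like the sketch below. The helper name `safe_attachment_path` and the directory layout are hypothetical; this is not the project's actual fix, only a minimal standard-library pattern for refusing filenames that try to escape the output directory.

```python
from pathlib import Path


def safe_attachment_path(output_dir: str, filename: str) -> Path:
    """Return a path inside output_dir for an untrusted attachment filename.

    Hypothetical helper for illustration only.
    """
    base = Path(output_dir).resolve()
    # Treat Windows separators like POSIX ones, then keep only the last component.
    name = Path(filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable attachment filename: {filename!r}")
    candidate = (base / name).resolve()
    # Defense in depth: the resolved path must sit directly inside base.
    if candidate.parent != base:
        raise ValueError(f"attachment escapes output directory: {filename!r}")
    return candidate
```

With this sketch, a filename such as `../../etc/passwd` is reduced to `passwd` inside the output directory, and a bare `..` is rejected.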

0.18.15 (1 fix, 2 features)
Sep 17, 2025

This release focuses on performance improvements for paragraph grouping and HTML element processing, alongside dependency updates to address security vulnerabilities.

0.18.14 (2 fixes, 1 feature)
Aug 26, 2025

This release focuses on performance improvements, notably speeding up the sentence_count function, and addresses several security vulnerabilities by updating dependencies.

0.18.13 (1 fix)
Aug 13, 2025

This patch improves the robustness of email parsing by handling more diverse date formats within email headers.
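
One generic way to tolerate several header date formats is to try the RFC 2822 parser first and fall back to ISO 8601. The sketch below uses only the Python standard library; the function name `parse_email_date` and the choice of fallback are assumptions made for illustration, not a description of how the release implements it.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_email_date(value: str) -> Optional[datetime]:
    """Parse a Date header, returning None when no known format matches."""
    try:
        # Standard RFC 2822 form, e.g. "Wed, 13 Aug 2025 10:15:00 +0000".
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        pass
    try:
        # Fallback: ISO 8601, e.g. "2025-08-13T10:15:00+00:00".
        return datetime.fromisoformat(value)
    except ValueError:
        return None
```

Returning `None` rather than raising lets the caller keep processing the message when the date cannot be recovered.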

0.18.12 (1 fix)
Jul 28, 2025

This release improves error handling during encoding detection by replacing UnicodeDecodeError with UnprocessableEntityError to prevent logging large file contents.
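
The motivation can be sketched as follows: a `UnicodeDecodeError` keeps the entire input bytes on the exception object, so its `repr` can dump the whole file into a log. The example defines a local stand-in class named `UnprocessableEntityError` and a hypothetical `decode_text` helper; neither is the library's own code.

```python
class UnprocessableEntityError(Exception):
    """Local stand-in for illustration; not imported from the library."""


def decode_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, raising a short error that does not carry the payload."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.object holds the full input; report only the encoding and offset.
        # "from None" keeps the original error out of the printed traceback.
        raise UnprocessableEntityError(
            f"could not decode input as {encoding}: invalid byte at offset {exc.start}"
        ) from None
```

The replacement message stays a fixed, small size regardless of how large the input was.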

0.18.11 (2 features)
Jul 23, 2025

This release introduces support for '|' as a CSV delimiter and adds mapping for <input> tags based on their type attribute. It also switches the dependency used for character set normalization.
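
For readers unfamiliar with delimiter detection, the idea can be sketched with the standard library's `csv.Sniffer`, with `|` included among the candidate delimiters. The candidate set and the `read_delimited` helper are assumptions for illustration and do not reflect the library's internal implementation.

```python
import csv
import io

# Candidate delimiters to consider; including "|" is the point of the example.
CANDIDATES = ",;\t|"


def read_delimited(text: str) -> list:
    """Parse delimited text, guessing the delimiter from a leading sample."""
    try:
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=CANDIDATES)
        delimiter = dialect.delimiter
    except csv.Error:
        # Fall back to a comma when the sample is ambiguous.
        delimiter = ","
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```

Restricting the sniffer to a known candidate set avoids it picking an ordinary letter or digit as the delimiter.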

0.18.10 (1 feature)
Jul 18, 2025

This minor release introduces a new environment variable, OCR_AGENT_CACHE_SIZE, for better memory management in OCR agents.
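
A cache bounded by an environment variable generally follows the pattern below. Only the variable name `OCR_AGENT_CACHE_SIZE` comes from the release note; the default of 1, the helper names, and the use of `functools.lru_cache` are assumptions made for this sketch.

```python
import functools
import os


def cache_size_from_env(name: str = "OCR_AGENT_CACHE_SIZE", default: int = 1) -> int:
    """Read a non-negative cache size from the environment (default is assumed)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return max(int(raw), 0)
    except ValueError:
        # Ignore malformed values rather than failing at import time.
        return default


def make_cached_loader(loader):
    """Wrap a loader so that at most cache_size_from_env() results stay in memory."""
    return functools.lru_cache(maxsize=cache_size_from_env())(loader)
```

A smaller size trades repeated agent construction for a lower memory footprint, and a larger size does the opposite.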

0.18.9 (2 fixes, 1 feature)
Jul 16, 2025

Version 0.18.9 introduces a new feature to convert elements to markdown and includes fixes for language detection with empty text and handling password-protected XLSX files.

0.18.7 (1 feature)
Jul 15, 2025

This release introduces language detection capabilities for PDF documents at both the document and element levels.

0.18.6 (2 fixes)
Jul 15, 2025

This release focuses on stability, improving EPUB partitioning error handling and correcting serialization types for TableChunks.

0.18.4 (1 fix)
Jul 8, 2025

This minor release primarily addresses an issue by increasing the field limit when parsing CSV files.
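
In Python's standard `csv` module, the relevant knob is `csv.field_size_limit`, whose default of 131072 characters rejects very large cells. The sketch below shows the general technique; the 10 MiB value and the `read_rows` helper are assumptions, since the release note does not state the limit chosen.

```python
import csv
import io


def read_rows(text: str, field_limit: int = 10 * 1024 * 1024) -> list:
    """Parse CSV text with a temporarily raised per-field size limit."""
    # field_size_limit returns the previous limit, so it can be restored later.
    previous = csv.field_size_limit(field_limit)
    try:
        return list(csv.reader(io.StringIO(text)))
    finally:
        csv.field_size_limit(previous)
```

Restoring the previous limit keeps the change local to this call, because the limit is otherwise global to the process.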

0.18.3 (1 fix)
Jul 5, 2025

This minor release bumps the pillow dependency to address a security vulnerability (CVE).

0.18.2 (5 fixes)
Jul 1, 2025

This release focuses on stability and correctness, addressing several bugs related to HTML parsing, XML escaping, Markdown encoding, and table processing.

0.18.1 (1 fix, 1 feature)
Jun 24, 2025

This release introduces the DocumentData element type for better handling of large document data and fixes an issue related to the encoding property in _CsvPartitioningContext.

0.17.11-dev1 (Breaking, 11 fixes, 5 features)
Jun 13, 2025

This release drops support for Python 3.9 due to dependency conflicts and includes numerous bug fixes related to PDF parsing, stability, and security vulnerabilities. It also deprecates the `stage_for_label_studio` utility.

0.17.2 (1 fix, 2 features)
Mar 20, 2025

This release introduces the extraction of image URLs from HTML partitions and significantly speeds up hOCR parsing by switching to lxml. It also bumps the minimum required numpy version to greater than 2.

0.17.0 (1 fix, 2 features)
Mar 12, 2025

Version 0.17.0 introduces image inclusion during HTML partitioning and allows passing OCR/table agents, while removing the deprecated PageLayout.elements reference.

0.16.25 (1 fix)
Mar 7, 2025

This minor release primarily addresses a bug in filetype detection when handling JSON byte streams.

0.16.24 (1 fix, 2 features)
Mar 7, 2025

This release introduces dynamic file type registration for partitioners and enhances image block extraction for CamelCase element types. It also adds support for converting JSON elements to HTML.

0.16.23 (1 fix)
Feb 20, 2025

This minor release primarily addresses a bug in file type detection when handling SpooledTemporaryFile objects.

0.16.22 (1 fix)
Feb 20, 2025

This minor release focuses on security by addressing open CVEs and updating underlying dependencies.

0.16.21 (1 fix, 3 features)
Feb 17, 2025

This release introduces password support for PDF loading and new configuration options for PDF Miner, alongside performance improvements in layout merging and a fix for NDJSON file detection.

0.16.20 (1 fix)
Feb 6, 2025

This release addresses a critical security vulnerability related to file inclusion in rst and org files by implementing sandboxing for file partitioning.

0.16.19 (2 fixes)
Feb 5, 2025

This release primarily focuses on fixing critical bugs related to table extraction logic and HTML partitioning within table cells. It also updates internal tooling configurations like `make tidy`.

0.16.17 (1 fix, 1 feature)
Jan 29, 2025

This minor release refactors the VoyageAI integration and fixes a bug related to text ordering in layout element construction.

0.16.16 (2 fixes, 1 feature)
Jan 27, 2025

This release introduces vectorized data structures for layout processing to improve performance and fixes an issue with NLTK auto-downloading and a patch in pdfminer that caused token splitting errors.

0.16.15
Jan 23, 2025

This release updates the versions of `unstructured-inference` and `pdfminer-six`, resulting in the removal of `layoutparser` related dependencies from `unstructured-inference`.

0.16.14 (1 fix)
Jan 20, 2025

This release primarily addresses a bug related to redundant passing of the 'infer_table_structure' argument during email partitioning with image attachments.

0.16.13 (Breaking, 1 fix, 1 feature)
Jan 13, 2025

This release introduces character-level filtering for Tesseract output and resolves an issue with NLTK asset usage in Docker images, while removing automatic NLTK package downloading.