3.5.0

📦 datasets
2 features · 🐛 3 fixes · 🔧 2 symbols

Summary

This release introduces native PDF support when loading datasets, allowing users to directly process PDF files. It also includes several minor fixes related to local loading and file handling.

Migration Steps

  1. If loading PDFs, be aware that the feature column will now contain a pdfplumber.pdf.PDF object, which can be processed using its methods (e.g., .pages[0].extract_text()).
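As a rough illustration of the step above, the following sketch gathers the text of every page from a decoded PDF value. It relies only on the `.pages` attribute and the `extract_text()` method mentioned above; the helper name `pdf_to_text` and its `page_separator` parameter are hypothetical and are not part of datasets or pdfplumber.

```python
def pdf_to_text(pdf, page_separator="\n\n"):
    """Concatenate the text of every page of a pdfplumber-style PDF object.

    `pdf` is assumed to expose `.pages`, a sequence of page objects that
    each provide `extract_text()`, as pdfplumber.pdf.PDF does.
    """
    texts = []
    for page in pdf.pages:
        # A page without extractable text may yield an empty value;
        # normalise it to an empty string so the join never fails.
        texts.append(page.extract_text() or "")
    return page_separator.join(texts)
```

Assuming the PDF column is named `pdf` (the actual name depends on the dataset), a row could then be processed with `pdf_to_text(dataset[0]["pdf"])`.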

✨ New Features

  • Introduce PDF support via load_dataset.
  • PDF support allows loading PDF files, returning a pdfplumber.pdf.PDF object in the corresponding feature column.
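A minimal sketch of what loading might look like, assuming a local folder of PDF files or a Hub repository that contains them. The wrapper name `load_pdf_dataset`, the placeholder path, and the `pdf` column name in the comments are assumptions for illustration; only `load_dataset` itself comes from this release.

```python
def load_pdf_dataset(path_or_repo, split="train"):
    """Load a PDF dataset from a local folder or a Hub repository ID."""
    # Imported inside the function so this sketch can be imported even
    # where the datasets library is not installed.
    from datasets import load_dataset

    return load_dataset(path_or_repo, split=split)


# Hypothetical usage (path and column name are placeholders):
#   dataset = load_pdf_dataset("path/to/pdf/folder")
#   pdf = dataset[0]["pdf"]                  # a pdfplumber.pdf.PDF object
#   text = pdf.pages[0].extract_text()
```

Decoding PDF values depends on pdfplumber, so it probably needs to be installed alongside datasets.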

🐛 Bug Fixes

  • Fix local PDF loading.
  • Minor fix for metadata files in the extension counter.
  • Prioritize JSON loading.

🔧 Affected Symbols

  • load_dataset
  • Pdf