Initial ebook management support by jfro · Pull Request #41 · jfro/fuzzy_catalog

jfro · 2026-01-11T19:23:21Z

This is a step toward integrating features akin to BookLore, Calibre, etc. For now it covers only the basics: scanning a folder and processing metadata and covers.

Implement core data model for ebook file management:
- Create Ebook schema with file info, metadata, and processing fields
- Add database migration with indexes and constraints
- Implement Ebooks context with CRUD operations
- Add comprehensive test coverage for schema and context
- Ebooks are shared across all users (no user ownership)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements MetadataExtractor module to parse ebook files and extract metadata including title, author, ISBN, publisher, language, and description. Also supports extracting cover images from EPUB files.
- Add bupe library for EPUB parsing
- Add pdf_info library for PDF metadata extraction
- Implement MetadataExtractor with extract/1 and extract_cover/1 functions
- Add comprehensive test suite with real EPUB/PDF file generation
- Use ExUnit.CaptureLog to suppress expected errors in tests
- All 179 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
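To make the EPUB side of the extraction concrete, here is a minimal sketch in Python (not the project's Elixir code, and not how bupe works internally). It follows the standard EPUB layout: `META-INF/container.xml` points at the OPF package file, whose Dublin Core elements hold title, author, and language. The function name `read_epub_metadata` and the returned keys are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard XML namespaces used by EPUB containers and Dublin Core metadata.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_epub_metadata(data: bytes) -> dict:
    """Return a few Dublin Core fields from an EPUB given as raw bytes.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # container.xml names the OPF package document via <rootfile full-path="...">.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        opf = ET.fromstring(archive.read(rootfile.attrib["full-path"]))

    def first(tag: str):
        # Look up the first <dc:tag> element anywhere in the package document.
        node = opf.find(f".//{{{DC_NS}}}{tag}")
        return node.text.strip() if node is not None and node.text else None

    return {
        "title": first("title"),
        "author": first("creator"),
        "language": first("language"),
    }
```

Matching on the namespace URI rather than a literal `dc:` prefix keeps the lookup independent of whichever prefix a given EPUB happens to declare.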

Implement Oban worker that scans directories for EPUB/PDF files:
- Discovers files recursively or non-recursively
- Calculates SHA256 file hashes for deduplication
- Creates Ebook records with file metadata
- Enqueues ProcessWorker jobs for metadata extraction
- Validates directory existence and skips existing files

Includes comprehensive test coverage with 8 test cases covering file discovery, recursive scanning, deduplication, and error handling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
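The scan steps listed above can be sketched roughly as follows. This is an illustrative Python outline, not the ScanWorker itself: the names `sha256_of` and `scan`, the in-memory `known_hashes` set (standing in for existing Ebook records), and the return shape are all assumptions.

```python
import hashlib
from pathlib import Path

# Assumed default formats, mirroring the EPUB/PDF scope described in the commit.
SUPPORTED_EXTENSIONS = {".epub", ".pdf"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large ebooks are never loaded fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(directory, known_hashes, recursive=True):
    """Return (path, hash) pairs for supported files whose hash is not yet known."""
    root = Path(directory)
    if not root.is_dir():
        # Validate directory existence before doing any work.
        raise NotADirectoryError(str(directory))

    candidates = root.rglob("*") if recursive else root.glob("*")
    seen = set(known_hashes)
    new_files = []
    for path in sorted(candidates):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        file_hash = sha256_of(path)
        if file_hash in seen:
            continue  # Duplicate content: already recorded or seen earlier in this scan.
        seen.add(file_hash)
        new_files.append((path, file_hash))
    return new_files
```

In this sketch each returned pair would correspond to one new record plus one enqueued processing job; hashing content rather than comparing paths means a renamed or copied file is still recognised as a duplicate.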

Implements the ProcessWorker Oban job that processes ebook files by:
- Verifying file existence and hash integrity
- Extracting metadata (title, author, ISBN, publisher, language, description)
- Extracting and storing cover thumbnails via Storage backend
- Matching ebooks to existing Book records via ISBN or fuzzy title/author matching
- Updating ebook records with extracted metadata and processing status

Includes comprehensive test suite with proper error log capture for clean test output.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
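The matching step (exact ISBN first, fuzzy title/author as a fallback) might look something like the Python sketch below. The PR does not say which similarity measure the project uses, so `difflib.SequenceMatcher` is a stand-in here, and `match_book`, the dictionary record shape, and the rule of taking the weaker of the two similarities are all assumptions.

```python
from difflib import SequenceMatcher


def normalize_isbn(value):
    """Keep only digits and a trailing-check 'X' so hyphenated ISBNs compare equal."""
    return "".join(ch for ch in (value or "") if ch.isdigit() or ch in "xX").upper()


def similarity(a, b):
    """Case-insensitive similarity in [0.0, 1.0] (stand-in measure, not the project's)."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def match_book(ebook, books, fuzzy=True, threshold=0.85):
    """Return the best matching book dict, or None if nothing matches well enough."""
    # 1. An exact ISBN match wins outright.
    isbn = normalize_isbn(ebook.get("isbn"))
    if isbn:
        for book in books:
            if normalize_isbn(book.get("isbn")) == isbn:
                return book

    if not fuzzy:
        return None

    # 2. Otherwise require BOTH title and author to be similar (assumed rule).
    best, best_score = None, 0.0
    for book in books:
        score = min(
            similarity(ebook.get("title") or "", book.get("title") or ""),
            similarity(ebook.get("author") or "", book.get("author") or ""),
        )
        if score > best_score:
            best, best_score = book, score
    return best if best_score >= threshold else None
```

Taking the minimum of the two scores is a deliberately conservative choice for the sketch: a shared title alone is not enough to link an ebook to a book by a different author.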

Add high-level convenience functions to the Ebooks context for:
- Triggering directory scans via trigger_scan/1
- Triggering file reprocessing via trigger_reprocess/2
- Querying ebooks by processing status (list_pending_ebooks/0, list_failed_ebooks/0)
- Deleting ebooks via delete_ebook/1
- Creating changesets via change_ebook/2

All functions include comprehensive test coverage using TDD approach.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Adds comprehensive configuration for ebook management with the following settings:
- supported_formats: Controls which file formats are scanned (default: epub, pdf)
- max_file_size: Limits file size for processing (default: 100MB)
- enable_fuzzy_matching: Enables automatic book linking via fuzzy matching (default: true)
- fuzzy_threshold: Controls fuzzy matching strictness 0.0-1.0 (default: 0.85)

Workers updated to read from configuration:
- ScanWorker uses supported_formats for file discovery and validates max_file_size
- ProcessWorker uses enable_fuzzy_matching as default when not explicitly provided

Also fixes .coveralls.exs syntax error that prevented mix format from running.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
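As an illustration of how these settings could gate a scan, here is a small Python sketch. The setting names and defaults come from the commit message; the `EbookConfig` class, the `should_scan` helper, the reading of 100MB as 100 × 1024 × 1024 bytes, and the choice to accept a file exactly at the limit are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EbookConfig:
    """Hypothetical container for the four settings named in the commit."""

    supported_formats: tuple = ("epub", "pdf")
    max_file_size: int = 100 * 1024 * 1024  # Assumed byte interpretation of "100MB".
    enable_fuzzy_matching: bool = True
    fuzzy_threshold: float = 0.85

    def __post_init__(self):
        # The commit documents the threshold range as 0.0-1.0.
        if not 0.0 <= self.fuzzy_threshold <= 1.0:
            raise ValueError("fuzzy_threshold must be between 0.0 and 1.0")


def should_scan(filename: str, size_bytes: int, config: EbookConfig = EbookConfig()) -> bool:
    """Accept a file only if its extension is supported and it is within the size limit."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return extension in config.supported_formats and size_bytes <= config.max_file_size
```

For example, `should_scan("Book.EPUB", 10)` is accepted, while a `.mobi` file or a PDF one byte over the limit is skipped under the defaults.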

This commit includes multiple related changes:

feat: add ebook library management with admin UI
- Add Libraries CRUD UI at /admin/libraries for managing library directories
- Support three scan modes: manual, auto_watch (filesystem monitoring), scheduled (cron)
- Include directory tree picker component for intuitive path selection
- Implement LibraryWatcher GenServer for automatic filesystem change detection
- Implement LibraryScheduler GenServer for cron-based scheduled scanning
- Add comprehensive test coverage (341 tests passing)

feat: display all books regardless of collection status
- Switch from INNER JOIN to LEFT JOIN on collection_items
- Make media types informational only, not governing visibility
- Add delete book button with proper confirmation

fix: resolve PDF processing hanging on large files
- Switch from in-memory pdf_info library to command-line pdfinfo
- Resolve 8+ hour hangs on 50MB+ PDF files
- Process large PDFs in sub-second timeframes

fix: handle semicolon-separated tags to prevent overflow
- Add normalize_tags/1 to split and process delimited tag strings
- Prevent varchar(255) overflow errors from long concatenated tags

fix: support namespaced XML elements in EPUB parsing
- Handle container.xml and OPF files with namespace prefixes
- Support various EPUB XML format variations

refactor: migrate EPUB parsing from erlang :zip to zstream
- Use zstream library for more flexible ZIP handling
- Add fallback to erlang :zip for unsupported ZIP formats

refactor: replace xmerl with Saxy for XML parsing
- Rewrite OpfParser to use Saxy instead of xmerl
- Improve parsing reliability and performance

chore: update dependencies and Docker configuration
- Add zstream, file_system, saxy dependencies
- Remove pdf_info dependency
- Add poppler-utils to Dockerfile for pdfinfo command

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
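The semicolon-separated tag fix can be illustrated with a short Python sketch. Only the splitting on `;` and the varchar(255) limit come from the commit; the trimming, the case-insensitive de-duplication, and the truncation of over-long tags are assumptions about what "process" might involve, and this is not the project's `normalize_tags/1`.

```python
def normalize_tags(raw_tags, max_length=255):
    """Split ';'-delimited tag strings into individual, trimmed, unique tags.

    Illustrative only: de-duplication and truncation are assumed behaviours.
    """
    seen = set()
    result = []
    for raw in raw_tags:
        for part in raw.split(";"):
            tag = part.strip()
            if not tag:
                continue  # Skip empty segments such as a trailing ';'.
            tag = tag[:max_length]  # Keep each tag within a varchar(255) column.
            key = tag.casefold()
            if key in seen:
                continue
            seen.add(key)
            result.append(tag)
    return result
```

A single metadata value such as `"Fiction; Sci-Fi; Space Opera"` then becomes three short tags instead of one long string that could exceed the column limit.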

Install inotify-tools and poppler-utils in CI environment:
- inotify-tools: Required for file_system library (LibraryWatcher)
- poppler-utils: Provides pdfinfo command for PDF metadata extraction

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
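For context on why `pdfinfo` must be installed, the sketch below shows one way to shell out to it and parse its `Key: value` output, written in Python rather than the project's Elixir. The function names and the 30-second timeout are assumptions; the PR only states that the command-line tool replaced the in-memory library.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Turn pdfinfo's 'Key:   value' lines into a dictionary."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as timestamps stay intact.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            info[key.strip()] = value.strip()
    return info


def pdf_metadata(path, timeout=30):
    """Run pdfinfo on a file and return its parsed fields.

    Requires poppler-utils on the PATH; the timeout is an assumed safeguard.
    """
    result = subprocess.run(
        ["pdfinfo", str(path)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return parse_pdfinfo(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be tested against captured output on machines where poppler-utils is not installed.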

jfro and others added 11 commits January 6, 2026 19:06

docs: fix lingering user_id references in plan docs

3d8a926

chore: coveralls to help us get better test coverage going

7ec7cad

feat: show number of books in catalog or search/filter

9d2b94c


jfro wants to merge 11 commits into main from jtk/ebook-management
