Initial ebook management support #41

Draft
jfro wants to merge 11 commits into main from jtk/ebook-management

jfro (Owner) commented Jan 11, 2026

This is a step toward integrating features akin to BookLore, Calibre, etc. For now it covers only the basics: scanning a folder and processing metadata and covers.

jfro and others added 11 commits January 6, 2026 19:06

Implement core data model for ebook file management:
- Create Ebook schema with file info, metadata, and processing fields
- Add database migration with indexes and constraints
- Implement Ebooks context with CRUD operations
- Add comprehensive test coverage for schema and context
- Ebooks are shared across all users (no user ownership)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements MetadataExtractor module to parse ebook files and extract metadata including title, author, ISBN, publisher, language, and description. Also supports extracting cover images from EPUB files.

- Add bupe library for EPUB parsing
- Add pdf_info library for PDF metadata extraction
- Implement MetadataExtractor with extract/1 and extract_cover/1 functions
- Add comprehensive test suite with real EPUB/PDF file generation
- Use ExUnit.CaptureLog to suppress expected errors in tests
- All 179 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement Oban worker that scans directories for EPUB/PDF files:
- Discovers files recursively or non-recursively
- Calculates SHA256 file hashes for deduplication
- Creates Ebook records with file metadata
- Enqueues ProcessWorker jobs for metadata extraction
- Validates directory existence and skips existing files

Includes comprehensive test coverage with 8 test cases covering
file discovery, recursive scanning, deduplication, and error handling.
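The hashing step above could look roughly like the following. This is a minimal sketch, not the worker's actual code: the module name `HashSketch`, the function `sha256_file/1`, and the 64 KiB chunk size are all assumptions made for illustration. It reads the file in chunks so that large ebooks are never loaded into memory at once.

```elixir
defmodule HashSketch do
  # Hypothetical helper, not taken from the PR.
  @chunk_size 65_536

  # Returns {:ok, lowercase_hex_digest} or {:error, reason}.
  def sha256_file(path) do
    with {:ok, io} <- File.open(path, [:read, :binary]) do
      try do
        with {:ok, digest} <- hash_chunks(io, :crypto.hash_init(:sha256)) do
          {:ok, Base.encode16(digest, case: :lower)}
        end
      after
        # Always release the file handle, even if reading fails.
        File.close(io)
      end
    end
  end

  # Feed the file into the hash state one chunk at a time.
  defp hash_chunks(io, state) do
    case IO.binread(io, @chunk_size) do
      :eof -> {:ok, :crypto.hash_final(state)}
      {:error, reason} -> {:error, reason}
      data when is_binary(data) -> hash_chunks(io, :crypto.hash_update(state, data))
    end
  end
end
```

A scan could then skip any file whose digest already exists on a stored record, which is one plausible reading of "skips existing files".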

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements the ProcessWorker Oban job that processes ebook files by:
- Verifying file existence and hash integrity
- Extracting metadata (title, author, ISBN, publisher, language, description)
- Extracting and storing cover thumbnails via Storage backend
- Matching ebooks to existing Book records via ISBN or fuzzy title/author matching
- Updating ebook records with extracted metadata and processing status

Includes comprehensive test suite with proper error log capture for clean test output.
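To illustrate the "ISBN first, fuzzy title/author second" matching order described above, here is a small sketch over plain maps. The PR does not say which similarity measure is used; this example assumes Jaro distance (`String.jaro_distance/2` from the Elixir standard library) averaged over title and author. The module name, map keys, and scoring formula are hypothetical.

```elixir
defmodule MatchSketch do
  # Books and ebooks are plain maps here; the real schemas are not shown in the PR.
  def match(ebook, books, threshold \\ 0.85) do
    by_isbn(ebook, books) || by_fuzzy(ebook, books, threshold)
  end

  # Exact match on ISBN after stripping hyphens and spaces.
  defp by_isbn(%{isbn: isbn}, books) when is_binary(isbn) do
    case normalize_isbn(isbn) do
      "" -> nil
      target -> Enum.find(books, &(normalize_isbn(&1[:isbn]) == target))
    end
  end

  defp by_isbn(_ebook, _books), do: nil

  # Best-scoring book at or above the threshold, or nil if none qualifies.
  defp by_fuzzy(ebook, books, threshold) do
    books
    |> Enum.map(&{&1, score(ebook, &1)})
    |> Enum.filter(fn {_book, s} -> s >= threshold end)
    |> Enum.max_by(fn {_book, s} -> s end, fn -> nil end)
    |> case do
      nil -> nil
      {book, _score} -> book
    end
  end

  # Assumed scoring: mean of title and author Jaro similarity.
  defp score(a, b) do
    title = String.jaro_distance(norm(a[:title]), norm(b[:title]))
    author = String.jaro_distance(norm(a[:author]), norm(b[:author]))
    (title + author) / 2
  end

  defp norm(nil), do: ""
  defp norm(s), do: s |> String.downcase() |> String.trim()

  defp normalize_isbn(nil), do: nil
  defp normalize_isbn(isbn), do: isbn |> String.replace(~r/[^0-9Xx]/, "") |> String.upcase()
end
```

Trying the exact ISBN first keeps the cheap, unambiguous case away from the threshold entirely; the fuzzy path only runs when no ISBN match exists.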

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add high-level convenience functions to the Ebooks context for:
- Triggering directory scans via trigger_scan/1
- Triggering file reprocessing via trigger_reprocess/2
- Querying ebooks by processing status (list_pending_ebooks/0, list_failed_ebooks/0)
- Deleting ebooks via delete_ebook/1
- Creating changesets via change_ebook/2

All functions include comprehensive test coverage, written using a TDD approach.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Adds comprehensive configuration for ebook management with the following settings:
- supported_formats: Controls which file formats are scanned (default: epub, pdf)
- max_file_size: Limits file size for processing (default: 100MB)
- enable_fuzzy_matching: Enables automatic book linking via fuzzy matching (default: true)
- fuzzy_threshold: Controls fuzzy matching strictness 0.0-1.0 (default: 0.85)

Workers updated to read from configuration:
- ScanWorker uses supported_formats for file discovery and validates max_file_size
- ProcessWorker uses enable_fuzzy_matching as default when not explicitly provided

Also fixes .coveralls.exs syntax error that prevented mix format from running.
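As a rough illustration of how workers might read these settings with the stated defaults, consider the sketch below. The setting names and default values come from the commit message; everything else is assumed: the application name `:my_app`, the `:ebooks` config key, the module name, and the choice to express `max_file_size` in bytes and formats as lowercase strings.

```elixir
defmodule EbookConfigSketch do
  # Defaults as listed in the commit message; units and types are assumptions.
  @defaults [
    supported_formats: ["epub", "pdf"],
    max_file_size: 100 * 1024 * 1024,
    enable_fuzzy_matching: true,
    fuzzy_threshold: 0.85
  ]

  # Look up a setting, falling back to the default when it is not configured.
  # :my_app and :ebooks are hypothetical names.
  def get(key) do
    :my_app
    |> Application.get_env(:ebooks, [])
    |> Keyword.get(key, Keyword.fetch!(@defaults, key))
  end

  # True when the file extension is one of the configured formats.
  def supported?(path) do
    ext = path |> Path.extname() |> String.trim_leading(".") |> String.downcase()
    ext in get(:supported_formats)
  end
end
```

With this shape, an override such as `config :my_app, :ebooks, fuzzy_threshold: 0.9` in a config file would change one setting while leaving the others at their defaults.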

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit includes multiple related changes:

feat: add ebook library management with admin UI
- Add Libraries CRUD UI at /admin/libraries for managing library directories
- Support three scan modes: manual, auto_watch (filesystem monitoring), scheduled (cron)
- Include directory tree picker component for intuitive path selection
- Implement LibraryWatcher GenServer for automatic filesystem change detection
- Implement LibraryScheduler GenServer for cron-based scheduled scanning
- Add comprehensive test coverage (341 tests passing)

feat: display all books regardless of collection status
- Switch from INNER JOIN to LEFT JOIN on collection_items
- Make media types informational only, not governing visibility
- Add delete book button with proper confirmation

fix: resolve PDF processing hanging on large files
- Switch from in-memory pdf_info library to command-line pdfinfo
- Resolve 8+ hour hangs on 50MB+ PDF files
- Process large PDFs in sub-second timeframes
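Shelling out to `pdfinfo` might be sketched as follows. This is an illustration only: the module and function names are hypothetical, and the parser assumes the usual `Key:   value` line format that `pdfinfo` prints. Note that `System.cmd/3` raises if the `pdfinfo` executable is not installed.

```elixir
defmodule PdfInfoSketch do
  # Runs the pdfinfo command-line tool (from poppler-utils) on a file.
  # Returns {:ok, map_of_fields} or {:error, {exit_status, message}}.
  def extract(path) do
    case System.cmd("pdfinfo", [path], stderr_to_stdout: true) do
      {output, 0} -> {:ok, parse(output)}
      {output, status} -> {:error, {status, String.trim(output)}}
    end
  end

  # Turns "Key:   value" lines into a map of trimmed strings.
  # Splitting on the first colon only keeps values such as dates intact.
  def parse(output) do
    output
    |> String.split("\n", trim: true)
    |> Enum.reduce(%{}, fn line, acc ->
      case String.split(line, ":", parts: 2) do
        [key, value] -> Map.put(acc, String.trim(key), String.trim(value))
        _ -> acc
      end
    end)
  end
end
```

Because the external process reads only the document information rather than loading the whole PDF into the BEAM, this approach is consistent with the sub-second timings reported above.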

fix: handle semicolon-separated tags to prevent overflow
- Add normalize_tags/1 to split and process delimited tag strings
- Prevent varchar(255) overflow errors from long concatenated tags
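A function like `normalize_tags/1` could be shaped roughly as below. Only the name and the semicolon splitting come from the commit message; trimming, dropping empty entries, de-duplicating, and truncating each tag to 255 characters are assumptions added for illustration, as is the module name.

```elixir
defmodule TagSketch do
  # Matches the varchar(255) column limit mentioned in the commit message.
  @max_length 255

  # Accepts nil, a single delimited string, or a list of strings.
  def normalize_tags(nil), do: []
  def normalize_tags(tags) when is_binary(tags), do: normalize_tags([tags])

  def normalize_tags(tags) when is_list(tags) do
    tags
    |> Enum.flat_map(&String.split(&1, ";"))
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    # Assumed safeguard: truncate any single tag that is still too long.
    |> Enum.map(&String.slice(&1, 0, @max_length))
    |> Enum.uniq()
  end
end
```

For example, a metadata value of `"Fiction; Sci-Fi;;Fiction "` would yield two separate tags instead of one long concatenated string.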

fix: support namespaced XML elements in EPUB parsing
- Handle container.xml and OPF files with namespace prefixes
- Support various EPUB XML format variations

refactor: migrate EPUB parsing from erlang :zip to zstream
- Use zstream library for more flexible ZIP handling
- Add fallback to erlang :zip for unsupported ZIP formats

refactor: replace xmerl with Saxy for XML parsing
- Rewrite OpfParser to use Saxy instead of xmerl
- Improve parsing reliability and performance

chore: update dependencies and Docker configuration
- Add zstream, file_system, saxy dependencies
- Remove pdf_info dependency
- Add poppler-utils to Dockerfile for pdfinfo command

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Install inotify-tools and poppler-utils in CI environment:
- inotify-tools: Required for file_system library (LibraryWatcher)
- poppler-utils: Provides pdfinfo command for PDF metadata extraction

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>