PHP FFI wrapper for pdf-inspector, a fast Rust library for PDF classification and text extraction.
This package exposes the full pdf-inspector API surface (process, detect, classify, extract text, region extraction, per-page markdown) through PHP FFI. It also includes a Laravel service provider and facade.
Requirements:

- PHP 8.4+
- PHP FFI extension enabled (`extension=ffi`)
- Rust toolchain (only needed if you build the native binaries yourself)
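To check a runtime against these requirements before installing, a small preflight script can help. This is a sketch that uses only core PHP functions (`version_compare`, `extension_loaded`, `ini_get`); the `firepdfPreflight()` helper is hypothetical and not part of the package. It also accounts for PHP's `ffi.enable` setting, whose default value `preload` only permits FFI from the CLI and from preloaded files.

```php
<?php

/**
 * Hypothetical preflight helper (not part of php-firepdf).
 * Returns a map of requirement name => bool.
 */
function firepdfPreflight(): array
{
    // Empty string when the FFI extension is not loaded.
    $ffiSetting = strtolower((string) ini_get('ffi.enable'));

    return [
        // PHP 8.4 or newer is required.
        'php_8_4' => version_compare(PHP_VERSION, '8.4.0', '>='),
        // The FFI extension must be loaded (extension=ffi).
        'ffi_loaded' => extension_loaded('ffi'),
        // "preload" (the default) only allows FFI from the CLI and preloaded
        // files; web SAPIs such as PHP-FPM otherwise need "true" ("1").
        'ffi_enabled' => in_array($ffiSetting, ['1', 'true', 'on', 'preload'], true)
            && ($ffiSetting !== 'preload' || PHP_SAPI === 'cli'),
    ];
}

foreach (firepdfPreflight() as $check => $ok) {
    echo str_pad($check, 12), $ok ? 'ok' : 'MISSING', PHP_EOL;
}
```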
Install via Composer:

```bash
composer require ferdiunal/php-firepdf
```

The package resolves the shared library in this order:

1. `FIREPDF_LIB_PATH` (or the Laravel config key `php-firepdf.lib_path`)
2. Bundled package path: `native/lib/<os>-<arch>/`
3. Dev fallback: `native/pdf-inspector-ffi/target/release/`
If your production deployment does not set FIREPDF_LIB_PATH, make sure the package contains the prebuilt file under native/lib/<os>-<arch>/.
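The lookup order can be pictured as a list of candidate locations tried in sequence. The sketch below only illustrates that order and is not the package's actual resolver: the function names `firepdfLibraryCandidates()` and `firepdfFirstExisting()` are hypothetical, the explicit path is assumed to point at the library file itself, and the library file name is passed in because the real name depends on the platform and the crate build.

```php
<?php

/**
 * Hypothetical illustration of the documented lookup order.
 * Returns candidate library paths, highest priority first.
 */
function firepdfLibraryCandidates(
    string $packageRoot,
    string $platformDir,   // e.g. "linux-x86_64"
    string $libraryFile,   // platform-specific shared library file name
    ?string $explicitPath, // FIREPDF_LIB_PATH or php-firepdf.lib_path (assumed: full file path)
): array {
    $candidates = [];

    // 1. An explicit path always wins.
    if ($explicitPath !== null && $explicitPath !== '') {
        $candidates[] = $explicitPath;
    }

    // 2. Prebuilt library bundled with the package.
    $candidates[] = "{$packageRoot}/native/lib/{$platformDir}/{$libraryFile}";

    // 3. Development fallback: output of `cargo build --release`.
    $candidates[] = "{$packageRoot}/native/pdf-inspector-ffi/target/release/{$libraryFile}";

    return $candidates;
}

/** Returns the first candidate that exists on disk, or null. */
function firepdfFirstExisting(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path)) {
            return $path;
        }
    }

    return null;
}

$candidates = firepdfLibraryCandidates(
    '/app/vendor/ferdiunal/php-firepdf',
    'linux-x86_64',
    'libexample.so', // hypothetical file name
    getenv('FIREPDF_LIB_PATH') ?: null,
);
```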
To build the native library yourself:

```bash
cd vendor/ferdiunal/php-firepdf/native/pdf-inspector-ffi
cargo build --release --locked
# Copy the built file into the package bundle layout
cd ../..
./scripts/stage-native-bundle.sh
```

Use the `native-bundles` GitHub Actions workflow to produce bundle artifacts for Linux, macOS, and Windows. The output folder name includes the runner architecture (for example `linux-x86_64`, `darwin-arm64`, `windows-x86_64`). Each artifact contains:

```
native/lib/<os>-<arch>/<library>
```

Include these files in the package release, or set `FIREPDF_LIB_PATH` explicitly at runtime.
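To see which `<os>-<arch>` folder name corresponds to a given machine, you can derive it from PHP's own platform information. This is a sketch under assumptions: the hypothetical `firepdfPlatformDir()` helper lowercases `PHP_OS_FAMILY` and `php_uname('m')` and maps Windows' `AMD64` to `x86_64`. These rules are inferred from the example folder names above and may not match every architecture the workflow produces (for instance, Linux on ARM reports `aarch64`), so compare the result with the folders actually present in `native/lib/`.

```php
<?php

/**
 * Hypothetical helper: builds an "<os>-<arch>" folder name such as
 * "linux-x86_64", "darwin-arm64" or "windows-x86_64".
 */
function firepdfPlatformDir(string $osFamily = PHP_OS_FAMILY, ?string $machine = null): string
{
    $machine ??= php_uname('m');

    // "Linux" -> "linux", "Darwin" -> "darwin", "Windows" -> "windows"
    $os = strtolower($osFamily);

    // Windows reports x86-64 as "AMD64"; normalize to the name used above.
    $arch = strtolower($machine);
    if ($arch === 'amd64') {
        $arch = 'x86_64';
    }

    return "{$os}-{$arch}";
}

echo firepdfPlatformDir(), PHP_EOL;
```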
Publish the config file (Laravel):

```bash
php artisan vendor:publish --tag="php-firepdf-config"
```

Basic usage:

```php
use Ferdiunal\FirePdf\FirePdf;

$pdf = new FirePdf();

// Full processing: detect + extract + markdown
$result = $pdf->processPdf('document.pdf');
echo $result->pdfType;  // TextBased, Scanned, ImageBased, Mixed
echo $result->markdown; // Markdown string or null

// Fast detection only
$info = $pdf->detectPdf('document.pdf');

// From bytes: the PDF is passed in memory, so no file path is needed
$bytes = file_get_contents('document.pdf');
$result = $pdf->processPdfBytes($bytes);

// Per-page markdown
$pages = $pdf->extractPagesMarkdown('document.pdf');
foreach ($pages->pages as $page) {
    echo "Page {$page->page}: {$page->markdown}" . PHP_EOL;
}
```

With the Laravel facade:

```php
use Ferdiunal\FirePdf\Facades\FirePdf;

$result = FirePdf::processPdf('document.pdf');
```

This package ships optional Laravel AI SDK-compatible tools under the `Ferdiunal\FirePdf\Ai\Tools` namespace:

- `DetectPdfTool`
- `ClassifyPdfTool`
- `ProcessPdfTool`
- `ExtractTextTool`
- `ExtractPagesMarkdownTool`

These tools follow the Laravel AI SDK `Tool` contract and can be returned explicitly from your agent's `tools()` method:
```php
<?php

namespace App\Ai\Agents;

use Ferdiunal\FirePdf\Ai\Tools\ClassifyPdfTool;
use Ferdiunal\FirePdf\Ai\Tools\DetectPdfTool;
use Ferdiunal\FirePdf\Ai\Tools\ExtractPagesMarkdownTool;
use Ferdiunal\FirePdf\Ai\Tools\ExtractTextTool;
use Ferdiunal\FirePdf\Ai\Tools\ProcessPdfTool;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Promptable;

final class PdfAssistant implements Agent, HasTools
{
    use Promptable;

    /**
     * @return Tool[]
     */
    public function tools(): iterable
    {
        return [
            new DetectPdfTool(),
            new ClassifyPdfTool(),
            new ProcessPdfTool(),
            new ExtractTextTool(),
            new ExtractPagesMarkdownTool(),
        ];
    }
}
```

Tool input is storage-scoped: each tool requires a relative `path` argument, which is resolved against the configured disk and base path. Configure them as follows:
```php
// config/php-firepdf.php
'ai_tools' => [
    'disk' => env('FIREPDF_AI_TOOLS_DISK', 'local'),
    'base_path' => env('FIREPDF_AI_TOOLS_BASE_PATH', 'incoming/pdfs'),
],
```

Example call payload (from an AI tool invocation):
```json
{
  "path": "contracts/sample.pdf"
}
```

If you use these tools, install the Laravel AI SDK in your Laravel app:

```bash
composer require laravel/ai
```

To validate uploaded PDFs, use the object rule:
```php
use Ferdiunal\FirePdf\Rules\ValidPdf;

$rules = [
    'document' => ['required', 'file', new ValidPdf()],
];
```

Or the string alias:
```php
$rules = [
    'document' => ['required', 'file', 'firepdf_pdf'],
];
```

Recommended: combine `mimetypes` for early filtering with `firepdf_pdf` for deep validation:
```php
$rules = [
    'document' => ['required', 'file', 'mimetypes:application/pdf', 'firepdf_pdf'],
];
```

API reference:

| Method | Description |
|---|---|
| `processPdf(path, pages?)` | Full processing (detect + extract + markdown) |
| `processPdfBytes(data, pages?)` | Full processing from bytes |
| `detectPdf(path)` | Fast detection only |
| `detectPdfBytes(data)` | Fast detection from bytes |
| `classifyPdf(path)` | Lightweight classification |
| `classifyPdfBytes(data)` | Lightweight classification from bytes |
| `extractText(path)` | Plain text extraction |
| `extractTextBytes(data)` | Plain text extraction from bytes |
| `extractTextWithPositions(path, pages?)` | Text with X/Y coordinates and font info |
| `extractTextWithPositionsBytes(data, pages?)` | Text with positions from bytes |
| `extractTextInRegions(path, pageRegions)` | Extract text in bounding-box regions |
| `extractTextInRegionsBytes(data, pageRegions)` | Region text extraction from bytes |
| `extractTablesInRegions(path, pageRegions)` | Table markdown in regions |
| `extractTablesInRegionsBytes(data, pageRegions)` | Table markdown in regions from bytes |
| `extractPagesMarkdown(path, pages?)` | Per-page markdown + layout metadata |
| `extractPagesMarkdownBytes(data, pages?)` | Per-page markdown from bytes |
| `getRuntimeSnapshot()` | Returns aggregate runtime telemetry (worker memory and speed) |
| `resetRuntimeSnapshot()` | Resets aggregate runtime telemetry counters |
| `shouldRecycleWorker()` | Returns `true` when the configured soft or hard memory limit was exceeded |
| `close()` | Closes the FFI handle and runs a GC cycle |
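`getRuntimeSnapshot()` reports aggregates such as last and average duration plus current and peak memory. As a rough mental model only, the hypothetical `DurationTelemetry` class below shows how figures like these can be collected in plain PHP with `hrtime()`, `memory_get_usage()` and `memory_get_peak_usage()`. It is not how the package implements its telemetry; the property names simply mirror the snapshot fields (`lastDurationMs`, `averageDurationMs`, `currentMemoryBytes`, `peakMemoryBytes`).

```php
<?php

/** Hypothetical telemetry aggregator; not part of php-firepdf. */
final class DurationTelemetry
{
    public float $lastDurationMs = 0.0;
    public float $averageDurationMs = 0.0;
    private int $calls = 0;

    /** Runs $operation, records its wall-clock duration, and returns its result. */
    public function measure(callable $operation): mixed
    {
        $start = hrtime(true); // monotonic clock, nanoseconds
        try {
            return $operation();
        } finally {
            $this->lastDurationMs = (hrtime(true) - $start) / 1e6;
            $this->calls++;
            // Incremental mean: no need to keep every sample.
            $this->averageDurationMs += ($this->lastDurationMs - $this->averageDurationMs) / $this->calls;
        }
    }

    public function reset(): void
    {
        $this->lastDurationMs = 0.0;
        $this->averageDurationMs = 0.0;
        $this->calls = 0;
    }

    public function calls(): int
    {
        return $this->calls;
    }

    /**
     * Memory held by PHP's own allocator. Note: allocations made by a native
     * library outside the Zend memory manager are NOT included here.
     */
    public function memory(): array
    {
        return [
            'currentMemoryBytes' => memory_get_usage(true),
            'peakMemoryBytes' => memory_get_peak_usage(true),
        ];
    }
}

$telemetry = new DurationTelemetry();
$sum = $telemetry->measure(fn () => array_sum(range(1, 1000)));
```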
Validation extensions:

- `Ferdiunal\FirePdf\Rules\ValidPdf` (object rule)
- `firepdf_pdf` (string alias)
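`mimetypes:application/pdf` is a cheap first filter, while `firepdf_pdf` performs the deeper check. If you want an extra guard before handing a file to the native library, you can look for the `%PDF-` header that PDF files begin with (some readers accept it anywhere in the first 1024 bytes, so the sketch scans that window). The `looksLikePdf()` helper is hypothetical and is not the package's validation logic; it only inspects the first bytes, so it cannot tell whether the rest of the file is intact.

```php
<?php

/**
 * Hypothetical pre-check (not part of php-firepdf): returns true when the
 * "%PDF-" header appears within the first 1024 bytes of $path.
 */
function looksLikePdf(string $path): bool
{
    $handle = @fopen($path, 'rb'); // suppress the warning for missing files
    if ($handle === false) {
        return false;
    }

    try {
        $head = fread($handle, 1024);
    } finally {
        fclose($handle);
    }

    return is_string($head) && str_contains($head, '%PDF-');
}
```

Treat a `true` result only as "worth validating further"; the real accept/reject decision should still come from `ValidPdf` / `firepdf_pdf`.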
Runtime telemetry:

```php
// $firePdf is a Ferdiunal\FirePdf\FirePdf instance
$firePdf->resetRuntimeSnapshot();
$result = $firePdf->processPdf($path);
$snapshot = $firePdf->getRuntimeSnapshot();

echo $snapshot->lastDurationMs;     // last operation duration
echo $snapshot->averageDurationMs;  // average duration
echo $snapshot->currentMemoryBytes; // current process memory
echo $snapshot->peakMemoryBytes;    // process peak memory
```

For quick markdown + telemetry reports on sample PDFs, run:
```bash
php scripts/test-user-pdfs.php
```

Worker recycling: check `shouldRecycleWorker()` after processing and let your runtime restart the worker. Only the recycle action differs between setups.

Request-based worker:

```php
$result = $firePdf->processPdf($path);

if ($firePdf->shouldRecycleWorker()) {
    // Mark worker for graceful recycle at end of request.
}
```

Supervisor-managed worker:

```php
$result = $firePdf->processPdf($path);

if ($firePdf->shouldRecycleWorker()) {
    // Trigger worker restart in your supervisor/worker control flow.
}
```

RoadRunner:

```php
$result = $firePdf->processPdf($path);

if ($firePdf->shouldRecycleWorker()) {
    // Stop current worker and let RoadRunner (RR) spawn a fresh one.
}
```

Recommended policy:

- Use the worker's max-requests setting and `shouldRecycleWorker()` together.
- Set `soft_limit_mb` below your process hard limit.
- Set `hard_limit_mb` as a deterministic recycle threshold.
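The policy above combines a request cap with the memory signal. The sketch below shows that decision as a plain PHP loop so it can be tested without a worker runtime. `runWorker()` is hypothetical: `$maxRequests` stands in for your worker's max-requests setting, and `$shouldRecycle` is where you would pass something like `fn () => $firePdf->shouldRecycleWorker()`.

```php
<?php

/**
 * Hypothetical worker-loop policy (not part of php-firepdf): stop after
 * $maxRequests jobs, or earlier when the memory check asks for a recycle.
 *
 * @param iterable<mixed>       $jobs
 * @param callable(mixed): void $handle        processes one job
 * @param callable(): bool      $shouldRecycle memory signal
 * @return int number of jobs handled before the worker should exit
 */
function runWorker(iterable $jobs, callable $handle, callable $shouldRecycle, int $maxRequests): int
{
    $handled = 0;

    foreach ($jobs as $job) {
        $handle($job);
        $handled++;

        // Check only after a job finishes, so no request is cut off mid-way.
        if ($handled >= $maxRequests || $shouldRecycle()) {
            break; // exit cleanly; the supervisor starts a fresh process
        }
    }

    return $handled;
}

// Stand-in memory signal: "exceeds" the limit once three jobs are done.
$processed = [];
$handled = runWorker(
    jobs: range(1, 10),
    handle: function (int $job) use (&$processed): void { $processed[] = $job; },
    shouldRecycle: function () use (&$processed): bool { return count($processed) >= 3; },
    maxRequests: 5,
);
```

Here the memory signal fires before the request cap, so the worker stops after three jobs; with a signal that never fires it would stop at five.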
Development:

```bash
# Native build
cd native/pdf-inspector-ffi
cargo build --release --locked
cd ../..

# PHP tests (requires the FFI library to be built)
composer test

# PHP static analysis
composer analyse
```

License: MIT. Please see the License File for more information.