guillercp93/HtmlToPdfDotNet

HtmlToPdfDotNet

HtmlToPdfDotNet is a fast, lightweight, fully managed .NET library that converts HTML and CSS into PDF documents. It does not rely on headless browsers (such as Chromium) or external system tools (such as wkhtmltopdf). Instead, it parses HTML with HtmlAgilityPack and implements its own CSS styling engine and PDF writer directly in C#, with the goal of higher throughput and lower memory usage than browser-based converters.


Library Overview

Purpose

The primary goal of HtmlToPdfDotNet is to provide a fully managed solution, with no native or browser dependencies, for generating PDFs from HTML templates within .NET applications. It is well suited to invoices, reports, receipts, and other structured documents.

Main Features

  • Fully Managed: 100% C# code, no external browser or unmanaged bindings required.
  • Advanced CSS Support: Supports modern CSS selectors, the standard box model (margins, borders, padding), inline styles, and external stylesheets. Only a subset of CSS is implemented; see the FAQ for layout limitations.
  • Font Subsetting: Embeds only the required glyphs from TTF/OTF fonts to significantly reduce PDF file sizes.
  • Rich Media Support: Embeds JPEG and PNG images natively, preserving aspect ratios and transparency.
  • Table Layouts: Comprehensive support for complex <table> layouts, including spanning and borders.
  • Dependency Injection: First-class support for Microsoft.Extensions.DependencyInjection.

Supported .NET Versions

  • .NET 10.0 (Target framework: net10.0)

Typical Use Cases

  • Generating automated financial reports (invoices, receipts, ledgers).
  • Exporting dashboards and analytics views to PDF.
  • Creating standardized legal or medical documents from HTML templates.
  • Generating tickets, boarding passes, or printable labels.

Installation

Prerequisites

  • .NET 10.0 SDK or newer must be installed on your development machine or build server.

NuGet Package Manager

You can install the package via the NuGet Package Manager UI in Visual Studio, or by using the following commands:

Using .NET CLI:

dotnet add package HtmlToPdfDotNet.Library --version 1.0.0

Using Package Manager Console (Visual Studio):

Install-Package HtmlToPdfDotNet.Library -Version 1.0.0

Using PackageReference (in .csproj):

<PackageReference Include="HtmlToPdfDotNet.Library" Version="1.0.0" />

How to Use

Quick Start Example

The simplest way to generate a PDF is by instantiating the generator directly and passing HTML text.

using HtmlToPdfDotNet.Library;
using HtmlToPdfDotNet.Library.Commons;

// 1. Initialize configuration options
var options = new ConversionOptions
{
    Page = PageLayout.A4,
    CompressStreams = true
};

// 2. Create the generator instance
IPdfGenerator generator = new PdfGenerator(options);

// 3. Define HTML content
string html = @"
    <html>
        <head>
            <style>
                body { font-family: 'Helvetica'; font-size: 12pt; }
                h1 { color: #2c3e50; }
                .content { border: 1px solid #bdc3c7; padding: 10px; }
            </style>
        </head>
        <body>
            <h1>Hello, World!</h1>
            <div class='content'>
                <p>This PDF was generated entirely in .NET.</p>
            </div>
        </body>
    </html>";

// 4. Generate and save the PDF
generator.WritePdfFile(html, "output.pdf");

Dependency Injection (DI) Example

HtmlToPdfDotNet registers with Microsoft.Extensions.DependencyInjection through the AddHtmlToPdfDotNet() extension method.

using Microsoft.Extensions.DependencyInjection;
using HtmlToPdfDotNet.Library;
using HtmlToPdfDotNet.Library.Infrastructure;

ServiceCollection services = new();

// Register the PDF Generator using the extension method
services.AddHtmlToPdfDotNet();

// Dispose the provider (and any services it owns) when this scope ends
using ServiceProvider serviceProvider = services.BuildServiceProvider();

// Resolve and use
IPdfGenerator generator = serviceProvider.GetRequiredService<IPdfGenerator>();
generator.WritePdfFile("<h1>Injected Generator</h1>", "di_output.pdf");

Important Classes and Interfaces

  • IPdfGenerator: The primary interface. Exposes methods like Convert(html), Convert(html, stream), and WritePdfFile(html, path).
  • PdfGenerator: The default implementation of IPdfGenerator.
  • ConversionOptions: Configuration class for customizing the page size (Page, of type PageLayout), stream compression (CompressStreams), the base path for resources (BasePath), external stylesheets (StyleSheets), and custom fonts (FontRegistry).
  • PageLayout: Defines the physical page size (e.g., PageLayout.A4, PageLayout.Letter) and document margins.

Best Practices

  • Reuse the Generator: The PdfGenerator class is thread-safe and should be registered as a Singleton to prevent redundant memory allocations.
  • Use External Stylesheets: Instead of large <style> blocks, pass pre-compiled CSS files via ConversionOptions.StyleSheets for faster parsing.
  • Font Subsetting: When using custom TTF fonts, explicitly register them using the FontRegistry to ensure only used glyphs are embedded, drastically reducing output file size.

Error Handling Example

Always wrap PDF generation in a try-catch block, especially when dealing with external HTML input or file I/O operations.

try
{
    string htmlContent = "<h1>Invoice #1234</h1>";
    generator.WritePdfFile(htmlContent, "invoice.pdf");
    Console.WriteLine("PDF generated successfully.");
}
catch (UnauthorizedAccessException ex)
{
    Console.Error.WriteLine($"Permission denied while writing PDF: {ex.Message}");
}
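catch (IOException ex) // Disk full, missing directory, locked file, etc.
{
    Console.Error.WriteLine($"I/O error while writing PDF: {ex.Message}");
}
catch (Exception ex) // Handles parsing or rendering errors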
{
    Console.Error.WriteLine($"Failed to generate PDF: {ex.Message}");
}
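
A variation on the pattern above, sketched under assumptions: the helper below converts first and only then replaces the target file, so a failed conversion does not leave a truncated PDF at the destination. SafePdfWriter and TryWritePdf are hypothetical names, not part of the library. The convert delegate stands in for IPdfGenerator.Convert(html), which the controller example in this README uses to obtain a byte[].

```csharp
using System;
using System.IO;

public static class SafePdfWriter
{
    // 'convert' is a stand-in for IPdfGenerator.Convert(html) (assumption:
    // it returns the finished PDF as a byte array). Passing it as a delegate
    // keeps this sketch independent of the library.
    public static bool TryWritePdf(
        Func<string, byte[]> convert, string html, string path, out string? error)
    {
        string tempPath = path + ".tmp";
        try
        {
            byte[] pdf = convert(html);                 // may throw on parse/render errors
            File.WriteAllBytes(tempPath, pdf);          // write next to the target first
            File.Move(tempPath, path, overwrite: true); // then replace the target
            error = null;
            return true;
        }
        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
        {
            error = $"Could not write '{path}': {ex.Message}";
        }
        catch (Exception ex)
        {
            error = $"Failed to generate PDF: {ex.Message}";
        }

        TryDelete(tempPath);
        return false;
    }

    // Best-effort cleanup of the temporary file; failures here are ignored.
    private static void TryDelete(string file)
    {
        try { File.Delete(file); }
        catch (IOException) { }
        catch (UnauthorizedAccessException) { }
    }
}
```

Assuming the generator's Convert(html) overload matches Func<string, byte[]>, a call could look like SafePdfWriter.TryWritePdf(generator.Convert, html, "invoice.pdf", out var error), with error holding a message when the method returns false.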

Dependencies

HtmlToPdfDotNet is designed to be lightweight, relying on minimal external packages.

| Package | Version | Required | Purpose |
| --- | --- | --- | --- |
| HtmlAgilityPack | 1.12.4+ | Yes | Used for robust parsing of HTML strings into a navigable DOM tree. Handles malformed HTML gracefully. |
| Microsoft.Extensions.DependencyInjection | 10.0.6+ | Yes | Provides core abstractions for the native DI container implementation, enabling seamless integration with ASP.NET Core and Worker Services. |

There are no transitive unmanaged dependencies (no native .dll or .so files required).


Code Policies and Contribution Guidelines

We enforce strict engineering standards to ensure high performance and maintainability.

Development Standards

  • Coding Standards: Follow standard Microsoft C# coding conventions. Use C# 10.0+ language features (e.g., file-scoped namespaces, pattern matching, global usings).
  • Naming Conventions:
    • Interfaces: IPascalCase
    • Classes/Records/Structs: PascalCase
    • Private Fields: _camelCase
    • Local Variables/Parameters: camelCase
  • SOLID Principles: Architecture is highly decoupled. Layout engines (BlockLayoutEngine, TableLayoutEngine) are strictly separated from PDF writing constructs (PdfDocumentWriter).
  • Folder Structure:
    • HtmlToPdfDotNet.Library/Commons/: Static helpers and constants.
    • HtmlToPdfDotNet.Library/Models/Layout/: Layout tree and rendering primitives.
    • HtmlToPdfDotNet.Library/Models/Styles/: CSS parsing and cascade resolution.
    • HtmlToPdfDotNet.Library/Models/Writer/: PDF binary generation (XRefs, streams).
    • HtmlToPdfDotNet.Library/Models/Fonts/: Custom TTF/OTF font loading, parsing, and subsetting to minimize PDF size.
    • HtmlToPdfDotNet.Library/Models/Imaging/: Native decoding of JPEG and PNG images and PDF Image XObject generation.

Process Guidelines

  • Unit Testing: All new features must include xUnit tests. Code coverage for HtmlToPdfDotNet.Library must remain above 85%. Use InternalsVisibleTo to test internal layout behaviors.
  • Error Handling: Use exceptions only for truly exceptional circumstances (e.g., missing files, I/O failures). Use standard fallback mechanisms for CSS parsing errors (e.g., reverting to auto or default colors).
  • Logging: Do not use Console.WriteLine in the library. If diagnostics are required, inject an ILogger<T> instance.
  • Performance: Avoid excessive allocations. Use Span<T> and ReadOnlySpan<char> for text and CSS parsing. Use static readonly dictionaries for lookup tables.
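
To illustrate the error-handling and performance guidelines together, here is a minimal sketch of span-based parsing with a non-throwing fallback path. CssLength is a hypothetical helper written for this README, not a type from the library. It splits a CSS length such as "12pt" into its number and unit without allocating substrings, and returns false on malformed input so the caller can fall back to a default.

```csharp
using System;
using System.Globalization;

public static class CssLength
{
    // Splits a CSS length such as "12pt" or " 1.5em " into number and unit.
    // Works on slices of the input span, so no intermediate strings are
    // allocated. Returns false (instead of throwing) when no number is found.
    public static bool TryParse(
        ReadOnlySpan<char> text, out float value, out ReadOnlySpan<char> unit)
    {
        text = text.Trim();

        // Advance past the numeric prefix: digits, sign, and decimal point.
        int i = 0;
        while (i < text.Length &&
               (char.IsAsciiDigit(text[i]) || text[i] is '.' or '-' or '+'))
        {
            i++;
        }

        unit = text[i..];
        return float.TryParse(
            text[..i], NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

A caller would typically write something like: if the parse fails, keep the inherited or default value rather than raising an exception, which matches the fallback behavior described for CSS parsing errors.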

Source Control Strategy

  • Branching: We use standard GitFlow. main contains production-ready code. Development happens on feature/* branches.
  • Pull Requests: PRs must include an updated CHANGELOG, pass all GitHub Actions CI tests, and require at least one approving review from a maintainer.
  • Versioning: We follow strict Semantic Versioning (SemVer).

Security

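  • HTML input is treated as untrusted. External resources (such as images referenced via src) are resolved only if they fall under the configured BasePath or are valid data: URIs. This prevents a document from pulling in arbitrary local files or remote URLs, which guards against path traversal and SSRF attacks.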
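
The kind of containment check this rule implies can be sketched as follows. This is not the library's actual implementation; ResourceGuard and IsUnderBasePath are hypothetical names, and the sketch covers file paths only (data: URIs would be handled separately). It also does not resolve symbolic links.

```csharp
using System;
using System.IO;

public static class ResourceGuard
{
    // Returns true when 'src', resolved against 'basePath', stays inside
    // that directory. Sequences such as "../" are normalized before the
    // comparison, so they cannot escape the base directory.
    public static bool IsUnderBasePath(string basePath, string src)
    {
        string root = Path.GetFullPath(basePath);

        // A trailing separator stops "/assets-private" from matching "/assets".
        if (!Path.EndsInDirectorySeparator(root))
            root += Path.DirectorySeparatorChar;

        // Resolves relative paths against root; absolute paths are normalized as-is.
        string candidate = Path.GetFullPath(src, root);

        // Windows file systems are usually case-insensitive.
        StringComparison comparison = OperatingSystem.IsWindows()
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        return candidate.StartsWith(root, comparison);
    }
}
```

With a base path of /var/www/html/assets/, the source images/logo.png is accepted, while ../secret.txt and /etc/passwd are rejected.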

Advanced Examples

ASP.NET Core Integration with Razor Views

A highly effective pattern for generating professional PDFs in ASP.NET Core is rendering a Razor view (.cshtml) to an HTML string in memory, and then passing it to HtmlToPdfDotNet. This allows you to use the exact same template and CSS styles for both in-browser HTML previewing and offline PDF generation.

This architectural pattern is fully implemented in the HtmlToPdfDotNet.Example project.

1. Register Services in Program.cs

Register the MVC controller views, view-renderer helper, and the generator as a Singleton:

using HtmlToPdfDotNet.Library;
using HtmlToPdfDotNet.Library.Models.Layout;

var builder = WebApplication.CreateBuilder(args);

// MVC + Razor Views
builder.Services.AddControllersWithViews();

// Register a service to render Razor views to strings (see below)
builder.Services.AddScoped<IRazorViewRenderer, RazorViewRenderer>();

// Register the PDF Generator as a Singleton
builder.Services.AddSingleton<IPdfGenerator>(_ =>
    new PdfGenerator(new ConversionOptions
    {
        Page            = PageLayout.A4,
        CompressStreams = true
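    }));

var app = builder.Build();

// Map attribute-routed controllers (such as the ReportController shown below)
app.MapControllers();

app.Run();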

2. The Razor View Renderer

Use the standard ASP.NET Core view engine to render templates into raw HTML strings asynchronously:

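using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.Abstractions;
using Microsoft.AspNetCore.Mvc.ModelBinding;
using Microsoft.AspNetCore.Mvc.Razor;
using Microsoft.AspNetCore.Mvc.Rendering;
using Microsoft.AspNetCore.Mvc.ViewFeatures;
using Microsoft.AspNetCore.Routing;

public interface IRazorViewRenderer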
{
    Task<string> RenderToStringAsync<TModel>(string viewName, TModel model);
}

public class RazorViewRenderer(
    IRazorViewEngine viewEngine,
    ITempDataProvider tempDataProvider,
    IServiceProvider serviceProvider) : IRazorViewRenderer
{
    public async Task<string> RenderToStringAsync<TModel>(string viewName, TModel model)
    {
        var httpContext = new DefaultHttpContext { RequestServices = serviceProvider };
        var actionContext = new ActionContext(httpContext, new RouteData(), new ActionDescriptor());

        using var sw = new StringWriter();
        var viewResult = viewEngine.FindView(actionContext, viewName, isMainPage: true);

        if (!viewResult.Success)
            throw new InvalidOperationException($"View '{viewName}' not found.");

        var viewData = new ViewDataDictionary<TModel>(new EmptyModelMetadataProvider(), new ModelStateDictionary()) { Model = model };
        var tempData = new TempDataDictionary(httpContext, tempDataProvider);
        var viewContext = new ViewContext(actionContext, viewResult.View, viewData, tempData, sw, new HtmlHelperOptions());

        await viewResult.View.RenderAsync(viewContext);
        return sw.ToString();
    }
}

3. The Web API Controller

Expose endpoints that either render the HTML in-browser (excellent for design and debugging CSS) or generate a downloadable PDF:

using HtmlToPdfDotNet.Library;
using Microsoft.AspNetCore.Mvc;

[ApiController]
public class ReportController(IRazorViewRenderer renderer, IPdfGenerator pdfGenerator) : ControllerBase
{
    // GET /report — Preview HTML in browser
    [HttpGet("report")]
    [Produces("text/html")]
    public async Task<ContentResult> GetReportHtml()
    {
        var model = new SalesReportModel();
        string html = await renderer.RenderToStringAsync("Report/SalesReport", model);
        return Content(html, "text/html");
    }

    // GET /report/pdf — Download the same report as PDF
    [HttpGet("report/pdf")]
    [Produces("application/pdf")]
    public async Task<FileContentResult> GetReportPdf()
    {
        var model = new SalesReportModel();
        string html = await renderer.RenderToStringAsync("Report/SalesReport", model);
        
        // Convert to PDF byte array directly
        byte[] pdfBytes = pdfGenerator.Convert(html);
        
        return File(pdfBytes, "application/pdf", "SalesReport.pdf");
    }
}

Loading and Embedding Images

HtmlToPdfDotNet supports loading images either by relative physical paths or by embedding them as Base64 Data URIs.

Option A: Embedding as Base64 Data URIs (Recommended)

This approach is extremely robust for web environments because it eliminates path-resolution issues and runtime base-path complications across environments (development, production containers, etc.).

// Inside your View Model
public string LogoDataUri => GetLogoAsBase64();

private string GetLogoAsBase64()
{
    string filePath = Path.Combine(AppContext.BaseDirectory, "wwwroot", "images", "logo.png");
    byte[] bytes = File.ReadAllBytes(filePath);
    return $"data:image/png;base64,{Convert.ToBase64String(bytes)}";
}

Then, reference the property directly in your .cshtml view:

<img src="@Model.LogoDataUri" alt="Company Logo" />
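
The view-model snippet above hard-codes image/png. As a hedged extension, the sketch below picks the MIME type from the file's leading bytes, covering the two formats this README lists as supported (PNG and JPEG). DataUri and FromImageBytes are hypothetical names introduced for illustration only.

```csharp
using System;

public static class DataUri
{
    // Standard 8-byte PNG file signature.
    private static ReadOnlySpan<byte> PngSignature =>
        new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Builds a data: URI from raw image bytes, detecting PNG or JPEG
    // from the leading bytes rather than trusting the file extension.
    public static string FromImageBytes(ReadOnlySpan<byte> bytes)
    {
        string mime =
            bytes.StartsWith(PngSignature) ? "image/png" :
            bytes.Length >= 3 && bytes[0] == 0xFF && bytes[1] == 0xD8 && bytes[2] == 0xFF
                ? "image/jpeg" // JPEG files start with FF D8 FF
                : throw new NotSupportedException(
                    "Only PNG and JPEG images are handled in this sketch.");

        return $"data:{mime};base64,{Convert.ToBase64String(bytes)}";
    }
}
```

In the view model, GetLogoAsBase64() could then return DataUri.FromImageBytes(File.ReadAllBytes(filePath)) so that a JPEG logo is labeled correctly.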

Option B: Loading via BasePath

If your HTML contains relative physical image paths (e.g. <img src="images/logo.png" />), you must configure the BasePath property inside the conversion options:

var options = new ConversionOptions
{
    BasePath = "/var/www/html/assets/", // Resolves images relative to this base directory
    Page = PageLayout.A4
};

var generator = new PdfGenerator(options);
string html = @"<img src='images/logo.png' width='200' />"; 
generator.WritePdfFile(html, "report.pdf");

Troubleshooting

Common Installation Problems

  • Error: NU1202: Package HtmlToPdfDotNet.Library is not compatible with net8.0
    • Fix: This library strictly targets net10.0. You must upgrade your project to .NET 10.0 or higher.

Runtime Issues

  • Images are not rendering (Blank spaces in PDF)
    • Fix: Ensure the src attribute is either a valid Base64 data: URI (data:image/png;base64,...) or an absolute or relative path that resolves under ConversionOptions.BasePath. As described under Security, paths outside the configured BasePath are not loaded.
  • Text overlapping or incorrect font sizes
    • Fix: CSS defines 1px as 1/96 inch, while a PDF point is 1/72 inch, so the library automatically scales px to pt (1px = 0.75pt). Use one unit (pt or px) consistently in your CSS so that sizes are not mixed unexpectedly.
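
The px-to-pt scaling mentioned above is plain arithmetic, and a small helper makes it easy to predict the size a CSS value will have on the page. CssUnits is a hypothetical name used for illustration; it is not a library type.

```csharp
public static class CssUnits
{
    public const double PointsPerInch = 72.0; // PDF user-space units per inch
    public const double PixelsPerInch = 96.0; // CSS reference pixels per inch

    // 1px = 72/96 pt = 0.75pt
    public static double PxToPt(double px) => px * PointsPerInch / PixelsPerInch;

    // 1pt = 96/72 px (about 1.333px)
    public static double PtToPx(double pt) => pt * PixelsPerInch / PointsPerInch;
}
```

For example, a 16px font renders at 12pt, and an image with width='200' (in px) occupies 150pt on the page.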

Debugging Tips

  • If the layout looks wrong, try disabling compression by setting CompressStreams = false in ConversionOptions. The content streams are then written uncompressed, so you can open the .pdf file in a text editor and inspect the raw PDF operators (e.g., BT, Tf, cm) being emitted.
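
As a rough aid for this kind of inspection, the sketch below scans a PDF's bytes for the /FlateDecode filter name (the standard PDF name for deflate-compressed streams) and counts BT tokens, which open text objects in a content stream. PdfInspector is a hypothetical helper. It assumes the library uses Flate compression when CompressStreams is enabled, and its whitespace tokenizer is naive: a literal "BT" inside a text string would also be counted.

```csharp
using System;
using System.Text;

public static class PdfInspector
{
    // Reports whether any stream declares /FlateDecode and how many
    // "BT" (begin text) operators are visible in the raw bytes.
    // Text operators are only visible when streams are uncompressed.
    public static (bool UsesFlate, int TextBlocks) Inspect(byte[] pdf)
    {
        // Latin-1 maps every byte to one char, so binary data survives decoding.
        string text = Encoding.Latin1.GetString(pdf);

        bool usesFlate = text.Contains("/FlateDecode", StringComparison.Ordinal);

        int textBlocks = 0;
        string[] tokens = text.Split(
            new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (token == "BT")
                textBlocks++;
        }

        return (usesFlate, textBlocks);
    }
}
```

Calling PdfInspector.Inspect(File.ReadAllBytes("output.pdf")) on a file generated with compression disabled should report no Flate filter on content streams and a non-zero count of text blocks if the page contains text.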

FAQ

Q: Does it support JavaScript execution?
A: No. HtmlToPdfDotNet is a static HTML/CSS layout engine and does not execute JavaScript. If you need JS rendering (for example, React or Angular SPAs), pre-render the HTML before passing it to the library.

Q: Can I use TailwindCSS or Bootstrap?
A: Yes, but the library supports only a subset of CSS. Complex flexbox or grid layouts may fall back to block layout. For maximum compatibility, use standard block, inline-block, and table layouts.

Q: Why is the generated PDF larger than expected?
A: Make sure CompressStreams = true is set in your options. If you embed custom TTF fonts, register them through the FontRegistry so that only the glyphs actually used are embedded.


License

This project is licensed under the MIT + Commercial License model.

  • MIT License: You are free to use, modify, and include this software in commercial or non-commercial products.
  • Commercial Clause: You may NOT sell this library as a standalone SaaS product (e.g., a "PDF Generation API as a Service").

Please refer to the LICENSE file in the repository root for the full legal text.
