xmlschema-rs: Rust Port of Python xmlschema Package

This document tracks the progress of porting the Python xmlschema package to Rust.

Reference

  • Python Source: sissaschool/xmlschema
  • License: MIT
  • Python Package Stats: 79 Python source files, roughly 50k lines of code

Current Progress

Overall: ~85% complete

Current Stage: Phase 7 - Polish & Remaining Features

Completed Features

Core Infrastructure

  • Error types and error handling
  • Namespace handling with QName support
  • XML name validation (NCName, QName)
  • Resource loading (file-based)
  • Security limits and constraints
  • Module structure
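
To illustrate the name-validation item above, here is a minimal sketch of NCName checking and QName splitting. The function names `is_ncname` and `split_qname` are hypothetical and not taken from the crate. The character test is an approximation: the XML Namespaces grammar lists explicit Unicode ranges, while this sketch relies on Rust's `char` classification.

```rust
/// Approximate NCName check (assumption: `char::is_alphabetic` and
/// `char::is_alphanumeric` stand in for the exact XML character ranges).
fn is_ncname(s: &str) -> bool {
    let mut chars = s.chars();
    // The first character must be a letter or underscore.
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Later characters may also be digits, '-', or '.'; ':' is never allowed.
    chars.all(|c| c.is_alphanumeric() || matches!(c, '_' | '-' | '.'))
}

/// Split a lexical QName into (optional prefix, local name).
/// Returns `None` when either part is not an NCName.
fn split_qname(s: &str) -> Option<(Option<&str>, &str)> {
    match s.split_once(':') {
        Some((prefix, local)) if is_ncname(prefix) && is_ncname(local) => {
            Some((Some(prefix), local))
        }
        Some(_) => None,
        None if is_ncname(s) => Some((None, s)),
        None => None,
    }
}
```

With these definitions, `split_qname("xs:element")` yields the prefix `xs` and the local name `element`, while a name with two colons is rejected.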

XSD Parsing (Complete)

  • Schema parsing from file and string
  • Simple type parsing (atomic, list, union)
  • Complex type parsing
  • Element declarations
  • Attribute declarations
  • Attribute groups
  • Model groups (sequence, choice, all)
  • Group references
  • Type restrictions and extensions
  • Forward reference resolution

Type System (Complete)

  • Built-in XSD types (string, integer, decimal, date, etc.)
  • Simple type restrictions
  • Complex types with simple/complex content
  • Type derivation (extension/restriction)
  • Qualified name resolution

Facets (Complete)

  • enumeration
  • pattern (regex)
  • length, minLength, maxLength
  • minInclusive, maxInclusive, minExclusive, maxExclusive
  • totalDigits, fractionDigits
  • whiteSpace
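
As an illustration of how a few of the facets listed above could be checked against a string value, consider the following sketch. The `Facet` enum and `check_all` helper are hypothetical names, not the crate's actual types, and only the length and enumeration facets are covered.

```rust
/// A small subset of XSD facets (hypothetical type for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Facet {
    Length(usize),
    MinLength(usize),
    MaxLength(usize),
    Enumeration(Vec<String>),
}

impl Facet {
    /// Check one facet against a string value.
    /// Length is counted in characters, not bytes.
    fn check(&self, value: &str) -> Result<(), String> {
        let len = value.chars().count();
        match self {
            Facet::Length(n) if len != *n => Err(format!("length {len} != {n}")),
            Facet::MinLength(n) if len < *n => Err(format!("length {len} < minLength {n}")),
            Facet::MaxLength(n) if len > *n => Err(format!("length {len} > maxLength {n}")),
            Facet::Enumeration(allowed) if !allowed.iter().any(|a| a == value) => {
                Err(format!("{value:?} is not an allowed value"))
            }
            _ => Ok(()),
        }
    }
}

/// Run every facet and collect the failure messages.
fn check_all(facets: &[Facet], value: &str) -> Vec<String> {
    facets.iter().filter_map(|f| f.check(value).err()).collect()
}
```

Collecting all failures instead of stopping at the first one matches the error-collection style described under Document Validation, though whether the crate does so at the facet level is an assumption.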

Content Models (Complete)

  • Sequence compositor
  • Choice compositor
  • All compositor
  • Mixed content
  • Occurrence constraints (minOccurs/maxOccurs)
  • ModelVisitor state machine
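
The occurrence constraints above can be sketched as a small value type. The `Occurs` struct is a hypothetical illustration rather than the crate's representation. It assumes the XSD defaults of `minOccurs="1"` and `maxOccurs="1"`, and models `maxOccurs="unbounded"` as `None`.

```rust
/// Occurrence bounds of a particle (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Occurs {
    min: u32,
    /// `None` models maxOccurs="unbounded".
    max: Option<u32>,
}

impl Occurs {
    /// Build bounds from the raw attribute values; absent attributes default to 1.
    fn parse(min_occurs: Option<&str>, max_occurs: Option<&str>) -> Result<Self, String> {
        let min = match min_occurs {
            Some(s) => s.parse::<u32>().map_err(|e| format!("minOccurs: {e}"))?,
            None => 1,
        };
        let max = match max_occurs {
            Some("unbounded") => None,
            Some(s) => Some(s.parse::<u32>().map_err(|e| format!("maxOccurs: {e}"))?),
            None => Some(1),
        };
        // A finite maximum must not be smaller than the minimum.
        if let Some(m) = max {
            if min > m {
                return Err(format!("minOccurs {min} exceeds maxOccurs {m}"));
            }
        }
        Ok(Occurs { min, max })
    }

    /// True if `count` repetitions satisfy the bounds.
    fn allows(&self, count: u32) -> bool {
        count >= self.min && self.max.map_or(true, |m| count <= m)
    }
}
```

For example, `Occurs::parse(Some("0"), Some("unbounded"))` accepts any repetition count, while `Occurs::parse(Some("3"), Some("2"))` is rejected as inconsistent.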

Wildcards (Complete)

  • Any element wildcard
  • Any attribute wildcard
  • Namespace constraints
  • Process contents (strict/lax/skip)
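
A rough sketch of how a wildcard's namespace constraint might be modeled follows. The `NamespaceConstraint` enum is a hypothetical name, not the crate's API. It assumes the XSD 1.0 reading of `##other`, where elements with no namespace never match, and represents the absent namespace as `None`.

```rust
/// Namespace constraint of xs:any / xs:anyAttribute (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum NamespaceConstraint {
    /// ##any
    Any,
    /// ##other
    Other,
    /// Explicit list; `None` stands for the absent namespace (##local).
    List(Vec<Option<String>>),
}

impl NamespaceConstraint {
    /// Parse the `namespace` attribute of a wildcard.
    fn parse(value: &str, target_ns: Option<&str>) -> Self {
        match value.trim() {
            "##any" => Self::Any,
            "##other" => Self::Other,
            list => Self::List(
                list.split_whitespace()
                    .map(|tok| match tok {
                        "##local" => None,
                        "##targetNamespace" => target_ns.map(str::to_owned),
                        uri => Some(uri.to_owned()),
                    })
                    .collect(),
            ),
        }
    }

    /// True if a name in namespace `ns` satisfies the constraint.
    fn allows(&self, ns: Option<&str>, target_ns: Option<&str>) -> bool {
        match self {
            Self::Any => true,
            // Assumption: XSD 1.0 semantics, so the namespace must be
            // present and different from the target namespace.
            Self::Other => ns.is_some() && ns != target_ns,
            Self::List(items) => items.iter().any(|item| item.as_deref() == ns),
        }
    }
}
```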

Document Validation (Complete)

  • Element validation
  • Attribute validation
  • Content model validation
  • Simple type value validation
  • ValidationContext with error collection
  • Validation modes (strict/lax)
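
The interplay between validation modes and error collection can be sketched as follows. This is an illustration under assumptions, not the crate's `ValidationContext`: it assumes strict mode stops at the first error while lax mode records errors and continues, which mirrors how the Python package describes these modes. The `Mode` enum and `validate_required` helper are hypothetical.

```rust
/// Validation mode (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Strict,
    Lax,
}

/// Minimal context that either fails fast or collects errors.
#[derive(Debug)]
struct ValidationContext {
    mode: Mode,
    errors: Vec<String>,
}

impl ValidationContext {
    fn new(mode: Mode) -> Self {
        Self { mode, errors: Vec::new() }
    }

    /// In strict mode the error is returned so callers can stop with `?`;
    /// in lax mode it is stored and validation continues.
    fn report(&mut self, message: impl Into<String>) -> Result<(), String> {
        let message = message.into();
        match self.mode {
            Mode::Strict => Err(message),
            Mode::Lax => {
                self.errors.push(message);
                Ok(())
            }
        }
    }
}

/// Example check: every required attribute name must be present.
fn validate_required(
    ctx: &mut ValidationContext,
    present: &[&str],
    required: &[&str],
) -> Result<(), String> {
    for name in required {
        if !present.contains(name) {
            ctx.report(format!("missing required attribute '{name}'"))?;
        }
    }
    Ok(())
}
```

In lax mode the caller inspects `ctx.errors` after the run; in strict mode the first `Err` propagates immediately.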

Identity Constraints (Complete)

  • Unique constraints
  • Key constraints
  • Keyref constraints
  • Selector/field XPath evaluation

XSD 1.1 Features (Complete)

  • Assertions (assert/report elements)
  • Basic XSD 1.1 parsing

Data Converters (Complete)

  • Parker convention
  • BadgerFish convention
  • Unordered converter

XPath (Complete)

  • XPath expression evaluation
  • Identity constraint selectors

Schema Export (Complete)

  • JSON export of schema structure
  • Python parity for schema dumps

Remaining Work

HTTP/Network Support

  • HTTP/HTTPS schema loading
  • URL resource resolution
  • Schema caching from remote sources

Schema Composition

  • xs:include resolution across files
  • xs:import with namespace mapping
  • xs:redefine support
  • Circular import detection

Advanced Features

  • Substitution groups
  • Default/fixed value application during validation
  • Full conditional type assignment (XSD 1.1)
  • xsi:type handling
  • xsi:nil handling for nillable elements

CLI Tool

  • Validate command
  • Convert command (XML to JSON)
  • Inspect command (schema introspection)
  • Download schemas command

Polish

  • Performance optimization
  • Memory optimization
  • Documentation improvements
  • More detailed error messages

Phase Overview

Phase 1: Project Setup & Infrastructure [COMPLETE]

  • Clone Python reference code
  • Initialize Rust cargo project
  • Set up Cargo.toml with dependencies
  • Create module structure
  • README.md
  • Documentation infrastructure

Phase 2: Core Validators [COMPLETE]

  • Base validator infrastructure
  • Simple type validators
  • Facet validators
  • Complex type validators
  • Element validators
  • Attribute validators

Phase 3: Schema Structure [COMPLETE]

  • Model groups
  • Content models
  • Wildcards
  • Schema component parsing
  • Forward reference resolution

Phase 4: Advanced Features [COMPLETE]

  • Identity constraints
  • XSD 1.1 assertions
  • Document validation

Phase 5: Data Conversion [COMPLETE]

  • Converter framework
  • Parker converter
  • BadgerFish converter
  • Unordered converter

Phase 6: XPath & Navigation [COMPLETE]

  • XPath expression evaluation
  • Schema context evaluation

Phase 7: Polish & Remaining [IN PROGRESS]

  • HTTP/HTTPS loading
  • CLI tool commands (inspect, xml2json, validate)
  • Schema composition (include/import)
  • Substitution groups
  • Performance optimization

Testing Status

Comparison Testing

  • Schema dump comparison framework
  • Python parity validation
  • Book.xsd comparison test (passing)

Real-World Schema Tests

  • DITA schema bundle tests
  • NISO schema bundle tests

Unit Tests

  • Per-module functionality tests
  • Integration tests

Remaining Tests

  • W3C XSD 1.0 conformance suite
  • W3C XSD 1.1 conformance suite
  • Property-based testing
  • Performance benchmarks

Implementation Notes

Key Design Decisions

  1. XML Parser: Using roxmltree for DOM-like access
  2. Error Handling: thiserror for error types
  3. Type Safety: Arc<dyn SimpleType + Send + Sync> for thread-safe type references
  4. Memory: Arc/Rc for shared references, cloning where needed
  5. API Style: Similar to Python where idiomatic in Rust
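
Design decision 3 can be illustrated with a small sketch of a shared, thread-safe type reference. Only the `Arc<dyn SimpleType + Send + Sync>` shape comes from this document; the trait methods, the `XsInteger` struct, and the `validate_in_parallel` helper are hypothetical. The integer check is simplified and does not apply whitespace normalization.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical trait surface; the crate's real trait may differ.
trait SimpleType {
    fn name(&self) -> &str;
    fn is_valid(&self, value: &str) -> bool;
}

/// Simplified xs:integer: an optional sign followed by ASCII digits.
struct XsInteger;

impl SimpleType for XsInteger {
    fn name(&self) -> &str {
        "xs:integer"
    }

    fn is_valid(&self, value: &str) -> bool {
        let digits = value
            .strip_prefix(|c: char| c == '+' || c == '-')
            .unwrap_or(value);
        !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
    }
}

/// The shared reference shape named in the design decisions.
type SharedType = Arc<dyn SimpleType + Send + Sync>;

/// Validate each value on its own thread, sharing one type definition.
fn validate_in_parallel(ty: SharedType, values: Vec<String>) -> Vec<bool> {
    // Spawn all workers first, then join them in order.
    let handles: Vec<_> = values
        .into_iter()
        .map(|v| {
            let ty = Arc::clone(&ty);
            thread::spawn(move || ty.is_valid(&v))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

The `Send + Sync` bounds are what allow the `Arc` to be moved into `thread::spawn`; without them the compiler rejects the closure.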

Rust Advantages Leveraged

  1. Type safety: Compile-time error catching
  2. Performance: Validation expected to be faster than Python (benchmarks are still listed under Remaining Tests)
  3. Memory safety: No use-after-free or data races in safe Rust (leaks remain possible, e.g. through Rc/Arc reference cycles)
  4. Concurrency: Thread-safe type system with Send + Sync

Session Notes

Session 1 (2025-12-28)

  • Cloned Python reference repository
  • Initialized Rust project
  • Created TODO tracking document

Session 2 (2025-12-29)

  • Implemented core XSD parsing
  • Added type system and facets
  • Implemented content model validation
  • Added document validation
  • Implemented data converters
  • Achieved Python parity for schema dumps
  • Updated README and TODO documentation

Session 3 (2025-12-29)

  • Implemented CLI tool with clap
  • Added inspect command for schema introspection (JSON output, element/type lookup)
  • Added xml2json command with multiple formats (default, parker, badgerfish, unordered)
  • Added validate command with strict/lax modes
  • Created DITA/NISO bundle comparison test infrastructure with static facts

Last Updated: 2025-12-29