- read_all_channels_sorted_record: replace chunk loop with single readinto() into a pre-allocated recarray (zero-copy, writeable); the main win: T3 1.7 GB file drops from 1.5s to 0.4s (4x faster)
- DZ transpose: arr.T.tobytes() instead of .copy() + tobytes() (one fewer intermediate allocation)
- Vectorize sign-extension in _apply_unsorted_bit_masking using np.where(negative, bitwise_or(temp_u, sign_extend), temp_u)
- SI block cache in Info4._si_cache (keyed by file pointer) to skip duplicate Source Information block reads
- Add SymBufReader cdef class to dataRead.pyx: bidirectional-buffered file wrapper that fills its C-level buffer centred on the current position, matching the mdfr Rust SymBufReader design; activated for all Info4 metadata reads via _SymBufReader import
- Bump version to 4.3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
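The single-readinto() idea from the first bullet can be sketched in plain NumPy. This is an illustration only, not the code in read_all_channels_sorted_record: the record layout (record_dtype) and the helper name read_records are hypothetical, and an in-memory io.BytesIO stands in for the MDF file.

```python
import io

import numpy as np

# Hypothetical packed record layout: one uint16 and one float64 per record.
record_dtype = np.dtype([("counter", "<u2"), ("value", "<f8")])


def read_records(fid, n_records, dtype):
    """Fill a pre-allocated record array with one readinto() call."""
    records = np.empty(n_records, dtype=dtype)
    # A uint8 view shares memory with `records`, so the file data is
    # written straight into the array without an intermediate bytes object.
    n_bytes = fid.readinto(records.view(np.uint8))
    if n_bytes != records.nbytes:
        raise EOFError(f"expected {records.nbytes} bytes, got {n_bytes}")
    return records.view(np.recarray)


# Round trip through an in-memory file to show the call pattern.
source = np.array([(1, 0.5), (2, 1.5), (3, 2.5)], dtype=record_dtype)
data = read_records(io.BytesIO(source.tobytes()), len(source), record_dtype)
print(data.counter, data.value)
```

Because the array is allocated by the caller rather than wrapped around a read-only bytes object, the result stays writeable, which is the property the commit message calls out.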
…nel files

Adds read_cn_chain_fast() to dataRead.pyx: a Cython function that reads the entire MDF4 CN linked list using POSIX pread() (zero Python file-object dispatch), C packed structs + memcpy for zero-copy parsing, and a fast <TX>...</TX> bytes scan replacing lxml.objectify for the common MD block pattern. Falls back to full Python CCBlock for complex cc_type 3/7-11.

Benchmarks (3-run best):
test.mf4 (36k channels): 0.90s → 0.61s (3.1x total from 1.9s baseline)
T3 (480 channels): 0.40s → 0.33s (4.5x total from 1.5s baseline)

mdfinfo4.py: import read_cn_chain_fast; modify read_cn_blocks() to use fast Cython path for files with fileno() (raw open() or SymBufReader), falling back to the Python path otherwise. Post-processing handles composition blocks (CA/CN/DS/CL/CV/CU), VLSD/VLSC detection, and CC completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
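The pread()-based chain walk described above can be approximated in pure Python with os.pread() and the struct module. The sketch below uses a deliberately simplified, hypothetical 16-byte node (next offset plus payload), not the real MDF4 CN block layout, and walk_chain is an illustrative name. os.pread() is only available on POSIX platforms.

```python
import os
import struct
import tempfile

# Hypothetical node layout: uint64 offset of the next node (0 ends the
# chain), followed by a uint64 payload. Real CN blocks are much larger.
NODE = struct.Struct("<QQ")


def walk_chain(fd, first_offset):
    """Follow a linked list of fixed-size nodes using positional reads."""
    payloads = []
    offset = first_offset
    while offset:
        # pread() takes the offset explicitly, so no seek() is needed and
        # the descriptor's own file position is left untouched.
        raw = os.pread(fd, NODE.size, offset)
        if len(raw) != NODE.size:
            raise EOFError(f"truncated node at offset {offset}")
        offset, payload = NODE.unpack(raw)
        payloads.append(payload)
    return payloads


# Build a small file whose nodes are stored out of chain order:
# offset 8 -> offset 40 -> offset 24 -> end.
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"\x00" * 8)          # padding so that no node sits at offset 0
    tmp.write(NODE.pack(40, 10))    # node at offset 8
    tmp.write(NODE.pack(0, 30))     # node at offset 24 (last in chain)
    tmp.write(NODE.pack(24, 20))    # node at offset 40
    tmp.flush()
    chain = walk_chain(tmp.fileno(), 8)

print(chain)
```

The Cython version avoids even the struct.unpack call by copying the bytes into a packed C struct, but the access pattern (one positional read per block, following link offsets) is the same.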
README.md:
- Add Performance section documenting read_cn_chain_fast, SymBufReader and
vectorised data reading with benchmark table (1.9s → 0.6s on 36k channels)
- Expand Requirements into structured table; clarify Cython fallback
- Rewrite Installation with build-from-source steps
- Convert channel-structure list to a table; document masterChannelList
- Memory-saving options expanded into descriptive bullets
mdfinfo4.py:
- Module docstring rewritten: explains fast vs. fallback path, key classes,
design constraints (why CC val/ref and composition stay in Python)
- Info4 class docstring: full Attributes section including _si_cache,
complete dictionary layout table with all top-level keys and their meaning
dataRead.pyx:
- Fast-reader section header expanded with technique summary and design
constraints
- All six C packed structs documented with field-level comments including
value enumerations, bit flags, and byte-offset rationale
- _fast_read_tx_or_md, _fast_read_si docstrings expanded
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
conf.py:
- Bump version to 4.3, copyright year to 2025
- Switch theme to sphinx_rtd_theme
- Add sphinx.ext.viewcode, autosummary; configure autodoc_default_options
- Add intersphinx mappings (Python, NumPy)
- Remove missing _static path warning
docs/index.rst:
- Add quick-start code example and pip/source installation snippet
- Architecture table mapping each module to its responsibility
- Channel dict structure table
- Integrated 'performance' page into toctree
docs/performance.rst (new):
- Benchmark table: 1.9s → 0.6s on 36k-channel file
- Detailed explanation of all three optimisations:
pread()+C-structs CN chain reader, SymBufReader, single-call readinto()
- How to verify the fast path is active
Per-module index.rst files:
- mdfreader: Mdf vs MdfInfo purpose, typical usage snippets
- mdf: channel dict layout table, field constant reference
- mdfinfo4: reading-path comparison table, Info4 dict structure example
- mdf4reader: key classes, data block type table, conversion type table
- mdfinfo3: MDF3 block key reference
- mdf3reader: MDF3 vs MDF4 differences
- channel: method reference table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mdfreader 4.3 — Release Notes
Performance
Up to 4× faster metadata parsing for large MDF4 files.
A new Cython SymBufReader replaces Python's file-object dispatch with a bidirectional 64 KB buffered reader, cutting syscall overhead on files with many data groups and channel groups.
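To make the "bidirectional" buffering concrete, here is a minimal pure-Python sketch of a reader whose buffer is centred on the requested position, so that short seeks backwards as well as forwards are served from memory. It is an illustration of the idea only: the class name SymmetricBufferedReader, its methods, and the refills counter are assumptions and do not mirror the actual SymBufReader API.

```python
import io


class SymmetricBufferedReader:
    """Illustrative reader with a buffer centred on the current position."""

    def __init__(self, raw, buffer_size=64 * 1024):
        self._raw = raw
        self._size = buffer_size
        self._buf = b""
        self._buf_start = 0
        self._pos = 0
        self.refills = 0  # number of times the buffer was refilled

    def seek(self, pos):
        # Only the logical position moves; no I/O happens here.
        self._pos = pos
        return pos

    def read(self, n):
        if n > self._size // 2:
            # Requests larger than half the buffer bypass it entirely.
            self._raw.seek(self._pos)
            data = self._raw.read(n)
        else:
            buf_end = self._buf_start + len(self._buf)
            if not (self._buf_start <= self._pos and self._pos + n <= buf_end):
                self._refill()
            lo = self._pos - self._buf_start
            data = self._buf[lo:lo + n]
        self._pos += len(data)
        return data

    def _refill(self):
        # Start half a buffer before the position so that nearby data on
        # both sides of it ends up cached.
        start = max(0, self._pos - self._size // 2)
        self._raw.seek(start)
        self._buf = self._raw.read(self._size)
        self._buf_start = start
        self.refills += 1


payload = bytes(range(256)) * 1024  # 256 KiB of test data
reader = SymmetricBufferedReader(io.BytesIO(payload))
reader.seek(100_000)
first = reader.read(4)
reader.seek(99_000)   # short backward jump, served from the buffer
behind = reader.read(4)
reader.seek(120_000)  # short forward jump, also served from the buffer
ahead = reader.read(4)
print(reader.refills)
```

A conventional forward-only buffer would have had to refill on the backward jump; centring the window trades some read-ahead for read-behind.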
Up to 3× faster CN/CC/SI/TX chain reading (Cython fast path).
The hot loop that walks channel-name, conversion, and source-information linked lists is now implemented in Cython using POSIX pread(), C packed structs, and a zero-copy <TX> bytes scan instead of lxml.objectify. On files with ~36 000 channels the total open time drops from ~1.9 s to ~0.6 s. The Python fallback path is used automatically when the Cython extension is not available.
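The <TX> bytes scan can be illustrated with a few lines of Python. This sketch assumes the common case of a plain <TX>...</TX> element with no attributes, CDATA, or XML entities; the helper name extract_tx is hypothetical, and unlike the Cython code it makes no attempt to be zero-copy. Anything it cannot handle would need to go to a full XML parser.

```python
def extract_tx(md_block):
    """Return the text between <TX> and </TX>, or None if not found."""
    start = md_block.find(b"<TX>")
    if start < 0:
        return None
    start += len(b"<TX>")
    end = md_block.find(b"</TX>", start)
    if end < 0:
        return None
    # Only the short payload is decoded; the rest of the XML is never parsed.
    return md_block[start:end].decode("utf-8")


comment = b"<CNcomment><TX>Engine speed</TX></CNcomment>"
print(extract_tx(comment))
print(extract_tx(b"<CNcomment/>"))
```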
Bug Fixes
import mdfreader no longer fails when scipy is not installed; the import is now lazy (only resample() requires it).
Documentation
tables, and a new Performance page documenting the Cython optimisations
and how to verify the fast path is active.
mdfinfo4.py and dataRead.pyx now carry comprehensive docstrings covering the on-disk block layout, fast-path design constraints, and the SI-cache strategy.
Packaging
license_files declared in setup.cfg; pyproject.toml reduced to build-system table only, resolving the License-File header rejection by twine/PyPI.