
Streaming: pick filewriter / consider filewriter requirements in detail #2

Description

@rerpha

Currently impeded by #20

We should consider the filewriter requirements in detail and decide whether one of the existing filewriters can be used or modified, or whether a new one should be written.

Bear in mind that I do not think we should run any of this software on Windows, as HRPD-X is going to require one or more Linux machines anyway to run Kafka/Redpanda.


ESS one

Pros:

  • Is currently used by ESS
  • Supports the "template" architecture, which is flexible, i.e. you can write a NeXus file in whatever structure you'd like

Cons:

  • Is written in C++, so it is not as memory safe as Rust and is much more difficult to develop
  • Is quite out of date in terms of Conan etc.
  • The template architecture is probably more flexible than we need it to be, since ISIS files are laid out pretty much the same on every instrument
  • Would duplicate work across ISIS, because we would be choosing a different filewriter from the one SuperMuSR uses

SuperMuSR one

Pros:

  • Is written in Rust, so it benefits from memory safety and built-in dependency management, testing, packaging, etc.
  • Is being used by SuperMuSR

Cons:

  • Needs some work to allow some templated streamed items such as blocks
  • ev44 (events) support is not well tested, as I've only run it on EMMA-B and POLREF

Differences

The ESS one uses a central pooled architecture; the Rust one does not. This doesn't really matter for us, as we'd probably be running one filewriter per instrument, but this is up for debate.


Some possible requirements

(all up for debate)

  • Support for writing histograms, events, or both (configurable).
    • Histogram file bin boundaries to be configurable independently of the kafka_dae_diagnostics bin boundaries, including multiple time regimes.
    • Spectrum mapping: similar to time channel boundaries, configurable independently of kafka_dae_diagnostics.
  • Support for writing "intermediate" and "autosave" files (see "How do we handle writing intermediate files / autosave files?" #13)
  • Support for schemas used by SuperMuSR pipeline and neutron pipeline (assuming we want filewriter to cater for both muon & neutron data).
    • Including both "input" .fbs schemas and "output" .nxs file layout
  • Support for hs00 if monitors may emit histograms from hardware in future
  • Support for non-time-series data (where only the most recent value gets written)
  • Support for writing journals (see "Data streaming: write journals" #11). This could be done by a separate process reacting to a filewriter wrdn message, as long as the wrdn contains sufficient metadata.
  • Support for triggering data-archiving steps (this could also be done by a separate process reacting to a filewriter wrdn message, as long as the wrdn contains sufficient metadata).
  • Support for writing frame vetoes and veto masks alongside each frame's event data (so that a veto can be retroactively disabled in post-processing)
  • Needs to be able to "go back" in the Kafka stream to get the most recent sampleEnv update for each block at run start
  • Consider architecture (e.g. pooled or not) and deployment in detail
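
To make the "go back" requirement a bit more concrete, here is a minimal sketch of the selection logic only: given the sample-environment updates already read from a topic, pick the newest value per block at or before run start. It is pure Python with no Kafka client involved; `LogUpdate`, its fields and `latest_before` are hypothetical names invented for illustration and are not taken from either filewriter or from any schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LogUpdate:
    """One sample-environment update (hypothetical shape, not a real schema)."""
    offset: int        # position of the message in its partition
    timestamp_ms: int  # time the value was recorded
    block: str         # block name
    value: float


def latest_before(updates: Iterable[LogUpdate], run_start_ms: int) -> Dict[str, LogUpdate]:
    """Return, for each block, the newest update at or before run_start_ms."""
    latest: Dict[str, LogUpdate] = {}
    for update in updates:
        if update.timestamp_ms > run_start_ms:
            continue  # happened after run start, so not part of the initial snapshot
        current = latest.get(update.block)
        # Order by timestamp first, then by offset to break ties deterministically.
        if current is None or (update.timestamp_ms, update.offset) >= (
            current.timestamp_ms,
            current.offset,
        ):
            latest[update.block] = update
    return latest


# Example: a run starting at t = 5000 ms.
updates = [
    LogUpdate(0, 1_000, "temperature", 290.0),
    LogUpdate(1, 2_000, "field", 0.5),
    LogUpdate(2, 3_000, "temperature", 295.0),
    LogUpdate(3, 6_000, "temperature", 300.0),  # after run start, ignored
]
snapshot = latest_before(updates, run_start_ms=5_000)
```

The sketch assumes the relevant history is already in hand. How far back a real filewriter would have to read for a block that rarely updates, and how it would seek there, is left open here and would be part of the discussion.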

Status: research/needs discussion
