Bug Fixes:
- Fixed show_metadata failing for records where the CMA GDC omits channel fields in links (e.g. Kazakhstan synop record). show_metadata now looks up the record directly from merged_records() rather than relying on a per-session flat index.
- Fixed _build_merged_records() discrepancy detection to also flag records where link sets differ between catalogues.
- Fixed KeyError: 'datasets' in _insert_channel() when a channel string is a prefix of another (e.g. cache/a/wis2/centre and cache/a/wis2/centre/data/...). Root cause was overwriting the children dict; fixed using setdefault for each level independently.
- Fixed catalogue search breaking after render() was incorrectly changed to async def; synchronous callers in show_view() cannot await it.
- Fixed confirmation dialog rendering too narrow (292 px). Root cause: .dialog-confirm class was on ui.scroll_area instead of ui.card, and buttons were inside the scroll area.
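
The prefix-channel fix can be illustrated with a minimal sketch. The node layout (a `datasets` list and a `children` dict per node) and the function signature are assumptions inferred from the error message, not the actual code in data.py.

```python
def insert_channel(hierarchy: dict, channel: str, dataset_id: str) -> None:
    """Insert dataset_id at the node addressed by a slash-separated channel."""
    level = hierarchy
    node = None
    for part in channel.split("/"):
        # setdefault only creates what is missing, so inserting a shorter
        # channel later never replaces an existing subtree.
        node = level.setdefault(part, {})
        node.setdefault("datasets", [])
        level = node.setdefault("children", {})
    if node is not None and dataset_id not in node["datasets"]:
        node["datasets"].append(dataset_id)


# The longer channel is inserted first, then its prefix.
hierarchy: dict = {}
insert_channel(hierarchy, "cache/a/wis2/centre/data", "ds-long")
insert_channel(hierarchy, "cache/a/wis2/centre", "ds-short")
```

Because every level is created independently, the insertion order of a channel and its prefix does not matter.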
Architecture:
- merged_records() and topic_hierarchy() are now module-level caches in data.py, computed once at the end of scrape_all(). Previously _build_merged_records() was called on every merged_records() invocation.
- Added get_datasets_for_channel(channel): strips trailing /#, navigates the topic hierarchy, and recursively collects all datasets from that node and its descendants.
- Removed state.features: the module-level _topic_hierarchy in data.py is now the single source of truth for channel→dataset mapping.
- AppState reduced to selected_topics only; features, selected_datasets, and tree_widget fields removed.
- Link merging: _build_merged_records() unions channel-bearing links from all GDCs onto the primary record, so channel data present in any catalogue is preserved on the merged record.
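
The lookup described for get_datasets_for_channel() could look roughly like the sketch below. It is an illustration only: the hierarchy is passed in explicitly (the real function reads the module-level cache), and the node layout and sample data are assumptions.

```python
def collect_datasets(node: dict) -> list:
    """Gather datasets from a node and all of its descendants, depth-first."""
    found = list(node.get("datasets", []))
    for child in node.get("children", {}).values():
        for dataset_id in collect_datasets(child):
            if dataset_id not in found:
                found.append(dataset_id)
    return found


def get_datasets_for_channel(hierarchy: dict, channel: str) -> list:
    """Resolve a channel (optionally ending in '/#') to its datasets."""
    if channel.endswith("/#"):
        channel = channel[:-2]  # strip the MQTT multi-level wildcard
    level = hierarchy
    node = None
    for part in channel.split("/"):
        node = level.get(part)
        if node is None:
            return []  # unknown channel: nothing to collect
        level = node.get("children", {})
    return collect_datasets(node)


# Hypothetical hierarchy used only for this example.
HIERARCHY = {
    "cache": {"datasets": [], "children": {
        "a": {"datasets": ["ds-a"], "children": {
            "wis2": {"datasets": ["ds-wis2"], "children": {}},
        }},
    }},
}
```

With this data, `get_datasets_for_channel(HIERARCHY, "cache/a/#")` collects the datasets of `cache/a` and everything below it.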
New Features:
- Subscription confirmation dialog: before sending to the subscription manager, a dialog shows the full JSON payload (pretty-printed) with Cancel / Confirm buttons.
- Catalogue dataset filter locks to the selected record and is non-editable when a dataset is selected from search results.
- Custom filters (derived from MQTT link metadata filters key) are shown in the right sidebar only when selecting from catalogue search results; hidden in tree view.
- Tree view: single-topic selection enforced via on_select (native NiceGUI/Quasar behaviour). Using on_tick was rejected as it requires a timer race to reset internal state.
- Tree view: topics sorted alphabetically at every level of the hierarchy.
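
The alphabetical ordering of the tree can be sketched as a recursive conversion from the nested hierarchy to a list of node dicts. The function below is a hypothetical stand-in for _to_tree_nodes(); the `id`/`label`/`children` keys are chosen to match what NiceGUI's ui.tree expects by default, and the input layout is an assumption.

```python
def to_tree_nodes(hierarchy: dict, prefix: str = "") -> list:
    """Convert a nested topic hierarchy into sorted tree nodes."""
    nodes = []
    for name in sorted(hierarchy):  # alphabetical at this level
        topic = f"{prefix}/{name}" if prefix else name
        children = hierarchy[name].get("children", {})
        nodes.append({
            "id": topic,      # full topic path, unique across the tree
            "label": name,    # last path segment shown to the user
            "children": to_tree_nodes(children, topic),  # sorted recursively
        })
    return nodes


# Hypothetical input: keys are deliberately out of order.
nodes = to_tree_nodes({
    "origin": {"children": {"b": {}, "a": {}}},
    "cache": {},
})
```

Sorting inside the recursion, rather than once at the top, is what keeps every level ordered.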
Changed Files:
- modules/ui/data.py - added _merged_records, _topic_hierarchy, _insert_channel(), _collect_datasets(), _build_topic_hierarchy(), get_datasets_for_channel(), topic_hierarchy(); scrape_all() rebuilds both caches on completion
- modules/ui/views/tree.py - replaced put_in_dicc tree-builder with _to_tree_nodes(); switched to on_select; alphabetical sorting
- modules/ui/views/catalogue.py - removed state.features population; pass dataset_id to on_topics_picked; reverted render to def
- modules/ui/views/shared.py - removed state.features from clean_page; on_topics_picked accepts dataset_id param; show_metadata looks up from merged_records(); added confirm_subscribe() dialog
- modules/ui/main.py - AppState reduced to selected_topics
- modules/ui/assets/base.css - added .dialog-scroll and .dialog-confirm sizing rules
Fixed Critical/High Issues:
- Path traversal in downloaded filename - Added directory traversal check in wis2.py:379-384
- MQTT TLS certificate not validated - Using certifi.where() in subscriber.py
- Redis has no password protection - Added --requirepass and REDIS_PASSWORD required for all clients
- Flask debug mode enabled in production - Controlled by FLASK_DEBUG env var, default false
- Weak default Flask secret key - FLASK_SECRET_KEY now required, application fails if missing
- Dynamic hash algorithm selection - Whitelist ALLOWED_HASH_METHODS (SHA-256/384/512, SHA3 variants)
- Celery serialization not restricted - JSON-only serialization in worker.py
- Exception details exposed - Generic error messages, details logged server-side only
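
A directory traversal check of the kind mentioned in the first item might look like the following. This is a sketch under assumptions: the function name and error message are hypothetical, and the actual check in wis2.py may differ.

```python
import os


def safe_download_path(base_dir: str, filename: str) -> str:
    """Return the resolved target path, or raise if it leaves base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses '..' segments and resolves symlinks before comparing.
    target = os.path.realpath(os.path.join(base, filename))
    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError("filename escapes the download directory")
    return target
```

Comparing resolved paths with `os.path.commonpath` avoids the false positives of a plain string-prefix test, where `/data-old` would pass as being inside `/data`.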
Security Features Added:
- Hash algorithm whitelist prevents arbitrary algorithm injection
- Path boundary checks prevent directory traversal attacks
- MQTT TLS validation via certifi CA bundle
- Redis authentication required for all connections
- Atomic file operations prevent partial/corrupt writes
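
The hash whitelist can be sketched as a fixed mapping from method names to hashlib constructors, so that a method name taken from a notification is never passed to a dynamic lookup. ALLOWED_HASH_METHODS is the name used in this changelog; the key spellings, the helper function, and the hex output encoding are assumptions.

```python
import hashlib

# Only these algorithms can ever be selected, whatever the input says.
ALLOWED_HASH_METHODS = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
    "sha3-256": hashlib.sha3_256,
    "sha3-384": hashlib.sha3_384,
    "sha3-512": hashlib.sha3_512,
}


def compute_digest(data: bytes, method: str) -> str:
    """Hash data with a whitelisted method; reject anything else."""
    factory = ALLOWED_HASH_METHODS.get(method.lower())
    if factory is None:
        raise ValueError(f"hash method not allowed: {method!r}")
    return factory(data).hexdigest()
```

An unknown or weak method such as `md5` raises instead of silently falling back to another algorithm.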
Fixed Issues:
- Missing graceful shutdown signal handlers - Added SIGTERM/SIGINT handlers in manager.py
- GLOBAL_BROKER_HOST not validated - Application exits with code 1 if not set
- Unsafe nested dictionary access - Using .get() chains throughout wis2.py
- CONTAINER_DATA_PATH/DATA inconsistency - Standardized on CONTAINER_DATA_PATH
- CACHE_EXCLUDE_LIST parsing bug - Fixed with list comprehension
- No Celery result expiration - Added result_expires = 86400
- TOCTOU race condition in file write - Atomic temp file + rename pattern
- Missing Docker healthchecks - Added healthchecks for all services in docker-compose.yaml
- Subscriber not resubscribing on restart - Added load_persisted_subscriptions() in manager.py
- Unvalidated environment variable type conversions - Added try/except validation
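
The "atomic temp file + rename" pattern from the list above can be sketched as follows. The function name is hypothetical and the real implementation in wis2.py may differ; the point is that readers only ever see the old file or the complete new one.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so that a partial file is never visible."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before renaming
        os.replace(tmp_path, path)     # atomically swaps in the new file
    except BaseException:
        try:
            os.unlink(tmp_path)        # do not leave stray temp files behind
        except FileNotFoundError:
            pass
        raise
```

Writing directly to the final path would leave a window in which another process could observe a half-written file, which is the race this pattern closes.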
Changes:
- modules/shared/shared/redis_client.py - REDIS_PASSWORD now required, fails on startup if missing
- modules/subscription_manager/subscription_manager/app.py - FLASK_SECRET_KEY required
- modules/task_manager/task_manager/worker.py - JSON-only serialization, result expiration
- modules/task_manager/task_manager/tasks/wis2.py - Path traversal checks, hash whitelist, atomic writes
- modules/subscriber/subscriber/manager.py - Signal handlers, subscription reload
- docker-compose.yaml - Redis password, healthchecks for all services
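
Fail-fast handling of required settings such as REDIS_PASSWORD and FLASK_SECRET_KEY, together with validated type conversions, could be sketched like this. Both helper names are hypothetical, as is the REDIS_PORT variable used in the usage note; the actual modules may structure this differently.

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return a mandatory setting or raise so startup fails immediately."""
    env = os.environ if environ is None else environ
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is required but not set")
    return value


def env_int(name: str, default: int, environ=None) -> int:
    """Read an optional integer setting, rejecting malformed values."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise RuntimeError(f"{name} must be an integer, got {raw!r}") from None
```

For example, `require_env("FLASK_SECRET_KEY")` raises when the variable is unset or blank, and `env_int("REDIS_PORT", 6379)` falls back to the default only when the variable is absent, not when it is malformed.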
Removed:
- Redis Sentinel cluster (3 sentinels)
- Redis replicas (2 replicas)
- Static IP network configuration (172.28.0.0/16)
- Sentinel-specific environment variables (REDIS_SENTINEL_HOSTS, REDIS_PRIMARY_NAME)
- Celery transport options (CELERY_BROKER_TRANSPORT_OPTIONS, CELERY_RESULT_BACKEND_TRANSPORT_OPTIONS)
- /containers/redis-sentinel/ directory
Changed:
- docker-compose.yaml - Single Redis instance, simplified network
- default.env - Direct Redis connection variables
- modules/shared/shared/redis_client.py - Removed Sentinel support, direct connection only
- modules/task_manager/task_manager/worker.py - Direct Redis URLs
- All documentation updated to reflect single Redis architecture
Rationale: Sentinel was over-engineered for most deployment scenarios. Single Redis provides:
- Simpler configuration and debugging
- Reduced resource usage (5 fewer containers)
- Faster startup time
- Easier local development
Problem: Refactor was left incomplete 2 months ago. Services wouldn't start due to missing imports and undefined variables.
Fixed:
- modules/subscriber/setup.py - Created with entry point subscriber_start
- modules/subscriber/subscriber/__init__.py - Added exports
- modules/subscriber/subscriber/manager.py - Added from uuid import uuid4
- modules/subscriber/subscriber/command_listener.py - Added import os, uses shared Redis client
- modules/subscription_manager/subscription_manager/__init__.py - Removed broken subscriber imports
- modules/subscription_manager/subscription_manager/app.py - Fixed delete endpoint, added COMMAND_CHANNEL
- containers/subscriber/Dockerfile - Created new Dockerfile for subscriber service
- docker-compose.yaml - Updated subscriber-france to use new Dockerfile
Problem: Duplicated Redis client code across modules, no single Redis fallback.
Fixed:
- modules/shared/ - Created shared module with redis_client.py
- modules/shared/shared/redis_client.py - Centralized Redis client with Sentinel support and single Redis fallback
- Updated all modules to import from shared
- Updated all Dockerfiles to install shared module
Problem: Sentinel failover not working in Docker Compose due to hostname resolution issues causing repeated tilt mode.
Root Cause: When a container stops, Docker DNS removes its hostname. Sentinel's resolve-hostnames yes caused constant resolution failures and tilt mode re-entry, blocking failover.
Solution: Implemented static IPs for all Redis components.
Changes:
- docker-compose.yaml - Added redis-net network with subnet 172.28.0.0/16
  - redis-primary: 172.28.0.10
  - redis-replica-1: 172.28.0.11
  - redis-replica-2: 172.28.0.12
  - redis-sentinel-1: 172.28.0.20
  - redis-sentinel-2: 172.28.0.21
  - redis-sentinel-3: 172.28.0.22
- containers/redis-sentinel/sentinel.conf - Uses static IP instead of hostname:

      sentinel monitor redis-primary 172.28.0.10 6379 2
      sentinel down-after-milliseconds redis-primary 5000
      sentinel failover-timeout redis-primary 60000
      protected-mode no
Result: Failover completes in ~2 seconds.
Problem: After Redis failover, subscriber's command_listener didn't reconnect properly.
Root Cause: pubsub object created once in __init__, connection error handler just slept and continued without recreating pubsub or resubscribing.
Fix: Added _reconnect() method to command_listener.py:

    def _reconnect(self):
        """Recreate pubsub and resubscribe after connection failure."""
        try:
            self.pubsub.close()
        except Exception:
            pass
        self.pubsub = self.redis.pubsub(ignore_subscribe_messages=True)
        self.pubsub.subscribe(self.channel)
        LOGGER.info(f"Reconnected and resubscribed to channel: {self.channel}")

Result: Subscriber auto-reconnects after failover.
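
How a polling loop might call such a reconnect method is sketched below. Everything here is illustrative: the class layout, `poll_once`, the `_needs_reconnect` flag, and the fake client (used so the sketch runs without a Redis server) are assumptions. The sketch catches the built-in ConnectionError; with redis-py the exception to catch would be redis.exceptions.ConnectionError.

```python
import time


class CommandListener:
    def __init__(self, redis_client, channel, retry_delay=1.0):
        self.redis = redis_client
        self.channel = channel
        self.retry_delay = retry_delay
        self.pubsub = None
        self._needs_reconnect = True  # first poll creates the pubsub

    def _reconnect(self):
        """Recreate pubsub and resubscribe after connection failure."""
        if self.pubsub is not None:
            try:
                self.pubsub.close()
            except Exception:
                pass
        self.pubsub = self.redis.pubsub(ignore_subscribe_messages=True)
        self.pubsub.subscribe(self.channel)
        self._needs_reconnect = False

    def poll_once(self, handler) -> bool:
        """Handle at most one message; return False after a connection error."""
        try:
            if self._needs_reconnect:
                self._reconnect()
            message = self.pubsub.get_message(timeout=1.0)
        except ConnectionError:
            # Remember to rebuild the pubsub instead of reusing a dead one.
            self._needs_reconnect = True
            time.sleep(self.retry_delay)
            return False
        if message is not None:
            handler(message)
        return True


class FakePubSub:
    """Stand-in pubsub: optionally fails, otherwise yields one message."""
    def __init__(self, fail):
        self.fail = fail
    def subscribe(self, channel):
        self.channel = channel
    def get_message(self, timeout=0.0):
        if self.fail:
            raise ConnectionError("connection lost")
        return {"type": "message", "data": "reload"}
    def close(self):
        pass


class FakeRedis:
    """Stand-in client whose first pubsub is broken, as after a failover."""
    def __init__(self):
        self.created = 0
    def pubsub(self, ignore_subscribe_messages=False):
        self.created += 1
        return FakePubSub(fail=(self.created == 1))


client = FakeRedis()
listener = CommandListener(client, "commands", retry_delay=0.0)
received = []
first = listener.poll_once(received.append)   # broken pubsub: error handled
second = listener.poll_once(received.append)  # rebuilt pubsub delivers
```

Setting a flag in the error handler, rather than reconnecting there, means a failed reconnect attempt is simply retried on the next poll.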
Fixed:
- Subscription storage uses Redis only (removed SQLite)
- Cache blacklist made configurable via CACHE_BLACKLIST env var
- Celery worker configuration improved with proper JSON parsing
- Fixed broker/backend transport options JSON defaults
Fixed:
- Improved file type detection
- Added STATUS_QUEUED constant
- Added centre_id extraction to result dict
- Added dataset field to result
- Added media-type filtering via subscription filters
Media-type Filtering:
- Subscriptions can now include filters.media_types (list of allowed types)
- Filter is applied early in workflow, before download
- Uses fnmatch for wildcard support (e.g., application/x-grib*)
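
A wildcard media-type check along these lines can be sketched with the standard fnmatch module. The function name, the lower-casing, and the behaviour for an empty filter list are assumptions rather than the project's confirmed semantics.

```python
from fnmatch import fnmatchcase


def media_type_allowed(media_type: str, allowed_patterns: list) -> bool:
    """Return True if media_type matches any pattern in the filter."""
    if not allowed_patterns:
        return True   # assumption: no filter configured means accept all
    if not media_type:
        return False  # a filter is set but the type is unknown
    candidate = media_type.lower()
    # fnmatchcase gives the same result on every platform once both
    # sides are lower-cased.
    return any(fnmatchcase(candidate, pattern.lower())
               for pattern in allowed_patterns)


allowed = ["application/x-grib*", "application/x-bufr"]
```

Running this check before the download step, as the notes describe, means filtered-out files cost no bandwidth.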
Fixed:
- All metrics renamed with wis2_ prefix for consistency:
  - wis2_notifications_received
  - wis2_notifications_skipped
  - wis2_downloads_failed
  - wis2_downloads_total
  - wis2_downloads_bytes_total
  - wis2_celery_queue_length
- Fixed Prometheus multiprocess mode initialization
- Added multiprocess_mode='livesum' to Gauge metrics
- Fixed shared volume between celery and subscription-manager containers
- Added Grafana service with auto-provisioned Prometheus/Loki datasources
Multiprocess Mode Fix:
- subscription-manager clears /tmp/prometheus_metrics/ on startup
- celery depends_on subscription-manager for proper startup order
- Both containers share prometheus-metrics-data volume
Fixed:
- Created centralized logging in modules/shared/shared/logging.py
- All modules updated to use setup_logging() from shared
- Removed commented/dead code
- Fixed credential logging security issue (was logging passwords)
- Changed verbose log messages from WARNING to DEBUG
Added:
- Main project README.adoc with architecture overview
- docs/admin-guide.adoc - Deployment, configuration, monitoring
- docs/user-guide.adoc - Subscriptions, filtering, common use cases
- docs/api-reference.adoc - REST API documentation
- docs/developer-guide.adoc - Architecture, modules, extending
- Module READMEs for shared, subscriber, subscription_manager, task_manager
- Apache 2.0 license to all modules and main repo
Updated:
- openapi.yml - Complete rewrite with filters, correct schemas, examples
- Wildcard support for media type filtering (fnmatch)
- Fixed WIS2 topic examples (correct centre-id format with ISO2C prefix)
Cleaned up:
- Removed duplicate config/redis-sentinel/sentinel.conf
- Fixed sentinel.conf permissions
- Fixed CRLF line endings