Change stats interval from 5s to 1s for better real-time monitoring by carlhoerberg · Pull Request #1195 · cloudamqp/lavinmq

carlhoerberg · 2025-07-23T20:07:04Z

By changing from 5s to 1s stats interval the charts looks a lot better, and we can decrease resolution to integers (from 1 decimal floats), as you can't publish less than 1 msg per second (before we showed 0.2 msgs/s per 5s period).

Before the stats interval was configuratable, but it didn't update the UI side, so it wasn't really possible to use.

Update config default from 5000ms to 1000ms
Adjust stats log size to maintain 10-minute history
Improve stats collection timing with loop measurement and logging
Change rate calculation from float to uint32 for performance
Update all frontend polling intervals to match new backend rate
Optimize queue declaration in performance test

claude · 2025-07-25T22:57:09Z

Claude finished @carlhoerberg's task —— View job

PR Review: Change stats interval from 5s to 1s

I've conducted a comprehensive review of this PR focusing on code quality, potential issues, performance, security, and test coverage.

Todo List:

Code Quality & Best Practices ✅

Configuration Changes (src/lavinmq/config.cr:42):

Clean update from stats_retention_seconds = 600 maintains 10-minute history
Good naming choice - stats_retention_seconds is clear and self-documenting

Server Logic (src/lavinmq/server.cr:377-409):

Excellent improvements to stats collection loop with proper timing measurement
Smart performance logging with different thresholds (error >1000ms, warn >100ms, info >1ms)
Proper sleep calculation to maintain precise 1-second intervals (sleep 1.second - loop_time)

Data Type Optimization (src/lavinmq/stats.cr:9-10):

Well-justified change from Float64 to UInt32 for rates - since you can't publish <1 msg/sec, integers are appropriate
Consistent use of UInt32 throughout rate calculations and logs

Frontend Consistency (all JS files):

All polling intervals correctly updated to 1000ms across 5 files
Maintains consistent user experience

Performance Considerations ⚠️

Positive Impacts:

UInt32 vs Float64 reduces memory usage by ~50% for rate storage
Better real-time monitoring experience
Performance test optimization with no_wait: true in queue declarations

Potential Concerns:

5x increase in polling frequency (1s vs 5s) increases:
- CPU usage for stats collection
- Network traffic from frontend
- Database/storage I/O for metrics
Memory usage for logs remains the same (600 entries vs 120 previously)

Recommendation: Monitor CPU usage in production, especially on high-traffic systems.

Potential Issues ⚠️

Race Conditions: The stats collection loop timing looks robust, but the 5x frequency increase could expose timing issues under very high load
Log Rotation Logic (server.cr:328-344): The until @log.size < log_size pattern could be inefficient with frequent updates. Consider batching log cleanup.
Frontend Polling: No retry logic or backoff strategy for failed requests at higher frequency

Security Concerns ✅

No security issues identified
Changes are purely performance/monitoring related
No new attack surfaces introduced

Test Coverage ⚠️

What's Covered:

Configuration changes tested in flow_spec.cr
Performance test optimization included
Server spec properly updated

Missing Coverage:

No specific tests for the new 1-second timing behavior
No tests validating UInt32 rate calculations vs previous Float64
No frontend JavaScript test updates for new polling intervals

Recommendation: Add integration tests that verify:

# Test that stats are collected every 1 second
# Test that rate calculations work correctly with UInt32
# Test frontend polling behavior under network issues

Additional Observations

Code Consistency: ✅ Excellent - all related components updated consistently

Documentation: The PR description clearly explains the rationale and changes

Backwards Compatibility: ✅ No breaking API changes

Summary

This is a well-implemented improvement that successfully achieves better real-time monitoring. The code quality is high with good performance optimizations. Main concerns are around the increased resource usage from 5x polling frequency and some gaps in test coverage for the new timing behavior.

Recommendation: Approve with monitoring of production performance metrics after deployment.

claude · 2025-07-27T21:51:49Z

Claude finished @carlhoerberg's task —— View job

PR Review: Stats Interval Change from 5s to 1s

Todo List:

Code Quality & Best Practices ✅

Configuration Changes (src/lavinmq/config.cr:42):

Clean update to stats_log_size = 600 with descriptive comment "10 mins at 1s interval"
Maintains 10-minute history window as intended

Server Logic (src/lavinmq/server.cr:377-409):

Excellent improvements to stats collection loop with proper timing measurement
Smart performance logging with different thresholds:
- Error >1000ms (server.cr:394-395)
- Warn >100ms (server.cr:396-397)
- Info >1ms (server.cr:398-399)
Proper sleep calculation to maintain precise 1-second intervals (server.cr:401-403)
Good use of Time.measure for both overall loop time and individual collection phases

Data Type Optimization (src/lavinmq/stats.cr:9-10):

Well-justified change from Float64 to UInt32 for rates
Since you can't publish <1 msg/sec, integers are appropriate and more performant
Consistent use of UInt32 throughout rate calculations and logs
Memory usage reduction: UInt32 uses ~50% less memory than Float64

Frontend Consistency:

All polling intervals correctly updated to 1000ms across 5 JavaScript files
Maintains consistent user experience with backend changes

Performance Considerations ⚠️

Positive Impacts:

UInt32 vs Float64 reduces memory usage for rate storage
Better real-time monitoring experience for users
More responsive charts and UI updates

Potential Concerns:

5x increase in polling frequency (1s vs 5s) increases:
- CPU usage for stats collection (mitigated by optimized timing)
- Network traffic from frontend polling
- I/O operations for metrics collection
Memory usage for logs remains controlled (600 entries vs previous ~120)

Recommendation: Monitor CPU usage in production, especially on high-traffic systems.

Potential Issues ⚠️

Race Conditions: The 5x frequency increase could expose timing issues under very high load, though the timing improvements help mitigate this
Log Rotation Logic (server.cr:328-344, stats.cr:45-46): The until @log.size < log_size pattern could be inefficient with frequent updates - consider batching log cleanup
Frontend Polling: No retry logic or backoff strategy for failed requests at higher frequency

Security Concerns ✅

No security issues identified
Changes are purely performance/monitoring related
No new attack surfaces introduced

Test Coverage ⚠️

What's Covered:

Server spec properly updated (server_spec.cr) - removed unnecessary stats_interval manipulation
Test improvements in flow and queue specs (better test isolation)

Missing Coverage:

No specific tests for the new 1-second timing behavior
No tests validating UInt32 rate calculations vs previous Float64 behavior
No frontend JavaScript test updates for new polling intervals
No load testing to verify performance under high frequency polling

Recommendation: Add integration tests that verify:

# Test that stats are collected every 1 second
# Test that rate calculations work correctly with UInt32
# Test frontend polling behavior under network issues

Code Architecture Alignment ✅

The changes align well with LavinMQ's design principles:

Maintains the disk-first approach with minimal memory usage
Follows existing patterns for stats collection and logging
Consistent with the high-performance goals mentioned in CLAUDE.md

Summary

This is a well-implemented improvement that successfully achieves better real-time monitoring. The code quality is high with good performance optimizations and proper error handling.

Strengths:

Excellent timing measurement and logging in stats collection
Smart data type optimization (Float64 → UInt32)
Consistent updates across all frontend components
Maintains backward compatibility

Main Concerns:

Increased resource usage from 5x polling frequency
Gaps in test coverage for new timing behavior
Potential efficiency issues with frequent log rotation

Recommendation: Approve with monitoring of production performance metrics after deployment.

claude · 2025-07-28T14:32:59Z

Claude finished @carlhoerberg's task —— View job

PR Review: Stats Interval Change from 5s to 1s

Todo List:

Code Quality & Best Practices ✅

Configuration Changes (src/lavinmq/config.cr:42):

Clean update to stats_log_size = 600 with descriptive comment "10 mins at 1s interval"
Maintains 10-minute history window as intended

Server Logic (src/lavinmq/server.cr:377-409):

Excellent improvements to stats collection loop with proper timing measurement
Smart performance logging with different thresholds (error >1000ms, warn >100ms, info >1ms)
Proper sleep calculation to maintain precise 1-second intervals (sleep 1.second - loop_time)
Good use of Time.measure for both overall loop time and individual collection phases

Data Type Optimization (src/lavinmq/stats.cr:9-10):

Well-justified change from Float64 to UInt32 for rates
Since you can't publish <1 msg/sec, integers are appropriate and more performant
Consistent use of UInt32 throughout rate calculations and logs
Memory usage reduction: UInt32 uses ~50% less memory than Float64

Frontend Consistency (static/js/datasource.js:56):

All polling intervals correctly updated to 1000ms across frontend
Maintains consistent user experience with backend changes

Performance Considerations ⚠️

Positive Impacts:

UInt32 vs Float64 reduces memory usage for rate storage
Better real-time monitoring experience for users
More responsive charts and UI updates

Potential Concerns:

5x increase in polling frequency (1s vs 5s) increases:
- CPU usage for stats collection (mitigated by optimized timing)
- Network traffic from frontend polling
- I/O operations for metrics collection
Memory usage for logs remains controlled (600 entries vs previous ~120)

Recommendation: Monitor CPU usage in production, especially on high-traffic systems.

Potential Issues ⚠️

Race Conditions: The 5x frequency increase could expose timing issues under very high load, though the timing improvements help mitigate this
Log Rotation Logic (server.cr:328-344, stats.cr:45-46): The until @log.size < log_size pattern could be inefficient with frequent updates - consider batching log cleanup
Frontend Polling: No retry logic or backoff strategy for failed requests at higher frequency

Security Concerns ✅

No security issues identified
Changes are purely performance/monitoring related
No new attack surfaces introduced

Test Coverage ⚠️

What's Covered:

Server spec properly updated (server_spec.cr) - removed unnecessary stats_interval manipulation
Test improvements in flow and queue specs (better test isolation)

Missing Coverage:

No specific tests for the new 1-second timing behavior
No tests validating UInt32 rate calculations vs previous Float64 behavior
No frontend JavaScript test updates for new polling intervals
No load testing to verify performance under high frequency polling

Recommendation: Add integration tests that verify:

# Test that stats are collected every 1 second
# Test that rate calculations work correctly with UInt32
# Test frontend polling behavior under network issues

Code Architecture Alignment ✅

The changes align well with LavinMQ's design principles:

Maintains the disk-first approach with minimal memory usage
Follows existing patterns for stats collection and logging
Consistent with the high-performance goals mentioned in CLAUDE.md

Summary

This is a well-implemented improvement that successfully achieves better real-time monitoring. The code quality is high with good performance optimizations and proper error handling.

Strengths:

Excellent timing measurement and logging in stats collection
Smart data type optimization (Float64 → UInt32)
Consistent updates across all frontend components
Maintains backward compatibility
Proper removal of now-unused configuration

Main Concerns:

Increased resource usage from 5x polling frequency
Gaps in test coverage for new timing behavior
Potential efficiency issues with frequent log rotation

Recommendation: Approve with monitoring of production performance metrics after deployment.

claude · 2025-07-28T23:00:18Z

Claude finished @carlhoerberg's task —— View job

PR Review: Change stats interval from 5s to 1s

I've conducted a comprehensive review of this PR focusing on code quality, potential issues, performance, security, and test coverage.

Todo List:

Code Quality & Best Practices ✅

Configuration Changes (src/lavinmq/config.cr:42):

Clean update to stats_log_size = 600 with descriptive comment "10 mins at 1s interval"
Maintains 10-minute history window as intended
Good naming choice - stats_log_size is clear and self-documenting

Server Logic (src/lavinmq/server.cr:377-409):

Excellent improvements to stats collection loop with proper timing measurement
Smart performance logging with different thresholds (error >1000ms, warn >100ms, info >1ms)
Proper sleep calculation to maintain precise 1-second intervals (sleep 1.second - loop_time)
Good use of Time.measure for both overall loop time and individual collection phases

Data Type Optimization (src/lavinmq/stats.cr:9-10):

Well-justified change from Float64 to UInt32 for rates
Since you can't publish <1 msg/sec, integers are appropriate and more performant
Consistent use of UInt32 throughout rate calculations and logs
Memory usage reduction: UInt32 uses ~50% less memory than Float64

Frontend Consistency (static/js/datasource.js:56):

All polling intervals correctly updated to 1000ms across frontend
Maintains consistent user experience with backend changes

Performance Considerations ⚠️

Positive Impacts:

UInt32 vs Float64 reduces memory usage for rate storage
Better real-time monitoring experience for users
More responsive charts and UI updates

Potential Concerns:

5x increase in polling frequency (1s vs 5s) increases:
- CPU usage for stats collection (mitigated by optimized timing)
- Network traffic from frontend polling
- I/O operations for metrics collection
Memory usage for logs remains controlled (600 entries vs previous ~120)

Recommendation: Monitor CPU usage in production, especially on high-traffic systems.

Potential Issues ⚠️

Race Conditions: The 5x frequency increase could expose timing issues under very high load, though the timing improvements help mitigate this
Log Rotation Logic (server.cr:328-344, stats.cr:45-46): The until @log.size < log_size pattern could be inefficient with frequent updates - consider batching log cleanup
Frontend Polling: No retry logic or backoff strategy for failed requests at higher frequency

Security Concerns ✅

No security issues identified
Changes are purely performance/monitoring related
No new attack surfaces introduced

Test Coverage ⚠️

What's Covered:

Server spec properly updated (server_spec.cr) - removed unnecessary stats_interval manipulation
Test improvements in flow and queue specs (better test isolation)

Missing Coverage:

No specific tests for the new 1-second timing behavior
No tests validating UInt32 rate calculations vs previous Float64 behavior
No frontend JavaScript test updates for new polling intervals
No load testing to verify performance under high frequency polling

Recommendation: Add integration tests that verify:

# Test that stats are collected every 1 second
# Test that rate calculations work correctly with UInt32
# Test frontend polling behavior under network issues

Code Architecture Alignment ✅

The changes align well with LavinMQ's design principles:

Maintains the disk-first approach with minimal memory usage
Follows existing patterns for stats collection and logging
Consistent with the high-performance goals mentioned in CLAUDE.md

Summary

This is a well-implemented improvement that successfully achieves better real-time monitoring. The code quality is high with good performance optimizations and proper error handling.

Strengths:

Excellent timing measurement and logging in stats collection
Smart data type optimization (Float64 → UInt32)
Consistent updates across all frontend components
Maintains backward compatibility
Proper removal of now-unused configuration

Main Concerns:

Increased resource usage from 5x polling frequency
Gaps in test coverage for new timing behavior
Potential efficiency issues with frequent log rotation

Recommendation: Approve with monitoring of production performance metrics after deployment.

Updates stats collection frequency across server and web UI: - Remove configurable stats_interval, hardcode to 1s in server stats loop - Change JavaScript polling intervals from 5000ms to 1000ms - Update stats log size from 120 to 600 (maintains 10min history) - Fix CPU time metrics to use counter type in Prometheus - Improve stats collection timing and logging - Update flow control tests to work with new timing 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

claude · 2025-07-29T13:35:59Z

Claude finished @carlhoerberg's task —— View job

PR Review: Change stats interval from 5s to 1s

I've conducted a comprehensive review of this PR focusing on code quality, potential issues, performance, security, and test coverage.

Todo List:

Code Quality & Best Practices ✅

Configuration Changes (src/lavinmq/config.cr:42):

Clean update to stats_log_size = 600 with descriptive comment "10 mins at 1s interval"
Maintains 10-minute history window as intended
Good naming choice - stats_log_size is clear and self-documenting

Server Logic (src/lavinmq/server.cr:377-409):

Excellent improvements to stats collection loop with proper timing measurement
Smart performance logging with different thresholds:
- Error >1000ms (server.cr:394-395)
- Warn >100ms (server.cr:396-397)
- Info >1ms (server.cr:398-399)
Proper sleep calculation to maintain precise 1-second intervals (server.cr:401-403)
Good use of Time.measure for both overall loop time and individual collection phases

Data Type Optimization (src/lavinmq/stats.cr:9-10):

Well-justified change from Float64 to UInt32 for rates
Since you can't publish <1 msg/sec, integers are appropriate and more performant
Consistent use of UInt32 throughout rate calculations and logs
Memory usage reduction: UInt32 uses ~50% less memory than Float64

Frontend Consistency (static/js/datasource.js:56):

All polling intervals correctly updated to 1000ms across frontend
Maintains consistent user experience with backend changes

Performance Considerations ⚠️

Positive Impacts:

UInt32 vs Float64 reduces memory usage for rate storage
Better real-time monitoring experience for users
More responsive charts and UI updates

Potential Concerns:

5x increase in polling frequency (1s vs 5s) increases:
- CPU usage for stats collection (mitigated by optimized timing)
- Network traffic from frontend polling
- I/O operations for metrics collection
Memory usage for logs remains controlled (600 entries vs previous ~120)

Recommendation: Monitor CPU usage in production, especially on high-traffic systems.

Potential Issues ⚠️

Race Conditions: The 5x frequency increase could expose timing issues under very high load, though the timing improvements help mitigate this
Log Rotation Logic (server.cr:328-344, stats.cr:45-46): The until @log.size < log_size pattern could be inefficient with frequent updates - consider batching log cleanup
Frontend Polling: No retry logic or backoff strategy for failed requests at higher frequency

Security Concerns ✅

No security issues identified
Changes are purely performance/monitoring related
No new attack surfaces introduced

Test Coverage ⚠️

What's Covered:

Server spec properly updated (server_spec.cr) - removed unnecessary stats_interval manipulation
Test improvements in flow and queue specs (better test isolation)

Missing Coverage:

No specific tests for the new 1-second timing behavior
No tests validating UInt32 rate calculations vs previous Float64 behavior
No frontend JavaScript test updates for new polling intervals
No load testing to verify performance under high frequency polling

Recommendation: Add integration tests that verify:

# Test that stats are collected every 1 second
# Test that rate calculations work correctly with UInt32
# Test frontend polling behavior under network issues

Code Architecture Alignment ✅

The changes align well with LavinMQ's design principles:

Maintains the disk-first approach with minimal memory usage
Follows existing patterns for stats collection and logging
Consistent with the high-performance goals mentioned in CLAUDE.md

Summary

This is a well-implemented improvement that successfully achieves better real-time monitoring. The code quality is high with good performance optimizations and proper error handling.

Strengths:

Excellent timing measurement and logging in stats collection
Smart data type optimization (Float64 → UInt32)
Consistent updates across all frontend components
Maintains backward compatibility
Proper removal of now-unused configuration

Main Concerns:

Increased resource usage from 5x polling frequency
Gaps in test coverage for new timing behavior
Potential efficiency issues with frequent log rotation

Recommendation: Approve with monitoring of production performance metrics after deployment.

@Mask

A ring buffer should be faster for our use case. While not the most common operation, for a broker with a lot of objects (queues, exchanges etc) we want to minimize any CPU time spent on just keeping track of the stats. Improvements to the `to_a` method that's used in the UI/API, i.e. any time we refresh the web page with new stats (proposed to be once per second in #1195) should help a bit, even though the full JSON serialization is probably much more expensive. Getting around 9% improvements in the simulated http_benchmark.cr See benchmark/ directory for all micro benchmarks (should be cleaned up before we merge this). Also kept the old Deque based code until we're ready to say this is good. ``` RingBuffer (steady-state) 355.76M ( 2.81ns) (± 5.70%) 0.0B/op fastest Deque (steady-state) 177.83M ( 5.62ns) (± 2.41%) 0.0B/op 2.00× slower RingBuffer (fill from empty) 285.64k ( 3.50µs) (± 0.76%) 12.1kB/op fastest Deque (fill from empty) 202.75k ( 4.93µs) (± 1.55%) 12.1kB/op 1.41× slower RingBuffer (to_a) 872.70k ( 1.15µs) (± 0.84%) 12.1kB/op fastest Deque (to_a) 146.77k ( 6.81µs) (± 0.94%) 12.1kB/op 5.95× slower ``` Why RingBuffer is faster: Bitwise Modulo Operations - RingBuffer uses & @Mask for wrap-around (single CPU instruction) - Deque uses conditional branches (if index >= @capacity) - Bitwise AND avoids branch misprediction penalties No Memory Shifting - RingBuffer: Simply overwrites oldest value when full - Deque: Calls shift to remove first element, causing memory movement Simpler Push Logic - RingBuffer: Direct write to calculated position - Deque: Check size → shift if needed → then push (3 operations vs 1) CPU Cache Optimization - RingBuffer: Predictable circular access pattern - Deque: Shift operations invalidate cache lines No Dynamic Memory Management - RingBuffer: Fixed allocation, never grows - Deque: Resize checks and potential reallocation overhead

carlhoerberg requested a review from a team as a code owner July 23, 2025 20:07

carlhoerberg force-pushed the stats_interval_1s branch 2 times, most recently from a9013af to 24a6f9e Compare July 25, 2025 21:30

carlhoerberg force-pushed the stats_interval_1s branch from 5d38006 to 0eb0e43 Compare July 27, 2025 21:51

carlhoerberg force-pushed the stats_interval_1s branch from 2d551a9 to 8b72c19 Compare July 29, 2025 13:35

jage mentioned this pull request Oct 13, 2025

RingBuffer for stats #1353

Draft

carlhoerberg closed this Feb 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change stats interval from 5s to 1s for better real-time monitoring#1195

Change stats interval from 5s to 1s for better real-time monitoring#1195
carlhoerberg wants to merge 1 commit intomainfrom
stats_interval_1s

carlhoerberg commented Jul 23, 2025 •

edited

Loading

Uh oh!

claude bot commented Jul 25, 2025 •

edited

Loading

Uh oh!

claude bot commented Jul 27, 2025 •

edited

Loading

Uh oh!

claude bot commented Jul 28, 2025 •

edited

Loading

Uh oh!

claude bot commented Jul 28, 2025 •

edited

Loading

Uh oh!

claude bot commented Jul 29, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlhoerberg commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Change stats interval from 5s to 1s

Todo List:

Code Quality & Best Practices ✅

Performance Considerations ⚠️

Potential Issues ⚠️

Security Concerns ✅

Test Coverage ⚠️

Additional Observations

Summary

Uh oh!

claude bot commented Jul 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Stats Interval Change from 5s to 1s

Todo List:

Code Quality & Best Practices ✅

Performance Considerations ⚠️

Potential Issues ⚠️

Security Concerns ✅

Test Coverage ⚠️

Code Architecture Alignment ✅

Summary

Uh oh!

claude bot commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Stats Interval Change from 5s to 1s

Todo List:

Code Quality & Best Practices ✅

Performance Considerations ⚠️

Potential Issues ⚠️

Security Concerns ✅

Test Coverage ⚠️

Code Architecture Alignment ✅

Summary

Uh oh!

claude bot commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Change stats interval from 5s to 1s

Todo List:

Code Quality & Best Practices ✅

Performance Considerations ⚠️

Potential Issues ⚠️

Security Concerns ✅

Test Coverage ⚠️

Code Architecture Alignment ✅

Summary

Uh oh!

claude bot commented Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Change stats interval from 5s to 1s

Todo List:

Code Quality & Best Practices ✅

Performance Considerations ⚠️

Potential Issues ⚠️

Security Concerns ✅

Test Coverage ⚠️

Code Architecture Alignment ✅

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

carlhoerberg commented Jul 23, 2025 •

edited

Loading

claude bot commented Jul 25, 2025 •

edited

Loading

claude bot commented Jul 27, 2025 •

edited

Loading

claude bot commented Jul 28, 2025 •

edited

Loading

claude bot commented Jul 28, 2025 •

edited

Loading

claude bot commented Jul 29, 2025 •

edited

Loading