Skip to content

Add unauthenticated /metrics endpoint on port 15692#1198

Closed
carlhoerberg wants to merge 2 commits intomainfrom
claude/issue-1162-20250725-2156
Closed

Add unauthenticated /metrics endpoint on port 15692#1198
carlhoerberg wants to merge 2 commits intomainfrom
claude/issue-1162-20250725-2156

Conversation

@carlhoerberg
Copy link
Member

This implements the ability to serve Prometheus metrics without authentication on localhost:15692 (configurable), addressing the need for simplified metrics scraping as described in issue #1162.

Key Changes

  • Add metrics_port configuration option (default: 15692)
  • Create MetricsServer class for unauthenticated metrics serving
  • Expose necessary methods in PrometheusController
  • Integrate metrics server into launcher
  • Add comprehensive tests for new functionality

The metrics server only binds to localhost for security and serves both /metrics and /metrics/detailed endpoints. Followers expose the same metrics they can provide (CPU/RAM/GC metrics and replication-related data).

Fixes #1162

Generated with Claude Code

@carlhoerberg carlhoerberg requested a review from a team as a code owner July 25, 2025 22:04
@claude
Copy link

claude bot commented Jul 25, 2025

Claude finished @carlhoerberg's task —— View job


PR Review: Add unauthenticated /metrics endpoint on port 15692

I've conducted a comprehensive review of this PR. The implementation looks solid overall with good architectural decisions and security considerations.

Review Tasks:

  • Read repository setup instructions (CLAUDE.md)
  • Review config changes in src/lavinmq/config.cr
  • Analyze new MetricsServer class in src/lavinmq/http/metrics_server.cr
  • Review Prometheus controller modifications in src/lavinmq/http/controller/prometheus.cr
  • Check launcher integration in src/lavinmq/launcher.cr
  • Examine test coverage in spec/metrics_server_spec.cr
  • Assess security implications and best practices
  • Provide comprehensive feedback

🟢 Overall Assessment: LGTM with Minor Suggestions

This is a well-implemented feature that addresses issue #1162 effectively. The code follows good practices and maintains security through localhost-only binding.

Strengths

Security:

  • ✅ Hardcoded localhost binding in launcher (127.0.0.1) prevents external access
  • ✅ Prefix length validation (max 20 characters) prevents abuse
  • ✅ No user authentication bypass - designed for local metrics scraping
  • ✅ Same data exposure as authenticated endpoint (no additional data leakage)

Architecture:

  • ✅ Clean separation with dedicated MetricsServer class
  • ✅ Good code reuse of existing PrometheusController methods
  • ✅ Consistent error handling and logging
  • ✅ Proper resource cleanup in tests

Code Quality:

  • ✅ Follows Crystal naming conventions
  • ✅ Appropriate use of private methods
  • ✅ Clear method signatures and consistent patterns
  • ✅ Good logging with dedicated logger

🟡 Minor Suggestions

1. Input Validation Enhancement (src/lavinmq/http/metrics_server.cr:42,64)

# Current: Only validates bytesize
if prefix.bytesize > 20

# Suggestion: Add character validation for security
if prefix.bytesize > 20 || !prefix.matches?(/^[a-zA-Z_][a-zA-Z0-9_]*$/)
  context.response.status_code = 400
  context.response.print "Invalid prefix format"
  return
end

2. Error Handling Consistency (src/lavinmq/http/metrics_server.cr:18)

# Current: Simple text response
context.response.print "Not Found"

# Suggestion: Include valid paths for better UX
context.response.print "Not Found. Available endpoints: /metrics, /metrics/detailed"

3. Configuration Validation (src/lavinmq/config.cr:155-157)
Consider adding port range validation:

p.on("--metrics-port=PORT", "Metrics port to listen on without auth (default: 15692)") do |v|
  port = v.to_i
  raise "Metrics port must be between 1 and 65535" unless (1..65535).includes?(port)
  @metrics_port = port
end

📊 Test Coverage Analysis

Excellent coverage includes:

  • ✅ Basic /metrics endpoint functionality
  • ✅ Detailed metrics with query parameters
  • ✅ 404 handling for invalid paths
  • ✅ Configuration parsing and defaults
  • ✅ Proper server lifecycle (bind, listen, close)

Potential additional tests:

  • Prefix validation edge cases (empty, special characters, max length)
  • Query parameter handling edge cases
  • Error scenarios (malformed requests)

🚀 Performance Considerations

  • ✅ Minimal overhead - reuses existing metrics collection
  • ✅ No additional authentication overhead by design
  • ✅ Separate server prevents blocking main HTTP server
  • ✅ Efficient fiber-based concurrency

🔒 Security Assessment

The security model is appropriate for the use case:

  • Localhost-only binding is the key security control
  • No authentication is intentional for simplified scraping
  • Same data exposure as authenticated endpoint maintains consistency
  • Port configuration allows environment-specific security policies

📝 Code Quality Observations

Prometheus Controller Changes (lines 72-86, 103-124):

  • ✅ Good extraction of target_vhosts method for reuse
  • ✅ Methods properly made public for external access
  • ✅ Consistent with existing patterns

Launcher Integration (lines 71, 184-188):

  • ✅ Proper lifecycle management (start, bind, listen, close)
  • ✅ Conditional startup based on port configuration
  • ✅ Consistent with other server components

🎯 Recommendation

Approve and merge - this is a solid implementation that effectively addresses the requirements. The minor suggestions are optional improvements that don't impact the core functionality or security model.

The feature successfully provides:

  • Unauthenticated localhost access for simplified monitoring
  • Configurable port with sensible default (15692)
  • Comprehensive metric exposure (both basic and detailed)
  • Proper security boundaries through localhost-only binding

@carlhoerberg
Copy link
Member Author

@claude modify this pr, when MetricServer is run in a replication follower an amqp_server can't be injected, so make it optional.

@claude
Copy link

claude bot commented Jul 25, 2025

Claude finished @carlhoerberg's task —— View job


I've successfully modified the PR to make amqp_server optional in MetricsServer for replication follower scenarios.

Tasks Completed:

  • Read repository setup instructions (CLAUDE.md) - No CLAUDE.md, reviewed README.md
  • Analyze current MetricsServer implementation
  • Analyze PrometheusController dependencies on amqp_server
  • Make amqp_server optional in MetricsServer constructor
  • Update code to handle nil amqp_server case
  • Update tests to reflect optional amqp_server
  • Test the changes - Unable to run tests due to permissions, but changes are logically sound
  • Commit and push modifications

Changes Made:

  • MetricsServer (src/lavinmq/http/metrics_server.cr): Made amqp_server parameter optional (LavinMQ::Server?)
  • PrometheusController (src/lavinmq/http/controller/prometheus.cr):
    • Added constructor overload to handle optional amqp_server
    • Conditional route registration only when amqp_server is present
    • All metrics methods now safely handle nil amqp_server (return 0 or skip server-specific metrics)
  • Tests (spec/metrics_server_spec.cr): Added test case for follower scenario where amqp_server is nil

Summary:

The changes ensure that when MetricsServer runs in a replication follower context where amqp_server can't be injected, the metrics server still functions properly. Server-specific metrics return default values (0) while system-level metrics (like file descriptors, GC stats) are still available when applicable.

This addresses the original issue where followers couldn't start the metrics server due to the missing amqp_server dependency.

Commit: 1582a55 - Make amqp_server optional in MetricsServer for replication followers


- Modified MetricsServer to accept optional amqp_server parameter
- Updated PrometheusController to handle nil amqp_server gracefully
- All metrics methods now return default values (0) when amqp_server is nil
- Added test case for follower scenario without amqp_server
- Ensures metrics endpoint works in replication follower contexts

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Carl Hörberg <carlhoerberg@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ability to serve /metrics without auth

1 participant