Host Monitoring Plugin

Enhanced Failure Monitoring

The figure that follows shows a simplified Enhanced Failure Monitoring (EFM) workflow. Enhanced Failure Monitoring is a feature available from the Host Monitoring Connection Plugin. The Host Monitoring Connection Plugin periodically checks the connected database host's health or availability. If a database host is determined to be unhealthy, the connection will be aborted. The Host Monitoring Connection Plugin uses the Enhanced Failure Monitoring Parameters and a database host's responsiveness to determine whether a host is healthy.

The Benefits of Enhanced Failure Monitoring

Enhanced Failure Monitoring helps user applications detect failures earlier. When a user application executes a query, EFM may detect that the connected database host is unavailable. When this happens, the query is cancelled and the connection will be aborted. This allows queries to fail fast instead of waiting indefinitely or failing due to a timeout.

One use case is to pair EFM with the Failover Connection Plugin. When EFM discovers a database host failure, the connection will be aborted. Without the Failover Connection Plugin, the resulting error propagates up to the user application. With the Failover Connection Plugin, the AWS Advanced Python Wrapper can attempt to fail over to a different, healthy database host where the query can be executed.

Using the Host Monitoring Connection Plugin

The Host Monitoring Connection Plugin will be loaded by default if the plugins parameter is not specified. The Host Monitoring Connection Plugin can also be explicitly loaded by adding the plugin code host_monitoring to the plugins parameter. Enhanced Failure Monitoring is enabled by default when the Host Monitoring Connection Plugin is loaded, but it can be disabled by setting the failure_detection_enabled parameter to False.
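The two settings described above can be sketched as plain keyword arguments. This is only an illustration: host_monitoring_kwargs is a hypothetical helper, not part of the wrapper, and the resulting dictionary is assumed to be passed to AwsWrapperConnection.connect alongside the usual connection arguments.

```python
from typing import Any, Dict


def host_monitoring_kwargs(enable_efm: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: build the plugin-related connect arguments."""
    # Explicitly load the plugin by its plugin code.
    kwargs: Dict[str, Any] = {"plugins": "host_monitoring"}
    if not enable_efm:
        # EFM is on by default once the plugin is loaded; turn it off here.
        kwargs["failure_detection_enabled"] = False
    return kwargs


# Assumed usage: AwsWrapperConnection.connect(target, host=..., **kwargs)
print(host_monitoring_kwargs())
print(host_monitoring_kwargs(enable_efm=False))
```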

This plugin only works with drivers that support aborting connections from a separate thread. At this moment, this plugin is incompatible with the MySQL Connector/Python driver.

[IMPORTANT]
The Host Monitoring Plugin creates monitoring threads in the background to monitor all connections established to each cluster instance. The monitoring threads can be cleaned up in two ways:

  1. If there are no connections to the cluster instance the thread is monitoring for over a period of time, the Host Monitoring Plugin will automatically terminate the thread. This period of time is adjustable via the monitor_disposal_time_ms parameter.
  2. Client applications can manually call aws_advanced_python_wrapper.release_resources() to clean up any dangling resources. It is best practice to call aws_advanced_python_wrapper.release_resources() at the end of the application to ensure a graceful exit; otherwise, the application may wait until monitor_disposal_time_ms has elapsed before terminating. This is because the Python wrapper waits for all daemon threads to complete before exiting. See PGFailover for an example.

Enhanced Failure Monitoring Parameters

The parameters failure_detection_time_ms, failure_detection_interval_ms, and failure_detection_count are similar to TCP Keepalive parameters. Each connection has its own set of parameters. failure_detection_time_ms controls how long the monitor waits after a SQL query is executed before sending a probe to the database host. failure_detection_interval_ms controls how often the monitor sends probes to the database after the initial probe. failure_detection_count controls how many times a monitor probe can go unacknowledged before the database host is deemed unhealthy.

To determine the health of a database host:

  1. The monitor will first wait for a time equivalent to the failure_detection_time_ms.
  2. Then, every failure_detection_interval_ms, the monitor will send a probe to the database host.
  3. If the probe is not acknowledged by the database host, a counter is incremented.
  4. If the counter reaches the failure_detection_count, the database host will be deemed unhealthy and the connection will be aborted.
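The steps above can be modelled with a small simulation. This is a simplified sketch, not the plugin's implementation: it ignores how long each probe takes, and it assumes that an acknowledged probe resets the counter, which the description above does not state. The function name time_to_unhealthy_ms is hypothetical.

```python
from typing import Iterable, Optional


def time_to_unhealthy_ms(
    probe_acks: Iterable[bool],
    failure_detection_time_ms: int = 30000,
    failure_detection_interval_ms: int = 5000,
    failure_detection_count: int = 3,
) -> Optional[int]:
    """Return the time (ms after the query started) at which the host
    would be deemed unhealthy, or None if that never happens."""
    failures = 0
    for i, acked in enumerate(probe_acks):
        # The first probe is sent after failure_detection_time_ms, later
        # probes follow every failure_detection_interval_ms.
        probe_time = failure_detection_time_ms + i * failure_detection_interval_ms
        if acked:
            failures = 0  # ASSUMPTION: a successful probe resets the counter
        else:
            failures += 1
            if failures >= failure_detection_count:
                return probe_time
    return None


# Three unacknowledged probes in a row with the default parameters.
print(time_to_unhealthy_ms([False, False, False]))
```

Under this model and the default values, a host that stops responding immediately is flagged at the third probe, 30000 + 2 * 5000 = 40000 ms after the query started.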

If a more aggressive approach to failure checking is necessary, all of these parameters can be reduced. However, more frequent failure checking may also lead to more false positives. For example, if failure_detection_interval_ms were shortened, the plugin might complete several connection checks, all of which fail, within a short window. The database host would then be considered unhealthy even though it may have been about to recover, because the checks finished before the recovery could happen.
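The trade-off can be illustrated with a toy model of a brief outage. This is a sketch under stated assumptions: probes are instantaneous, a probe fails only while the outage is in progress, and a successful probe resets the failure counter. The function name and the aggressive parameter values are made up for illustration.

```python
def is_flagged_unhealthy(
    outage_start_ms: int,
    outage_end_ms: int,
    failure_detection_time_ms: int,
    failure_detection_interval_ms: int,
    failure_detection_count: int,
    horizon_ms: int = 120000,
) -> bool:
    """Return True if the monitor would abort the connection during a
    temporary outage lasting from outage_start_ms to outage_end_ms."""
    failures = 0
    t = failure_detection_time_ms  # time of the first probe
    while t <= horizon_ms:
        if outage_start_ms <= t < outage_end_ms:
            failures += 1
            if failures >= failure_detection_count:
                return True
        else:
            failures = 0  # ASSUMPTION: a successful probe resets the counter
        t += failure_detection_interval_ms
    return False


# A 6-second blip, starting 32 seconds after the query was sent.
blip = (32000, 38000)
print(is_flagged_unhealthy(*blip, 30000, 5000, 3))  # default parameters
print(is_flagged_unhealthy(*blip, 5000, 1000, 3))   # aggressive parameters
```

With the default values only one probe lands inside the blip, so the host recovers unnoticed. With the aggressive values three consecutive probes fail inside the same blip and the connection is aborted, even though the host comes back a few seconds later.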

| Parameter | Value | Required | Description | Default Value |
|---|---|---|---|---|
| failure_detection_enabled | Boolean | No | Set to True to enable Enhanced Failure Monitoring. Set to False to disable it. | True |
| monitor_disposal_time_ms | Integer | No | Interval in milliseconds specifying how long to wait before an inactive monitor should be disposed. | 60000 |
| failure_detection_count | Integer | No | Number of failed connection checks before considering database host as unhealthy. | 3 |
| failure_detection_interval_ms | Integer | No | Interval in milliseconds between probes to database host. | 5000 |
| failure_detection_time_ms | Integer | No | Interval in milliseconds between sending a SQL query to the server and the first probe to the database host. | 30000 |

The Host Monitoring Connection Plugin may create new monitoring connections to check the database host's availability. You can configure these connections with driver-specific configurations by adding the monitoring- prefix to the configuration parameters, as in the following example:

import psycopg
from aws_advanced_python_wrapper import AwsWrapperConnection, release_resources

props = {
    "monitoring-connect_timeout": 10,
    "monitoring-socket_timeout": 10
}

try:
    conn = AwsWrapperConnection.connect(
        psycopg.Connection.connect,
        host="database.cluster-xyz.us-east-1.rds.amazonaws.com",
        dbname="postgres",
        user="john",
        password="pwd",
        plugins="host_monitoring",
        # Configure the timeout values for all non-monitoring connections.
        connect_timeout=30, socket_timeout=30,
        # Configure different timeout values for the monitoring connections.
        **props)
finally:
    release_resources()

Important

If you specify a monitoring- prefixed timeout, always provide a non-zero timeout value.
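One way to guard against an accidental zero timeout is to build the prefixed properties through a small helper. The sketch below is hypothetical and not part of the wrapper: monitoring_props is an illustrative name, and treating every parameter whose name ends in timeout as a timeout is an assumption.

```python
from typing import Any, Dict

MONITORING_PREFIX = "monitoring-"


def monitoring_props(**driver_params: Any) -> Dict[str, Any]:
    """Hypothetical helper: prefix driver parameters for monitoring
    connections and reject zero or missing timeout values."""
    props: Dict[str, Any] = {}
    for name, value in driver_params.items():
        # ASSUMPTION: parameters ending in "timeout" are timeouts.
        if name.endswith("timeout") and not value:
            raise ValueError(f"{MONITORING_PREFIX}{name} must be non-zero")
        props[MONITORING_PREFIX + name] = value
    return props


props = monitoring_props(connect_timeout=10, socket_timeout=10)
print(props)
```

The resulting dictionary has the same shape as the props dictionary in the example above and can be unpacked into the connect call in the same way.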

Warning

Warnings About Usage of the AWS Advanced Python Wrapper with RDS Proxy

We recommend you either disable the Host Monitoring Connection Plugin or avoid using RDS Proxy endpoints when the Host Monitoring Connection Plugin is active.

Although using RDS Proxy endpoints with the AWS Advanced Python Wrapper with Enhanced Failure Monitoring doesn't cause any critical issues, we don't recommend this approach. The main reason is that RDS Proxy transparently re-routes requests to a single database instance. RDS Proxy decides which database instance is used based on many criteria (on a per-request basis). Switching between different instances makes the Host Monitoring Connection Plugin useless in terms of instance health monitoring because the plugin will be unable to identify which instance it's connected to, and which one it's monitoring. This could result in false positive failure detections. At the same time, the plugin will still proactively monitor network connectivity to RDS Proxy endpoints and report outages back to a user application if they occur.

Host Monitoring Plugin v2

Host Monitoring Plugin v2, also known as host_monitoring_v2, is an alternative implementation of Enhanced Failure Monitoring that is functionally equivalent to the Host Monitoring Plugin described above. Both plugins share the same set of configuration parameters. The host_monitoring_v2 plugin is designed to be a drop-in replacement for the host_monitoring plugin and can be used in any scenario where the host_monitoring plugin is mentioned. The host_monitoring_v2 plugin is enabled by default. The original EFM plugin can still be used by specifying host_monitoring in the plugins parameter.

Note

Because these are two separate plugins, both can be loaded on a single connection. While this should not have any negative side effects, it is not recommended. Use either the host_monitoring_v2 plugin or the host_monitoring plugin wherever host monitoring is needed, not both.
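An application could check its own plugins value before connecting to avoid loading both implementations by accident. This is a minimal sketch with hypothetical helper names. It assumes that plugin codes are given as a comma-separated string and that failover is the plugin code of the Failover Connection Plugin; neither is stated in this document.

```python
from typing import List

EFM_PLUGIN_CODES = {"host_monitoring", "host_monitoring_v2"}


def efm_plugins_in(plugins: str) -> List[str]:
    """Return the EFM plugin codes found in a plugins string."""
    # ASSUMPTION: plugin codes are separated by commas.
    codes = [code.strip() for code in plugins.split(",") if code.strip()]
    return [code for code in codes if code in EFM_PLUGIN_CODES]


def uses_both_efm_plugins(plugins: str) -> bool:
    """Return True if both EFM implementations are listed together."""
    return len(set(efm_plugins_in(plugins))) > 1


print(uses_both_efm_plugins("failover,host_monitoring_v2"))
print(uses_both_efm_plugins("host_monitoring,host_monitoring_v2"))
```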

The host_monitoring_v2 plugin is designed to address some of the issues that have been reported by multiple users. The following changes have been made:

  • Used weak pointers to ease garbage collection
  • Split monitoring logic into two separate threads to increase overall monitoring stability
  • Reviewed locks for monitoring context
  • Reviewed and redesigned stopping of idle monitoring threads
  • Reviewed and simplified monitoring logic