Amazon ElastiCache

Overview

To learn about key performance metrics, how to collect them, and how Coursera monitors ElastiCache using Datadog, see Monitoring ElastiCache performance metrics with Redis or Memcached.

Setup

If you haven't already, set up the Amazon Web Services integration first.

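Installation without the Datadog Agent

On the AWS integration page, ensure that ElastiCache is enabled under the Metric Collection tab.

To collect Amazon ElastiCache metrics, add the following permissions to your Datadog IAM policy. For more information, see the ElastiCache policies on the AWS website.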

AWS Permission	Description
`elasticache:DescribeCacheClusters`	List and describe cache clusters, to add tags and additional metrics.
`elasticache:ListTagsForResource`	List the custom tags of a cluster, to add custom tags.
`elasticache:DescribeEvents`	Add events about snapshots and maintenance.
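
As a rough illustration, the three permissions in the table can be assembled into an IAM policy document. This is only a sketch: the `build_policy_statement` helper is hypothetical, and `"Resource": "*"` is an assumption made for brevity. Merge the actions into your existing Datadog IAM policy rather than treating this as the complete policy.

```python
import json

# The three permissions listed in the table above.
ELASTICACHE_ACTIONS = [
    "elasticache:DescribeCacheClusters",
    "elasticache:ListTagsForResource",
    "elasticache:DescribeEvents",
]


def build_policy_statement(actions):
    """Return an IAM policy document that allows the given actions.

    Assumption: "Resource": "*" is used for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }


policy = build_policy_statement(ELASTICACHE_ACTIONS)
print(json.dumps(policy, indent=2))
```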

Install the Datadog - Amazon ElastiCache integration.

Installation with the Datadog Agent (recommended)

Collecting native metrics with the Agent

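The following diagram shows how Datadog collects metrics directly from CloudWatch with the native ElastiCache integration, and how it can additionally collect native metrics directly from the backend technology, Redis or Memcached. Collecting from the backend directly gives you access to more key metrics, at a higher resolution.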

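How this works

Because the Agent metrics are tied to the EC2 instance where the Agent runs, not to the actual ElastiCache instance, you need to use the cacheclusterid tag to connect all metrics together. Once the Agent is configured with the same tags as the ElastiCache instance, combining Redis/Memcached metrics with ElastiCache metrics is straightforward.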
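
To make the tag-based join concrete, here is a minimal sketch that groups metric points from both sources by their cacheclusterid tag. The sample points, the `group_by_cluster` helper, and the host tags are hypothetical; Datadog performs this correlation for you when both sources carry the same tag.

```python
from collections import defaultdict


def group_by_cluster(points):
    """Group metric points by the value of their cacheclusterid tag.

    Points without a cacheclusterid tag cannot be joined and are skipped.
    """
    grouped = defaultdict(dict)
    for point in points:
        for tag in point["tags"]:
            key, _, value = tag.partition(":")
            if key == "cacheclusterid":
                grouped[value][point["metric"]] = point["value"]
    return dict(grouped)


# Hypothetical sample data: one CloudWatch metric, two Agent metrics.
points = [
    {"metric": "aws.elasticache.cache_hits", "value": 120,
     "tags": ["cacheclusterid:replica-001"]},
    {"metric": "redis.info.latency_ms", "value": 0.4,
     "tags": ["cacheclusterid:replica-001", "host:i-0abc"]},
    # This Agent was not tagged, so its metric cannot be joined.
    {"metric": "redis.info.latency_ms", "value": 0.9,
     "tags": ["host:i-0def"]},
]

combined = group_by_cluster(points)
print(combined)
```

The untagged point is dropped from the result, which is why the Agent must be configured with the same cacheclusterid tag as the ElastiCache instance.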

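Step-by-step

Because the Agent runs on a remote machine rather than on the actual ElastiCache instance, the key to setting up this integration correctly is telling the Agent where to collect the metrics from.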

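Gather connection details for your ElastiCache instance

First, navigate to the AWS console. Open the ElastiCache section, then the Cache Clusters tab, and find the cluster you want to monitor.

Then click the "Nodes" link to access the endpoint URL.

Write down the endpoint URL (for example: replica-001.xxxx.use1.cache.amazonaws.com) and the cacheclusterid (for example: replica-001). You need these values to configure the Agent and to create graphs and dashboards.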
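
In the example above, the cacheclusterid matches the first DNS label of the endpoint. The sketch below relies on that observation as an assumption; it is not a documented guarantee, so confirm the value against the AWS console. The `cluster_id_from_endpoint` helper is hypothetical.

```python
def cluster_id_from_endpoint(endpoint):
    """Return the first DNS label of an endpoint, ignoring any ":port" suffix.

    Assumption: the first label equals the cacheclusterid, as in the
    example endpoint above. Verify the result in the AWS console.
    """
    host = endpoint.strip().split(":", 1)[0]
    label = host.split(".", 1)[0]
    if not label:
        raise ValueError(f"cannot derive a cluster ID from {endpoint!r}")
    return label


endpoint = "replica-001.xxxx.use1.cache.amazonaws.com"
print(cluster_id_from_endpoint(endpoint))
```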

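Configure the Agent

The Redis/Memcached integrations support tagging individual cache instances. These tags were originally designed to allow monitoring multiple instances on the same machine, but they can also be used to filter and group metrics. Here is an example configuration for ElastiCache with Redis using redisdb.yaml. For more information about where this file is stored on your platform, see the Agent configuration directory.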

init_config:

instances:
    # Endpoint URL from AWS console
    - host: replica-001.xxxx.use1.cache.amazonaws.com
      port: 6379
      # Cache Cluster ID from AWS console
      tags:
          - cacheclusterid:replica-001

Then restart the Agent: sudo /etc/init.d/datadog-agent restart (on Linux).
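
If you manage several clusters, the configuration shown above could be rendered from the values gathered in the AWS console. This is a minimal sketch that uses plain string formatting. The `render_redisdb_yaml` helper is hypothetical, and the default port of 6379 is taken from the example configuration.

```python
def render_redisdb_yaml(host, cluster_id, port=6379):
    """Render a redisdb.yaml body with one instance tagged by cluster ID."""
    lines = [
        "init_config:",
        "",
        "instances:",
        "    # Endpoint URL from AWS console",
        f"    - host: {host}",
        f"      port: {port}",
        "      # Cache Cluster ID from AWS console",
        "      tags:",
        f"          - cacheclusterid:{cluster_id}",
    ]
    return "\n".join(lines) + "\n"


config_text = render_redisdb_yaml(
    "replica-001.xxxx.use1.cache.amazonaws.com", "replica-001"
)
print(config_text)
```

Write the result to the redisdb.yaml file in the Agent configuration directory for your platform, then restart the Agent.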

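Visualize metrics together

After a few minutes, ElastiCache metrics and Redis or Memcached metrics are available in Datadog for graphing, monitoring, and so on.

Here is an example of setting up a graph that combines the cache hit metric from ElastiCache with the native latency metric from Redis, using the same cacheclusterid tag replica-001.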
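
The two series on such a graph are scoped by the same tag. The sketch below builds the corresponding query strings; the `tag_scoped_query` helper is hypothetical, and the choice of `redis.info.latency_ms` as the native latency metric and of the `avg` aggregator are assumptions.

```python
def tag_scoped_query(metric, cluster_id, aggregator="avg"):
    """Build a metric query string scoped to one cacheclusterid tag value."""
    return f"{aggregator}:{metric}{{cacheclusterid:{cluster_id}}}"


cluster_id = "replica-001"
queries = [
    # Metric collected from CloudWatch by the ElastiCache integration.
    tag_scoped_query("aws.elasticache.cache_hits", cluster_id),
    # Metric collected by the Agent's Redis check (assumed metric name).
    tag_scoped_query("redis.info.latency_ms", cluster_id),
]
for query in queries:
    print(query)
```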

Data Collected

Metrics

aws.elasticache.active_defrag_hits (gauge)	Redis - The number of value reallocations per minute performed by the active defragmentation process.
aws.elasticache.bytes_read_into_memcached (count)	Memcached - The number of bytes that have been read from the network by the cache node. Shown as byte
aws.elasticache.bytes_used_for_cache (gauge)	Redis - The total number of bytes allocated by Redis. Shown as byte
aws.elasticache.bytes_used_for_cache_items (gauge)	Memcached - The number of bytes used to store cache items. Shown as byte
aws.elasticache.bytes_used_for_hash (gauge)	Memcached - The number of bytes currently used by hash tables. Shown as byte
aws.elasticache.bytes_written_out_from_memcached (count)	Memcached - The number of bytes that have been written to the network by the cache node. Shown as byte
aws.elasticache.cache_hit_rate (gauge)	Redis - Indicates the usage efficiency of the Redis instance. Shown as percent
aws.elasticache.cache_hits (count)	Redis - The number of successful key lookups. Shown as hit
aws.elasticache.cache_misses (count)	Redis - The number of unsuccessful key lookups. Shown as miss
aws.elasticache.cas_badval (count)	Memcached - The number of CAS (check and set) requests the cache has received where the Cas value did not match the Cas value stored. Shown as request
aws.elasticache.cas_hits (count)	Memcached - The number of CAS requests the cache has received where the requested key was found and the Cas value matched. Shown as hit
aws.elasticache.cas_misses (count)	Memcached - The number of CAS requests the cache has received where the key requested was not found. Shown as miss
aws.elasticache.cluster_count (count)	The number of ElastiCache clusters.
aws.elasticache.cmd_config_get (count)	Memcached - The cumulative number of config get requests. Shown as get
aws.elasticache.cmd_config_set (count)	Memcached - The cumulative number of config set requests. Shown as set
aws.elasticache.cmd_flush (count)	Memcached - The number of flush commands the cache has received. Shown as flush
aws.elasticache.cmd_get (count)	Memcached - The number of get commands the cache has received. Shown as get
aws.elasticache.cmd_set (count)	Memcached - The number of set commands the cache has received. Shown as set
aws.elasticache.cmd_touch (count)	Memcached - The cumulative number of touch requests. Shown as request
aws.elasticache.cpucredit_balance (gauge)	The number of earned CPU credits that an instance has accrued since it was launched or started. Shown as unit
aws.elasticache.cpucredit_usage (gauge)	The number of CPU credits spent by the instance for CPU utilization. Shown as unit
aws.elasticache.cpuutilization (gauge)	The percentage of CPU utilization for the server. Shown as percent
aws.elasticache.curr_config (gauge)	Memcached - The current number of configurations stored.
aws.elasticache.curr_connections (gauge)	Redis - The number of client connections, excluding connections from read replicas. Memcached - A count of the number of connections connected to the cache at an instant in time. Shown as connection
aws.elasticache.curr_items (gauge)	Redis - The number of items in the cache. This is derived from the Redis keyspace statistic, summing all of the keys in the entire keyspace. Memcached - A count of the number of items currently stored in the cache. Shown as item
aws.elasticache.database_memory_usage_percentage (gauge)	Redis - The percentage of the memory available for the cluster that is in use. Shown as percent
aws.elasticache.db_0average_ttl (gauge)	Redis - Exposes avg_ttl of DB0 from the keyspace statistic of the Redis INFO command. Shown as millisecond
aws.elasticache.decr_hits (count)	Memcached - The number of decrement requests the cache has received where the requested key was found. Shown as hit
aws.elasticache.decr_misses (count)	Memcached - The number of decrement requests the cache has received where the requested key was not found. Shown as miss
aws.elasticache.delete_hits (count)	Memcached - The number of delete requests the cache has received where the requested key was found. Shown as hit
aws.elasticache.delete_misses (count)	Memcached - The number of delete requests the cache has received where the requested key was not found. Shown as miss
aws.elasticache.engine_cpuutilization (gauge)	The percentage of CPU utilization for the Redis process. Shown as percent
aws.elasticache.eval_based_cmds (count)	Redis - The total number of commands for eval-based commands. Shown as command
aws.elasticache.eval_based_cmds_latency (gauge)	Redis - The latency of eval-based commands. Shown as microsecond
aws.elasticache.evicted_unfetched (count)	Memcached - The number of valid items evicted from the least recently used cache (LRU) which were never touched after being set. Shown as item
aws.elasticache.evictions (count)	Redis - The number of keys that have been evicted due to the maxmemory limit. Memcached - The number of non-expired items the cache evicted to allow space for new writes. Shown as eviction
aws.elasticache.expired_unfetched (count)	Memcached - The number of expired items reclaimed from the LRU which were never touched after being set. Shown as item
aws.elasticache.freeable_memory (gauge)	The amount of free memory available on the host. Shown as byte
aws.elasticache.geo_spatial_based_cmds (count)	Redis - The total number of geo spatial based commands. Shown as command
aws.elasticache.get_hits (count)	Memcached - The number of get requests the cache has received where the key requested was found. Shown as hit
aws.elasticache.get_misses (count)	Memcached - The number of get requests the cache has received where the key requested was not found. Shown as miss
aws.elasticache.get_type_cmds (count)	Redis - The total number of read-only type commands. This is derived from the Redis OSS commandstats statistic by summing all of the read-only type commands (get, hget, scard, lrange, and so on.) Shown as command
aws.elasticache.get_type_cmds_latency (gauge)	Redis - The latency of read commands. Shown as microsecond
aws.elasticache.hash_based_cmds (count)	Redis - The total number of commands that are hash-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more hashes. Shown as command
aws.elasticache.hash_based_cmds_latency (gauge)	Redis - The latency of hash-based commands. Shown as microsecond
aws.elasticache.hyper_log_log_based_cmds (count)	Redis - The total number of HyperLogLog based commands. This is derived from the Redis commandstats statistic by summing all of the pf type of commands (pfadd, pfcount, pfmerge). Shown as command
aws.elasticache.incr_hits (count)	Memcached - The number of increment requests the cache has received where the key requested was found. Shown as hit
aws.elasticache.incr_misses (count)	Memcached - The number of increment requests the cache has received where the key requested was not found. Shown as miss
aws.elasticache.is_master (gauge)	Redis - Returns 1 if the node is master, 0 otherwise.
aws.elasticache.key_based_cmds (count)	Redis - The total number of commands that are key-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more keys. Shown as command
aws.elasticache.key_based_cmds_latency (gauge)	Redis - The latency of key-based commands. Shown as microsecond
aws.elasticache.list_based_cmds (count)	Redis - The total number of commands that are list-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more lists. Shown as command
aws.elasticache.master_link_health_status (gauge)	Redis - A value of 0 indicates that data in the ElastiCache primary node is not in sync with Redis on EC2. A value of 1 indicates that the data is in sync.
aws.elasticache.memory_fragmentation_ratio (gauge)	Redis - Indicates the efficiency in the allocation of memory of the Redis engine.
aws.elasticache.network_bytes_in (count)	The number of bytes the host has read from the network. Shown as byte
aws.elasticache.network_bytes_out (count)	The number of bytes the host has written to the network. Shown as byte
aws.elasticache.network_packets_in (count)	The number of packets received on all network interfaces by the instance. Shown as packet
aws.elasticache.network_packets_out (count)	The number of packets sent out on all network interfaces by the instance. Shown as packet
aws.elasticache.new_connections (count)	Redis - The total number of connections that have been accepted by the server during this period. Memcached - The number of new connections the cache has received. This is derived from the memcached totalconnections statistic by recording the change in totalconnections across a period of time. This will always be at least 1, due to a connection reserved for ElastiCache. Shown as connection
aws.elasticache.new_items (count)	Memcached - The number of new items the cache has stored. This is derived from the memcached totalitems statistic by recording the change in totalitems across a period of time. Shown as item
aws.elasticache.node_count (count)	The number of ElastiCache nodes. Shown as node
aws.elasticache.reclaimed (count)	Redis - The total number of key expiration events. Memcached - The number of expired items the cache evicted to allow space for new writes.
aws.elasticache.replication_bytes (gauge)	Redis - For primaries with attached replicas, ReplicationBytes reports the number of bytes that the primary is sending to all of its replicas. This metric is representative of the write load on the replication group. For replicas and standalone primaries, ReplicationBytes is always 0. Shown as byte
aws.elasticache.replication_lag (gauge)	Redis - This metric is only applicable for a cache node running as a read replica. It represents how far behind, in seconds, the replica is in applying changes from the primary cache cluster. Shown as second
aws.elasticache.save_in_progress (gauge)	Redis - This binary metric returns 1 whenever a background save (forked or forkless) is in progress, and 0 otherwise. A background save process is typically used during snapshots and syncs. These operations can cause degraded performance. Using the SaveInProgress metric, you can diagnose whether or not degraded performance was caused by a background save process.
aws.elasticache.set_based_cmds (count)	Redis - The total number of commands that are set-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more sets. Shown as command
aws.elasticache.set_based_cmds_latency (gauge)	Redis - The latency of set-based commands. Shown as microsecond
aws.elasticache.set_type_cmds (count)	Redis - The total number of write types of commands. This is derived from the Redis OSS commandstats statistic by summing all of the mutative types of commands that operate on data (set, hset, sadd, lpop, and so on.) Shown as command
aws.elasticache.set_type_cmds_latency (gauge)	Redis - The latency of write commands. Shown as microsecond
aws.elasticache.slabs_moved (count)	Memcached - The total number of slab pages that have been moved. Shown as page
aws.elasticache.sorted_set_based_cmds (count)	Redis - The total number of commands that are sorted set-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more sorted sets. Shown as command
aws.elasticache.sorted_set_based_cmds_latency (gauge)	Redis - The latency of sorted set-based commands. Shown as microsecond
aws.elasticache.stream_based_cmds (count)	Redis - The total number of commands that are stream-based. Shown as command
aws.elasticache.stream_based_cmds_latency (gauge)	Redis - The latency of stream-based commands. Shown as microsecond
aws.elasticache.string_based_cmds (count)	Redis - The total number of commands that are string-based. This is derived from the Redis commandstats statistic by summing all of the commands that act upon one or more strings. Shown as command
aws.elasticache.string_based_cmds_latency (gauge)	Redis - The latency of string-based commands. Shown as microsecond
aws.elasticache.swap_usage (gauge)	The amount of swap used on the host. Shown as byte
aws.elasticache.touch_hits (count)	Memcached - The number of keys that have been touched and were given a new expiration time. Shown as hit
aws.elasticache.touch_misses (count)	Memcached - The number of items that have been touched, but were not found. Shown as miss
aws.elasticache.unused_memory (gauge)	Memcached - The amount of unused memory the cache can use to store items. This is derived from the memcached statistics limitmaxbytes and bytes by subtracting bytes from limitmaxbytes. Shown as byte

Each of the metrics retrieved from AWS is assigned the same tags that appear in the AWS console, including but not limited to host name, security groups, and more.
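
Some metrics in the list above are derived from others. As a hedged sketch, the functions below show one way to relate them: a hit rate computed as hits divided by total lookups (an assumption about how the percentage relates to cache_hits and cache_misses), and unused memory computed by subtracting bytes from limitmaxbytes, as the unused_memory description states. The function names and sample values are hypothetical.

```python
def cache_hit_rate(hits, misses):
    """Return the hit rate as a percentage, or None when there were no lookups.

    Assumption: rate = hits / (hits + misses) * 100.
    """
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


def unused_memory(limit_maxbytes, bytes_used):
    """Return unused cache memory in bytes: limitmaxbytes minus bytes."""
    return limit_maxbytes - bytes_used


print(cache_hit_rate(90, 10))
print(unused_memory(1024, 256))
```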

Events

The Amazon ElastiCache integration includes events for clusters, cache security groups, and cache parameter groups. See example events below.
