oci.gpu_infrastructure_health.gpu_ecc_double_bit_errors (count) | The number of GPU double-bit ECC errors reported. Shown as error |
oci.gpu_infrastructure_health.gpu_ecc_single_bit_errors (count) | The number of GPU single-bit ECC errors reported. Shown as error |
oci.gpu_infrastructure_health.gpu_memory_utilization (gauge) | The percentage of the GPU memory resource in use. Shown as percent |
oci.gpu_infrastructure_health.gpu_power_draw (gauge) | The amount of GPU power used. |
oci.gpu_infrastructure_health.gpu_temperature (gauge) | The GPU temperature reported. |
oci.gpu_infrastructure_health.gpu_utilization (gauge) | GPU activity level, expressed as a percentage of total time. For instance pools, the value is averaged across all instances in the pool. Shown as percent |
oci.computeagent.cpu_utilization (gauge) | CPU activity level, expressed as a percentage of total time. For instance pools, the value is averaged across all instances in the pool. Shown as percent |
oci.computeagent.disk_bytes_read (count) | Read throughput. Expressed as bytes read per interval. Shown as byte |
oci.computeagent.disk_bytes_written (count) | Write throughput. Expressed as bytes written per interval. Shown as byte |
oci.computeagent.disk_iops_read (count) | Activity level from I/O reads. Expressed as reads per interval. Shown as operation |
oci.computeagent.disk_iops_written (count) | Activity level from I/O writes. Expressed as writes per interval. Shown as operation |
oci.computeagent.load_average (gauge) | Average system load calculated over a 1-minute period. Shown as process |
oci.computeagent.memory_allocation_stalls (count) | Number of times page reclaim was called directly. |
oci.computeagent.memory_utilization (gauge) | Memory currently in use, measured in pages and expressed as a percentage of used pages. For instance pools, the value is averaged across all instances in the pool. Shown as percent |
oci.computeagent.networks_bytes_in (count) | Network receipt throughput. Expressed as bytes received. Shown as byte |
oci.computeagent.networks_bytes_out (count) | Network transmission throughput. Expressed as bytes transmitted. Shown as byte |
oci.rdma_infrastructure_health.rdma_rx_bytes (count) | The bytes received on the RDMA interface. Shown as byte |
oci.rdma_infrastructure_health.rdma_rx_packets (count) | The number of RDMA interface packets received. Shown as packet |
oci.rdma_infrastructure_health.rdma_tx_bytes (count) | The bytes transmitted on the RDMA interface. Shown as byte |
oci.rdma_infrastructure_health.rdma_tx_packets (count) | The number of RDMA interface packets transmitted. Shown as packet |
oci.compute_infrastructure_health.health_status (count) | The number of health issues for an instance. Any non-zero value indicates a health defect. This metric is available only for bare metal instances. Shown as error |
oci.compute_infrastructure_health.instance_status (gauge) | The status of a running instance. A value of 0 indicates that the instance is available (up). A value of 1 indicates that the instance is not available (down) due to an infrastructure issue. If the instance is stopped, then the metric does not have a value. This metric is available only for VM instances. Shown as instance |
oci.compute_infrastructure_health.maintenance_status (gauge) | The maintenance status of an instance. A value of 0 indicates that the instance is not scheduled for an infrastructure maintenance event. A value of 1 indicates that the instance is scheduled for an infrastructure maintenance event. This metric is available for both VM and bare metal instances. Shown as instance |
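
The three `oci.compute_infrastructure_health.*` metrics encode state as numeric values, so a short sketch of how those values might be interpreted can help. The Python below is illustrative only: the function names and returned labels are hypothetical, and it assumes the metric value has already been fetched, with `None` standing in for "no value reported".

```python
from typing import Optional


def describe_instance_status(value: Optional[float]) -> str:
    """Interpret instance_status (VM instances only).

    Hypothetical helper: None stands for "metric has no value",
    which the table above says happens when the instance is stopped.
    """
    if value is None:
        return "no value (instance may be stopped)"
    if value == 0:
        return "up"
    if value == 1:
        return "down (infrastructure issue)"
    return "unexpected value"


def describe_maintenance_status(value: Optional[float]) -> str:
    """Interpret maintenance_status (VM and bare metal instances)."""
    if value is None:
        return "no value"
    if value == 0:
        return "no maintenance scheduled"
    if value == 1:
        return "maintenance scheduled"
    return "unexpected value"


def has_health_defect(value: Optional[float]) -> bool:
    """Interpret health_status (bare metal instances only).

    Any non-zero count indicates a health defect.
    """
    return value is not None and value != 0


if __name__ == "__main__":
    print(describe_instance_status(1))
    print(describe_maintenance_status(0))
    print(has_health_defect(2))
```

A monitor built on these metrics would typically alert when `instance_status` equals 1 or when `health_status` is non-zero, and treat a missing `instance_status` value as a stopped instance rather than an outage.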