This check collects TokuMX metrics, including opcounters, replication activity, cache table utilization, and storage size.
The TokuMX check is included in the Datadog Agent package. No additional installation is needed on your server.
Install the Python MongoDB module on your MongoDB server using the following command:
sudo pip install --upgrade "pymongo<3.0"
You can verify that the module is installed using this command:
python -c "import pymongo" 2>&1 | grep ImportError && \
echo -e "\033[0;31mpymongo python module - Missing\033[0m" || \
echo -e "\033[0;32mpymongo python module - OK\033[0m"
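Because the install command pins `pymongo` below 3.0, it can also help to check the installed version explicitly. The following is a minimal sketch, not part of the official check. The helper name `is_supported_pymongo` is hypothetical, and the code assumes a dotted numeric version string such as the one exposed by `pymongo.version`.

```python
def is_supported_pymongo(version_string):
    """Return True if the major version component is below 3.

    Hypothetical helper that mirrors the "pymongo<3.0" pin used in
    the install command.
    """
    major = int(version_string.split(".")[0])
    return major < 3


if __name__ == "__main__":
    try:
        import pymongo
    except ImportError:
        print("pymongo python module - Missing")
    else:
        if is_supported_pymongo(pymongo.version):
            print("pymongo %s - OK" % pymongo.version)
        else:
            print("pymongo %s - 3.0 or later, outside the pinned range" % pymongo.version)
```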
Start the Mongo shell. In the shell, create a read-only user for the Datadog Agent in the admin database:
// Authenticate as the admin user.
use admin
db.auth("admin", "<YOUR_TOKUMX_ADMIN_PASSWORD>")
// Add a read-only user for the Datadog Agent.
db.addUser("datadog", "<UNIQUEPASSWORD>", true)
Verify that you created the user with the following command (not in the Mongo shell).
python -c 'from pymongo import Connection; print Connection().admin.authenticate("datadog", "<UNIQUEPASSWORD>")' | \
grep True && \
echo -e "\033[0;32mdatadog user - OK\033[0m" || \
echo -e "\033[0;31mdatadog user - Missing\033[0m"
For more details about creating and managing users in MongoDB, see the MongoDB Security documentation.
To configure this check for an Agent running on a host:
Edit the tokumx.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory.
See the sample tokumx.d/conf.yaml for all available configuration options:
init_config:

instances:
  - server: "mongodb://<USER>:<PASSWORD>@localhost:27017"
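In a MongoDB connection string, reserved characters in the username or password (for example `@`, `:`, or `/`) generally need to be percent-encoded. The sketch below shows one way to build the `server` value with Python 3's standard library. The function name `build_server_uri` and the sample password are illustrative assumptions, not part of the integration.

```python
from urllib.parse import quote_plus


def build_server_uri(user, password, host="localhost", port=27017):
    """Build a mongodb:// URI with percent-encoded credentials.

    Hypothetical helper; the result is the kind of string that goes
    into the `server` field of tokumx.d/conf.yaml.
    """
    return "mongodb://{}:{}@{}:{}".format(
        quote_plus(user), quote_plus(password), host, port
    )


# Example with a password that contains reserved characters.
print(build_server_uri("datadog", "p@ss:word/1"))
# mongodb://datadog:p%40ss%3Aword%2F1@localhost:27017
```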
Restart the Agent to start sending TokuMX metrics to Datadog.
For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.
Parameter | Value |
---|---|
<INTEGRATION_NAME> | tokumx |
<INIT_CONFIG> | blank or {} |
<INSTANCE_CONFIG> | {"server": "mongodb://<USER>:<PASSWORD>@%%host%%:27017"} |
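As an illustration of how these three parameters fit together, the sketch below serializes them into Docker-style Autodiscovery labels. It assumes the `com.datadoghq.ad.*` label naming used for Docker containers. The function name `autodiscovery_labels` is hypothetical, and other platforms (for example Kubernetes annotations) use different keys, so treat the Autodiscovery Integration Templates as the authority.

```python
import json


def autodiscovery_labels(user, password, port=27017):
    """Return Docker labels for the tokumx check (illustrative only)."""
    instance = {
        # %%host%% is the template variable the Agent resolves at runtime.
        "server": "mongodb://{}:{}@%%host%%:{}".format(user, password, port)
    }
    return {
        "com.datadoghq.ad.check_names": json.dumps(["tokumx"]),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
    }


for key, value in autodiscovery_labels("<USER>", "<PASSWORD>").items():
    print("{}={}".format(key, value))
```

Each label value is a JSON list, so the list of check names, init configs, and instances stay aligned by position.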
Run the Agent’s status subcommand and look for tokumx under the Checks section.
tokumx.asserts.msgps (gauge) | The number of message assertions raised per second. Shown as assertion |
tokumx.asserts.regularps (gauge) | The number of regular assertions raised per second. Shown as assertion |
tokumx.asserts.rolloversps (gauge) | The number of times per second that the assertion counters roll over. The counters roll over to zero every 2^30 assertions. Shown as assertion |
tokumx.asserts.userps (gauge) | The number of user assertions raised per second. Shown as assertion |
tokumx.asserts.warningps (gauge) | The number of warnings raised per second. Shown as assertion |
tokumx.connections.available (gauge) | The number of unused available incoming connections the database can provide. Shown as connection |
tokumx.connections.current (gauge) | The number of connections to the database server from clients. Shown as connection |
tokumx.cursors.timedOut (gauge) | The total number of cursors that have timed out since the server process started. Shown as cursor |
tokumx.cursors.totalOpen (gauge) | The number of cursors that tokumx is maintaining for clients. Shown as cursor |
tokumx.ft.alerts.checkpointFailures (gauge) | The number of checkpoints that have failed for any reason. Shown as event |
tokumx.ft.alerts.locktreeRequestsPending (gauge) | The number of requests for Document-level Locks in the locktree that are waiting for other requests to release their locks. Shown as request |
tokumx.ft.alerts.longWaitEvents.cachePressure.countps (gauge) | Rate at which a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. Shown as event |
tokumx.ft.alerts.longWaitEvents.cachePressure.timeps (gauge) | Fraction of time (microseconds/second) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. Shown as fraction |
tokumx.ft.alerts.longWaitEvents.checkpointBegin.countps (gauge) | Rate at which the begin checkpoint phase of checkpoint has run (these should be fairly quick). Shown as event |
tokumx.ft.alerts.longWaitEvents.checkpointBegin.timeps (gauge) | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. Shown as fraction |
tokumx.ft.alerts.longWaitEvents.fsync.countps (gauge) | Rate at which fsync operations took more than 1 second. Shown as event |
tokumx.ft.alerts.longWaitEvents.fsync.timeps (gauge) | Fraction of time (microseconds/second) spent performing fsync operations that took longer than 1 second. Shown as fraction |
tokumx.ft.alerts.longWaitEvents.locktreeWait.countps (gauge) | Rate at which a thread had to wait more than 1 second to acquire a document-level lock in the locktree. Shown as event |
tokumx.ft.alerts.longWaitEvents.locktreeWait.timeps (gauge) | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock in the locktree. Shown as fraction |
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.countps (gauge) | Rate at which a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. Shown as event |
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.timeps (gauge) | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. Shown as fraction |
tokumx.ft.alerts.longWaitEvents.logBufferWaitps (gauge) | Rate at which a writing client had to wait more than 100ms for access to the log buffer. Shown as event |
tokumx.ft.cachetable.evictions.full.leaf.clean.bytesps (gauge) | Rate of full evictions of leaf nodes. Shown as byte |
tokumx.ft.cachetable.evictions.full.leaf.clean.countps (gauge) | Rate of full evictions of leaf nodes. Shown as event |
tokumx.ft.cachetable.evictions.full.leaf.dirty.bytesps (gauge) | Rate of full evictions of leaf nodes that need to be written back to disk. Shown as byte |
tokumx.ft.cachetable.evictions.full.leaf.dirty.countps (gauge) | Rate of full evictions of leaf nodes that need to be written back to disk. Shown as event |
tokumx.ft.cachetable.evictions.full.leaf.dirty.timeps (gauge) | Fraction of time (microseconds/second) spent performing full evictions of leaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. Shown as fraction |
tokumx.ft.cachetable.evictions.full.nonleaf.clean.bytesps (gauge) | Rate of full evictions of nonleaf nodes. Shown as byte |
tokumx.ft.cachetable.evictions.full.nonleaf.clean.countps (gauge) | Rate of full evictions of nonleaf nodes. Shown as event |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.bytesps (gauge) | Rate of full evictions of nonleaf nodes that need to be written back to disk. Shown as byte |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.countps (gauge) | Rate of full evictions of nonleaf nodes that need to be written back to disk. Shown as event |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.timeps (gauge) | Fraction of time (microseconds/second) spent performing full evictions of nonleaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. Shown as fraction |
tokumx.ft.cachetable.evictions.partial.leaf.clean.bytesps (gauge) | Rate of partial evictions of leaf nodes. Shown as byte |
tokumx.ft.cachetable.evictions.partial.leaf.clean.countps (gauge) | Rate of partial evictions of leaf nodes. Shown as event |
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.bytesps (gauge) | Rate of partial evictions of nonleaf nodes. Shown as byte |
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.countps (gauge) | Rate of partial evictions of nonleaf nodes. Shown as event |
tokumx.ft.cachetable.miss.countps (gauge) | Rate of internal cache misses. This metric is similar to MongoDB's btree misses and page faults. Shown as miss |
tokumx.ft.cachetable.miss.full.countps (gauge) | Rate of full internal cache misses. Shown as miss |
tokumx.ft.cachetable.miss.full.timeps (gauge) | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a full cache miss. Shown as fraction |
tokumx.ft.cachetable.miss.partial.countps (gauge) | Rate of partial internal cache misses. Shown as miss |
tokumx.ft.cachetable.miss.partial.timeps (gauge) | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a partial cache miss. Shown as fraction |
tokumx.ft.cachetable.miss.timeps (gauge) | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for cache misses. Shown as fraction |
tokumx.ft.cachetable.size.current (gauge) | Total amount of uncompressed data currently in the database's internal cache. Shown as byte |
tokumx.ft.cachetable.size.limit (gauge) | Total amount of uncompressed data that will fit in TokuMX's internal cache. Shown as byte |
tokumx.ft.cachetable.size.writing (gauge) | Total size of nodes that are currently queued up to be written to disk for eviction. Shown as byte |
tokumx.ft.checkpoint.begin.timeps (gauge) | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. Shown as fraction |
tokumx.ft.checkpoint.countps (gauge) | Rate at which checkpoints are completed. Shown as event |
tokumx.ft.checkpoint.lastComplete.time (gauge) | The time spent, in seconds, by the most recently completed checkpoint. Shown as second |
tokumx.ft.checkpoint.timeps (gauge) | Fraction of time (seconds/second) spent doing checkpoints. Shown as fraction |
tokumx.ft.checkpoint.write.leaf.bytes.compressedps (gauge) | The rate at which leaf nodes are written to disk during checkpoints, after compression. Shown as byte |
tokumx.ft.checkpoint.write.leaf.bytes.uncompressedps (gauge) | The rate at which leaf nodes are written to disk during checkpoints, before compression. Shown as byte |
tokumx.ft.checkpoint.write.leaf.countps (gauge) | The rate at which leaf nodes are written to disk during checkpoints. Shown as write |
tokumx.ft.checkpoint.write.leaf.timeps (gauge) | The fraction of time spent writing leaf nodes to disk during checkpoints. Shown as fraction |
tokumx.ft.checkpoint.write.nonleaf.bytes.compressedps (gauge) | The rate at which nonleaf nodes are written to disk during checkpoints, after compression. Shown as byte |
tokumx.ft.checkpoint.write.nonleaf.bytes.uncompressedps (gauge) | The rate at which nonleaf nodes are written to disk during checkpoints, before compression. Shown as byte |
tokumx.ft.checkpoint.write.nonleaf.countps (gauge) | The rate at which nonleaf nodes are written to disk during checkpoints. Shown as write |
tokumx.ft.checkpoint.write.nonleaf.timeps (gauge) | The fraction of time spent writing nonleaf nodes to disk during checkpoints. Shown as fraction |
tokumx.ft.compressionRatio.leaf (gauge) | The size ratio of leaf nodes before and after compression. Shown as fraction |
tokumx.ft.compressionRatio.nonleaf (gauge) | The size ratio of nonleaf nodes before and after compression. Shown as fraction |
tokumx.ft.compressionRatio.overall (gauge) | The size ratio of nodes before and after compression. Shown as fraction |
tokumx.ft.fsync.countps (gauge) | The rate at which the database flushed the operating system's file buffers to disk. Shown as operation |
tokumx.ft.fsync.timeps (gauge) | The fraction of time (microseconds/second) used to fsync to disk. Shown as fraction |
tokumx.ft.locktree.size.current (gauge) | Total memory the locktree is currently using. Shown as byte |
tokumx.ft.locktree.size.limit (gauge) | Maximum number of bytes that the locktree is allowed to use. Shown as byte |
tokumx.ft.log.bytesps (gauge) | The rate at which the logger writes to disk. Shown as byte |
tokumx.ft.log.countps (gauge) | The rate of individual log writes. Shown as write |
tokumx.ft.log.timeps (gauge) | The fraction of time spent performing log writes. Shown as fraction |
tokumx.ft.serializeTime.leaf.compressps (gauge) | Fraction of time spent compressing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty). Shown as fraction |
tokumx.ft.serializeTime.leaf.decompressps (gauge) | Fraction of time spent decompressing leaf nodes after reading them off disk. Shown as fraction |
tokumx.ft.serializeTime.leaf.deserializeps (gauge) | Fraction of time spent deserializing leaf nodes and their partitions after reading them off disk. Shown as fraction |
tokumx.ft.serializeTime.leaf.serializeps (gauge) | Fraction of time spent serializing leaf nodes and their partitions before writing them to disk (for checkpoint or when evicted while dirty). Shown as fraction |
tokumx.ft.serializeTime.nonleaf.compressps (gauge) | Fraction of time spent compressing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty). Shown as fraction |
tokumx.ft.serializeTime.nonleaf.decompressps (gauge) | Fraction of time spent decompressing nonleaf nodes after reading them off disk. Shown as fraction |
tokumx.ft.serializeTime.nonleaf.deserializeps (gauge) | Fraction of time spent deserializing nonleaf nodes and their partitions after reading them off disk. Shown as fraction |
tokumx.ft.serializeTime.nonleaf.serializeps (gauge) | Fraction of time spent serializing nonleaf nodes and their partitions before writing them to disk (for checkpoint or when evicted while dirty). Shown as fraction |
tokumx.mem.resident (gauge) | The amount of memory currently used by the database process. Shown as mebibyte |
tokumx.mem.virtual (gauge) | The amount of virtual memory used by the database process. Shown as mebibyte |
tokumx.metrics.document.deletedps (gauge) | The number of documents deleted per second. Shown as document |
tokumx.metrics.document.insertedps (gauge) | The number of documents inserted per second. Shown as document |
tokumx.metrics.document.returnedps (gauge) | The number of documents returned by queries per second. Shown as document |
tokumx.metrics.document.updatedps (gauge) | The number of documents updated per second. Shown as document |
tokumx.metrics.getLastError.wtime.numps (gauge) | The number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. Shown as operation |
tokumx.metrics.getLastError.wtime.totalMillisps (gauge) | The fraction of time (ms/s) spent performing getLastError operations with write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. Shown as fraction |
tokumx.metrics.getLastError.wtimeoutsps (gauge) | The number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError. Shown as event |
tokumx.metrics.operation.idhackps (gauge) | The rate of queries that contain the _id field. Shown as query |
tokumx.metrics.operation.scanAndOrderps (gauge) | The rate of queries that return sorted results but cannot use an index to perform the sort. Shown as query |
tokumx.metrics.queryExecutor.scannedps (gauge) | The rate of index items scanned during queries and query-plan evaluation. Shown as operation |
tokumx.metrics.repl.apply.batches.numps (gauge) | The number of batches applied across all databases per second. Shown as operation |
tokumx.metrics.repl.apply.batches.totalMillisps (gauge) | The fraction of time (ms/s) spent applying operations from the oplog. Shown as fraction |
tokumx.metrics.repl.apply.opsps (gauge) | The rate of oplog operations. Shown as operation |
tokumx.metrics.repl.buffer.count (gauge) | The number of operations in the oplog buffer. Shown as operation |
tokumx.metrics.repl.buffer.sizeBytes (gauge) | The current size of the contents of the oplog buffer. Shown as byte |
tokumx.metrics.repl.network.bytesps (gauge) | The rate at which data is read from the replication sync source. Shown as byte |
tokumx.metrics.repl.network.getmores.numps (gauge) | The rate of getmore operations. Shown as operation |
tokumx.metrics.repl.network.getmores.totalMillisps (gauge) | The fraction of time (ms/s) spent collecting data from getmore operations. Shown as fraction |
tokumx.metrics.repl.network.opsps (gauge) | The rate of operations read from the replication source. Shown as operation |
tokumx.metrics.repl.network.readersCreatedps (gauge) | The rate at which oplog query processes are created. Shown as process |
tokumx.metrics.repl.oplog.insert.numps (gauge) | The rate at which operations are inserted into the oplog. Shown as operation |
tokumx.metrics.repl.oplog.insert.totalMillisps (gauge) | The fraction of time (ms/s) spent inserting operations into the oplog. Shown as fraction |
tokumx.metrics.repl.oplog.insertBytesps (gauge) | The rate (in bytes) at which data is inserted into the oplog. Shown as byte |
tokumx.metrics.ttl.deletedDocumentsps (gauge) | The rate at which documents are deleted from collections with a ttl index. Shown as document |
tokumx.metrics.ttl.passesps (gauge) | The number of times per second the background process removes documents from collections with a ttl index. Shown as event |
tokumx.opcounters.commandps (gauge) | The total number of commands per second issued to the database. Shown as command |
tokumx.opcounters.deleteps (gauge) | The number of delete operations per second. Shown as operation |
tokumx.opcounters.getmoreps (gauge) | The number of getmore operations per second. Shown as operation |
tokumx.opcounters.insertps (gauge) | The number of insert operations per second. Shown as operation |
tokumx.opcounters.queryps (gauge) | The total number of queries per second. Shown as query |
tokumx.opcounters.updateps (gauge) | The number of update operations per second. Shown as operation |
tokumx.opcountersRepl.commandps (gauge) | The total number of replicated commands issued to the database per second. Shown as command |
tokumx.opcountersRepl.deleteps (gauge) | The number of replicated delete operations per second. Shown as operation |
tokumx.opcountersRepl.getmoreps (gauge) | The number of replicated getmore operations per second. Shown as operation |
tokumx.opcountersRepl.insertps (gauge) | The number of replicated insert operations per second. Shown as operation |
tokumx.opcountersRepl.queryps (gauge) | The total number of replicated queries per second. Shown as query |
tokumx.opcountersRepl.updateps (gauge) | The number of replicated update operations per second. Shown as operation |
tokumx.stats.coll.count (gauge) | The number of objects or documents in this collection. Shown as document |
tokumx.stats.coll.nindexes (gauge) | The number of indexes on this collection. Shown as index |
tokumx.stats.coll.nindexesbeingbuilt (gauge) | The number of indexes currently being built. Shown as index |
tokumx.stats.coll.size (gauge) | The total size in memory of all records in a collection. Does not include the record header, but does include the record's padding. Does not include the size of any indexes associated with the collection. Shown as byte |
tokumx.stats.coll.storageSize (gauge) | The total amount of storage allocated to this collection for document storage. Shown as byte |
tokumx.stats.coll.totalIndexSize (gauge) | The total size of all indexes on this collection. Shown as byte |
tokumx.stats.coll.totalIndexStorageSize (gauge) | The total size on disk of all indexes on this collection (after compression). Shown as byte |
tokumx.stats.dataSize (gauge) | The total size of the data held in this database including the padding factor. Shown as byte |
tokumx.stats.db.avgObjSize (gauge) | The average size of each document. Shown as byte |
tokumx.stats.db.collections (gauge) | The number of collections in the database. |
tokumx.stats.db.dataSize (gauge) | The total size of the data held in this database including the padding factor. Shown as byte |
tokumx.stats.db.indexSize (gauge) | The total size of all indexes created on this database. Shown as byte |
tokumx.stats.db.indexStorageSize (gauge) | The total size on disk of all indexes created on this database (after compression). Shown as byte |
tokumx.stats.db.indexes (gauge) | The total number of indexes across all collections in the database. Shown as index |
tokumx.stats.db.objects (gauge) | The number of documents in the database across all collections. Shown as document |
tokumx.stats.db.storageSize (gauge) | The total amount of space allocated to collections in this database for document storage. Shown as byte |
tokumx.stats.idx.avgObjSize (gauge) | The average size of each index entry. Shown as byte |
tokumx.stats.idx.count (gauge) | The number of documents in this index. Shown as index |
tokumx.stats.idx.deletes (gauge) | The number of delete operations performed on this index. Shown as operation |
tokumx.stats.idx.inserts (gauge) | The number of insert operations performed on this index. Shown as operation |
tokumx.stats.idx.nscanned (gauge) | The number of index entries scanned for queries using this index. Shown as index |
tokumx.stats.idx.nscannedObjects (gauge) | The number of collection objects examined after scanning an index entry for a query using this index. Shown as object |
tokumx.stats.idx.queries (gauge) | The number of query operations performed using this index. Shown as query |
tokumx.stats.idx.size (gauge) | The total size of this index. Shown as byte |
tokumx.stats.idx.storageSize (gauge) | The total size on disk of this index (after compression). Shown as byte |
tokumx.stats.indexSize (gauge) | The total size of all indexes created on this database. Shown as byte |
tokumx.stats.indexes (gauge) | The total number of indexes across all collections in the database. Shown as index |
tokumx.stats.objects (gauge) | The number of documents in the database across all collections. Shown as document |
tokumx.stats.storageSize (gauge) | The total amount of space allocated to collections in this database for document storage. Shown as byte |
tokumx.uptime (gauge) | The time that the tokumx process has been active. Shown as second |
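Several of the metrics above are reported as time-per-second values (microseconds/second or ms/s), and others come in current/limit pairs. The following minimal sketch shows how such values could be turned into plain ratios for dashboards or alerts. The helper names `busy_fraction` and `utilization` are hypothetical, and the sample numbers are invented for illustration.

```python
def busy_fraction(value, units_per_second=1_000_000):
    """Convert a time-per-second metric into a fraction of wall-clock time.

    Use units_per_second=1_000_000 for microseconds/second metrics and
    units_per_second=1_000 for ms/s metrics.
    """
    return value / float(units_per_second)


def utilization(current, limit):
    """Ratio of a *.size.current metric to its *.size.limit counterpart."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return current / float(limit)


# 250,000 microseconds of fsync time per second.
print(busy_fraction(250000))  # 0.25

# Cachetable holding 3 GiB out of a 4 GiB limit.
print(utilization(3 * 1024**3, 4 * 1024**3))  # 0.75
```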
Replication state changes:
This check emits an event each time a TokuMX node has a change in its replication state.
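The idea of emitting an event on each replication state change can be sketched as a comparison between the previously seen state and the current one. This is an illustration only, not the Agent's implementation. The state table is an assumed subset of MongoDB-style replica set member states, and the function names and event fields are hypothetical.

```python
# Assumed subset of MongoDB-style replica set member states.
# Verify the codes against your TokuMX version before relying on them.
REPLICA_STATES = {
    0: "STARTUP",
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    7: "ARBITER",
    8: "DOWN",
}


def state_name(code):
    """Map a numeric state code to a readable name."""
    return REPLICA_STATES.get(code, "UNKNOWN(%s)" % code)


def state_change_event(hostname, previous_state, current_state):
    """Return an event dict if the state changed, otherwise None.

    A previous_state of None means this is the first observation,
    so there is nothing to compare against yet.
    """
    if previous_state is None or previous_state == current_state:
        return None
    return {
        "title": "TokuMX replication state changed on %s" % hostname,
        "text": "%s -> %s" % (state_name(previous_state), state_name(current_state)),
    }


print(state_change_event("db-1", 2, 1))
```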
tokumx.can_connect
Returns CRITICAL if the Agent is unable to connect to the monitored TokuMX instance. Returns OK otherwise.
Statuses: ok, critical
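To reproduce a rough version of this check by hand, the sketch below maps TCP reachability to the two statuses. It is only an approximation: it tests whether the port accepts a connection, which is a weaker condition than a successful database connection, and the function name `can_connect` is hypothetical. The numeric values follow Datadog's service check convention (0 for OK, 2 for CRITICAL).

```python
import socket

# Datadog service check status codes.
OK = 0
CRITICAL = 2


def can_connect(host="localhost", port=27017, timeout=2.0):
    """Return OK if a TCP connection to host:port succeeds, else CRITICAL."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return CRITICAL


if __name__ == "__main__":
    status = can_connect()
    print("tokumx.can_connect: %s" % ("ok" if status == OK else "critical"))
```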
Need help? Contact Datadog support.