TokuMX

Supported OS: Linux, Mac OS, Windows

Integration version: 3.2.0

Overview

This check collects TokuMX metrics such as:

  • Operation counts
  • Replication lag
  • Cache table usage and size

Setup

Installation

The TokuMX check is included in the Datadog Agent package, so you don't need to install anything else on your server.

Configuration

Prepare TokuMX

  1. Install the Python module for MongoDB on your MongoDB server with the following command:

    sudo pip install --upgrade "pymongo<3.0"
    
  2. Verify that the module is installed with this command:

    python -c "import pymongo" 2>&1 | grep ImportError && \
    echo -e "\033[0;31mpymongo python module - Missing\033[0m" || \
    echo -e "\033[0;32mpymongo python module - OK\033[0m"
    
  3. Start the Mongo shell, then create a read-only user for the Datadog Agent in the admin database:

    // Authenticate as the admin user.
    use admin
    db.auth("admin", "<YOUR_TOKUMX_ADMIN_PASSWORD>")
    // Add a read-only user for the Datadog Agent (third argument: readOnly = true).
    db.addUser("datadog", "<UNIQUEPASSWORD>", true)
    
  4. Verify that the user was created by running the following command outside the Mongo shell:

    python -c 'from pymongo import Connection; print(Connection().admin.authenticate("datadog", "<UNIQUEPASSWORD>"))' | \
    grep True && \
    echo -e "\033[0;32mdatadog user - OK\033[0m" || \
    echo -e "\033[0;31mdatadog user - Missing\033[0m"
    

For more details on creating and managing users in MongoDB, see the MongoDB security documentation.
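The shell check in step 2 only confirms that pymongo can be imported. It does not confirm that the installed version satisfies the "pymongo<3.0" pin from step 1. The following is a minimal Python 3 sketch that checks both. It assumes pymongo exposes its version string as pymongo.version. The helper names pymongo_version_ok and check_pymongo are illustrative and are not part of the Agent.

```python
import importlib


def pymongo_version_ok(version):
    """Return True if a pymongo version string satisfies the "pymongo<3.0" pin."""
    major = int(version.split(".")[0])
    return major < 3


def check_pymongo():
    """Report whether pymongo is importable and below 3.0 (hypothetical helper)."""
    try:
        pymongo = importlib.import_module("pymongo")
    except ImportError:
        return "pymongo python module - Missing"
    if not pymongo_version_ok(pymongo.version):
        return "pymongo python module - Version %s is not < 3.0" % pymongo.version
    return "pymongo python module - OK"


if __name__ == "__main__":
    print(check_pymongo())
```

The pin matters for step 4, which uses pymongo's Connection class. That class was removed in pymongo 3.0.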

Host

To configure this check for an Agent running on a host:

  1. Edit the tokumx.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample tokumx.d/conf.yaml for all available configuration options.

    init_config:
    
    instances:
      - server: "mongodb://<USER>:<PASSWORD>@localhost:27017"
    
  2. Restart the Agent to start sending TokuMX metrics to Datadog.
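The server value in conf.yaml is a MongoDB connection string. Characters such as @, : or / in the password must therefore be percent-encoded, or the URI will not parse as intended. The following is a minimal Python 3 sketch that builds the instance configuration with the standard library, using the default port 27017 shown above. The names build_tokumx_uri and render_conf are illustrative.

```python
from urllib.parse import quote_plus


def build_tokumx_uri(user, password, host="localhost", port=27017):
    """Build a mongodb:// URI with percent-encoded credentials."""
    return "mongodb://%s:%s@%s:%d" % (quote_plus(user), quote_plus(password), host, port)


def render_conf(uri):
    """Render a minimal tokumx.d/conf.yaml body for a single instance."""
    return "init_config:\n\ninstances:\n  - server: \"%s\"\n" % uri


if __name__ == "__main__":
    # A password containing '@' and ':' is encoded as p%40ss%3Aword.
    print(render_conf(build_tokumx_uri("datadog", "p@ss:word")))
```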

Containerized

See the Autodiscovery Integration Templates documentation for guidance on applying the parameters below to a containerized environment.

Parameter            Value
<INTEGRATION_NAME>   tokumx
<INIT_CONFIG>        blank or {}
<INSTANCE_CONFIG>    {"server": "mongodb://<USER>:<PASSWORD>@%%host%%:27017"}
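On Kubernetes, these parameters are often supplied as Autodiscovery pod annotations whose values are JSON strings. The following minimal sketch generates them. It assumes the annotation key format ad.datadoghq.com/<CONTAINER_IDENTIFIER>.check_names, .init_configs and .instances. The container name tokumx in the example is hypothetical.

```python
import json


def tokumx_ad_annotations(container):
    """Return Autodiscovery annotations for a TokuMX container.

    %%host%% is kept literally: the Agent substitutes the container address.
    """
    prefix = "ad.datadoghq.com/%s." % container
    instance = {"server": "mongodb://<USER>:<PASSWORD>@%%host%%:27017"}
    return {
        prefix + "check_names": json.dumps(["tokumx"]),
        prefix + "init_configs": json.dumps([{}]),
        prefix + "instances": json.dumps([instance]),
    }


if __name__ == "__main__":
    # Print each annotation as it would appear under metadata.annotations.
    for key, value in tokumx_ad_annotations("tokumx").items():
        print("%s: '%s'" % (key, value))
```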

Validation

Run the Agent's status subcommand and look for tokumx under the Checks section.

Data Collected

Metrics

tokumx.asserts.msgps
(gauge)
The number of message assertions raised per second.
Shown as assertion
tokumx.asserts.regularps
(gauge)
The number of regular assertions raised per second.
Shown as assertion
tokumx.asserts.rolloversps
(gauge)
The number of times per second that the rollover counters roll over. The counters roll over to zero every 2^30 assertions.
Shown as assertion
tokumx.asserts.userps
(gauge)
The number of user assertions raised per second.
Shown as assertion
tokumx.asserts.warningps
(gauge)
The number of warnings raised per second.
Shown as assertion
tokumx.connections.available
(gauge)
The number of unused available incoming connections the database can provide.
Shown as connection
tokumx.connections.current
(gauge)
The number of connections to the database server from clients.
Shown as connection
tokumx.cursors.timedOut
(gauge)
The total number of cursors that have timed out since the server process started.
Shown as cursor
tokumx.cursors.totalOpen
(gauge)
The number of cursors that TokuMX is maintaining for clients.
Shown as cursor
tokumx.ft.alerts.checkpointFailures
(gauge)
The number of checkpoints that have failed for any reason.
Shown as event
tokumx.ft.alerts.locktreeRequestsPending
(gauge)
The number of requests for document-level locks in the locktree that are waiting for other requests to release their locks.
Shown as request
tokumx.ft.alerts.longWaitEvents.cachePressure.countps
(gauge)
Rate at which a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed.
Shown as event
tokumx.ft.alerts.longWaitEvents.cachePressure.timeps
(gauge)
Fraction of time (microseconds/second) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.checkpointBegin.countps
(gauge)
Rate at which the begin phase of a checkpoint has run (this phase should be fairly quick).
Shown as event
tokumx.ft.alerts.longWaitEvents.checkpointBegin.timeps
(gauge)
Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.fsync.countps
(gauge)
Rate at which fsync operations took more than 1 second.
Shown as event
tokumx.ft.alerts.longWaitEvents.fsync.timeps
(gauge)
Fraction of time (microseconds/second) spent performing fsync operations that took longer than 1 second.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.locktreeWait.countps
(gauge)
Rate at which a thread had to wait more than 1 second to acquire a document-level lock in the locktree.
Shown as event
tokumx.ft.alerts.longWaitEvents.locktreeWait.timeps
(gauge)
Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock in the locktree.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.countps
(gauge)
Rate at which a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation.
Shown as event
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.timeps
(gauge)
Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation.
Shown as fraction
tokumx.ft.alerts.longWaitEvents.logBufferWaitps
(gauge)
Rate at which a writing client had to wait more than 100ms for access to the log buffer.
Shown as event
tokumx.ft.cachetable.evictions.full.leaf.clean.bytesps
(gauge)
Rate of full evictions of clean leaf nodes (nodes that do not need to be written back to disk).
Shown as byte
tokumx.ft.cachetable.evictions.full.leaf.clean.countps
(gauge)
Rate of full evictions of clean leaf nodes (nodes that do not need to be written back to disk).
Shown as event
tokumx.ft.cachetable.evictions.full.leaf.dirty.bytesps
(gauge)
Rate of full evictions of leaf nodes that need to be written back to disk.
Shown as byte
tokumx.ft.cachetable.evictions.full.leaf.dirty.countps
(gauge)
Rate of full evictions of leaf nodes that need to be written back to disk.
Shown as event
tokumx.ft.cachetable.evictions.full.leaf.dirty.timeps
(gauge)
Fraction of time (microseconds/second) spent performing full evictions of dirty leaf nodes, including the time spent serializing, compressing, and writing those nodes to disk.
Shown as fraction
tokumx.ft.cachetable.evictions.full.nonleaf.clean.bytesps
(gauge)
Rate of full evictions of clean nonleaf nodes (nodes that do not need to be written back to disk).
Shown as byte
tokumx.ft.cachetable.evictions.full.nonleaf.clean.countps
(gauge)
Rate of full evictions of clean nonleaf nodes (nodes that do not need to be written back to disk).
Shown as event
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.bytesps
(gauge)
Rate of full evictions of nonleaf nodes that need to be written back to disk.
Shown as byte
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.countps
(gauge)
Rate of full evictions of nonleaf nodes that need to be written back to disk.
Shown as event
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.timeps
(gauge)
Fraction of time (microseconds/second) spent performing full evictions of dirty nonleaf nodes, including the time spent serializing, compressing, and writing those nodes to disk.
Shown as fraction
tokumx.ft.cachetable.evictions.partial.leaf.clean.bytesps
(gauge)
Rate of partial evictions of clean leaf nodes.
Shown as byte
tokumx.ft.cachetable.evictions.partial.leaf.clean.countps
(gauge)
Rate of partial evictions of clean leaf nodes.
Shown as event
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.bytesps
(gauge)
Rate of partial evictions of clean nonleaf nodes.
Shown as byte
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.countps
(gauge)
Rate of partial evictions of clean nonleaf nodes.
Shown as event
tokumx.ft.cachetable.miss.countps
(gauge)
Rate of internal cache misses. This metric is similar to MongoDB's btree misses and page faults.
Shown as miss
tokumx.ft.cachetable.miss.full.countps
(gauge)
Rate of full internal cache misses.
Shown as miss
tokumx.ft.cachetable.miss.full.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a full cache miss.
Shown as fraction
tokumx.ft.cachetable.miss.partial.countps
(gauge)
Rate of partial internal cache misses.
Shown as miss
tokumx.ft.cachetable.miss.partial.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a partial cache miss.
Shown as fraction
tokumx.ft.cachetable.miss.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for cache misses.
Shown as fraction
tokumx.ft.cachetable.size.current
(gauge)
Total amount of uncompressed data currently in the database's internal cache.
Shown as byte
tokumx.ft.cachetable.size.limit
(gauge)
Total amount of uncompressed data that will fit in TokuMX's internal cache.
Shown as byte
tokumx.ft.cachetable.size.writing
(gauge)
Total size of nodes that are currently queued up to be written to disk for eviction.
Shown as byte
tokumx.ft.checkpoint.begin.timeps
(gauge)
Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads.
Shown as fraction
tokumx.ft.checkpoint.countps
(gauge)
Rate at which checkpoints are completed.
Shown as event
tokumx.ft.checkpoint.lastComplete.time
(gauge)
The time spent, in seconds, by the most recently completed checkpoint.
Shown as second
tokumx.ft.checkpoint.timeps
(gauge)
Fraction of time (seconds/second) spent doing checkpoints.
Shown as fraction
tokumx.ft.checkpoint.write.leaf.bytes.compressedps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints, after compression.
Shown as byte
tokumx.ft.checkpoint.write.leaf.bytes.uncompressedps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints, before compression.
Shown as byte
tokumx.ft.checkpoint.write.leaf.countps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints.
Shown as write
tokumx.ft.checkpoint.write.leaf.timeps
(gauge)
The fraction of time spent writing leaf nodes to disk during checkpoints.
Shown as fraction
tokumx.ft.checkpoint.write.nonleaf.bytes.compressedps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints, after compression.
Shown as byte
tokumx.ft.checkpoint.write.nonleaf.bytes.uncompressedps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints, before compression.
Shown as byte
tokumx.ft.checkpoint.write.nonleaf.countps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints.
Shown as write
tokumx.ft.checkpoint.write.nonleaf.timeps
(gauge)
The fraction of time spent writing nonleaf nodes to disk during checkpoints.
Shown as fraction
tokumx.ft.compressionRatio.leaf
(gauge)
The size ratio of leaf nodes before and after compression.
Shown as fraction
tokumx.ft.compressionRatio.nonleaf
(gauge)
The size ratio of nonleaf nodes before and after compression.
Shown as fraction
tokumx.ft.compressionRatio.overall
(gauge)
The size ratio of nodes before and after compression.
Shown as fraction
tokumx.ft.fsync.countps
(gauge)
The rate at which the database flushed the operating system's file buffers to disk.
Shown as operation
tokumx.ft.fsync.timeps
(gauge)
The fraction of time (microseconds/second) used to fsync to disk.
Shown as fraction
tokumx.ft.locktree.size.current
(gauge)
Total memory the locktree is currently using.
Shown as byte
tokumx.ft.locktree.size.limit
(gauge)
Maximum number of bytes that the locktree is allowed to use.
Shown as byte
tokumx.ft.log.bytesps
(gauge)
The rate at which the logger writes to disk.
Shown as byte
tokumx.ft.log.countps
(gauge)
The rate of individual log writes.
Shown as write
tokumx.ft.log.timeps
(gauge)
The fraction of time spent performing log writes.
Shown as fraction
tokumx.ft.serializeTime.leaf.compressps
(gauge)
Fraction of time spent compressing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.ft.serializeTime.leaf.decompressps
(gauge)
Fraction of time spent decompressing leaf nodes and their partitions after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.leaf.deserializeps
(gauge)
Fraction of time spent deserializing leaf nodes and their partitions after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.leaf.serializeps
(gauge)
Fraction of time spent serializing leaf nodes before compressing and writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.ft.serializeTime.nonleaf.compressps
(gauge)
Fraction of time spent compressing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.ft.serializeTime.nonleaf.decompressps
(gauge)
Fraction of time spent decompressing nonleaf nodes and their partitions after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.nonleaf.deserializeps
(gauge)
Fraction of time spent deserializing nonleaf nodes and their partitions after reading them off disk.
Shown as fraction
tokumx.ft.serializeTime.nonleaf.serializeps
(gauge)
Fraction of time spent serializing nonleaf nodes before compressing and writing them to disk (for checkpoint or when evicted while dirty).
Shown as fraction
tokumx.mem.resident
(gauge)
The amount of memory currently used by the database process.
Shown as mebibyte
tokumx.mem.virtual
(gauge)
The amount of virtual memory used by the database process.
Shown as mebibyte
tokumx.metrics.document.deletedps
(gauge)
The number of documents deleted per second.
Shown as document
tokumx.metrics.document.insertedps
(gauge)
The number of documents inserted per second.
Shown as document
tokumx.metrics.document.returnedps
(gauge)
The number of documents returned by queries per second.
Shown as document
tokumx.metrics.document.updatedps
(gauge)
The number of documents updated per second.
Shown as document
tokumx.metrics.getLastError.wtime.numps
(gauge)
The number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.
Shown as operation
tokumx.metrics.getLastError.wtime.totalMillisps
(gauge)
The fraction of time (ms/s) spent performing getLastError operations with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.
Shown as fraction
tokumx.metrics.getLastError.wtimeoutsps
(gauge)
The number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError.
Shown as event
tokumx.metrics.operation.idhackps
(gauge)
The rate of queries that contain the _id field.
Shown as query
tokumx.metrics.operation.scanAndOrderps
(gauge)
The rate of queries that return sorted results but cannot perform the sort operation using an index.
Shown as query
tokumx.metrics.queryExecutor.scannedps
(gauge)
The rate of index items scanned during queries and query-plan evaluation.
Shown as operation
tokumx.metrics.repl.apply.batches.numps
(gauge)
The number of batches applied across all databases per second.
Shown as operation
tokumx.metrics.repl.apply.batches.totalMillisps
(gauge)
The fraction of time (ms/s) spent applying operations from the oplog.
Shown as fraction
tokumx.metrics.repl.apply.opsps
(gauge)
The rate of oplog operations.
Shown as operation
tokumx.metrics.repl.buffer.count
(gauge)
The number of operations in the oplog buffer.
Shown as operation
tokumx.metrics.repl.buffer.sizeBytes
(gauge)
The current size of the contents of the oplog buffer.
Shown as byte
tokumx.metrics.repl.network.bytesps
(gauge)
The rate at which data is read from the replication sync source.
Shown as byte
tokumx.metrics.repl.network.getmores.numps
(gauge)
The rate of getmore operations.
Shown as operation
tokumx.metrics.repl.network.getmores.totalMillisps
(gauge)
The fraction of time (ms/s) spent collecting data from getmore operations.
Shown as fraction
tokumx.metrics.repl.network.opsps
(gauge)
The rate of operations read from the replication source.
Shown as operation
tokumx.metrics.repl.network.readersCreatedps
(gauge)
The rate at which oplog query processes are created.
Shown as process
tokumx.metrics.repl.oplog.insert.numps
(gauge)
The rate at which operations are inserted into the oplog.
Shown as operation
tokumx.metrics.repl.oplog.insert.totalMillisps
(gauge)
The fraction of time (ms/s) spent inserting operations into the oplog.
Shown as fraction
tokumx.metrics.repl.oplog.insertBytesps
(gauge)
The rate (in bytes) at which data is inserted into the oplog.
Shown as byte
tokumx.metrics.ttl.deletedDocumentsps
(gauge)
The rate at which documents are deleted from collections with a TTL index.
Shown as document
tokumx.metrics.ttl.passesps
(gauge)
The number of times per second that the background process removes documents from collections with a TTL index.
Shown as event
tokumx.opcounters.commandps
(gauge)
The total number of commands per second issued to the database.
Shown as command
tokumx.opcounters.deleteps
(gauge)
The number of delete operations per second.
Shown as operation
tokumx.opcounters.getmoreps
(gauge)
The number of getmore operations per second.
Shown as operation
tokumx.opcounters.insertps
(gauge)
The number of insert operations per second.
Shown as operation
tokumx.opcounters.queryps
(gauge)
The total number of queries per second.
Shown as query
tokumx.opcounters.updateps
(gauge)
The number of update operations per second.
Shown as operation
tokumx.opcountersRepl.commandps
(gauge)
The total number of replicated commands issued to the database per second.
Shown as command
tokumx.opcountersRepl.deleteps
(gauge)
The number of replicated delete operations per second.
Shown as operation
tokumx.opcountersRepl.getmoreps
(gauge)
The number of replicated getmore operations per second.
Shown as operation
tokumx.opcountersRepl.insertps
(gauge)
The number of replicated insert operations per second.
Shown as operation
tokumx.opcountersRepl.queryps
(gauge)
The total number of replicated queries per second.
Shown as query
tokumx.opcountersRepl.updateps
(gauge)
The number of replicated update operations per second.
Shown as operation
tokumx.stats.coll.count
(gauge)
The number of objects or documents in this collection.
Shown as document
tokumx.stats.coll.nindexes
(gauge)
The number of indexes on this collection.
Shown as index
tokumx.stats.coll.nindexesbeingbuilt
(gauge)
The number of indexes currently being built.
Shown as index
tokumx.stats.coll.size
(gauge)
The total size in memory of all records in a collection. Does not include the record header, but does include the record's padding. Does not include the size of any indexes associated with the collection.
Shown as byte
tokumx.stats.coll.storageSize
(gauge)
The total amount of storage allocated to this collection for document storage.
Shown as byte
tokumx.stats.coll.totalIndexSize
(gauge)
The total size of all indexes on this collection.
Shown as byte
tokumx.stats.coll.totalIndexStorageSize
(gauge)
The total size on disk of all indexes on this collection (after compression).
Shown as byte
tokumx.stats.dataSize
(gauge)
The total size of the data held in this database including the padding factor.
Shown as byte
tokumx.stats.db.avgObjSize
(gauge)
The average size of each document.
Shown as byte
tokumx.stats.db.collections
(gauge)
The number of collections in the database.
tokumx.stats.db.dataSize
(gauge)
The total size of the data held in this database including the padding factor.
Shown as byte
tokumx.stats.db.indexSize
(gauge)
The total size of all indexes created on this database.
Shown as byte
tokumx.stats.db.indexStorageSize
(gauge)
The total size on disk of all indexes created on this database (after compression).
Shown as byte
tokumx.stats.db.indexes
(gauge)
The total number of indexes across all collections in the database.
Shown as index
tokumx.stats.db.objects
(gauge)
The number of documents in the database across all collections.
Shown as document
tokumx.stats.db.storageSize
(gauge)
The total amount of space allocated to collections in this database for document storage.
Shown as byte
tokumx.stats.idx.avgObjSize
(gauge)
The average size of each index entry.
Shown as byte
tokumx.stats.idx.count
(gauge)
The number of documents in this index.
Shown as index
tokumx.stats.idx.deletes
(gauge)
The number of delete operations performed on this index.
Shown as operation
tokumx.stats.idx.inserts
(gauge)
The number of insert operations performed on this index.
Shown as operation
tokumx.stats.idx.nscanned
(gauge)
The number of index entries scanned for queries using this index.
Shown as index
tokumx.stats.idx.nscannedObjects
(gauge)
The number of collection objects examined after scanning an index entry for a query using this index.
Shown as object
tokumx.stats.idx.queries
(gauge)
The number of query operations performed using this index.
Shown as query
tokumx.stats.idx.size
(gauge)
The total size of this index.
Shown as byte
tokumx.stats.idx.storageSize
(gauge)
The total size on disk of this index (after compression).
Shown as byte
tokumx.stats.indexSize
(gauge)
The total size of all indexes created on this database.
Shown as byte
tokumx.stats.indexes
(gauge)
The total number of indexes across all collections in the database.
Shown as index
tokumx.stats.objects
(gauge)
The number of documents in the database across all collections.
Shown as document
tokumx.stats.storageSize
(gauge)
The total amount of space allocated to collections in this database for document storage.
Shown as byte
tokumx.uptime
(gauge)
The time that the TokuMX process has been active.
Shown as second
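Most metrics above that end in ps are per-second values. The following minimal sketch shows how such a value can be derived from two samples of a cumulative counter, such as the opcounters section of serverStatus. It also shows how a value documented as microseconds/second maps to a share of wall-clock time, and how the cachetable size metrics combine into a utilization ratio. It illustrates the arithmetic under these assumptions and is not the Agent's own code. All function names are hypothetical.

```python
def per_second_rate(prev_value, curr_value, prev_ts, curr_ts):
    """Per-second rate between two samples of a cumulative counter.

    Returns None when the counter went backwards (for example after a
    server restart), since no meaningful rate can be computed then.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing timestamps")
    delta = curr_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed


def usec_per_sec_to_share(usec_per_sec):
    """Convert a microseconds/second value to a 0-1 share of wall-clock time."""
    return usec_per_sec / 1_000_000


def cache_utilization(size_current, size_limit):
    """tokumx.ft.cachetable.size.current divided by tokumx.ft.cachetable.size.limit."""
    return size_current / size_limit


if __name__ == "__main__":
    # A cumulative fsync time going from 1,000,000 to 1,500,000 microseconds
    # over 10 seconds is 50,000 microseconds/second, i.e. 5% of wall time.
    rate = per_second_rate(1_000_000, 1_500_000, 0.0, 10.0)
    print(rate, usec_per_sec_to_share(rate))
    print(cache_utilization(3 * 2**30, 4 * 2**30))
```

Not every time-based metric uses microseconds. For example, tokumx.ft.checkpoint.timeps is documented in seconds/second and the getLastError and repl timings in ms/s, so the divisor must match the unit in each description.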

Events

Replication state changes:

This check emits an event each time the replication state of a TokuMX node changes.
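One way to detect such changes is to remember the last replica set state seen for each node and report only when it differs. The state could come, for example, from the myState field returned by replSetGetStatus. The following minimal sketch works under that assumption and uses MongoDB's replica set member state codes. ReplStateTracker is a hypothetical name. The choice not to report the first observation of a node is also an assumption, not documented check behavior.

```python
# Replica set member state codes (subset); other codes are reported numerically.
REPLSET_STATES = {
    0: "STARTUP",
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    5: "STARTUP2",
    6: "UNKNOWN",
    7: "ARBITER",
    8: "DOWN",
    9: "ROLLBACK",
    10: "REMOVED",
}


def state_name(code):
    """Map a numeric state code to its name, falling back to STATE_<code>."""
    return REPLSET_STATES.get(code, "STATE_%d" % code)


class ReplStateTracker:
    """Remembers the last replication state per node and reports changes."""

    def __init__(self):
        self._last = {}

    def observe(self, node, state_code):
        """Return an event message if the node's state changed, else None.

        The first observation of a node only records a baseline.
        """
        previous = self._last.get(node)
        self._last[node] = state_code
        if previous is None or previous == state_code:
            return None
        return "TokuMX node %s changed replication state from %s to %s" % (
            node, state_name(previous), state_name(state_code))
```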

Service Checks

tokumx.can_connect
Returns CRITICAL if the Agent cannot connect to the TokuMX instance it is monitoring. Returns OK otherwise.
Statuses: ok, critical
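The ok/critical logic can be approximated with the following minimal sketch, which uses Python's socket module. It only tests TCP reachability of the host and port. The check itself relies on pymongo and the configured server URI, including credentials, so a TCP-level ok here does not guarantee that tokumx.can_connect reports OK. The function name can_connect is illustrative.

```python
import socket

OK = "ok"
CRITICAL = "critical"


def can_connect(host, port, timeout=5.0):
    """Return "ok" if a TCP connection to host:port succeeds, else "critical"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK
    except OSError:  # refused, unreachable, timed out, name resolution failure
        return CRITICAL


if __name__ == "__main__":
    print(can_connect("localhost", 27017))
```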

Troubleshooting

Need help? Contact Datadog support.

Further Reading
