Source:
HashiCorp Vault by HTTP
Overview
The template to monitor HashiCorp Vault by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Vault by HTTP collects metrics by HTTP agent from the /sys/metrics API endpoint.
See .
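To illustrate the bulk collection idea, the following Python sketch parses one Prometheus-style text payload once and then reads several values from the result, the way dependent items read from a single master item. The sample payload and the function name `parse_prometheus` are hypothetical. The template itself relies on Zabbix item preprocessing instead of code like this. The parser also assumes there are no trailing timestamps and ignores the escaping rules of the full exposition format.

```python
# Hypothetical excerpt of a Prometheus text payload (real output is much longer).
SAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
process_max_fds 1024
vault_barrier_get_count{cluster="vault-cluster-1"} 17
"""


def parse_prometheus(text: str) -> dict:
    """Map each sample's 'name{labels}' to its value, skipping comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes no trailing timestamp, so the value is the last field.
        name, value = line.rsplit(None, 1)
        samples[name] = float(value)
    return samples


metrics = parse_prometheus(SAMPLE)      # one "master" parse...
open_fds = metrics["process_open_fds"]  # ...read by several "dependent" values
max_fds = metrics["process_max_fds"]
print(open_fds, max_fds)  # 42.0 1024.0
```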
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Vault 1.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
See Zabbix template operation for basic instructions.
Configure Vault API. See .
Create a Vault service token and set it to the macro {$VAULT.TOKEN}.
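The following is a minimal sketch of how the macro values combine into a request against Vault's metrics endpoint. It uses only the Python standard library. The query parameter `format=prometheus`, the placeholder host `vault.example.com` and the placeholder token are assumptions for illustration. Check the HTTP agent items in the template for the exact URL they use. Vault reads the client token from the `X-Vault-Token` header.

```python
from urllib.request import Request


def build_metrics_request(scheme: str, host: str, port: int, token: str) -> Request:
    """Build, but do not send, a GET request for Vault's metrics endpoint."""
    # Values correspond to {$VAULT.API.SCHEME}, {$VAULT.HOST}, {$VAULT.API.PORT}
    # and {$VAULT.TOKEN}; format=prometheus requests Prometheus text output.
    url = f"{scheme}://{host}:{port}/v1/sys/metrics?format=prometheus"
    # Vault reads the client token from the X-Vault-Token header.
    return Request(url, headers={"X-Vault-Token": token}, method="GET")


req = build_metrics_request("http", "vault.example.com", 8200, "hvs.EXAMPLE")
print(req.full_url)  # http://vault.example.com:8200/v1/sys/metrics?format=prometheus
# urllib normalizes header names with str.capitalize(), hence "X-vault-token".
print(req.get_header("X-vault-token"))  # hvs.EXAMPLE
```

Passing `req` to `urllib.request.urlopen` would send it. If you try this by hand, Vault only serves the Prometheus format when `prometheus_retention_time` is set to a non-zero value in its `telemetry` configuration.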
Macros used
Name | Description | Default |
---|---|---|
{$VAULT.API.PORT} | Vault port. | 8200 |
{$VAULT.API.SCHEME} | Vault API scheme. | http |
{$VAULT.HOST} | Vault host name. | <PUT YOUR VAULT HOST> |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. | 90 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failures. | 5 |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. | 5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. | 5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. | .+ |
{$VAULT.TOKEN} | Vault auth token. | <PUT YOUR AUTH TOKEN> |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. | |
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. | 3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. | 7d |
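`{$VAULT.TOKEN.ACCESSORS}` holds a space-separated list. The Python sketch below splits it and builds one JSON body per accessor for Vault's `POST /v1/auth/token/lookup-accessor` endpoint. That endpoint returns token properties without exposing the token itself. It is an assumption that the Get tokens script uses this endpoint, and the accessor strings are made up.

```python
import json


def accessors_from_macro(value: str) -> list:
    """Split the {$VAULT.TOKEN.ACCESSORS} value on whitespace."""
    # str.split() without arguments also drops empty strings from extra spaces.
    return value.split()


def lookup_bodies(accessors: list) -> list:
    """One JSON body per accessor for POST /v1/auth/token/lookup-accessor."""
    return [json.dumps({"accessor": accessor}) for accessor in accessors]


accessors = accessors_from_macro("  accessorA   accessorB ")
print(accessors)                    # ['accessorA', 'accessorB']
print(lookup_bodies(accessors)[0])  # {"accessor": "accessorA"}
```

With the default empty macro value, the split returns an empty list, so no lookups are made.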
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Get health | | HTTP agent | vault.get_health |
Get leader | | HTTP agent | vault.get_leader |
Get metrics | | HTTP agent | vault.get_metrics |
Clear metrics | | Dependent item | vault.clear_metrics |
Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". | Script | vault.get_tokens |
Check WAL discovery | | Dependent item | vault.check_wal_discovery |
Check replication discovery | | Dependent item | vault.check_replication_discovery |
Check storage discovery | | Dependent item | vault.check_storage_discovery |
Check mountpoint discovery | | Dependent item | vault.check_mountpoint_discovery |
Initialized | Initialization status. | Dependent item | vault.health.initialized |
Sealed | Seal status. | Dependent item | vault.health.sealed |
Standby | Standby status. | Dependent item | vault.health.standby |
Performance standby | Performance standby status. | Dependent item | vault.health.performance_standby |
Performance replication | Performance replication mode https://www.vaultproject.io/docs/enterprise/replication | Dependent item | vault.health.replication_performance_mode |
Disaster Recovery replication | Disaster recovery replication mode https://www.vaultproject.io/docs/enterprise/replication | Dependent item | vault.health.replication_dr_mode |
Version | Server version. | Dependent item | vault.health.version |
Healthcheck | Vault healthcheck. | Dependent item | vault.health.check |
HA enabled | HA enabled status. | Dependent item | vault.leader.ha_enabled |
Is leader | Leader status. | Dependent item | vault.leader.is_self |
Get metrics error | Get metrics error. | Dependent item | vault.get_metrics.error |
Process CPU seconds, total | Total user and system CPU time spent in seconds. | Dependent item | vault.metrics.process.cpu.seconds.total |
Open file descriptors, max | Maximum number of open file descriptors. | Dependent item | vault.metrics.process.max.fds |
Open file descriptors, current | Number of open file descriptors. | Dependent item | vault.metrics.process.open.fds |
Process resident memory | Resident memory size in bytes. | Dependent item | vault.metrics.process.resident_memory.bytes |
Uptime | Server uptime. | Dependent item | vault.metrics.process.uptime |
Process virtual memory, current | Virtual memory size in bytes. | Dependent item | vault.metrics.process.virtual_memory.bytes |
Process virtual memory, max | Maximum amount of virtual memory available in bytes. | Dependent item | vault.metrics.process.virtual_memory.max.bytes |
Audit log requests, rate | Number of all audit log requests across all audit log devices. | Dependent item | vault.metrics.audit.log.request.rate |
Audit log request failures, rate | Number of audit log request failures. | Dependent item | vault.metrics.audit.log.request.failure.rate |
Audit log response, rate | Number of audit log responses across all audit log devices. | Dependent item | vault.metrics.audit.log.response.rate |
Audit log response failures, rate | Number of audit log response failures. | Dependent item | vault.metrics.audit.log.response.failure.rate |
Barrier DELETE ops, rate | Number of DELETE operations at the barrier. | Dependent item | vault.metrics.barrier.delete.rate |
Barrier GET ops, rate | Number of GET operations at the barrier. | Dependent item | vault.metrics.vault.barrier.get.rate |
Barrier LIST ops, rate | Number of LIST operations at the barrier. | Dependent item | vault.metrics.barrier.list.rate |
Barrier PUT ops, rate | Number of PUT operations at the barrier. | Dependent item | vault.metrics.barrier.put.rate |
Cache hit, rate | Number of times a value was retrieved from the LRU cache. | Dependent item | vault.metrics.cache.hit.rate |
Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. | Dependent item | vault.metrics.cache.miss.rate |
Cache write, rate | Number of times a value was written to the LRU cache. | Dependent item | vault.metrics.cache.write.rate |
Check token, rate | Number of token checks handled by Vault core. | Dependent item | vault.metrics.core.check.token.rate |
Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. | Dependent item | vault.metrics.core.fetch.acl_and_token |
Requests, rate | Number of requests handled by Vault core. | Dependent item | vault.metrics.core.handle.request |
Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. | Dependent item | vault.metrics.core.leadership.setup_failed |
Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. | Dependent item | vault.metrics.core.leadership_lost |
Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. | Dependent item | vault.metrics.core.post_unseal |
Pre-seal ops, counter | Duration of time taken by pre-seal operations. | Dependent item | vault.metrics.core.pre_seal |
Requested seal ops, counter | Duration of time taken by requested seal operations. | Dependent item | vault.metrics.core.seal_with_request |
Seal ops, counter | Duration of time taken by seal operations. | Dependent item | vault.metrics.core.seal |
Internal seal ops, counter | Duration of time taken by internal seal operations. | Dependent item | vault.metrics.core.seal_internal |
Leadership step downs, counter | Cluster leadership step-downs. | Dependent item | vault.metrics.core.step_down |
Unseal ops, counter | Duration of time taken by unseal operations. | Dependent item | vault.metrics.core.unseal |
Fetch lease times, counter | Time taken to fetch lease times. | Dependent item | vault.metrics.expire.fetch.lease.times |
Fetch lease times by token, counter | Time taken to fetch lease times by token. | Dependent item | vault.metrics.expire.fetch.lease.times.by_token |
Number of expiring leases | Number of all leases which are eligible for eventual expiry. | Dependent item | vault.metrics.expire.num_leases |
Expire revoke, count | Time taken to revoke a token. | Dependent item | vault.metrics.expire.revoke |
Expire revoke force, count | Time taken to forcibly revoke a token. | Dependent item | vault.metrics.expire.revoke.force |
Expire revoke prefix, count | Time taken to revoke tokens on a prefix. | Dependent item | vault.metrics.expire.revoke.prefix |
Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. | Dependent item | vault.metrics.expire.revoke.by_token |
Expire renew, count | Time taken to renew a lease. | Dependent item | vault.metrics.expire.renew |
Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. | Dependent item | vault.metrics.expire.renew_token |
Register ops, count | Time taken for register operations. | Dependent item | vault.metrics.expire.register |
Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. | Dependent item | vault.metrics.expire.register.auth |
Policy GET ops, rate | Number of operations to get a policy. | Dependent item | vault.metrics.policy.get_policy.rate |
Policy LIST ops, rate | Number of operations to list policies. | Dependent item | vault.metrics.policy.list_policies.rate |
Policy DELETE ops, rate | Number of operations to delete a policy. | Dependent item | vault.metrics.policy.delete_policy.rate |
Policy SET ops, rate | Number of operations to set a policy. | Dependent item | vault.metrics.policy.set_policy.rate |
Token create, count | The time taken to create a token. | Dependent item | vault.metrics.token.create |
Token createAccessor, count | The time taken to create a token accessor. | Dependent item | vault.metrics.token.createAccessor |
Token lookup, rate | Number of token lookups. | Dependent item | vault.metrics.token.lookup.rate |
Token revoke, count | The time taken to revoke a token. | Dependent item | vault.metrics.token.revoke |
Token revoke tree, count | Time taken to revoke a token tree. | Dependent item | vault.metrics.token.revoke.tree |
Token store, count | Time taken to store an updated token entry without writing to the secondary index. | Dependent item | vault.metrics.token.store |
Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. | Dependent item | vault.metrics.runtime.alloc.bytes |
Runtime freed objects | Number of freed objects. | Dependent item | vault.metrics.runtime.free.count |
Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. | Dependent item | vault.metrics.runtime.heap.objects |
Runtime malloc count | Cumulative count of allocated heap objects. | Dependent item | vault.metrics.runtime.malloc.count |
Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. | Dependent item | vault.metrics.runtime.num_goroutines |
Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. | Dependent item | vault.metrics.runtime.sys.bytes |
Runtime GC pause, total | The total garbage collector pause time since Vault was last started. | Dependent item | vault.metrics.total.gc.pause |
Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. | Dependent item | vault.metrics.runtime.total.gc.runs |
Token count, total | Total number of service tokens available for use; counts all unexpired and unrevoked tokens in Vault's token store. This measurement is performed every 10 minutes. | Dependent item | vault.metrics.token |
Token count by auth, total | Total number of service tokens that were created by an auth method. | Dependent item | vault.metrics.token.by_auth |
Token count by policy, total | Total number of service tokens that have a policy attached. | Dependent item | vault.metrics.token.by_policy |
Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. | Dependent item | vault.metrics.token.by_ttl |
Token creation, rate | Number of service or batch tokens created. | Dependent item | vault.metrics.token.creation.rate |
Secret kv entries | Number of entries in each key-value secret engine. | Dependent item | vault.metrics.secret.kv.count |
Token secret lease creation, rate | Number of leases created by secret engines. | Dependent item | vault.metrics.secret.lease.creation.rate |
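The health and leader items are dependent values taken from the JSON that Vault returns on `/v1/sys/health` and `/v1/sys/leader`. This Python sketch shows one plausible mapping onto the item keys listed above. The sample responses are hypothetical and trimmed, and the real template performs the extraction with item preprocessing. Booleans are converted to 0/1 so they fit comparisons such as `last(...)=1` in the triggers. The Healthcheck item is left out because the table does not say how it is computed.

```python
import json

# Hypothetical, trimmed responses from GET /v1/sys/health and GET /v1/sys/leader.
HEALTH = json.loads("""{
  "initialized": true, "sealed": false, "standby": false,
  "performance_standby": false,
  "replication_performance_mode": "disabled",
  "replication_dr_mode": "disabled",
  "version": "1.6.0"
}""")
LEADER = json.loads('{"ha_enabled": true, "is_self": true}')


def health_items(health: dict, leader: dict) -> dict:
    """Map health and leader JSON fields onto the item keys in the table."""
    items = {}
    # Boolean flags become 0/1 so triggers like last(...)=1 can compare them.
    for field in ("initialized", "sealed", "standby", "performance_standby"):
        items[f"vault.health.{field}"] = int(health[field])
    # Replication modes and the version string are passed through as text.
    items["vault.health.replication_performance_mode"] = health["replication_performance_mode"]
    items["vault.health.replication_dr_mode"] = health["replication_dr_mode"]
    items["vault.health.version"] = health["version"]
    items["vault.leader.ha_enabled"] = int(leader["ha_enabled"])
    items["vault.leader.is_self"] = int(leader["is_self"])
    return items


values = health_items(HEALTH, LEADER)
print(values["vault.health.sealed"], values["vault.health.version"])  # 0 1.6.0
```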
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Vault: Vault server is sealed | https://www.vaultproject.io/docs/concepts/seal | last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 | Average | |
HashiCorp Vault: Version has changed | Vault version has changed. Acknowledge to close the problem manually. | last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 | Info | Manual close: Yes |
HashiCorp Vault: Vault server is not responding | | last(/HashiCorp Vault by HTTP/vault.health.check)=0 | High | |
HashiCorp Vault: Failed to get metrics | | length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 | Warning | Depends on: |
HashiCorp Vault: Current number of open files is too high | | min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} | Warning | |
HashiCorp Vault: has been restarted | Uptime is less than 10 minutes. | last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m | Info | Manual close: Yes |
HashiCorp Vault: High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. | (max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Average | |
HashiCorp Vault: High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. | (max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Average | |
HashiCorp Vault: High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. | (max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Average | |
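Several of these trigger expressions reduce to simple arithmetic over recent item values. The Python functions below restate three of them so the thresholds are easier to reason about. The function names and sample histories are hypothetical, and the lists stand in for the values Zabbix would hold for the 5m and 1h windows. For the leadership triggers, max minus min over one hour equals the increase of the counter only while the counter grows monotonically. A reset of the counter inside the window would distort it.

```python
def open_fds_too_high(open_fds_5m: list, max_fds: float, warn_pct: float = 90) -> bool:
    """min(open.fds,5m) / last(max.fds) * 100 > {$VAULT.OPEN.FDS.MAX.WARN}"""
    # Using the minimum of the window means the trigger fires only if every
    # sampled value in the last 5 minutes was above the threshold.
    return min(open_fds_5m) / max_fds * 100 > warn_pct


def counter_grew_too_fast(counter_1h: list, max_warn: float = 5) -> bool:
    """(max(counter,1h) - min(counter,1h)) > threshold, as in the leadership triggers."""
    return max(counter_1h) - min(counter_1h) > max_warn


def version_changed(previous: str, latest: str) -> bool:
    """last(version,#1) <> last(version,#2) and length(last(version)) > 0"""
    return latest != previous and len(latest) > 0


print(open_fds_too_high([950, 930, 940], max_fds=1000))  # True (93% > 90%)
print(counter_grew_too_fast([3, 5, 8]))                   # False (an increase of 5 is not > 5)
print(version_changed("1.6.0", "1.6.1"))                  # True
```

The `length(...)>0` guard in the version trigger keeps an empty reading, for example after a failed health request, from being reported as a version change.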
LLD rule Storage metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Storage backend metrics discovery. | Dependent item | vault.storage.discovery |
Item prototypes for Storage metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. | Dependent item | vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}] |
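Each discovered row supplies `{#STORAGE}` and `{#OPERATION}`, which are substituted into the prototype name and key. Below is a Python sketch of that expansion, together with filtering rows by `{$VAULT.LLD.FILTER.STORAGE.MATCHES}`. It assumes the filter is a regular expression searched for anywhere in `{#STORAGE}`, as a Zabbix "matches" filter does. The backend names in the sample rows are made up.

```python
import re

# Hypothetical discovery rows; real rows come from the Storage metrics discovery rule.
ROWS = [
    {"{#STORAGE}": "consul", "{#OPERATION}": "get"},
    {"{#STORAGE}": "consul", "{#OPERATION}": "put"},
    {"{#STORAGE}": "raft", "{#OPERATION}": "list"},
]


def filter_rows(rows: list, pattern: str = ".+") -> list:
    """Keep rows whose {#STORAGE} value matches the filter regex anywhere."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row["{#STORAGE}"])]


def expand(template: str, row: dict) -> str:
    """Replace every LLD macro in a prototype name or key with its row value."""
    for macro, value in row.items():
        template = template.replace(macro, value)
    return template


KEY = "vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}]"
for row in filter_rows(ROWS, "^consul$"):
    print(expand(KEY, row))
# vault.metrics.storage.rate[consul, get]
# vault.metrics.storage.rate[consul, put]
```

With the default filter `.+`, every row with a non-empty backend name is kept.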
LLD rule Mountpoint metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Mountpoint metrics discovery | Mountpoint metrics discovery. | Dependent item | vault.mountpoint.discovery |
Item prototypes for Mountpoint metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of operations to perform a rollback operation on the given mount point. | Dependent item | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] |
Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. | Dependent item | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] |
LLD rule WAL metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
WAL metrics discovery | Discovery for WAL metrics. | Dependent item | vault.wal.discovery |
Item prototypes for WAL metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). | Dependent item | vault.metrics.wal.deletewals[{#SINGLETON}] |
GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. | Dependent item | vault.metrics.wal.gc.deleted[{#SINGLETON}] |
WALs on disk, total{#SINGLETON} | Total number of Write Ahead Logs (WAL) on disk. | Dependent item | vault.metrics.wal.gc.total[{#SINGLETON}] |
Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). | Dependent item | vault.metrics.wal.loadWAL[{#SINGLETON}] |
Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). | Dependent item | vault.metrics.wal.persistwals[{#SINGLETON}] |
Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. | Dependent item | vault.metrics.wal.flushready[{#SINGLETON}] |
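The `{#SINGLETON}` macro in these prototype names and keys suggests a common Zabbix pattern. In that pattern, a discovery returns either one row with an empty macro or no rows at all, so the items exist only when the corresponding metrics do. It is an assumption that the Check WAL discovery item works this way, and the `vault_wal_` metric prefix is also an assumption. The Python sketch below shows only the pattern.

```python
import json


def singleton_discovery(metric_names: list, prefix: str) -> str:
    """Return LLD JSON with one empty {#SINGLETON} row if any metric has the prefix."""
    found = any(name.startswith(prefix) for name in metric_names)
    return json.dumps([{"{#SINGLETON}": ""}] if found else [])


def expand(template: str, row: dict) -> str:
    """Substitute LLD macros; an empty {#SINGLETON} simply disappears."""
    for macro, value in row.items():
        template = template.replace(macro, value)
    return template


rows = json.loads(singleton_discovery(["vault_wal_persistwals_count"], "vault_wal_"))
for row in rows:
    print(expand("Delete WALs, count{#SINGLETON}", row))  # Delete WALs, count
    print(expand("vault.metrics.wal.deletewals[{#SINGLETON}]", row))  # vault.metrics.wal.deletewals[]
print(singleton_discovery(["process_open_fds"], "vault_wal_"))  # []
```

The same pattern would explain the `{#SINGLETON}` suffix on the replication prototypes.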
LLD rule Replication metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication metrics discovery | Discovery for replication metrics. | Dependent item | vault.replication.discovery |
Item prototypes for Replication metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream WAL missing guard, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. | Dependent item | vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}] |
Stream WAL guard found, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. | Dependent item | vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}] |
Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. | Dependent item | vault.metrics.replication.merkle.commit_index[{#SINGLETON}] |
Last WAL{#SINGLETON} | The index of the last WAL. | Dependent item | vault.metrics.replication.wal.last_wal[{#SINGLETON}] |
Last DR WAL{#SINGLETON} | The index of the last DR WAL. | Dependent item | vault.metrics.replication.wal.last_dr_wal[{#SINGLETON}] |
Last performance WAL{#SINGLETON} | The index of the last Performance WAL. | Dependent item | vault.metrics.replication.wal.last_performance_wal[{#SINGLETON}] |
Last remote WAL{#SINGLETON} | The index of the last remote WAL. | Dependent item | vault.metrics.replication.fsm.last_remote_wal[{#SINGLETON}] |
LLD rule Token metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Token metrics discovery | Tokens metrics discovery. | Dependent item | vault.tokens.discovery |
Item prototypes for Token metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Token [{#TOKEN_NAME}] error | Token lookup error text. | Dependent item | vault.token_via_accessor.error["{#ACCESSOR}"] |
Token [{#TOKEN_NAME}] has TTL | Whether the token has a TTL. | Dependent item | vault.token_via_accessor.has_ttl["{#ACCESSOR}"] |
Token [{#TOKEN_NAME}] TTL | The TTL period of the token. | Dependent item | vault.token_via_accessor.ttl["{#ACCESSOR}"] |
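The three prototypes can be derived from one lookup result per accessor. This Python sketch assumes the result has the shape of a Vault `lookup-accessor` response: a `data` object with a `ttl` in seconds, or an `errors` list on failure. It also treats a TTL of 0 as "no TTL". Both the mapping and the sample data are assumptions, not the template's own script.

```python
def token_items(accessor: str, response: dict) -> dict:
    """Map one lookup result onto the error, has_ttl and ttl item keys."""
    errors = response.get("errors") or []
    data = response.get("data") or {}
    ttl = int(data.get("ttl", 0))  # seconds remaining; assumed 0 when there is no TTL
    return {
        f'vault.token_via_accessor.error["{accessor}"]': "; ".join(errors),
        f'vault.token_via_accessor.has_ttl["{accessor}"]': int(ttl > 0),
        f'vault.token_via_accessor.ttl["{accessor}"]': ttl,
    }


ok = token_items("accessorA", {"data": {"display_name": "token-zabbix", "ttl": 86400}})
bad = token_items("accessorB", {"errors": ["invalid accessor"]})
print(ok['vault.token_via_accessor.ttl["accessorA"]'])     # 86400
print(bad['vault.token_via_accessor.error["accessorB"]'])  # invalid accessor
```

Keying the items by accessor rather than by token name keeps the keys stable even if two tokens share a display name.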
Trigger prototypes for Token metrics discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Vault: Token [{#TOKEN_NAME}] lookup error occurred | | length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 | Warning | Depends on: |
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} | Average | |
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} | Warning | Depends on: |
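The expiry triggers compare the TTL in seconds against macros written with Zabbix time suffixes (3d, 7d). Below is a Python sketch of both checks, assuming the suffixes s, m, h, d and w. The function names are hypothetical. Reporting only the more severe problem when both thresholds are crossed is a choice made for this sketch.

```python
from typing import Optional

# Zabbix time suffixes and their length in seconds.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def zabbix_seconds(value: str) -> int:
    """Convert a value such as '3d' or '10m' to seconds; bare numbers are seconds."""
    value = value.strip()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def expiry_severity(has_ttl: int, ttl: int, crit: str = "3d", warn: str = "7d") -> Optional[str]:
    """Mirror the two 'will expire soon' prototypes, returning the more severe match."""
    if has_ttl != 1:
        return None  # without a TTL, neither trigger expression can be true
    if ttl < zabbix_seconds(crit):
        return "Average"
    if ttl < zabbix_seconds(warn):
        return "Warning"
    return None


print(zabbix_seconds("3d"))           # 259200
print(expiry_severity(1, 5 * 86400))  # Warning
print(expiry_severity(1, 2 * 86400))  # Average
```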
Feedback
Please report any issues with the template at
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums