Commit 4ad1a2ec authored by JooHan Hong

prometheus, alert add

parent 938f1ee4
Pipeline #5196 passed with stages
in 47 seconds
......@@ -96,6 +96,90 @@ placement:
constraints: [node.hostname == TB2-DOCKER]
```
- **prometheus** configuration
```bash
# cat prometheus.yml
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'hongsnet'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['172.24.0.245:9100','172.24.0.151:9100','172.16.0.158:9100','172.16.0.251:9100']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['172.24.0.245:8080','172.24.0.151:8080','172.16.0.158:8080','172.16.0.251:8080']
rule_files:
  - 'alert.rules'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
```
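The `node-exporter` and `cadvisor` jobs above list the same four hosts and differ only in the port. As a rough sketch of that relationship, the target lists could be generated from a single host list. The helper names `build_targets` and `build_scrape_jobs` are hypothetical and are not part of this repository.

```python
# Hosts and ports copied from the static_configs in prometheus.yml.
HOSTS = ["172.24.0.245", "172.24.0.151", "172.16.0.158", "172.16.0.251"]
EXPORTER_PORTS = {"node-exporter": 9100, "cadvisor": 8080}


def build_targets(hosts, port):
    """Return 'host:port' strings, one per host, in the given order."""
    return [f"{host}:{port}" for host in hosts]


def build_scrape_jobs(hosts, ports):
    """Hypothetical helper: one dict per exporter, shaped like a scrape_configs entry."""
    return [
        {"job_name": job, "static_configs": [{"targets": build_targets(hosts, port)}]}
        for job, port in ports.items()
    ]


if __name__ == "__main__":
    for job in build_scrape_jobs(HOSTS, EXPORTER_PORTS):
        print(job["job_name"], job["static_configs"][0]["targets"])
```

Keeping the host list in one place makes it harder for the two jobs to drift apart when a node is added or removed.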
The **alert.rules** file is as follows.
```bash
# cat alert.rules
groups:
  - name: host
    rules:
      - alert: high_cpu_load
        expr: node_load1 > 1.5
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Server under high load"
          description: "Docker host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
      - alert: high_memory_load
        expr: (sum(node_memory_MemTotal_bytes) - sum(node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) ) / sum(node_memory_MemTotal_bytes) * 100 > 85
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Server memory is almost full"
          description: "Docker host memory usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
      - alert: high_storage_load
        expr: (node_filesystem_size_bytes{fstype="aufs"} - node_filesystem_free_bytes{fstype="aufs"}) / node_filesystem_size_bytes{fstype="aufs"} * 100 > 85
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Server storage is almost full"
          description: "Docker host storage usage is {{ humanize $value}}%. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
  - name: containers
    rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "memory usage test"
          description: "test container is down for more than 30 seconds."
      - alert: ContainerKilled
        expr: time() - container_last_seen > 60
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Container killed (instance {{ $labels.instance }})"
          description: "A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
```
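To make the arithmetic behind two of these rules concrete, the following is a minimal sketch in plain Python. It only mirrors the `high_memory_load` and `ContainerKilled` expressions. The function names and the sample byte counts are made up for illustration and do not come from the real hosts.

```python
def memory_usage_percent(hosts):
    """Mirror the high_memory_load expression.

    `hosts` is a list of dicts with total/free/buffers/cached byte counts.
    As with sum() in the rule, values are added across all hosts first, so
    the result is one fleet-wide percentage, not one value per host.
    """
    total = sum(h["total"] for h in hosts)
    reclaimable = sum(h["free"] + h["buffers"] + h["cached"] for h in hosts)
    return (total - reclaimable) / total * 100


def container_is_stale(now, last_seen, threshold=60):
    """Mirror `time() - container_last_seen > 60` from ContainerKilled."""
    return now - last_seen > threshold


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical sample values, not measurements.
    sample = [
        {"total": 8 * GIB, "free": 1 * GIB, "buffers": 0, "cached": 1 * GIB},
        {"total": 8 * GIB, "free": 0, "buffers": 0, "cached": 0},
    ]
    usage = memory_usage_percent(sample)
    print(f"usage={usage:.1f}% firing={usage > 85}")
    print(container_is_stale(now=1000, last_seen=930))
```

One consequence worth checking against a live Prometheus: `sum()` without a `by` clause drops the `instance` and `job` labels, so the `{{ $labels.instance }}` and `{{ $labels.job }}` placeholders in the `high_memory_load` description would likely render empty. A per-host variant would omit the `sum()` calls.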
Now start the **monitor** service as follows.
```bash
......