晨曦's Blog

This is a window to the soul

Check

cd /var/lib/kubelet/pki/ && openssl x509 -in kubelet.crt -text -noout  |grep After

openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates
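
The `-dates` output above can be turned into a days-remaining figure, which is easier to alert on. This is a minimal sketch, assuming the `notAfter=Mon DD HH:MM:SS YYYY GMT` line format that `openssl x509 -noout -dates` prints; `parse_not_after` and `days_until_expiry` are hypothetical helper names, not part of any tool used here.

```python
from datetime import datetime, timezone


def parse_not_after(dates_output: str) -> datetime:
    """Extract the notAfter timestamp from `openssl x509 -noout -dates` output."""
    for line in dates_output.splitlines():
        if line.startswith("notAfter="):
            # Example value: "Jun  5 12:00:00 2025 GMT" (day may be space-padded).
            parts = line.split("=", 1)[1].split()
            if parts and parts[-1] == "GMT":
                parts = parts[:-1]  # drop the zone suffix, UTC is attached below
            parsed = datetime.strptime(" ".join(parts), "%b %d %H:%M:%S %Y")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("no notAfter line found")


def days_until_expiry(dates_output: str, now: datetime) -> int:
    """Whole days between `now` and the certificate's notAfter time."""
    return (parse_not_after(dates_output) - now).days
```

Feed it the text produced by the second command above, together with `datetime.now(timezone.utc)`, to get the remaining validity in days.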

Fix

# Ansible playbook
- hosts: "{{ host }}"
  remote_user: root
  gather_facts: no
  tasks:
    - name: Remove /var/lib/kubelet/pki/ and every file in it
      file:
        path: /var/lib/kubelet/pki/
        state: absent
    - name: Restart the kubelet container
      shell: docker restart kubelet

# Run the playbook
ansible-playbook -i hosts -e "host=region" upgrade_certificate.yaml

# Verify with Ansible
ansible -i ./hosts all -m shell -a "openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates"

Follow-up

kubectl rollout restart ds calico-node -n kube-system
# or
kubectl delete po -l k8s-app=calico-node -n kube-system

Consul installation

# docker-compose.yaml
version: '3.6'
services:
  consul:
    image: swr.cn-south-1.myhuaweicloud.com/starsl.cn/consul:latest
    hostname: consul
    container_name: consul
    restart: always
    ports:
      - "8500:8500"
    volumes:
      - /opt/consul/data:/consul/data
      - /opt/consul/config:/consul/config
      - /usr/share/zoneinfo/PRC:/etc/localtime
    command: "agent"
    networks:
      - TenSunS
networks:
  TenSunS:
    name: TenSunS
    driver: bridge
    ipam:
      driver: default
# TenSunS is a management UI for Consul

Web UI: http://10.168.140.45:8500/ui/dc1/services (port 8500 can be accessed directly)

Configuration

log_level = "error"
data_dir = "/consul/data"
client_addr = "0.0.0.0"
ui_config {
  enabled = true
}
ports = {
  grpc = -1
  https = -1
  dns = -1
  grpc_tls = -1
  serf_wan = -1
}
peering {
  enabled = false
}
connect {
  enabled = false
}
server = true
bootstrap_expect = 1
acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens {
    initial_management = "9602e8a5-c754-43f0-b0ce-861b0df1b5df"
    agent = "9602e8a5-c754-43f0-b0ce-861b0df1b5df"
  }
}

Default configuration:

https://developer.hashicorp.com/consul/docs/agent/config/config-files

Registering data

// The registration payload is structured roughly like this
ConsulData{
    ID:      sid,
    Name:    "node_exporter",
    Address: ip,
    Port:    9100,
    Tags:    []string{vendor},
    Meta: map[string]string{
        "vendor":             vendor,
        "region":             region,
        "name":               name,
        "projectChineseName": projectFullName,
        "projectShortName":   projectShortName,
        "instance":           exportAddress,
        "env":                env,
    },
    Check: map[string]string{
        "tcp":      exportAddress,
        "interval": "60s",
    },
}

Service registration documentation:

https://developer.hashicorp.com/consul/api-docs/agent/service#register-service
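
The structure above maps onto the JSON body of Consul's `PUT /v1/agent/service/register` endpoint. The following is a minimal sketch rather than the original registration code: `build_registration` and `register` are hypothetical helpers, the export address is assumed to be `ip:port`, and the two project-name metadata fields are left out for brevity.

```python
import json
import urllib.request


def build_registration(sid, ip, vendor, region, name, env, port=9100):
    """Build the JSON body for PUT /v1/agent/service/register."""
    export_address = f"{ip}:{port}"  # assumption: exporter listens on ip:port
    return {
        "ID": sid,
        "Name": "node_exporter",
        "Address": ip,
        "Port": port,
        "Tags": [vendor],
        "Meta": {
            "vendor": vendor,
            "region": region,
            "name": name,
            "instance": export_address,
            "env": env,
        },
        # TCP health check against the exporter every 60 seconds.
        "Check": {"TCP": export_address, "Interval": "60s"},
    }


def register(consul_url, token, payload):
    """Send the registration to the Consul agent; returns the HTTP status."""
    request = urllib.request.Request(
        f"{consul_url.rstrip('/')}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"X-Consul-Token": token, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Because the ACL default policy is `deny`, the token passed in `X-Consul-Token` needs write permission on the service being registered.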

VictoriaMetrics (VM) configuration

scrape_configs:
  # Scrape OpenStack instances
  - job_name: "Openstack"
    consul_sd_configs:
      - server: "10.168.140.45:8500"
        datacenter: 'dc1'
        token: '9602e8a5-c754-43f0-b0ce-861b0df1b5df'
    relabel_configs:
      - source_labels: [__meta_consul_service]
        regex: "consul"
        action: drop
      - regex: __meta_consul_service_metadata_(.+)
        replacement: ${1}
        action: labelmap

Reference:

https://cloud.tencent.com/developer/article/1611091
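
The `labelmap` rule in the scrape config is what turns the Consul `Meta` entries into ordinary metric labels. As an illustration only (this is a model of the rule, not the scraper's own code, and `apply_labelmap` is a hypothetical name), the effect can be sketched as:

```python
import re

# Same pattern as the relabel rule; relabel regexes are fully anchored.
LABELMAP_REGEX = re.compile(r"__meta_consul_service_metadata_(.+)")


def apply_labelmap(labels: dict) -> dict:
    """Copy every label whose name matches the regex to the captured name."""
    result = dict(labels)
    for name, value in labels.items():
        match = LABELMAP_REGEX.fullmatch(name)
        if match:
            result[match.group(1)] = value  # replacement: ${1}
    return result
```

A target discovered with `__meta_consul_service_metadata_region="r1"` therefore ends up with `region="r1"`. Labels that start with `__` are dropped once relabeling finishes, so only the copied labels remain on the stored series.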

Installation

wget -c https://download.flashcat.cloud/categraf-v0.3.77-linux-amd64.tar.gz
tar -zxvf categraf-v0.3.77-linux-amd64.tar.gz
mkdir -pv /opt/categraf
cp -r ./categraf-v0.3.77-linux-amd64/* /opt/categraf/

Configuration

N9E: this sets which VM instance the data is pushed to (reference)

[[Pushgw.Writers]]
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
Url = "http://10.168.140.45:8428/api/v1/write"

categraf:

[[writers]]
url = "http://10.168.137.144:17000/prometheus/v1/write"

[heartbeat]
enable = true
url = "http://10.168.137.144:17000/v1/n9e/heartbeat"

inputs:

[[instances]]
targets = [
    "http://10.113.75.134:5000/v1/datasets/8a9302de-ded1-493c-8a12-e8acf3d80772/files?apikey=ka-admin123"
]

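Startup

Running categraf under systemd on Linux makes it awkward to control collection for a single input module, so it is started by hand here.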

nohup ./categraf --inputs http_response &
pkill categraf

Common commands

./categraf --test --inputs http_response
./categraf --inputs http_response

pkill n9e
nohup ./n9e &> n9e.log &

Reference:

https://flashcat.cloud/docs/content/flashcat-monitor/categraf/2-installation/

Installing the single-node version

# Download VictoriaMetrics
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.63.0/victoria-metrics-amd64-v1.63.0.tar.gz
tar -zxv -f victoria-metrics-amd64-v1.63.0.tar.gz
mkdir -pv /usr/local/victoriametrics/{bin,conf,data}
mv victoria-metrics-prod /usr/local/victoriametrics/bin/
mkdir -pv /run/victoriametrics

Configuring startup parameters

# vim /usr/local/victoriametrics/conf/victoriametrics

VICTORIAMETRICS_OPT=-http.connTimeout=5m \
-maxConcurrentInserts=20000 \
-maxInsertRequestSize=100MB \
-maxLabelsPerTimeseries=20000 \
-insert.maxQueueDuration=5m \
-dedup.minScrapeInterval=60s \
-retentionPeriod=180d \
-search.maxQueryDuration=10m \
-search.maxQueryLen=30MB \
-search.maxQueueDuration=60s \
-search.maxConcurrentRequests=32 \
-storageDataPath=/usr/local/victoriametrics/data \
-promscrape.config=/usr/local/victoriametrics/conf/prometheus.yml \
-vmui.defaultTimezone="Asia/Shanghai"

Start at boot

# vim /usr/lib/systemd/system/victoriametrics.service

[Unit]
Description=victoriametrics
After=network.target

[Service]
Type=simple
LimitNOFILE=1024000
LimitNPROC=1024000
LimitCORE=infinity
LimitMEMLOCK=infinity
EnvironmentFile=-/usr/local/victoriametrics/conf/victoriametrics
PIDFile=/run/victoriametrics/victoriametrics.pid
ExecStart=/usr/local/victoriametrics/bin/victoria-metrics-prod $VICTORIAMETRICS_OPT
ExecStop=/bin/kill -s SIGTERM $MAINPID
Restart=on-failure
RestartSec=1
KillMode=process

[Install]
WantedBy=multi-user.target

# systemctl daemon-reload
# systemctl enable victoriametrics
# systemctl start victoriametrics
# systemctl status victoriametrics
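
Once the service is running, one way to confirm that scraping works is to run an instant query for `up` through the Prometheus-compatible query API. A minimal sketch, assuming VictoriaMetrics listens on its default port 8428 (the port used in the N9E writer URL earlier); `build_query_url`, `instant_values` and `query_up` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, expr: str) -> str:
    """URL for an instant query against /api/v1/query."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def instant_values(response_body: str) -> dict:
    """Map each series' `instance` label to its value from a vector result."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise RuntimeError(f"query failed: {doc}")
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in doc["data"]["result"]
    }


def query_up(base_url: str) -> dict:
    """Fetch `up` for all scraped targets, e.g. query_up("http://127.0.0.1:8428")."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=10) as resp:
        return instant_values(resp.read().decode("utf-8"))
```

A value of `1.0` means the last scrape of that target succeeded and `0.0` means it failed.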

vmalert

Alerting can simply be left to Nightingale (N9E).

Installation

wget -c https://ghproxy.com/https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
tar -zxvf blackbox_exporter-0.25.0.linux-amd64.tar.gz
mkdir /opt/blackbox_exporter
cp blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter /opt/blackbox_exporter/blackbox_exporter

config

vim /opt/blackbox_exporter/config.yml

modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST

service

vim /usr/lib/systemd/system/blackbox_exporter.service

[Unit]
Description=Blackbox Exporter
Documentation=https://github.com/prometheus/blackbox_exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/opt/blackbox_exporter/blackbox_exporter \
--config.file=/opt/blackbox_exporter/config.yml \
--web.listen-address=:9116
Restart=always

[Install]
WantedBy=multi-user.target

Startup

systemctl daemon-reload
systemctl start blackbox_exporter
systemctl enable blackbox_exporter

Test

curl -iv "http://127.0.0.1:9116/probe?module=http_2xx&target=baidu.com"
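
The probe endpoint answers in the Prometheus text format, and the line that matters most is `probe_success`. The following is a small sketch of reading that response programmatically. `parse_metrics` and `probe_ok` are hypothetical helpers, and the parser deliberately handles only unlabelled samples.

```python
def parse_metrics(text: str) -> dict:
    """Parse unlabelled samples from Prometheus text exposition output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, rest = line.partition(" ")
        if "{" in name or not rest:
            continue  # labelled series are not needed for a pass/fail check
        samples[name] = float(rest.split()[0])
    return samples


def probe_ok(text: str) -> bool:
    """True when the probe reported probe_success 1."""
    return parse_metrics(text).get("probe_success") == 1.0
```

Pass it the body returned by the `curl` command above. `probe_http_status_code` from the same response is handy when a probe fails.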

Prometheus configuration

- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - http://10.113.75.134:5000/v1/datasets/8a9302de-ded1-493c-8a12-e8acf3d80772/files?apikey=ka-admin123
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      regex: '(http|https)://([^/]+)/.*'
      replacement: '${2}'
      target_label: site
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
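
The second relabel rule derives the `site` label from the probed URL. A short sketch of what that regex captures, using the same pattern with full anchoring as relabeling does (`site_label` is a hypothetical helper for illustration):

```python
import re

# Same pattern as the relabel rule; group 2 is the host[:port] part.
SITE_REGEX = re.compile(r"(http|https)://([^/]+)/.*")


def site_label(target: str):
    """Return the value the rule writes to `site`, or None when it does not match."""
    match = SITE_REGEX.fullmatch(target)
    return match.group(2) if match else None
```

For the target in this config the result is `10.113.75.134:5000`. The pattern requires a `/` after the host, so a target such as `http://baidu.com` does not match. With the default `replace` action a non-matching rule changes nothing, and that target gets no `site` label.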

Troubleshooting

journalctl -xe -u blackbox_exporter

Reference:

https://flashcat.cloud/docs/content/flashcat-partner/prometheus/exporter/commonly/blackbox-exporter/

Error

No route to host

Cause

The firewall on the target server is blocking the connection.

Solution

systemctl status firewalld

systemctl stop firewalld

systemctl disable firewalld

RUN yum install -y kde-l10n-Chinese && \
    yum install -y glibc-common && \
    localedef -c -f UTF-8 -i zh_CN zh_CN.utf8

# ENV LANG zh_CN.UTF-8
ENV LC_ALL zh_CN.UTF-8

Error

Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
14: curl#6 - "Could not resolve host: mirrorlist.centos.org; Unknown error"

Solution

mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.huaweicloud.com/repository/conf/CentOS-7-anon.repo

This replaces the default repo file with the Huawei Cloud mirror.

References:

https://developer.aliyun.com/mirror/centos

https://mirrors.huaweicloud.com/mirrorDetail/5ea14ecab05943f36fb75ee5

ansible

# 1. Install yumdownloader
yum install yum-utils -y

# 2. Download the ansible package and its dependencies
mkdir /tmp/ansible
yumdownloader --resolve --destdir /tmp/ansible ansible
tar zcf /tmp/ansible.tar.gz -C /tmp ansible

# 3. Unpack the archive on the offline server (after copying it to /tmp)
tar zxf /tmp/ansible.tar.gz -C /tmp

# 4. Install
cd /tmp/ansible && rpm -ivh ansible-2.9.27-1.el7.noarch.rpm

Reference:

https://github.com/wgrice/notebook/blob/master/devops/ansible/ansible%E7%A6%BB%E7%BA%BF%E5%AE%89%E8%A3%85.md

Mac

https://github.com/ktver/NgX Note: it changes the proxy settings of the underlying Wi-Fi connection

Win

https://github.com/2dust/v2rayN

https://github.com/KevinSHIT/NaiveSharp

Reference

https://github.com/klzgrad/naiveproxy/issues/158

NgX

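  1. Do not give NgX administrator privileges; it starts a SOCKS5 proxy on port 10800
  2. You can set the SOCKS proxy manually under Wi-Fi - Details - Proxies on the computer to make use of NgX's proxy rules
  3. You can set domains that bypass the proxy, e.g. *.jakehu.me, 192.168.*.*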
