Commit 18db5982 authored by nuxer

Add MHA verification details

parent 464a0bfd
[![logo](https://www.hongsnet.net/images/logo.gif)](https://www.hongsnet.net)
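# MHA Fail-Over and Fail-Back Verification
> Verify that the deployed MHA solution works as intended. The procedure below also reflects the following changes to the network configuration.
- Service IP : communication with the WEB servers
- `HeartBeat` : a newly added network dedicated to MHA monitoring and replication
# MHA System Configuration
| No. | Type | IP Address | HeartBeat | Role | Notes |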
| ------ | ------ | ------ | ------ | ------ | ------ |
| 1 | **MASTER** | 180.180.180.226 | 220.220.220.228 | **Active** | |
| 2 | **MASTER** | 180.180.180.23 | 220.220.220.229 | **Active Backup** | |
| 3 | SLAVE | 180.180.180.242 | 220.220.220.230 | Slave01 | |
| 4 | SLAVE | 180.180.180.243 | 220.220.220.227 | MHA | |
| 5 | SLAVE | 180.180.180.237 | N/A | Service VIP | |
# Checking MHA Manager Status
```bash
# masterha_check_status --conf=/MHA/conf/mha.cnf
mha (pid:24807) is running(0:PING_OK), master:220.220.220.228
```
The **/MHA/logs** directory also contains the following files.
```bash
$ pwd
/MHA/logs
$ ls
manager.log mha.log mha.master_status.health
$ cat mha.master_status.health
24807 0:PING_OK master:220.220.220.228
```
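The health file is a single line, so it is easy to read from a monitoring script. The sketch below is a hypothetical helper (`parse_health` is not part of MHA) and assumes the layout seen in the sample above, `<pid> <code>:<state> master:<host>`.

```bash
# Hypothetical helper, not part of MHA: parse one line of mha.master_status.health.
# Assumed layout (taken from the sample above): "<pid> <code>:<state> master:<host>".
parse_health() {
    # $1: the health line; prints "state=<state> master=<host>"
    printf '%s\n' "$1" | awk '{
        split($2, st, ":")        # "0:PING_OK" -> st[1]="0", st[2]="PING_OK"
        sub(/^master:/, "", $3)   # strip the "master:" prefix
        printf "state=%s master=%s\n", st[2], $3
    }'
}

parse_health "24807 0:PING_OK master:220.220.220.228"
# prints: state=PING_OK master=220.220.220.228
```

On the manager host the same function could be fed with `parse_health "$(cat /MHA/logs/mha.master_status.health)"`.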
# MHA Fail-Over Verification
**First, check the state of the service VIP.**
```bash
- Master-Active
root@MHA-MASTER-ACTIVE:~# ip addr |grep 180.180.180.240
inet 180.180.180.240/24 brd 180.180.180.255 scope global secondary eth1:0
- Master-Backup
root@MHA-MASTER-BACKUP:~# ip addr |grep 180.180.180.240
root@MHA-MASTER-BACKUP:~#
```
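The same VIP check is repeated on both masters before and after the failure, so it may be convenient to wrap it in a function. This is a minimal sketch; `has_vip` is a hypothetical name, and it reads `ip addr` output from stdin so that it can be tried without a live interface.

```bash
# Hypothetical helper: succeed if the given VIP appears in `ip addr` output (stdin).
has_vip() {
    # $1: VIP address. Fixed-string match on "inet <vip>/" so that
    # 180.180.180.24 does not match 180.180.180.240.
    grep -qF "inet $1/"
}

# Demo with the line captured on MASTER-Active above.
sample='inet 180.180.180.240/24 brd 180.180.180.255 scope global secondary eth1:0'
if printf '%s\n' "$sample" | has_vip 180.180.180.240; then
    echo "VIP present"
else
    echo "VIP absent"
fi
```

On a live host the call would be `ip addr | has_vip 180.180.180.240`.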
Next, forcibly stop the mariadb daemon to trigger a failure on the current MASTER DBMS.
```bash
root@MHA-MASTER-ACTIVE:~# systemctl stop mariadb
root@MHA-MASTER-ACTIVE:~#
```
Before analyzing the detailed logs, check the state of the VIP.
```bash
- MASTER-Active
root@MHA-MASTER-ACTIVE:~# ip addr |grep 180.180.180.240
root@MHA-MASTER-ACTIVE:~#
- MASTER-Backup
root@MHA-MASTER-BACKUP:~# ip addr |grep 180.180.180.240
inet 180.180.180.240/24 brd 180.180.180.255 scope global secondary eth1:0
```
**The results so far confirm that the service VIP was migrated successfully.** Next, the MHA log is analyzed phase by phase.
# The 5 Phases of MHA Failure Handling
1. **Configure Check**
- 1.1 : Check Connect to server
- 1.2 : Find Dead Server and Alive Server
2. **Dead Master Shutdown**
- 2.1 : Stop Slave I/O Thread
- 2.2 : Run master_ip_failover and shutdown script
3. **Master Recovery**
- 3.1 : Getting Latest Slaves
- 3.2 : Saving Dead Master's binlog file
- 3.3 : Determining New Master
- 3.4 : New Master Different Log Generation
- 3.5 : New Master Log Apply
- 3.6 : Run master_ip_failover script
4. **Slaves Recovery**
- 4.1 : Starting Parallel Slave Different Log Generation
- 4.2 : Starting Parallel Slave Log Apply
5. **New Master Cleanup**
- 5.1 : Resetting Slave info on the New Master
- 5.2 : Clearing Slave info
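To see which of the phases listed above a particular failover actually went through, the phase headers can be pulled out of manager.log. The sketch below is a hypothetical helper (`list_phases`) and assumes headers of the form `[info] * Phase N: ...`, as they appear in the MHA 0.57 log in this document.

```bash
# Hypothetical helper: print the phase headers found in an MHA manager log (stdin).
# Assumes header lines such as "... - [info] * Phase 3.1: Getting Latest Slaves Phase..".
list_phases() {
    grep -E '\[info\] \*+ Phase [0-9.]+:' |
        grep -v 'completed\.$' |                        # skip "... Phase completed." lines
        sed -E 's/^.*\* (Phase [0-9.]+: .*)$/\1/'       # drop timestamp and log level
}

# Demo with three lines copied from the failover log.
printf '%s\n' \
    'Wed Dec 30 01:47:17 2020 - [info] * Phase 1: Configuration Check Phase..' \
    'Wed Dec 30 01:47:18 2020 - [info] ** Phase 1: Configuration Check Phase completed.' \
    'Wed Dec 30 01:47:18 2020 - [info] * Phase 2: Dead Master Shutdown Phase..' |
    list_phases
```

Against the real file this would be `list_phases < /MHA/logs/manager.log`.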
# MHA Fail-Over Log Analysis
```bash
mhauser@MHA:~/logs$ ls
manager.log mha.failover.complete mha.log
```
**!Note** : If the **mha.failover.complete file exists**, MHA completed the FailOver successfully.
- **DBMS Fault Log**
```bash
Wed Dec 30 01:47:07 2020 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Wed Dec 30 01:47:07 2020 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin
Wed Dec 30 01:47:07 2020 - [info] HealthCheck: SSH to 220.220.220.228 is reachable.
Wed Dec 30 01:47:10 2020 - [warning] Got error on MySQL connect: 2002 (Can't connect to MySQL server on '220.220.220.228' (115))
Wed Dec 30 01:47:10 2020 - [warning] Connection failed 2 time(s)..
Wed Dec 30 01:47:13 2020 - [warning] Got error on MySQL connect: 2002 (Can't connect to MySQL server on '220.220.220.228' (115))
Wed Dec 30 01:47:13 2020 - [warning] Connection failed 3 time(s)..
Wed Dec 30 01:47:16 2020 - [warning] Got error on MySQL connect: 2002 (Can't connect to MySQL server on '220.220.220.228' (115))
Wed Dec 30 01:47:16 2020 - [warning] Connection failed 4 time(s)..
Wed Dec 30 01:47:16 2020 - [warning] Master is not reachable from health checker!
Wed Dec 30 01:47:16 2020 - [warning] Master 220.220.220.228(220.220.220.228:3306) is not reachable!
Wed Dec 30 01:47:16 2020 - [warning] SSH is reachable.
Wed Dec 30 01:47:16 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /MHA/conf/mha.cnf again, and trying to connect to all servers to check server status..
Wed Dec 30 01:47:16 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 30 01:47:16 2020 - [info] Reading application default configuration from /MHA/conf/mha.cnf..
Wed Dec 30 01:47:16 2020 - [info] Reading server configuration from /MHA/conf/mha.cnf..
Wed Dec 30 01:47:17 2020 - [info] GTID failover mode = 0
Wed Dec 30 01:47:17 2020 - [info] Dead Servers:
Wed Dec 30 01:47:17 2020 - [info] 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:17 2020 - [info] Alive Servers:
Wed Dec 30 01:47:17 2020 - [info] 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 01:47:17 2020 - [info] 220.220.220.230(220.220.220.230:3306)
Wed Dec 30 01:47:17 2020 - [info] Alive Slaves:
Wed Dec 30 01:47:17 2020 - [info] 220.220.220.229(220.220.220.229:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:17 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:17 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 30 01:47:17 2020 - [info] 220.220.220.230(220.220.220.230:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:17 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:17 2020 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 30 01:47:17 2020 - [info] Checking slave configurations..
Wed Dec 30 01:47:17 2020 - [warning] relay_log_purge=0 is not set on slave 220.220.220.229(220.220.220.229:3306).
Wed Dec 30 01:47:17 2020 - [warning] relay_log_purge=0 is not set on slave 220.220.220.230(220.220.220.230:3306).
Wed Dec 30 01:47:17 2020 - [info] Checking replication filtering settings..
Wed Dec 30 01:47:17 2020 - [info] Replication filtering check ok.
Wed Dec 30 01:47:17 2020 - [info] Master is down!
Wed Dec 30 01:47:17 2020 - [info] Terminating monitoring script.
Wed Dec 30 01:47:17 2020 - [info] Got exit code 20 (Master dead).
Wed Dec 30 01:47:17 2020 - [info] MHA::MasterFailover version 0.57.
Wed Dec 30 01:47:17 2020 - [info] Starting master failover.
Wed Dec 30 01:47:17 2020 - [info]
```
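In the fault log above, the manager retries the connection roughly every three seconds (01:47:07, :10, :13, :16) and declares the master unreachable after the message `Connection failed 4 time(s)..`. A small filter can report how many consecutive failures a given log recorded. This is a sketch only; `max_failed_connections` is a hypothetical name.

```bash
# Hypothetical helper: print the highest N seen in "Connection failed N time(s).." (stdin).
max_failed_connections() {
    awk '
        /Connection failed [0-9]+ time\(s\)/ {
            for (i = 1; i < NF; i++)
                if ($i == "failed" && $(i + 1) + 0 > max) max = $(i + 1) + 0
        }
        END { print max + 0 }   # prints 0 when no such line exists
    '
}

# Demo with lines copied from the fault log.
printf '%s\n' \
    'Wed Dec 30 01:47:10 2020 - [warning] Connection failed 2 time(s)..' \
    'Wed Dec 30 01:47:13 2020 - [warning] Connection failed 3 time(s)..' \
    'Wed Dec 30 01:47:16 2020 - [warning] Connection failed 4 time(s)..' |
    max_failed_connections
# prints: 4
```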
* [ **Phase 1** ] : Configure Check
```bash
Wed Dec 30 01:47:17 2020 - [info] * Phase 1: Configuration Check Phase..
Wed Dec 30 01:47:17 2020 - [info]
Wed Dec 30 01:47:18 2020 - [info] GTID failover mode = 0
Wed Dec 30 01:47:18 2020 - [info] Dead Servers:
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:18 2020 - [info] Checking master reachability via MySQL(double check)...
Wed Dec 30 01:47:18 2020 - [info] ok.
Wed Dec 30 01:47:18 2020 - [info] Alive Servers:
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.230(220.220.220.230:3306)
Wed Dec 30 01:47:18 2020 - [info] Alive Slaves:
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.229(220.220.220.229:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:18 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:18 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.230(220.220.220.230:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:18 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:18 2020 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 30 01:47:18 2020 - [info] Starting Non-GTID based failover.
Wed Dec 30 01:47:18 2020 - [info]
Wed Dec 30 01:47:18 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Dec 30 01:47:18 2020 - [info]
```
* [ **Phase 2** ] : Dead Master Shutdown
```bash
Wed Dec 30 01:47:18 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Dec 30 01:47:18 2020 - [info]
Wed Dec 30 01:47:18 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Dec 30 01:47:18 2020 - [info] Executing master IP deactivation script:
Wed Dec 30 01:47:18 2020 - [info] /usr/local/bin/master_ip_failover --orig_master_host=220.220.220.228 --orig_master_ip=220.220.220.228 --orig_master_port=3306 --command=stopssh --ssh_user=mhauser --orig_master_ssh_port=22222
Unknown option: orig_master_ssh_port
IN SCRIPT TEST====sudo /sbin/ifconfig eth1:0 down==sudo /sbin/ifconfig eth1:0 180.180.180.240 netmask 255.255.255.0 broadcast 180.180.180.255 up===
stopsshDisabling the VIP on old master: 220.220.220.228
Wed Dec 30 01:47:18 2020 - [info] done.
Wed Dec 30 01:47:18 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Dec 30 01:47:18 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Dec 30 01:47:18 2020 - [info]
```
* [ **Phase 3** ] : Master Recovery
```bash
Wed Dec 30 01:47:18 2020 - [info] * Phase 3: Master Recovery Phase..
Wed Dec 30 01:47:18 2020 - [info]
Wed Dec 30 01:47:18 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Dec 30 01:47:18 2020 - [info]
Wed Dec 30 01:47:18 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000004:12079
Wed Dec 30 01:47:18 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.229(220.220.220.229:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:18 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:18 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.230(220.220.220.230:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:18 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:18 2020 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 30 01:47:18 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000004:12079
Wed Dec 30 01:47:18 2020 - [info] Oldest slaves:
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.229(220.220.220.229:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:18 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:18 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 30 01:47:18 2020 - [info] 220.220.220.230(220.220.220.230:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:18 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:18 2020 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 30 01:47:18 2020 - [info]
Wed Dec 30 01:47:18 2020 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Dec 30 01:47:18 2020 - [info]
Wed Dec 30 01:47:18 2020 - [info] Fetching dead master's binary logs..
Wed Dec 30 01:47:18 2020 - [info] Executing command on the dead master 220.220.220.228(220.220.220.228:3306): save_binary_logs --command=save --start_file=mysql-bin.000004 --start_pos=12079 --binlog_dir=/var/lib/mysql --output_file=/var/tmp/saved_master_binlog_from_220.220.220.228_3306_20201230014717.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000004 pos 12079 to mysql-bin.000004 EOF into /var/tmp/saved_master_binlog_from_220.220.220.228_3306_20201230014717.binlog ..
Failed to save binary log: Redundant argument in sprintf at /usr/local/share/perl/5.24.1/MHA/NodeUtil.pm line 184.
Wed Dec 30 01:47:19 2020 - [error][/usr/local/share/perl/5.24.1/MHA/MasterFailover.pm, ln760] Failed to save binary log events from the orig master. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Wed Dec 30 01:47:19 2020 - [info]
Wed Dec 30 01:47:19 2020 - [info] * Phase 3.3: Determining New Master Phase..
Wed Dec 30 01:47:19 2020 - [info]
Wed Dec 30 01:47:19 2020 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Dec 30 01:47:19 2020 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Dec 30 01:47:19 2020 - [info] Searching new master from slaves..
Wed Dec 30 01:47:19 2020 - [info] Candidate masters from the configuration file:
Wed Dec 30 01:47:19 2020 - [info] 220.220.220.229(220.220.220.229:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:19 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 30 01:47:19 2020 - [info] Non-candidate masters:
Wed Dec 30 01:47:19 2020 - [info] 220.220.220.230(220.220.220.230:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 01:47:19 2020 - [info] Replicating from 220.220.220.228(220.220.220.228:3306)
Wed Dec 30 01:47:19 2020 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 30 01:47:19 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Dec 30 01:47:19 2020 - [info] New master is 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 01:47:19 2020 - [info] Starting master failover..
Wed Dec 30 01:47:19 2020 - [info]
From:
220.220.220.228(220.220.220.228:3306) (current master)
+--220.220.220.229(220.220.220.229:3306)
+--220.220.220.230(220.220.220.230:3306)
To:
220.220.220.229(220.220.220.229:3306) (new master)
+--220.220.220.230(220.220.220.230:3306)
Wed Dec 30 01:47:19 2020 - [info]
Wed Dec 30 01:47:19 2020 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Dec 30 01:47:19 2020 - [info]
Wed Dec 30 01:47:19 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Dec 30 01:47:19 2020 - [info]
Wed Dec 30 01:47:19 2020 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Dec 30 01:47:19 2020 - [info]
Wed Dec 30 01:47:19 2020 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Dec 30 01:47:19 2020 - [info] Starting recovery on 220.220.220.229(220.220.220.229:3306)..
Wed Dec 30 01:47:19 2020 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Dec 30 01:47:19 2020 - [info] done.
Wed Dec 30 01:47:19 2020 - [info] All relay logs were successfully applied.
Wed Dec 30 01:47:19 2020 - [info] Getting new master's binlog name and position..
Wed Dec 30 01:47:19 2020 - [info] mysql-bin.000002:328
Wed Dec 30 01:47:19 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='220.220.220.229', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=328, MASTER_USER='mhauser', MASTER_PASSWORD='xxx';
Wed Dec 30 01:47:19 2020 - [info] Executing master IP activate script:
Wed Dec 30 01:47:19 2020 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=mhauser --orig_master_host=220.220.220.228 --orig_master_ip=220.220.220.228 --orig_master_port=3306 --new_master_host=220.220.220.229 --new_master_ip=220.220.220.229 --new_master_port=3306 --new_master_user='mhauser' --orig_master_ssh_port=22222 --new_master_ssh_port=22222 --new_master_password=xxx
Unknown option: orig_master_ssh_port
Unknown option: new_master_ssh_port
IN SCRIPT TEST====sudo /sbin/ifconfig eth1:0 down==sudo /sbin/ifconfig eth1:0 180.180.180.240 netmask 255.255.255.0 broadcast 180.180.180.255 up===
Enabling the VIP - 180.180.180.240 on the new master - 220.220.220.229
Wed Dec 30 01:47:23 2020 - [info] OK.
Wed Dec 30 01:47:23 2020 - [info] Setting read_only=0 on 220.220.220.229(220.220.220.229:3306)..
Wed Dec 30 01:47:23 2020 - [info] ok.
Wed Dec 30 01:47:23 2020 - [info] ** Finished master recovery successfully.
Wed Dec 30 01:47:23 2020 - [info] * Phase 3: Master Recovery Phase completed.
Wed Dec 30 01:47:23 2020 - [info]
```
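Phase 3.2 above logged an `[error]` while saving the dead master's binary log, yet the failover continued and was later reported as successful. Because such lines are easy to miss in a long log, it may help to list every `[error]` together with the phase in which it occurred. The helper name `errors_by_phase` is hypothetical.

```bash
# Hypothetical helper: print each [error] line of a manager log (stdin),
# prefixed with the most recent phase number seen before it.
errors_by_phase() {
    awk '
        /\* Phase [0-9.]+:/ {
            phase = $0
            sub(/^.*\* Phase /, "", phase)   # keep what follows "* Phase "
            sub(/:.*$/, "", phase)           # keep only the number, e.g. "3.2"
        }
        /\[error\]/ { printf "phase %s: %s\n", phase, $0 }
    '
}

# Demo with two lines shortened from the log above.
printf '%s\n' \
    "Wed Dec 30 01:47:18 2020 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.." \
    'Wed Dec 30 01:47:19 2020 - [error][MasterFailover.pm, ln760] Failed to save binary log events from the orig master.' |
    errors_by_phase
```

An empty result means the log contains no `[error]` lines. It says nothing about `[warning]` entries, which this sketch ignores.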
* [ **Phase 4** ] : Slaves Recovery
```bash
Wed Dec 30 01:47:23 2020 - [info] * Phase 4: Slaves Recovery Phase..
Wed Dec 30 01:47:23 2020 - [info]
Wed Dec 30 01:47:23 2020 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Dec 30 01:47:23 2020 - [info]
Wed Dec 30 01:47:23 2020 - [info] -- Slave diff file generation on host 220.220.220.230(220.220.220.230:3306) started, pid: 28584. Check tmp log /MHA/logs/220.220.220.230_3306_20201230014717.log if it takes time..
Wed Dec 30 01:47:24 2020 - [info]
Wed Dec 30 01:47:24 2020 - [info] Log messages from 220.220.220.230 ...
Wed Dec 30 01:47:24 2020 - [info]
Wed Dec 30 01:47:23 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Dec 30 01:47:24 2020 - [info] End of log messages from 220.220.220.230.
Wed Dec 30 01:47:24 2020 - [info] -- 220.220.220.230(220.220.220.230:3306) has the latest relay log events.
Wed Dec 30 01:47:24 2020 - [info] Generating relay diff files from the latest slave succeeded.
Wed Dec 30 01:47:24 2020 - [info]
Wed Dec 30 01:47:24 2020 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Dec 30 01:47:24 2020 - [info]
Wed Dec 30 01:47:24 2020 - [info] -- Slave recovery on host 220.220.220.230(220.220.220.230:3306) started, pid: 28586. Check tmp log /MHA/logs/220.220.220.230_3306_20201230014717.log if it takes time..
Wed Dec 30 01:47:25 2020 - [info]
Wed Dec 30 01:47:25 2020 - [info] Log messages from 220.220.220.230 ...
Wed Dec 30 01:47:25 2020 - [info]
Wed Dec 30 01:47:24 2020 - [info] Starting recovery on 220.220.220.230(220.220.220.230:3306)..
Wed Dec 30 01:47:24 2020 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Dec 30 01:47:24 2020 - [info] done.
Wed Dec 30 01:47:24 2020 - [info] All relay logs were successfully applied.
Wed Dec 30 01:47:24 2020 - [info] Resetting slave 220.220.220.230(220.220.220.230:3306) and starting replication from the new master 220.220.220.229(220.220.220.229:3306)..
Wed Dec 30 01:47:25 2020 - [info] Executed CHANGE MASTER.
Wed Dec 30 01:47:25 2020 - [info] Slave started.
Wed Dec 30 01:47:25 2020 - [info] End of log messages from 220.220.220.230.
Wed Dec 30 01:47:25 2020 - [info] -- Slave recovery on host 220.220.220.230(220.220.220.230:3306) succeeded.
Wed Dec 30 01:47:25 2020 - [info] All new slave servers recovered successfully.
Wed Dec 30 01:47:25 2020 - [info]
```
* [ **Phase 5** ] : New Master Cleanup
```bash
Wed Dec 30 01:47:25 2020 - [info] * Phase 5: New master cleanup phase..
Wed Dec 30 01:47:25 2020 - [info]
Wed Dec 30 01:47:25 2020 - [info] Resetting slave info on the new master..
Wed Dec 30 01:47:25 2020 - [info] 220.220.220.229: Resetting slave info succeeded.
Wed Dec 30 01:47:25 2020 - [info] Master failover to 220.220.220.229(220.220.220.229:3306) completed successfully.
Wed Dec 30 01:47:25 2020 - [info]
```
* [ **summary** ] : Failover Reporting
```bash
mha: MySQL Master failover 220.220.220.228(220.220.220.228:3306) to 220.220.220.229(220.220.220.229:3306) succeeded
Master 220.220.220.228(220.220.220.228:3306) is down!
Check MHA Manager logs at MHA:/MHA/logs/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 220.220.220.228(220.220.220.228:3306)
The latest slave 220.220.220.229(220.220.220.229:3306) has all relay logs for recovery.
Selected 220.220.220.229(220.220.220.229:3306) as a new master.
220.220.220.229(220.220.220.229:3306): OK: Applying all logs succeeded.
220.220.220.229(220.220.220.229:3306): OK: Activated master IP address.
220.220.220.230(220.220.220.230:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
220.220.220.230(220.220.220.230:3306): OK: Applying all logs succeeded. Slave started, replicating from 220.220.220.229(220.220.220.229:3306)
220.220.220.229(220.220.220.229:3306): Resetting slave info succeeded.
Master failover to 220.220.220.229(220.220.220.229:3306) completed successfully.
```
At this point the FailOver for the MASTER-Active failure is complete (in the log above, the master role moved from 220.220.220.228 to 220.220.220.229). Checking the replication status on the SLAVE01 server shows that it now points to the failed-over server (220.220.220.229) as its MASTER.
```bash
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 220.220.220.229
Master_User: mhauser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 328
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 328
Relay_Log_Space: 865
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 229
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_DDL_Groups: 16
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 294
1 row in set (0.002 sec)
```
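The three fields that matter in the output above are `Master_Host`, `Slave_IO_Running` and `Slave_SQL_Running`. The sketch below checks them in one step. `replicating_from` is a hypothetical helper that reads `SHOW SLAVE STATUS\G` output from stdin.

```bash
# Hypothetical helper: succeed if the slave status on stdin shows both
# replication threads running against the expected master.
replicating_from() {
    # $1: expected master host
    awk -v want="$1" '
        $1 == "Master_Host:"       { host = $2 }
        $1 == "Slave_IO_Running:"  { io = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(host == want && io == "Yes" && sql == "Yes") }
    '
}

# Demo with the relevant lines from the SLAVE01 output above.
status='Master_Host: 220.220.220.229
Slave_IO_Running: Yes
Slave_SQL_Running: Yes'
if printf '%s\n' "$status" | replicating_from 220.220.220.229; then
    echo "replicating from 220.220.220.229"
fi
```

On a slave this could be fed with `mysql -u root -p -e 'SHOW SLAVE STATUS\G' | replicating_from 220.220.220.229`.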
# MHA Fail-Back Verification
Now assume that the failed MASTER-Active server has been recovered. Failing back to the original MASTER-Active is done with the following procedure.
`!Important` : Fail-Back order and precautions
- The /MHA/logs/`mha.failover.complete` file must be removed.
- Synchronize the failed MASTER-Active with the current MASTER system so that it `holds the latest data`.
- Run the `masterha_master_switch` command to fail back to the original MASTER-Active.
- **Start the MHA Manager daemon** to resume monitoring.
* [ **STEP 1** ] Remove the mha.failover.complete file.
```bash
mhauser@MHA:~/logs$ rm -rf mha.failover.complete
```
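Since only a single regular file has to go, the removal can be made more defensive than `rm -rf`. The sketch below is a hypothetical helper (`clear_failover_marker`) and assumes the marker is named `<app>.failover.complete`, with `mha` as the prefix used in this setup.

```bash
# Hypothetical helper: remove the failover marker if it exists and report what happened.
clear_failover_marker() {
    # $1: log directory, $2: file prefix ("mha" in this setup, an assumption)
    marker="$1/$2.failover.complete"
    if [ -f "$marker" ]; then
        rm -f -- "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
}

# Demo in a throwaway directory so that nothing under /MHA is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/mha.failover.complete"
clear_failover_marker "$demo_dir" mha
rm -rf -- "$demo_dir"
```

On the manager host the call would be `clear_failover_marker /MHA/logs mha`.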
* [ **STEP 2** ] Synchronize the failed MASTER-Active server with the latest data from the current MASTER.
As manager.log shows (see the log above), MHA records the latest CHANGE MASTER statement during FailOver. Run this statement on the recovered MASTER-Active server to synchronize its data.
> Wed Dec 30 01:47:19 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='220.220.220.229', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=328, MASTER_USER='mhauser', MASTER_PASSWORD='xxx';
```bash
root@MHA-MASTER-ACTIVE:~# systemctl start mariadb
root@MHA-MASTER-ACTIVE:~# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 9
Server version: 10.3.27-MariaDB-1:10.3.27+maria~stretch-log mariadb.org binary distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='220.220.220.229', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=328, MASTER_USER='mhauser', MASTER_PASSWORD='xxx';
Query OK, 0 rows affected (0.101 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)
```
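The statement executed above was copied by hand from manager.log. As a sketch, it can also be extracted with a short filter. `extract_change_master` is a hypothetical helper that prints the last statement the log suggests. The log masks the password as `xxx`, so the real replication password still has to be substituted before the statement is run.

```bash
# Hypothetical helper: print the last "CHANGE MASTER TO ...;" statement
# suggested in an MHA manager log (stdin).
extract_change_master() {
    sed -n 's/^.*Statement should be: \(CHANGE MASTER TO .*;\) *$/\1/p' | tail -n 1
}

# Demo with the line quoted above.
printf '%s\n' "Wed Dec 30 01:47:19 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='220.220.220.229', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=328, MASTER_USER='mhauser', MASTER_PASSWORD='xxx';" |
    extract_change_master
```

Against the real file this would be `extract_change_master < /MHA/logs/manager.log`.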
The server then replicates from the current MASTER system, as shown below.
```bash
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 220.220.220.229
Master_User: mhauser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 328
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 328
Relay_Log_Space: 865
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 229
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 0
1 row in set (0.001 sec)
```
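Before switching back, it is worth confirming that the recovered server has applied everything it has read. In the output above this shows as `Read_Master_Log_Pos` equal to `Exec_Master_Log_Pos` (both 328), matching log file names, and `Seconds_Behind_Master: 0`. The sketch below encodes that check. `slave_caught_up` is a hypothetical helper that reads `SHOW SLAVE STATUS\G` output from stdin.

```bash
# Hypothetical helper: succeed if the slave status on stdin shows that the SQL thread
# has executed up to the position the I/O thread has read, with zero reported lag.
slave_caught_up() {
    awk '
        $1 == "Master_Log_File:"       { read_file = $2 }
        $1 == "Relay_Master_Log_File:" { exec_file = $2 }
        $1 == "Read_Master_Log_Pos:"   { read_pos = $2 }
        $1 == "Exec_Master_Log_Pos:"   { exec_pos = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2 }
        END {
            ok = (read_file != "" && read_file == exec_file &&
                  read_pos != "" && read_pos == exec_pos && lag == "0")
            exit !ok
        }
    '
}

# Demo with the relevant lines from the output above.
status='Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 328
Relay_Master_Log_File: mysql-bin.000002
Exec_Master_Log_Pos: 328
Seconds_Behind_Master: 0'
if printf '%s\n' "$status" | slave_caught_up; then
    echo "slave has caught up"
fi
```

This only compares what the slave has read with what it has executed. It does not prove that the master has no newer events, so it is best run while the application is writing little or nothing.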
Finally, once synchronization with the current MASTER is complete, `perform the manual FailBack` as follows.
- **Phase 1: Configuration Check Phase**
```bash
mhauser@MHA:~/scripts$ masterha_master_switch --master_state=alive --conf=/MHA/conf/mha.cnf
Wed Dec 30 02:18:15 2020 - [info] MHA::MasterRotate version 0.57.
Wed Dec 30 02:18:15 2020 - [info] Starting online master switch..
Wed Dec 30 02:18:15 2020 - [info]
Wed Dec 30 02:18:15 2020 - [info] * Phase 1: Configuration Check Phase..
Wed Dec 30 02:18:15 2020 - [info]
Wed Dec 30 02:18:15 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 30 02:18:15 2020 - [info] Reading application default configuration from /MHA/conf/mha.cnf..
Wed Dec 30 02:18:15 2020 - [info] Reading server configuration from /MHA/conf/mha.cnf..
Wed Dec 30 02:18:16 2020 - [info] GTID failover mode = 0
Wed Dec 30 02:18:16 2020 - [info] Current Alive Master: 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 02:18:16 2020 - [info] Alive Slaves:
Wed Dec 30 02:18:16 2020 - [info] 220.220.220.228(220.220.220.228:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 02:18:16 2020 - [info] Replicating from 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 02:18:16 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 30 02:18:16 2020 - [info] 220.220.220.230(220.220.220.230:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 02:18:16 2020 - [info] Replicating from 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 02:18:16 2020 - [info] Not candidate for the new Master (no_master is set)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 220.220.220.229(220.220.220.229:3306)? (YES/no): YES
Wed Dec 30 02:18:19 2020 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed Dec 30 02:18:19 2020 - [info] ok.
Wed Dec 30 02:18:19 2020 - [info] Checking MHA is not monitoring or doing failover..
Wed Dec 30 02:18:19 2020 - [info] Checking replication health on 220.220.220.228..
Wed Dec 30 02:18:19 2020 - [info] ok.
Wed Dec 30 02:18:19 2020 - [info] Checking replication health on 220.220.220.230..
Wed Dec 30 02:18:19 2020 - [info] ok.
Wed Dec 30 02:18:19 2020 - [info] Searching new master from slaves..
Wed Dec 30 02:18:19 2020 - [info] Candidate masters from the configuration file:
Wed Dec 30 02:18:19 2020 - [info] 220.220.220.228(220.220.220.228:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 02:18:19 2020 - [info] Replicating from 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 02:18:19 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 30 02:18:19 2020 - [info] 220.220.220.229(220.220.220.229:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log log-bin:enabled
Wed Dec 30 02:18:19 2020 - [info] Non-candidate masters:
Wed Dec 30 02:18:19 2020 - [info] 220.220.220.230(220.220.220.230:3306) Version=10.3.27-MariaDB-1:10.3.27+maria~stretch-log (oldest major version between slaves) log-bin:enabled
Wed Dec 30 02:18:19 2020 - [info] Replicating from 220.220.220.229(220.220.220.229:3306)
Wed Dec 30 02:18:19 2020 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 30 02:18:19 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Dec 30 02:18:19 2020 - [info]
From:
220.220.220.229(220.220.220.229:3306) (current master)
+--220.220.220.228(220.220.220.228:3306)
+--220.220.220.230(220.220.220.230:3306)
To:
220.220.220.228(220.220.220.228:3306) (new master)
+--220.220.220.230(220.220.220.230:3306)
Starting master switch from 220.220.220.229(220.220.220.229:3306) to 220.220.220.228(220.220.220.228:3306)? (yes/NO): yes
Wed Dec 30 02:18:21 2020 - [info] Checking whether 220.220.220.228(220.220.220.228:3306) is ok for the new master..
Wed Dec 30 02:18:21 2020 - [info] ok.
Wed Dec 30 02:18:21 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Dec 30 02:18:21 2020 - [info]
```
- **Phase 2: Rejecting updates Phase..**
```bash
Wed Dec 30 02:18:21 2020 - [info] * Phase 2: Rejecting updates Phase..
Wed Dec 30 02:18:21 2020 - [info]
Wed Dec 30 02:18:21 2020 - [info] Executing master ip online change script to disable write on the current master:
Wed Dec 30 02:18:21 2020 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=220.220.220.229 --orig_master_ip=220.220.220.229 --orig_master_port=3306 --orig_master_user='mhauser' --new_master_host=220.220.220.228 --new_master_ip=220.220.220.228 --new_master_port=3306 --new_master_user='mhauser' --orig_master_ssh_user=mhauser --new_master_ssh_user=mhauser --orig_master_ssh_port=22222 --new_master_ssh_port=22222 --orig_master_password=xxx --new_master_password=xxx
Wed Dec 30 02:18:22 2020 079293 Set read_only on the new master.. ok.
Wed Dec 30 02:18:22 2020 086265 Drpping app user on the orig master..
Wed Dec 30 02:18:22 2020 087206 Set read_only=1 on the orig master.. ok.
Wed Dec 30 02:18:22 2020 090144 Killing all application threads..
Wed Dec 30 02:18:22 2020 090233 done.
Wed Dec 30 02:18:22 2020 - [info] ok.
Wed Dec 30 02:18:22 2020 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Wed Dec 30 02:18:22 2020 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed Dec 30 02:18:22 2020 - [info] ok.
Wed Dec 30 02:18:22 2020 - [info] Orig master binlog:pos is mysql-bin.000002:328.
Wed Dec 30 02:18:22 2020 - [info] Waiting to execute all relay logs on 220.220.220.228(220.220.220.228:3306)..
Wed Dec 30 02:18:22 2020 - [info] master_pos_wait(mysql-bin.000002:328) completed on 220.220.220.228(220.220.220.228:3306). Executed 0 events.
Wed Dec 30 02:18:22 2020 - [info] done.
Wed Dec 30 02:18:22 2020 - [info] Getting new master's binlog name and position..
Wed Dec 30 02:18:22 2020 - [info] mysql-bin.000005:342
Wed Dec 30 02:18:22 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='220.220.220.228', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=342, MASTER_USER='mhauser', MASTER_PASSWORD='xxx';
Wed Dec 30 02:18:22 2020 - [info] Executing master ip online change script to allow write on the new master:
Wed Dec 30 02:18:22 2020 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=220.220.220.229 --orig_master_ip=220.220.220.229 --orig_master_port=3306 --orig_master_user='mhauser' --new_master_host=220.220.220.228 --new_master_ip=220.220.220.228 --new_master_port=3306 --new_master_user='mhauser' --orig_master_ssh_user=mhauser --new_master_ssh_user=mhauser --orig_master_ssh_port=22222 --new_master_ssh_port=22222 --orig_master_password=xxx --new_master_password=xxx
Wed Dec 30 02:18:22 2020 508146 Set read_only=0 on the new master.
Wed Dec 30 02:18:22 2020 509683 Creating app user on the new master..
Wed Dec 30 02:18:27 2020 - [info] ok.
Wed Dec 30 02:18:27 2020 - [info]
Wed Dec 30 02:18:27 2020 - [info] * Switching slaves in parallel..
Wed Dec 30 02:18:27 2020 - [info]
Wed Dec 30 02:18:27 2020 - [info] -- Slave switch on host 220.220.220.230(220.220.220.230:3306) started, pid: 29168
Wed Dec 30 02:18:27 2020 - [info]
Wed Dec 30 02:18:28 2020 - [info] Log messages from 220.220.220.230 ...
Wed Dec 30 02:18:28 2020 - [info]
Wed Dec 30 02:18:27 2020 - [info] Waiting to execute all relay logs on 220.220.220.230(220.220.220.230:3306)..
Wed Dec 30 02:18:27 2020 - [info] master_pos_wait(mysql-bin.000002:328) completed on 220.220.220.230(220.220.220.230:3306). Executed 0 events.
Wed Dec 30 02:18:27 2020 - [info] done.
Wed Dec 30 02:18:27 2020 - [info] Resetting slave 220.220.220.230(220.220.220.230:3306) and starting replication from the new master 220.220.220.228(220.220.220.228:3306)..
Wed Dec 30 02:18:27 2020 - [info] Executed CHANGE MASTER.
Wed Dec 30 02:18:27 2020 - [info] Slave started.
Wed Dec 30 02:18:28 2020 - [info] End of log messages from 220.220.220.230 ...
Wed Dec 30 02:18:28 2020 - [info]
Wed Dec 30 02:18:28 2020 - [info] -- Slave switch on host 220.220.220.230(220.220.220.230:3306) succeeded.
Wed Dec 30 02:18:28 2020 - [info] Unlocking all tables on the orig master:
Wed Dec 30 02:18:28 2020 - [info] Executing UNLOCK TABLES..
Wed Dec 30 02:18:28 2020 - [info] ok.
Wed Dec 30 02:18:28 2020 - [info] All new slave servers switched successfully.
Wed Dec 30 02:18:28 2020 - [info]
```
- **Phase 5: New master cleanup phase..**
```bash
Wed Dec 30 02:18:28 2020 - [info] * Phase 5: New master cleanup phase..
Wed Dec 30 02:18:28 2020 - [info]
Wed Dec 30 02:18:28 2020 - [info] 220.220.220.228: Resetting slave info succeeded.
Wed Dec 30 02:18:28 2020 - [info] Switching master to 220.220.220.228(220.220.220.228:3306) completed successfully.
```
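The last line of the Phase 5 log is the one that confirms the switch. As a minimal sketch (the function name `switched_to` is hypothetical and not part of MHA), the log text can be checked for that exact line instead of reading it by eye. The default port 3306 is taken from the log above.

```bash
#!/usr/bin/env bash
# switched_to HOST [PORT]: reads MHA log text on stdin and succeeds only if it
# contains the "Switching master to HOST(HOST:PORT) completed successfully." line.
# Hypothetical helper for illustration; it matches the line format shown above.
switched_to() {
    local host="$1"
    local port="${2:-3306}"
    # -F treats the pattern as a fixed string, so dots and parentheses are literal.
    grep -qF -- "Switching master to ${host}(${host}:${port}) completed successfully."
}
```

Usage would look like `switched_to 220.220.220.228 < LOGFILE && echo "FailBack completed"`, where `LOGFILE` is whichever file under `/MHA/logs` holds the switch log.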
First, check that the VIP has failed back correctly.
```bash
- MASTER-Active
root@MHA-MASTER-ACTIVE:~# ip addr |grep 180.180.180.240
inet 180.180.180.240/24 brd 180.180.180.255 scope global secondary eth1:0
- MASTER-Backup
root@MHA-MASTER-BACKUP:~# ip addr |grep 180.180.180.240
root@MHA-MASTER-BACKUP:~#
```
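The manual `ip addr | grep` check above can be wrapped in a small helper. This is only a sketch: the function name `has_vip` is hypothetical, and it reads `ip addr` output from standard input so the same function can be used on either master. It uses a fixed-string match because the dots in a plain `grep 180.180.180.240` pattern match any character.

```bash
#!/usr/bin/env bash
# has_vip VIP: succeeds if the `ip addr` output on stdin contains "inet VIP/".
# Hypothetical helper for illustration.
has_vip() {
    local vip="$1"
    # -F: fixed string, -q: no output, only the exit status matters.
    grep -qF -- "inet ${vip}/"
}
```

On each master this could be run as `ip addr | has_vip 180.180.180.240 && echo "VIP is bound here"`. After FailBack it should succeed only on MASTER-Active.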
As shown above, the VIP has moved back and the service is running normally. Next, check that the SLAVE01 server is replicating from the original MASTER, 220.220.220.228.
```bash
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 220.220.220.228
Master_User: mhauser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 342
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 342
Relay_Log_Space: 865
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 228
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_DDL_Groups: 16
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 294
1 row in set (0.002 sec)
```
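Only three fields of this output matter for the check: `Master_Host`, `Slave_IO_Running`, and `Slave_SQL_Running`. The following sketch (the function name `slave_ok` is hypothetical) reads `SHOW SLAVE STATUS\G` output on standard input and tests those three fields. It assumes the `name: value` layout shown above.

```bash
#!/usr/bin/env bash
# slave_ok EXPECTED_MASTER: reads `SHOW SLAVE STATUS\G` output on stdin and
# succeeds only if Master_Host equals EXPECTED_MASTER and both replication
# threads report "Yes". Hypothetical helper for illustration.
slave_ok() {
    local expected="$1"
    awk -F': *' -v want="$expected" '
        {
            # Strip the left padding from the field name and trailing
            # whitespace from the value.
            key = $1; gsub(/^[ \t]+/, "", key)
            val = $2; gsub(/[ \t\r]+$/, "", val)
        }
        key == "Master_Host"       { host = val }
        key == "Slave_IO_Running"  { io = val }
        key == "Slave_SQL_Running" { sql = val }
        END { exit !(host == want && io == "Yes" && sql == "Yes") }
    '
}
```

For example, `mysql -u root -p -e 'SHOW SLAVE STATUS\G' | slave_ok 220.220.220.228` would succeed for the output above. An empty result (no replication configured) fails the check because none of the three fields is present.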
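Replication on SLAVE01 is also normal. So what state is the previous MASTER (220.220.220.229, the server that was promoted during FailOver) in now?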
```bash
root@MHA-MASTER-BACKUP:~# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 107
Server version: 10.3.27-MariaDB-1:10.3.27+maria~stretch-log mariadb.org binary distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show slave status\G
Empty set (0.001 sec)
```
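As you can see, replication is not configured on this server and has to be set up manually. As with FailOver, this is done with the following statement, which was recorded in the log during FailBack.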
> CHANGE MASTER TO MASTER_HOST='220.220.220.228', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=342, MASTER_USER='mhauser', MASTER_PASSWORD='xxx'
```bash
root@MHA-MASTER-BACKUP:~# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 107
Server version: 10.3.27-MariaDB-1:10.3.27+maria~stretch-log mariadb.org binary distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='220.220.220.228', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=342, MASTER_USER='mhauser', MASTER_PASSWORD='xxx';
Query OK, 0 rows affected (0.139 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.001 sec)
```
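The `CHANGE MASTER` statement used here was copied by hand from the FailBack log. As a sketch, it can also be pulled out of the log text with `grep -o`. The function name `last_change_master` is hypothetical. MHA masks the password as `xxx` in the log, so the real replication password still has to be substituted before the statement is executed.

```bash
#!/usr/bin/env bash
# last_change_master: reads MHA log text on stdin and prints the most recent
# "CHANGE MASTER TO ...;" statement found in it. Hypothetical helper for
# illustration; the password in the printed statement is the masked 'xxx'.
last_change_master() {
    # -o prints only the matching part of each line; tail keeps the last match.
    grep -o "CHANGE MASTER TO [^;]*;" | tail -n 1
}
```

Usage would look like `last_change_master < LOGFILE`, where `LOGFILE` is whichever file under `/MHA/logs` holds the switch log.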
The replication status is also normal.
```bash
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 220.220.220.228
Master_User: mhauser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 342
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 342
Relay_Log_Space: 865
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 228
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_DDL_Groups: 16
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 294
1 row in set (0.000 sec)
```
This confirms that FailBack completed successfully. Finally, start the MHA Manager daemon and confirm that it reports the OK status.
```bash
mhauser@MHA:~/scripts$ nohup masterha_manager --conf=/MHA/conf/mha.cnf < /dev/null >> /MHA/logs/mha.log 2>&1 &
[1] 29303
```
With this, MHA monitoring is fully restored as well.
```bash
mhauser@MHA:~/scripts$ masterha_check_status --conf=/MHA/conf/mha.cnf
mha (pid:29303) is running(0:PING_OK), master:220.220.220.228
```
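The status line carries both the health state and the master that MHA is currently monitoring. The sketch below (the function name `mha_master_from_status` is hypothetical) parses a line in the format shown above and prints the master address only when the state is `0:PING_OK`.

```bash
#!/usr/bin/env bash
# mha_master_from_status: reads masterha_check_status output on stdin.
# Prints the monitored master address and succeeds if the line reports
# "is running(0:PING_OK)"; prints nothing and fails otherwise.
# Hypothetical helper for illustration, based on the line format shown above.
mha_master_from_status() {
    # sed extracts the address after "master:"; `grep .` fails on empty output.
    sed -n 's/^.*is running(0:PING_OK), master:\([^ ]*\)$/\1/p' | grep .
}
```

For example, `masterha_check_status --conf=/MHA/conf/mha.cnf | mha_master_from_status` would print `220.220.220.228` for the output above.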