How do I build a highly available MySQL cluster with Consul, ProxySQL, and MGR?
Community reply
MGR (MySQL Group Replication) is the high-availability solution promoted by MySQL itself. On top of MGR, Consul and ProxySQL can be combined into a high-availability architecture that covers almost every business scenario. The architecture has three layers:
Consul: DNS resolution, service discovery, health checks;
ProxySQL: load balancing, read/write splitting, failure detection;
MGR: single-primary mode, failover, strong consistency.
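To make the end result concrete: once everything below is in place, an application (or ProxySQL) finds the current primary by asking Consul DNS for the write service. A rough sketch, using the service name and Consul server address that are defined later in this guide:
# ask the consul server's DNS interface (port 8600) for the node behind the write service
dig @192.168.1.121 -p 8600 w-test-3306-mydb-ser.service.consul +short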
The detailed steps are as follows:
Environment: CentOS 7, MySQL 5.7.22 with Group Replication
MySQL part
Add the /etc/hosts mappings on the three DB servers as follows:
192.168.1.101 mydb1
192.168.1.102 mydb2
192.168.1.103 mydb3
Database servers to install:
DB server address    Port    Data directory              Server-id
192.168.1.101(mydb1) 3306 /app/mysqldata/3306/data/ 1013306
192.168.1.102(mydb2) 3306 /app/mysqldata/3306/data/ 1023306
192.168.1.103(mydb3) 3306 /app/mysqldata/3306/data/ 1033306
Create the replication account on the three MySQL instances mydb1, mydb2 and mydb3:
mysql> set sql_log_bin=0;
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.%' IDENTIFIED BY 'repl';
mysql> set sql_log_bin=1;
Configure GTID and Group Replication in my.cnf on all three instances; the three blocks below differ only in group_replication_local_address. Note that each instance also needs the usual prerequisites (a unique server_id as in the table above, log-bin enabled, and binlog_format=ROW, which Group Replication requires):
# my.cnf on mydb1
[mysqld]
gtid_mode=ON
log-slave-updates=ON
enforce-gtid-consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=none
#group replication config
plugin-load = group_replication.so
transaction_write_set_extraction = XXHASH64
group_replication_start_on_boot = OFF
group_replication_bootstrap_group = OFF
group_replication_group_name = '__GROUP_UUID__'
group_replication_local_address = 'mydb1:6606'
group_replication_group_seeds = 'mydb1:6606,mydb2:6606,mydb3:6606'
group_replication_single_primary_mode = true
group_replication_enforce_update_everywhere_checks = false
# my.cnf on mydb2
[mysqld]
gtid_mode=ON
log-slave-updates=ON
enforce-gtid-consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=none
#group replication config
plugin-load = group_replication.so
transaction_write_set_extraction = XXHASH64
group_replication_start_on_boot = OFF
group_replication_bootstrap_group = OFF
group_replication_group_name = '__GROUP_UUID__'
group_replication_local_address = 'mydb2:6606'
group_replication_group_seeds = 'mydb1:6606,mydb2:6606,mydb3:6606'
group_replication_single_primary_mode = true
group_replication_enforce_update_everywhere_checks = false
# my.cnf on mydb3
[mysqld]
gtid_mode=ON
log-slave-updates=ON
enforce-gtid-consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=none
#group replication config
plugin-load = group_replication.so
transaction_write_set_extraction = XXHASH64
group_replication_start_on_boot = OFF
group_replication_bootstrap_group = OFF
group_replication_group_name = '__GROUP_UUID__'
group_replication_local_address = 'mydb3:6606'
group_replication_group_seeds = 'mydb1:6606,mydb2:6606,mydb3:6606'
group_replication_single_primary_mode = true
group_replication_enforce_update_everywhere_checks = false
Optional tuning parameters that can also be added to my.cnf on each node:
group_replication_allow_local_disjoint_gtids_join=off
group_replication_member_weight=80
group_replication_unreachable_majority_timeout=5
group_replication_compression_threshold=131072
group_replication_transaction_size_limit=20971520
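One detail worth spelling out: group_replication_group_name must be a valid UUID and must be identical on all three nodes; '__GROUP_UUID__' above is only a placeholder. A sketch of generating a value to put there:
# generate the group UUID once and reuse the same value on mydb1/mydb2/mydb3
uuidgen
# or let MySQL generate it
mysql -uroot -p -Nse "SELECT UUID();"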
Restart the MySQL service on all three servers, then confirm that the group_replication plugin is loaded:
mysql> show plugins;
+----------------------------+----------+--------------------+----------------------+---------+
| Name | Status | Type | Library | License |
+----------------------------+----------+--------------------+----------------------+---------+
| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |
…………
| group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
+----------------------------+----------+--------------------+----------------------+---------+
45 rows in set (0.00 sec)
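If plugin-load was not put into my.cnf, the plugin can also be installed at runtime; a minimal sketch (run on each node before starting Group Replication):
# install the Group Replication plugin without a restart, then verify it is ACTIVE
mysql -uroot -p -e "INSTALL PLUGIN group_replication SONAME 'group_replication.so';"
mysql -uroot -p -e "SHOW PLUGINS;" | grep -i group_replication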
Start building the cluster; on mydb1 (the initial primary) execute:
# build the cluster
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
# start group_replication
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
mysql> select * from performance_schema.replication_group_members;
mysql> select * from performance_schema.replication_group_member_stats\G
mysql> select * from performance_schema.replication_connection_status;
mysql> select * from performance_schema.replication_applier_status;
Join mydb2 and mydb3 to the group:
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
START GROUP_REPLICATION;
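Before moving on, it is worth confirming that every node sees the same three-member group and that all members are ONLINE. A quick sketch (the root password is a placeholder and remote access for the account is assumed):
# each node should report the same three members, all with MEMBER_STATE = ONLINE
for h in mydb1 mydb2 mydb3; do
  mysql -uroot -p'***' -h $h -Nse "select MEMBER_HOST, MEMBER_STATE from performance_schema.replication_group_members;"
done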
Verify replication across the cluster:
create database test;
use test;
create table t(id int primary key auto_increment,name varchar(20));
insert into t(name) values ('allen');
insert into t(name) values ('bob');
select * from t;
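To confirm the rows actually reached the secondaries, and that the secondaries reject writes, a rough check (credentials are placeholders):
# both rows should be visible on every member; the secondaries report super_read_only = 1
mysql -uroot -p'***' -h mydb2 -e "select * from test.t; select @@global.super_read_only;"
mysql -uroot -p'***' -h mydb3 -e "select * from test.t; select @@global.super_read_only;"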
Single-primary mode is the default. To use multi-primary mode instead, set the parameter before joining the group: set global group_replication_single_primary_mode=off;
Parallel execution based on primary keys:
set global slave_parallel_type='LOGICAL_CLOCK';
set global slave_parallel_workers=N;
set global slave_preserve_commit_order=ON;
When parallel replication is enabled, Group Replication requires slave_preserve_commit_order to be set to ON.
Enabling this parameter guarantees that transactions are committed on the applier in the same order as on the source MySQL server.
To forcibly remove failed members, set the following on any one of the members in the list:
set global group_replication_force_members = 'mydb1:6606,mydb2:6606';
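group_replication_force_members is an emergency override. Once the surviving members have reformed the group and are ONLINE again, the variable should be reset to an empty string; a sketch (credentials are placeholders):
# clear the override after the group is healthy again
mysql -uroot -p'***' -e "set global group_replication_force_members='';"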
Monitoring Group Replication
Group Replication status information is stored in the following five performance_schema tables and can easily be queried with SQL:
replication_group_members
replication_group_member_stats
replication_connection_status
replication_applier_status
threads
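The first four tables are queried in the examples below; the threads table is useful for seeing the Group Replication worker threads themselves. A sketch (credentials are placeholders; the thread name prefix is as of MySQL 5.7):
# list the group replication communication/applier threads
mysql -uroot -p'***' -e "select thread_id, name from performance_schema.threads where name like 'thread/group_rpl/%';"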
Check which node is the primary:
show global status like "group_replication_primary_member";
select * from performance_schema.replication_group_members where member_id =(select variable_value from performance_schema.global_status WHERE VARIABLE_NAME= 'group_replication_primary_member');
GTIDs executed on this node:
select @@global.gtid_executed\G
GTIDs received from the group:
SELECT Received_transaction_set FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
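The difference between the two sets above is the apply backlog on this node. A sketch using GTID_SUBTRACT to show transactions that were received from the group but not yet executed locally (credentials are placeholders):
mysql -uroot -p'***' -e "SELECT GTID_SUBTRACT((SELECT Received_transaction_set FROM performance_schema.replication_connection_status WHERE Channel_name='group_replication_applier'), @@global.gtid_executed) AS not_yet_applied;"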
Group Replication is the direction in which MySQL replication is heading.
Group Replication opens a dedicated TCP port for its own communication; the Group Replication plugins on the individual MySQL servers connect to each other through this port and can talk to each other directly, pairwise. The transfer of binlog events is not the simple point-to-point transfer of asynchronous replication: Group Replication transmits data using the Paxos protocol, which guarantees the consistency and atomicity of the transmission. On top of Paxos, Group Replication builds a distributed state-machine replication mechanism, which is the core technology that makes multi-primary replication possible. Split-brain cannot occur in Group Replication, and its redundancy is strong: a binlog event is guaranteed to be replicated to more than half of the members.
Single-primary mode: the members of the group automatically elect a primary. At initialization, the member that bootstraps the group becomes the primary and members that join afterwards become secondaries. When the primary fails or leaves the group, a new primary is elected from the remaining members. The election rule is simple: the UUIDs of all online members are sorted and the member with the smallest UUID becomes the primary (assuming equal member weights; group_replication_member_weight, set above, takes precedence).
Automatic switching between read/write and read-only mode:
Set global super_read_only=1;
Set global super_read_only=0;
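In single-primary mode the plugin manages this flag itself, but the same decision can also be scripted, for example from a failover hook; a minimal sketch reusing the queries from the health-check scripts below (MYPASS is a placeholder and the account needs SUPER privilege):
#!/bin/bash
# set super_read_only depending on whether the local node is the current primary
comm="mysql -uroot -pMYPASS -h127.0.0.1 -P3306"
primary=$($comm -Nse "select variable_value from performance_schema.global_status where VARIABLE_NAME='group_replication_primary_member'")
me=$($comm -Nse "select @@global.server_uuid")
if [ "$me" = "$primary" ]; then
  $comm -e "set global super_read_only=0;"
else
  $comm -e "set global super_read_only=1;"
fi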
Multi-primary mode: when the Group Replication plugin starts, it checks whether the MySQL auto-increment variables have been configured. If the user has not configured them (i.e. auto_increment_offset and auto_increment_increment are both 1), it automatically writes the values of group_replication_auto_increment_increment and server-id into the global variables auto_increment_increment and auto_increment_offset respectively.
Note: pay attention to gtid_purged when loading a dump created with mysqldump.
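By default, mysqldump on a GTID-enabled server writes a SET @@GLOBAL.gtid_purged statement into the dump, and loading that into a server whose gtid_executed is not empty fails. Whether to keep it depends on the restore scenario; a sketch of dumping without it (credentials are placeholders):
# omit the gtid_purged statement when the target already has its own GTID history
mysqldump --set-gtid-purged=OFF -uroot -p'***' test > test.sql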
Test 1: primary node crash
Test 2: one secondary node crashes
Test 3: more than half of the secondary nodes crash at the same time
Test 4: repeated primary crashes
Test 5: network jitter on the primary
Test 6: network jitter on a secondary
Test 7: network partition between primary and secondaries
Test 8: network latency between primary and secondaries
Test 9: KILL of a large transaction on the primary
Test 10: handling of MyISAM tables
Test 11: foreign key support in MGR - single-primary mode
Test 12: mysqldump with MGR
Test 13: support for tables without a primary key
Test 14: gh-ost with MGR
Test 15: pt-osc with MGR
Test 16: xtrabackup with MGR
Test 17: impact of binlog_format=statement
--------------------------------------------------------------------------------------------------------------------------------------------
Consul part
consul_1.2.3_linux_amd64.zip
192.168.1.121 consul server
192.168.1.101 consul client mgr
192.168.1.102 consul client mgr
192.168.1.103 consul client mgr
1. Create the directories (on both the server and the clients)
[root@kettle1 ~]# mkdir /etc/consul.d/          # consul configuration files
[root@kettle1 ~]# mkdir -p /data/consul         # data directory
[root@kettle1 ~]# mkdir -p /data/consul/shell   # health-check scripts
[root@mydb1 ~]# mkdir /etc/consul.d/            # consul configuration files
[root@mydb1 ~]# mkdir -p /data/consul/          # data directory
[root@mydb1 ~]# mkdir -p /data/consul/shell/    # health-check scripts
[root@mydb2 ~]# mkdir /etc/consul.d/            # consul configuration files
[root@mydb2 ~]# mkdir -p /data/consul/          # data directory
[root@mydb2 ~]# mkdir -p /data/consul/shell/    # health-check scripts
[root@mydb3 ~]# mkdir /etc/consul.d/            # consul configuration files
[root@mydb3 ~]# mkdir -p /data/consul/          # data directory
[root@mydb3 ~]# mkdir -p /data/consul/shell/    # health-check scripts
2. Create server.json and put the configuration in it (two options: write it into a config file that is passed at startup, or pass everything on the command line)
[root@kettle1 ~]# cat /etc/consul.d/server.json
{
"data_dir": "/data/consul",
"datacenter": "dc1",
"log_level": "INFO", // log verbosity
"server": true, // run this agent as a consul server
"node_name": "wsh", // node name, must be unique within the cluster, defaults to the hostname
"bootstrap_expect": 1, // number of server nodes expected to join; leader election and log replication are delayed until that many servers have joined
"bind_addr": "192.168.1.121", // address used for communication between cluster members
"client_addr": "192.168.1.121", // address the client interfaces (HTTP/DNS) bind to, default 127.0.0.1
"ui":true
}
Remove the comments when writing the actual file (JSON does not allow comments).
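Before starting the agent, the configuration can be syntax-checked; a quick sketch:
# verify that the configuration directory parses cleanly (works on server and clients)
consul validate /etc/consul.d/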
3. Start the server
[root@kettle1 ~]# consul agent -config-dir=/etc/consul.d -enable-script-checks > /data/consul/consul.log &
[root@kettle1 soft]# consul members -http-addr='192.168.1.121:8500'
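With bootstrap_expect set to 1, the single server elects itself leader immediately; this can be confirmed through the HTTP API (the address comes from client_addr above):
# returns the address of the current raft leader
curl http://192.168.1.121:8500/v1/status/leader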
4. Configure the clients (192.168.1.101/102/103): besides its own client.json, each client also gets the service definitions with MGR health-check scripts
[root@mydb1 ~]# cat /etc/consul.d/client.json
{
"data_dir": "/data/consul",
"enable_script_checks": true,
"bind_addr": "192.168.1.101",
"retry_join": ["192.168.1.121"],
"retry_interval": "30s",
"rejoin_after_leave": true,
"start_join": ["192.168.1.121"] ,
"node_name": "mgr_client1"
}
[root@mydb2 ~]# cat /etc/consul.d/client.json
{
"data_dir": "/data/consul",
"enable_script_checks": true,
"bind_addr": "192.168.1.102",
"retry_join": ["192.168.1.121"],
"retry_interval": "30s",
"rejoin_after_leave": true,
"start_join": ["192.168.1.121"] ,
"node_name": "mgr_client2"
}
[root@mydb3 ~]# cat /etc/consul.d/client.json
{
"data_dir": "/data/consul",
"enable_script_checks": true,
"bind_addr": "192.168.1.103",
"retry_join": ["192.168.1.121"],
"retry_interval": "30s",
"rejoin_after_leave": true,
"start_join": ["192.168.1.121"] ,
"node_name": "mgr_client3"
}
[root@mydb1 ~]# cat /etc/consul.d/r-test-mgr-ser.json
{
"service": {
"name": "r-test-3306-mydb-ser",
"tags": ["测试-3306"],
"address": "192.168.1.101",
"meta": {
"meta": "for my service"
},
"port": 3306,
"enable_tag_override": false,
"checks": [
{
"args": ["/data/consul/shell/check_mysql_mgr_slave.sh"],
"interval": "1s"
}
]
}
}
[root@mydb1 ~]# cat /etc/consul.d/w-test-mgr-ser.json
{
"service": {
"name": "w-test-3306-mydb-ser",
"tags": ["测试-3306"],
"address": "192.168.1.101",
"meta": {
"meta": "for my service"
},
"port": 3306,
"enable_tag_override": false,
"checks": [
{
"args": ["/data/consul/shell/check_mysql_mgr_master.sh"],
"interval": "10s"
}
]
}
}
Note: adjust the address fields accordingly on mydb2 and mydb3.
The health-check scripts are as follows:
[root@mydb1 ~]# cat /data/consul/shell/check_mysql_mgr_master.sh
#!/bin/bash
host="192.168.1.101"
port=3306
user="dba_user"
password="msds007"
comm="/usr/local/mysql/bin/mysql -u$user -h$host -P $port -p$password"
value=`$comm -Nse "select 1"`
primary_member=`$comm -Nse "select variable_value from performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member'"`
server_uuid=`$comm -Nse "select variable_value from performance_schema.global_variables where VARIABLE_NAME='server_uuid';"`
# check whether MySQL is alive
if [ -z "$value" ]
then
    echo "mysql $port is down....."
    exit 2
fi
# check whether this node is ONLINE in the group
node_state=`$comm -Nse "select MEMBER_STATE from performance_schema.replication_group_members where MEMBER_ID='$server_uuid'"`
if [ "$node_state" != "ONLINE" ]
then
    echo "MySQL $port state is not online...."
    exit 2
fi
# check whether this node is the primary
if [[ $server_uuid == $primary_member ]]
then
    echo "MySQL $port Instance is master ........"
    exit 0
else
    echo "MySQL $port Instance is slave ........"
    exit 2
fi
[root@mydb1 ~]# cat /data/consul/shell/check_mysql_mgr_slave.sh
#!/bin/bash
host="192.168.1.101"
port=3306
user="dba_user"
password="msds007"
comm="/usr/local/mysql/bin/mysql -u$user -h$host -P $port -p$password"
value=`$comm -Nse "select 1"`
primary_member=`$comm -Nse "select variable_value from performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member'"`
server_uuid=`$comm -Nse "select variable_value from performance_schema.global_variables where VARIABLE_NAME='server_uuid';"`
# check whether MySQL is alive
if [ -z "$value" ]
then
    echo "mysql $port is down....."
    exit 2
fi
# check the node state
node_state=`$comm -Nse "select MEMBER_STATE from performance_schema.replication_group_members where MEMBER_ID='$server_uuid'"`
if [ "$node_state" != "ONLINE" ]
then
    echo "MySQL $port state is not online...."
    exit 2
fi
# check whether this node is the primary
if [[ $server_uuid != $primary_member ]]
then
    echo "MySQL $port Instance is slave ........"
    exit 0
else
    node_num=`$comm -Nse "select count(*) from performance_schema.replication_group_members"`
    # if there are no secondaries at all, the primary also registers the read (slave) service
    if [ "$node_num" -eq 1 ]
    then
        echo "MySQL $port Instance is slave ........"
        exit 0
    else
        echo "MySQL $port Instance is master ........"
        exit 2
    fi
fi
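Both scripts assume a monitoring account dba_user/msds007 that can connect to the local instance and read performance_schema; a minimal sketch of creating it (run on the current primary so it replicates to the whole group; adjust the host pattern to your network; root password is a placeholder):
mysql -uroot -p'***' -h mydb1 -e "GRANT SELECT ON performance_schema.* TO 'dba_user'@'192.168.1.%' IDENTIFIED BY 'msds007';"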
5. Start the three clients
[root@mydb1 ~]# consul agent -config-dir=/etc/consul.d -enable-script-checks > /data/consul/consul.log &
[root@mydb2 ~]# consul agent -config-dir=/etc/consul.d -enable-script-checks > /data/consul/consul.log &
[root@mydb3 ~]# consul agent -config-dir=/etc/consul.d -enable-script-checks > /data/consul/consul.log &
6. Check the cluster status
[root@kettle1 ~]# consul members -http-addr='192.168.1.121:8500'
Node Address Status Type Build Protocol DC Segment
wsh 192.168.1.121:8301 alive server 1.2.3 2 dc1
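Once the clients are registered, the services and their health checks can be verified from any node; a sketch using the names defined above:
# passing instances of the write service - should be exactly the current primary
curl "http://192.168.1.121:8500/v1/health/service/w-test-3306-mydb-ser?passing"
# the read service resolves to the healthy secondaries via Consul DNS
dig @192.168.1.121 -p 8600 r-test-3306-mydb-ser.service.consul +short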