==================
Host OS: Ubuntu 17.10 (artful)
|_ Virtualization: VirtualBox 5.1.34_Ubuntu r121010 (Qt5.9.1)
|_ Virtual Machine OS: CentOS Linux release 7.4.1708 (Core)
Setup
=====
1. Install packages (perform on all nodes)
[root@node1 ~]# yum install -y pcs pacemaker resource-agents
[...]
Complete
[root@node1 ~]#
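Optionally, verify that the packages are in place on each node before continuing (corosync is pulled in automatically as a dependency of pacemaker); exact version strings will vary with your CentOS 7 point release:
[root@node1 ~]# rpm -q pcs pacemaker resource-agents corosync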
2. Enable and start the pcsd service (perform on all nodes)
[root@node1 ~]# systemctl enable --now pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@node1 ~]#
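Optionally, confirm that pcsd is really up and listening (it serves on TCP port 2224):
[root@node1 ~]# systemctl is-active pcsd
[root@node1 ~]# ss -tlnp | grep 2224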
3. Configure firewall (perform on all nodes)
[root@node1 ~]# firewall-cmd --add-service=high-availability --permanent
success
[root@node1 ~]# firewall-cmd --reload
success
[root@node1 ~]#
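For reference, the predefined "high-availability" firewalld service covers the ports the cluster stack needs (among them 2224/tcp for pcsd and the corosync ports). If you are curious which ports it opens on your firewalld version, inspect the service definition; treat the exact list as version-dependent:
[root@node1 ~]# firewall-cmd --info-service=high-availability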
4. Change "hacluster" password (perform on all nodes)
[root@node1 ~]# echo samplepass123 | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@node1 ~]#
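Note that piping the password through echo leaves it in the shell history. If that is a concern in your environment, run passwd interactively on each node instead:
[root@node1 ~]# passwd hacluster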
5. Set up cluster authentication (perform on node1 only)
[root@node1 ~]# pcs cluster auth node1 node2 node3 -u hacluster -p samplepass123 --force
node1: Authorized
node3: Authorized
node2: Authorized
[root@node1 ~]#
6. Create the cluster and populate it with nodes (perform on node1 only)
[root@node1 ~]# pcs cluster setup --force --name mycluster node1 node2 node3
Destroying cluster on nodes: node1, node2, node3...
node1: Stopping Cluster (pacemaker)...
node3: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node3: Successfully destroyed cluster
node2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node1', 'node2', 'node3'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
node3: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded
node3: Succeeded
Synchronizing pcsd certificates on nodes node1, node2, node3...
node1: Success
node3: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node3: Success
node2: Success
[root@node1 ~]#
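The setup command generates the corosync configuration and distributes it to every node. If you want to see exactly what was produced (cluster name, node list, quorum settings), the file lives at /etc/corosync/corosync.conf, though its contents vary with your pcs and corosync versions:
[root@node1 ~]# cat /etc/corosync/corosync.conf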
7. Start cluster (perform on node1 only)
[root@node1 ~]# pcs cluster start --all
node2: Starting Cluster...
node3: Starting Cluster...
node1: Starting Cluster...
[root@node1 ~]#
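At this point all three nodes should be cluster members. Two quick ways to confirm (output omitted here, as it depends on your ring addresses):
[root@node1 ~]# pcs status corosync
[root@node1 ~]# corosync-cfgtool -s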
8. Disable fencing (perform on node1 only)
[root@node1 ~]# pcs property set stonith-enabled=false
[root@node1 ~]#
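Disabling fencing is fine for a disposable demo like this one, but a real cluster needs a working STONITH device. To confirm the property was applied:
[root@node1 ~]# pcs property show stonith-enabled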
9. For demo purposes only, force services to move to another node after a single failure (perform on node1 only)
[root@node1 ~]# pcs resource defaults migration-threshold=1
[root@node1 ~]#
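By default Pacemaker does not ban a resource from a node based on its failure count; setting migration-threshold=1 makes a single failure enough to push the resource elsewhere, which makes the failover in step 11 easy to observe. To review the resource defaults currently in effect:
[root@node1 ~]# pcs resource defaults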
10. Add a resource (perform on node1 only)
[root@node1 ~]# pcs resource create sample_service ocf:heartbeat:Dummy op monitor interval=120s
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Sun Apr 22 09:02:12 2018
Last change: Sun Apr 22 09:01:43 2018 by root via cibadmin on node1
3 nodes configured
1 resource configured
Online: [ node1 node2 node3 ]
Full list of resources:
sample_service (ocf::heartbeat:Dummy): Started node1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node1 ~]#
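ocf:heartbeat:Dummy is a no-op resource agent that does nothing beyond keeping a small state file, which makes it convenient for a failover demo. To inspect the resource's configured operations (including the 120s monitor) on this pcs version:
[root@node1 ~]# pcs resource show sample_service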
11. Simulate a single failure (perform on node1 only)
[root@node1 ~]# crm_resource --resource sample_service --force-stop
Operation stop for sample_service (ocf:heartbeat:Dummy) returned 0
> stderr: DEBUG: sample_service stop : 0
[root@node1 ~]#
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Sun Apr 22 09:06:32 2018
Last change: Sun Apr 22 09:01:43 2018 by root via cibadmin on node1
3 nodes configured
1 resource configured
Online: [ node1 node2 node3 ]
Full list of resources:
sample_service (ocf::heartbeat:Dummy): Started node2
Failed Actions:
* sample_service_monitor_120000 on node1 'not running' (7): call=7, status=complete, exitreason='none',
last-rc-change='Sun Apr 22 09:05:44 2018', queued=0ms, exec=0ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@node1 ~]#
Notice that "sample_service" was started on node2 after it failed on node1. This
is a simple demonstration of how high availability works in Pacemaker.
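The failed monitor on node1 stays recorded in the cluster status, and with migration-threshold=1 it keeps sample_service banned from node1 until it is cleared. Once you are done inspecting, you can clear the failure (and make node1 eligible to host the resource again) with:
[root@node1 ~]# pcs resource cleanup sample_service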