Sunday, March 28, 2021

Pacemaker in CentOS 7 (no fencing)

Lab Specifications
==================

Host OS: Ubuntu 17.10 (artful)
 |_ Virtualization: VirtualBox 5.1.34_Ubuntu r121010 (Qt5.9.1)
      |_ Virtual Machine OS: CentOS Linux release 7.4.1708 (Core)

Setup
=====

1. Install packages (perform on all nodes)
[root@node1 ~]# yum install -y pcs pacemaker resource-agents
[...]
Complete
[root@node1 ~]#

2. Enable the pcsd service (perform on all nodes)
[root@node1 ~]# systemctl enable --now pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@node1 ~]#

3. Configure firewall (perform on all nodes)
[root@node1 ~]# firewall-cmd --add-service=high-availability --permanent
success
[root@node1 ~]# firewall-cmd --reload
success
[root@node1 ~]#
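
Optionally, confirm that the service was added to the active zone (the command below only lists
services, it changes nothing):

firewall-cmd --list-services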

4. Change "hacluster" password (perform on all nodes)
[root@node1 ~]# echo samplepass123 | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@node1 ~]#
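
If passwordless root SSH to the other nodes is already in place, the same password change can be
pushed from node1 with a small loop (a convenience sketch only; running the command on each node
by hand works just as well):

for n in node2 node3; do
    ssh "$n" 'echo samplepass123 | passwd --stdin hacluster'
done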

5. Set up cluster authentication (perform on node1 only)
[root@node1 ~]# pcs cluster auth node1 node2 node3 -u hacluster -p samplepass123 --force
node1: Authorized
node3: Authorized
node2: Authorized
[root@node1 ~]#
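
Note: pcs needs the node names to resolve on every host. If DNS is not available, /etc/hosts
entries along these lines will do (the addresses below are placeholders for this lab; use your own):

192.168.56.101   node1
192.168.56.102   node2
192.168.56.103   node3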

6. Create cluster and populate it with nodes (perform on node1 only)
[root@node1 ~]# pcs cluster setup --force --name mycluster node1 node2 node3                                                                                                                                                               
Destroying cluster on nodes: node1, node2, node3...     
node1: Stopping Cluster (pacemaker)...                   
node3: Stopping Cluster (pacemaker)...                   
node2: Stopping Cluster (pacemaker)...                   
node1: Successfully destroyed cluster                   
node3: Successfully destroyed cluster                   
node2: Successfully destroyed cluster                   

Sending 'pacemaker_remote authkey' to 'node1', 'node2', 'node3'                                                     
node1: successful distribution of the file 'pacemaker_remote authkey'                                               
node2: successful distribution of the file 'pacemaker_remote authkey'                                               
node3: successful distribution of the file 'pacemaker_remote authkey'                                               
Sending cluster config files to the nodes...             
node1: Succeeded                                         
node2: Succeeded                                         
node3: Succeeded                                         

Synchronizing pcsd certificates on nodes node1, node2, node3...                                                     
node1: Success                                           
node3: Success                                           
node2: Success                                           
Restarting pcsd on the nodes in order to reload the certificates...                                                                                                       
node1: Success                     
node3: Success                                           
node2: Success                                           
[root@node1 ~]#
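
This step generates /etc/corosync/corosync.conf on every node; it can be reviewed at any time, e.g.:

cat /etc/corosync/corosync.conf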

7. Start cluster (perform on node1 only)
[root@node1 ~]# pcs cluster start --all
node2: Starting Cluster...
node3: Starting Cluster...
node1: Starting Cluster...
[root@node1 ~]#
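
To double-check that all three nodes joined and the cluster has quorum, either of these can be run
(output omitted here):

pcs status
corosync-quorumtool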

8. Disable fencing (perform on node1 only)
[root@node1 ~]# pcs property set stonith-enabled=false
[root@node1 ~]#
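
The change can be verified by listing the cluster properties:

pcs property list

Keep in mind that disabling STONITH is only acceptable in a throwaway lab like this one; production
clusters need working fencing.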

9. For demo only, force services to move to another node after a single failure (perform on node1 only)
[root@node1 ~]# pcs resource defaults migration-threshold=1
[root@node1 ~]#
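
Running the same command with no arguments prints the currently configured resource defaults, so the
change can be confirmed:

pcs resource defaults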

10. Add a resource (perform on node1 only)
[root@node1 ~]# pcs resource create sample_service ocf:heartbeat:Dummy op monitor interval=120s
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Sun Apr 22 09:02:12 2018
Last change: Sun Apr 22 09:01:43 2018 by root via cibadmin on node1

3 nodes configured
1 resource configured

Online: [ node1 node2 node3 ]

Full list of resources:

 sample_service (ocf::heartbeat:Dummy): Started node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@node1 ~]#
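
The definition of the new resource itself can be inspected with:

pcs resource show sample_service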

11. Simulate a single failure (perform on node1 only)
[root@node1 ~]# crm_resource --resource sample_service --force-stop                                                 
Operation stop for sample_service (ocf:heartbeat:Dummy) returned 0
 >  stderr: DEBUG: sample_service stop : 0               
[root@node1 ~]#
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Sun Apr 22 09:06:32 2018
Last change: Sun Apr 22 09:01:43 2018 by root via cibadmin on node1

3 nodes configured
1 resource configured

Online: [ node1 node2 node3 ]

Full list of resources:

 sample_service (ocf::heartbeat:Dummy): Started node2

Failed Actions:
* sample_service_monitor_120000 on node1 'not running' (7): call=7, status=complete, exitreason='none',
    last-rc-change='Sun Apr 22 09:05:44 2018', queued=0ms, exec=0ms


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@node1 ~]#

Notice that "sample_service" was started on node2 after it failed on node1. This is a simple
demonstration of how high availability works in Pacemaker.
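
Because migration-threshold is set to 1, that single failure leaves a failcount on node1 and the
resource will stay on node2 until the failure is cleared. To reset it:

pcs resource cleanup sample_service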
