Wednesday, March 31, 2021

Pacemaker/Corosync on Ubuntu

One common way of achieving a High-Availability setup is by installing Pacemaker
and Corosync on your nodes. Pacemaker controls and manages the resources, and it
depends on Corosync, which handles the communication between nodes.

We will configure an active/passive setup on 2 nodes running Ubuntu Server
17.04 and mimic a real-world scenario in which 1 node goes down. Do note that
most of the commands below need to be executed on both nodes unless the task is
explicitly designated for one (or any) node only.

Configuring the cluster
=======================

1. In any clustering software, time is a critical factor in ensuring
synchronization between the nodes, so let's configure our time/date settings
properly by installing ntp. After installation, wait a few minutes and check
that the date and time are correct.

sudo apt-get install ntp
sudo systemctl start ntp
sudo systemctl enable ntp
[...]
date
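
If you want a bit more assurance than eyeballing `date`, you can also ask the
NTP daemon whether it is actually syncing against its peers. This is just an
optional sanity check; the peer list will differ per setup.

ntpq -p
timedatectl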

2. Install our cluster suite. In Ubuntu 17.04, which is the OS of our choice,
"corosync" and the other required packages can be installed by just installing
"pacemaker". After installation, make sure the required services are running and
enabled at boot.

sudo apt-get install pacemaker
sudo systemctl start pacemaker corosync
sudo systemctl enable pacemaker corosync

3. (Do this on one node only) Corosync requires an authkey (authorization key)
to be present on all members of the cluster. To create one, install an entropy
package, generate the authkey, and send it to all members of the cluster. In our
case, we will send it to node2. The generated authkey is located in
/etc/corosync/authkey.

sudo apt-get install haveged
sudo corosync-keygen
scp /etc/corosync/authkey node2:/etc/corosync/authkey

4. Back up the default corosync.conf and replace its contents with the config
below. The important items here are the 3 IP addresses - the IP of each node and
the bind IP which will be used by the cluster itself. You must decide which bind
IP to use. Just make sure that it is not used by any host on your network.
Later, you will see that it will be brought up automatically when we start our
cluster.

sudo cp /etc/corosync/corosync.conf /etc/corosync/corosync.conf.orig
sudo vi /etc/corosync/corosync.conf

--- START ---
totem {
  version: 2
  cluster_name: lbcluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: <put bind IP here>
    broadcast: yes
    mcastport: 5405
  }
}
quorum {
  provider: corosync_votequorum
  two_node: 1
}
nodelist {
  node {
    ring0_addr: <put node1 IP>
    name: primary
    nodeid: 1
  }
  node {
    ring0_addr: <put node2 IP>
    name: secondary
    nodeid: 2
  }
}
logging {
  to_logfile: yes
  logfile: /var/log/corosync/corosync.log
  to_syslog: yes
  timestamp: on
}
--- END ---

5. Now, we need to allow the pacemaker service in corosync. We do that by
creating the service directory and a "pcmk" file inside it. We also need to add
one more setting to the default file.

sudo mkdir /etc/corosync/service.d
sudo vi /etc/corosync/service.d/pcmk

--- START ---
service {
  name: pacemaker
  ver: 1
}
--- END ---

sudo echo "START=yes" >> /etc/default/corosync

6. Let's restart corosync and pacemaker to pick up the configuration we made.

sudo systemctl restart corosync pacemaker

7. Let's verify that the cluster honored the node IPs. You should get output
similar to the one below where both node IPs are detected.

sudo corosync-cmapctl | grep members

* sample output *

runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.0.1)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.0.2)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined

8. We can now interact with pacemaker and see the status of our cluster using
the "crm" command. In the output below, you will see that both nodes (primary
and secondary) are online but we still don't have a resource. In the next steps,
we will create a virtual IP resource.

crm status

* sample output *

Stack: corosync
Current DC: primary (version 1.1.16-94ff4df) - partition with quorum
Last updated: Thu Dec 28 20:13:19 2017
Last change: Thu Dec 28 20:10:30 2017 by hacluster via crmd on primary

3 nodes configured
0 resources configured

Node node1: UNCLEAN (offline)
Online: [ primary secondary ]

No resources

9. (Do this on one node only) Before creating our virtual IP resource, let's
disable the quorum and fencing settings for simplicity. Whenever we configure
any property, we can do it on one node only since it will be synchronized to all
members.

sudo crm configure property stonith-enabled=false
sudo crm configure property no-quorum-policy=ignore

10. (Do this on one node only) Let's create our first resource using the command
below. This will be a virtual IP (or the bind IP) that will represent our
cluster. Meaning, access to the cluster must be done via this IP and not via the
individual IPs of the nodes. Since we are aiming for an active/passive setup, it
is better to add `resource-stickiness="100"` as one of the parameters. That
means that when one node goes offline, the other node will take the bind IP and
keep it from that moment on, even after the first node comes back to life. Be
sure to set this IP to the same value as `bindnetaddr` inside corosync.conf.

sudo crm configure primitive virtual_ip \
ocf:heartbeat:IPaddr2 params ip="10.1.1.3" \
cidr_netmask="32" op monitor interval="10s" \
meta migration-threshold="2" failure-timeout="60s" \
resource-stickiness="100"

As with configuring a cluster property, the command above needs to be run on one
node only since it will be synchronized across the members.

11. Once a resource is created, it will immediately appear in the status. Let's
verify. From the output below, you can see that the virtual_ip resource is
started on the primary, which refers to node1. So if you log in to that node and
inspect the network interfaces, you should see the bind IP attached to one of
them. Also, at this moment, that bind IP is already UP and pingable.

sudo crm status

* sample output *

Stack: corosync                                                                 
Current DC: primary (version 1.1.16-94ff4df) - partition with quorum
Last updated: Thu Dec 28 20:33:39 2017     
Last change: Thu Dec 28 20:32:54 2017 by root via cibadmin on primary
                                           
3 nodes configured   
1 resource configured
                             
Online: [ primary secondary ]           
                 
Full list of resources:
                                       
 virtual_ip     (ocf::heartbeat:IPaddr2):       Started primary
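
If you want to confirm this yourself on the primary node, a quick look at the
interfaces and a ping against the bind IP (10.1.1.3 in the example above) should
both succeed. These are just optional checks:

ip addr show
ping -c 3 10.1.1.3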

Testing High-Availability
=========================

Now that we have a fully working cluster, the best way to appreciate its magic
is by testing!

1. (Do this on node1 only) Let's replicate a real-world scenario where the
primary node goes down. What will happen to the cluster? Will the bind IP go
down also? You may mimic such a scenario by disconnecting the interface or
powering off the server, but for a quicker way, let's use our favorite "crm"
command to switch the primary node into "standby" mode.

sudo crm node standby primary

* sample output *

Stack: corosync
Current DC: primary (version 1.1.16-94ff4df) - partition with quorum
Last updated: Thu Dec 28 20:44:45 2017
Last change: Thu Dec 28 20:40:02 2017 by root via crm_attribute on primary

3 nodes configured
1 resource configured

Node primary: standby
Online: [ secondary ]

Full list of resources:

 virtual_ip     (ocf::heartbeat:IPaddr2):       Started secondary

If you started a continuous ping to the bind IP before this step, you will
notice a 1 - 5 second pause. Our HA is doing its magic. It is moving the bind IP
from the primary (node1) to the secondary (node2). And when you log in to the
secondary, you will see that the bind IP is now attached to an interface there.
The primary node no longer has it.
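
A simple way to watch this happen in real time is to keep a continuous ping
running against the bind IP from another machine and to monitor the cluster from
the surviving node. These are only observation commands; nothing here changes
the cluster:

# from another host, before putting the node in standby
ping 10.1.1.3
# on the secondary, live view of the cluster (Ctrl+C to exit)
sudo crm_mon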

2. Now, let's remove the primary from standby mode.

sudo crm node online primary

* sample output *

Stack: corosync
Current DC: primary (version 1.1.16-94ff4df) - partition with quorum
Last updated: Thu Dec 28 20:48:28 2017
Last change: Thu Dec 28 20:48:25 2017 by root via crm_attribute on primary

3 nodes configured
1 resource configured

Online: [ primary secondary ]

Full list of resources:

 virtual_ip     (ocf::heartbeat:IPaddr2):       Started secondary

Both nodes are now online but the bind IP is still on the secondary since we
added `resource-stickiness="100"` as one of the parameters when we configured
our resource.

That concludes our post for today. Hope you learned something! :)

Tuesday, March 30, 2021

Single-node K8 Cluster from Scratch (Centos)

Docker is the new way of deploying your applications. Since more and more people
are using it, several orchestration tools have been published to manage it, such
as Docker Swarm, Cattle, and Kubernetes.

In this post, we will set up a CentOS 7.3 VM running in VirtualBox that will act
as both the master and the node, to understand the basics of how each piece of
Kubernetes works. In production scenarios, your node must be a different machine
from your master. We will do a 1-node setup for simplicity.

We will do this from scratch without using the official RPM installers from
Kubernetes and will use the latest tarball versions, which are v1.9.2 for
Kubernetes and v3.2.6 for Etcd at the time of this writing.

This assumes that you have a basic understanding of how Docker containers work.

Before proceeding, here is a summary of versions used in this tutorial.

Host OS: Ubuntu 17.04 (Zesty)
 |_ Virtualization: VirtualBox 5.2.4 r119785 (Qt5.7.1)
      |_ Virtual Machine OS: CentOS Linux release 7.3.1611 (Core)
           |_ Kubernetes: 1.9.2
           |_ ETCD: 3.2.6
           |_ Docker: 1.12.6

1. First, download the kubernetes and etcd tarballs. Kubernetes is the
orchestration tool and Etcd is the key-value store database where the whole
cluster state will be stored.

[root@vm01 ~]# wget https://dl.k8s.io/v1.9.2/kubernetes-server-linux-amd64.tar.gz
<output truncated>
[root@vm01 ~]# wget https://github.com/coreos/etcd/releases/download/v3.2.6/etcd-v3.2.6-linux-amd64.tar.gz
<output truncated>
[root@vm01 ~]#

2. Disable swap on your machine. Kubernetes doesn't want swap for performance
reasons - swap is slow and containers running purely in memory are faster.

[root@vm01 ~]# swapoff -a
[root@vm01 ~]# cp /etc/fstab /etc/fstab.orig
[root@vm01 ~]# sed -i 's/.*swap.*//g' /etc/fstab
[root@vm01 ~]# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Fri Aug  4 14:30:07 2017
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/cl-root     /                       xfs     defaults        0 0
UUID=bfc05119-977b-4f3f-a260-5e548d5cdd88 /boot                   xfs     defaults        0 0

[root@vm01 ~]#
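
To double-check that nothing is swapping anymore, the usual tools should report
zero swap in use (optional sanity check):

free -m
swapon -s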

3. Since our VM will also act as the node, we must install docker on it. In a
multi-node setup, installing docker on the master is not required.

[root@vm01 ~]# yum install -y docker
<output truncated>
[root@vm01 ~]# systemctl enable --now docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@vm01 ~]#

4. Unpack the Kubernetes tarball. This includes all binary files needed to run
and manage the whole cluster.

[root@vm01 ~]# tar xvf kubernetes-server-linux-amd64.tar.gz
<output truncated>
[root@vm01 ~]# cp kubernetes/server/bin/* /usr/local/bin/

5. Unpack the etcd tarball. Etcd is the main area where we will store all
information about our cluster. That includes configuration and network settings.

[root@vm01 ~]# tar xvf etcd-v3.2.6-linux-amd64.tar.gz
<output truncated>
[root@vm01 ~]# cp etcd-v3.2.6-linux-amd64/etcd* /usr/local/bin/
[root@vm01 ~]#

6. Start "Etcd". As per my understanding on etcd's help page, the listen client
urls is the url where etcd will listen for client traffic while the advertise
url is the one that will be exposed to clients. Actually I'm still confused
with this hehe. You can verify the status of etcd using `etcdctl` command.

[root@vm01 ~]# etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://localhost:2379 &> /tmp/etcd.log &
[root@vm01 ~]# etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from http://192.168.1.111:2379
cluster is healthy
[root@vm01 ~]#
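
Since etcd is just a key-value store, another way to verify it end to end is to
write and read back a throwaway key (the key name below is arbitrary; this is
only a sanity check):

etcdctl set /sanity/hello world
etcdctl get /sanity/hello
etcdctl rm /sanity/hello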

7. In order to talk to etcd, we need to launch the "Kubernetes Apiserver". This
is the only component we can use to retrieve information from and put
information into the database. Wait a few seconds for the apiserver to start up,
or tail the logs. Once started, you can use curl to verify that you can talk to
the API. The default API port is 8080.

[root@vm01 ~]# kube-apiserver --etcd-servers=http://localhost:2379 --service-cluster-ip-range=10.0.0.0/16 --bind-address=0.0.0.0 --insecure-bind-address=0.0.0.0 &> /tmp/apiserver.log &
[root@vm01 ~]# curl http://localhost:8080/api/
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "192.168.1.111:6443"
    }
  ]
}[root@vm01 ~]#
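
Besides /api, the API server also exposes a simple health endpoint on the same
insecure port; it should return "ok" once everything is up. Another optional
check:

curl http://localhost:8080/healthz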

8. Launch "Kubelet" and create manifest directory. This process is the one that
interacts with docker daemon and api server. It also watches for pods to
create by looking inside the manifest location. Prior to that, we also need to
specify a kubelet configuration in yaml format to point kubelet to the url of
our api server.

[root@vm01 ~]# mkdir /tmp/manifests
[root@vm01 ~]# mkdir -p /var/lib/kubelet
[root@vm01 ~]# cat << EOF > /var/lib/kubelet/kubeconfig
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    server: http://localhost:8080
users:
- name: kubelet
contexts:
- context:
    cluster: local
    user: kubelet
  name: kubelet-context
current-context: kubelet-context
EOF
[root@vm01 ~]# kubelet --kubeconfig /var/lib/kubelet/kubeconfig --require-kubeconfig --pod-manifest-path /tmp/manifests --cgroup-driver=systemd --kubelet-cgroups=/systemd/system.slice --runtime-cgroups=/etc/systemd/system.slice &> /tmp/kubelet.log &
[root@vm01 ~]#

9. Now that we have a kubelet running, we can create a manifest and let kubelet
pick it up and create a pod.

[root@vm01 ~]# cat << EOF > /tmp/manifests/nginx-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOF
[root@vm01 ~]#
[root@vm01 ~]#
[root@vm01 ~]# kubectl get pods
NAME         READY     STATUS              RESTARTS   AGE
nginx-vm01   0/1       ContainerCreating   0          3m
[root@vm01 ~]#

There is a minor issue that I encountered here - the pod stays in that status,
and I found out in kubelet.log that there is an issue pulling the image from
google's registry (gcr.io).

[root@vm01 ~]# tail -f /tmp/kubelet.log
I0120 20:42:20.668948    3813 kubelet.go:1767] Starting kubelet main sync loop.
I0120 20:42:20.668994    3813 kubelet.go:1778] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
I0120 20:42:20.669202    3813 server.go:129] Starting to listen on 0.0.0.0:10250
I0120 20:42:20.670390    3813 server.go:299] Adding debug handlers to kubelet server.
F0120 20:42:20.671438    3813 server.go:141] listen tcp 0.0.0.0:10250: bind: address already in use
E0120 20:42:33.715870    3643 kube_docker_client.go:341] Cancel pulling image "gcr.io/google_containers/pause-amd64:3.0" because of no progress for 1m0s, latest progress: "Trying to pull repository gcr.io/google_containers/pause-amd64 ... "
E0120 20:42:33.717152    3643 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "gcr.io/google_containers/pause-amd64:3.0": context canceled
E0120 20:42:33.717288    3643 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nginx-vm01_default(9960993506b2e8bf46ae1eb7b1da0edf)" failed: rpc error: code = Unknown desc = failed pulling image "gcr.io/google_containers/pause-amd64:3.0": context canceled
E0120 20:42:33.717333    3643 kuberuntime_manager.go:647] createPodSandbox for pod "nginx-vm01_default(9960993506b2e8bf46ae1eb7b1da0edf)" failed: rpc error: code = Unknown desc = failed pulling image "gcr.io/google_containers/pause-amd64:3.0": context canceled
E0120 20:42:33.718919    3643 pod_workers.go:186] Error syncing pod 9960993506b2e8bf46ae1eb7b1da0edf ("nginx-vm01_default(9960993506b2e8bf46ae1eb7b1da0edf)"), skipping: failed to "CreatePodSandbox" for "nginx-vm01_default(9960993506b2e8bf46ae1eb7b1da0edf)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nginx-vm01_default(9960993506b2e8bf46ae1eb7b1da0edf)\" failed: rpc error: code = Unknown desc = failed pulling image \"gcr.io/google_containers/pause-amd64:3.0\": context canceled"
[root@vm01 ~]#

So I tried pulling it myself and it worked fine.

[root@vm01 ~]# docker pull gcr.io/google_containers/pause-amd64:3.0
Trying to pull repository gcr.io/google_containers/pause-amd64 ...
3.0: Pulling from gcr.io/google_containers/pause-amd64
a3ed95caeb02: Pull complete
f11233434377: Pull complete
Digest: sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516
[root@vm01 ~]#

Then after a few minutes, the error no longer appeared in the logs and the pod
was successfully created. So as a workaround, we need to pre-pull that image. I
haven't seen any issue similar to this on the internet.

[root@vm01 ~]# docker ps
CONTAINER ID        IMAGE                                                                                     COMMAND                  CREATED             STATUS              PORTS               NAMES
f0439466926c        docker.io/nginx@sha256:285b49d42c703fdf257d1e2422765c4ba9d3e37768d6ea83d7fe2043dad6e63d   "nginx -g 'daemon off"   41 minutes ago      Up 41 minutes                           k8s_nginx_nginx-vm01_default_9960993506b2e8bf46ae1eb7b1da0edf_0
e053d46f5b68        gcr.io/google_containers/pause-amd64:3.0                                                  "/pause"                 46 minutes ago      Up 46 minutes                           k8s_POD_nginx-vm01_default_9960993506b2e8bf46ae1eb7b1da0edf_0
[root@vm01 ~]#
[root@vm01 ~]# kubectl get pods -o wide
NAME         READY     STATUS    RESTARTS   AGE       IP           NODE
nginx-vm01   1/1       Running   0          58m       172.17.0.2   vm01
[root@vm01 ~]#

You will notice that there are 2 running containers that were created from the
manifest we dropped. The first one is for nginx itself and the other is the
"pause" container. The "pause" container is the one that holds the IP address
for the other containers in the pod. It is the infrastructure container that is
created first when creating a pod.
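
One way to convince yourself of that relationship is to inspect the network mode
of the nginx container - it should point to the ID of the pause container,
meaning both share the same network namespace (the container ID below is taken
from my output above; yours will differ):

docker inspect -f '{{ .HostConfig.NetworkMode }}' f0439466926c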

To verify that the created pod is working, you should be able to see its content
from inside the node/master using the IP.

[root@vm01 ~]# curl http://172.17.0.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@vm01 ~]#

10. Start "Kubernetes Scheduler". It is responsible for assigning pods to nodes.

[root@vm01 ~]# kube-scheduler --master=http://localhost:8080 &> /tmp/kube-scheduler.log &
[root@vm01 ~]#

10. Start "Kubernetes Controller Manager". It is responsible for managing
"Replica Sets" and "Replication Controllers". This is also required so we can
create deployments. Deployments are the rules that defined how to start pods and
how many replicas needs to be started. If deployment is created and you deleted
some replicas, those will be recreated based from deployment's rules.

[root@vm01 ~]# kube-controller-manager --master=http://localhost:8080 &> /tmp/kube-controller-manager.log &
[root@vm01 ~]# cat << EOF > nginx-deployment.yml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
[root@vm01 ~]#
[root@vm01 ~]# kubectl create -f nginx-deployment.yml
deployment "nginx" created
[root@vm01 ~]#
[root@vm01 ~]# kubectl get deployments
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx     3         3         3            3           2m
[root@vm01 ~]#

At this point, we have 3 nginx pods that were created by our deployment and 1
nginx pod that was created via the manifest we dropped.

[root@vm01 ~]# kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP           NODE
nginx-7587c6fdb6-7c2xk   1/1       Running   0          5m        172.17.0.5   vm01
nginx-7587c6fdb6-d467f   1/1       Running   0          5m        172.17.0.3   vm01
nginx-7587c6fdb6-ls2f4   1/1       Running   0          5m        172.17.0.4   vm01
nginx-vm01               1/1       Running   0          1h        172.17.0.2   vm01
[root@vm01 ~]#

The correct way of producing pods is via a deployment because that provides the
self-healing mechanism of Kubernetes. We just created a pod manually for
demonstration purposes. If we delete the "nginx-vm01" pod, it will not be
recovered, whereas the "nginx-XXXX" pods will be recreated when deleted.
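
To see the self-healing in action, you can delete one of the deployment-managed
pods (the pod name below is taken from the listing above; yours will have a
different suffix) and list the pods again after a few seconds - a replacement
should appear:

kubectl delete pod nginx-7587c6fdb6-7c2xk
kubectl get pods -o wide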

11. Start "Kubernetes Proxy". This enables us to create a "Kubernetes Service"
which will allow our pods to be accesible outside the cluster. Once the proxy
is started, let's create a simple service from a yaml file.

[root@vm01 ~]# kube-proxy --master=http://localhost:8080 &> /tmp/kube-proxy.log &
[root@vm01 ~]# cat << EOF > nginx-svc.yml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    run: nginx
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    nodePort: 30073
  selector:
    run: nginx
EOF
[root@vm01 ~]# kubectl create -f nginx-svc.yml
service "nginx" created
[root@vm01 ~]#

We can now see the nginx service mapping port 80 (on the container) to port
30073 (on the node).

[root@vm01 ~]# kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.0.0.1       <none>        443/TCP        1h
nginx        NodePort    10.0.207.211   <none>        80:30073/TCP   2s
[root@vm01 ~]#

We should be able to access the pod outside the cluster now.
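
For example, hitting the NodePort from the host machine (assuming the VM IP
192.168.1.111 shown earlier by the API server) should return the same nginx
welcome page:

curl http://192.168.1.111:30073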



Notice that we didn't create systemd unit files for the services, for
simplicity. In future posts, I will include systemd unit files so we can run the
services at boot.

So that wraps up our very simple setup. In future posts, we will see how
networking in Kubernetes comes into play when multiple nodes are talking to
each other. That needs some type of SDN (Software Defined Network) solution
like Flannel or Weave, typically plugged in via CNI.

Monday, March 29, 2021

Multi-node K8 cluster (Fedora server)

In my previous post, we set up a single-node cluster from scratch. Let's now
try to make a real cluster with 3 nodes (1 master, 2 minions).

Let's use Fedora 27 Server instead of CentOS to make the setup a bit different
than usual, and let's make our life easier by installing the packages from the
DNF repo instead of installing from tarballs.

Here is a summary of the versions we will use in this tutorial:

Host OS: Ubuntu 17.04 (Zesty)
 |_ Virtualization: VirtualBox 5.2.4 r119785 (Qt5.7.1)
      |_ Virtual Machine OS: Fedora Server 27
           |_ Kubernetes: 1.7.3
           |_ ETCD: 3.2.7
           |_ Flannel: 0.7.0
           |_ Docker: 1.13.1

Preparation
===========

1. Make sure all nodes can ping and resolve each other's hostnames.

2. Internet connectivity is required to download the packages.

3. If you have a firewall enabled, turn it off on all nodes. If left enabled, it
might cause issues. For example, it will prevent flannel from routing the
packets properly. In short, pods will not be pingable from other pods.
systemctl disable --now firewalld
systemctl mask firewalld

Setup the master
================

1. Install ETCD. This will store all information about your cluster.
[root@master ~]# dnf install -y etcd
[...]
Installed:
  etcd.x86_64 3.2.7-1.fc27

Complete!
[root@master ~]#

2. Update etcd configuration.
[root@master ~]# cat << EOF > /etc/etcd/etcd.conf
> ETCD_NAME=default
> ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
> ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
> ETCD_ADVERTISE_CLIENT_URLS="http://master:2379"
> EOF
[root@master ~]#

3. Start and enable ETCD.
[root@master ~]# systemctl enable --now etcd
Created symlink /etc/systemd/system/multi-user.target.wants/etcd.service → /usr/lib/systemd/system/etcd.service.
[root@master ~]#

4. Verify etcd is running healthy before proceeding on the next step.
[root@master ~]# etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from http://master:2379
cluster is healthy
[root@master ~]#

5. Install kubernetes package. It enables us to configure kube-apiserver,
kube-scheduler, and kube-controller-manager.
[root@master ~]# dnf install -y kubernetes
[...]

Installed:
  kubernetes.x86_64 1.7.3-1.fc27                                         criu.x86_64 3.6-1.fc27                                     oci-register-machine.x86_64 0-5.12.git3c01f0b.fc27                   
  oci-systemd-hook.x86_64 1:0.1.15-1.git2d0b8a3.fc27                     atomic-registries.x86_64 1.20.1-9.fc27                     audit-libs-python3.x86_64 2.7.8-1.fc27                               
  checkpolicy.x86_64 2.7-2.fc27                                          conntrack-tools.x86_64 1.4.4-5.fc27                        container-selinux.noarch 2:2.42-1.fc27                               
  container-storage-setup.noarch 0.8.0-2.git1d27ecf.fc27                 docker.x86_64 2:1.13.1-44.git584d391.fc27                  docker-common.x86_64 2:1.13.1-44.git584d391.fc27                     
  docker-rhel-push-plugin.x86_64 2:1.13.1-44.git584d391.fc27             kubernetes-client.x86_64 1.7.3-1.fc27                      kubernetes-master.x86_64 1.7.3-1.fc27                                
  kubernetes-node.x86_64 1.7.3-1.fc27                                    libcgroup.x86_64 0.41-13.fc27                              libnet.x86_64 1.1.6-14.fc27                                          
  libnetfilter_cthelper.x86_64 1.0.0-12.fc27                             libnetfilter_cttimeout.x86_64 1.0.0-10.fc27                libnetfilter_queue.x86_64 1.0.2-10.fc27                              
  libsemanage-python3.x86_64 2.7-1.fc27                                  libyaml.x86_64 0.1.7-4.fc27                                oci-umount.x86_64 2:2.3.2-1.git3025b19.fc27                          
  policycoreutils-python-utils.x86_64 2.7-1.fc27                         policycoreutils-python3.x86_64 2.7-1.fc27                  protobuf-c.x86_64 1.2.1-7.fc27                                       
  python3-PyYAML.x86_64 3.12-5.fc27                                      python3-pytoml.noarch 0.1.14-2.git7dea353.fc27             setools-python3.x86_64 4.1.1-3.fc27                                  
  skopeo-containers.x86_64 0.1.27-1.git93876ac.fc27                      socat.x86_64 1.7.3.2-4.fc27                                subscription-manager-rhsm-certificates.x86_64 1.21.1-1.fc27          
  systemd-container.x86_64 234-8.fc27                                    yajl.x86_64 2.1.0-8.fc27                                

Complete!
[root@master ~]#

6. Update the kubernetes system config /etc/kubernetes/config to point to the
master. This must be the same on all nodes, so we will do this on the minions
later as well.
KUBE_MASTER="--master=http://master:8080"
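
One way to apply that change without opening an editor is a quick sed over the
file, assuming the stock config shipped by the package already contains a
KUBE_MASTER line:

sed -i 's|^KUBE_MASTER=.*|KUBE_MASTER="--master=http://master:8080"|' /etc/kubernetes/config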

7. Configure the API server. This should be done on master only.
[root@master ~]# cat << EOF > /etc/kubernetes/apiserver
> KUBE_API_ADDRESS="--address=0.0.0.0"
> KUBE_ETCD_SERVERS="--etcd-servers=http://127.0.0.1:2379"
> KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"
> KUBE_API_ARGS=""
> EOF
[root@master ~]#

8. Start and enable the 3 kubernetes services.
[root@master ~]# systemctl enable --now kube-apiserver
Created symlink /etc/systemd/system/multi-user.target.wants/kube-apiserver.service → /usr/lib/systemd/system/kube-apiserver.service.
[root@master ~]# systemctl enable --now kube-scheduler
Created symlink /etc/systemd/system/multi-user.target.wants/kube-scheduler.service → /usr/lib/systemd/system/kube-scheduler.service.
[root@master ~]# systemctl enable --now kube-controller-manager
Created symlink /etc/systemd/system/multi-user.target.wants/kube-controller-manager.service → /usr/lib/systemd/system/kube-controller-manager.service.
[root@master ~]#

9. Verify that the API server is accessible by doing a curl on the endpoint
from all nodes. You should get output similar to the one below. Let's try
executing it from one of the minions.
[root@node1 ~]# curl http://master:8080/api
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "192.168.1.45:6443"
    }
  ]
}[root@node1 ~]#

10. Create a flannel network by generating it from a json file and storing it in
ETCD.
[root@master ~]# cat flannel-config.json
{
    "Network": "18.16.0.0/16",
    "SubnetLen": 24,
    "Backend": {
        "Type": "vxlan",
        "VNI": 1
     }
}
[root@master ~]#
[root@master ~]# etcdctl set /coreos.com/network/config < flannel-config.json
{
    "Network": "18.16.0.0/16",
    "SubnetLen": 24,
    "Backend": {
        "Type": "vxlan",
        "VNI": 1
     }
}

[root@master ~]#

11. Verify that the key exists in ETCD.
[root@master ~]# etcdctl get /coreos.com/network/config
{
    "Network": "18.16.0.0/16",
    "SubnetLen": 24,
    "Backend": {
        "Type": "vxlan",
        "VNI": 1
     }
}

[root@master ~]#

Setup the minions
=================

All the steps below need to be done on all minions (node1 and node2).

1. Install kubernetes package. It enables us to configure kube-proxy, kubelet,
and docker on the minions.
[root@node1 ~]# dnf install -y kubernetes
[...]

Installed:
  kubernetes.x86_64 1.7.3-1.fc27                                         criu.x86_64 3.6-1.fc27                                     oci-register-machine.x86_64 0-5.12.git3c01f0b.fc27                   
  oci-systemd-hook.x86_64 1:0.1.15-1.git2d0b8a3.fc27                     atomic-registries.x86_64 1.20.1-9.fc27                     audit-libs-python3.x86_64 2.7.8-1.fc27                               
  checkpolicy.x86_64 2.7-2.fc27                                          conntrack-tools.x86_64 1.4.4-5.fc27                        container-selinux.noarch 2:2.42-1.fc27                               
  container-storage-setup.noarch 0.8.0-2.git1d27ecf.fc27                 docker.x86_64 2:1.13.1-44.git584d391.fc27                  docker-common.x86_64 2:1.13.1-44.git584d391.fc27                     
  docker-rhel-push-plugin.x86_64 2:1.13.1-44.git584d391.fc27             kubernetes-client.x86_64 1.7.3-1.fc27                      kubernetes-master.x86_64 1.7.3-1.fc27                                
  kubernetes-node.x86_64 1.7.3-1.fc27                                    libcgroup.x86_64 0.41-13.fc27                              libnet.x86_64 1.1.6-14.fc27                                          
  libnetfilter_cthelper.x86_64 1.0.0-12.fc27                             libnetfilter_cttimeout.x86_64 1.0.0-10.fc27                libnetfilter_queue.x86_64 1.0.2-10.fc27                              
  libsemanage-python3.x86_64 2.7-1.fc27                                  libyaml.x86_64 0.1.7-4.fc27                                oci-umount.x86_64 2:2.3.2-1.git3025b19.fc27                          
  policycoreutils-python-utils.x86_64 2.7-1.fc27                         policycoreutils-python3.x86_64 2.7-1.fc27                  protobuf-c.x86_64 1.2.1-7.fc27                                       
  python3-PyYAML.x86_64 3.12-5.fc27                                      python3-pytoml.noarch 0.1.14-2.git7dea353.fc27             setools-python3.x86_64 4.1.1-3.fc27                                  
  skopeo-containers.x86_64 0.1.27-1.git93876ac.fc27                      socat.x86_64 1.7.3.2-4.fc27                                subscription-manager-rhsm-certificates.x86_64 1.21.1-1.fc27          
  systemd-container.x86_64 234-8.fc27                                    yajl.x86_64 2.1.0-8.fc27                                

Complete!
[root@node1 ~]#

2. Update /etc/kubernetes/config to point to master.
KUBE_MASTER="--master=http://master:8080"

3. Update the kubelet config so we can register the minion. Make sure to set
--hostname-override correctly for each minion.
[root@node1 ~]# cat << EOF >> /etc/kubernetes/kubelet                                                                                                                                                     
> KUBELET_ADDRESS="--address=0.0.0.0"                                                                                                                                  
> KUBELET_HOSTNAME="--hostname-override=node1"                                                                                                                                                            
> KUBELET_API_SERVER="--api-servers=http://master:8080"                                                                                                                                                   
> EOF                                                                                                                                                                                                     
[root@node1 ~]#

4. Start and enable the kubelet service. This registers our minions.
[root@node1 ~]# systemctl enable --now kubelet                                                                                                                                                            
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /usr/lib/systemd/system/kubelet.service.                                                                                    
[root@node1 ~]#
[root@node1 ~]# systemctl status kubelet                                                                                                                                                                  
● kubelet.service - Kubernetes Kubelet Server                                                                                                                                                             
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)                                                                                          
   Active: active (running) since Tue 2018-02-06 20:48:18 +08; 3s ago                                                     
     Docs: https://github.com/GoogleCloudPlatform/kubernetes                                              
 Main PID: 3696 (kubelet)                                                                                                                                                              
    Tasks: 12 (limit: 4915)                                                                                                                                   
   Memory: 39.4M                                                                                                             
      CPU: 1.127s                                                                                                                                                                      
   CGroup: /system.slice/kubelet.service                                                                                                                      
           ├─3580 journalctl -k -f                                                                                                                             
           ├─3696 /usr/bin/kubelet --logtostderr=true --v=0 --api-servers=http://master:8080 --address=0.0.0.0 --hostname-override=node1 --allow-privileged=false --cgroup-driver=systemd
           └─3765 journalctl -k -f
[...]
Feb 06 20:48:19 node1 kubelet[3696]: I0206 20:48:19.587455    3696 kubelet_node_status.go:85] Successfully registered node node1
[...]
[root@node1 ~]#

Once all minions are registered, they will appear in "kubectl get nodes"
from the master.
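
For example, a quick listing from the master should show both minions once their
kubelets have checked in (optional check):

kubectl get nodes -o wide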

Starting kubelet will start docker but will not enable it. Let's do it manually.
[root@node1 ~]# systemctl enable docker
Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /usr/lib/systemd/system/docker.service.
[root@node1 ~]#

Also, docker 1.13 is known to have issues with flannel 0.7.0. By default, this
version of docker sets the iptables FORWARD policy to DROP, which will prevent
inter-pod communication. So we need to manually set it to ACCEPT using this
command on the nodes.
[root@node1 ~]# iptables -P FORWARD ACCEPT

5. Install flannel to allow inter-pod communication.
[root@node1 ~]# dnf install -y flannel
[...]

Installed:
  flannel.x86_64 0.7.0-5.fc27

Complete!
[root@node1 ~]#

6. Configure flannel by pointing it to ETCD on the master. That is where it will
get all the network information.
[root@node1 ~]# cat << EOF > /etc/sysconfig/flanneld
> FLANNEL_ETCD_ENDPOINTS="http://master:2379"
> FLANNEL_ETCD_PREFIX="/coreos.com/network"
> EOF
[root@node1 ~]#

7. Start and enable flannel.
[root@node1 ~]# systemctl enable --now flanneld
Created symlink /etc/systemd/system/multi-user.target.wants/flanneld.service → /usr/lib/systemd/system/flanneld.service.
Created symlink /etc/systemd/system/docker.service.requires/flanneld.service → /usr/lib/systemd/system/flanneld.service.
[root@node1 ~]#
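
A quick way to confirm that flanneld picked up the network from ETCD is to look
at the subnet file it writes and at the vxlan interface it creates (the exact
file path and interface name may vary slightly between flannel versions):

cat /run/flannel/subnet.env
ip addr show flannel.1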

Let's test the cluster!
=======================

So now that we are done setting up the master and minions, let's try creating a
simple nginx deployment with 2 replicas. Execute this on the master.
[root@master ~]# kubectl run nginx --image=nginx --replicas=2
deployment "nginx" created
[root@master ~]#

Wait a few minutes and our nginx pods should be up and running.
[root@master ~]# kubectl get pods -o wide
NAME                    READY     STATUS    RESTARTS   AGE       IP           NODE
nginx-935182578-hdvqv   1/1       Running   0          58s       172.17.0.2   node1
nginx-935182578-zd8cd   1/1       Running   0          58s       172.17.0.2   node2
[root@master ~]#

If for some reason the pods are not being created and you see the following
message from "journalctl" on the minions:
 unable to pull sandbox image \"gcr.io/google_containers/pause-amd64:3.0\

Try to do a "docker pull gcr.io/google_containers/pause-amd64:3.0" manually on
each minion. That should do the trick and your pods will now be created.

Notice the pod IP addresses. You should be able to ping them from any minion.
[root@node1 ~]# ping -c 1 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.115 ms

--- 172.17.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.115/0.115/0.115/0.000 ms
[root@node1 ~]#
[root@node2 ~]# ping -c 1 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=0.115 ms

--- 172.17.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.115/0.115/0.115/0.000 ms
[root@node2 ~]#

We were able to ping the IPs because of flannel. Without it, each pod can only
be reached from the minion where it is located.

Let's finalize our testing by exposing our application outside the cluster via a service.
[root@master ~]# kubectl expose deployment nginx --port=80 --type=NodePort
service "nginx" exposed
[root@master ~]# kubectl get services

NAME         CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes   10.254.0.1     <none>        443/TCP        3d
nginx        10.254.183.3   <nodes>       80:31046/TCP   5s
[root@master ~]#

And here it is! A static web page served from inside Kubernetes.
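
If you prefer the command line over a browser, curling any node on the NodePort
shown above (31046 in my case) should return the nginx welcome page:

curl http://node1:31046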


Sunday, March 28, 2021

Pacemaker in Centos 7 (no fencing)

Lab Specifications
==================

Host OS: Ubuntu 17.10 (artful)
 |_ Virtualization: VirtualBox 5.1.34_Ubuntu r121010 (Qt5.9.1)
      |_ Virtual Machine OS: CentOS Linux release 7.4.1708 (Core)

Setup
=====

1. Install packages (perform on all nodes)
[root@node1 ~]# yum install -y pcs pacemaker resource-agents
[...]
Complete
[root@node1 ~]#

2. Enable pcs service (perform on all nodes)
[root@node1 ~]# systemctl enable --now pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@node1 ~]#

3. Configure firewall (perform on all nodes)
[root@node1 ~]# firewall-cmd --add-service=high-availability --permanent
success
[root@node1 ~]# firewall-cmd --reload
success
[root@node1 ~]#

4. Change "hacluster" password (perform on all nodes)
[root@node1 ~]# echo samplepass123 | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
[root@node1 ~]#

5. Setup cluster authentication (perform on node1 only)
[root@node1 ~]# pcs cluster auth node1 node2 node3 -u hacluster -p samplepass123 --force
node1: Authorized
node3: Authorized
node2: Authorized
[root@node1 ~]#

6. Create the cluster and populate it with nodes (perform on node1 only)
[root@node1 ~]# pcs cluster setup --force --name mycluster node1 node2 node3                                                                                                                                                               
Destroying cluster on nodes: node1, node2, node3...     
node1: Stopping Cluster (pacemaker)...                   
node3: Stopping Cluster (pacemaker)...                   
node2: Stopping Cluster (pacemaker)...                   
node1: Successfully destroyed cluster                   
node3: Successfully destroyed cluster                   
node2: Successfully destroyed cluster                   

Sending 'pacemaker_remote authkey' to 'node1', 'node2', 'node3'                                                     
node1: successful distribution of the file 'pacemaker_remote authkey'                                               
node2: successful distribution of the file 'pacemaker_remote authkey'                                               
node3: successful distribution of the file 'pacemaker_remote authkey'                                               
Sending cluster config files to the nodes...             
node1: Succeeded                                         
node2: Succeeded                                         
node3: Succeeded                                         

Synchronizing pcsd certificates on nodes node1, node2, node3...                                                     
node1: Success                                           
node3: Success                                           
node2: Success                                           
Restarting pcsd on the nodes in order to reload the certificates...                                                                                                       
node1: Success                     
node3: Success                                           
node2: Success                                           
[root@node1 ~]#

7. Start cluster (perform on node1 only)
[root@node1 ~]# pcs cluster start --all
node2: Starting Cluster...
node3: Starting Cluster...
node1: Starting Cluster...
[root@node1 ~]#

8. Disable fencing (perform on node1 only)
[root@node1 ~]# pcs property set stonith-enabled=false
[root@node1 ~]#

9. For demo purposes only, force services to move to another node after a single failure (perform on node1 only)
[root@node1 ~]# pcs resource defaults migration-threshold=1
[root@node1 ~]#

10. Add a resource (perform on node1 only)
[root@node1 ~]# pcs resource create sample_service ocf:heartbeat:Dummy op monitor interval=120s
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Sun Apr 22 09:02:12 2018
Last change: Sun Apr 22 09:01:43 2018 by root via cibadmin on node1

3 nodes configured
1 resource configured

Online: [ node1 node2 node3 ]

Full list of resources:

 sample_service (ocf::heartbeat:Dummy): Started node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@node1 ~]#

11. Simulate a single failure (perform on node1 only)
[root@node1 ~]# crm_resource --resource sample_service --force-stop                                                 
Operation stop for sample_service (ocf:heartbeat:Dummy) returned 0
 >  stderr: DEBUG: sample_service stop : 0               
[root@node1 ~]#
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Sun Apr 22 09:06:32 2018
Last change: Sun Apr 22 09:01:43 2018 by root via cibadmin on node1

3 nodes configured
1 resource configured

Online: [ node1 node2 node3 ]

Full list of resources:

 sample_service (ocf::heartbeat:Dummy): Started node2

Failed Actions:
* sample_service_monitor_120000 on node1 'not running' (7): call=7, status=complete, exitreason='none',
    last-rc-change='Sun Apr 22 09:05:44 2018', queued=0ms, exec=0ms


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@node1 ~]#

Notice that "sample_service" was started on node2 after it fail on node1. This
is a simple simulation on how the high-availability works on pacemaker.
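
If you want node1 to be eligible to run the resource again after this test, you
can clear the recorded failure; this resets the failcount that the
migration-threshold acted on (optional cleanup step):

pcs resource cleanup sample_service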

Saturday, March 27, 2021

Multi-node K8 Cluster from Scratch (Centos)

In this post, we will set up a multi-node kubernetes cluster from scratch.
This means that we will minimize the use of RPMs to install our software, so
that we can understand the minimum required to run the cluster and know what's
happening under the hood.

Here is the summary of versions we will use in this post.

Host OS: Ubuntu 17.04 (Zesty)
 |_ Virtualization: VirtualBox 5.2.4 r119785 (Qt5.7.1)
      |_ Virtual Machine OS: CentOS Linux release 7.3.1611 (Core)
           |_ Kubernetes: 1.9.2
           |_ ETCD: 3.2.6
           |_ Docker: 1.12.6

Requirements
============

1. Disable swap on master and nodes. Kubernetes doesn't want it for performance
reasons.
[root@master ~]# swapoff -a
[root@master ~]# cp /etc/fstab /etc/fstab.orig
[root@master ~]# sed -i 's/.*swap.*//g' /etc/fstab

2. Make sure master and nodes can ping and resolve each other's hostnames (a
sample /etc/hosts sketch is shown after this list).

3. Disable the firewall on master and nodes. Without this, pod IPs will not be
reachable from other nodes even after setting up flanneld.
[root@master ~]# systemctl disable --now firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
[root@master ~]# systemctl mask firewalld
Created symlink from /etc/systemd/system/firewalld.service to /dev/null.
[root@master ~]#
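
For requirement 2 above, a minimal /etc/hosts sketch on every machine could look
like the following; the node entries are placeholders for your own IPs (only the
master IP actually appears later in this post):

192.168.1.111   master
<node1 IP>      node1
<node2 IP>      node2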

Setup master
============

1. Download the kubernetes and etcd tarballs.
[root@master ~]# wget https://dl.k8s.io/v1.9.2/kubernetes-server-linux-amd64.tar.gz
[...]
[root@master ~]# wget https://github.com/coreos/etcd/releases/download/v3.2.6/etcd-v3.2.6-linux-amd64.tar.gz
[...]
[root@master ~]#

2. Setup ETCD - this will store all information about our cluster.
[root@master ~]# tar xvf etcd-v3.2.6-linux-amd64.tar.gz
[...]
[root@master ~]# cp etcd-v3.2.6-linux-amd64/etcd* /usr/local/bin/
[root@master ~]# # create the systemd file below
[root@master ~]# cat /etc/systemd/system/etcd.service
[Unit]
Description=ETCD server
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd \
  --data-dir /var/lib/etcd \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://master:2379

[Install]
WantedBy=multi-user.target
[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl enable --now etcd
Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /etc/systemd/system/etcd.service.
[root@master ~]# etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from http://master:2379
cluster is healthy
[root@master ~]#

3. Setup apiserver - we will use this to communicate to ETCD and the rest of
the cluster.
[root@master ~]# tar xvf kubernetes-server-linux-amd64.tar.gz
[...]
[root@master ~]# cp kubernetes/server/bin/* /usr/local/bin/
[root@master ~]# # create the systemd file below
[root@master ~]# cat /etc/systemd/system/kube-apiserver.service
[Unit]
Description=Kube API Server
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/kube-apiserver --etcd-servers=http://localhost:2379 \
  --service-cluster-ip-range=10.0.0.0/16 \
  --bind-address=0.0.0.0 \
  --insecure-bind-address=0.0.0.0

[Install]
WantedBy=multi-user.target
[root@master ~]#
[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl enable --now kube-apiserver
[root@master ~]#                                                                                     

Verify that the api server is working. You must be able to reach it from any
node.
[root@master ~]# curl http://master:8080/api
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "192.168.1.111:6443"
    }
  ]
}[root@master ~]#

4. Setup scheduler - this will take care of assigning pods to nodes.
[root@master ~]# cat /etc/systemd/system/kube-scheduler.service
[Unit]
Description=Kube Scheduler
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/kube-scheduler --master=http://localhost:8080

[Install]
WantedBy=multi-user.target
[root@master ~]#
[root@master ~]# systemctl enable --now kube-scheduler
Created symlink from /etc/systemd/system/multi-user.target.wants/kube-scheduler.service to /etc/systemd/system/kube-scheduler.service.
[root@master ~]#

5. Setup controller-manager - this will allow us to create deployments.
[root@master ~]# # create the systemd file below
[root@master ~]# cat /etc/systemd/system/kube-controller-manager.service
[Unit]
Description=Kube Controller Manager
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/kube-controller-manager --master=http://localhost:8080

[Install]
WantedBy=multi-user.target
[root@master ~]# systemctl daemon-reload
[root@master ~]# systemctl enable --now kube-controller-manager
Created symlink from /etc/systemd/system/multi-user.target.wants/kube-controller-manager.service to /etc/systemd/system/kube-controller-manager.service.
[root@master ~]#

6. Create network settings for flannel.
[root@master ~]# etcdctl mkdir /centos/network
[root@master ~]# etcdctl mk /centos/network/config "{ \"Network\": \"172.30.0.0/16\", \"SubnetLen\": 24, \"Backend\": { \"Type\": \"vxlan\" } }"
{ "Network": "172.30.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan" } }
[root@master ~]# 

Setup the nodes
===============

Steps below must be executed on all nodes unless specified otherwise.

1. Download and unpack kubernetes tarball.
[root@node1 ~]# wget https://dl.k8s.io/v1.9.2/kubernetes-server-linux-amd64.tar.gz
[...]
[root@node1 ~]#
[root@node1 ~]# tar xvf kubernetes-server-linux-amd64.tar.gz
kubernetes/
kubernetes/LICENSES
[...]
kubernetes/kubernetes-src.tar.gz
[root@node1 ~]#
[root@node1 ~]# cp -v kubernetes/server/bin/* /usr/local/bin/
‘kubernetes/server/bin/apiextensions-apiserver’ -> ‘/usr/local/bin/apiextensions-apiserver’
[...]
[root@node1 ~]#

2. Install and enable docker but don't start it. Flanneld will take care of it
later.
[root@node1 ~]# yum install -y docker
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.pregi.net
 * extras: mirror.pregi.net
 * updates: mirror.pregi.net
Resolving Dependencies
--> Running transaction check
---> Package docker.x86_64 2:1.12.6-68.gitec8512b.el7.centos will be installed
[...]
Complete!
[root@node1 ~]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@node1 ~]#

3. Setup kubelet - this is responsible for joining the nodes to the master and
it communicates with ETCD via the API server.
[root@node1 ~]# mkdir -p /etc/kubernetes/manifests
[root@node1 ~]# mkdir -p /var/lib/kubelet
[root@node1 ~]# cat << EOF > /etc/kubernetes/kubeconfig
apiVersion: v1
kind: Config
clusters:
- name: centos
  cluster:
    server: http://master:8080
users:
- name: kubelet
contexts:
- context:
    cluster: centos
    user: kubelet
  name: kubelet-context
current-context: kubelet-context
EOF
[root@node1 ~]# # create the systemd file below
[root@node1 ~]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubelet
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/kubelet --kubeconfig /etc/kubernetes/kubeconfig \
  --require-kubeconfig \
  --pod-manifest-path /etc/kubernetes/manifests \
  --cgroup-driver=systemd \
  --kubelet-cgroups=/systemd/system.slice \
  --runtime-cgroups=/etc/systemd/system.slice

[Install]
WantedBy=multi-user.target
[root@node1 ~]# systemctl daemon-reload                                                                                                                                                                 
[root@node1 ~]# systemctl enable --now kubelet                                                                                                                                                           
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /etc/systemd/system/kubelet.service.                                                                             
[root@node1 ~]#

Verify that the nodes were successfully registered by going to the master and
getting the node list. All nodes must appear.
[root@master ~]# kubectl get nodes -o wide
NAME      STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
node1     Ready     <none>    2m        v1.9.2    <none>        CentOS Linux 7 (Core)   3.10.0-514.el7.x86_64   docker://1.12.6
node2     Ready     <none>    2m        v1.9.2    <none>        CentOS Linux 7 (Core)   3.10.0-514.el7.x86_64   docker://1.12.6
[root@master ~]#

4. Setup kube-proxy - this allows us to expose our pods outside the cluster
via a service.
[root@node1 ~]# # create the systemd file below
[root@node1 ~]# cat /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kube proxy
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/kube-proxy --master=http://master:8080

[Install]
WantedBy=multi-user.target
[root@node1 ~]# vi /etc/systemd/system/kube-proxy.service
[root@node1 ~]# systemctl daemon-reload
[root@node1 ~]# systemctl enable --now kube-proxy
Created symlink from /etc/systemd/system/multi-user.target.wants/kube-proxy.service to /etc/systemd/system/kube-proxy.service.
[root@node1 ~]#

5. Install flannel - this will allow inter-pod communication between the nodes.
[root@node1 ~]# yum install -y flannel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.pregi.net
 * extras: mirror.pregi.net
 * updates: mirror.pregi.net
Resolving Dependencies
--> Running transaction check
---> Package flannel.x86_64 0:0.7.1-2.el7 will be installed
[...]
Complete!
[root@node1 ~]#
[root@node1 ~]# cp /etc/sysconfig/flanneld /etc/sysconfig/flanneld.orig
[root@node1 ~]# cat << EOF > /etc/sysconfig/flanneld
FLANNEL_ETCD_ENDPOINTS="http://master:2379"
FLANNEL_ETCD_PREFIX="/centos/network"
EOF
[root@node1 ~]#
[root@node1 ~]# systemctl enable --now flanneld
Created symlink from /etc/systemd/system/multi-user.target.wants/flanneld.service to /usr/lib/systemd/system/flanneld.service.
Created symlink from /etc/systemd/system/docker.service.requires/flanneld.service to /usr/lib/systemd/system/flanneld.service.
[root@node1 ~]#

6. Manually pull the pause image from gcr.io. For some reason, deployments can't
run without doing this manual step.
[root@node1 ~]# docker pull gcr.io/google_containers/pause-amd64:3.0
Trying to pull repository gcr.io/google_containers/pause-amd64 ...
3.0: Pulling from gcr.io/google_containers/pause-amd64
a3ed95caeb02: Pull complete
f11233434377: Pull complete
Digest: sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516
[root@node1 ~]#

Let's test our cluster
======================

1. Let's check if our nodes are ready.
[root@master ~]# kubectl get nodes -o wide
NAME      STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
node1     Ready     <none>    1h        v1.9.2    <none>        CentOS Linux 7 (Core)   3.10.0-514.el7.x86_64   docker://1.12.6
node2     Ready     <none>    1h        v1.9.2    <none>        CentOS Linux 7 (Core)   3.10.0-514.el7.x86_64   docker://1.12.6
[root@master ~]#

2. Create a simple nginx deployment. Containers must be created and pods must
have IPs assigned. The pod IPs must be reachable from any node. If you want them
to be reachable also from the master, install flanneld there as well.
[root@master ~]# kubectl run nginx --image=nginx --replicas=2
deployment "nginx" created
[root@master ~]#
[root@master ~]# kubectl get pods -o wide
NAME                   READY     STATUS    RESTARTS   AGE       IP            NODE
nginx-8586cf59-gknr7   1/1       Running   0          46m       172.30.2.2    node2
nginx-8586cf59-pzjh4   1/1       Running   0          46m       172.30.19.2   node1
[root@master ~]#

3. Expose the deployment via a NodePort service. You should be able to access
the static web page on any node.
[root@master ~]# kubectl expose deploy/nginx --port=80 --type=NodePort
service "nginx" exposed
[root@master ~]#
[root@master ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP        2h
nginx        NodePort    10.0.174.8   <none>        80:32122/TCP   5s
[root@master ~]#
[root@master ~]# curl http://node2:32122
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@master ~]#

With all these manual steps in mind, I prepared an automated way to provision
this using Ansible. You may visit my playbook hosted on GitLab.