Saturday, December 29, 2018

Linux Troubleshooting


Diagnosing Memory Issues
------------------------

What is a memory leak?

In computer science, a memory leak is a type of resource leak that occurs when a
computer program incorrectly manages memory allocations in such a way that
memory which is no longer needed is not released. A memory leak may also happen
when an object is stored in memory but cannot be accessed by the running code.

Finding top consumers
ps --sort -rss -eo rss,pid,command | head

User Processes
--------------

Process vs Threads:

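A process is an executing program with its own memory space, which it does not
share with other processes. Threads live inside a process and share that
process's memory space.

Threads are also called LWPs (light-weight processes) in Linux. Each thread
consumes a PID, so threads also count against kernel.pid_max.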

/proc/sys/kernel/pid_max
- upper limit on PID values, and therefore on how many PIDs can exist at once
- PIDs are assigned sequentially; when the limit is reached, the counter wraps back to the beginning
- if no PID is available, no more processes can be created
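
A rough Python sketch of comparing current usage against the limit. It assumes
a Linux /proc layout where each process has a /proc/<pid>/task directory with
one entry per thread; the function names are made up for illustration.

```python
import os


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the kernel's PID limit as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def count_tasks(proc_root="/proc"):
    """Count threads (LWPs) by listing /proc/<pid>/task for every process."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            total += len(os.listdir(os.path.join(proc_root, entry, "task")))
        except OSError:
            pass  # the process exited while we were scanning
    return total


def pid_usage(pid_max, tasks):
    """Return the fraction of the PID space currently in use."""
    return tasks / pid_max
```

On a Linux host, `pid_usage(read_pid_max(), count_tasks())` gives a number
between 0 and 1; values close to 1 mean new processes may soon fail to start.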


Determining current PID usage (to compare against pid_max)
# current limit
cat /proc/sys/kernel/pid_max

# using sar
see "plist-sz" of "sar -q" command

# using ps (-L or -T lists LWPs, i.e. threads, one per line)
ps -eL | wc -l
ps -eT | wc -l

Determining number of user processes
# via ps
ps h -Led -o user | sort | uniq -c | sort -n
ps h -Lu root | wc -l


CPU Loads
---------

Checking under "sar -q"
- a high "runq-sz" means there are a lot of processes waiting in line for CPU time
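
A small Python sketch that reads similar numbers straight from /proc/loadavg.
The fourth field there holds runnable/total task counts, which are related to
(but not guaranteed to be identical with) the runq-sz and plist-sz columns of
sar. The `looks_busy()` threshold is an assumption, not an established rule.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "runnable": int(runnable),  # tasks currently runnable
        "total": int(total),        # all tasks (processes and threads)
        "last_pid": int(last_pid),
    }


def looks_busy(sample, cpus=None):
    """Heuristic (an assumption): 1-minute load higher than the CPU count."""
    cpus = cpus or os.cpu_count() or 1
    return sample["load1"] > cpus
```

Typical use would be `parse_loadavg(open('/proc/loadavg').read())` on a Linux
host.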

Thursday, December 27, 2018

Multipath


Basics
------

- configuration is stored in /etc/multipath.conf

Things to note in modifying multipath.conf
------------------------------------------

1. Make sure letters in WWIDs are in lower case. If not, multipath will blacklist them.
2. Make sure a WWID does not map to devices of more than one size. If it does, delete the other device.
   Multipath will reject the WWID if it sees a conflict like this:
   reject: ibmdata1 (3600507680c810200e000000000000027) undef IBM,2145

Parts of the "multipath -ll" output
-----------------------------------

data3 (360000970000198701142533030413536) dm-4 EMC,SYMMETRIX --> data3 is the multipath device
size=512G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 2:0:1:4 sdj 8:144  active ready running --> path #1
  |- 2:0:0:4 sde 8:64   active ready running --> path #2
  |- 1:0:0:4 sdo 8:224  active ready running --> path #3
  `- 1:0:1:4 sdt 65:48  active ready running --> path #4

Multipath in Virtual Machines
-----------------------------

In virtual machines, even if direct disk addressing (raw device mapping, in
VMware terms) is used, the underlying virtualisation hypervisor handles
multipathing and masks it from the guest virtual machine. Thus the virtual
machine itself doesn't need to do anything about it.

Tutorials
---------

Determining LUN ID
Inspect "multipath -ll" output. Last number (in X:X:X:X) is the LUN ID.
That LUN ID is the one you will see on the storage array.
 
U01 (3624a9370b15fcb83b6a947a00001d5e7) dm-2 PURE    ,FlashArray     
size=150G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 2:0:0:2 sdk 8:160 active ready running
  |- 2:0:1:2 sdo 8:224 active ready running
  |- 1:0:0:2 sdc 8:32  active ready running
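
The same lookup can be scripted. This is a sketch only: it assumes path lines
look like the sample above (an H:C:T:L tuple followed by an sd device name),
and the regular expression and function name are illustrative.

```python
import re

# host:channel:target:lun followed by the sd device name, as in "2:0:0:2 sdk"
PATH_RE = re.compile(r"(\d+):(\d+):(\d+):(\d+)\s+(sd[a-z]+)\b")


def lun_ids(multipath_output):
    """Map each path device (e.g. 'sdk') to the LUN ID from its H:C:T:L tuple."""
    result = {}
    for line in multipath_output.splitlines():
        match = PATH_RE.search(line)
        if match:
            result[match.group(5)] = int(match.group(4))
    return result


sample = """\
U01 (3624a9370b15fcb83b6a947a00001d5e7) dm-2 PURE    ,FlashArray
size=150G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 2:0:0:2 sdk 8:160 active ready running
  |- 2:0:1:2 sdo 8:224 active ready running
  |- 1:0:0:2 sdc 8:32  active ready running
"""
print(lun_ids(sample))  # {'sdk': 2, 'sdo': 2, 'sdc': 2}
```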

Troubleshooting Docker Issues



Corrupted DB
Issue:

The following message appears when starting docker:
updating the store state of sandbox failed: failed to update store for object type *libnetwork.sbState: json: cannot unmarshal string into Go struct field sbState.ExtDNS of type libnetwork.extDNSEntry

Solution:
 
systemctl stop docker
mv /var/lib/docker/network/files/local-kv.db /root/corrupted-local-kv.db
systemctl start docker

Can't start docker
Diagnosis:
Try "journalctl -u docker.service" and see what's happening:

Mar 16 13:46:53 gdc-co-ragent01 dockerd[1403]: can't create unix socket /var/run/docker.sock: is a directory
Mar 16 13:46:53 gdc-co-ragent01 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Mar 16 13:46:53 gdc-co-ragent01 systemd[1]: Failed to start Docker Application Container Engine. 

Resolution:
rm -fr /var/run/docker.sock
systemctl start docker

Phantom containers prevent docker from running containers with the same name
8a81cf7cb9f3fd03e0139743d3616eb0e/kill returned error: Cannot kill container 78d3e2cc93abc053238e0edd5765f428a81cf7cb9f3fd03e0139743d3616eb0e: Container 78d3e2cc93abc053238e0edd5765f428a81cf7cb9f3fd03e0139743d3616eb0e is not running"

Resolution:
1. check for those phantom containers: docker ps -a
2. remove them: docker rm -f <container-id>

Containers cannot ping outside IPs
Things you might want to check:
1. Make sure /proc/sys/net/ipv4/ip_forward is set to 1

No more space left on thin pool
Resolution:
Extend the thinpool:
  lvextend -L +10G /dev/mapper/base-thinpool

Error on docker-compose
Running docker-compose as a container fails with the following error:

Couldn't find `docker` binary. You might need to install Docker

Resolution:
Create a .env file in the directory where you are running docker-compose and put the following in it:
COMPOSE_INTERACTIVE_NO_CLI=1

Cannot login to docker registry due to missing port
Issue:

The following message appears when you log in to the docker registry.

Error response from daemon: Get https://gitlab.xyz.com:4567/v2/: Get https://gitlab.xyz.com/jwt/auth?account=root&client_id=docker&offline_token=true&service=container_registry: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (Client.Timeout exceeded while awaiting headers)

Resolution:
Make sure the registry port (4567 in this example) and 443/tcp are allowed from source to destination

https://medium.com/@dan.lindow/docker-login-error-awaiting-headers-3fe01e2a1e2f

Resource temporarily unavailable due to TasksMax setting
- try adding "TasksMax=infinity" to the [Service] section of docker.service
- run "systemctl daemon-reload" and restart docker
- monitor for recurrence of the issue

https://success.docker.com/article/how-to-reserve-resource-temporarily-unavailable-errors-due-to-tasksmax-setting
https://github.com/chef-cookbooks/docker/issues/871

Wednesday, December 26, 2018

Linux Kernel


Runlevels
---------

0 -- shutdown/halt the system
1 -- single-user mode; usually aliased as s or S
2 -- multiuser mode w/o networking
3 -- multiuser mode w/ networking
4 -- unused
5 -- multiuser mode w/ networking and X Window System
6 -- reboot the system

Kernel Headers
--------------

  - development libraries
  - not installed by default
  - needed to compile other kernel version
  - package name: kernel-headers

Modules
-------

Check whether a module is built into the kernel
grep <module_name> /lib/modules/$(uname -r)/modules.builtin
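
The same check as a Python sketch. It assumes modules.builtin lists one path
per line ending in `<name>.ko`, and that dashes and underscores in module names
are interchangeable; the helper names are illustrative.

```python
import os


def is_builtin(module, builtin_text):
    """Return True if `module` appears in the text of a modules.builtin file."""
    wanted = module.replace("-", "_")
    for line in builtin_text.splitlines():
        # each line is a path such as kernel/fs/ext4/ext4.ko
        name = os.path.basename(line.strip())
        if name.endswith(".ko"):
            name = name[:-3]
        if name.replace("-", "_") == wanted:
            return True
    return False


def builtin_path(release=None):
    """Build the modules.builtin path (default: the running kernel's release)."""
    release = release or os.uname().release
    return os.path.join("/lib/modules", release, "modules.builtin")
```

Usage on a Linux host would be along the lines of
`is_builtin('ext4', open(builtin_path()).read())`.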

Crash Dumps
-----------

kdump uses kexec to boot a second kernel that captures the first kernel's memory during a crash.

/var/crash - default location of dump files (vmcore)

Tutorials
---------

Booting a RHEL VM into rescue mode
1. power off VM
2. edit settings and enable the option to boot into the BIOS on next startup
3. power on VM
4. when you see this prompt --> "boot: ", type "linux rescue"
5. once the rescue environment finishes booting, choose a language to use
6. choose a keyboard layout to use
7. wait for network interfaces to be located, and activate them so that
   requested data can be transferred to another host (sometimes this doesn't
   work; if so, try restarting the NIC in vSphere)
8. the rescue environment will try to find the current Red Hat Enterprise Linux
   installation on the system; select "continue"


RKE (Rancher Kubernetes Engine)


Introduction
------------

- short for Rancher Kubernetes Engine
- lightweight installer of K8s (Kubernetes) on bare-metal and virtual machines
- solves a common issue with K8s installation: complexity

Commands
--------

rke -d up --ignore-docker-version --config rancher-cluster.yml
rke etcd snapshot-save --name rancher_snapshot.b --config rancher-cluster.yml --ignore-docker-version

Tutorials
---------

Spinning up a K8s cluster
CentOS 7.5
Docker 17.03
Kubernetes 1.11

1. Download RKE binary
2 ... TBD

Removing a node
- comment out the node information in cluster.yml
- execute "rke up"

Troubleshooting
---------------

ssh: rejected: administratively prohibited
- update openssh to 7.4 and docker to v1.12.6
- set "AllowTcpForwarding yes" and "PermitTunnel yes" in /etc/ssh/sshd_config,
  and then restart the sshd service
- make sure the host that runs rke can ssh to all nodes without a password
- run "groupadd docker" to create the docker group if it does not exist
- run "useradd -g docker yourusername" to create the user yourusername and set
  its group to docker
- set MountFlags=shared in docker.service (vi /xxx/xxx/docker.service)
- run "su yourusername" to switch to that user, and then restart the docker
  service, so that docker.sock is created at /var/run/docker.sock within
  yourusername's session
- in cluster.yml, set the ssh user to yourusername (in the hosts setup)

https://github.com/rancher/rke/issues/93

FATA[0088] [workerPlane] Failed to bring up Worker Plane: Can't remove Docker container [rke-log-linker] for host [sample.host]: Error response from daemon: Driver overlay failed to remove root filesystem ABCD: remove /var/lib/docker/overlay/EFGH/merged: device or resource busy
- just retry ./rke up

Wednesday, September 5, 2018

String/Templating Modules


String Pattern Matching
-----------------------

`re`
This module uses regular expressions for advanced string processing.

>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'

For simple cases, you can make use of "string methods" like this one below.

>>> 'tea for too'.replace('too', 'two')
'tea for two'

Compiling an expression
  --> patterns are compiled into bytecode and executed by a matching engine written in C
  --> reusing a compiled pattern avoids recompiling it, which makes repeated use faster

>>> import re
>>> p = re.compile('[a-z]+')
>>> p
re.compile('[a-z]+')
>>> p.match("")
>>> print(p.match(""))
None
>>> print(p.match("abc"))
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>>


match() vs search()
  --> match() - matches only at the beginning of the string
  --> search() - searches anywhere in the string

>>> re.match('[a-z]+', '123abc456')
>>>
>>> re.search('[a-z]+', '123abc456')
<_sre.SRE_Match object at 0x7f769f7fd4a8>
>>> re.match('[a-z]+', 'abc456')
<_sre.SRE_Match object at 0x7f769ddf5cc8>
>>>


Templating
----------

`string`
You can use the `Template` class from this module to create a base string with placeholders
that are filled in later.
>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'

The `substitute()` method raises a `KeyError` exception if a placeholder value is missing;
`safe_substitute()` leaves the missing placeholder in place instead.
>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
  ...
KeyError: 'owner'
>>> t.safe_substitute(d)
'Return the unladen swallow to $owner.'

You can also do a batch renamer like this:
>>> import time, os.path
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     delimiter = '%'
>>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
Enter rename style (%d-date %n-seqnum %f-format):  Ashley_%n%f

>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = os.path.splitext(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print('{0} --> {1}'.format(filename, newname))

img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg

Tools for working with Lists
----------------------------

`array`
Stores homogeneous data compactly. Typecode 'H' below means two-byte unsigned integers.
 
>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])

* NEED MORE READING *
`deque`
Can be used for faster appends and pops from the left but with slower lookups in the middle.
 
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print("Handling", d.popleft())
Handling task1

# breadth-first search sketch; starting_node, gen_moves() and is_goal() are placeholders
unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)
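
A self-contained version of the same idea, as a sketch. The graph, the
`neighbours` callback and the `seen` set are additions for illustration (the
snippet above leaves `gen_moves()` and `is_goal()` undefined and handles only
one node per call).

```python
from collections import deque


def breadth_first_search(start, neighbours, is_goal):
    """Return the first node satisfying is_goal, visiting nodes level by level."""
    unsearched = deque([start])
    seen = {start}
    while unsearched:
        node = unsearched.popleft()      # O(1) pop from the left end
        if is_goal(node):
            return node
        for nxt in neighbours(node):
            if nxt not in seen:          # avoid revisiting nodes in cyclic graphs
                seen.add(nxt)
                unsearched.append(nxt)
    return None


# Hypothetical graph used only for illustration.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
found = breadth_first_search("a", lambda n: graph[n], lambda n: n == "e")
print(found)  # e
```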

* NEED MORE READING *
`bisect`
Manipulates sorted lists, automatically inserting elements at their correct position.
 
>>> import bisect
>>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
>>> bisect.insort(scores, (300, 'ruby'))
>>> scores
[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]

`heapq`
Can be used by applications that repeatedly access the smallest element(s) but don't want to
run a full list sort.
 
>>> from heapq import heapify, heappop, heappush
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> heapify(data)                      # rearrange the list into heap order
>>> heappush(data, -5)                 # add a new entry
>>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
[-5, 0, 1]

Collections
-----------

collections.Counter()
>>> from collections import Counter
>>> l = ['a', 'c', 'b', 'd', 'a']
>>> c = Counter(l)
>>> c
Counter({'a': 2, 'd': 1, 'c': 1, 'b': 1})
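
A few more `Counter` operations that tend to be useful, shown as a short
sketch; the word list is made up for illustration.

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()
counts = Counter(words)

top = counts.most_common(1)       # list of (element, count) pairs
missing = counts["cat"]           # missing keys count as 0, no KeyError
counts.update(["fox", "fox"])     # add more observations in place
```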

Json
----

Basics
Sample json data:
{
"a": "apple",
"b": "banana",
"c": "carrot"
}

Loading json data from a file
>>> import json
>>> with open('file.json') as f:
...     data = json.load(f)
>>> data = json.load(open('file.json'))  # shorter, but does not close the file explicitly

Loading json data from a string
>>> json.loads('{"a": "apple", "b": "banana"}')
{'a': 'apple', 'b': 'banana'}
>>>
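
Going the other way (Python object to JSON) uses `json.dumps()` and
`json.dump()`. A minimal sketch; the `save()` helper is hypothetical.

```python
import json

data = {"a": "apple", "b": "banana", "c": "carrot"}

text = json.dumps(data, indent=2, sort_keys=True)   # dict -> JSON string
restored = json.loads(text)                          # JSON string -> dict


def save(obj, path):
    """Write obj to path as JSON."""
    with open(path, "w") as f:
        json.dump(obj, f)
```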

Random
------

Generate random strings (N is the desired length)
import random, string
''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))
''.join(random.choices(string.ascii_uppercase + string.digits, k=N))  # Python 3.6+
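
If the string is meant to be a password or token, the `secrets` module
(Python 3.6+) is likely the safer choice, since `random` is not designed for
security purposes. A sketch with a hypothetical `random_token()` helper:

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def random_token(length):
    """Return a random string drawn from ALPHABET using the OS's secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```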

Itertools
---------

Generates permutations (useful for cracking passwords)
>>> import itertools
>>> for i in itertools.permutations('abc'):
...   print(i)
...
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
>>>
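
Note that `permutations()` never repeats a character. When repeats are allowed
(as in most passwords), `itertools.product()` with `repeat=` is probably the
closer fit. A sketch; `candidates()` is an illustrative name.

```python
import itertools


def candidates(alphabet, length):
    """Yield every string of the given length over alphabet, repeats allowed."""
    for combo in itertools.product(alphabet, repeat=length):
        yield "".join(combo)


print(list(candidates("ab", 2)))  # ['aa', 'ab', 'ba', 'bb']
```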

Sunday, September 2, 2018

File I/O Modules


File Wildcards
--------------

`glob`
This module provides a function for making file lists from directory wildcard searches.
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']
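
`glob` can also descend into subdirectories with the `**` pattern (Python 3.5+).
A sketch with a hypothetical `find_files()` helper:

```python
import glob
import os


def find_files(root, pattern="*.py"):
    """Return paths under root (any depth) whose name matches pattern, sorted."""
    # "**" matches any number of directory levels only when recursive=True
    return sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))
```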

Working on Binary Data
----------------------

`struct`
Contains `pack()` and `unpack()` functions for working with binary record formats. The example
below loops through the header information in a ZIP file without using the `zipfile` module.
import struct

with open('myfile.zip', 'rb') as f:
    data = f.read()

start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header

Sample output:
b'config-err-t6uao6' 0x0 0 0
b'gnome-software-34B15Y/' 0x0 0 0
b'gnome-software-GYBW5Y/' 0x0 0 0
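
To see how a format string maps to bytes, here is a small round trip using a
five-field layout like the one the loop above unpacks (three 4-byte and two
2-byte unsigned integers, 16 bytes in total). The values are made up.

```python
import struct

# "<" = little-endian, "I" = 4-byte unsigned int, "H" = 2-byte unsigned short
HEADER = struct.Struct("<IIIHH")

packed = HEADER.pack(0xDEADBEEF, 120, 300, 8, 0)
crc32, comp_size, uncomp_size, name_len, extra_len = HEADER.unpack(packed)
```

`HEADER.size` reports the number of bytes the format occupies, which is where
the 16 in the slicing arithmetic comes from.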

Socket
------

Basics
AF_INET - IPv4 address family
SOCK_STREAM - TCP socket type

Methods
socket.gethostname() - returns your computer's hostname
socket.socket() - creates a socket object

simple client-server connection setup
1. create server socket
>>> import socket
>>> s = socket.socket()  # creates a socket object
>>> host = socket.gethostname()
>>> port = 8000
>>> s.bind((host, port))
# puts socket in listening state (queues 5 connections before rejecting others)
>>> s.listen(5)
# accepts an incoming connection (blocks until a client connects, then
# returns a new socket object and the client's address)
>>> conn, addr = s.accept()

2. create client socket
>>> import socket
>>> s = socket.socket()
>>> host = 'remote.system.com'
>>> port = 8000
# connects to remote system
>>> s.connect((host, port))

The server's s.accept() blocks until a client connects. The client's
s.connect() returns as soon as the TCP handshake completes, which can
happen once the server has called s.listen(), even before it calls
s.accept(); such connections wait in the listen queue. If nothing is
listening on the port, s.connect() fails (typically with
ConnectionRefusedError) instead of waiting.

sending and receiving
1. server
# continuing the example above, the server receives up to 100 bytes at a
# time (buffer size). The call returns once it has received something from
# the other end.
>>> conn.recv(100)

2. client
# To send a string literal, prefix it with `b` so it is a bytes object
>>> s.send(b'Hello world\n')
# You can also open a file in binary mode and send it.
>>> f = open('grocery list.txt', 'rb')
>>> data = f.read()
>>> f.close()
>>> s.send(data)

sending to multiple client sockets
# Continuing the examples above, let's accept 2 client connections on the server
>>> clientsocket1, addr = s.accept()
# on client1, execute s.connect((host, port))
>>> clientsocket2, addr = s.accept()
# on client2, execute s.connect((host, port))
>>>
# Now, send separate messages to each client using their respective sockets
>>> clientsocket1.send(b'Hi client1')
# On client1, do a s.recv(4096) to receive the data
>>> clientsocket2.send(b'Hi client2')
# On client2, do a s.recv(4096) to receive the data

preparing data for transmission
You can only send data as bytes. Here are some ways to do it.
>>> conn.send(b'I am no longer a string')
>>> response = 'Hello {}'.format('world')
>>> conn.send(response.encode('utf-8'))

On the receiving end, you can decode the binary data by using .decode:
>>> received_data = s.recv(1024)
>>> print(received_data.decode('utf-8'))

right way of closing connection
If the client calls `close()`, every subsequent `recv()` call on the server returns an empty
bytes object (b''), which signals that the connection has been closed.
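
A runnable sketch that puts the pieces together on the loopback interface: the
server reads until `recv()` returns b'', and the client sends one message and
closes. Using a thread and port 0 (let the OS choose a port) are choices made
for this example, not part of the notes above.

```python
import socket
import threading


def serve_once(server_sock, received):
    """Accept one client and read until recv() returns b'' (peer closed)."""
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            chunk = conn.recv(1024)
            if not chunk:          # empty bytes: the client closed its side
                break
            received.append(chunk)


def demo():
    received = []
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(server, received))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    client.sendall("Hello world\n".encode("utf-8"))
    client.close()                     # triggers the b'' on the server side

    worker.join(timeout=5)
    server.close()
    return b"".join(received).decode("utf-8")
```

`sendall()` is used instead of `send()` because `send()` may transmit only part
of the data and leaves retrying to the caller.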


Tuesday, August 28, 2018

Math/Number Modules


`math`
main module for mathematical computations
>>> import math
>>> math.cos(math.pi / 4)
0.70710678118654757
>>> math.log(1024, 2)
10.0
`random`
You can use this module to generate random values.

>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(range(100), 10)   # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random()    # random float
0.17970987693706186
>>> random.randrange(6)    # random integer chosen from range(6)
4
`statistics`
Module for statistical calculation
>>> import statistics
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> statistics.mean(data)
1.6071428571428572
>>> statistics.median(data)
1.25
>>> statistics.variance(data)
1.3720238095238095
`decimal`
Can be used by applications that require precise decimal calculations.

This calculates a 5% tax on a 70 cent phone charge.
>>> from decimal import *
>>> round(Decimal('0.70') * Decimal('1.05'), 2)
Decimal('0.74')
>>> round(.70 * 1.05, 2)
0.73

Performs modulo calculations and equality tests that are unsuitable for binary floating point.
>>> Decimal('1.00') % Decimal('.10')
Decimal('0.00')
>>> 1.00 % 0.10
0.09999999999999995

>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False

Performs very precise calculations.
>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857')

`functools`
>>> from functools import reduce
>>> reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
15
>>>

Wednesday, August 15, 2018

Python Date/Time modules


`datetime`
Supports date and time arithmetic, such as calculating time differences.

>>> # dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'
>>> # dates support calendar arithmetic
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368

Using strptime()

>>> from datetime import datetime
>>>
>>> datetime.strptime('2016-01-01', '%Y-%m-%d')
datetime.datetime(2016, 1, 1, 0, 0)
>>>
>>> datetime.strptime('03-01-2018', '%Y-%m-%d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data '03-01-2018' does not match format '%Y-%m-%d'
>>>
>>>
>>> datetime.strptime('2016-01-01', '%Y-%m-%d %H:%M:%S')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data '2016-01-01' does not match format '%Y-%m-%d %H:%M:%S'
>>>
>>> datetime.strptime('2016-01-01 04:16:34', '%Y-%m-%d %H:%M:%S')
datetime.datetime(2016, 1, 1, 4, 16, 34)        
>>>


Managing time differences

>>> from datetime import timedelta
>>> a = timedelta(days=365)
>>> b = timedelta(days=100)
>>> a - b
datetime.timedelta(265)
>>>
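
`strptime()` and `timedelta` combine naturally: subtracting two parsed
datetimes yields a `timedelta`. A short sketch with made-up timestamps:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

start = datetime.strptime("2016-01-01 04:16:34", FMT)
end = datetime.strptime("2016-01-03 06:16:34", FMT)

elapsed = end - start                    # a timedelta of 2 days, 2 hours
hours = elapsed.total_seconds() / 3600   # 50.0
deadline = start + timedelta(days=30)    # datetime plus timedelta
```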

Callable methods on datetime

>>> date = datetime.strptime('2018-03-18', '%Y-%m-%d')
>>>
>>> dir(date)
['__add__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rsub__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', 'astimezone', 'combine', 'ctime', 'date', 'day', 'dst', 'fold', 'fromordinal', 'fromtimestamp', 'hour', 'isocalendar', 'isoformat', 'isoweekday', 'max', 'microsecond', 'min', 'minute', 'month', 'now', 'replace', 'resolution', 'second', 'strftime', 'strptime', 'time', 'timestamp', 'timetuple', 'timetz', 'today', 'toordinal', 'tzinfo', 'tzname', 'utcfromtimestamp', 'utcnow', 'utcoffset', 'utctimetuple', 'weekday', 'year']
>>>
>>> date.timestamp()
1521302400.0
>>>

`time`
Can do simple time operations and also includes a sleep function.
>>> import time
>>> dir(time)
['CLOCK_MONOTONIC', 'CLOCK_MONOTONIC_RAW', 'CLOCK_PROCESS_CPUTIME_ID', 'CLOCK_REALTIME', 'CLOCK_THREAD_CPUTIME_ID', '_STRUCT_TM_ITEMS', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'altzone', 'asctime', 'clock', 'clock_getres', 'clock_gettime', 'clock_settime', 'ctime', 'daylight', 'get_clock_info', 'gmtime', 'localtime', 'mktime', 'monotonic', 'perf_counter', 'process_time', 'sleep', 'strftime', 'strptime', 'struct_time', 'time', 'timezone', 'tzname', 'tzset']
>>> time.localtime()
time.struct_time(tm_year=2017, tm_mon=8, tm_mday=29, tm_hour=18, tm_min=20, tm_sec=35, tm_wday=1, tm_yday=241, tm_isdst=0)
>>> time.sleep(3)

`timeit`
Measures the speed of small code snippets.
>>> from timeit import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.57535828626024577
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.54962537085770791

For larger blocks of code, use the `profile` and `pstats` modules.
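
A minimal profiling sketch. It uses `cProfile`, the C-accelerated counterpart
of `profile` with the same interface; `slow_sum()` and `profile_call()` are
hypothetical names for this example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive function to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)   # top 5 entries
    return result, buffer.getvalue()
```

For example, `result, report = profile_call(slow_sum, 100000)` followed by
`print(report)` shows the call counts and cumulative times.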