Wednesday, September 5, 2018

String/Templating Modules


String Pattern Matching
-----------------------

`re`
This module uses regular expressions for advanced string processing.

>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'

For simple cases, plain string methods are easier to read and debug:

>>> 'tea for too'.replace('too', 'two')
'tea for two'

Compiling an expression
  --> patterns are compiled into bytecode and executed by a matching engine written in C
  --> compiling once and reusing the pattern object saves work when the same pattern is used many times

>>> import re
>>> p = re.compile('[a-z]+')
>>> p
re.compile('[a-z]+')
>>> p.match("")
>>> print(p.match(""))
None
>>> print(p.match("abc"))
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>>


match() vs search()
  --> match() - matches only at the beginning of the string
  --> search() - scans the whole string and returns the first match found anywhere

>>> re.match('[a-z]+', '123abc456')
>>>
>>> re.search('[a-z]+', '123abc456')
<_sre.SRE_Match object at 0x7f769f7fd4a8>
>>> re.match('[a-z]+', 'abc456')
<_sre.SRE_Match object at 0x7f769ddf5cc8>
>>>
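
The difference between match() and search() can be checked on a compiled pattern as well. This is a small sketch; the variable names are only for illustration, and fullmatch() is added as a third option that requires the whole string to match.

```python
import re

# Compile once, reuse many times.
word = re.compile(r'[a-z]+')

text = '123abc456'

at_start = word.match(text)        # None: the string starts with a digit
anywhere = word.search(text)       # finds 'abc' at positions 3..6
whole = word.fullmatch('abc')      # succeeds only if the entire string matches

print(at_start)                             # None
print(anywhere.group(), anywhere.span())    # abc (3, 6)
print(whole is not None)                    # True
```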


Templating
----------

`string`
The `Template` class from this module lets you define a base string with placeholders that are
filled in later.
>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'

The `substitute()` method raises a `KeyError` exception if a placeholder has no value, but
`safe_substitute()` leaves missing placeholders unchanged instead.
>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
  ...
KeyError: 'owner'
>>> t.safe_substitute(d)
'Return the unladen swallow to $owner.'

Subclassing `Template` lets you change the delimiter, for example to build a batch renamer:
>>> import time, os.path
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     delimiter = '%'
>>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
Enter rename style (%d-date %n-seqnum %f-format):  Ashley_%n%f

>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = os.path.splitext(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print('{0} --> {1}'.format(filename, newname))

img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg

Tools for working with Lists
----------------------------

`array`
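Stores homogeneous data compactly: every element has the type given by the type code
(`'H'` below is an unsigned short, typically 2 bytes per element).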
 
>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])

* NEED MORE READING *
`deque`
Works like a list with faster appends and pops from the left side, but slower lookups in the middle.
 
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
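>>> print("Handling", d.popleft())
Handling task1

# Sketch only: starting_node, gen_moves() and is_goal() are placeholders to be defined.
unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    while unsearched:                   # keep searching until the queue is empty
        node = unsearched.popleft()
        for m in gen_moves(node):
            if is_goal(m):
                return m
            unsearched.append(m)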
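
The search sketch above leaves its helpers undefined, so here is a self-contained version to experiment with. The graph, the node names and the path-tracking approach are all made up for illustration; it is one possible way to use a deque as a FIFO queue, not the only one.

```python
from collections import deque

# Hypothetical example graph: node -> list of neighbours.
GRAPH = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': ['F'],
    'F': [],
}

def breadth_first_search(start, goal):
    """Return a shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])      # queue of partial paths
    seen = {start}                     # avoid revisiting nodes
    while unsearched:
        path = unsearched.popleft()    # fast pop from the left
        node = path[-1]
        if node == goal:
            return path
        for neighbour in GRAPH[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                unsearched.append(path + [neighbour])
    return None

print(breadth_first_search('A', 'F'))   # ['A', 'B', 'D', 'F']
```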

* NEED MORE READING *
`bisect`
Keeps a sorted list in order by inserting each new element at its correct position.
 
>>> import bisect
>>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
>>> bisect.insort(scores, (300, 'ruby'))
>>> scores
[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
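
To see where an element would land before inserting it, bisect_left() can be used alongside insort(). A small sketch reusing the same scores list; the `position` variable is just for illustration.

```python
import bisect

scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]

# bisect_left() reports where an item would go without inserting it.
position = bisect.bisect_left(scores, (300, 'ruby'))
print(position)                    # 2

# insort() finds that position and inserts in one step.
bisect.insort(scores, (300, 'ruby'))
print(scores[position])            # (300, 'ruby')
print(scores == sorted(scores))    # True
```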

`heapq`
Useful for applications that repeatedly access the smallest element(s) but don't want to
run a full list sort.
 
>>> from heapq import heapify, heappop, heappush
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> heapify(data)                      # rearrange the list into heap order
>>> heappush(data, -5)                 # add a new entry
>>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
[-5, 0, 1]
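
A common use of this is a simple priority queue, where each entry is a (priority, item) tuple and the lowest priority number comes out first. The task names below are invented for the example.

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, 'write report'))
heapq.heappush(tasks, (1, 'fix build'))
heapq.heappush(tasks, (3, 'refactor'))

# Pop in priority order; tuples compare by their first element.
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(order)                                   # ['fix build', 'write report', 'refactor']

# nsmallest() gives the n smallest items without sorting the whole list.
print(heapq.nsmallest(2, [7, 3, 9, 1, 5]))     # [1, 3]
```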

Collections
-----------

`collections.Counter`
Counts how many times each item occurs in an iterable.
>>> from collections import Counter
>>> l = ['a', 'c', 'b', 'd', 'a']
>>> c = Counter(l)
>>> c
Counter({'a': 2, 'd': 1, 'c': 1, 'b': 1})

JSON
----

Basics
Sample JSON data:
{
"a": "apple",
"b": "banana",
"c": "carrot"
}

Loading JSON data from a file
>>> import json
>>> with open('file.json') as f:
...     data = json.load(f)
>>> data = json.load(open('file.json'))  # shorter, but does not close the file explicitly

Loading JSON data from a string
>>> json.loads('{"a": "apple", "b": "banana"}')
{'a': 'apple', 'b': 'banana'}
>>>
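
Going the other way, json.dumps() turns a Python object back into a JSON string. A minimal round-trip sketch using the sample data; the `record` name is only illustrative.

```python
import json

record = {"a": "apple", "b": "banana", "c": "carrot"}

text = json.dumps(record, indent=2, sort_keys=True)   # dict -> JSON string
print(text)

restored = json.loads(text)                           # JSON string -> dict
print(restored == record)                             # True
```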

Random
------

Generate random strings (N is the desired length)
import random, string
N = 8
''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))
''.join(random.choices(string.ascii_uppercase + string.digits, k=N))    # Python 3.6+
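
Wrapped into functions, this might look like the sketch below. The function names are hypothetical. The second variant uses the `secrets` module, which is the safer choice when the string is a password or token, since `random` is not meant for security purposes.

```python
import random
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def random_string(length):
    """Random string for non-security uses (test data, temp names)."""
    return ''.join(random.choices(ALPHABET, k=length))

def random_token(length):
    """Random string drawn from a cryptographically strong source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

print(random_string(8))
print(random_token(8))
```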

Itertools
---------

Generates every ordering (permutation) of the input items, e.g. for brute-forcing candidate passwords
>>> import itertools
>>> for i in itertools.permutations('abc'):
...   print(i)
...
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
>>>
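
Two related variations, as a short sketch: permutations() takes an optional length, and product() allows repeated characters, which is usually closer to how real passwords look.

```python
import itertools
import math

letters = 'abc'

# permutations(iterable, r) yields r-length orderings without repeating an item.
pairs = [''.join(p) for p in itertools.permutations(letters, 2)]
print(pairs)              # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']

# product(..., repeat=r) allows the same character more than once.
candidates = [''.join(p) for p in itertools.product(letters, repeat=2)]
print(len(candidates))    # 9

# The number of full-length permutations grows factorially.
full = list(itertools.permutations(letters))
print(len(full) == math.factorial(len(letters)))    # True
```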

Sunday, September 2, 2018

File I/O Modules


File Wildcards
--------------

`glob`
This module provides a function for making file lists from directory wildcard searches:
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']
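
To try this without depending on the current directory, the sketch below builds a throwaway directory first. The file names are invented, and `recursive=True` with `**` is used to also pick up files in subdirectories.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a few empty files, one of them in a subdirectory.
    os.makedirs(os.path.join(root, 'pkg'))
    for name in ('primes.py', 'quote.py', 'notes.txt', os.path.join('pkg', 'util.py')):
        open(os.path.join(root, name), 'w').close()

    # '*.py' matches only files directly inside root.
    top_level = sorted(glob.glob(os.path.join(root, '*.py')))

    # '**' with recursive=True also descends into subdirectories.
    everything = sorted(glob.glob(os.path.join(root, '**', '*.py'), recursive=True))

    print([os.path.relpath(p, root) for p in top_level])
    print([os.path.relpath(p, root) for p in everything])
```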

Working on Binary Data
----------------------

`struct`
Provides `pack()` and `unpack()` functions for working with binary record formats. The example
below loops through the header information in a ZIP file without using the `zipfile` module.
import struct

with open('myfile.zip', 'rb') as f:
    data = f.read()

start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
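    # '<IIIHH': little-endian, three 4-byte and two 2-byte unsigned ints (16 bytes)
    fields = struct.unpack('<IIIHH', data[start:start+16])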
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header

Sample output:
b'config-err-t6uao6' 0x0 0 0
b'gnome-software-34B15Y/' 0x0 0 0
b'gnome-software-GYBW5Y/' 0x0 0 0
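
A round trip with pack() and unpack() shows how a format string maps to bytes. This is only a sketch: the format string and the values are chosen for illustration and do not come from a real ZIP file.

```python
import struct

# '<' = little-endian, 'I' = 4-byte unsigned int, 'H' = 2-byte unsigned int.
HEADER = '<IIIHH'

packed = struct.pack(HEADER, 0xDEADBEEF, 120, 300, 8, 0)
print(struct.calcsize(HEADER))   # 16
print(len(packed))               # 16

crc32, comp_size, uncomp_size, filenamesize, extra_size = struct.unpack(HEADER, packed)
print(hex(crc32), comp_size, uncomp_size, filenamesize, extra_size)
# 0xdeadbeef 120 300 8 0
```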

Socket
------

Basics
AF_INET - IPv4 address family
SOCK_STREAM - TCP socket type

Methods
socket.gethostname() - returns your computer's hostname
socket.socket() - creates a socket object (defaults to AF_INET and SOCK_STREAM)

Simple client-server connection setup
1. create server socket
>>> import socket
>>> s = socket.socket()  # creates a socket object
>>> host = socket.gethostname()
>>> port = 8000
>>> s.bind((host, port))
# puts the socket in listening state (queues up to 5 pending connections before refusing others)
>>> s.listen(5)
# waits for an incoming connection; blocks until a client connects, then
# returns a new socket for that client together with the client's address
>>> conn, addr = s.accept()

2. create client socket
>>> import socket
>>> s = socket.socket()
>>> host = 'remote.system.com'
>>> port = 8000
# connects to remote system
>>> s.connect((host, port))

The server's s.accept() and the client's s.connect() work as a pair.
s.accept() blocks until a client connects. Once the server has called
s.listen(), a client's s.connect() can complete even before the server
calls s.accept(), because the operating system queues the pending
connection. If nothing is listening on the port, s.connect() normally
fails with ConnectionRefusedError instead of waiting.
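
Both sides can be tried in a single script by running the server in a thread. This is a minimal sketch under a few assumptions: it uses 127.0.0.1 instead of a remote host, lets the OS pick a free port by binding to port 0, and the `serve_once` helper is a made-up name.

```python
import socket
import threading

def serve_once(server):
    """Accept one client, send back what it sent in upper case, then close."""
    conn, addr = server.accept()          # blocks until a client connects
    with conn:
        data = conn.recv(100)             # read up to 100 bytes
        conn.sendall(data.upper())

# Port 0 asks the OS for any free port (an assumption made for this demo).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(5)
host, port = server.getsockname()

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect((host, port))
    client.sendall(b'hello world\n')
    reply = client.recv(100)

worker.join()
server.close()
print(reply)    # b'HELLO WORLD\n'
```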

Sending and receiving
1. server
# Continuing the example above, receive up to 100 bytes at a time (the
# buffer size). recv() blocks until the other end sends something.
>>> conn.recv(100)

2. client
# To send a string literal, prefix it with `b` to make it a bytes object
>>> s.send(b'Hello world\n')
# You can also open a file in binary mode and send it.
>>> f = open('grocery list.txt', 'rb')
>>> data = f.read()
>>> f.close()
>>> s.send(data)

Sending to multiple client sockets
# Continuing the examples above, accept 2 client connections on the server
>>> clientsocket1, addr = s.accept()
# on client1, execute s.connect((host, port))
>>> clientsocket2, addr = s.accept()
# on client2, execute s.connect((host, port))
>>>
# Now, send separate messages to each client using their respective sockets
>>> clientsocket1.send(b'Hi client1')
# On client1, do a s.recv(4096) to receive the data
>>> clientsocket2.send(b'Hi client2')
# On client2, do a s.recv(4096) to receive the data

Preparing data for transmission
Sockets send bytes, not str, so text has to be converted first. Two ways to do it:
>>> conn.send(b'I am no longer a string')
>>> response = 'Hello {}'.format('world')
>>> conn.send(response.encode('utf-8'))

On the receiving end, you can decode the binary data by using .decode:
>>> received_data = s.recv(1024)
>>> print(received_data.decode('utf-8'))

Right way of closing a connection
If the client calls `close()`, every later `recv()` call on the server returns an empty bytes
object (b''), which is the signal that the connection has been closed.
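
In practice this means the receiving side can loop on recv() until it gets b''. A minimal sketch, assuming socket.socketpair() as a stand-in for a real client and server so it runs locally; `recv_all` is a hypothetical helper name.

```python
import socket

def recv_all(conn, bufsize=4096):
    """Read from conn until the peer closes, returning everything received."""
    chunks = []
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:              # b'' means the other end closed the connection
            break
        chunks.append(chunk)
    return b''.join(chunks)

# socketpair() gives two already-connected sockets, which keeps the demo local.
server_side, client_side = socket.socketpair()

client_side.sendall(b'first part, ')
client_side.sendall(b'second part')
client_side.close()                # sender signals it is finished

message = recv_all(server_side)
server_side.close()
print(message.decode('utf-8'))     # first part, second part
```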