Showing posts with label python3. Show all posts

Wednesday, September 5, 2018

String/Templating Modules


String Pattern Matching
-----------------------

`re`
This module uses regular expressions for advanced string processing.

>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'

For simple cases, plain string methods are easier to read and debug than a regular expression.

>>> 'tea for too'.replace('too', 'two')
'tea for two'

Compiling an expression
  --> patterns are compiled into bytecode and executed by a matching engine written in C
  --> compiling once and reusing the pattern object avoids repeating that work

>>> import re
>>> p = re.compile('[a-z]+')
>>> p
re.compile('[a-z]+')
>>> p.match("")
>>> print(p.match(""))
None
>>> print(p.match("abc"))
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>>


match() vs search()
  --> match() - matches only at the beginning of the string
  --> search() - scans the whole string and returns the first match found anywhere

>>> re.match('[a-z]+', '123abc456')
>>>
>>> re.search('[a-z]+', '123abc456')
<_sre.SRE_Match object at 0x7f769f7fd4a8>
>>> re.match('[a-z]+', 'abc456')
<_sre.SRE_Match object at 0x7f769ddf5cc8>
>>>
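The difference between the two calls is easier to see side by side. The sketch below reuses one compiled pattern and also shows `fullmatch()`, which is not mentioned above; the variable names are only illustrative.

```python
import re

# Compile once, then reuse the pattern object.
word = re.compile(r'[a-z]+')

text = '123abc456'

at_start = word.match(text)      # anchored at position 0, so nothing matches
anywhere = word.search(text)     # scans forward until the pattern fits
whole = word.fullmatch('abc')    # the entire string has to match

print(at_start)                              # None
print(anywhere.group(), anywhere.span())     # abc (3, 6)
print(whole.group())                         # abc
```

Both `match()` and `search()` return `None` on failure, so test the result before calling `.group()`.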


Templating
----------

`string`
You can use the `Template` class from this module to create a base string with placeholders
that are filled in later. `$$` produces a literal dollar sign.
>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'

The `substitute()` method raises a `KeyError` exception if a placeholder has no value, but you
can bypass that with `safe_substitute()`, which leaves unknown placeholders untouched.
>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
  ...
KeyError: 'owner'
>>> t.safe_substitute(d)
'Return the unladen swallow to $owner.'

Subclassing `Template` lets you choose a different delimiter, as in this batch renamer:
>>> import time, os.path
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     delimiter = '%'
>>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
Enter rename style (%d-date %n-seqnum %f-format):  Ashley_%n%f

>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = os.path.splitext(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print('{0} --> {1}'.format(filename, newname))

img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg

Tools for working with Lists
----------------------------

`array`
Stores a sequence of same-typed values compactly. The 'H' typecode below stores each number as a two-byte unsigned integer.
 
>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])

* NEED MORE READING *
`deque`
Offers faster appends and pops from the left end than a list, at the cost of slower lookups in the middle.
 
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print("Handling", d.popleft())
Handling task1

unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)
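The fragment above depends on `starting_node`, `gen_moves()` and `is_goal()`, which are not defined here. As a minimal runnable sketch of the same idea, the version below searches a small made-up graph (`GRAPH` is hypothetical) and returns the shortest path it finds.

```python
from collections import deque

# Hypothetical graph: each node maps to the nodes reachable in one move.
GRAPH = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': ['F'],
    'F': [],
}

def breadth_first_search(start, goal):
    """Return the shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])      # queue of paths, oldest first
    seen = {start}
    while unsearched:
        path = unsearched.popleft()    # cheap pop from the left end
        node = path[-1]
        if node == goal:
            return path
        for neighbor in GRAPH[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                unsearched.append(path + [neighbor])
    return None

print(breadth_first_search('A', 'F'))   # ['A', 'B', 'D', 'F']
```

The `seen` set keeps the search from revisiting nodes, which the original fragment leaves out.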

* NEED MORE READING *
`bisect`
Manipulates sorted lists by inserting each new element at its correct position, so the list stays sorted.
 
>>> import bisect
>>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
>>> bisect.insort(scores, (300, 'ruby'))
>>> scores
[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
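Besides inserting, `bisect` can look up where a value falls in a sorted list. The sketch below maps scores to letter grades; the breakpoints and grade letters are made-up values for illustration.

```python
import bisect

BREAKPOINTS = [60, 70, 80, 90]   # must stay sorted
GRADES = 'FDCBA'                 # one more entry than BREAKPOINTS

def grade(score):
    """Map a numeric score to a letter using binary search."""
    return GRADES[bisect.bisect(BREAKPOINTS, score)]

print([grade(s) for s in [33, 99, 77, 70, 89, 90, 100]])
# ['F', 'A', 'C', 'C', 'B', 'A', 'A']
```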

`heapq`
Useful for applications that repeatedly access the smallest element(s) but don't want to
run a full list sort.
 
>>> from heapq import heapify, heappop, heappush
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> heapify(data)                      # rearrange the list into heap order
>>> heappush(data, -5)                 # add a new entry
>>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
[-5, 0, 1]
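A common use of a heap is a small priority queue. This sketch assumes tasks are stored as `(priority, name)` tuples, which is a convention chosen for the example and not something `heapq` requires.

```python
import heapq

tasks = []  # heap of (priority, name); the lowest number pops first
heapq.heappush(tasks, (2, 'write tests'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'update docs'))

order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(order)                       # ['fix bug', 'write tests', 'update docs']

scores = [41, 7, 93, 15, 68]
smallest = heapq.nsmallest(2, scores)
largest = heapq.nlargest(2, scores)
print(smallest, largest)           # [7, 15] [93, 68]
```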

Collections
-----------

collections.Counter()
>>> from collections import Counter
>>> l = ['a', 'c', 'b', 'd', 'a']
>>> c = Counter(l)
>>> c
Counter({'a': 2, 'd': 1, 'c': 1, 'b': 1})
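A `Counter` behaves like a dictionary with a few extras. The sketch below uses a made-up word list to show `most_common()`, `update()`, and the fact that missing keys count as zero.

```python
from collections import Counter

words = 'the quick brown fox jumps over the lazy dog the end'.split()
counts = Counter(words)

top = counts.most_common(1)      # list of (item, count) pairs
counts.update(['fox', 'fox'])    # add more observations
missing = counts['cat']          # unknown keys return 0 instead of raising

print(top)             # [('the', 3)]
print(counts['fox'])   # 3
print(missing)         # 0
```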

Json
----

Basics
Sample json data:
{
"a": "apple",
"b": "banana",
"c": "carrot"
}
Loading json data from a file
>>> import json
>>> with open('file.json') as f:
...     data = json.load(f)
>>> data = json.load(open('file.json'))  # shorter, but the file is not closed explicitly
Loading json data from a string
>>> json.loads('{"a": "apple", "b": "banana"}')
{'a': 'apple', 'b': 'banana'}
>>>
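Going the other way uses `json.dumps()` for strings and `json.dump()` for files. The sketch below round-trips the sample data; the temporary directory and the `file.json` name inside it are only there to keep the example self-contained.

```python
import json
import os
import tempfile

data = {"a": "apple", "b": "banana", "c": "carrot"}

# To a string and back.
text = json.dumps(data, indent=2, sort_keys=True)
from_string = json.loads(text)

# To a file and back, inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.json')
    with open(path, 'w') as f:
        json.dump(data, f)
    with open(path) as f:
        from_file = json.load(f)

print(from_string == data, from_file == data)   # True True
```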

Random
------

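Generate random strings (N is the desired length)
import random, string
# works on any Python 3
''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))
# Python 3.6+: random.choices() draws all N characters in one call
''.join(random.choices(string.ascii_uppercase + string.digits, k=N))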
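Wrapped in functions, the one-liners become reusable. The helper names below are made up for this sketch. The second helper is an addition to the notes above: it uses the standard `secrets` module, which is the usual choice when the string serves as a password or token.

```python
import random
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def random_string(n, rng=random):
    """Return n characters drawn from ALPHABET using the given generator."""
    return ''.join(rng.choices(ALPHABET, k=n))

def random_token(n):
    """Like random_string(), but drawn from the OS's secure random source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(n))

print(random_string(8))
# A seeded generator gives repeatable results, which helps in tests.
print(random_string(8, random.Random(42)) == random_string(8, random.Random(42)))
print(len(random_token(12)))
```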

Itertools
---------

Generates every ordering (permutation) of the input items, e.g. for brute-forcing candidate passwords
>>> import itertools
>>> for i in itertools.permutations('abc'):
...   print(i)
...
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
>>>

Sunday, September 2, 2018

File I/O Modules


File Wildcards
--------------

`glob`
This module provides a function for making file lists from directory wildcard searches.
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']

Working on Binary Data
----------------------

`struct`
Contains `pack()` and `unpack()` functions for working with binary record formats. The example
below loops through the header information in a ZIP file without using the `zipfile` module.
import struct

with open('myfile.zip', 'rb') as f:
    data = f.read()

start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header

Sample output:
b'config-err-t6uao6' 0x0 0 0
b'gnome-software-34B15Y/' 0x0 0 0
b'gnome-software-GYBW5Y/' 0x0 0 0
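To see what a format string does without needing a ZIP file, you can pack some values and unpack them again. The sketch below uses a 16-byte little-endian layout of three unsigned 32-bit integers followed by two unsigned 16-bit integers; the field values are made up.

```python
import struct

# '<' = little-endian, 'I' = unsigned 32-bit int, 'H' = unsigned 16-bit int
HEADER_FORMAT = '<IIIHH'

header_size = struct.calcsize(HEADER_FORMAT)
packed = struct.pack(HEADER_FORMAT, 0xDEADBEEF, 120, 300, 8, 0)

fields = struct.unpack(HEADER_FORMAT, packed)
crc32, comp_size, uncomp_size, name_len, extra_len = fields

print(header_size)                                  # 16
print(hex(crc32), comp_size, uncomp_size, name_len, extra_len)
# 0xdeadbeef 120 300 8 0
```

`struct.calcsize()` is a handy check that the format string covers exactly the number of bytes you slice from the data.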

Socket
------

Basics
AF_INET - IPv4 address family
SOCK_STREAM - TCP socket type
Methods
socket.gethostname() - returns your computer's hostname
socket.socket() - creates a socket object
Simple client-server connection setup
1. create server socket
>>> import socket
>>> s = socket.socket()  # creates a socket object
>>> host = socket.gethostname()
>>> port = 8000
>>> s.bind((host, port))
# puts socket in listening state (queues 5 connections before rejecting others)
>>> s.listen(5)
# accepts an incoming connection; this call blocks until a client connects
>>> conn, addr = s.accept()

2. create client socket
>>> import socket
>>> s = socket.socket()
>>> host = 'remote.system.com'
>>> port = 8000
# connects to remote system
>>> s.connect((host, port))

The server's s.accept() and the client's s.connect() work as a pair.
s.accept() blocks until a client connects. s.connect() needs a server
that is already listening: once s.listen() has been called, the
connection is normally queued and s.connect() returns even if the server
has not reached s.accept() yet, while connecting to a port where nothing
is listening usually fails with ConnectionRefusedError instead of hanging.
sending and receiving
1. server
# Continuing the example above, the server reads at most 100 bytes per
# call (the buffer size). The call blocks until something arrives from
# the other end.
>>> conn.recv(100)

2. client
# Sockets carry bytes, not str; the `b` prefix makes the literal a bytes object
>>> s.send(b'Hello world\n')
# You can also open a file in binary mode and send it.
>>> f = open('grocery list.txt', 'rb')
>>> data = f.read()
>>> f.close()
>>> s.send(data)
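The server and client halves above have to run in two separate interpreters. For experimenting in a single script, the sketch below runs the server side in a thread on the loopback address. The `serve_once` helper, the `127.0.0.1` address and the OS-chosen port are assumptions made for this example.

```python
import socket
import threading

def serve_once(server_sock, received):
    """Accept one client, read one message, and reply."""
    conn, addr = server_sock.accept()        # blocks until a client connects
    with conn:
        received.append(conn.recv(1024))     # read up to 1024 bytes
        conn.sendall(b'ack')

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))                # port 0: let the OS pick a free port
server.listen(5)
port = server.getsockname()[1]

received = []
worker = threading.Thread(target=serve_once, args=(server, received))
worker.start()

# Client side: connect, send, and wait for the reply.
with socket.create_connection(('127.0.0.1', port), timeout=5) as client:
    client.sendall(b'hello')
    reply = client.recv(1024)

worker.join(timeout=5)
server.close()
print(received, reply)                       # [b'hello'] b'ack'
```

`sendall()` is used instead of `send()` because `send()` may transmit only part of the data and leaves the retrying to you.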
sending to multiple client sockets
# Continuing the examples above, let's accept 2 client sockets on the server
>>> clientsocket1, addr = s.accept()
# on client1, execute s.connect((host, port))
>>> clientsocket2, addr = s.accept()
# on client2, execute s.connect((host, port))
>>>
# Now, send separate messages to each client using their respective sockets
>>> clientsocket1.send(b'Hi client1')
# On client1, do a s.recv(4096) to receive the data
>>> clientsocket2.send(b'Hi client2')
# On client2, do a s.recv(4096) to receive the data
preparing data for transmission
You can only send bytes over a socket. Here are two ways to get them.
>>> conn.send(b'I am no longer a string')
>>> response = 'Hello {}'.format('world')
>>> conn.send(response.encode('utf-8'))

On the receiving end, you can decode the binary data by using .decode:
>>> received_data = s.recv(1024)
>>> print(received_data.decode('utf-8'))
Right way of closing a connection
Once the client calls `close()`, every later `recv()` call on the server returns an empty
bytes object (b''). Treat that as the signal to stop reading and close the server-side socket.
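A receive loop usually relies on that empty result to know when to stop. The sketch below uses `socket.socketpair()` to get two connected sockets in one process, which is a shortcut for the example and not how a real client and server would be set up.

```python
import socket

server_end, client_end = socket.socketpair()

# "Client": encode the text, send it, then close the connection.
client_end.sendall('Hello {}'.format('world').encode('utf-8'))
client_end.close()

# "Server": keep reading until recv() returns b'', meaning the peer closed.
chunks = []
while True:
    chunk = server_end.recv(4)      # deliberately tiny buffer
    if not chunk:
        break
    chunks.append(chunk)
server_end.close()

message = b''.join(chunks).decode('utf-8')
print(message)                      # Hello world
```

Decoding only after all chunks are joined avoids splitting a multi-byte UTF-8 character across two `recv()` calls.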


Tuesday, August 28, 2018

Math/Number Modules


`math`
main module for mathematical computations
>>> import math
>>> math.cos(math.pi / 4)
0.70710678118654757
>>> math.log(1024, 2)
10.0
`random`
You can use this module to generate random values.

>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(range(100), 10)   # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random()    # random float
0.17970987693706186
>>> random.randrange(6)    # random integer chosen from range(6)
4
`statistics`
Module for statistical calculation
>>> import statistics
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> statistics.mean(data)
1.6071428571428572
>>> statistics.median(data)
1.25
>>> statistics.variance(data)
1.3720238095238095
`decimal`
Can be used by applications that require exact decimal arithmetic.

This calculates a 5% tax on a 70 cent phone charge.
>>> from decimal import *
>>> round(Decimal('0.70') * Decimal('1.05'), 2)
Decimal('0.74')
>>> round(.70 * 1.05, 2)
0.73

Performs modulo calculations and equality tests that are unsuitable for binary floating point.
>>> Decimal('1.00') % Decimal('.10')
Decimal('0.00')
>>> 1.00 % 0.10
0.09999999999999995

>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False

Performs very precise calculations.
>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857')
`functools`
Provides `reduce()`, which folds a sequence into a single value.
>>> from functools import reduce
>>> reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
15
>>>

Wednesday, August 15, 2018

Python Date/Time modules


`datetime`
Can provide time difference calculations

>>> # dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'
>>> # dates support calendar arithmetic
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368

Using strptime()

>>> from datetime import datetime
>>>
>>> datetime.strptime('2016-01-01', '%Y-%m-%d')
datetime.datetime(2016, 1, 1, 0, 0)
>>>
>>> datetime.strptime('03-01-2018', '%Y-%m-%d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '03-01-2018' does not match format '%Y-%m-%d'
>>>
>>>
>>> datetime.strptime('2016-01-01', '%Y-%m-%d %H:%M:%S')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '2016-01-01' does not match format '%Y-%m-%d %H:%M:%S'
>>>
>>> datetime.strptime('2016-01-01 04:16:34', '%Y-%m-%d %H:%M:%S')
datetime.datetime(2016, 1, 1, 4, 16, 34)        
>>>


Managing time differences

>>> from datetime import timedelta
>>> a = timedelta(days=365)
>>> b = timedelta(days=100)
>>> a - b
datetime.timedelta(265)
>>>
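`timedelta` objects also come out of subtracting two `datetime` values, and they can be added back to a `datetime`. The timestamps in this sketch are made up.

```python
from datetime import datetime, timedelta

FORMAT = '%Y-%m-%d %H:%M:%S'
start = datetime.strptime('2018-03-18 09:30:00', FORMAT)
end = datetime.strptime('2018-03-20 12:00:00', FORMAT)

elapsed = end - start                    # a timedelta
deadline = start + timedelta(weeks=1)    # a datetime

print(elapsed)                           # 2 days, 2:30:00
print(elapsed.total_seconds())           # 181800.0
print(deadline.strftime('%Y-%m-%d'))     # 2018-03-25
```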

Callable methods on datetime

>>> date = datetime.strptime('2018-03-18', '%Y-%m-%d')
>>>
>>> dir(date)
['__add__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rsub__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', 'astimezone', 'combine', 'ctime', 'date', 'day', 'dst', 'fold', 'fromordinal', 'fromtimestamp', 'hour', 'isocalendar', 'isoformat', 'isoweekday', 'max', 'microsecond', 'min', 'minute', 'month', 'now', 'replace', 'resolution', 'second', 'strftime', 'strptime', 'time', 'timestamp', 'timetuple', 'timetz', 'today', 'toordinal', 'tzinfo', 'tzname', 'utcfromtimestamp', 'utcnow', 'utcoffset', 'utctimetuple', 'weekday', 'year']
>>>
>>> date.timestamp()
1521302400.0
>>>

`time`
Handles simple time operations and also includes a sleep function.
>>> import time
>>> dir(time)
['CLOCK_MONOTONIC', 'CLOCK_MONOTONIC_RAW', 'CLOCK_PROCESS_CPUTIME_ID', 'CLOCK_REALTIME', 'CLOCK_THREAD_CPUTIME_ID', '_STRUCT_TM_ITEMS', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'altzone', 'asctime', 'clock', 'clock_getres', 'clock_gettime', 'clock_settime', 'ctime', 'daylight', 'get_clock_info', 'gmtime', 'localtime', 'mktime', 'monotonic', 'perf_counter', 'process_time', 'sleep', 'strftime', 'strptime', 'struct_time', 'time', 'timezone', 'tzname', 'tzset']
>>> time.localtime()
time.struct_time(tm_year=2017, tm_mon=8, tm_mday=29, tm_hour=18, tm_min=20, tm_sec=35, tm_wday=1, tm_yday=241, tm_isdst=0)
>>> time.sleep(3)    # pauses for 3 seconds
`timeit`
Measures speed of small code snippets.
>>> from timeit import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.57535828626024577
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.54962537085770791

For larger blocks of code, use `profile` and `pstats`.
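`timeit` also accepts functions instead of strings, and `timeit.repeat()` runs the measurement several times so you can take the best result. The two functions below are stand-ins written for this sketch. Timings depend on the machine, so no expected numbers are shown.

```python
import timeit

def swap_with_temp():
    a, b = 1, 2
    t = a
    a = b
    b = t
    return a, b

def swap_with_tuple():
    a, b = 1, 2
    a, b = b, a
    return a, b

# Each entry is the total time for `number` calls; repeat gives 3 such totals.
temp_runs = timeit.repeat(swap_with_temp, number=100000, repeat=3)
tuple_runs = timeit.repeat(swap_with_tuple, number=100000, repeat=3)

print('temp : %.4f s' % min(temp_runs))
print('tuple: %.4f s' % min(tuple_runs))
```

Taking the minimum is a common convention because slower runs usually reflect interference from other processes.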

Sunday, August 12, 2018

Some python number modules



Thursday, July 19, 2018

Making your Python program faster


Here are some ways of making your program efficient and fast.

Some Techniques
---------------

Concatenate faster
Avoid:
s = ""
for substring in list:
    s += substring

Use:
slist = [some_function(elt) for elt in somelist]
s = "".join(slist)

Avoid:
out = "" + head + prologue + query + tail + ""

Use:
out = "%s%s%s%s" % (head, prologue, query, tail)

or:
out = "%(head)s%(prologue)s%(query)s%(tail)s" % locals()
Faster looping
Avoid:

newlist = []
for word in oldlist:
    newlist.append(word.upper())

Use any of the following instead:

newlist = list(map(str.upper, oldlist))    # map() returns an iterator in Python 3

newlist = [s.upper() for s in oldlist]

upper = str.upper
newlist = []
append = newlist.append
for word in oldlist:
    append(upper(word))

Use local variables as much as possible
Python accesses local variables more efficiently than global variables.

def func():
    upper = str.upper
    newlist = []
    append = newlist.append
    for word in oldlist:
        append(upper(word))
    return newlist

Dictionaries can be used to get record counts faster
The following code runs a membership test for every word before updating its count:

wdict = {}
for word in words:
    if word not in wdict:
        wdict[word] = 0
    wdict[word] += 1

When most words repeat, it is cheaper to use a `try`/`except` clause, which skips the membership test:

wdict = {}
for word in words:
    try:
        wdict[word] += 1
    except KeyError:
        wdict[word] = 1

Or using `dict.get()`

wdict = {}
get = wdict.get
for word in words:
    wdict[word] = get(word, 0) + 1
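The standard library also has two ready-made ways to do this kind of counting, which are not part of the notes above: `collections.defaultdict` and `collections.Counter`. A small sketch with a made-up word list:

```python
from collections import Counter, defaultdict

words = ['spam', 'eggs', 'spam', 'ham', 'spam']

# defaultdict(int) creates missing keys with the value 0 on first access.
by_default = defaultdict(int)
for word in words:
    by_default[word] += 1

# Counter does the whole loop in one call.
by_counter = Counter(words)

print(dict(by_default))    # {'spam': 3, 'eggs': 1, 'ham': 1}
print(by_counter['spam'])  # 3
```

Which variant is fastest depends on the data and the Python version, so it is worth timing them on your own input.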

Reduce repeated imports since they slow down performance
Function 1 places the import inside the function (the examples call `string.lower()`, which
exists only in Python 2):

def doit1():
    import string ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit1()

Function 2 places import outside:

import string ###### import statement outside function
def doit2():
    string.lower('Python')

for num in range(100000):
    doit2()

Function 2 runs faster because the import statement executes only once instead of on every call.

>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623

Use `string` methods instead of importing the `string` module
There are cases where you don't even need to import `string`.

def doit3():
    'Python'.lower()

for num in range(100000):
    doit3()

>>> def doit3():
...     'Python'.lower()
...
>>> t = timeit.Timer(setup='from __main__ import doit3', stmt='doit3()')
>>> t.timeit()
2.5606080293655396

This only helps if the `string` module was not imported at all. If it is
already loaded by another module, skipping the import makes no
difference. To see if it is loaded, check `sys.modules`.

Lazy imports can be used
This loads `email` only once, on the first invocation of parse_email(), so programs that
never call the function never pay for the import.

email = None

def parse_email():
    global email
    if email is None:
        import email
    ...
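A simpler variant of the same idea is to put the import statement inside the function without the global placeholder. The sketch below assumes the function parses a raw message string, which is a made-up use for illustration.

```python
import sys

def parse_email(raw_text):
    # Deferred import: the module is loaded on the first call only.
    # Later calls still execute the statement, but it is then just a
    # lookup in sys.modules.
    import email
    return email.message_from_string(raw_text)

message = parse_email('Subject: hello\n\nbody text')
print(message['Subject'])        # hello
print('email' in sys.modules)    # True
```

This trades a small per-call lookup for faster program start-up, so it suits functions that are called rarely.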

Data aggregation
Putting the loop inside a function is faster than calling the function inside a loop,
because each function call has its own overhead.

example 1:

import time
x = 0
def doit1(i):
    global x
    x = x + i

list = range(100000)
t = time.time()
for i in list:
    doit1(i)

print("%.3f" % (time.time()-t))

example 2:

import time
x = 0
def doit2(list):
    global x
    for i in list:
        x = x + i

list = range(100000)
t = time.time()
doit2(list)
print("%.3f" % (time.time()-t))

The second example is faster. Here's a demo:

>>> t = time.time()
>>> for i in list:
...     doit1(i)
...
>>> print("%.3f" % (time.time()-t))
0.758
>>> t = time.time()
>>> doit2(list)
>>> print("%.3f" % (time.time()-t))
0.204
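
Reduce interpreter interval checks
On Python 2, you can set `sys.setcheckinterval` to a higher value to reduce how often the
interpreter does its periodic checks. Python 3.2 deprecated that function in favor of
`sys.setswitchinterval`, which takes a time in seconds, and Python 3.9 removed it.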
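On current Python 3 the related setting is the thread switch interval, read and written through `sys.getswitchinterval()` and `sys.setswitchinterval()`. The sketch below raises it temporarily and restores the old value; the 0.05 second figure is an arbitrary choice for the example, not a recommendation.

```python
import sys

original = sys.getswitchinterval()    # in seconds

sys.setswitchinterval(0.05)           # ask for less frequent thread switches
try:
    raised = sys.getswitchinterval()
    # ... run the CPU-bound work here ...
finally:
    sys.setswitchinterval(original)   # always restore the previous value

print(original, raised)
```

A larger interval only matters for multi-threaded programs, and it can make other threads less responsive, so measure before keeping such a change.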


Some Tools to benchmark program speed
-------------------------------------

trace
  - part of the standard library, so it is importable from `sys.path`
  - usage: trace.py -t spam.py eggs
  - or run it as a module: python -m trace -t spam.py eggs

runsnake
  - GUI tool
  - usage: runsnake some_profile_dump.prof

pycallgraph
  - creates call graphs for python programs
  - generates PNG file showing the graph traces
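Tools such as runsnake read a profile dump file. One way to produce such a dump with the standard library is `cProfile`, and `pstats` can read it back as text. In the sketch below, `slow_sum` and the temporary directory are made up for the example; only the `some_profile_dump.prof` name comes from the usage line above.

```python
import cProfile
import io
import os
import pstats
import tempfile

def slow_sum(n):
    """Deliberately plain loop so there is something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total

with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, 'some_profile_dump.prof')

    profiler = cProfile.Profile()
    result = profiler.runcall(slow_sum, 100000)   # profile a single call
    profiler.dump_stats(dump_path)                # write the dump file

    # Load the dump and format the five most expensive entries.
    buffer = io.StringIO()
    stats = pstats.Stats(dump_path, stream=buffer)
    stats.sort_stats('cumulative').print_stats(5)
    report = buffer.getvalue()

print(report)
```

From the command line, `python -m cProfile -o some_profile_dump.prof spam.py` writes the same kind of file for a whole script.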