My Lazy Admin: python3


Wednesday, September 5, 2018

String/Templating Modules

String Pattern Matching

-----------------------

`re`

This module uses regular expressions for advanced string processing.

>>> import re

>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')

['foot', 'fell', 'fastest']

>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')

'cat in the hat'

For simple cases, plain string methods are enough and easier to read and debug:

>>> 'tea for too'.replace('too', 'two')

'tea for two'

Compiling an expression

--> patterns are compiled into bytecode and executed by a matching engine written in C

--> compiling once and reusing the pattern object avoids repeating that step when the same pattern is used many times

>>> import re

>>> p = re.compile('[a-z]+')

>>> p

re.compile('[a-z]+')

>>> p.match("")

>>> print(p.match(""))

None

>>> print(p.match("abc"))

<_sre.SRE_Match object; span=(0, 3), match='abc'>

>>>

https://docs.python.org/3/howto/regex.html#introduction

match() vs search()

--> match() - matches only at the beginning of the string

--> search() - scans the string and matches at the first position where the pattern fits

>>> re.match('[a-z]+', '123abc456')

>>>

>>> re.search('[a-z]+', '123abc456')

<_sre.SRE_Match object at 0x7f769f7fd4a8>

>>> re.match('[a-z]+', 'abc456')

<_sre.SRE_Match object at 0x7f769ddf5cc8>

>>>
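The difference between `match()` and `search()` can be sketched with a compiled pattern. This is only a small illustration; the helper name `first_word` is made up for this note.

```python
import re

# Compile once, reuse many times.
WORD = re.compile(r'[a-z]+')

def first_word(text):
    """Return the first run of lowercase letters anywhere in text, or None."""
    m = WORD.search(text)               # search() scans the whole string
    return m.group() if m else None

at_start = WORD.match('123abc456')      # match() only tries position 0
anywhere = WORD.search('123abc456')     # finds the letters in the middle

print(at_start)                         # None
print(anywhere.group())                 # abc
print(anywhere.span())                  # (3, 6)
print(first_word('42 is the answer'))   # is
```

Always check the result against `None` before calling `group()`, since both methods return `None` when nothing matches.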

Templating

----------

`string`

You can use the `Template` class from this module to create a base string with placeholders that are filled in later.

>>> from string import Template

>>> t = Template('${village}folk send $$10 to $cause.')

>>> t.substitute(village='Nottingham', cause='the ditch fund')

'Nottinghamfolk send $10 to the ditch fund.'

The `substitute()` method raises a `KeyError` exception if a placeholder has no value. Use `safe_substitute()` to leave missing placeholders unchanged instead.

>>> t = Template('Return the $item to $owner.')

>>> d = dict(item='unladen swallow')

>>> t.substitute(d)

Traceback (most recent call last):

...

KeyError: 'owner'

>>> t.safe_substitute(d)

'Return the unladen swallow to $owner.'
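One way to combine the two methods is to try the strict one first and fall back to the lenient one. A minimal sketch; the function name `render` and its return shape are assumptions made for this example, not part of the `string` module.

```python
from string import Template

def render(template_text, values):
    """Fill in what we can; also report the first placeholder left without a value."""
    t = Template(template_text)
    try:
        return t.substitute(values), None
    except KeyError as err:
        # err.args[0] is the name of the missing placeholder
        return t.safe_substitute(values), err.args[0]

text, missing = render('Return the $item to $owner.', {'item': 'unladen swallow'})
print(text)      # Return the unladen swallow to $owner.
print(missing)   # owner
```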

You can also subclass `Template` to change the delimiter, for example to build a batch renamer:

>>> import time, os.path

>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']

>>> class BatchRename(Template):

...     delimiter = '%'

>>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format): ')

Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f

>>> t = BatchRename(fmt)

>>> date = time.strftime('%d%b%y')

>>> for i, filename in enumerate(photofiles):

...     base, ext = os.path.splitext(filename)

...     newname = t.substitute(d=date, n=i, f=ext)

...     print('{0} --> {1}'.format(filename, newname))

img_1074.jpg --> Ashley_0.jpg

img_1076.jpg --> Ashley_1.jpg

img_1077.jpg --> Ashley_2.jpg

Tools for working with Lists

----------------------------

`array`

Stores homogeneous data compactly. The 'H' typecode below stores each number as a two-byte unsigned integer.

>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])

* NEED MORE READING *
`deque`

Like a list, but with faster appends and pops from the left side and slower lookups in the middle.

>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print("Handling", d.popleft())
Handling task1

unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)

* NEED MORE READING *
`bisect`

Works on sorted lists: `insort()` inserts each new element at its correct position, so the list stays sorted.

>>> import bisect
>>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
>>> bisect.insort(scores, (300, 'ruby'))
>>> scores
[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
`heapq`

Useful for applications that repeatedly access the smallest element(s) but don't want to run a full list sort.

>>> from heapq import heapify, heappop, heappush
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> heapify(data)                        # rearrange the list into heap order
>>> heappush(data, -5)                   # add a new entry
>>> [heappop(data) for i in range(3)]    # fetch the three smallest entries
[-5, 0, 1]
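The breadth-first search fragment in the `deque` notes relies on helpers that are not defined there. Below is a self-contained sketch of the same idea; the toy `GRAPH` and the path-tracking approach are assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy graph: each node maps to the nodes reachable in one move.
GRAPH = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': ['F'],
    'F': [],
}

def breadth_first_search(start, goal):
    """Return the shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])      # queue of paths, oldest first
    seen = {start}
    while unsearched:
        path = unsearched.popleft()    # cheap pop from the left end
        node = path[-1]
        if node == goal:
            return path
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                unsearched.append(path + [nxt])
    return None

print(breadth_first_search('A', 'F'))   # ['A', 'B', 'D', 'F']
```

A plain list would also work here, but `list.pop(0)` has to shift every remaining element, which is exactly the cost `deque.popleft()` avoids.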

Collections

-----------

collections.Counter()

>>> from collections import Counter

>>> l = ['a', 'c', 'b', 'd', 'a']

>>> c = Counter(l)

>>> c

Counter({'a': 2, 'd': 1, 'c': 1, 'b': 1})
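A few more `Counter` operations that come up often, shown on a made-up sentence:

```python
from collections import Counter

words = 'the quick brown fox jumps over the lazy dog the end'.split()
counts = Counter(words)

print(counts['the'])            # 3
print(counts['missing'])        # 0 (no KeyError for unseen items)
print(counts.most_common(1))    # [('the', 3)]

counts.update(['fox', 'fox'])   # add more observations in place
print(counts['fox'])            # 3
```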

JSON

----

Basics

Sample JSON data:

{ "a": "apple", "b": "banana", "c": "carrot" }

Loading JSON data from a file

>>> import json
>>> with open('file.json') as f:
...     data = json.load(f)

The one-liner below also works, but it never closes the file explicitly:

>>> data = json.load(open('file.json'))

Loading JSON data from a string

>>> json.loads('{"a": "apple", "b": "banana"}')
{'a': 'apple', 'b': 'banana'}
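Going the other way (Python object to JSON text) uses `json.dumps()`. A short round-trip sketch using the sample data above:

```python
import json

record = {"a": "apple", "b": "banana", "c": "carrot"}

text = json.dumps(record, indent=2, sort_keys=True)   # dict -> str
restored = json.loads(text)                           # str -> dict

print(restored == record)   # True
print(restored["c"])        # carrot

try:
    json.loads("{'a': 'apple'}")      # single quotes are not valid JSON
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```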

Random

------

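Generate random strings of length N

>>> import random, string

>>> N = 8   # example length

>>> ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

Python 3.6 and later can do the same with a single call to `random.choices()`:

>>> ''.join(random.choices(string.ascii_uppercase + string.digits, k=N))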
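Wrapped in a function, the same idea might look like the sketch below. The names `ALPHABET` and `random_string` and the optional `rng` parameter are assumptions made for this example.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits

def random_string(length, rng=random):
    """Return `length` characters drawn (with replacement) from ALPHABET."""
    return ''.join(rng.choices(ALPHABET, k=length))

token = random_string(8)
print(token)          # differs on every run

# Seeding a private generator makes the result repeatable, handy in tests.
a = random_string(8, random.Random(42))
b = random_string(8, random.Random(42))
print(a == b)         # True
```

For passwords or security tokens, the standard library's `secrets` module is the safer choice, since `random` is not designed for that purpose.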

Itertools

---------

Generates permutations (useful, for example, for brute-forcing passwords)

>>> import itertools

>>> for i in itertools.permutations('abc'):

...     print(i)

...

('a', 'b', 'c')

('a', 'c', 'b')

('b', 'a', 'c')

('b', 'c', 'a')

('c', 'a', 'b')

('c', 'b', 'a')

>>>
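`permutations()` also takes a length argument, and `product()` covers the case where characters may repeat. A small sketch:

```python
import itertools
import math

letters = 'abc'

# All orderings of length 2, joined back into strings.
pairs = [''.join(p) for p in itertools.permutations(letters, 2)]
print(pairs)   # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']

# The number of full-length permutations is n!, so it grows very quickly.
count = sum(1 for _ in itertools.permutations('abcdefgh'))
print(count == math.factorial(8))   # True (40320)

# product() allows repeated characters, unlike permutations().
codes = [''.join(p) for p in itertools.product('01', repeat=2)]
print(codes)   # ['00', '01', '10', '11']
```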

Sunday, September 2, 2018

File I/O Modules

File Wildcards

--------------

`glob`

This module provides a function for making file lists from directory wildcard searches.

>>> import glob

>>> glob.glob('*.py')

['primes.py', 'random.py', 'quote.py']
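`glob.glob()` can also search subdirectories when the pattern contains `**` and `recursive=True` is passed. The sketch below builds a throwaway directory first so it can run anywhere; the file names are made up.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Small throwaway tree: two scripts at the top, one in a subfolder.
    os.mkdir(os.path.join(root, 'pkg'))
    for rel in ('primes.py', 'quote.py', 'notes.txt',
                os.path.join('pkg', 'util.py')):
        open(os.path.join(root, rel), 'w').close()

    # Only the top level of the directory.
    top_level = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(root, '*.py')))

    # '**' with recursive=True also descends into subdirectories.
    everywhere = sorted(os.path.relpath(p, root)
                        for p in glob.glob(os.path.join(root, '**', '*.py'),
                                           recursive=True))

print(top_level)    # ['primes.py', 'quote.py']
print(everywhere)   # three entries, including the one under pkg
```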

Working on Binary Data

----------------------

`struct`

Provides `pack()` and `unpack()` functions for working with binary record formats. The example below loops through the header information in a ZIP file without using the `zipfile` module.

import struct

with open('myfile.zip', 'rb') as f:
    data = f.read()

start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    # three unsigned 32-bit and two unsigned 16-bit fields, little-endian (16 bytes)
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header

Sample output:

b'config-err-t6uao6' 0x0 0 0

b'gnome-software-34B15Y/' 0x0 0 0

b'gnome-software-GYBW5Y/' 0x0 0 0
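To see how a format string maps to bytes, it helps to pack some values and unpack them again. A minimal sketch, assuming a 16-byte little-endian record with three 32-bit and two 16-bit unsigned fields; the values are made up.

```python
import struct

# '<' = little-endian, no padding; I = unsigned 32-bit; H = unsigned 16-bit
HEADER = struct.Struct('<IIIHH')

packed = HEADER.pack(0xDEADBEEF, 120, 300, 9, 0)
print(HEADER.size)      # 16
print(len(packed))      # 16

crc32, comp_size, uncomp_size, name_len, extra_len = HEADER.unpack(packed)
print(hex(crc32), comp_size, uncomp_size, name_len, extra_len)
# 0xdeadbeef 120 300 9 0
```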

Socket

------

Basics

AF_INET - IPv4 address family
SOCK_STREAM - TCP socket type

Methods

socket.gethostname() - returns your computer's hostname
socket.socket() - creates a socket object (AF_INET and SOCK_STREAM by default)
Simple client-server connection setup

1. Create the server socket

>>> import socket
>>> s = socket.socket()             # creates a socket object
>>> host = socket.gethostname()
>>> port = 8000
>>> s.bind((host, port))            # attaches the socket to the address
# puts the socket in listening state (queues up to 5 connections before refusing others)
>>> s.listen(5)
# accepts an incoming connection (returns once a connection is accepted,
# otherwise it blocks)
>>> conn, addr = s.accept()

2. Create the client socket

>>> import socket
>>> s = socket.socket()
>>> host = 'remote.system.com'
>>> port = 8000
# connects to the remote system
>>> s.connect((host, port))

The server's accept() and the client's connect() work as a pair. accept() blocks until a client connects. connect() returns as soon as the server is listening, because the operating system queues the connection until accept() picks it up. If nothing is listening on that port yet, connect() usually fails with ConnectionRefusedError instead of waiting.
Sending and receiving

1. Server

# Continuing the example above, the server reads up to 100 bytes at a time
# (the buffer size). recv() blocks until something arrives from the other end.
>>> conn.recv(100)

2. Client

# To send a literal string, prefix it with `b` so it is a bytes object
>>> s.send(b'Hello world\n')
# You can also open a file in binary mode and send its contents.
# sendall() keeps going until everything is sent; send() may send only part of the data.
>>> with open('grocery list.txt', 'rb') as f:
...     data = f.read()
>>> s.sendall(data)
Sending to multiple client sockets

# Continuing the examples above, accept two client connections on the server
>>> clientsocket1, addr = s.accept()    # on client1, execute s.connect((host, port))
>>> clientsocket2, addr = s.accept()    # on client2, execute s.connect((host, port))
# Now send a separate message to each client through its own socket
>>> clientsocket1.send(b'Hi client1')   # on client1, call s.recv(4096) to receive the data
>>> clientsocket2.send(b'Hi client2')   # on client2, call s.recv(4096) to receive the data
Preparing data for transmission

Sockets send and receive bytes, not strings. Here are two ways to produce bytes:

>>> conn.send(b'I am no longer a string')
>>> response = 'Hello {}'.format('world')
>>> conn.send(response.encode('utf-8'))

On the receiving end, turn the bytes back into a string with .decode():

>>> received_data = s.recv(1024)
>>> print(received_data.decode('utf-8'))
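Right way of closing a connection

If the client calls `close()`, every later `recv()` call on the server returns an empty bytes object (b''). Treat that as the signal that the peer is gone and close the server-side socket as well.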
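The send, receive, and close behaviour can be tried in a single process with `socket.socketpair()`, which hands back two sockets that are already connected to each other. This is only a sketch for experimenting; a real client and server would use `bind()`, `listen()`, `accept()`, and `connect()` as described above. The names `server_side` and `client_side` are made up.

```python
import socket

# socketpair() returns two already-connected sockets, so no port or
# second process is needed for this demonstration.
server_side, client_side = socket.socketpair()

client_side.sendall('Hello world\n'.encode('utf-8'))   # str -> bytes
data = server_side.recv(100)                           # up to 100 bytes
print(data.decode('utf-8').strip())                    # Hello world

client_side.close()
closed = server_side.recv(100)    # peer closed: recv() returns b''
print(closed)                     # b''
server_side.close()
```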

Tuesday, August 28, 2018

Math/Number Modules

`math`

Main module for mathematical computations.

>>> import math
>>> math.cos(math.pi / 4)
0.70710678118654757
>>> math.log(1024, 2)
10.0

`random`

You can use this module to generate random values.

>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(range(100), 10)   # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random()                 # random float
0.17970987693706186
>>> random.randrange(6)             # random integer chosen from range(6)
4

`statistics`

Module for statistical calculations.

>>> import statistics
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> statistics.mean(data)
1.6071428571428572
>>> statistics.median(data)
1.25
>>> statistics.variance(data)
1.3720238095238095

`decimal`

Useful for applications that require exact decimal arithmetic. This calculates a 5% tax on a 70-cent phone charge.

>>> from decimal import *
>>> round(Decimal('0.70') * Decimal('1.05'), 2)
Decimal('0.74')
>>> round(.70 * 1.05, 2)
0.73

It also handles modulo calculations and equality tests that are unsuitable for binary floating point.

>>> Decimal('1.00') % Decimal('.10')
Decimal('0.00')
>>> 1.00 % 0.10
0.09999999999999995
>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False

It can also perform very precise calculations.

>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857')

`functools`

>>> from functools import reduce
>>> reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
15
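For money calculations, `Decimal.quantize()` gives explicit control over how a result is rounded. The sketch below combines it with `reduce()`; the function name `add_tax`, the rounding mode, and the prices are assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_UP
from functools import reduce

def add_tax(price, rate):
    """Return price plus tax, rounded to whole cents (halves round up)."""
    total = Decimal(price) * (Decimal('1') + Decimal(rate))
    return total.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

print(add_tax('0.70', '0.05'))     # 0.74

prices = ['0.70', '1.25', '3.10']
grand_total = reduce(lambda acc, p: acc + add_tax(p, '0.05'),
                     prices, Decimal('0'))
print(grand_total)                 # 5.31
```

Passing prices as strings matters: `Decimal(0.70)` would inherit the binary rounding error of the float literal.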

Wednesday, August 15, 2018

Python Date/Time modules

`datetime`

Supplies classes for working with dates and times, including time difference calculations.

>>> # dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'

>>> # dates support calendar arithmetic
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368

Using strptime()

The input string has to match the format exactly, otherwise `ValueError` is raised.

>>> from datetime import datetime
>>> datetime.strptime('2016-01-01', '%Y-%m-%d')
datetime.datetime(2016, 1, 1, 0, 0)
>>> datetime.strptime('03-01-2018', '%Y-%m-%d')
Traceback (most recent call last):
...
ValueError: time data '03-01-2018' does not match format '%Y-%m-%d'
>>> datetime.strptime('2016-01-01', '%Y-%m-%d %H:%M:%S')
Traceback (most recent call last):
...
ValueError: time data '2016-01-01' does not match format '%Y-%m-%d %H:%M:%S'
>>> datetime.strptime('2016-01-01 04:16:34', '%Y-%m-%d %H:%M:%S')
datetime.datetime(2016, 1, 1, 4, 16, 34)

https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime

Managing time differences

>>> from datetime import timedelta
>>> a = timedelta(days=365)
>>> b = timedelta(days=100)
>>> a - b
datetime.timedelta(265)

Callable methods on datetime

>>> date = datetime.strptime('2018-03-18', '%Y-%m-%d')
>>> dir(date)
['__add__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rsub__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', 'astimezone', 'combine', 'ctime', 'date', 'day', 'dst', 'fold', 'fromordinal', 'fromtimestamp', 'hour', 'isocalendar', 'isoformat', 'isoweekday', 'max', 'microsecond', 'min', 'minute', 'month', 'now', 'replace', 'resolution', 'second', 'strftime', 'strptime', 'time', 'timestamp', 'timetuple', 'timetz', 'today', 'toordinal', 'tzinfo', 'tzname', 'utcfromtimestamp', 'utcnow', 'utcoffset', 'utctimetuple', 'weekday', 'year']
>>> date.timestamp()
1521302400.0
`time`

Handles simple time operations and also includes a sleep function.

>>> import time
>>> dir(time)
['CLOCK_MONOTONIC', 'CLOCK_MONOTONIC_RAW', 'CLOCK_PROCESS_CPUTIME_ID', 'CLOCK_REALTIME', 'CLOCK_THREAD_CPUTIME_ID', '_STRUCT_TM_ITEMS', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'altzone', 'asctime', 'clock', 'clock_getres', 'clock_gettime', 'clock_settime', 'ctime', 'daylight', 'get_clock_info', 'gmtime', 'localtime', 'mktime', 'monotonic', 'perf_counter', 'process_time', 'sleep', 'strftime', 'strptime', 'struct_time', 'time', 'timezone', 'tzname', 'tzset']
>>> time.localtime
<built-in function localtime>
>>> time.localtime()
time.struct_time(tm_year=2017, tm_mon=8, tm_mday=29, tm_hour=18, tm_min=20, tm_sec=35, tm_wday=1, tm_yday=241, tm_isdst=0)
>>> time.sleep(3)
`timeit`

Measures the speed of small code snippets.

>>> from timeit import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.57535828626024577
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.54962537085770791

For larger blocks of code, use the `profile` and `pstats` modules.
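Parsing, arithmetic, and formatting usually go together. A short sketch that ties `strptime()`, `timedelta`, and `strftime()` into one round trip; the helper `parse_or_none` is a made-up name.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'

start = datetime.strptime('2016-01-01 04:16:34', FMT)
end = start + timedelta(days=100, hours=2)

print(end.strftime(FMT))                # 2016-04-10 06:16:34
print((end - start).days)               # 100
print((end - start).total_seconds())    # 8647200.0

def parse_or_none(text, fmt=FMT):
    """Return a datetime, or None when text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

print(parse_or_none('03-01-2018'))      # None
```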

Sunday, August 12, 2018

Some python number modules

`math`

Main module for mathematical computations.

>>> import math
>>> math.cos(math.pi / 4)
0.70710678118654757
>>> math.log(1024, 2)
10.0

`random`

You can use this module to generate random values.

>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(range(100), 10)   # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random()                 # random float
0.17970987693706186
>>> random.randrange(6)             # random integer chosen from range(6)
4

`statistics`

Module for statistical calculations.

>>> import statistics
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> statistics.mean(data)
1.6071428571428572
>>> statistics.median(data)
1.25
>>> statistics.variance(data)
1.3720238095238095

`decimal`

Useful for applications that require exact decimal arithmetic. This calculates a 5% tax on a 70-cent phone charge.

>>> from decimal import *
>>> round(Decimal('0.70') * Decimal('1.05'), 2)
Decimal('0.74')
>>> round(.70 * 1.05, 2)
0.73

It also handles modulo calculations and equality tests that are unsuitable for binary floating point.

>>> Decimal('1.00') % Decimal('.10')
Decimal('0.00')
>>> 1.00 % 0.10
0.09999999999999995
>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False

It can also perform very precise calculations.

>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857')

`functools`

>>> from functools import reduce
>>> reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
15

Thursday, July 19, 2018

Making your Python program faster

Here are some ways of making your program efficient and fast.

Some Techniques

---------------

Concatenate faster

Avoid:

s = ""
for substring in list:
    s += substring

Use:

slist = [some_function(elt) for elt in somelist]
s = "".join(slist)

Avoid:

out = "" + head + prologue + query + tail + ""

Use:

out = "%s%s%s%s" % (head, prologue, query, tail)

or:

out = "%(head)s%(prologue)s%(query)s%(tail)s" % locals()
Faster looping

Avoid:

newlist = []
for word in oldlist:
    newlist.append(word.upper())

Use any of the following instead:

newlist = list(map(str.upper, oldlist))   # in Python 3, map() returns an iterator

newlist = [s.upper() for s in oldlist]

upper = str.upper
newlist = []
append = newlist.append
for word in oldlist:
    append(upper(word))
Use local variables as much as possible

Python accesses local variables more efficiently than global variables.

def func():
    upper = str.upper
    newlist = []
    append = newlist.append
    for word in oldlist:
        append(upper(word))
    return newlist
Dictionaries can be used to get record counts faster

The following code tests whether the key exists on every iteration, although the test only succeeds the first time each word is seen:

wdict = {}
for word in words:
    if word not in wdict:
        wdict[word] = 0
    wdict[word] += 1

When most words repeat, it is cheaper to use a `try-except` clause:

wdict = {}
for word in words:
    try:
        wdict[word] += 1
    except KeyError:
        wdict[word] = 1

Or use `dict.get()`:

wdict = {}
get = wdict.get
for word in words:
    wdict[word] = get(word, 0) + 1
Reduce repeated imports since they slow down performance

(The examples use `string.lower()`, which exists only in Python 2. The point about where the import statement sits applies to any module.)

Function 1 places the import inside:

def doit1():
    import string            ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit1()

Function 2 places the import outside:

import string                ###### import statement outside function
def doit2():
    string.lower('Python')

for num in range(100000):
    doit2()

Function 2 runs faster because the import statement executes only once:

>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
Use string methods instead of importing the `string` module

There are cases where you don't even need to import `string`:

def doit3():
    'Python'.lower()

for num in range(100000):
    doit3()

>>> def doit3():
...     'Python'.lower()
...
>>> t = timeit.Timer(setup='from __main__ import doit3', stmt='doit3()')
>>> t.timeit()
2.5606080293655396

This only helps if the `string` module was not imported at all. If another module has already loaded it, skipping the import makes no difference. To see whether it is loaded, check `sys.modules`.
Lazy imports can be used

This imports `email` only once, on the first invocation of parse_email():

email = None

def parse_email():
    global email
    if email is None:
        import email
    ...
Data aggregation

Putting the loop inside a function is faster than calling the function inside a loop.

example 1:

import time
x = 0
def doit1(i):
    global x
    x = x + i

list = range(100000)
t = time.time()
for i in list:
    doit1(i)

print("%.3f" % (time.time()-t))

example 2:

import time
x = 0
def doit2(list):
    global x
    for i in list:
        x = x + i

list = range(100000)
t = time.time()
doit2(list)
print("%.3f" % (time.time()-t))

The second example is faster. Here's a demo:

>>> t = time.time()
>>> for i in list:
...     doit1(i)
...
>>> print("%.3f" % (time.time()-t))
0.758
>>> t = time.time()
>>> doit2(list)
>>> print("%.3f" % (time.time()-t))
0.204
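Reduce interpreter interval checks

You can raise the interval at which the interpreter does its periodic checks. In Python 2 this was `sys.setcheckinterval()`. Python 3 uses `sys.setswitchinterval()` instead (`sys.setcheckinterval()` was deprecated in 3.2 and removed in 3.9).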
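Timings like the ones quoted above depend heavily on the Python version and the data, so it is worth measuring on your own machine. Below is a sketch that checks the three word-counting variants give the same answer and then times them with `timeit`; the sample word list and the function names are made up.

```python
import timeit
from collections import Counter

# Made-up sample data: a few distinct words repeated many times.
words = 'spam eggs spam ham spam eggs'.split() * 1000

def count_with_test(ws):
    """Membership test on every iteration."""
    wdict = {}
    for word in ws:
        if word not in wdict:
            wdict[word] = 0
        wdict[word] += 1
    return wdict

def count_with_try(ws):
    """Assume the key exists; handle KeyError the first time."""
    wdict = {}
    for word in ws:
        try:
            wdict[word] += 1
        except KeyError:
            wdict[word] = 1
    return wdict

def count_with_get(ws):
    """Use dict.get() with a default of 0."""
    wdict = {}
    get = wdict.get
    for word in ws:
        wdict[word] = get(word, 0) + 1
    return wdict

expected = dict(Counter(words))
for fn in (count_with_test, count_with_try, count_with_get):
    assert fn(words) == expected          # same result from every variant
    seconds = timeit.timeit(lambda: fn(words), number=20)
    print('{:<16} {:.4f}s'.format(fn.__name__, seconds))
```

In everyday code `collections.Counter` is usually the simplest choice; the hand-written variants are mainly useful for seeing where the time goes.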

Some Tools to benchmark program speed

-------------------------------------

trace

- part of the standard library (`trace.py` lives in a directory on `sys.path`)

- usage: trace.py -t spam.py eggs

- or simply run: python -m trace -t spam.py eggs

runsnake

- GUI tool

- usage: runsnake some_profile_dump.prof

pycallgraph

- creates call graphs for python programs

- generates PNG file showing the graph traces
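A profile dump like the one runsnake expects can be produced with the standard library's `cProfile` module and inspected with `pstats`. A minimal sketch; the function `busy` and the file location are assumptions made for this example.

```python
import cProfile
import os
import pstats
import tempfile

def busy(n):
    """Stand-in workload: sum of squares below n."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
result = busy(100000)
profiler.disable()

# Write the raw statistics to a file that other tools can open later.
dump_path = os.path.join(tempfile.gettempdir(), 'some_profile_dump.prof')
profiler.dump_stats(dump_path)

# Print the five most expensive entries by cumulative time.
stats = pstats.Stats(dump_path)
stats.sort_stats('cumulative').print_stats(5)
```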