Wednesday, September 5, 2018

String/Templating Modules


String Pattern Matching
-----------------------

`re`
This module uses regular expressions for advanced string processing.

>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'

For simple cases, plain string methods are easier to read and debug:

>>> 'tea for too'.replace('too', 'two')
'tea for two'

Compiling an expression
  --> patterns are compiled into bytecode and executed by a matching engine written in C
  --> compiling once and reusing the object saves work when the same pattern is used many times

>>> import re
>>> p = re.compile('[a-z]+')
>>> p
re.compile('[a-z]+')
>>> p.match("")
>>> print(p.match(""))
None
>>> print(p.match("abc"))
<_sre.SRE_Match object; span=(0, 3), match='abc'>
>>>
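
A compiled pattern has the same methods as the module-level functions, so it can be reused on many strings. A minimal sketch of that idea; the names `word` and `lines` are made up for illustration:

```python
import re

# Compile once, reuse many times (names here are illustrative only).
word = re.compile(r'[a-z]+')

lines = ['abc 123 def', '456', 'ghi']

# The compiled object offers findall(), match(), search(), sub(), ...
found = [word.findall(line) for line in lines]
print(found)                    # [['abc', 'def'], [], ['ghi']]

# match() returns a match object or None, so test it before using it.
m = word.match('abc123')
if m:
    print(m.group(), m.span())  # abc (0, 3)
```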


match() vs search()
  --> match() - matches only at the beginning of the string
  --> search() - scans the string and returns the first match found anywhere in it

>>> re.match('[a-z]+', '123abc456')
>>>
>>> re.search('[a-z]+', '123abc456')
<_sre.SRE_Match object at 0x7f769f7fd4a8>
>>> re.match('[a-z]+', 'abc456')
<_sre.SRE_Match object at 0x7f769ddf5cc8>
>>>
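
The match object itself tells you what was found and where. The sketch below repeats the comparison with `group()` and `span()`, and adds `re.fullmatch()` (not covered in the notes above) for the case where the whole string has to match:

```python
import re

text = '123abc456'

# match() only tries at position 0, so this finds nothing.
print(re.match(r'[a-z]+', text))        # None

# search() scans forward until the pattern matches somewhere.
m = re.search(r'[a-z]+', text)
print(m.group(), m.span())              # abc (3, 6)

# fullmatch() requires the entire string to match the pattern.
whole = re.fullmatch(r'[a-z]+', 'abc') is not None
partial = re.fullmatch(r'[a-z]+', 'abc456') is not None
print(whole, partial)                   # True False
```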


Templating
----------

`string`
The `Template` class from this module creates a base string with placeholders that are
filled in later. `$name` and `${name}` mark a placeholder; `$$` produces a literal `$`.
>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'

The `substitute()` method raises a `KeyError` exception if a placeholder has no value, but
`safe_substitute()` leaves the missing placeholder in place instead.
>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
  ...
KeyError: 'owner'
>>> t.safe_substitute(d)
'Return the unladen swallow to $owner.'

A `Template` subclass can change the delimiter, for example to build a batch renamer:
>>> import time, os.path
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     delimiter = '%'
>>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
Enter rename style (%d-date %n-seqnum %f-format):  Ashley_%n%f

>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = os.path.splitext(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print('{0} --> {1}'.format(filename, newname))

img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg
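
The same renamer can be wrapped in a function so it runs without `input()`. This is only a sketch: `plan_renames` is a made-up helper name, and it just returns the old/new name pairs without touching any files.

```python
import os.path
import time
from string import Template


class BatchRename(Template):
    # Use % instead of $ as the placeholder marker.
    delimiter = '%'


def plan_renames(filenames, fmt):
    """Return (old, new) name pairs; nothing on disk is renamed."""
    t = BatchRename(fmt)
    date = time.strftime('%d%b%y')
    pairs = []
    for i, filename in enumerate(filenames):
        base, ext = os.path.splitext(filename)
        pairs.append((filename, t.substitute(d=date, n=i, f=ext)))
    return pairs


pairs = plan_renames(['img_1074.jpg', 'img_1076.jpg'], 'Ashley_%n%f')
print(pairs)
# [('img_1074.jpg', 'Ashley_0.jpg'), ('img_1076.jpg', 'Ashley_1.jpg')]
```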

Tools for working with Lists
----------------------------

`array`
Stores homogeneous data compactly. The typecode 'H' below means two-byte unsigned integers.
 
>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])
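
To see what "compact" and "homogeneous" mean in practice, this sketch inspects the element size and shows that a value outside the typecode's range is rejected. The exact `itemsize` depends on the platform; for 'H' it is at least 2 bytes.

```python
from array import array

a = array('H', [4000, 10, 700, 22222])

# itemsize is the number of bytes used per element for this typecode.
print(a.typecode, a.itemsize)

# Size of the stored values in bytes (the object header is not counted).
print(a.itemsize * len(a))

# Values that do not fit the typecode are rejected.
try:
    a.append(-1)   # 'H' is unsigned, so negative numbers are invalid
except OverflowError as exc:
    print('rejected:', exc)
```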

* NEED MORE READING *
`deque`
A list-like object with faster appends and pops from the left side, but slower lookups in
the middle.
 
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
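>>> print("Handling", d.popleft())
Handling task1

A deque also works well as the queue in a breadth-first search. `starting_node`,
`gen_moves()` and `is_goal()` are placeholders you have to define yourself:

unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    while unsearched:                   # keep going until the queue is empty
        node = unsearched.popleft()     # take the oldest node first
        for m in gen_moves(node):
            if is_goal(m):
                return m
            unsearched.append(m)        # explore this node later
    return None                         # no goal found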
* NEED MORE READING *
`bisect`
Manipulates sorted lists; `insort()` inserts an element at its correct position so the list
stays sorted.
 
>>> import bisect
>>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
>>> bisect.insort(scores, (300, 'ruby'))
>>> scores
[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
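
Besides inserting, `bisect` can look up where a value falls in a sorted list. A small sketch of that use; the `grade` function and its breakpoints are example values only:

```python
import bisect


def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    """Map a numeric score to a letter using sorted breakpoints."""
    # bisect() returns the index at which score would be inserted,
    # which is also the index of the matching grade letter.
    i = bisect.bisect(list(breakpoints), score)
    return grades[i]


letters = [grade(s) for s in [33, 99, 77, 70, 89, 90, 100]]
print(letters)   # ['F', 'A', 'C', 'C', 'B', 'A', 'A']
```
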

`heapq`
Useful for applications that repeatedly access the smallest element(s) but don't want to
run a full list sort.
 
>>> from heapq import heapify, heappop, heappush
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> heapify(data)                      # rearrange the list into heap order
>>> heappush(data, -5)                 # add a new entry
>>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
[-5, 0, 1]
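
Two other common uses, sketched here under the assumption that you only need a few items: `nsmallest()`/`nlargest()` for a one-off query, and (priority, item) tuples for a simple priority queue. The task names are made up.

```python
import heapq

data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]

# One-off queries; the input list is left untouched.
smallest = heapq.nsmallest(3, data)
largest = heapq.nlargest(3, data)
print(smallest)   # [0, 1, 2]
print(largest)    # [9, 8, 7]

# A simple priority queue: tuples are compared by their first element.
tasks = []
heapq.heappush(tasks, (2, 'write code'))
heapq.heappush(tasks, (1, 'write spec'))
heapq.heappush(tasks, (3, 'release'))
first = heapq.heappop(tasks)
print(first)      # (1, 'write spec')
```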

Collections
-----------

`collections.Counter`
Counts hashable items; the result maps each item to the number of times it appears.
>>> from collections import Counter
>>> l = ['a', 'c', 'b', 'd', 'a']
>>> c = Counter(l)
>>> c
Counter({'a': 2, 'd': 1, 'c': 1, 'b': 1})
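
A few things a `Counter` can do beyond the initial count, as a quick sketch:

```python
from collections import Counter

c = Counter(['a', 'c', 'b', 'd', 'a'])

# most_common(n) returns the n most frequent items with their counts.
top = c.most_common(1)
print(top)        # [('a', 2)]

# Missing keys count as 0 instead of raising KeyError.
missing = c['z']
print(missing)    # 0

# update() adds to the existing counts.
c.update(['b', 'b'])
print(c['b'])     # 3
```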

Json
----

Basics

Sample json data:
{
"a": "apple",
"b": "banana",
"c": "carrot"
}

Loading json data from a file
>>> import json
>>> with open('file.json') as f:
...     data = json.load(f)

The one-liner below also works, but it never closes the file explicitly:
>>> data = json.load(open('file.json'))

Loading json data from a string
>>> json.loads('{"a": "apple", "b": "banana"}')
{'a': 'apple', 'b': 'banana'}
>>>
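
Going the other way, `json.dumps()` turns a Python object back into a JSON string. A minimal round-trip sketch using the sample data from above:

```python
import json

data = {'a': 'apple', 'b': 'banana', 'c': 'carrot'}

# indent and sort_keys only affect how the text looks, not its content.
text = json.dumps(data, indent=2, sort_keys=True)
print(text)

# Parsing the text again gives back an equal dictionary.
restored = json.loads(text)
print(restored == data)   # True
```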

Random
------

Generate random strings of length N
import random, string
N = 8  # desired length
''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))
''.join(random.choices(string.ascii_uppercase + string.digits, k=N))  # Python 3.6+
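
The two one-liners can be wrapped into small helpers. This is a sketch with made-up function names; the second helper uses the `secrets` module (Python 3.6+), which is the better choice when the string is a password or token, since `random` is not meant for security purposes.

```python
import random
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def random_string(n):
    """Random string for non-security uses (test data, labels, ...)."""
    return ''.join(random.choices(ALPHABET, k=n))


def random_token(n):
    """Random string drawn from a cryptographically strong source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(n))


print(random_string(8))
print(random_token(8))
```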

Itertools
---------

Generates every ordering of the input items (useful for brute-force searches such as
cracking passwords)
>>> import itertools
>>> for i in itertools.permutations('abc'):
...   print(i)
...
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
>>>
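
`permutations()` also takes an optional length `r`, and the number of results grows quickly, which is worth checking before brute-forcing anything. A short sketch:

```python
import itertools
import math

# Orderings of length 2 taken from 'abc', joined back into strings.
pairs = [''.join(p) for p in itertools.permutations('abc', 2)]
print(pairs)      # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']

# The number of results is n! / (n - r)!.
n, r = 4, 2
count = sum(1 for _ in itertools.permutations('abcd', r))
expected = math.factorial(n) // math.factorial(n - r)
print(count, expected)   # 12 12
```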
