Python: OS, Shutil, Glob & Unipath

This is part of my Python & Django Series which can be found here including information on how to download all the source code.

This article looks at ways of interacting with the operating and file system.

OS

The OS module is a collection of miscellaneous operating system interfaces. If you want to interact with the operating system or call operating system functionality then you can probably do it with the OS module.

The following outlines some of the most useful functionality.

We can get the name and details of the OS with the name property and uname function. Uname returns a tuple with the following information.

  • sysname: operating system name
  • nodename: name of machine on network (implementation-defined)
  • release: operating system release
  • version: operating system version
  • machine: hardware identifier

Code:

import os

print(os.name)
print(os.uname())

Output:

posix
posix.uname_result(sysname='Linux', nodename='Minty', release='3.13.0-37-generic', version='#64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014', machine='x86_64')

We can determine the current working directory and also change it with the getcwd and chdir functions.

Code:

print(os.getcwd())
os.chdir('/')
print(os.getcwd())

Output:

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary
/

We can access the group id and the user id of the current process along with a list of the supplemental group ids associated with it.

Code:

print(os.getgid())
print(os.getuid())
print(os.getgroups()) 

Output:

1000
1000
[4, 24, 27, 30, 46, 108, 110, 1000]

The getenv and putenv functions can be used to get and set environment variables respectively.

Code:

print(os.getenv("HOME"))
os.putenv("Fluffy", "Yes")

Output:

/home/luke
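One thing worth knowing is that a call to putenv does not update the os.environ mapping, so a later getenv in the same program will not see the change. The following is a minimal sketch of the more common approach of assigning to os.environ; the variable names FLUFFY and NO_SUCH_VARIABLE_FOR_DEMO are made up for illustration.

```python
import os

# Assigning to os.environ updates both the mapping and the process environment.
os.environ["FLUFFY"] = "Yes"   # "FLUFFY" is a made-up variable name

fluffy = os.getenv("FLUFFY")

# The second argument is returned when the variable is not set.
missing = os.getenv("NO_SUCH_VARIABLE_FOR_DEMO", "fallback")

print(fluffy)
print(missing)
```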

We can create directories and remove them with the mkdir and rmdir functions. We can also run shell commands with the system function.

Code:

os.mkdir("foo")
os.rmdir("foo")
os.system("mkdir XXX")

Shutil

The shutil module provides a high level interface when working with a file or a collection of files.

The following outlines some of the most useful functionality.

We can move and copy files with the move and copy functions.

The copytree function can be used to copy a directory and all its contents while the rmtree function can be used to delete a directory and all its contents.

Code:

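from shutil import move, copy, copytree, rmtree

copy("source_file.txt", "target_file.txt")   # Copy first, while the source file still exists
move("source_file.txt", "target_dir")        # Move the file to the target
copytree("source_dir", "target_dir_copy")    # The target of copytree must not already exist
rmtree("source_dir")                         # Delete the directory and all its contents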
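The calls above depend on files which already exist on disk. As a rough, self-contained sketch, the following builds a throwaway directory with tempfile and runs the same four functions against it; all of the file and directory names are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workspace:
    source_dir = os.path.join(workspace, "source_dir")
    os.mkdir(source_dir)
    source_file = os.path.join(source_dir, "source_file.txt")
    with open(source_file, "w") as handle:
        handle.write("hello")

    # copy returns the path of the newly created file.
    copied = shutil.copy(source_file, os.path.join(workspace, "copy.txt"))

    # By default copytree requires that the target does not exist yet.
    tree_copy = shutil.copytree(source_dir, os.path.join(workspace, "target_dir"))

    # move returns the destination path; the original path no longer exists.
    moved = shutil.move(copied, os.path.join(workspace, "moved.txt"))

    copied_exists = os.path.exists(copied)
    moved_exists = os.path.exists(moved)
    tree_files = sorted(os.listdir(tree_copy))

    # rmtree deletes the directory and everything inside it.
    shutil.rmtree(source_dir)
    source_exists = os.path.exists(source_dir)

print(copied_exists, moved_exists, tree_files, source_exists)
```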

We can use shutil to compress and uncompress files and directories with the make_archive and unpack_archive functions respectively. The format must be supported by the system and registered with Python. We can determine which formats are configured with the get_archive_formats function. It returns a list of tuples; each tuple contains the format name and the format description.

Code:

from shutil import get_archive_formats, make_archive, unpack_archive

print(get_archive_formats())
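archive_name = "backup"        # Illustrative base name, without an extension
root_dir = "source_dir"        # Illustrative directory to archive
archive_path = make_archive(archive_name, 'gztar', root_dir)   # Returns the full file name, e.g. backup.tar.gz
unpack_archive(archive_path, "target_dir", "gztar")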

Output:

[('bztar', "bzip2'ed tar-file"), ('gztar', "gzip'ed tar-file"), ('tar', 'uncompressed tar file'), ('zip', 'ZIP file')]
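A detail which is easy to miss is that make_archive takes the archive name without an extension and returns the full file name, which is what unpack_archive then needs. The following sketch shows a full round trip inside a temporary directory; the names backup, notes.txt and restored are assumptions for illustration only.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workspace:
    root_dir = os.path.join(workspace, "source_dir")
    os.mkdir(root_dir)
    with open(os.path.join(root_dir, "notes.txt"), "w") as handle:
        handle.write("some text")

    # The base name has no extension; the returned path includes it.
    base_name = os.path.join(workspace, "backup")
    archive_path = shutil.make_archive(base_name, "gztar", root_dir)

    # Unpack into a separate directory and list what came back.
    extract_dir = os.path.join(workspace, "restored")
    shutil.unpack_archive(archive_path, extract_dir, "gztar")

    archive_file = os.path.basename(archive_path)
    restored = sorted(os.listdir(extract_dir))

print(archive_file)
print(restored)
```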

The disk_usage function can be used to determine the disk usage statistics for a partition or hard disk. It returns a named tuple of the total size, the used size and the free size in bytes.

Code:

from shutil import disk_usage

print(disk_usage("/"))

Output:

usage(total=20507914240, used=7373537280, free=12069023744)

The which function can be used to find the location of an executable which is locatable within the path variable. Here we use it to find the location of the python and python3 executables.

Code:

from shutil import which

print(which("python"))
print(which("python3"))

Output:

/usr/bin/python
/usr/bin/python3

Glob

Glob provides functionality for getting a list of files on a hard disk from a search pattern. The pattern rules for glob are not actually regular expressions but standard Unix path expansion rules.

  • * matches zero or more characters as wild card characters
  • ? matches a single character as a wild card character
  • [] matches a single character from a list of possibilities. Allows the character '-' to determine a range and the character '!' to negate.
    • [0123456789] matches any digit
    • [abc] matches letters a, b or c as lowercase letters
    • [0-9] matches any digit
    • [a-zA-Z] matches any letter upper or lowercase
    • [!abc] matches any character except letters a, b or c

Glob can be used with relative or absolute paths.

By default glob is not recursive, i.e. it will only search the given directory and not its subdirectories. You can use os.walk to search recursively; from Python 3.5 glob also accepts the ** pattern together with recursive=True.
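As a rough sketch of the os.walk approach, the following helper walks a directory tree and filters the file names in each directory with fnmatch. The function name find_files and the sample files are made up for illustration.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern):
    """Return sorted paths, relative to root, of files matching pattern at any depth."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            full_path = os.path.join(dirpath, filename)
            matches.append(os.path.relpath(full_path, root))
    return sorted(matches)


with tempfile.TemporaryDirectory() as root:
    # Build a small tree with text files at three different depths.
    os.makedirs(os.path.join(root, "sub", "deeper"))
    samples = (
        "top.txt",
        os.path.join("sub", "middle.txt"),
        os.path.join("sub", "deeper", "bottom.txt"),
        os.path.join("sub", "ignored.log"),
    )
    for relative in samples:
        open(os.path.join(root, relative), "w").close()

    found = find_files(root, "*.txt")

print(found)
```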

The following matches any file which has an extension of 'txt'.

Code:

from glob import glob

for name in glob('*.txt'):
    print(name)

The following will match any file which ends in og.txt and has one first character which can be anything.

Code:

from glob import glob

for name in glob('?og.txt'):
    print(name)

The following will match any file which ends in og.txt and has one first character which can be any lowercase letter.

Code:

from glob import glob

for name in glob('[a-z]og.txt'):
    print(name)

The following will match any file which ends in og.txt and has one first character which is anything except for lowercase a, b, c, d or e.

Code:

from glob import glob

for name in glob('[!abcde]og.txt'):
    print(name)
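The glob examples above depend on which files happen to be in the current directory. Since glob uses the fnmatch module for its matching, we can try the same patterns against a plain list of names. This is only a sketch; the file names are made up, and fnmatchcase is used so that the comparison is always case sensitive.

```python
from fnmatch import fnmatchcase

names = ["dog.txt", "Dog.txt", "fog.txt", "og.txt", "frog.txt", "dog.csv"]
patterns = ["*.txt", "?og.txt", "[a-z]og.txt", "[!abcde]og.txt"]

# Map each pattern to the names it matches.
results = {
    pattern: [name for name in names if fnmatchcase(name, pattern)]
    for pattern in patterns
}

for pattern in patterns:
    print(pattern, results[pattern])
```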

Unipath

Unipath is an object-oriented alternative to os, os.path and shutil. It is a third-party package rather than part of the standard library.

Path

Everything works around the Path class which can take any number of path name components; these are joined using the correct pathname separator. It supports glob style syntax as well as relative and absolute paths.

A Path instance can be created from any of the following.

Code:

from unipath import Path

Path("/", "home", "lukey")        # An absolute path of /home/lukey
Path("foo", "log.txt")            # A relative path of foo/log.txt
Path(__file__)                    # The current running file
Path()                            # Path(os.curdir)
p = Path("")                      # An empty path

Path Properties

The Path class has a number of properties which can be used to return a Path instance of the parent directory, the file name, the file extension and the file name without the extension.

The components function can be used to get a list of directories which define the Path instance, each as an instance of Path.

Code:

here = Path(__file__)
print(here)

print(here.components())            # A list of all the directories and the file as Path instances.
print(here.parent)                  # The path without the file name
print(here.name)                    # The file name
print(here.ext)                     # The file extension
print(here.stem)                    # The file name without the extension

Output:

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary/unipath_example.py

[Path('/'), Path('data'), Path('data'), Path('Dropbox'), Path('Development'), Path('SandBox'), Path('Git'), Path('ThePythonPit'), Path('PythonSandBox'), Path('StandardLibrary'), Path('unipath_example.py')]

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary

unipath_example.py

.py

unipath_example
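For comparison, the standard library has a similar object-oriented class in the pathlib module. The following is a rough sketch of the equivalent properties using pathlib rather than Unipath itself; the path is made up, and PurePosixPath is used so that the result is the same on any operating system.

```python
from pathlib import PurePosixPath

here = PurePosixPath("/data/projects/sandbox/unipath_example.py")  # made-up path

parts = here.parts              # Tuple of components, similar to components()
parent = here.parent            # The path without the file name
name = here.name                # The file name
ext = here.suffix               # The file extension
stem = here.stem                # The file name without the extension

grandparent = here.parents[1]   # Same as calling parent twice
child = grandparent.joinpath("sandbox", "other.py")   # Walk back down the tree

print(parts)
print(parent, name, ext, stem)
print(grandparent, child)
```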

Child & Parent Methods

We can access the parent directory of a path instance with the parent property as shown above.

We can jump up the ancestry tree x times with the ancestor method; this is the same as calling parent x times.

We can walk down the ancestry tree with the child method passing all components of the path to the required directory or file.

Code:

print(here.parent)                  # The containing directory
print(here.ancestor(5))             # Up x entities ( same as calling parent x times).
print(here.ancestor(3).child("PythonSandBox", "StandardLibrary")) # Returns the child as defined by the components.

Output:

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary

/data/data/Dropbox/Development/SandBox

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary

Expand, Expand User and Expand Vars

Path instances can be defined with the ~, system variables and also the .. notation.

Note: ~ represents the user's home directory for Linux/Unix.
Note: .. is a notation for up one directory when defining relative paths.

We can expand these relative path notations to absolute paths. The function expand_user will expand ~ while expand_vars will expand system variables. The norm function will expand the .. and . notations.

Alternatively the expand function will expand ~, system variables as well as the .. and . notations.

Code:

print(Path("~").expand_user() )     # Expands ~ to an absolute path name
print(Path("$HOME").expand_vars())  # Expands system variables
print(Path("/home/luke/..").norm()) # Expands .. and . notation
print(Path("$HOME/..").expand())    # Expands system variables, ~ and also ..

Output:

/home/luke
/home/luke
/home
/home

File Attributes and permissions

The path class also has a number of attributes which can be used to return information about the file or directory. Most of them are self explanatory.

Note that the atime and ctime functions return time as seconds past the epoch which for unix is the first second of 1970. You can find out the epoch with gmtime(0).

Code:

here = Path(__file__)

print(here.atime())                     # Last access time
print(here.ctime())                     # Last permission or ownership modification; windows is creation time
print(here.isfile())                    # Is this a file? Symbolic links are followed
print(here.isdir())                     # Is this a directory? Symbolic links are followed
print(here.islink())                    # Is a symbolic link?
print(here.ismount())                   # Is a mount point; i.e. is the parent on a different device?
print(here.exists())                    # File or directory actually exists? Symbolic links are followed.
print(here.lexists())                   # Same as exists but symbolic links are not followed
print(here.size())                      # File size in bytes
print(Path("/foo").isabsolute())        # Is an absolute and not a relative path

The function gmtime can be used to determine the epoch.

Code:

from time import gmtime

print(gmtime(0))

Output:

time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)

The stat and lstat functions can be used to get statistics for a file. Stat will follow symbolic links while lstat will use the file the path instance is looking at regardless of whether it is a symbolic link or not.

Code:

here = Path(__file__)
print(here.stat())                      # File stat object for size, permissions etc. Symbolic links are followed

Output:

os.stat_result(st_mode=33188, st_ino=2753975, st_dev=2052, st_nlink=1, st_uid=1000, st_gid=1000, st_size=3054, st_atime=1434042724, st_mtime=1434042724, st_ctime=1434042724)
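The raw stat result is not very readable. As a minimal sketch, the following uses the standard library os.stat function (rather than Unipath) along with the stat module to interpret the st_mode field, and converts the modification time into a datetime. The temporary file is created purely for illustration.

```python
import os
import stat
import tempfile
from datetime import datetime

# Create a small throwaway file so there is something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"12345")
    path = handle.name

try:
    info = os.stat(path)
    size = info.st_size                                # File size in bytes
    is_regular_file = stat.S_ISREG(info.st_mode)       # Decode the file type from st_mode
    is_directory = stat.S_ISDIR(info.st_mode)
    permissions = stat.filemode(info.st_mode)          # A string in the style of ls -l
    modified = datetime.fromtimestamp(info.st_mtime)   # Seconds past the epoch to local time
finally:
    os.remove(path)

print(size, is_regular_file, is_directory, permissions, modified)
```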

Python: Logging

The logging module provides the ability to add conditional logging into any code.

Levels

Each log message is associated with a level of seriousness which starts at debug information and ends at critical errors.

| Level | Description |
| --- | --- |
| Debug | Debug information |
| Info | Information |
| Warning | Warning |
| Error | Error |
| Critical | Critical |

When we log information we provide the level along with the message.

Code:

import logging

logging.debug('Debug Info')
logging.info('Info')
logging.warning('Warning')
logging.error('Error')
logging.critical('Critical Error')

We configure the active level with the level parameter of the basicConfig function. Only messages which are equal to or of a higher seriousness are reported. By default the level is set to WARNING.

Code:

logging.basicConfig(level=logging.WARNING)   # Set report level

Output:

WARNING:root:Warning
ERROR:root:Error
CRITICAL:root:Critical Error

If we changed the level to debug we would output the following.

Output:

DEBUG:root:Debug Info
INFO:root:Info
WARNING:root:Warning
ERROR:root:Error
CRITICAL:root:Critical Error
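Note that basicConfig only configures the root logger if it has no handlers yet, so calling it a second time to change the level normally has no effect. As a rough sketch of how the level filtering behaves, the following uses a named logger writing to an in-memory stream so that the level can be changed part way through; the logger name level_demo is made up.

```python
import io
import logging

# Capture the output in memory instead of the terminal.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("level_demo")   # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False                   # Keep the messages away from the root logger

logger.setLevel(logging.WARNING)
logger.debug("Debug Info")                 # Below WARNING, ignored
logger.info("Info")                        # Below WARNING, ignored
logger.warning("Warning")
logger.error("Error")

logger.setLevel(logging.DEBUG)
logger.debug("Debug Info")                 # Now reported

lines = stream.getvalue().splitlines()
print(lines)
```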

Logging To A File

By default the messages are logged to the terminal (the standard error stream). We can log to a file with the filename parameter.

Code:

logging.basicConfig(filename='log.txt',level=logging.DEBUG)

By default messages are appended to the log file between application runs. We can overwrite the file each time by setting the filemode parameter to 'w'.

Code:

logging.basicConfig(filename='log.txt',level=logging.DEBUG, filemode="w")

Format

We can define the format of the messages in a number of ways.

First let's add the time to the output message.

Code:

logging.basicConfig(format='%(asctime)s %(message)s')

Output:

2015-06-10 19:19:03,578 Warning
2015-06-10 19:19:03,579 Error
2015-06-10 19:19:03,579 Critical Error

We can also add new templates into the format and pass the data as key value pairs into the logging message.

Below we add in the templates ip and user into the format string, these are then populated by the parameter named extra which should be a dictionary.

Code:

logging.basicConfig(format='%(asctime)-15s %(ip)s %(user)-8s %(message)s')
logging.critical('Critical %s', 'Error', extra = {'ip': '192.168.0.1', 'user': 'luke'})

Output:

2015-06-10 19:55:01,698 192.168.0.1 luke     Critical Error
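If a message is logged without the extra dictionary then the formatter cannot find the ip and user fields and reports a formatting error. One way around this is a LoggerAdapter, which supplies the same extra dictionary on every call. The following is a sketch only; the logger name extra_demo and the values are made up, and the time stamp is left out of the format so that the output is predictable.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(ip)s %(user)-8s %(message)s"))

logger = logging.getLogger("extra_demo")   # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.WARNING)

# Pass the extra fields by hand on a single call.
logger.critical("Critical %s", "Error", extra={"ip": "192.168.0.1", "user": "luke"})

# Or wrap the logger so that every call carries the same fields.
adapter = logging.LoggerAdapter(logger, {"ip": "192.168.0.1", "user": "luke"})
adapter.error("Disk full")

output = stream.getvalue().splitlines()
print(output)
```

The -8s in the format pads the user name to eight characters, which is why there are several spaces after luke.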

Python: Numerical Types, Mathematical Functions & The Random Module

Numbers

Python and its dynamic type system means that in most cases you don’t need to worry about the data type you are working on.

Before you get too excited there are still inherent inaccuracies when working with floating point numbers!

One thing I really like about Python is that from version 3 the / operator performs true division rather than truncating the result when both operands are integers; as such 3 / 2 = 1.5 and not 1!
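As a quick sketch of the division operators in Python 3, the / operator always returns a float while the // operator performs floor division:

```python
true_division = 3 / 2        # 1.5, always a float
whole = 4 / 2                # 2.0, still a float even though it divides exactly
floor_division = 3 // 2      # 1, rounds down to an int
negative_floor = -3 // 2     # -2, floor division rounds towards negative infinity
quotient, remainder = divmod(7, 2)   # (3, 1)

print(true_division, whole, floor_division, negative_floor, quotient, remainder)
```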

Python provides the following numerical types which are all immutable.

| Type | Description | Example |
| --- | --- | --- |
| int | Integral value | x = 10 |
| float | Floating point | x = 1.1 |
| Decimal | Decimal point | x = Decimal(1.1) |
| bool | Subclass of int | x = True |
| Fraction | Holds a separate numerator and denominator | x = Fraction(1, 2) |
| complex | Representation of a complex number | x = complex(-1.0, 0.0) |

They all respond to the basic mathematical and assignment operators.

Integers

Before Python version 3 there were two integer types; int and long. Ints had a defined range while longs were in theory infinite but in practice depended upon the size of the available memory. The Python runtime determined which type was required and for longs handled the amount of memory required.

For version 3 and greater the int and long types have been merged into the int type. The runtime handles determining the memory size based upon the size of the integer being represented.

In theory ints can grow as big as they are required; the only limit is the amount of available memory.

Assignment is made with the = operator and an integral value. The int constructor is implicitly called, though it can also be used explicitly.

Code:

x = 1
y = int(1)

We can determine the type with the type function.

Code:

print(x, type(x))

Output:

1 <class 'int'>

We can parse strings to an int via the constructor.

Code:

x = int("1")

We can check that a string only contains characters which can be parsed with the isdigit method of the string.

Code:

print("1".isdigit())
print("a".isdigit())

Output:

True
False

We can round by passing in a negative number of decimal places.

Below we round a value to -2 decimal places; the first two digits to the left of the decimal point become zeros, the third is rounded accordingly.

Code:

print(round(11111, -2))

Output:

11100

We can perform many maths operations; below are some examples. See ../Operators/*.py for more details.

Code:

print(1+1)
print(1-1)
print(1/1)
print(1*1)

Output:

2
0
1.0
1

We can get physical storage information about the int class with the int_info attribute of the sys module.

Code:

import sys
print(sys.int_info)

The output will vary depending on whether you are using a 32 bit or 64 bit build of Python.

Output:

sys.int_info(bits_per_digit=30, sizeof_digit=4)
sys.int_info(bits_per_digit=15, sizeof_digit=2)

On 64 bit systems an int is typically stored as a sequence of 30 bit digits (base 2**30), while 32 bit systems use 15 bit digits (base 2**15). Both allow an int to grow to any size, limited only by the available memory.

The memory used is dependent upon what is required. We can use the getsizeof function to get the size of the memory allocated for an instance of an int.

Code:

from sys import getsizeof

print(getsizeof(0))
print(getsizeof(100**100))
print(getsizeof(100**100100))

Output:

24
116
88700

Floating Point

The float type represents a floating point number.

A float is a fixed size representation of a fractional number; it contains digits to the left and right of the decimal point.

1/3 gives us an infinite number of digits after the decimal point which is impossible to store exactly.

Float has a fixed memory size and its precision is described by a number of significant figures rather than a fixed number of decimal places.

For example 1.0e10 = 10000000000.0 and only needs one significant digit.

| Format | Total bits | Significant bits | Exponent bits |
| --- | --- | --- | --- |
| Single precision | 32 | 23 + 1 sign | 8 |
| Double precision | 64 | 52 + 1 sign | 11 |

Assignment is made with the = operator and any real number literal, i.e. one with a decimal point or an exponent rather than an integral. The float constructor is implicitly called, though it can also be used explicitly.

Code:

x = 1.1
y = float(1.1)

We can determine the type with the type function.

Code:

print(x, type(x))

Output:

1.1 <class 'float'>

We can parse strings to a float including exponential representations.

Code:

print(float("1"))
print(float("1.0e10"))

Output:

1.0
10000000000.0

We can round to x d.p with the round function.

Code:

print(round(1.11111, 2))

Output:

1.11

We can perform many maths operations. See ../Operators/*.py for more details.

Code:

print(1.1 + 1.1)
print(1.1 - 1.1)
print(1.1 / 1.1)
print(1.1 * 1.1)

Output:

2.2
0.0
1.0
1.2100000000000002

We can get physical storage information about the float class with the float_info attribute of the sys module.

Code:

import sys

print(sys.float_info)
print(sys.float_info.min)
print(sys.float_info.max)

Output:

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
2.2250738585072014e-308
1.7976931348623157e+308

Python floats are double precision, which gives 53 bits of significance as noted by the mant_dig property: 52 stored bits plus one implicit leading bit. The sign is held in a separate bit.

The memory used appears to be fixed at 24 bytes regardless of the value; an operation whose result exceeds the maximum value may raise an OverflowError exception.

Code:

print(getsizeof(float(0)))
print(getsizeof(1.1))
print(getsizeof(float(9999999.9)))

Output:

24
24
24

As in most languages, floats are approximations and as such suffer from inaccuracy.

Code:

print(1 / 3)
print(.1 + .1 + .1 == .3)
print(float(.1) + float(.1) + float(.1))

Output:

0.3333333333333333
False
0.30000000000000004
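Because of this, comparing floats with == is rarely what we want. A minimal sketch of two common alternatives follows: math.isclose (available from Python 3.5), which compares within a tolerance, and rounding both sides before comparing.

```python
import math

total = 0.1 + 0.1 + 0.1

exactly_equal = total == 0.3                         # False due to the approximation
close_enough = math.isclose(total, 0.3)              # True, uses a relative tolerance of 1e-09 by default
rounded_equal = round(total, 10) == round(0.3, 10)   # True, compares to 10 decimal places

print(exactly_equal, close_enough, rounded_equal)
```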

The following pages explain more about floats and their inherent issues.

Decimal

The decimal type represents a real number (integral and fraction) and has a closer representation to the real value when compared to float.

It is not natively supported by the hardware and as such has a memory and performance cost when compared to floats, though it is more accurate for decimal values.

In Python the precision of decimal arithmetic can be configured as required.

Assignment is made with the = operator and the decimal constructor.

Code:

from decimal import Decimal

x = Decimal(1.1)

We can determine the type with the type function.

Code:

print(x, type(x))

Output:

1.100000000000000088817841970012523233890533447265625 <class 'decimal.Decimal'>

We can parse strings to a Decimal including exponential numbers.

Code:

print(Decimal("1"))
print(Decimal("1.0e10"))

Output:

1
1.0E+10

We can round to x d.p with the round function.

Code:

print(round(Decimal(1.111111), 2))

Output:

1.11

We can perform many maths operations. See ../Operators/*.py for more details.

Code:

x = Decimal(1.1)
print(x + x)
print(x - x)
print(x / x)
print(x * x)

Output:

2.200000000000000177635683940
0E-51
1
1.210000000000000195399252334

We can use the getcontext function to get information about the current arithmetic context, such as the precision and rounding mode used by Decimal operations.

Code:

from decimal import Decimal, getcontext

print(getcontext())

Output:

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[Inexact, FloatOperation, Rounded], traps=[InvalidOperation, DivisionByZero, Overflow])

The memory allocation seems to always be fixed at 104 bytes.

Code:

print(getsizeof(Decimal(0)))
print(getsizeof(Decimal(1024)))
print(getsizeof(Decimal(999999999999)))

Output:

104
104
104

Decimals provide better accuracy than floats.

Code:

print(.1 + .1 + .1 == .3)
print(Decimal(".1") + Decimal(".1") + Decimal(".1") == Decimal(".3"))

Output:

False
True
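The accuracy depends on how the Decimal is created; a Decimal built from a float inherits the float's approximation, which is why the example above uses strings. The following sketch shows the difference, along with changing the precision for a block of code with localcontext and rounding to two decimal places with quantize. The values are made up for illustration.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP

from_float = Decimal(0.1)      # Carries the binary approximation of the float 0.1
from_string = Decimal("0.1")   # Exactly one tenth

float_differs = from_float != from_string
is_exact = from_string * 3 == Decimal("0.3")

# Temporarily reduce the precision to 5 significant digits.
with localcontext() as context:
    context.prec = 5
    third = Decimal(1) / Decimal(3)

# Round to two decimal places, with halves rounded up.
price = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(float_differs, is_exact, third, price)
```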

Fractions

Represents a fraction, i.e. it has a numerator and a denominator.

A Fraction instance can be constructed from a pair of integers, from another rational number or from a string.

Code:

from fractions import Fraction
from decimal import Decimal

print(Fraction(1, 2))
print(Fraction(Fraction(1, 2)))
print(Fraction(1.1))
print(Fraction(Decimal(1.1)))

Output:

1/2
1/2
2476979795053773/2251799813685248
2476979795053773/2251799813685248

We can determine the type with the type function.

Code:

x = Fraction(1, 2)
print(x, type(x))

Output:

1/2 <class 'fractions.Fraction'>

We can parse a string containing a rational number into a fraction.

Code:

print(Fraction("1.1"))

Output:

11/10

We can perform many maths operations. See ../Operators/*.py for more details.

Code:

x = Fraction(1, 2)
print(x + x)
print(x - x)
print(x / x)
print(x * x)

Output:

1
0
1
1/4
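As seen earlier, a Fraction built from a float has a very large numerator and denominator. As a rough sketch, the limit_denominator method can be used to find the closest fraction with a denominator no larger than a given value:

```python
from fractions import Fraction
from math import pi

exact = Fraction("1.1")                            # 11/10, parsed from the string
from_float = Fraction(1.1)                         # The binary approximation of 1.1
tidy = from_float.limit_denominator(1000)          # Closest fraction with a denominator <= 1000
approx_pi = Fraction(pi).limit_denominator(1000)   # A well known approximation of pi
total = Fraction(1, 3) + Fraction(1, 6)            # Arithmetic stays exact

print(exact, tidy, approx_pi, total)
```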

Complex Numbers

Complex numbers have a real and imaginary part, both of which are floating point numbers.

Assignment is made with the = operator and the complex constructor.

Code:

x = complex(1.1, 2.2)

We can determine the type with the type function.

Code:

print(x, type(x))

Output:

(1.1+2.2j) <class 'complex'>

We can perform many maths operations. See ../Operators/*.py for more details.

Code:

x = complex(1.1, 2.2)
print(x + x)
print(x - x)
print(x / x)
print(x * x)

Output:

(2.2+4.4j)
0j
(1+0j)
(-3.630000000000001+4.840000000000001j)

Math

The math module contains various useful basic maths functions.

Code:

import math

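The ceil and floor functions can be used to round up and down to the nearest integer respectively. Ceil always rounds towards positive infinity and floor towards negative infinity, so for negative numbers ceil moves towards zero and floor moves away from zero.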

Code:

print(math.ceil(11.11))    
print(math.floor(11.11))  
print(math.ceil(-11.11))    
print(math.floor(-11.11))  

Output:

12
11
-11
-12
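It can be useful to see ceil and floor side by side with trunc and the built-in round function, as they all differ for negative numbers and for halves. A small sketch:

```python
import math

values = [11.11, -11.11, 2.5, -2.5]

# Each row is (value, ceil, floor, trunc, round).
table = [
    (value, math.ceil(value), math.floor(value), math.trunc(value), round(value))
    for value in values
]

for row in table:
    print(row)
```

Note that round uses round half to even, so both 2.5 and -2.5 round to the even neighbour rather than away from zero.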

The built-in min and max functions can be used to return the smallest and largest valued entity from any number of arguments passed in respectively. They can both work from collections.

The fsum function can be used to sum all elements within a collection.

Code:

print(min(1, 2, 3))
print(max(1, 2, 3))                 
print(math.fsum([1, 2, 3]))    

Output:

1
3
6.0

Modf splits a real number into its fractional and integral parts and returns them as a tuple in that order.

Trunc removes any digits after the decimal point leaving an integral value.

Fabs returns a positive value of a number or the number itself if it is not negative.

Code:

print(math.modf(1.1))        
print(math.trunc(1.11))   
print(math.trunc(-1.11))   
print(math.fabs(-999))    

Output:

(0.10000000000000009, 1.0)
1
-1
999.0

Isfinite can be used to ensure a number is not NaN or infinite. Isinf can determine if a number is infinite while isnan can determine if a value is assigned nan.

Code:

print(math.isfinite(1))                    
print(math.isinf(float("inf")))    
print(math.isnan(float("inf")))   

Output:

True
True
False

The function pow calculates one number to the power of another, while the sqrt function can be used to calculate the square root of a number.

Code:

print(math.pow(3, 3))    
print(math.sqrt(9))        

Output:

27.0
3.0

The pi and e properties can be used to get the pi and e constants respectively.

Code:

print(math.pi)      # Pi
print(math.e)        # E

Output:

3.141592653589793
2.718281828459045

The radians and degrees functions can be used to convert degrees to radians and radians to degrees respectively.

Code:

print(math.radians(360))                      # Degrees to Radian
print(math.degrees(6.283185307179586))    # Radians to degrees

Output:

6.283185307179586
360.0

Python provides many other trigonometry functions.

| Function | Description |
| --- | --- |
| acos(x) | Return the arc cosine of x, in radians. |
| asin(x) | Return the arc sine of x, in radians. |
| atan(x) | Return the arc tangent of x, in radians. |
| cos(x) | Return the cosine of x radians. |
| hypot(x, y) | Return the Euclidean norm, sqrt(x*x + y*y). |
| sin(x) | Return the sine of x radians. |
| tan(x) | Return the tangent of x radians. |
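As a short sketch of a few of these functions working together, the following uses a right-angled triangle with sides of 3 and 4. The results are floats, so they are compared within a tolerance rather than exactly.

```python
import math

hypotenuse = math.hypot(3, 4)                # Length of the longest side
angle = math.degrees(math.atan(4 / 3))       # Angle opposite the side of length 4, in degrees
sine_of_30 = math.sin(math.radians(30))      # The functions expect radians, so convert first

print(hypotenuse)
print(round(angle, 2))
print(round(sine_of_30, 2))
```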

Random

The random module provides functions for generating pseudo-random numbers and making random selections.

Random generates a random float which is greater than or equal to 0.0 and less than 1.0.

Code:

from random import random

print(random())

Output:

0.07197929300003614

Uniform generates a random float which has a lower and upper limit as defined by the first and second arguments respectively.

Code:

from random import uniform

print(uniform(1, 10))

Output:

3.8361968102149504

Randint can be used to generate a random int which has a lower and upper limit, both of which are inclusive.

Code:

from random import randint

print(randint(1, 99))

Output:

20

Randrange can be used to select a number which has a lower and upper limit but also respects an increment; the upper limit is not included. Here we use it to get a random odd number between 1 and 97.

Code:

from random import randrange

print(randrange(1, 99, 2))

Output:

71

Choice can be used to randomly select an element from a sequence.

Code:

from random import choice

print(choice('abcdefghi'))

Output:

h

Sample can be used to randomly select any number of unique elements from a sequence. Here we select any two random elements from a collection.

Code:

from random import sample

print(sample([1, 2, 3, 4, 5], 2))       

Output:

[4, 2]

Shuffle can be used to randomly reorder a mutable sequence in place.

Code:

from random import shuffle

letters = "a,b,c,d".split(',')
shuffle(letters)
print("Shuffled Letters:", letters)

Output:

Shuffled Letters: ['b', 'c', 'a', 'd']
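The outputs above will differ on every run. When we need repeatable results, for example in tests, we can seed the generator. The following is a sketch using separate Random instances so that the shared module-level generator is left alone; the seed values 42 and 7 are arbitrary. Note that the random module is not intended for security purposes; the secrets module exists for that.

```python
from random import Random

# Two generators with the same seed produce the same sequence.
first = Random(42)
second = Random(42)

first_values = [first.randint(1, 99) for _ in range(5)]
second_values = [second.randint(1, 99) for _ in range(5)]

# Shuffle works in place on a list and returns None.
letters = list("abcd")
Random(7).shuffle(letters)

print(first_values == second_values)
print(letters)
```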

Python: Dates, Times & TimeIt

This article looks at dates and times along with functionality which will be useful when working with them.

Date

The date class provides a representation of a date. It is located within the datetime module.

Code:

from datetime import date

We can create an instance of the date class populated with today’s day, month and year with the today function.

Code:

from datetime import date

today = date.today()
print(today)

Output:

2015-06-09

A date instance responds to day, month and year.

Code:

print("{0}-{1}-{2}".format(today.day, today.month, today.year))

Output:

9-6-2015

We can create an instance of a date with the constructor which takes the year, month and day as integers.

Code:

print(date(1978, 10, 25))

Output:

1978-10-25

The date class is limited to a range of 0001-01-01 to 9999-12-31 with increments of 1 day. The min, max and resolution properties can be used to return this information.

Code:

print("Min = {0}, Max = {1}, Resolution = {2}".format(date.min, date.max, date.resolution))

Output:

Min = 0001-01-01, Max = 9999-12-31, Resolution = 1 day, 0:00:00

The date class is immutable but we can use the replace function to create a new instance replacing any of the year, month and day fields; all parameters are optional.

Code:

print(today.replace(1, 2, 3))

Output:

0001-02-03

We can determine the day of the week as an integer with the weekday and isoweekday functions. Weekday returns 0-6 for Monday to Sunday while isoweekday returns 1-7 for Monday to Sunday.

Code:

print(today.weekday())
print(today.isoweekday())

Output:

1
2

Date objects support basic operators with the most useful being the minus operator; this allows us to determine the difference between two days. It returns a timedelta instance.

Code:

print(today.replace(2016) - today )

Output:

366 days, 0:00:00

Date Time

The datetime class is similar to the date class but it also holds state for time. It is located within the datetime module.

Code:

from datetime import datetime

We can get an instance of datetime populated as the current date and time with the today, now and utcnow functions.

The today and now functions return the current local time, including any daylight saving adjustment, while utcnow returns the current time in UTC. All three return naive instances, i.e. without any time zone information attached.

Code:

print(datetime.today())
print(datetime.now())
print(datetime.utcnow())

Output:

2015-06-09 16:48:55.378254
2015-06-09 16:48:55.378304
2015-06-09 15:48:55.378328
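The instances above carry no time zone information, so two of them cannot tell us whether they describe the same moment. As a minimal sketch, we can pass a time zone to now to get an aware instance instead. The fixed one hour offset below is made up for illustration; real time zones with daylight saving rules need more than a fixed offset.

```python
from datetime import datetime, timedelta, timezone

utc_now = datetime.now(timezone.utc)       # An aware datetime in UTC
offset = utc_now.utcoffset()               # Zero for UTC

plus_one = timezone(timedelta(hours=1), "UTC+1")   # hypothetical fixed offset zone
local_view = utc_now.astimezone(plus_one)          # Same instant, different wall clock time

same_instant = utc_now == local_view       # Aware datetimes compare by instant

print(utc_now.isoformat())
print(local_view.isoformat())
print(same_instant)
```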

Datetime can represent a date and time between the range of 0001-01-01 00:00:00, and 9999-12-31 23:59:59.999999. The minimal increment is 0.000001 seconds. This data can be retrieved with the min, max and resolution properties.

Code:

print("Min = {0}, Max = {1}, Resolution = {2}".format(datetime.min, datetime.max, datetime.resolution))

The datetime class has properties for the day, month, year as well as hour, minutes and seconds.

Code:

now = datetime.now()
print("{0}-{1}-{2} {3}:{4}:{5}".format(now.day, now.month, now.year, now.hour, now.minute, now.second))

Output:

9-6-2015 16:48:55

We can create an instance of a datetime by passing in any of the required components of state into the constructor.

Code:

print(datetime(2001, 2, 3, 4, 5, 6))

Output:

2001-02-03 04:05:06

A datetime instance is immutable though we can use the replace function to create a new instance based on another while swapping over any of the state components; all parameters are optional.

Code:

# replace(year, month, day, hour, minute, second, microsecond)
print(datetime.today().replace(1, 2, 3, 4, 5, 6, 7))

Output:

0001-02-03 04:05:06.000007

We can use basic operators between two instances of datetime, the most useful being the minus operator. This can be used to determine the time period between two dates, it returns a timedelta.

Code:

print(datetime.now().replace(year=2016) - datetime.now())

Output:

365 days, 23:59:59.999989

Formatting DateTime

Python provides the following templates which can be used with date, time and datetime where applicable when formatting them to strings.

| Code | Example | Description |
| --- | --- | --- |
| %a | Mon | Name of day short |
| %A | Monday | Name of day |
| %w | 0 | Day of week as integral. Sunday to Saturday = 0 to 6 |
| %d | 25 | Day of the month |
| %b | Jan | Name of month short |
| %B | January | Name of month |
| %m | 01 | Month (01-12) |
| %y | 79 | Short year (last two digits) |
| %Y | 1978 | Year (as 4 digits) |
| %H | 18 | Hour as integral of 24 hour clock |
| %I | 06 | Hour as integral of 12 hour clock |
| %p | AM | AM/PM |
| %M | 30 | Minute as integral |
| %S | 30 | Second as integral |
| %f | 989898 | Microsecond as integral |
| %z | | UTC offset (form +HHMM or -HHMM) |
| %Z | | Time zone name |
| %j | 213 | Day of the year |
| %U | 10 | Week number of the year (Sunday as the first day of the week) |
| %W | 10 | Week number of the year (Monday as the first day of the week) |
| %c | 01/02/2014 12:30:55 | Locale formatted date time |
| %x | 01/02/2014 | Locale formatted date |
| %X | 12:30:55 | Locale formatted time |

We can use any number of these templates along with the strftime function.

Code:

print(date.today().strftime("%m-%d-%y"))
print(datetime.now().strftime("%d %b %Y %X"))

Output:

06-09-15
09 Jun 2015 17:01:34
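The same templates work in the other direction with the strptime function, which parses a string into a datetime. The following sketch parses the output shown above and formats it back again. It assumes an English locale, as the %b template matches locale specific month names.

```python
from datetime import datetime

text = "09 Jun 2015 17:01:34"
template = "%d %b %Y %H:%M:%S"

parsed = datetime.strptime(text, template)   # String to datetime
round_trip = parsed.strftime(template)       # datetime back to the same string
iso = parsed.isoformat()                     # A predefined, unambiguous format

print(parsed)
print(round_trip)
print(iso)
```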

We can also use the isoformat and ctime functions for predefined formatted representations.

Code:

print(datetime.today().isoformat())
print(date.today().isoformat())
print(datetime.today().ctime())
print(date.today().ctime())

Output:

2015-06-09T17:01:34.880119
2015-06-09
Tue Jun  9 17:01:34 2015
Tue Jun  9 00:00:00 2015

Time Delta

The timedelta class allows the representation of a time range. It is returned when subtracting or working out the difference between two dates or datetimes.

Code:

from datetime import timedelta

We can create an instance with the constructor; all parameters are optional.

Internally only days, seconds and microseconds are stored; all other arguments are converted into these units.

Code:

a_timedelta = timedelta(days=1, seconds=2, microseconds=3, milliseconds=0, minutes=0, hours=0, weeks=0)
print(a_timedelta)

Output:

1 day, 0:00:02.000003

We can access the days, seconds and microseconds by similarly named properties.

Code:

print(a_timedelta.days)
print(a_timedelta.seconds)
print(a_timedelta.microseconds)

Output:

1
2
3
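The conversion into days, seconds and microseconds can be seen more clearly when the arguments overflow their units. The following is a small sketch with arbitrary values.

```python
from datetime import timedelta

# 1 week = 7 days; 25 hours = 1 day + 1 hour; 90 minutes = 1 hour 30 minutes;
# 1500 milliseconds = 1 second + 500000 microseconds
delta = timedelta(weeks=1, hours=25, minutes=90, milliseconds=1500)

print(delta)                # 8 days, 2:30:01.500000
print(delta.days)           # 8
print(delta.seconds)        # 9001 (2 hours, 30 minutes and 1 second)
print(delta.microseconds)   # 500000
```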

A time period is made up of the microseconds, seconds and days all together. We can use the total_seconds function to get the entire duration expressed in seconds.

Code:

print(a_timedelta.total_seconds()) # Seconds contained in days, seconds and microseconds

Output:

86402.000003

A timedelta can hold data within a range from -999999999 days, 0:00:00 seconds to 999999999 days, 23:59:59.999999 seconds in increments of 0.000001 seconds. The min, max and resolution properties can be used to return this information.

Code:

print("Min = {0}, Max = {1}, Resolution = {2}".format(timedelta.min, timedelta.max, timedelta.resolution))

Output:

Min = -999999999 days, 0:00:00, Max = 999999999 days, 23:59:59.999999, Resolution = 0:00:00.000001

The print, str and repr functions can be used to output an instance as a string.

Code:

print(a_timedelta)
print(str(a_timedelta))
print(repr(a_timedelta))

Output:

1 day, 0:00:02.000003
1 day, 0:00:02.000003
datetime.timedelta(1, 2, 3)

We can use a timedelta to add or subtract a time period onto a date or datetime.

Code:

today = datetime.now()
yesterday = datetime.now() - timedelta(days=1)
print(today - yesterday)
print((today - yesterday).total_seconds())

Output:

23:59:59.999537
86399.999537
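The same arithmetic works on a date. As a sketch with an arbitrary fixed date, we can move forwards and backwards by a timedelta and let Python handle the month boundary.

```python
from datetime import date, timedelta

# Arbitrary example date
release = date(2015, 6, 9)

one_week_later = release + timedelta(weeks=1)
thirty_days_before = release - timedelta(days=30)

print(one_week_later)       # 2015-06-16
print(thirty_days_before)   # 2015-05-10
```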

Time

The time module allows us to work with a time along with its date.

Code:

import time

We can grab the current time and date with the time function. It returns the time in seconds since the epoch: 00:00 UTC on January 1, 1970.

Code:

print(time.time())

Output:

1433866491.5349488

We can convert this to something a little more human readable with the localtime function.

Code:

print(time.localtime(time.time()))

It returns a struct_time which is a named tuple.

Output:

time.struct_time(tm_year=2015, tm_mon=6, tm_mday=9, tm_hour=17, tm_min=14, tm_sec=51, tm_wday=1, tm_yday=160, tm_isdst=1)

The following defines the struct_time tuple.

Index | Attribute | Description | Range
--- | --- | --- | ---
0 | tm_year | Year | Any int
1 | tm_mon | Month | 1 to 12
2 | tm_mday | Day of month | 1 to 31
3 | tm_hour | Hour | 0 to 23
4 | tm_min | Minutes | 0 to 59
5 | tm_sec | Seconds | 0 to 61 where 60/61 are leap seconds
6 | tm_wday | Day of week | 0 to 6 where 0 is Monday
7 | tm_yday | Day of year | 1 to 366 (Julian day)
8 | tm_isdst | Daylight saving | 1=y, 0=n, -1=library determines DST

We can use the asctime function to format a struct_time into a string.

Code:

print(time.asctime(time.localtime(time.time())))

Output:

Tue Jun 9 17:14:51 2015
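As a struct_time is a named tuple, its fields can be read either by attribute name or by index. The sketch below uses time.gmtime(0), which is not used elsewhere in this article; it converts zero seconds since the epoch into a UTC struct_time, which gives a stable value to inspect.

```python
import time

# Zero seconds since the epoch, expressed in UTC
epoch = time.gmtime(0)

print(epoch.tm_year, epoch[0])   # 1970 1970: the same field by name and by index
print(epoch.tm_wday)             # 3: Thursday, as Monday is 0
print(epoch.tm_yday)             # 1: the first day of the year
print(time.asctime(epoch))
```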

TimeIt

The timeit module provides stopwatch style functionality for timing the running of code.

Let's take a simple function which performs a loop and raises a number to a power on each pass.

Code:

import timeit

def function_to_time(max_value):

    start = 0

    for count in range(max_value):
        start = start ** max_value

We can use the Timer class in timeit to run the function a set number of times and then return the time required to run it.

In the following we run our function 100, 200 and then 300 times with a value of 100.

Code:

t = timeit.Timer(lambda: function_to_time(100))

for number in [100, 200, 300]:
    print("{0}: {1}".format(number, t.timeit(number=number)))

Output:

100: 0.004333864999352954
200: 0.009334164000392775
300: 0.013926845000241883

The example above used a lambda expression, though timeit also allows the code to be run to be passed as a string. Below we loop through 0 to 99 and join all the numbers with a hyphen.

Code:

for number in [100, 200, 300]:
    print("{0}: {1}".format(number, timeit.timeit('"-".join(str(n) for n in range(100))', number=number)))

Output:

100: 0.0035422290002316004
200: 0.006914595000125701
300: 0.009374900999318925
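A common use of timeit is comparing two ways of doing the same thing. The following is a minimal sketch which uses timeit.repeat to take several measurements of each candidate and keeps the fastest; the two join functions are hypothetical examples written for this comparison.

```python
import timeit


def join_with_generator():
    return "-".join(str(n) for n in range(100))


def join_with_map():
    return "-".join(map(str, range(100)))


best_times = {}

for func in (join_with_generator, join_with_map):
    # repeat() returns one total time per repetition; the minimum is the least noisy
    timings = timeit.repeat(func, number=1000, repeat=3)
    best_times[func.__name__] = min(timings)

for name, seconds in best_times.items():
    print("{0}: {1:.6f}".format(name, seconds))
```

The actual timings depend on the machine, so no output is shown here.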

Python: Unit Testing

Unit Testing

The ability for software to test and diagnose itself is a powerful feature.

A Simple Example

Let's take a simple function which adds two numbers together.

Code:

def add_two_numbers(a, b):
    """
    A simple method to test
    """

    return a + b

We can create a test to ensure that add_two_numbers works as expected by comparing the result of a call to the function with our expected result.

Code:

from unittest import TestCase, main

class MyTestClass(TestCase):
    """
    A simple unit test example
    """

    def test_add_two_numbers(self):
        self.assertEqual(add_two_numbers(1, 2), 3)

A test class inherits from unittest.TestCase. All methods whose names start with test are treated as tests to be run.

Above we call add_two_numbers with parameters 1 and 2. We then use the returned value as a parameter to the assertEqual function along with our expected result of 3.

If the assertion holds, the call returns and control carries on; otherwise an error is raised and the test is marked as failed.

A test function can have any number of assertions called.

We can run our test function with the main function from unittest.

Code:

if __name__ == '__main__':
    main()

Output:

.py::MyTestClass true
Testing started at 12:42 …

Process finished with exit code 0

If a bug appeared in our code we would see a result similar to the following.

Output:

.py::MyTestClass true
Testing started at 12:44 …

Process finished with exit code 0

Failure
Traceback (most recent call last):
File "/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/Testing/unittest_examples/simple_example.py", line 26, in test_add_two_numbers
self.assertEqual(add_two_numbers(1, 2), 4)
AssertionError: 3 != 4

Assertions

In the previous section we saw the assertEqual assertion. The unittest module provides many assertion functions to cater for a range of possible test criteria.

Equals Assertions

Equality assertions can be made with the assertEqual function and inequality assertions with the assertNotEqual function. Both functions take two parameters: the result and the expected result.

Code:

self.assertEqual(1, 1)
self.assertNotEqual(1, 2)

For numerical results the assertAlmostEqual and assertNotAlmostEqual functions allow equality assertion within a tolerance of error. The tolerance is passed in as the third parameter and represents the number of decimal places to be used when determining equality.

The call to assertAlmostEqual below takes 1.1 and 1.11 with a tolerance of 1 d.p. This would fail if we used assertEqual, but assertAlmostEqual rounds the difference between the two values (0.01) to 1 d.p., which gives 0, and therefore the assertion passes.

Code:

self.assertAlmostEqual(1.1, 1.11, 1)  # 3rd argument is the number of decimal places
self.assertNotAlmostEqual(1.1, 1.11, 2)  # 3rd argument is the number of decimal places
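The following sketch spells out the rounding that assertAlmostEqual performs and also shows the delta keyword, which accepts an absolute tolerance instead of a number of decimal places. The test class name is made up for this example.

```python
import unittest


class AlmostEqualExample(unittest.TestCase):

    def test_places(self):
        # The difference is rounded to the given places and compared with zero
        self.assertEqual(round(1.1 - 1.11, 1), 0)
        self.assertAlmostEqual(1.1, 1.11, places=1)
        self.assertNotAlmostEqual(1.1, 1.11, places=2)

    def test_delta(self):
        # Passes because the difference of 0.01 is within 0.05
        self.assertAlmostEqual(1.1, 1.11, delta=0.05)


suite = unittest.TestLoader().loadTestsFromTestCase(AlmostEqualExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```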

The assertEqual function can take most types. All of the following asserts for lists, tuples, sets, dictionaries and multi-line strings pass assertion.

When being called for collections, the test requires both collections to be of the same type, contain the same number of elements and the elements at the same ordinal position to be equal.

Code:

self.assertEqual([1, 2, 3], [1, 2, 3]) # list
self.assertEqual((1, 2, 3), (1, 2, 3)) # tuple
self.assertEqual({1, 2, 3}, {1, 2, 3}) # set
self.assertEqual({'a': 1}, {'a': 1})  # dictionary
self.assertEqual("one\ntwo", "one\ntwo") # multi-line string

Unittest does provide specific assert equal functions for each type, though these are called implicitly by the assertEqual function. You should favour using assertEqual.

Code:

self.assertListEqual([1, 2, 3], [1, 2, 3])
self.assertTupleEqual((1, 2, 3), (1, 2, 3))
self.assertSetEqual({1, 2, 3}, {1, 2, 3})
self.assertDictEqual({'a': 1}, {'a': 1})
self.assertMultiLineEqual("one\ntwo", "one\ntwo")

The assertEqual function works upon equality; as such an integer of value 1 and a float of value 1.0 will pass an assertion check together.

Code:

self.assertEqual(1.0, 1)

Booleans Assertions

The assertFalse and assertTrue functions ensure that a value evaluates to false or true respectively.

Code:

self.assertFalse(False)
self.assertTrue(True)

Collections Assertions

A number of assertions specifically for collections are provided.

We have already seen the assertEqual function which determines if two parameters are equal.

When working with collections this performs the following checks

  • The collection types are equal
  • The collections contain the same number of elements
  • Each element at the same ordinal position equals that in the other collection.

The elements can be of another type as long as their values are equal. In the example below one list contains integers and the other floats but the assertion passes as the elements are equal.

Code:

self.assertEqual([1.0, 2.0, 3.0], [1, 2, 3]) 

The assertSequenceEqual function works the same as assertEqual though it will not fail if the collections are of different types. Below we ensure that the contents of a list and a tuple are equal.

Code:

self.assertSequenceEqual((1, 2, 3), [1, 2, 3])  # Checks only the sequence

The assertIn and assertNotIn functions check whether an element is contained or not contained within a collection. The check is based upon equality.

Here we check that 1 is in 1,2,3 and that 4 is not in 1, 2, 3.

Code:

self.assertIn(1, (1, 2, 3))
self.assertNotIn(4, (1, 2, 3))

The assertCountEqual function has to be a contender for the worst named function in history. This function ensures that two collections contain the same elements, each the same number of times, though their order is not important.

Code:

self.assertCountEqual((1, 2, 3), (3, 2, 1))  # Badly named. This checks the elements and not their order
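The "count" in the name refers to how many times each element occurs. As a sketch, the test below shows that duplicates matter and that the check behaves much like comparing two collections.Counter instances; the class name is made up for this example.

```python
import unittest
from collections import Counter


class CountEqualExample(unittest.TestCase):

    def test_order_ignored_but_counts_matter(self):
        # Same elements with the same counts, in a different order
        self.assertCountEqual([1, 2, 2, 3], [3, 2, 1, 2])

        # Same distinct elements but different counts, so the assertion fails
        with self.assertRaises(AssertionError):
            self.assertCountEqual([1, 2, 2], [1, 1, 2])

        # Roughly the same check written with Counter
        self.assertEqual(Counter([1, 2, 2, 3]), Counter([3, 2, 1, 2]))


suite = unittest.TestLoader().loadTestsFromTestCase(CountEqualExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```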

Comparison Assertions

The unittest module provides comparison checks in the form of less than, less than or equal to, greater than and greater than or equal to.

Code:

self.assertLess(1, 10)
self.assertLessEqual(1, 1)
self.assertGreater(10, 1)
self.assertGreaterEqual(1, 1)

Identity Assertions

Identity ensures that two parameters point to the same object instance.

In Python each object is assigned its own object id upon creation. More information can be found here.

The assertIs and assertIsNot functions can ensure that two objects are and are not the same instance respectively.

Code:

self.assertIs(1, 1)
self.assertIsNot(1, 2)

Parameters which do not reference any data point to None. We can check whether a parameter is or is not pointing to None with the assertIsNone and assertIsNotNone functions.

Code:

self.assertIsNone(None)
self.assertIsNotNone(1)

The assertIsInstance and assertNotIsInstance functions can be used to see if a parameter holds a specific type. Here we pass a parameter holding an instance of a type along with the class that we want to ensure it is or is not an instance of.

Code:

self.assertIsInstance((), tuple)
self.assertNotIsInstance((), set)

Regular Expressions Assertions

We can use regular expressions to ensure the format of a string is as expected with the assertRegex and assertNotRegex functions.

Code:

self.assertRegex('Luke', "^[a-zA-Z]{3,4}$")
self.assertNotRegex('Lukey', "^[a-zA-Z]{3,4}$")

Exceptions Assertions

Code should throw exceptions when we want it to or when it is called incorrectly. We can use the assertRaises function to assert that not only an exception is raised but it is of a certain type.

Below we ensure that a ZeroDivisionError error is raised.

Code:

with self.assertRaises(ZeroDivisionError) as ex:
    result = 1 / 0

self.assertEqual(str(ex.exception), "division by zero")

In the above example we assign the context manager to a variable ex, whose exception attribute holds the raised exception. We can then run assertions upon the exception to make sure it is as expected; here we check that the string representation of the exception is as expected. The latter check can be enforced with the assertRaisesRegex function.

Code:

with self.assertRaisesRegex(ZeroDivisionError, "^division by [a-zA-Z]{4}$"):
    result = 1 / 0

We can also annotate a test with the @expectedFailure decorator. Here the test is counted as a failure if it passes, as no error was raised.

Code:

@expectedFailure
def test_expectedFailure(self):
    self.fail("This is an expected failure")

Warnings Assertions

The unittest module provides the same functions for warnings as it does for exceptions; they work in exactly the same way.

Code:

from warnings import warn

with self.assertWarns(DeprecationWarning) as wn:
    warn("deprecated", DeprecationWarning)

self.assertEqual(str(wn.warning), "deprecated")

with self.assertWarnsRegex(DeprecationWarning, "^deprecate[a-z]$"):
    warn("deprecated", DeprecationWarning)

Assertions Messages

Each assertion can optionally take a string to be used as an error message when the test fails.

Code:

self.assertFalse(True, "False is not false!")

Would report as the following:

Output:

AssertionError: True is not false : False is not false!

The following would be reported if the error message had not been provided.

Output:

AssertionError: True is not false

Failing Tests

We can fail a test in code with the fail method.

Code:

self.fail("Fail!!!")

Test Fixture

If a test class has a function called setUp, it will be run before every test function within it. If an error is raised within the setUp function then the test function it was preparing for will not be run.

If a test class has a function called tearDown, it will be run after every test function within it. Provided setUp succeeded, this function will always be run after each test function regardless of whether the test passes or fails.

Code:

from unittest import TestCase


class TestFixtureExample(TestCase):

    def setUp(self):
        # Set up / initialise before a test
        # If this fails then the test will not be run
        print("In the setUp")

    def tearDown(self):
        # Destroy any resources required during the test
        # Will always be run if setUp runs regardless of tests successes
        print("In the tearDown")

    def test_fixture_one(self):
        self.assertTrue(True)

    def test_fixture_two(self):
        self.assertTrue(True)

    def test_fixture_three(self):
        self.assertTrue(True)

Output:

.py::TestFixtureExample true
Testing started at 14:17 …
In the setUp
In the tearDown
In the setUp
In the tearDown
In the setUp
In the tearDown
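To see exactly where the test functions sit between the two fixture functions, the sketch below records each call in a list instead of printing. The class and the calls list are made up for this example.

```python
import unittest

calls = []


class OrderExample(unittest.TestCase):

    def setUp(self):
        calls.append("setUp")

    def tearDown(self):
        calls.append("tearDown")

    def test_a(self):
        calls.append("test_a")

    def test_b(self):
        calls.append("test_b")


suite = unittest.TestLoader().loadTestsFromTestCase(OrderExample)
unittest.TextTestRunner(verbosity=0).run(suite)

# Each test is wrapped in its own setUp and tearDown call
print(calls)
```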

Test Suite

The TestSuite class can be used to register tests which can then be run with the TextTestRunner.

The addTest function can be used to add an individual test method into a TestSuite instance.

The TestLoader().loadTestsFromTestCase() function can be used to create a TestSuite with all test functions of a test class.

The TextTestRunner().run() function can then run the TestSuite passed in.

Code:

from unittest import TestSuite, TextTestRunner, TestLoader

# Test Suite
def my_test_suite():
    suite_one = TestSuite()
    suite_one.addTest(MyTestClass('test_add_two_numbers')) # Adds MyTestClass.test_add_two_numbers()

    suite_two = TestLoader().loadTestsFromTestCase(TestAssertsExample)

    return TestSuite([suite_one, suite_two])

# Run the test suite
if __name__ == '__main__':
    TextTestRunner().run(my_test_suite())

Skipping Tests

Test functions can be annotated with specific unittest decorators.

Skip can be used to stop a test from running. This can also be done in code by raising the SkipTest exception.

SkipIf can be used to stop a test from running if a boolean statement evaluates to true.

SkipUnless can be used to stop a test from running unless a boolean statement evaluates to true.

Code:

from unittest import TestCase, SkipTest, skip, skipIf, skipUnless


class TestAttributes(TestCase):

    @skip("Test is not run")
    def test_skip(self):
        self.fail("This should not be run")

    @skipIf(True, "This is not run")
    def test_skipIf(self):
        self.fail("This should not be run")

    @skipUnless(False, "This is not run")
    def test_skipUnless(self):
        self.fail("This should not be run")

    def test_skipTest(self):
        # SkipTest is an exception and must be raised to skip the test
        raise SkipTest("This should not be run")
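Skipped tests are reported separately from passes and failures. As a sketch, the following runs a small test class and inspects the result object; it also uses the skipTest method on TestCase, which raises SkipTest for us. The class name is made up for this example.

```python
import unittest


class SkipExample(unittest.TestCase):

    @unittest.skip("always skipped")
    def test_decorator(self):
        self.fail("never runs")

    def test_skip_in_body(self):
        self.skipTest("skipped at run time")
        self.fail("never reached")

    def test_runs(self):
        self.assertTrue(True)


suite = unittest.TestLoader().loadTestsFromTestCase(SkipExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)

# result.skipped holds a (test, reason) pair for each skipped test
print(len(result.skipped), result.wasSuccessful())
```

A run containing only passes and skips still counts as successful.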

Python: File I/O

This article runs through reading and writing to files.

User Input

When running terminal applications we can write to the terminal stream with the print command and also collect information from the user with the input command.

The input command takes a string to display to the user prompting them to input some data.

In the following example we ask the user their name with the input command and assign it to a variable called name.

It is important to note that the runtime environment pauses the application while it is waiting for the user's input.

Code:

name = input('What is your name?: ')
print("Hello", name)

File Class

The remainder of the article looks at how to read and write to physical files in various formats of data.

All file access in Python goes through a file object which is created with the open function. The following shows the parameters passed to the open function.

Code:

# The second argument is the mode: one of r, w, a or r+, optionally followed by b
a_file = open("FileName.txt", "r")

The first parameter is the file path of the file to read or write, the second parameter is a string representing the access type and file type required.

File access can be defined as one of the following read or write modes.

Access | Description
--- | ---
r | Read only, which is the default
w | Write, overwriting all existing data
a | Append, opened at the end of the file for writing
r+ | Read and write access

By default the file is opened as text using the platform's default encoding, which is often but not always UTF-8; the encoding parameter can be used to set it explicitly. The usage of b along with the access type defines binary access.

Type | Description
--- | ---
[none] | Text, using the default encoding
b | Binary access

With Statement

As a file handle is an expensive resource you should always call close upon the file after you have finished working with it.

Code:

a_file = open("afile.txt")
# Actions upon the file.
a_file.close()

As you can never guarantee that your code will run all the way through to the calling of the close function without an error, it is best practice to place protection around your code to ensure it is always called. You could place the close() call within the finally statement of a try block, though Python provides the with statement which is easier and more elegant to use.

Code:

with open(text_file, "w") as f:
    pass  # Code within with scope

# Code outside of with scope.

Here the close function of the file is automatically called upon leaving the with scope regardless of whether an error is thrown.
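We can check this behaviour for ourselves. The sketch below raises an error part way through the with scope and then inspects the closed property of the file. It writes to a temporary directory created with the tempfile module so that it can be run anywhere; the file name is arbitrary.

```python
import os
import tempfile

# A throwaway location, used only for this demonstration
file_path = os.path.join(tempfile.mkdtemp(), "example.txt")

try:
    with open(file_path, "w") as f:
        f.write("first line\n")
        raise ValueError("something went wrong")
except ValueError:
    pass

# The file was closed on the way out of the with scope despite the error
print(f.closed)   # True
```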

Path Class

The path module from within the os name space provides a handy function called join. Here we can provide a starting directory along with any number of directories and a file name to join.

The advantage of using the join method is that it will always use the correct path separator character regardless of which operating system your code is running on.

In the following example we join a directory called output to a file named output.txt to create a relative file path.

Code:

from os import path

text_file = path.join('output', 'output.txt')

Output:

output/output.txt

Using this relative file path will create a file relative to the current working directory, which is normally the directory the program was started from; ideal for us to test stream usage.

Writing Text Files

To write to a file as text we simply need to create a file handle with the open function, as mentioned above, along with the file path and the 'w' access mode. The file will use the platform's default text encoding unless the encoding parameter is given.

Below we loop through a list of strings and write them to the file with the write function. We also add a new line character after each string by writing “\n” to the stream.

Code:

from os import path

text_file = path.join('output', 'output.txt')

data = ["This is a list of strings", "which need to be saved", "into a file"]

with open(text_file, "w") as f:
    for a_line in data:
        f.write(a_line)
        f.write("\n")

File Contents:

This is a list of strings
which need to be saved
into a file

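We could have used the writelines function to write all the strings contained in a collection. Note that writelines does not add newline characters, so the strings are written one after another.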

Code:

with open(text_file, "w") as f:
    f.writelines(data)
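If we want one string per line with writelines we need to add the newline characters ourselves. The following is a minimal sketch which does this with a generator expression; it writes to a temporary directory so it does not depend on the output directory existing.

```python
import os
import tempfile

data = ["This is a list of strings", "which need to be saved", "into a file"]

# A throwaway location, used only for this demonstration
text_file = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(text_file, "w") as f:
    # writelines writes each string as it is, so we append the newline here
    f.writelines(a_line + "\n" for a_line in data)

with open(text_file, "r") as f:
    written = f.read()

print(written, end='')
```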

Reading Text Files

We can read the contents of the file above by providing the same file path but changing the file access to readable by changing the ‘w’ to ‘r’.

We can then read all lines into a list with the readlines function.

Code:

with open(text_file, "r") as f:
    print(f.readlines())

Output:

['This is a list of strings\n', 'which need to be saved\n', 'into a file\n']

Notice above that we have a newline character after each line in the file. Python does not automatically strip the newline character off when reading each line.

We can also read one line at a time with the readline function. This would require writing a loop and manually breaking when the readline function stops returning data.

This is very long winded for Python. Instead we can iterate over the file handle! Each iteration is a line in the file and the loop stops automatically when we reach the end of the file.

Code:

with open(text_file, "r") as f:
    for a_line in f:
        print(a_line, end='')  # The file has new line chars also print adds one on by default

Output:

This is a list of strings
which need to be saved
into a file

We use end='' to prevent the print function automatically adding a newline character after each output.

In the following example we use a list comprehension to iterate through the file, strip off the newline character with the rstrip function and add each line into a list which we assign to a variable called lines.

Code:

with open(text_file) as f:
    lines = [line.rstrip('\n') for line in f]
print(lines)

Output:

['This is a list of strings', 'which need to be saved', 'into a file']

The list constructor can take any iterable; as such we can actually pass the file handle into the constructor to read each line into an instance of a list.

Code:

print("\nRead with list():", text_file)
with open(text_file, "r") as f:
    print(list(f))

Seek & Tell

Python maintains the current location within the file with a marker.

When a file is opened the marker is normally set at the very start of the file. Opening a file for append sets the marker to the very end of the file.

Calling readline on a file handle will read all the contents from the current marker position up to the next newline character. It will then move the marker to the character after the newline it has read.

The position of the marker can be read and set with the tell and seek functions respectively.

Tell returns the position of the marker from the start of the file as bytes.

Seek sets the position of the marker by defining the number of bytes from the start of the file.

The following example opens a file, reads all the content and then resets the marker to the start of the file. Tell is used to show the position throughout the example.

Code:

with open(text_file, "r") as f:
    print("Tell: ", f.tell())
    contents = list(f)
    print("Tell: ", f.tell())
    f.seek(0)
    print("Tell: ", f.tell())

Output:

Tell: 0
Tell: 61
Tell: 0
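Seek can also return to a position previously reported by tell. The sketch below opens a small file in binary mode, where positions are plain byte offsets, remembers the position after the first line and later jumps back to it. The file content and temporary location are made up for this example.

```python
import os
import tempfile

# A throwaway file, used only for this demonstration
sample_path = os.path.join(tempfile.mkdtemp(), "sample.bin")

with open(sample_path, "wb") as f:
    f.write(b"first\nsecond\nthird\n")

with open(sample_path, "rb") as f:
    first = f.readline()    # reads up to and including the newline
    position = f.tell()     # 6: the length of b"first\n"
    f.seek(0)
    again = f.readline()    # the first line once more
    f.seek(position)
    rest = f.read()         # everything after the first line

print(position, first, again, rest)
```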

JSON

JavaScript Object Notation is an open standard format that uses human-readable text to pass data or objects. It strives to be more human readable than formats such as XML. It uses attribute value pairs to represent the data.

Its main use is to pass data between a server and a web application, for example in web services and AJAX calls.

Python provides a json module as a JSON parser for both serialisation of a python type to a JSON string and de-serialisation back to the python type.

Let's take a dictionary and populate it.

Code:

data = {"numbers": [1, 2, 3, ],
        "written numbers": ['one', 'two', 'three'],
        "characters": ['a', 'b', 'c']}

print(data)
print(type(data))

Output:

{'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3], 'characters': ['a', 'b', 'c']}
<class 'dict'>

We can serialise the dictionary to JSON with the dumps function in the json module.

Code:

import json

json_string = json.dumps(data)
print(json_string)
print(type(json_string))

Output:

{"written numbers": ["one", "two", "three"], "numbers": [1, 2, 3], "characters": ["a", "b", "c"]}
<class 'str'>

The type has now changed to a string, though the output looks similar to that of printing the dictionary. This is because Python's printed representation of a dictionary happens to resemble JSON; note that the dictionary is printed with single quotes while JSON always uses double quotes.

We can de-serialise the JSON string back to a dictionary with the loads function in the json module.

Code:

decoded_data = json.loads(json_string)
print(decoded_data)
print(type(decoded_data))

Output:

{'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3], 'characters': ['a', 'b', 'c']}
<class 'dict'>
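The round trip is not always an exact copy, as JSON has fewer types than Python. As a sketch with made up data, the following shows that tuples come back as lists and that non-string keys come back as strings.

```python
import json

# Made up data containing a tuple value and an integer key
original = {"point": (1, 2), 1: "one"}

json_string = json.dumps(original)
restored = json.loads(json_string)

print(json_string)   # {"point": [1, 2], "1": "one"}
print(restored)      # {'point': [1, 2], '1': 'one'}
print(restored == original)   # False
```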

The json module also provides the ability to serialise to a text file and de-serialise from a text file with the dump and load functions which both take a file instance.

Code:

from os import path

json_file_path = path.join('output', 'json.dump')

print("Dumping:", json_file_path)
with open(json_file_path, "w") as f:
    json.dump(data, f)

print("Loading:", json_file_path)
with open(json_file_path, "r") as f:
    loaded_json = json.load(f)
    print("\t", loaded_json)
    print("\t", type(loaded_json))

Output:

Dumping: output/json.dump
Loading: output/json.dump
{'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3], 'characters': ['a', 'b', 'c']}
<class 'dict'>

The dumps and dump functions provide some options for formatting the JSON string.

The sort_keys parameter, defaulting to False, allows the output to be sorted by dictionary key.

The indent parameter defines the number of spaces used to indent each level of nested elements.

The separators parameter is a tuple of two strings: the separator between items and the separator between a key and its value.

Code:

print(json.dumps(data, sort_keys=True, indent=2, separators=(',', ':')))

Output:

{
  "characters":[
    "a",
    "b",
    "c"
  ],
  "numbers":[
    1,
    2,
    3
  ],
  "written numbers":[
    "one",
    "two",
    "three"
  ]
}

Pickle

JSON is great where compatibility or human readability is required. However if you simply want to persist the state of an object to read it later, Python provides pickle, an inbuilt binary format. This will often be more compact than text formats.

The dump function is used to serialise a class instance while the load function is used to de-serialise back to a class instance.

The file parameter takes a handle to a file instance representing the destination or source file.

Code:

import pickle
from os import path

pickle_file = path.join('output', 'pickle.data')

# Serialisation
try:
    with open(pickle_file, "wb") as output_file:
        pickle.dump(data, file=output_file)
except IOError as err:
    print('File error: ' + str(err))
except pickle.PickleError as pickle_error:
    print('Pickling error: ' + str(pickle_error))

Code:

# Deserialization
try:
    with open(pickle_file, "rb") as input_file:
        loaded_data = pickle.load(input_file)
        print("\t", loaded_data)
        print("\t", type(loaded_data))
except IOError as err:
    print('File error: ' + str(err))
except pickle.PickleError as pickle_error:
    print('Pickling error: ' + str(pickle_error))

Output:

*** The raw data:
{'characters': ['a', 'b', 'c'], 'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3]}

Dumping to: output/pickle.data
Loading from: output/pickle.data
{'characters': ['a', 'b', 'c'], 'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3]}
<class 'dict'>
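Pickle also preserves Python types that JSON has no equivalent for. The following sketch uses the dumps and loads functions, which work with bytes in memory rather than a file, on made up data containing a set, a tuple and a date.

```python
import pickle
from datetime import date

# Made up data using types which JSON cannot represent directly
original = {"tags": {"a", "b"}, "point": (1, 2), "when": date(2015, 6, 9)}

payload = pickle.dumps(original)   # serialise to a bytes object
restored = pickle.loads(payload)   # de-serialise back to Python objects

print(type(payload).__name__)            # bytes
print(restored == original)              # True
print(type(restored["point"]).__name__)  # tuple
```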

Python: Exceptions

Code does not always work as intended. Even if the perfect system could exist, it is not possible to protect against every possible situation, from badly formed data files and user input to the network going down.

Expecting errors is an integral part of coding and provides the developer with the ability to respond when an error has occurred, whether that is reversing a database transaction, releasing expensive resources or simply logging and reporting the error.

Try Catch

Like most languages, the try catch statement is the basic building block of error handling.

The try statement defines an area where we would like the run time environment to allow us the opportunity to respond to errors being raised.

The catch statement, which Python calls except, is the code which will run when an error is raised.

The basic syntax looks like this.

Code:

try:
    pass  # Code
except:
    pass  # Error handling code

Any code after the try statement and before the except statement which causes an error will immediately stop executing, and execution resumes at the top of the except statement.

After the except statement has run, control will be passed to the first line outside of the try catch block, allowing the program to carry on as if no error had been raised.

Code:

def convert_to_int(input_value):
    try:
        x = int(input_value)
        print("{0} can be converted into an int of {1}".format(input_value, x))
    except:
        print("An error was caught!")

for an_input in ["1", "a", "b"]:
    print("\nTrying with:", an_input)
    convert_to_int(an_input)

Output:

Trying with: 1
1 can be converted into an int of 1

Trying with: a
An error was caught!

Trying with: b
An error was caught!

Catch An Exception

If we want to find out more about the error raised we can explicitly catch the exception and assign it to a variable. This will allow us to report the exception to the user or the log system as required.

The example above has been modified by catching the exception into the variable called ex in the except statement. In the error handling code we print the exception to the terminal.

Code:

def convert_to_int(input_value):
    try:
        x = int(input_value)
        print("{0} can be converted into an int of {1}".format(input_value, x))
    except Exception as ex:
        print("The following exception was caught:")
        print(ex)

for an_input in ["1", "a", "b"]:
    print("\nTrying with:", an_input)
    convert_to_int(an_input)

Output:

Trying with: 1
1 can be converted into an int of 1

Trying with: a
The following exception was caught:
invalid literal for int() with base 10: 'a'

Trying with: b
The following exception was caught:
invalid literal for int() with base 10: 'b'
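Once we know which error to expect we can catch just that type and respond to it. As a sketch, the hypothetical helper below catches the ValueError raised by int and returns a default value instead of printing.

```python
def to_int_or_default(input_value, default=0):
    """Convert input_value to an int, returning default if it is not a valid number."""
    try:
        return int(input_value)
    except ValueError:
        # Only ValueError is handled; any other error is passed on to the caller
        return default


print(to_int_or_default("1"))       # 1
print(to_int_or_default("a"))       # 0
print(to_int_or_default("b", -1))   # -1
```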

Exception Granularity

The Exception type is a class which can be inherited from. Python ships with many subclasses of Exception with the intention of code raising an exception which is more specific to the error being raised.

In code we might want to act differently based upon the error being raised. For example if the network is down we might want to retry but if we have bad data from the user we might want to allow the user to re-input the data.

Python allows catching exceptions by their type as well as the general catch statement which we have seen above.

We can provide multiple except statements all catching a different exception type which in turn allows us to respond differently based upon the error being raised.

Code:

try:
    f = open('foo.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as err:
    print("IOError: {0}".format(err))
except ValueError as err:
    print("ValueError: {0}".format(err))
except Exception as err:
    print("Exception: {0}".format(err))
except:
    print("Only reached for errors which do not inherit from Exception, such as KeyboardInterrupt")

Output:

IOError: [Errno 2] No such file or directory: 'foo.txt'

An except statement will run if the exception types are the same or the error being raised has the defined exception type in its ancestry; i.e. it inherits directly or indirectly from the defined exception type.

Only one except statement will run so you should be careful to ensure your except statements are placed from the most specific to the least specific.

The example above catches the Exception type last which will catch all errors being raised as long as they have not already been caught.
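As a rough illustration of how ancestry and ordering decide which except statement runs, here is a minimal sketch. The classify function is a hypothetical helper written for this example; the exception types are the built-in ones.

```python
def classify(error):
    """Raise the given error and report which except statement handled it."""
    try:
        raise error
    except FileNotFoundError:
        # Most specific type first
        return "FileNotFoundError"
    except OSError:
        # Parent of FileNotFoundError and PermissionError
        return "OSError"
    except Exception:
        # Least specific type last
        return "Exception"


print(classify(FileNotFoundError("missing")))  # FileNotFoundError
print(classify(PermissionError("denied")))     # OSError
print(classify(ValueError("bad value")))       # Exception
```

If the OSError statement were placed first, it would also handle the FileNotFoundError and the more specific statement would never be reached.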

Exception Details

Like all types in Python, the exception is a class and as such contains state and behaviour.

We can write the exception summary to a string with the __str__ method, which is called by string formatting and by the print function.

The __traceback__ attribute holds the traceback object, which can be used to read the call stack at the time the exception was raised.

The args attribute holds the arguments which were passed to the exception when it was created.

Code:

try:
    1 / 0
except Exception as err:
    print("Exception: {0}".format(err))
    print(err)
    print(err.__traceback__)
    print(err.args)

Output:

Exception: division by zero
division by zero
<traceback object at 0x...>
('division by zero',)
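Printing the traceback object directly only shows its memory address. One way to get something readable is the standard library traceback module; the following is a small sketch in which the fail function and the variable names are made up for illustration.

```python
import traceback


def fail():
    return 1 / 0


try:
    fail()
except ZeroDivisionError as err:
    # format_exception returns a list of strings, like the text Python prints on a crash
    report = "".join(traceback.format_exception(type(err), err, err.__traceback__))
    # extract_tb returns one entry per stack frame, outermost first
    function_names = [frame.name for frame in traceback.extract_tb(err.__traceback__)]

print(report)
print(function_names)
```

The last entry in function_names is the function where the error was actually raised, which here is fail.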

Alternatively the sys.exc_info function returns a tuple of information about the current exception being handled: its type, the exception instance and the traceback object.

Code:

import sys

try:
    f = open('foo.txt')
    s = f.readline()
    i = int(s.strip())
except:
    print("Catch!!")
    for a_msg in sys.exc_info():
        print(a_msg)

Output:

Catch!!
<class 'FileNotFoundError'>
[Errno 2] No such file or directory: 'foo.txt'
<traceback object at 0x...>
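Because sys.exc_info returns a three item tuple, it can be unpacked into separate names rather than looped over. A minimal sketch, with variable names chosen just for this example:

```python
import sys

try:
    int("not a number")
except ValueError:
    # The tuple is always (type, value, traceback)
    exc_type, exc_value, exc_tb = sys.exc_info()
    summary = "{0}: {1}".format(exc_type.__name__, exc_value)

print(summary)
```

This is mostly useful inside a bare except statement where there is no variable bound to the exception.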

Try Catch Finally

Python also allows a finally statement with a try catch block. The code in the finally statement is run regardless of whether an exception was raised and, if one was raised, whether it was caught. In the following example:

  • Iteration 1 raises no error.
  • Iteration 2 raises an error which is caught.
  • Iteration 3 raises an error which is not caught.

After the finally statement has run for iteration three the program is terminated due to the exception not being caught.

Code:

def raise_if_true(arg_input):
    try:
        if arg_input == 2:
            raise ValueError("Input was 2")
        elif arg_input == 3:
            raise Exception("Input was 3")
    except ValueError as exception:
        print("Caught:", exception)
    finally:
        print('This is the finally!!!!')

for number in [1, 2, 3]:
    raise_if_true(number)

Output:

This is the finally!!!!
Caught: Input was 2
This is the finally!!!!
This is the finally!!!!
Traceback (most recent call last):
  File "/home/lukey/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/Exceptions/try_finally_example.py", line 18, in <module>
    raise_if_true(number)
  File "/home/lukey/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/Exceptions/try_finally_example.py", line 11, in raise_if_true
    raise Exception("Input was 3")
Exception: Input was 3
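A common use of finally is releasing a resource whether or not the work succeeded. The sketch below assumes that the path no_such_dir/missing.txt does not exist; read_first_line and the events list are hypothetical names used only to record the order in which things happen.

```python
events = []


def read_first_line(path):
    handle = None
    try:
        handle = open(path)
        return handle.readline()
    finally:
        # Runs on success, on error and even when the try block returns
        if handle is not None:
            handle.close()
        events.append("cleanup for " + path)


try:
    read_first_line("no_such_dir/missing.txt")
except FileNotFoundError:
    events.append("caller saw FileNotFoundError")

print(events)
```

The cleanup entry is recorded before the caller sees the error, because the finally statement runs before the exception leaves the function.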

Try Catch Finally Else

The else statement can be added onto a try statement to provide an area of code which will be run only if no error is raised in the try block.

All variables assigned in the try statement are available in the else statement. Here we assign the result to a variable called result in the try block and then access it within the else statement.

Code:

def divide(x, y):
    try:
        print("\nPerforming: {0} / {1}".format(x, y))
        result = x / y
    except ZeroDivisionError:
        print("division by zero!")
    else:
        print("Result =", result)
    finally:
        print("Executing the finally clause")

divide(1, 2)
divide(1, 0)

Output:

Performing: 1 / 2
Result = 0.5
Executing the finally clause

Performing: 1 / 0
division by zero!
Executing the finally clause
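One reason to put code in the else statement rather than in the try block is that errors raised there are not handled by the except statements of the same try. A small sketch of this, using a hypothetical parse_and_invert function:

```python
def parse_and_invert(text):
    try:
        number = int(text)
    except ValueError:
        return "not a number"
    else:
        # Only runs when int() succeeded; a ZeroDivisionError raised here
        # is not handled by the except ValueError statement above
        return 1 / number


print(parse_and_invert("4"))    # 0.25
print(parse_and_invert("abc"))  # not a number

try:
    parse_and_invert("0")
except ZeroDivisionError:
    print("ZeroDivisionError escaped from the else statement")
```

This keeps the try block limited to the one call we expect might fail.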

Re-Throwing An Exception

We can re-throw an exception after we have finished handling it. This can be useful if we want the program to stop executing or we would like an outer try catch block to also catch and respond to the error.

We re-throw an error with the raise keyword.

In the following example we re-throw an exception which has been caught. As the try statement is not nested the program will terminate immediately.

Code:

try:
    f = open('foo.txt')
    s = f.readline()
    i = int(s.strip())
except:
    print("Caught!!")
    raise

print("This won't print!!!")

Output:

Caught!!
Traceback (most recent call last):
  File "/home/lukey/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/Exceptions/rethrowing_an_exception.py", line 6, in <module>
    f = open('foo.txt')
FileNotFoundError: [Errno 2] No such file or directory: 'foo.txt'
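When the try statement is nested, a re-thrown exception is passed to the outer handler instead of terminating the program. The following sketch shows the idea; load_setting and the log list are hypothetical names for this example.

```python
log = []


def load_setting(raw_value):
    try:
        return int(raw_value)
    except ValueError:
        log.append("inner handler: noting the problem")
        # A bare raise re-throws the exception currently being handled
        raise


try:
    load_setting("ten")
except ValueError as err:
    log.append("outer handler: {0}".format(err))

for entry in log:
    print(entry)
```

Both handlers run, the inner one first, and the program carries on after the outer try statement.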

Raising An Exception

There might be times in your code where you want to raise an exception to trigger common error handling code which exists higher up in the method stack.

An error can be raised by creating an instance of the Exception class, or of any class which inherits from Exception, and passing it to the raise keyword.

The Exception class gathers all constructor arguments and places them into the args tuple.

Code:

try:
    raise Exception('spam', 'eggs')
except Exception as inst:
    print(inst)
    print(inst.args)

Output:

('spam', 'eggs')
('spam', 'eggs')
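To illustrate raising an error so that code higher up the call stack can deal with it, here is a minimal sketch. All three functions are hypothetical; only the outermost one contains any error handling.

```python
def validate_age(age):
    if age < 0:
        # The extra argument ends up in args alongside the message
        raise ValueError("age must not be negative", age)
    return age


def register_user(name, age):
    # No error handling here; the exception passes straight through
    return {"name": name, "age": validate_age(age)}


def handle_request(name, age):
    try:
        return register_user(name, age)
    except ValueError as err:
        return "Rejected: {0} (got {1})".format(err.args[0], err.args[1])


print(handle_request("Ann", 30))  # {'name': 'Ann', 'age': 30}
print(handle_request("Bob", -1))  # Rejected: age must not be negative (got -1)
```

The code which detects the problem does not need to know how it will be reported; that decision is left to the caller.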

Subclassing Exceptions

Any class which has the Exception type within its ancestry can be raised and caught in Python.

Inheriting from Exception allows catching to be granular as we have seen previously but it also allows us to add state and behaviour onto an exception.

In the following example we subclass Exception to allow a field called value to be set when the error is raised and read when the error is handled.

Code:

class MyError(Exception):
    def __init__(self, value):
        # Pass the value on to Exception so that it also appears in args
        super().__init__(value)
        self.value = value

    def __str__(self):
        return repr(self.value)

try:
    raise MyError(2 * 2)
except MyError as e:
    print(e.value)
    print(e)

Output:

4
4
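Subclasses can also be arranged into a small hierarchy so that callers choose how granular their handling is. The sketch below uses made up names (PaymentError, InsufficientFunds, withdraw) purely for illustration.

```python
class PaymentError(Exception):
    """Base class for the hypothetical payment errors in this example."""


class InsufficientFunds(PaymentError):
    def __init__(self, balance, amount):
        # Let Exception build the message and populate args
        super().__init__("balance {0} is less than {1}".format(balance, amount))
        # Extra state which the handler can read
        self.shortfall = amount - balance


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFunds(balance, amount)
    return balance - amount


try:
    withdraw(10, 25)
except PaymentError as err:
    # Catching the base class also catches InsufficientFunds
    print(err)            # balance 10 is less than 25
    print(err.shortfall)  # 15
```

A caller which only cares about one kind of failure could catch InsufficientFunds instead and let any other PaymentError pass through.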