Python: OS, Shutil, Glob & Unipath

This is part of my Python & Django Series which can be found here including information on how to download all the source code.

This article looks at ways of interacting with the operating system and the file system.

OS

The os module is a collection of miscellaneous operating system interfaces. If you want to interact with the operating system or call operating system functionality, you can probably do it with the os module.

The following outlines some of the most useful functionality.

We can get the name and details of the OS with the name attribute and the uname function. uname is only available on Unix-like systems and returns a tuple-like object with the following fields.

  • sysname: operating system name
  • nodename: name of machine on network (implementation-defined)
  • release: operating system release
  • version: operating system version
  • machine: hardware identifier

Code:

import os

print(os.name)
print(os.uname())

Output:

posix
posix.uname_result(sysname='Linux', nodename='Minty', release='3.13.0-37-generic', version='#64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014', machine='x86_64')
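
The fields listed above can also be read by name. The following is a minimal sketch that assumes we want code which still runs where os.uname is missing (for example on Windows); the fallback to platform.uname is my addition and not part of the original example.

```python
import os
import platform

# os.uname() only exists on Unix-like systems, so fall back to the
# portable platform.uname() when it is missing (for example on Windows).
if hasattr(os, "uname"):
    info = os.uname()
    system, machine = info.sysname, info.machine
else:
    info = platform.uname()
    system, machine = info.system, info.machine

print(os.name)            # 'posix' on Linux and macOS, 'nt' on Windows
print(system, machine)    # Operating system name and hardware identifier
```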

We can determine the current working directory and also change it with the getcwd and chdir functions.

Code:

print(os.getcwd())
os.chdir('/')
print(os.getcwd())

Output:

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary
/
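
Because chdir changes the working directory for the whole process, it is often safer to restore the previous directory afterwards. The following sketch wraps getcwd and chdir in a context manager; working_directory is a hypothetical helper name used for illustration only.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the working directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)   # Always restore, even if the body raises


start = os.getcwd()
with working_directory(tempfile.gettempdir()):
    inside = os.getcwd()     # We are now in the temporary directory
    print(inside)
after = os.getcwd()          # Back where we started
print(after == start)
```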

We can get the real group id and the real user id of the current process, along with the list of supplemental group ids associated with it.

Code:

print(os.getgid())
print(os.getuid())
print(os.getgroups()) 

Output:

1000
1000
[4, 24, 27, 30, 46, 108, 110, 1000]

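The getenv and putenv functions can be used to get and set environment variables respectively. Note that putenv does not update os.environ, so a value set with putenv is not visible to a later getenv call in the same process; assigning to os.environ is usually preferable.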

Code:

print(os.getenv("HOME"))
os.putenv("Fluffy", "Yes")

Output:

/home/luke
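
As a small sketch of the os.environ approach: assigning to the mapping calls putenv for us and keeps getenv in step. The variable names FLUFFY and NOT_SET_ANYWHERE_12345 are made up for this example.

```python
import os

# Assigning through os.environ updates the mapping and calls putenv for us.
os.environ["FLUFFY"] = "Yes"
value = os.getenv("FLUFFY")
print(value)                          # Yes

# getenv accepts a default for variables that are not set.
missing = os.getenv("NOT_SET_ANYWHERE_12345", "fallback")
print(missing)                        # fallback (assuming the variable is unset)

del os.environ["FLUFFY"]              # Removes the variable again
```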

We can create directories and remove them with the mkdir and rmdir functions. We can also run shell commands with the system function.

Code:

os.mkdir("foo")
os.rmdir("foo")
os.system("mkdir XXX")
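
mkdir creates a single directory and fails if the parent is missing or the directory already exists. A minimal sketch of the related makedirs function, which creates intermediate directories as well, run inside a temporary directory so nothing else is touched:

```python
import os
import tempfile

base = tempfile.mkdtemp()                       # Scratch directory for the demo
nested = os.path.join(base, "foo", "bar")

os.makedirs(nested, exist_ok=True)              # Creates foo and foo/bar
os.makedirs(nested, exist_ok=True)              # No error when it already exists
created = os.path.isdir(nested)

os.rmdir(nested)                                # rmdir only removes empty directories
os.rmdir(os.path.join(base, "foo"))
os.rmdir(base)
print(created, os.path.exists(base))            # True False
```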

Shutil

The shutil module provides a high level interface when working with a file or a collection of files.

The following outlines some of the most useful functionality.

We can move and copy files with the move and copy functions.

The copytree function can be used to copy a directory and all its contents, while the rmtree function can be used to delete a directory and all its contents.

Code:

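from shutil import move, copy, copytree, rmtree

copy("source_file.txt", "target_file.txt")    # Copy a file
move("source_file.txt", "target_dir")         # Move a file into a directory
copytree("source_dir", "backup_dir")          # The destination must not already exist
rmtree("source_dir")                          # Delete the directory and everything in it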
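
The file and directory names above are placeholders. The following self-contained sketch runs the same four functions inside a temporary directory so it can be tried safely; all of the names it creates are made up for the example.

```python
import os
import tempfile
from shutil import copy, copytree, move, rmtree

work = tempfile.mkdtemp()                        # Scratch area for the demo
source_dir = os.path.join(work, "source_dir")
os.mkdir(source_dir)
source_file = os.path.join(source_dir, "source_file.txt")
with open(source_file, "w") as handle:
    handle.write("hello")

copied = copy(source_file, os.path.join(work, "copy.txt"))           # Returns the new path
backup_dir = copytree(source_dir, os.path.join(work, "backup_dir"))  # Copies the whole tree
moved = move(source_file, os.path.join(work, "moved.txt"))           # Source no longer exists

listing = sorted(os.listdir(work))
print(listing)
rmtree(work)                                     # Clean up everything we created
```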

We can use shutil to compress and uncompress files and directories with the make_archive and unpack_archive functions respectively. The archive format must be one that is registered with shutil, which in turn depends on the compression modules available to Python. We can determine which formats are available with the get_archive_formats function. It returns a list of tuples, each containing the format name and its description.

Code:

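from shutil import get_archive_formats, make_archive, unpack_archive

archive_name = "backup"       # Example base name; make_archive adds the extension
root_dir = "source_dir"       # Example directory to archive

print(get_archive_formats())
archive_path = make_archive(archive_name, 'gztar', root_dir)  # Returns the path of the archive it created
unpack_archive(archive_path, "target_dir", "gztar")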

Output:

[('bztar', "bzip2'ed tar-file"), ('gztar', "gzip'ed tar-file"), ('tar', 'uncompressed tar file'), ('zip', 'ZIP file')]
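
A round trip makes the relationship between the two functions clearer. This is a sketch under the assumption that the gztar format is available (it needs the zlib module); the directory and file names are invented for the example.

```python
import os
import tempfile
from shutil import make_archive, rmtree, unpack_archive

work = tempfile.mkdtemp()
root_dir = os.path.join(work, "source_dir")
os.mkdir(root_dir)
with open(os.path.join(root_dir, "notes.txt"), "w") as handle:
    handle.write("hello")

# make_archive appends the extension and returns the full archive path.
archive_path = make_archive(os.path.join(work, "backup"), "gztar", root_dir)
print(os.path.basename(archive_path))            # backup.tar.gz

restored_dir = os.path.join(work, "restored_dir")
unpack_archive(archive_path, restored_dir, "gztar")
restored = os.listdir(restored_dir)
print(restored)
rmtree(work)                                     # Clean up
```

Note that unpack_archive is given the path returned by make_archive, since the base name on its own does not include the extension.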

The disk_usage function can be used to determine the disk usage statistics for the partition or disk that contains a given path. It returns a named tuple with the total size, the used size and the free size in bytes.

Code:

from shutil import disk_usage

print(disk_usage("/"))

Output:

usage(total=20507914240, used=7373537280, free=12069023744)
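
The raw byte counts are hard to read, so here is a small sketch that converts them to gibibytes and a percentage. Note that used plus free does not have to equal total, as in the output above; on Unix this is usually space reserved for the superuser.

```python
import os
from shutil import disk_usage

usage = disk_usage(os.getcwd())         # Any path on the volume of interest
gib = 1024 ** 3                         # Bytes per gibibyte
percent_used = usage.used / usage.total * 100 if usage.total else 0.0

print(f"total: {usage.total / gib:.1f} GiB")
print(f"free:  {usage.free / gib:.1f} GiB")
print(f"used:  {percent_used:.1f}%")
```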

The which function can be used to find the location of an executable that can be found on the PATH environment variable. Here we use it to find the location of the python and python3 executables.

Code:

from shutil import which

print(which("python"))
print(which("python3"))

Output:

/usr/bin/python
/usr/bin/python3
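
When the executable cannot be found, which returns None rather than raising an error, so the result should be checked before use. A minimal sketch; the program name no-such-program-xyz123 is made up and assumed not to exist.

```python
import sys
from shutil import which

missing = which("no-such-program-xyz123")    # Hypothetical name, assumed not installed
print(missing)                               # None

# Fall back through a few candidates, ending with the running interpreter.
python_path = which("python3") or which("python") or sys.executable
print(python_path)
```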

Glob

Glob provides functionality for getting a list of files from a search pattern. The pattern rules for glob are not regular expressions but standard Unix shell path expansion rules.

  • * matches zero or more characters as wildcard characters
  • ? matches a single character as a wildcard character
  • [] matches a single character from a list of possibilities. Allows the character '-' to determine a range and the character '!' to negate.
    • [0123456789] matches any single digit
    • [abc] matches the lowercase letters a, b or c
    • [0-9] matches any single digit
    • [a-zA-Z] matches any letter, upper or lowercase
    • [!abc] matches any character except the lowercase letters a, b or c

Glob can be used with relative or absolute paths.

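By default glob is not recursive; i.e. it only searches the given directory and not its subdirectories. Since Python 3.5 you can pass recursive=True and use the ** pattern to descend into subdirectories, or you can use os.walk to search recursively.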
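
Since Python 3.5 glob can also search subdirectories when it is given recursive=True together with the ** pattern. The following sketch compares the flat and the recursive search in a temporary directory tree; the file names are invented for the example.

```python
import os
import tempfile
from glob import glob
from shutil import rmtree

work = tempfile.mkdtemp()
os.makedirs(os.path.join(work, "sub", "deeper"))
for relative in ("top.txt", os.path.join("sub", "mid.txt"),
                 os.path.join("sub", "deeper", "low.txt")):
    open(os.path.join(work, relative), "w").close()

flat = glob(os.path.join(work, "*.txt"))                          # Top level only
deep = glob(os.path.join(work, "**", "*.txt"), recursive=True)    # All levels

flat_names = sorted(os.path.basename(p) for p in flat)
deep_names = sorted(os.path.basename(p) for p in deep)
print(flat_names)      # ['top.txt']
print(deep_names)      # ['low.txt', 'mid.txt', 'top.txt']
rmtree(work)
```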

The following matches any file which has an extension of 'txt'.

Code:

from glob import glob

for name in glob('*.txt'):
    print(name)

The following will match any file whose name is a single character, which can be anything, followed by og.txt.

Code:

from glob import glob

for name in glob('?og.txt'):
    print(name)

The following will match any file whose name is a single lowercase letter followed by og.txt.

Code:

from glob import glob

for name in glob('[a-z]og.txt'):
    print(name)

The following will match any file whose name is a single character, which can be anything except lowercase a, b, c, d or e, followed by og.txt.

Code:

from glob import glob

for name in glob('[!abcde]og.txt'):
    print(name)
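
The patterns above can be tried out without creating any files by using the fnmatch module, which implements the same wildcard rules. This is a sketch; the list of names and the matching helper are made up for illustration.

```python
from fnmatch import fnmatchcase

names = ["dog.txt", "log.txt", "Fog.txt", "aog.txt", "frog.txt", "og.txt"]


def matching(pattern):
    """Return the names that match a glob-style pattern (hypothetical helper)."""
    return [name for name in names if fnmatchcase(name, pattern)]


any_first = matching("?og.txt")            # Exactly one leading character
lower_first = matching("[a-z]og.txt")      # One lowercase letter
not_a_to_e = matching("[!abcde]og.txt")    # One character that is not a to e

print(any_first)      # ['dog.txt', 'log.txt', 'Fog.txt', 'aog.txt']
print(lower_first)    # ['dog.txt', 'log.txt', 'aog.txt']
print(not_a_to_e)     # ['log.txt', 'Fog.txt']
```

fnmatchcase is used rather than fnmatch so that the comparison is case-sensitive on every platform.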

Unipath

Unipath is an object-oriented alternative to os, os.path and shutil. It is a third-party package rather than part of the standard library.

Path

Everything revolves around the Path class, which takes any number of path name components and joins them with the correct path separator for the platform. It supports glob style syntax as well as relative and absolute paths.

A Path instance can be created from any of the following.

Code:

from unipath import Path

Path("/", "home", "lukey")        # An absolute path of /home/lukey
Path("foo", "log.txt")            # A relative path of foo/log.txt
Path(__file__)                    # The current running file
Path()                            # Path(os.curdir)
p = Path("")                      # An empty path

Path Properties

The Path class has a number of properties which return the parent directory as a Path instance, the file name, the file extension and the file name without the extension.

The components method returns a list of the directories and file name that make up the Path instance, each as an instance of Path.

Code:

here = Path(__file__)
print(here)

print(here.components())            # A list of all the directories and the file as Path instances.
print(here.parent)                  # The path without the file name
print(here.name)                    # The file name
print(here.ext)                     # The file extension
print(here.stem)                    # The file name without the extension

Output:

[Path('/'), Path('data'), Path('data'), Path('Dropbox'), Path('Development'), Path('SandBox'), Path('Git'), Path('ThePythonPit'), Path('PythonSandBox'), Path('StandardLibrary'), Path('unipath_example.py')]

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary

unipath_example.py

.py

unipath_example
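
For comparison, the same pieces can be pulled out of a plain string with the os.path functions that Unipath is an alternative to. This is a sketch using the path from the output above; posixpath is the POSIX flavour of os.path and is used here only so the result is the same on any platform.

```python
import posixpath   # The POSIX flavour of os.path

# The example path from the output above.
here = ("/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/"
        "PythonSandBox/StandardLibrary/unipath_example.py")

parent = posixpath.dirname(here)        # The path without the file name
name = posixpath.basename(here)         # The file name
stem, ext = posixpath.splitext(name)    # File name without extension, and the extension

print(parent)
print(name)    # unipath_example.py
print(ext)     # .py
print(stem)    # unipath_example
```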

Child & Parent Methods

We can access the parent directory of a path instance with the parent property as shown above.

We can jump up the ancestry tree x times with the ancestor method; this is the same as calling parent x times.

We can walk down the ancestry tree with the child method passing all components of the path to the required directory or file.

Code:

print(here.parent)                  # The containing directory
print(here.ancestor(5))             # Up x entities ( same as calling parent x times).
print(here.ancestor(3).child("PythonSandBox", "StandardLibrary")) # Returns the child as defined by the components.

Output:

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary

/data/data/Dropbox/Development/SandBox

/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/PythonSandBox/StandardLibrary
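
To make "the same as calling parent x times" concrete, here is a sketch that does the equivalent with plain strings. The ancestor function below is a hypothetical stand-in written for this example, not Unipath's own implementation.

```python
import posixpath   # The POSIX flavour of os.path, so results match on any platform


def ancestor(path, levels):
    """Apply dirname `levels` times (hypothetical helper, not Unipath's code)."""
    for _ in range(levels):
        path = posixpath.dirname(path)
    return path


here = ("/data/data/Dropbox/Development/SandBox/Git/ThePythonPit/"
        "PythonSandBox/StandardLibrary/unipath_example.py")

up_five = ancestor(here, 5)
down_again = posixpath.join(ancestor(here, 3), "PythonSandBox", "StandardLibrary")

print(up_five)       # /data/data/Dropbox/Development/SandBox
print(down_again)
```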

Expand, Expand User and Expand Vars

Path instances can be defined with the ~, system variables and also the .. notation.

Note: ~ represents the user's home directory on Linux/Unix.
Note: .. is the notation for going up one directory when defining relative paths.

We can expand these notations to full paths. The expand_user method expands ~ while expand_vars expands system variables. The norm method resolves the .. and . notations.

Alternatively, the expand method expands ~ and system variables and also resolves the .. and . notations.

Code:

print(Path("~").expand_user())      # Expands ~ to an absolute path name
print(Path("$HOME").expand_vars())  # Expands system variables
print(Path("/home/luke/..").norm()) # Expands .. and . notation
print(Path("$HOME/..").expand())    # Expands system variables, ~ and also ..

Output:

/home/luke
/home/luke
/home
/home
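
The standard library offers the same expansions in os.path. The sketch below assumes that these are rough equivalents of the Unipath methods; the expand function is a hypothetical helper written for this example and the order in which it applies the steps is my assumption.

```python
import os.path

home = os.path.expanduser("~")                    # Expands ~ to the home directory
from_var = os.path.expandvars("$HOME")            # Expands environment variables
normalised = os.path.normpath("/home/luke/..")    # Collapses the .. and . notations


def expand(path):
    """Variables, then ~, then normalisation (hypothetical helper)."""
    return os.path.normpath(os.path.expanduser(os.path.expandvars(path)))


print(home)
print(from_var)
print(normalised)
print(expand("~"))
```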

File Attributes and Permissions

The Path class also has a number of methods which return information about the file or directory. Most of them are self-explanatory.

Note that the atime and ctime methods return time as seconds since the epoch, which for Unix is 00:00:00 UTC on 1 January 1970.

Code:

here = Path(__file__)

print(here.atime())                     # Last access time
print(here.ctime())                     # Last metadata change (permissions, ownership) on Unix; creation time on Windows
print(here.isfile())                    # Is this a file? Symbolic links are followed
print(here.isdir())                     # Is this a directory? Symbolic links are followed
print(here.islink())                    # Is a symbolic link?
print(here.ismount())                   # Is a mount point; i.e. is the parent on a different device?
print(here.exists())                    # File or directory actually exists? Symbolic links are followed.
print(here.lexists())                   # Same as exists but symbolic links are not followed
print(here.size())                      # File size in bytes
print(Path("/foo").isabsolute())        # Is an absolute and not a relative path

The gmtime function from the time module can be used to determine the epoch.

Code:

from time import gmtime

print(gmtime(0))

Output:

time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)
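
Seconds since the epoch are not very readable, so a timestamp is usually converted before it is displayed. A small sketch using a sample value of 1434042724 seconds; the choice of UTC is an assumption made to keep the result independent of the local time zone.

```python
from datetime import datetime, timezone
from time import gmtime, strftime

epoch = strftime("%Y-%m-%d %H:%M:%S", gmtime(0))
print(epoch)                               # 1970-01-01 00:00:00

seconds = 1434042724                       # Sample timestamp in seconds since the epoch
accessed = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(accessed.isoformat())                # An ISO 8601 date and time in UTC
```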

The stat and lstat methods can be used to get statistics for a file. stat follows symbolic links, while lstat reports on the path itself even if it is a symbolic link.

Code:

here = Path(__file__)
print(here.stat())                      # File stat object for size, permissions etc. Symbolic links are followed

Output:

os.stat_result(st_mode=33188, st_ino=2753975, st_dev=2052, st_nlink=1, st_uid=1000, st_gid=1000, st_size=3054, st_atime=1434042724, st_mtime=1434042724, st_ctime=1434042724)
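
The st_mode value packs the file type and the permission bits into a single integer. The standard stat module can decode it, as in this short sketch which uses the value from the output above.

```python
import stat

st_mode = 33188                           # Value taken from the output above

mode_octal = oct(st_mode)
mode_text = stat.filemode(st_mode)
is_regular = stat.S_ISREG(st_mode)
is_directory = stat.S_ISDIR(st_mode)

print(mode_octal)      # 0o100644
print(mode_text)       # -rw-r--r--
print(is_regular)      # True: a regular file
print(is_directory)    # False
```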
