Python: File I/O

This is part of my Python & Django Series which can be found here including information on how to download all the source code.

This article runs through reading and writing to files.

User Input

When running terminal applications we can write to the terminal with the print function and collect information from the user with the input function.

The input function takes a string to display to the user, prompting them to enter some data. It always returns what the user typed as a string.

In the following example we ask the user for their name with the input function and assign it to a variable called name.

It is important to note that the application pauses while it waits for the user's input.

Code:

name = input('What is your name?: ')
print("Hello", name)

File Class

The remainder of the article looks at how to read and write files on disk in various data formats.

All file access in Python goes through a file object, which is returned by the built-in open function. The call below shows the parameters that open most commonly takes.

Code:

a_file = open("FileName.txt", "[r|w|a|r+][b]")

The first parameter is the path of the file to read or write. The second parameter is a string giving the access type and, optionally, the file type.

File access can be defined as permutations of read or write.

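Access  Description
r       Read-only, which is the default
w       Write, overwriting all existing data
a       Append, opened at the end of the file so that existing data is kept
r+      Read and write access

By default the file is opened in text mode. If no encoding argument is given, Python uses the platform's preferred encoding, which is often but not always UTF-8, so it is safest to pass encoding="utf-8" explicitly. Adding b to the access type selects binary access.

Type    Description
[none]  Text access, decoded with the default or requested encoding
b       Binary access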
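The access types above can be compared with a short sketch. The file name modes_demo.txt and the temporary directory are assumptions made purely for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # "w" creates the file, or truncates it if it already exists.
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("first\n")

    # "a" keeps the existing data and writes at the end of the file.
    with open(demo_path, "a", encoding="utf-8") as f:
        f.write("second\n")

    # "r" (the default) opens the file for reading only.
    with open(demo_path, "r", encoding="utf-8") as f:
        text_content = f.read()

    # Adding "b" returns raw bytes instead of decoded text.
    with open(demo_path, "rb") as f:
        binary_content = f.read()

print(repr(text_content))
print(repr(binary_content))
```

The text read returns a str while the binary read returns bytes. On Windows the bytes will also show the \r\n line endings that text mode translates away.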

With Statement

As a file handle is an expensive resource you should always call close upon the file after you have finished working with it.

Code:

a_file = open("afile.txt")
# Actions upon the file.
a_file.close()

As you can never guarantee that your code will run all the way to the close call without raising an exception, it is best practice to protect your code so that close is always called. You could place close() within the finally clause of a try statement, though Python provides the with statement, which is easier and more elegant to use.

Code:

with open(text_file, "w") as f:
    pass  # Code within with scope

# Code outside of with scope.

Here the close method of the file is called automatically upon leaving the with block, regardless of whether an exception was raised.
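To make the comparison concrete, here is a minimal sketch of the manual try/finally version next to the with version. The scratch file with_demo.txt and the simulated ValueError are assumptions for the demonstration only.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

# Manual version: the finally clause runs whether or not write() raises.
f = open(demo_path, "w", encoding="utf-8")
try:
    f.write("written inside try/finally\n")
finally:
    f.close()
closed_after_finally = f.closed

# Equivalent version using with: close() is called when the block exits.
closed_after_error = None
try:
    with open(demo_path, "a", encoding="utf-8") as g:
        g.write("written inside with\n")
        raise ValueError("simulated failure")
except ValueError:
    # The file was already closed by the time the exception reached us.
    closed_after_error = g.closed

print(closed_after_finally, closed_after_error)

# Tidy up the scratch file and directory.
os.remove(demo_path)
os.rmdir(os.path.dirname(demo_path))
```

Both flags come out as True, showing that the file is closed in either style even when the body fails part way through.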

Path Module

The path module within the os package provides a handy function called join. Here we can provide a starting directory along with any number of directories and a file name to join.

The advantage of using the join method is that it will always use the correct path separator character regardless of which operating system your code is running on.

In the following example we join a directory called output to a file named output.txt to create a relative file path.

Code:

from os import path

text_file = path.join('output', 'output.txt')
print(text_file)

Output:

output/output.txt

Using this relative file path will create a file relative to the current working directory, which is normally the directory the program was started from; ideal for us to test stream usage.
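As a small sketch of joining more than two parts, the following uses made-up directory and file names (reports, 2024 and summary.txt are assumptions, not part of the series code).

```python
from os import path, sep

# Any number of components can be joined in one call.
parts = ["output", "reports", "2024", "summary.txt"]  # hypothetical names
nested = path.join(*parts)
print(nested)

# The separator used is the one for the current operating system.
print(nested == sep.join(parts))

# dirname and basename take the joined path apart again.
print(path.dirname(nested))
print(path.basename(nested))
```

The printed separator will be / on Linux and macOS and \ on Windows, without any change to the code.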

Writing Text Files

To write to a file as text we simply need to create a file handle with the open function, as mentioned above, passing the file path and the 'w' access mode. The file is opened in text mode; pass encoding="utf-8" if the encoding must not depend on the platform default.

Below we loop through a list of strings and write them to the file with the write function. We also add a newline character after each string by writing "\n" to the stream.

Code:

from os import makedirs, path

makedirs('output', exist_ok=True)  # open() does not create missing directories
text_file = path.join('output', 'output.txt')

data = ["This is a list of strings", "which need to be saved", "into a file"]

with open(text_file, "w") as f:
    for a_line in data:
        f.write(a_line)
        f.write("\n")

File Contents:

This is a list of strings
which need to be saved
into a file

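We could have used the writelines function to write all the strings contained in a collection. Despite its name, writelines does not add line endings, so we append the newline character to each string ourselves.

Code:

with open(text_file, "w") as f:
    f.writelines(a_line + "\n" for a_line in data)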
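A quick way to see how writelines treats line endings is to write into an in-memory text stream. The use of io.StringIO here is an assumption made to keep the sketch off the disk; a real file handle behaves the same way.

```python
import io

data = ["This is a list of strings", "which need to be saved", "into a file"]

# io.StringIO behaves like a text file held in memory.
buffer = io.StringIO()
buffer.writelines(data)
run_together = buffer.getvalue()

# Appending the newline ourselves gives one string per line.
buffer = io.StringIO()
buffer.writelines(line + "\n" for line in data)
one_per_line = buffer.getvalue()

print(repr(run_together))
print(repr(one_per_line))
```

The first result is the three strings run together with no separator, while the second contains one string per line.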

Reading Text Files

We can read the contents of the file above by providing the same file path but changing the file access to readable by changing the ‘w’ to ‘r’.

We can then read all lines into a list with the readlines function.

Code:

with open(text_file, "r") as f:
    print(f.readlines())

Output:

['This is a list of strings\n', 'which need to be saved\n', 'into a file\n']

Notice above that we have a newline character after each line in the file. Python does not automatically strip the newline character off when reading each line.

We can also read one line at a time with the readline function. This would require writing an infinite loop and manually breaking when readline returns an empty string, which signals the end of the file.
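For comparison, a minimal sketch of that manual loop using readline is shown here. An in-memory io.StringIO stream stands in for the file, and its contents are assumed to match the file written earlier.

```python
import io

# An in-memory stand-in for the text file written earlier (assumed contents).
f = io.StringIO("This is a list of strings\nwhich need to be saved\ninto a file\n")

lines_read = []
while True:
    a_line = f.readline()
    if a_line == "":
        # An empty string means end of file; a blank line would be "\n".
        break
    lines_read.append(a_line)

print(lines_read)
```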

This is very long-winded for Python. Instead we can iterate over the file handle directly. Each iteration yields one line of the file and the loop stops automatically when we reach the end of the file.

Code:

with open(text_file, "r") as f:
    for a_line in f:
        print(a_line, end='')  # The file has new line chars also print adds one on by default

Output:

This is a list of strings
which need to be saved
into a file

We pass end='' to prevent the print function from automatically adding a newline character after each output.

In the following example we use a list comprehension to iterate through the file, strip off the newline character with the rstrip function and add each line to a list, which we assign to a variable called lines.

Code:

with open(text_file) as f:
    lines = [line.rstrip('\n') for line in f]
print(lines)

Output:

['This is a list of strings', 'which need to be saved', 'into a file']

The list constructor accepts any iterable, so we can pass the file handle to it to read each line into a new list.

Code:

print("\nRead with list():", text_file)
with open(text_file, "r") as f:
    print(list(f))

Seek & Tell

Python tracks the current position within an open file with a marker.

When a file is opened the marker is normally set at the very start of the file. Opening a file in append mode sets the marker to the very end of the file.

Calling readline on a file handle will read all the contents from the current marker position up to the next newline character. It will then move the marker to the character after the newline it has read.

The position of the marker can be read and set with the tell and seek functions respectively.

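Tell returns the position of the marker from the start of the file. In binary mode this is a byte count; in text mode it is an opaque number that usually matches the byte offset for a simple file like ours.

Seek sets the position of the marker as an offset from the start of the file. In text mode only zero, or an offset previously returned by tell, is safe to use.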

The following example opens a file, reads all the content and then resets the marker to the start of the file. Tell is used to show the position throughout the example.

Code:

with open(text_file, "r") as f:
    print("Tell:", f.tell())
    contents = list(f)
    print("Tell:", f.tell())
    f.seek(0)
    print("Tell:", f.tell())

Output:

Tell: 0
Tell: 61
Tell: 0
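Seek can also jump to a position in the middle of a file. The sketch below assumes binary access, where offsets are plain byte counts, and uses an in-memory io.BytesIO stream with the same contents as our file (with Unix line endings) instead of a file on disk.

```python
import io

# In-memory binary stand-in for the file (assumed contents, Unix line endings).
f = io.BytesIO(b"This is a list of strings\nwhich need to be saved\ninto a file\n")

first_line = f.readline()
offset_after_first = f.tell()      # position just past the first newline

f.readline()                       # skip over the second line
f.seek(offset_after_first)         # jump back to the start of the second line
second_line = f.readline()

f.seek(0, io.SEEK_END)             # move to the end to measure the total size
total_size = f.tell()

print(offset_after_first, second_line, total_size)
```

Saving the value returned by tell and passing it back to seek later is the safe pattern in both text and binary mode.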

JSON

JavaScript Object Notation (JSON) is an open standard format that uses human-readable text to pass data or objects. It strives to be more lightweight and readable than formats such as XML. It uses attribute-value pairs to represent the data.

Its main use is to pass data between a server and a web application, for example in web services and AJAX calls.

Python provides a json module as a JSON parser for both serialisation of a Python type to a JSON string and de-serialisation back to the Python type.

Let's take a dictionary and populate it.

Code:

data = {"numbers": [1, 2, 3, ],
        "written numbers": ['one', 'two', 'three'],
        "characters": ['a', 'b', 'c']}

print(data)
print(type(data))

Output:

{'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3], 'characters': ['a', 'b', 'c']}

We can serialise the dictionary to JSON with the dumps function of the json module.

Code:

import json

json_string = json.dumps(data)
print(json_string)
print(type(json_string))

Output:

{"written numbers": ["one", "two", "three"], "numbers": [1, 2, 3], "characters": ["a", "b", "c"]}

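The type has now changed to a string, though the printed output looks almost the same as when printing the dictionary. This is because Python's literal syntax for dictionaries and lists closely resembles JSON, not because print converts the dictionary to JSON. Note that the JSON string uses double quotes where the printed dictionary uses single quotes.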
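The two notations look alike for simple data but diverge for some values. The following sketch compares them using a made-up dictionary (sample and its keys are assumptions for illustration) that contains a boolean, None and a tuple.

```python
import json

# Hypothetical sample containing values that Python and JSON spell differently.
sample = {"name": "one", "active": True, "missing": None, "pair": (1, 2)}

python_text = repr(sample)       # what print(sample) displays
json_text = json.dumps(sample)   # a real JSON document

print(python_text)
print(json_text)
```

In the JSON version True becomes true, None becomes null and the tuple is written as an array, so the two strings are not interchangeable.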

We can de-serialise the JSON string back to a dictionary with the loads function of the json module.

Code:

decoded_data = json.loads(json_string)
print(decoded_data)
print(type(decoded_data))

Output:

{'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3], 'characters': ['a', 'b', 'c']}

The json module also provides the ability to serialise to a text file and de-serialise from a text file with the dump and load functions, which both take a file instance.

Code:

from os import path

json_file_path = path.join('output', 'json.dump')

print("Dumping:", json_file_path)
with open(json_file_path, "w") as f:
    json.dump(data, f)

print("Loading:", json_file_path)
with open(json_file_path, "r") as f:
    loaded_json = json.load(f)
    print("\t", loaded_json)
    print("\t", type(loaded_json))

Output:

Dumping: output/json.dump
Loading: output/json.dump
{'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3], 'characters': ['a', 'b', 'c']}

The dumps and dump functions offer some configuration points for formatting the JSON string.

The sort_keys parameter, defaulting to False, outputs the dictionary entries sorted by key.

The indent parameter defines the number of spaces used to indent each level of nesting. Setting it also places each element on its own line.

The separators parameter takes a tuple of two strings: the separator placed between items and the separator placed between a key and its value.

Code:

print(json.dumps(data, sort_keys=True, indent=2, separators=(',', ':')))

Output:

{
  "characters":[
    "a",
    "b",
    "c"
  ],
  "numbers":[
    1,
    2,
    3
  ],
  "written numbers":[
    "one",
    "two",
    "three"
  ]
}
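To see what each option does on its own, the sketch below applies them one at a time to the same dictionary. The variable names default_text, compact_text and sorted_text are just labels chosen for this illustration.

```python
import json

data = {"numbers": [1, 2, 3],
        "written numbers": ["one", "two", "three"],
        "characters": ["a", "b", "c"]}

default_text = json.dumps(data)
compact_text = json.dumps(data, separators=(",", ":"))
sorted_text = json.dumps(data, sort_keys=True)

print(default_text)
print(compact_text)
print(sorted_text)

# All three strings decode to the same dictionary.
same_data = json.loads(default_text) == json.loads(compact_text) == json.loads(sorted_text)
print(same_data)
```

The compact separators drop the spaces that the defaults insert, which is handy when the JSON is sent over a network rather than read by a person.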

Pickle

JSON is great where compatibility or human readability is required. However, if you simply want to persist the state of an object to read it back later, Python provides pickle, a built-in binary format. This is usually more compact and faster to process than text formats.

The dump function is used to serialise an object while the load function is used to de-serialise it back to an object of the same type.

The file parameter takes a file object, opened in binary mode, representing the destination or source file.

Code:

import pickle
from os import path

pickle_file = path.join('output', 'pickle.data')

# Serialisation
try:
    with open(pickle_file, "wb") as output_file:
        pickle.dump(data, file=output_file)
except IOError as err:
    print('File error: ' + str(err))
except pickle.PickleError as pickle_error:
    print('Pickling error: ' + str(pickle_error))

Code:

# Deserialization
try:
    with open(pickle_file, "rb") as input_file:
        loaded_data = pickle.load(input_file)
        print("\t", loaded_data)
        print("\t", type(loaded_data))
except IOError as err:
    print('File error: ' + str(err))
except pickle.PickleError as pickle_error:
    print('Pickling error: ' + str(pickle_error))

Output:

*** The raw data:
{'characters': ['a', 'b', 'c'], 'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3]}

Dumping to: output/pickle.data
Loading from: output/pickle.data
{'characters': ['a', 'b', 'c'], 'written numbers': ['one', 'two', 'three'], 'numbers': [1, 2, 3]}
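One practical difference from JSON is that pickle preserves Python-specific types. The sketch below uses a made-up record (its keys and values are assumptions for illustration) and the in-memory dumps and loads functions, so no file is needed.

```python
import json
import pickle

# Hypothetical record using types that JSON has no equivalent for.
record = {"point": (1, 2), "tags": {"a", "b"}, "raw": b"\x00\x01"}

# dumps and loads are the in-memory counterparts of dump and load.
blob = pickle.dumps(record)
restored = pickle.loads(blob)

print(type(blob))
print(restored == record, type(restored["point"]), type(restored["tags"]))

# json refuses the same record because sets and bytes are not JSON types.
try:
    json.dumps(record)
    json_ok = True
except TypeError as err:
    json_ok = False
    print("json failed:", err)
```

The tuple and set come back with their original types. As the pickle documentation warns, only load pickled data from sources you trust.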
