Python: Collections

Collections

This is part of my Python & Django Series which can be found here including information on how to download all the source code.

Collections are types which hold references to zero, one or more elements.

In Python collections can hold elements which are of different types.

In general, collections are mutable; i.e. their state can change by adding, removing and replacing elements.

There are, as per most languages, many types of collections, all of which exist to solve various scenarios.

List

The list type is probably the most used collection in Python. It represents a mutable, ordered collection of elements which are referenced by their ordinal position.

To create an instance of a list you use the [] notation.

We can use the type function to determine the class as well as the print function to see its contents.

Code:

numbers = []
print(type(numbers), numbers)

Output:

<class 'list'> []

We can initialise a list with elements by declaring them inside the square brackets, separated by commas.

Code:

numbers = [1, 2, "three", 4]
print(type(numbers), numbers)

Output:

<class 'list'> [1, 2, 'three', 4]

Elements can be accessed via [index] using their ordinal position, which starts at 0 and ends at size - 1. If we try to access an element at an index which does not exist an IndexError exception is raised.

Code:

numbers = [1, 2, "three", 4]
print(numbers[0])
print(numbers[10])

Output:

1
IndexError: list index out of range

Alternatively we can ask the index of an element with the index method. If the element does not exist we get a ValueError exception raised.

Code:

numbers = [1, 2, "three", 4]
print(numbers.index('three'))
print(numbers.index('nine'))

Output:

2
ValueError: 'nine' is not in list

We can use the in and not in operators to determine whether a list contains an element.

Code:

numbers = [1, 2, "three", 4]
print('three' in numbers)
print('nine' not in numbers)

Output:

True
True

The len function can be used to return the length or number of elements in a collection.

Code:

numbers = [1, 2, "three", 4]
print(len(numbers))

Output:

4

Collections are dynamic and can contain any class instance, including other collections. Here we create a list of lists.

Code:

print([[1], [1, 2], [1, 2, 3]])

Output:

[[1], [1, 2], [1, 2, 3]]

The append function can add an element to the end of a list.

The extend function can add all elements within another collection to the end of a list.

The insert function can be used to add an element into a specific ordinal position. All elements with an ordinal position greater than or equal to the index will have their ordinal position incremented by 1. If the index is greater than the length of the list, the element is simply added to the end.

An element can be removed by value with the remove function, which deletes the first element equal to its argument; to remove by ordinal position use pop or del instead. All elements after the removed element will have their ordinal position decremented by 1.

Code:

numbers = [1, 2, "three", 4]

# Append an element
numbers.append(9)
print(numbers)

# Append multiple elements
numbers.extend([10, 11])
print(numbers)

# Insert an element at an index (12 is past the end, so it is added last)
numbers.insert(12, 99999)
print(numbers)

# Remove the first element equal to the value 10
numbers.remove(10)
print(numbers)

Output:

[1, 2, 'three', 4, 9]
[1, 2, 'three', 4, 9, 10, 11]
[1, 2, 'three', 4, 9, 10, 11, 99999]
[1, 2, 'three', 4, 9, 11, 99999]
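To make the difference between removing by value and removing by position concrete, here is a minimal sketch. The list name letters and the helper variable names are purely illustrative; remove, pop and del are the standard list operations.

```python
letters = ["a", "b", "c", "b", "d"]

# remove() deletes the first element equal to the given value
letters.remove("b")
after_remove = list(letters)   # snapshot: ['a', 'c', 'b', 'd']

# pop(index) deletes by position and returns the removed element
popped = letters.pop(1)

# del also deletes by position but does not return anything
del letters[0]

print(after_remove, popped, letters)
```

Use remove when you know the value and pop or del when you know the position.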

The + operator can be used to create a new list containing all elements in the first list followed by all elements in the second list.

Code:

print([1, 2, "three", 4] + [5, 'six', 7, 8])

Output:

[1, 2, 'three', 4, 5, 'six', 7, 8]

The * operator can be used to repeat the contents of a list x times.

Code:

print([1] * 5)

Output:

[1, 1, 1, 1, 1]

We can iterate through the elements in a collection with a for in loop.

Code:

for number in [1, 2, 3]:
    print(number)

Output:

1
2
3

We can count the number of occurrences of an element within the collection with the count function.

Code:

print([1, 1, 1, 1, 1].count(1))

Output:

5

The clear function can be used to remove all elements.

Code:

numbers = [1, 2, 3]
numbers.clear()  # numbers is now []

It is important to note that many of the list functions are available to other collection types.

Tuple

A tuple is an immutable list; i.e. once declared it is read-only and cannot have any elements added, removed or replaced.

A tuple can contain mutable objects which can themselves be changed.

All list functionality which does not affect state is applicable to tuples: slicing, indexing, etc.

They are generally slightly faster to create and more compact in memory than lists, and should be used when the collection is modelling constant data.

They are an ideal contender for dictionary keys where multiple elements determine the unique key. They can only be used in dictionaries if they do not contain any mutable elements.

They are created with ().

Code:

empty = ()
a_tuple = ('abcd', 786, 2.23, 'john', 70.2)
print(empty)
print(a_tuple)
print(type(a_tuple))

Output:

()
('abcd', 786, 2.23, 'john', 70.2)
<class 'tuple'>

They are implicitly created via packing: multiple comma separated values being assigned to a single variable. A trailing comma is required where only one element is to be added.

Code:

tuple_one = 1, 2, 3
tuple_two = 1,

print(tuple_one)
print(tuple_two)

Output:

(1, 2, 3)
(1,)

We can also unpack a tuple’s elements into separate variables; in fact this works for most collections. If the number of elements and variables do not align exactly a ValueError exception is raised.

Code:

one, two, three, four = (1, 2, 3, 4)
print(one, two, three, four)

Output:

1 2 3 4
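The unpacking rules can be sketched as follows. The variable names are illustrative; the starred form is standard Python 3 syntax which collects any leftover elements into a list.

```python
# Matching counts unpack cleanly
first, second, third = (1, 2, 3)

# A mismatch between elements and variables raises ValueError
try:
    only_one, only_two = (1, 2, 3)
    unpack_error = None
except ValueError as error:
    unpack_error = type(error).__name__

# A starred name collects whatever is left over into a list
head, *rest = (1, 2, 3, 4)

print(first, second, third, unpack_error, head, rest)
```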

Tuples support most of the read-only functionality of lists.

Code:

print(a_tuple) # Prints the complete tuple
print(a_tuple[0]) # Prints the first element of the tuple
print(a_tuple[1:3]) # Prints elements from the 2nd up to and including the 3rd
print(a_tuple[2:]) # Prints all elements starting from the 3rd element
print(a_tuple * 2) # Prints the tuple repeated two times
print(a_tuple + a_tuple) # Prints the concatenated tuples

Output:

('abcd', 786, 2.23, 'john', 70.2)
abcd
(786, 2.23)
(2.23, 'john', 70.2)
('abcd', 786, 2.23, 'john', 70.2, 'abcd', 786, 2.23, 'john', 70.2)
('abcd', 786, 2.23, 'john', 70.2, 'abcd', 786, 2.23, 'john', 70.2)

One important last feature of tuples is that the elements can be named upon creation and then referenced as if they were attributes. This is provided by the namedtuple function in the collections module. The data is always read-only and trying to change it will result in an AttributeError exception being raised.

Code:

from collections import namedtuple

Person = namedtuple("Person", ["name", "age"])
a_person = Person(name="Luke", age=36)

print(a_person)
print(type(a_person))
print(a_person.name)
print(a_person.age)

Output:

Person(name='Luke', age=36)
<class '__main__.Person'>
Luke
36
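The read-only behaviour of a named tuple can be sketched as below. The Person type mirrors the example above; _replace is the standard namedtuple method for producing a modified copy, and the other variable names are illustrative.

```python
from collections import namedtuple

Person = namedtuple("Person", ["name", "age"])
a_person = Person(name="Luke", age=36)

# Assigning to a field is not allowed
try:
    a_person.age = 37
    assign_error = None
except AttributeError as error:
    assign_error = type(error).__name__

# _replace returns a new instance and leaves the original untouched
older = a_person._replace(age=37)

# A named tuple is still a tuple, so indexing and unpacking work
name, age = a_person

print(assign_error, older, a_person[0], name, age)
```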

Dictionary

The dictionary class represents a collection of elements which are accessed by a key; each key must be unique within the dictionary. Since Python 3.7 a dictionary also preserves the order in which its keys were inserted.

The {} notation is used to create a dictionary. We can use the type and len functions to determine the class type and the number of elements respectively.

The key can be of any type as long as it is immutable; numbers and strings for example. Tuples are immutable and can be used as long as they only contain elements which are immutable.

Code:

dictionary_one = {}
print(type(dictionary_one))
print(dictionary_one)
print("Length:", len(dictionary_one))

Output:

<class 'dict'>
{}
Length: 0

In fact {} is just a shortcut to the dict class which can be invoked by its constructor.

Code:

dictionary_one = dict()
print(type(dictionary_one))

Output:

<class 'dict'>

We can create a dictionary with elements and their keys already assigned with the notation {key1: element1, key2: element2}.

Code:

dictionary_two = {'three': 3, 4: "four"}
print(dictionary_two)

Output:

{'three': 3, 4: 'four'}

We can set or edit an element in the dictionary by the key and the [] access method.

If the key does not exist it is considered a new element. If the key exists it is considered as a replacement to an existing element.

Code:

dictionary_one = {}
dictionary_one[1] = "one"
dictionary_one["two"] = 2
print(dictionary_one)

Output:

{1: 'one', 'two': 2}

We can use the same [key] notation to access the element. If the key does not exist a KeyError exception is raised. You can use the in operator and the not in operator to determine if a key exists.

Code:

dictionary_one = { 1: 'one', 'two': 2}
print(dictionary_one[1])
print(dictionary_one["two"])
print("1 in dictionary_one:", 1 in dictionary_one)

Output:

one
2
1 in dictionary_one: True
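When a key may be missing, the standard get method is an alternative to catching KeyError. A minimal sketch, with the dictionary name ages chosen only for illustration:

```python
ages = {"Luke": 36}

# Indexing with a missing key raises KeyError
try:
    ages["Leia"]
    lookup_error = None
except KeyError as error:
    lookup_error = type(error).__name__

# get returns None, or a supplied default, instead of raising
missing = ages.get("Leia")
fallback = ages.get("Leia", 0)

# in tests for the presence of a key
has_luke = "Luke" in ages

print(lookup_error, missing, fallback, has_luke)
```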

Python provides the keys and values methods, which return dict_keys and dict_values instances; these are iterable views of the keys and elements respectively.

Code:

dictionary_one = { 1: 'one', 'two': 2}
print(dictionary_one.keys())
print(dictionary_one.values())

Output:

dict_keys([1, 'two'])
dict_values(['one', 2])
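The objects returned by keys and values are views rather than copies, so they reflect later changes to the dictionary. A minimal sketch, with the dictionary name stock chosen purely for illustration:

```python
stock = {"apples": 3}
keys_view = stock.keys()
values_view = stock.values()

# Views are live: a key added after the view was created is visible through it
stock["pears"] = 5
keys_now = list(keys_view)
values_now = list(values_view)

# Unlike a one-shot iterator, a view can be iterated as often as needed
total = sum(values_view) + sum(values_view)

print(keys_now, values_now, total)
```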

We can loop through the keys of a dictionary with a for in loop directly without having to use the keys method.

Code:

for k in {1: 'one', 'two': 2}:
    print(k)

Output:

1
two

We can use the items function to loop through a dictionary with access to the key and the element at the same time.

Code:

for k, v in {1: 'one', 'two': 2}.items():
    print(k, v)

Output:

1 one
two 2

Tuples can be used as keys as long as they contain no mutable objects.

You can even use the in and not in operators to determine if the dictionary contains a key.

Code:

dict_with_tuples = {('a', 'b'): 'ab', ('a', 'c'): 'ac', ('a', 'c'): 'ac'}
print(dict_with_tuples)
print(('a', 'b') in dict_with_tuples.keys())

Output:

{('a', 'b'): 'ab', ('a', 'c'): 'ac'}
True
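The restriction on tuple keys comes down to hashability, which the following sketch illustrates. The names point, holder and distances are hypothetical.

```python
point = (1, 2)
holder = ([1, 2], "label")   # a tuple holding a mutable list

# The list inside the tuple can still be changed
holder[0].append(3)

# A tuple of immutable elements is hashable and works as a key
distances = {point: "origin offset"}

# A tuple containing a list is not hashable, so it cannot be a key
try:
    {holder: "will fail"}
    key_error = None
except TypeError as error:
    key_error = type(error).__name__

print(holder, distances, key_error)
```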

Queue

A queue is a special collection which has semantics for first in first out functionality. Python provides the deque (double-ended queue) class, which can add and remove elements efficiently at either end.

Before we can use the deque class we need to import it from the collections module.

We initialise a queue by passing in an iterable; below we pass in a list of numbers from 0-9.

Code:

from collections import deque

numbers = list(range(10))
queue = deque(numbers)
print(queue)
print(type(queue))

Output:

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
<class 'collections.deque'>

We can then remove elements from the start with the popleft function or from the end with the pop function.

Code:

print(queue.popleft())
print(queue)
print(queue.pop())
print(queue)

Output:

0
deque([1, 2, 3, 4, 5, 6, 7, 8, 9])
9
deque([1, 2, 3, 4, 5, 6, 7, 8])

We can append elements to the end or the start of a queue with the append and appendleft functions respectively.

Code:

queue.append(10)
print(queue)

queue.appendleft(11)
print(queue)

Output:

deque([1, 2, 3, 4, 5, 6, 7, 8, 10])
deque([11, 1, 2, 3, 4, 5, 6, 7, 8, 10])
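Put together, first in first out behaviour means adding at one end and removing from the other. A small sketch, assuming a hypothetical queue of task names:

```python
from collections import deque

tasks = deque()

# Enqueue at the right-hand end
tasks.append("first")
tasks.append("second")
tasks.append("third")

# Dequeue from the left-hand end: the oldest element comes out first
served = [tasks.popleft(), tasks.popleft()]
remaining = list(tasks)

print(served, remaining)
```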

Stack

A stack is a collection which has the semantics of first in last out, or last in first out, functionality.

Python does not have a dedicated stack class, though a list can be used: append adds an element to the end and pop removes the last element.

Pop removes the last element or it can take an index of the element to be removed.

Code:

numbers = list(range(10))

print(numbers)
print(numbers.pop()) # Remove last element
print(numbers.pop(3)) # Remove element at index 3
print(numbers)

Output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
9
3
[0, 1, 2, 4, 5, 6, 7, 8]
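Used strictly as a stack, a list only ever grows and shrinks at its end. A minimal sketch with illustrative names:

```python
stack = []

# Push elements onto the top of the stack
stack.append("first")
stack.append("second")
stack.append("third")

# Pop returns the most recently pushed element
top = stack.pop()
next_top = stack.pop()

# Index -1 peeks at the top without removing it
peek = stack[-1]
is_empty = not stack

print(top, next_top, peek, is_empty)
```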

Set

A set is a collection which contains only unique elements. Adding an element which already exists will leave the collection untouched.

In Python a set can be created by passing an iterable into the set constructor. Where duplicate elements are found in the iterable they are reduced to a unique set of values in the set.

Code:

numbers_bag = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
numbers_set = set(numbers_bag)

print(numbers_bag)
print(numbers_set)
print(type(numbers_set))

Output:

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
{1, 2, 3, 4}
<class 'set'>
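Adding and removing individual elements can be sketched as follows. The set name colours is illustrative; add and discard are standard set methods. Note that an empty pair of braces creates a dictionary, so an empty set needs the constructor.

```python
colours = {"red", "green"}

# Adding an element which already exists leaves the set untouched
colours.add("red")
size_after_duplicate = len(colours)

# New elements are added; discard removes an element if it is present
colours.add("blue")
colours.discard("green")

# set() creates an empty set, whereas {} creates an empty dictionary
empty_set = set()
empty_braces = {}

print(size_after_duplicate, sorted(colours), type(empty_set), type(empty_braces))
```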

The set class provides useful set based operators and functionality which create a new set, or a boolean result, from an operation between two sets.

Operator  Name                  Description
-         Difference            All elements in the first set which are not in the second.
|         Union                 All elements in either set.
&         Intersection          All elements which exist in both sets.
^         Symmetric difference  All elements in the LHS or the RHS but not both.
<=        Is subset             True if all elements in the LHS are found within the RHS.
>=        Is superset           True if all elements in the RHS are found within the LHS.

Code:

set_one = set({1, 2, 3, 4, 5, 6})
set_two = set({5, 6, 7, 8})

print(set_one - set_two) # Difference
print(set_one | set_two) # Union
print(set_one & set_two) # Intersection
print(set_one ^ set_two) # Symmetric Difference.

print(set({ 1, 2 }) <= set({1, 2, 3})) # Subset
print(set({ 1, 2, 3}) >= set({1, 2})) # Superset

Output:

{1, 2, 3, 4}
{1, 2, 3, 4, 5, 6, 7, 8}
{5, 6}
{1, 2, 3, 4, 7, 8}
True
True

List Ordering & Iterating

Iterating and ordering lists has already been covered in the article on page flow, which can be seen here.

Conditional Operators

Comparison operators can be used to determine whether two collections are equal.

Two collections are considered equal if they have the same number of elements and each element is equal to the element at the same ordinal position in the other collection.

The less than, less than or equal to, greater than and greater than or equal to operators can also be used on sequences such as lists and tuples. They compare lexicographically: elements are compared pair by pair from the start, and the first pair which differs decides the result.

For example, (1, 2, 5) >= (1, 2, 4) is True because the first two pairs are equal and 5 is greater than 4. Note that for sets these operators test for subsets and supersets instead, as shown in the Set section.

Code:

# Sequences are compared element by element from the start
print("(1, 2, 5) >= (1, 2, 5):", (1, 2, 5) >= (1, 2, 5))

# Checks each element for equality
print("(1,2) == (1.0, 2.0):", (1, 2) == (1.0, 2.0))

# Works for any elements which can be ordered, such as strings
print('("a","b") < ("e", "f"):', ("a", "b") < ("e", "f"))

Output:

(1, 2, 5) >= (1, 2, 5): True
(1,2) == (1.0, 2.0): True
("a","b") < ("e", "f"): True
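Ordering comparisons on tuples and lists are lexicographic rather than element-wise, which the following sketch tries to make visible. The variable names are illustrative only.

```python
# The first differing pair decides; later elements are never looked at
first_decides = (1, 9, 9) < (2, 0, 0)

# This is True even though 5 < 9, because 1 > 0 settles the comparison
not_elementwise = (1, 5) >= (0, 9)

# When one sequence is a prefix of the other, the shorter one is smaller
prefix_is_smaller = (1, 2) < (1, 2, 0)

# For sets the same operator means "is a superset of"
superset_check = {1, 2, 3} >= {1, 2}

print(first_decides, not_elementwise, prefix_is_smaller, superset_check)
```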

Slicing

Slicing creates a new collection from part of an existing one using a start index, an end index and an increment value. Any collection which provides access to its data via an ordinal position or index can be sliced.

The basic format is [start_index : end_index : increment]. The end index is used as a less than predicate; i.e. it is the first index not considered.

The values for start index, end index and increment are all optional and, for a positive increment, default to [0 : size : 1].

Using either default options or explicit values we can shallow clone an entire collection by either of the following commands:

Code:

new_numbers = numbers[:]
new_numbers = numbers[0:len(numbers):1]
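What "shallow" means here can be sketched with a nested list. The names matrix and copy_of_matrix are hypothetical; the point is that the outer list is copied while the inner lists are shared.

```python
matrix = [[1, 2], [3, 4]]
copy_of_matrix = matrix[:]

# Changing the outer copy does not affect the original
copy_of_matrix.append([5, 6])

# The inner lists are the same objects, so this change is seen by both
copy_of_matrix[0].append(99)
shares_inner = copy_of_matrix[0] is matrix[0]

print(matrix, copy_of_matrix, shares_inner)
```

If fully independent inner lists are needed, copy.deepcopy from the standard library is the usual tool.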

We can mix and match which elements we enter; they are all independently optional.

Code:

numbers[:3] # Elements from index 0 until < 3
numbers[8:] # Element at index 8 until the end

We can use a negative increment, though this requires the start index to be higher than the end index. The result contains the elements in the reverse order of their source collection. When the start and end are omitted with a negative increment they default to the last and first elements respectively.

We can also have an increment which is greater than 1; an increment of 2 takes every other element.

Code:

numbers = list(range(10))
print(numbers[::2])
print(numbers[::-2])
print(numbers[len(numbers):0:-2])

Output:

[0, 2, 4, 6, 8]
[9, 7, 5, 3, 1]
[9, 7, 5, 3, 1]

Slicing can also be used for setting elements, though only for mutable types; this does not include tuples.

Here we assign 99 and 100 to index 0 and 1 in the same line of code.

Code:

numbers[0:2] = [99, 100]

We can use slicing to remove a range of elements by assigning an empty collection.

Code:

numbers[0:2] = []

As the start index and end index are defaulted to the first and last elements respectively, we can clear a collection with the following syntax.

Code:

numbers[:] = []
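Slice assignment does not require the replacement to be the same length as the slice, so the list can grow or shrink. A minimal sketch, taking snapshots after each step (the snapshot names are illustrative):

```python
numbers = list(range(6))

# Replace two elements with two new ones: the length is unchanged
numbers[0:2] = [99, 100]
after_replace = list(numbers)

# Replace two elements with a single one: the list shrinks
numbers[1:3] = [7]
after_shrink = list(numbers)

# Assigning an empty list removes the slice entirely
numbers[0:2] = []
after_delete = list(numbers)

print(after_replace, after_shrink, after_delete)
```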

List Comprehensions

List comprehensions allow the generation of a new list from the elements of another collection, optionally transforming and filtering them along the way.

In short, a comprehension is a condensed form of a for loop iterating over a collection, along with an expression to generate a new element from the old element and an optional predicate to determine which elements to use.

They are best explored by examples.

Imagine we want a list of the square numbers from 0 to 9.

Code:

squares = []
for x in range(10):
    squares.append(x**2)
print(squares)

Output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We could condense this with a list comprehension as follows.

Code:

print([x ** 2 for x in range(10)])

  • We create an iterable of 0-9 with the range function.
  • We loop through it assigning each element to a variable named x.
  • We convert the element value using the expression x ** 2; this squares the value.

We could have used a lambda and the map function as follows which in short does the same thing.

Code:

print(list(map(lambda x: x ** 2, range(10))))

Where multiple collections or iterables are required to generate the new collection we can use a nested loop where every combination of elements is used to generate the list.

We map each pair of x and y into a tuple.

Code:

print([(x, y) for x in [1, 2, 3] for y in [10, 11, 12]])

Output:

[(1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (2, 12), (3, 10), (3, 11), (3, 12)]

If we did not want every combination of elements but each pair of elements at the same ordinal position we can use the zip function.

Zip can take any number of iterables, though it must be stressed that it stops as soon as the shortest one is exhausted. Below, the second collection contains more elements than the first, so its extra elements are missing from the new collection.

Code:

print([(x, y) for x, y in zip([1, 2, 3], [10, 11, 12, 13, 14, 15])])

Output:

[(1, 10), (2, 11), (3, 12)]
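If the truncation is not wanted, the standard library offers itertools.zip_longest, which pads the shorter input instead. A small sketch with illustrative variable names:

```python
from itertools import zip_longest

short = [1, 2, 3]
longer = [10, 11, 12, 13]

# zip stops when the shortest input runs out
truncated = list(zip(short, longer))

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(short, longer, fillvalue=None))

print(truncated)
print(padded)
```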

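Here we append a predicate to restrict x and y to being even. Only combinations where both x and y are even will be used. Note that the logical and operator is required; the bitwise & operator binds more tightly than == and would not test both conditions as intended.

Code:

print([(x, y) for x in [1, 2, 3] for y in [10, 11, 12] if x % 2 == 0 and y % 2 == 0])

Output:

[(2, 10), (2, 12)]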
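A nested comprehension with a predicate can be read as the nested for loops it condenses. The sketch below builds the list of pairs where both x and y are even in both styles; the names pairs_loop and pairs_comprehension are illustrative.

```python
xs = [1, 2, 3]
ys = [10, 11, 12]

# Long form: the for clauses nest left to right, the if sits innermost
pairs_loop = []
for x in xs:
    for y in ys:
        # 'and' combines the two tests; '&' is bitwise and has a different precedence
        if x % 2 == 0 and y % 2 == 0:
            pairs_loop.append((x, y))

# Condensed form: the same clauses in the same order
pairs_comprehension = [(x, y) for x in xs for y in ys
                       if x % 2 == 0 and y % 2 == 0]

print(pairs_loop)
print(pairs_comprehension)
```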

Here we append a predicate to keep only the letters in abcdef which are not in the word cab. As the braces make this a set comprehension, the order of the printed elements may vary.

Code:

print({x for x in 'abcdef' if x not in 'cab'})

Output:

{'e', 'd', 'f'}

Lists & Linq Functionality

LINQ is a collection of set based functionality which comes as part of .NET. Python also provides some useful LINQ style functionality.

The min, max and sum functions can be used to determine the minimum and maximum valued entity along with the sum of all entities without having to write a loop ourselves.

Code:

numbers = list(range(0, 10))
print("Numbers:", numbers)
print("Min:", min(numbers))
print("Max:", max(numbers))
print("Sum:", sum(numbers))

Output:

Numbers: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Min: 0
Max: 9
Sum: 45

We can use the in and not in operators to determine if an element is contained within a collection. For immutable and mutable types this will work based upon equality.

Code:

numbers = list(range(0, 10))
print("Numbers:", numbers)
print("1 in:", 1 in numbers)
print("10 not in", 10 not in numbers)
print("[1,2] in [[1,2],[1,1]]", [1,2] in [[1,2],[1,1]])

Output:

Numbers: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 in: True
10 not in True
[1,2] in [[1,2],[1,1]] True

The filter function returns an iterator which takes a predicate, much like a where clause; only elements satisfying the predicate will be iterated over.

Here we provide a function which returns a boolean indicating whether its parameter is even. We use this to loop through only the even numbers of a collection.

Code:

def is_even(x):
    return x % 2 == 0

for even_number in filter(is_even, range(0, 5)):
    print(even_number)

Output:

0
2
4

Anywhere a function takes another function as an argument, a lambda expression can be passed instead. The above example could be rewritten using a lambda.

Code:

for even_number in filter(lambda x: x % 2 == 0, range(0, 5)):
    print(even_number)

Python also provides the map function which allows looping through one or more iterators along with a conversion function. The return value from the conversion function will define the elements in the new collection.

Here we loop through 0-3 and create a new collection of the squares.

Code:

for n in map(lambda x: x**2, range(0, 4)):
    print(n)

Output:

0
1
4
9

Map can take any number of iterables. All elements at the same ordinal position will form the parameters of the iteration. Note the number of iterations will be that of the smallest collection. Any elements at a higher ordinal position will not be iterated through.

Here we loop through two collections both containing 0-3 and we create a new collection of the sum of the elements at the same ordinal position.

Code:

for n in map(lambda x, y: x + y, range(0, 4), range(0, 4)):
    print(n)

Output:

0
2
4
6

Don’t forget that a list instance can be created from an iterator. This is useful if you want the results from map or filter to be within a list. In fact this works for most collections; a dictionary can also be built this way, provided the iterator yields key and value pairs.

Code:

new_list = list(map((lambda x: x**2), range(0, 4)))
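One thing to watch for is that the objects returned by map and filter are one-shot iterators: once consumed they yield nothing more. The sketch below shows this along with a few collection constructors; all variable names are illustrative.

```python
squares = map(lambda x: x ** 2, range(4))

# The first pass consumes the iterator
first_pass = list(squares)

# A second pass finds it exhausted and produces an empty list
second_pass = list(squares)

# Other collections can be built from iterators in the same way
as_tuple = tuple(filter(lambda x: x % 2 == 0, range(6)))
as_set = set(map(lambda x: x % 3, range(6)))

# A dictionary needs the iterator to yield (key, value) pairs
as_dict = dict(map(lambda x: (x, x ** 2), range(4)))

print(first_pass, second_pass, as_tuple, as_set, as_dict)
```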

The reduce function loops through all elements, applying a reducing function which combines them into a single value. The input parameters of the function are the result of the previous call and the next element.

Here we sum all the elements of a collection by adding the ongoing result to the current element and then returning the result.

The function can also take a third parameter which is a starting value. If it is not provided, the first element of the collection is used as the starting value.

Code:

import functools

print(functools.reduce(lambda x, y: x + y, range(4))) # No starting value: ((0 + 1) + 2) + 3
print(functools.reduce(lambda x, y: x + y, range(4), 10)) # Starting value 10: 10 + 0 + 1 + 2 + 3

Output:

6
16
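The role of the starting value is easiest to see with an empty collection. In this sketch the helper add and the result names are hypothetical; functools.reduce is the standard library function used above.

```python
import functools

def add(total, item):
    # total is the running result, item is the next element
    return total + item

# Without a starting value the first element seeds the running result
no_start = functools.reduce(add, [1, 2, 3])

# With a starting value the running result begins there instead
with_start = functools.reduce(add, [1, 2, 3], 10)

# An empty collection is fine as long as a starting value is given
empty_with_start = functools.reduce(add, [], 0)

# An empty collection with no starting value raises TypeError
try:
    functools.reduce(add, [])
    empty_error = None
except TypeError as error:
    empty_error = type(error).__name__

print(no_start, with_start, empty_with_start, empty_error)
```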
