Advanced Python: Itertools Library — The Gem Of Python Language
Explaining The Features Of The Must-Know Amazing Python Library
Did you know that the Python Itertools library is regarded as the Gem of Python?
Some users even consider it to be one of the coolest and most amazing Python libraries.
We can use the Itertools module to enrich our applications and create a solid working solution in a shorter time.
The article will help the readers understand how we can use the Itertools module in our projects.
This is an advanced-level topic for Python developers and I recommend it to everyone who uses, or intends to use, the Python programming language.
If you want to understand the Python programming language from the beginner to an advanced level then I highly recommend the article below:
Article Aim
This article will provide an overview of the Itertools library.
I have divided the article into three parts whereby each part will explain a specific functionality of the Itertools library.
In particular, I will be explaining:
- Infinite Iterators
- Terminating Iterators
- Combinatoric Iterators
We can use the Itertools library to implement precise, memory efficient and stable applications in a shorter time.
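This article is based on the itertools module as shipped with Python 3.8. The module is part of the standard library (it was introduced in Python 2.3), so it has no separate version number and needs no installation.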
In your Python code, import the itertools library
import itertools as it
Quick Note: What Is An Iterator?
An iterator is an object with a __next__ method. It has state, which it uses to remember where it is during iteration. Because an iterator only needs to know its current position, it does not have to hold every item in memory at once. This is the reason why iterators are used in memory-efficient and fast applications.
We can open a stream of data (such as a file) and get the next item (such as the next line from the file). We can then perform an action on the item and proceed to the next item. This means that an iterator can even return an infinite number of elements, as we only need to be aware of the current item.
It has a __next__ method that returns the next value in the iteration and then updates the state to point to the next item. The iterator will always get us the next item from the stream when we execute next(iterator)
When there is no next item to be returned, the iterator raises a StopIteration exception.
As a result, a lean application can be implemented by using iterators
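To make the protocol concrete, here is a minimal sketch of a hand-written iterator. The class name Countdown is hypothetical and exists only for this illustration.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start  # the state the iterator remembers

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            # Nothing left: signal the end of the iteration.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first = next(countdown)  # consumes one item
rest = list(countdown)   # consumes whatever is left
print(first, rest)
```

Only the current value is stored, no matter how large the starting number is.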
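Note that collections such as lists, strings, dictionaries and tuples are iterables rather than iterators: calling iter() on them returns an iterator. A file object is itself an iterator over its lines.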
Quick Note: What Is An Iterable?
An iterable is an object that can return an iterator. It has an __iter__ method that returns an iterator.
An iterable is an object which we can loop over and can call iter() on. Alternatively, it can define a __getitem__ method that takes sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid).
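The two ways of being iterable can be sketched as follows. The Squares class is a hypothetical example that defines only __getitem__, yet Python can still iterate over it.

```python
class Squares:
    """Hypothetical class that implements only __getitem__."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        if index >= self.size:
            # Tells Python that the sequence has ended.
            raise IndexError(index)
        return index * index


numbers = [1, 2, 3]
list_iterator = iter(numbers)     # a list is an iterable; iter() returns an iterator
first_number = next(list_iterator)

squares = list(Squares(4))        # iteration falls back to __getitem__
print(first_number, squares)
```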
What Is Itertools?
Itertools is a Python module that is part of the Python 3 standard libraries. It lets us perform memory and computation efficient tasks on iterators. It is inspired by constructs from APL, Haskell, and SML.
Essentially, the module contains a number of fast and memory-efficient methods that can help us build applications succinctly and efficiently in pure Python.
Python's itertools is a module that provides various functions that work on iterators to produce iterators. It allows us to perform iterator algebra.
The most important point to take away is that the itertools functions produce iterators, so their results are generated lazily, one item at a time.
This brings us to the core of the article. Let’s understand how infinite iterators work.
1. Infinite Iterators
What if we want to construct an iterator that returns an infinite sequence of evenly spaced values? Or, what if we have to generate a cycle of elements from an iterable? Or, maybe we want to repeat a single item over and over?
The itertools library offers a set of functions which we can use to perform all of the required functionality.
The three functions listed in this section construct and return iterators which can be a stream of infinite items.
Count
For instance, we can generate an infinite sequence of evenly spaced values. The second argument of count() is the step between values, not a stopping point:
start = 10
step = 1
my_counter = it.count(start, step)
for i in my_counter:
    # this loop will run forever
    print(i)
This will print never-ending items e.g.
10
11
12
13
14
15
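In practice an endless counter is usually bounded by something else. The sketch below uses islice, another itertools function that this article does not otherwise cover, to take only the first few values; the step of 5 is an arbitrary choice for the illustration.

```python
import itertools as it

# count(10, 5) yields 10, 15, 20, ... forever;
# islice(..., 4) stops after the first four values.
first_four = list(it.islice(it.count(10, 5), 4))
print(first_four)  # [10, 15, 20, 25]
```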
Cycle
We can use the cycle method to generate an infinite cycle of elements from the input.
The input of the method needs to be an iterable such as a list or a string or a dictionary, etc.
my_cycle = it.cycle('Python')
for i in my_cycle:
    print(i)
This will print never-ending items:
P
y
t
h
o
n
P
y
t
h
o
n
P
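A common use of cycle is to hand out items in round-robin fashion. The task and worker names below are hypothetical data chosen for the sketch.

```python
import itertools as it

tasks = ['task1', 'task2', 'task3', 'task4', 'task5']  # hypothetical data
workers = it.cycle(['A', 'B'])

# zip() stops when the shorter input (tasks) is exhausted,
# so the endless cycle is consumed only as far as needed.
assignments = list(zip(tasks, workers))
print(assignments)
```

Pairing the cycle with a finite iterable is what keeps the loop from running forever.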
Repeat
To repeat an item (such as a string or a collection), we can use the repeat() function:
to_repeat = 'FM'
how_many_times = 4
my_repeater = it.repeat(to_repeat, how_many_times)
for i in my_repeater:
    print(i)
This prints:
FM
FM
FM
FM
This will repeat the string 'FM' 4 times. If we do not provide the second parameter then it will repeat the string indefinitely.
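An unbounded repeat is handy for supplying a constant argument to map(). A small sketch, where the choice of pow and the exponent 2 is just for illustration:

```python
import itertools as it

# repeat(2) supplies the exponent for every base;
# map() stops as soon as range(5) is exhausted.
squares = list(map(pow, range(5), it.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```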
2. Terminating Iterators
This brings us to the next section of the topic.
In this section, I will illustrate the powerful features of terminating iterators. These functions can be used for a number of reasons, such as:
- We might have a number of iterables and we want to perform an action on the elements of all of the iterables one by one in a single sequence.
- Or when we have a number of functions which we want to perform on every single element of an iterable
- Or sometimes we want to drop elements from the iterable as long as the predicate is true and then perform an action on the other elements.
Chain
This method lets us create an iterator that returns elements from all of the input iterables in a sequence until there are no elements left. Hence, it can treat consecutive sequences as a single sequence.
chain = it.chain([1,2,3], ['a','b','c'], ['End'])
for i in chain:
    print(i)
This will print:
1
2
3
a
b
c
End
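When the iterables are already held in a single container, the alternative constructor chain.from_iterable flattens them without unpacking. A brief sketch reusing the same data:

```python
import itertools as it

nested = [[1, 2, 3], ['a', 'b', 'c'], ['End']]

# from_iterable() takes one iterable of iterables
# instead of several separate arguments.
flat = list(it.chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 'a', 'b', 'c', 'End']
```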
Drop While
We can pass a condition (a predicate) along with an iterable, and this function will evaluate the condition on each element, dropping elements for as long as it returns True. As soon as the condition evaluates to False for an element, the function returns that element and all of the remaining elements, without evaluating the condition again.
As an example, assume that we have a list of jobs and we want to skip the leading elements for as long as a condition is met. Once the condition evaluates to False, our expectation is to return the rest of the elements of the iterable.
jobs = ['job1', 'job2', 'job3', 'job10', 'job4', 'job5']
dropwhile = it.dropwhile(lambda x : len(x)==4, jobs)
for i in dropwhile:
    print(i)
This method will return:
job10
job4
job5
The method returned the three items above because the length of the element job10 is not equal to 4 characters and therefore job10 and the rest of the elements are returned.
The input condition and the iterable can be complex objects too.
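One point worth seeing in code is that dropwhile only skips the leading run of matching elements; it is not a filter. In the sketch below (the lines are hypothetical data), the later comment line is kept because the condition is no longer checked once it has failed.

```python
import itertools as it

# Hypothetical file contents: two header comments, then the body.
lines = ['# header', '# generated file', 'value = 1', '# inline note', 'value = 2']

body = list(it.dropwhile(lambda line: line.startswith('#'), lines))
print(body)  # ['value = 1', '# inline note', 'value = 2']
```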
Take While
This function is the opposite of dropwhile(). It returns the elements of an iterable for as long as the condition returns True; as soon as the condition returns False for an element, it stops and returns no further elements.
As an example, assume that we have a list of jobs and we want to stop returning the jobs as soon as a condition is not met.
jobs = ['job1', 'job2', 'job3', 'job10', 'job4', 'job5']
takewhile = it.takewhile(lambda x : len(x)==4, jobs)
for i in takewhile:
    print(i)
This method will return:
job1
job2
job3
This is because the length of ‘job10’ is not equal to 4 characters.
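Because takewhile stops at the first failing element, it is a natural way to put a limit on an infinite iterator. A minimal sketch, where the limit of 50 is an arbitrary value for the illustration:

```python
import itertools as it

# An endless stream of square numbers: 1, 4, 9, 16, ...
squares = (n * n for n in it.count(1))

# Stop as soon as a square reaches 50.
small_squares = list(it.takewhile(lambda s: s < 50, squares))
print(small_squares)  # [1, 4, 9, 16, 25, 36, 49]
```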
GroupBy
This function constructs an iterator after grouping the consecutive elements of an iterable. The function returns an iterator of key, value pairs where the key is the group key and the value is the collection of the consecutive elements that have been grouped by the key.
Consider this snippet of code:
iterable = 'FFFAARRHHHAADDMMAAALLIIKKK'
my_groupby = it.groupby(iterable)
for key, group in my_groupby:
    print('Key:', key)
    print('Group:', list(group))
Note, group is itself an iterator, so I materialised it into a list.
As a result, this will print:
Key: F
Group: ['F', 'F', 'F']
Key: A
Group: ['A', 'A']
Key: R
Group: ['R', 'R']
Key: H
Group: ['H', 'H', 'H']
Key: A
Group: ['A', 'A']
Key: D
Group: ['D', 'D']
Key: M
Group: ['M', 'M']
Key: A
Group: ['A', 'A', 'A']
Key: L
Group: ['L', 'L']
Key: I
Group: ['I', 'I']
Key: K
Group: ['K', 'K', 'K']
We can also pass in a key function as the second argument if we want to group by a complex logic.
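Here is a small sketch of grouping with a key function. The word list is hypothetical data. Since groupby only groups consecutive elements, the input is sorted by the same key first; otherwise the same key could appear more than once, as 'A' does in the output above.

```python
import itertools as it

words = ['apple', 'kiwi', 'pear', 'fig', 'plum', 'banana']  # hypothetical data

# Sort by the grouping key so that equal keys sit next to each other.
by_length = sorted(words, key=len)

groups = {
    length: list(group)
    for length, group in it.groupby(by_length, key=len)
}
print(groups)
```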
Tee
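This function splits a single iterable into several independent iterators. It returns a tuple containing the requested number of iterators, each of which yields all of the items of the input. Once tee() has been called, the original iterable should not be used anywhere else, otherwise the new iterators may miss items. To understand it better, review the snippet below: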
iterable = 'FM'
tee = it.tee(iterable, 5)
for i in tee:
    print(list(i))
Each of the 5 iterators returned the entire contents of the iterable FM:
['F', 'M']
['F', 'M']
['F', 'M']
['F', 'M']
['F', 'M']
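A typical reason to reach for tee is to look at neighbouring items of the same stream. The sketch below, with hypothetical readings, advances one copy by a single item and then walks both copies together.

```python
import itertools as it

readings = [3, 7, 12, 20]  # hypothetical data

current, following = it.tee(readings, 2)
next(following, None)  # move the second iterator one item ahead

# Pair each reading with the one after it.
differences = [b - a for a, b in zip(current, following)]
print(differences)  # [4, 5, 8]
```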
3. Combinatoric Iterators
In this section of the article, I will explain two functions that I recommend every Python programmer understands well.
Permutations
We can create an iterator that returns successive permutations of elements in the input iterable by using the permutations method.
We can pass in an argument to specify the length of the permutations. It is defaulted to the length of the iterable.
This implies that when the length is missing then the method would generate all possible full-length permutations.
iterable = 'FM1'
length = 3  # same as the default, which is the length of the iterable
permutations = it.permutations(iterable, length)
for i in permutations:
    print(i)
This will print:
('F', 'M', '1')
('F', '1', 'M')
('M', 'F', '1')
('M', '1', 'F')
('1', 'F', 'M')
('1', 'M', 'F')
If the length is 2 then it would generate:
('F', 'M')
('F', '1')
('M', 'F')
('M', '1')
('1', 'F')
('1', 'M')
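A quick way to sanity-check the output is to compare the number of results with the formula n! / (n - r)!, where n is the number of elements and r is the length of each permutation. The input 'ABCD' below is an arbitrary choice for the sketch.

```python
import itertools as it
import math

iterable = 'ABCD'
n = len(iterable)

full = list(it.permutations(iterable))      # r defaults to n
pairs = list(it.permutations(iterable, 2))  # r = 2

expected_full = math.factorial(n)                             # 4! = 24
expected_pairs = math.factorial(n) // math.factorial(n - 2)   # 4! / 2! = 12
print(len(full), expected_full)
print(len(pairs), expected_pairs)
```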
Combinations
Finally, I wanted to provide an explanation of how we can generate combinations of an iterable.
Given an iterable, we can construct an iterator to return sub-sequences of elements of a given length.
The elements are treated as unique based on their position and only the distinct elements are returned.
iterable = 'FM1'
combinations = it.combinations(iterable, 2)
for i in combinations:
    print(i)
This will print:
('F', 'M')
('F', '1')
('M', '1')
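To illustrate what "unique based on their position" means, the sketch below uses an input with a repeated letter (an arbitrary example). The two 'A' characters sit at different positions, so both are combined with 'B'. The number of results always matches math.comb(n, r), which is available from Python 3.8.

```python
import itertools as it
import math

# 'A' appears at two positions, so ('A', 'B') is produced twice.
repeated = list(it.combinations('AAB', 2))
print(repeated)  # [('A', 'A'), ('A', 'B'), ('A', 'B')]

pair_count = len(list(it.combinations('FM1', 2)))
print(pair_count, math.comb(3, 2))
```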
Summary
This article explained the uses of the Itertools library. In particular, it explained:
- Infinite Iterators
- Terminating Iterators
- Combinatoric Iterators
The itertools methods can be used in combination to serve a powerful set of functionality in our applications. We can use the Itertools library to implement precise, memory efficient and stable applications in a shorter time.
I recommend reviewing our applications to assess whether they could benefit from the Itertools library.
For more detailed information, please visit the Python official documentation here