Data Science and Data Processing

Pandas DataFrame (Python): 10 useful tricks

10 basic tricks to make your pandas life a bit easier

Pandas is a powerful open source data analysis and manipulation tool, built on top of the Python programming language. In this article, I will show 10 tricks regarding the pandas DataFrame to make certain programming practices a bit easier.

Of course, before we can use pandas, we have to import it by using the following command:

import pandas as pd

1. Select multiple rows and columns using .loc

countries = pd.DataFrame({
    'country': ['United States', 'The Netherlands', 'Spain', 'Mexico', 'Australia'],
    'capital': ['Washington D.C.', 'Amsterdam', 'Madrid', 'Mexico City', 'Canberra'],
    'continent': ['North America', 'Europe', 'Europe', 'North America', 'Australia'],
    'language': ['English', 'Dutch', 'Spanish', 'Spanish', 'English'],
})

By using the loc operator, we are able to select subsets of rows and columns on the basis of their index label and column name. Below are some examples of how to use the loc operator on the ‘countries’ DataFrame:

countries.loc[:, 'country':'continent']
countries.loc[0:2, 'country':'continent']
countries.loc[[0, 4], ['country', 'language']]
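One detail worth keeping in mind with loc is that label slices include both endpoints, unlike position-based slicing with iloc. The sketch below uses a trimmed copy of the ‘countries’ frame (two columns only, an assumption made for brevity) to show the difference:

```python
import pandas as pd

# Trimmed copy of the example frame; only two columns are kept for brevity.
countries = pd.DataFrame({
    'country': ['United States', 'The Netherlands', 'Spain', 'Mexico', 'Australia'],
    'continent': ['North America', 'Europe', 'Europe', 'North America', 'Australia'],
})

by_label = countries.loc[0:2]      # label slice: rows 0, 1 and 2 (end label included)
by_position = countries.iloc[0:2]  # position slice: rows 0 and 1 (end excluded)

print(len(by_label), len(by_position))
```

With the default integer index the two look alike, so it is easy to forget that loc works on labels rather than positions.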

2. Filter DataFrames by category

In many cases, we may want to consider only the data points that are included in one particular category, or sometimes in a selection of categories. For a single category, we are able to do this by using the == operator. However, for multiple categories, we have to make use of the isin function:

countries[countries.continent == 'Europe']
countries[countries.language.isin(['Dutch', 'English'])]

3. Filter DataFrames by excluding categories

As opposed to filtering by category, we may want to filter our DataFrame by excluding certain categories. We do this by making use of the ~ (tilde) sign, which is the complement operator. Example usage:

countries[~countries.continent.isin(['Europe'])]
countries[~countries.language.isin(['Dutch', 'English'])]
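Since ~ simply inverts a boolean mask, the included and excluded rows always split the DataFrame into two non-overlapping parts. A minimal sketch, again on a trimmed copy of the ‘countries’ frame (the variable names mask, selected and excluded are only illustrative):

```python
import pandas as pd

countries = pd.DataFrame({
    'country': ['United States', 'The Netherlands', 'Spain', 'Mexico', 'Australia'],
    'language': ['English', 'Dutch', 'Spanish', 'Spanish', 'English'],
})

# Build the boolean mask once, then reuse it for both selections.
mask = countries.language.isin(['Dutch', 'English'])
selected = countries[mask]    # rows whose language is Dutch or English
excluded = countries[~mask]   # all remaining rows

print(selected.country.tolist())
print(excluded.country.tolist())
```

Storing the mask in a variable keeps the two filters consistent if the list of categories changes later.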

4. Rename columns

You might want to change the name of certain columns, for example because the name is incorrect or incomplete. For instance, we might want to change the ‘capital’ column name to ‘capital_city’ and ‘language’ to ‘most_spoken_language’. We can do this in the following way:

countries.rename({'capital': 'capital_city', 'language': 'most_spoken_language'}, axis='columns')

Alternatively, we can use:

countries.columns = ['country', 'capital_city', 'continent', 'most_spoken_language']
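Note that the two approaches differ in one respect: rename returns a new DataFrame and leaves the original untouched, while assigning to columns changes the DataFrame itself. A small sketch with a hypothetical one-row frame:

```python
import pandas as pd

countries = pd.DataFrame({'country': ['Spain'], 'capital': ['Madrid']})

# rename returns a renamed copy; 'countries' keeps its old column names.
renamed = countries.rename({'capital': 'capital_city'}, axis='columns')

print(list(countries.columns))
print(list(renamed.columns))
```

To keep the new names, assign the result back, for example countries = countries.rename(...).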

5. Reverse row order

To reverse the row order, we make use of the loc operator with a ::-1 slice. This works in the following way:

countries.loc[::-1]

However, note that the index still follows the previous ordering. We have to make use of the reset_index function to reset it:

countries.loc[::-1].reset_index(drop=True)

6. Reverse column order

Reversing the column order works in a similar way as for the rows:

countries.loc[:, ::-1]

7. Split a DataFrame into two random subsets

In some cases, we want to split a DataFrame into two random subsets. For this, we make use of the sample function. For example, when creating a training and a test set out of the whole data set, we have to create two random subsets. Below, we show how to use the function:

countries_1 = countries.sample(frac=0.6, random_state=999)
countries_2 = countries.drop(countries_1.index)
countries_1
countries_2
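The split works because drop removes exactly the index labels that sample picked, so the two subsets never share a row. A minimal check on a one-column copy of the ‘countries’ frame (the overlap variable is only there for illustration):

```python
import pandas as pd

countries = pd.DataFrame({
    'country': ['United States', 'The Netherlands', 'Spain', 'Mexico', 'Australia'],
})

countries_1 = countries.sample(frac=0.6, random_state=999)  # 60% of 5 rows -> 3 rows
countries_2 = countries.drop(countries_1.index)             # the remaining 2 rows

# The index labels of the two subsets should not intersect.
overlap = countries_1.index.intersection(countries_2.index)
print(len(countries_1), len(countries_2), len(overlap))
```

This relies on the index labels being unique; with duplicate labels, drop would remove more rows than were sampled.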

8. Create dummy variables

students = pd.DataFrame({
    'name': ['Ben', 'Tina', 'John', 'Eric'],
    'gender': ['male', 'female', 'male', 'male'],
})

We might want to convert categorical variables into dummy/indicator variables. We can do so by making use of the get_dummies function:

pd.get_dummies(students)

To get rid of the redundant columns, we have to add drop_first=True:

pd.get_dummies(students, drop_first=True)

9. Check equality of columns

When the goal is to check equality of two different columns, one might at first think of the == operator, since this is mostly used when we are concerned with checking equality conditions. However, this operator does not handle NaN values properly, so we make use of the equals function here. This goes as follows:

df = pd.DataFrame({'col_1': [1, 0], 'col_2': [0, 1], 'col_3': [1, 0]})
df['col_1'].equals(df['col_2'])

>>> False

df['col_1'].equals(df['col_3'])

>>> True
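The example above contains no missing values, so the NaN issue is not visible there. The sketch below uses a hypothetical frame with a NaN in the same position of both columns to show where == and equals disagree:

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'col_1': [1.0, np.nan], 'col_2': [1.0, np.nan]})

# Element-wise comparison: NaN == NaN evaluates to False.
elementwise = (df_nan['col_1'] == df_nan['col_2']).all()

# equals treats NaNs in the same position as equal.
whole = df_nan['col_1'].equals(df_nan['col_2'])

print(elementwise, whole)
```

Keep in mind that equals also compares dtypes, so an integer column and a float column holding the same numbers are not considered equal.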

10. Concatenate DataFrames

We might want to combine two DataFrames into one DataFrame that contains all data points. This can be achieved by using the concat function:

df_1 = pd.DataFrame({'col_1': [6, 7, 8], 'col_2': [1, 2, 3], 'col_3': [5, 6, 7]})
pd.concat([df, df_1]).reset_index(drop=True)
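As an alternative to chaining reset_index, concat accepts an ignore_index argument that renumbers the rows directly. A short sketch comparing the two on the same frames (the variable names combined and same are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 0], 'col_2': [0, 1], 'col_3': [1, 0]})
df_1 = pd.DataFrame({'col_1': [6, 7, 8], 'col_2': [1, 2, 3], 'col_3': [5, 6, 7]})

# ignore_index=True discards the original row labels and numbers rows 0..n-1.
combined = pd.concat([df, df_1], ignore_index=True)

# Same result as concatenating first and resetting the index afterwards.
same = combined.equals(pd.concat([df, df_1]).reset_index(drop=True))
print(list(combined.index), same)
```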

Thanks for reading!

I hope this article helped you in some way, and I wish you good luck on your next project when making use of Pandas :).

Introducing Bamboolib — a GUI for Pandas

A couple of days back, Mr. Tobias Krabel contacted me via LinkedIn to introduce me to his product, a Python library called Bamboolib, which he describes as a GUI tool for learning Pandas, Python’s data analysis and visualization library.

He states, and I quote:

I have to admit, I was skeptical at first, mainly because I’m not a big fan of GUI tools and the drag & drop principle in general. Still, I opened the URL and watched the introduction video.

It was one of those rare times when I was legitimately intrigued.

From there, I quickly responded to Tobias, and he kindly offered to let me test out the library and see if I liked it.

How was it? Well, you’ll have to keep reading to find the answer to that. So let’s get started.


Is it Free?

In a world where such amazing libraries as NumPy and Pandas are free to use, this question may not even pop into your head. However, it should, because not all versions of Bamboolib are free.

If you don’t mind sharing your work with others, then yes, it’s free to use. If that poses a problem, it will set you back at least $10 a month, which might be a bummer for the average user. Down below is the full pricing list:

[Image: Bamboolib pricing list]

As the developer of the library stated, Bamboolib is designed to help you learn Pandas, so I don’t see a problem with going with the free option — most likely you won’t be working on some top-secret project if just starting out.

This review will, however, be based on the private version of the library, as that’s the one Tobias gave access to me. With that being said, this article is by no means written with the idea of persuading you to buy the license, it only provides my personal opinion.

Before jumping into the good stuff, you’ll need to install the library first.


The Installation Process

The first and most obvious thing to do is pip install:

pip install bamboolib

However, there’s a lot more to do if you want this thing fully working. It is designed to be a Jupyter Lab extension (or Jupyter Notebook if you still use those), so we’ll need to set up a couple of things there also.

In a command line type the following:

jupyter nbextension enable --py qgrid --sys-prefix
jupyter nbextension enable --py widgetsnbextension --sys-prefix
jupyter nbextension install --py bamboolib --sys-prefix
jupyter nbextension enable --py bamboolib --sys-prefix

Now you’ll need to find the major version of Jupyter Lab installed on your machine. You can obtain it with the following command:

jupyter labextension list

Mine is “1.0”, but yours can be anything, so here’s a generic version of the next command you’ll need to execute:

jupyter labextension install @jupyter-widgets/jupyterlab-manager@MAJOR_VERSION.MINOR_VERSION --no-build

Note that you need to replace “MAJOR_VERSION.MINOR_VERSION” with the version number, which is “1.0” in my case.

A couple more commands and you’re ready to rock:

jupyter labextension install @8080labs/qgrid@1.1.1 --no-build
jupyter labextension install plotlywidget --no-build
jupyter labextension install jupyterlab-plotly --no-build
jupyter labextension install bamboolib --no-build

jupyter lab build --minimize=False

That’s it. Now you can start Jupyter Lab and we can dive into the good stuff.


The First Use

Once in Jupyter, you can import Bamboolib and Pandas, and then use Pandas to load in some dataset:

[Image: notebook cell importing Bamboolib and Pandas and loading the dataset]

Here’s how you’d use the library to view the dataset:

[Image: notebook cell displaying the dataset with Bamboolib]

That’s not gonna work the first time you’re using the library. You’ll need to activate it, so make sure to have the license key somewhere near:

[Image: Bamboolib activation prompt]

Once you’ve entered the email and license key, you should get the following message indicating that everything went well:

[Image: activation success message]

Great, now you can once again execute the previous cell. Immediately you’ll see an unfamiliar, but friendly-looking interface:

[Image: Bamboolib interface]

Now everything is good to go, and we can dive into some basic functionalities. It was a lot of work to get to this point, but trust me, it was worth it!


Data Filtering

One of the most common everyday tasks of any data analyst/scientist is data filtering. Basically you want to keep only a subset of data that’s relevant to you in a given moment.

To start filtering with Bamboolib, click on the Filter button.

A side menu like the one below should pop up. I’ve decided to filter by the “Age” column, and keep only the rows where the value of “Age” is less than 18:

[Image: filter side menu]

Once you press Execute, you’ll see the action take effect immediately:

[Image: filtered dataset]
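For comparison, the same filter in plain Pandas is a one-line boolean selection. This is only a sketch on a small hypothetical stand-in for the dataset (the names and ages below are made up), not the code Bamboolib generates:

```python
import pandas as pd

# Hypothetical stand-in for the dataset; only the "Age" column matters here.
df = pd.DataFrame({
    'Name': ['Anna', 'Bob', 'Carl', 'Dora'],
    'Age': [22, 4, 35, 14],
})

# Keep only the rows where "Age" is less than 18.
minors = df[df['Age'] < 18]
print(minors)
```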

That’s great! But what more can you do?


Replacing Values

Another one of those common everyday tasks is to replace string values with the respective numerical alternative. This dataset is perfect to demonstrate value replacement because we can easily replace string values in the “Sex” column with numeric ones.

To begin, hit the Replace value button and specify the column, the value you want to replace and what you want to replace it with:

[Image: Replace value menu]

And once the Execute button is hit:

Fantastic! You can do the same for the “female” option, but it’s up to you whether you want to do it or not.
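In plain Pandas, a comparable encoding can be sketched as below. The frame is a hypothetical stand-in for the dataset, and the 0/1 codes are an assumption, since the values chosen in the GUI are not stated here. The sketch uses map rather than replace so that both categories are converted in one step and the result is an integer column:

```python
import pandas as pd

# Hypothetical stand-in for the dataset; only the "Sex" column matters here.
df = pd.DataFrame({'Sex': ['male', 'female', 'male']})

# Assumed codes: male -> 0, female -> 1.
encoded = df['Sex'].map({'male': 0, 'female': 1})
print(encoded.tolist())
```

Any value missing from the mapping would become NaN, so the dictionary should cover every category in the column.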


Group By

Yes, you can also perform aggregations! To get started, click on the Aggregate/Group by button and specify what should be done in the side menu.

I’ve decided to group by “Pclass”, because I want to see the total number of survivors per passenger class:

[Image: Aggregate/Group by side menu]

That will yield the following output:

[Image: aggregation output]
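The same aggregation can be sketched in plain Pandas with groupby. The frame below is a hypothetical stand-in, and the column name "Survived" (1 for a survivor, 0 otherwise) is an assumption about the dataset:

```python
import pandas as pd

# Hypothetical stand-in for the dataset.
df = pd.DataFrame({
    'Pclass': [1, 1, 2, 3, 3, 3],
    'Survived': [1, 0, 1, 0, 1, 1],
})

# Summing a 0/1 column per group counts the survivors in each passenger class.
survivors = df.groupby('Pclass')['Survived'].sum()
print(survivors)
```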

Awesome! Let’s explore one more thing before wrapping up.


One Hot Encoding

Many times when preparing data for machine learning you’ll want to create dummy variables, ergo create a new column per unique value of a given attribute. It’s a good idea to do so because many machine learning algorithms can’t work with text data.

To implement that logic via Bamboolib, hit the OneHotEncoder button. I’ve decided to create dummy variables from the “Embarked” attribute because it has 3 distinct values and you can’t state that one is better than the other. Also, make sure to remove the first dummy to avoid collinearity issues (having a variable that is a perfect predictor of some other variable):

[Image: OneHotEncoder menu]

Executing will create two new columns in the dataset, just as you would expect:
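As a rough plain-Pandas counterpart, get_dummies with drop_first=True produces the same kind of result. The sketch assumes the three values of "Embarked" are 'C', 'Q' and 'S' and uses a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical stand-in for the dataset; the category values are assumed.
df = pd.DataFrame({'Embarked': ['S', 'C', 'Q', 'S']})

# Three categories minus the dropped first one ('C') leaves two dummy columns.
dummies = pd.get_dummies(df, columns=['Embarked'], drop_first=True)
print(list(dummies.columns))
```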

That’s nice, I’ve done my transformations, but what’s next?


Getting the Code

It was all fun and games until now, but sooner or later you’ll notice the operations don’t act in place — ergo the dataset will not get modified if you don’t explicitly specify it.

That’s not a bug, as it enables you to play around without messing up the original dataset. What Bamboolib will do, however, is generate Python code for achieving the desired transformations.

To get the code, first, click on the Export button:

[Image: Export button]

Now specify how you want it exported. I’ve selected the first option:

[Image: export options]

And it will finally give you the code which you can copy and apply to the dataset:

[Image: generated code]

Is it worth it?

Up to this point, I have briefly showcased the main functionality of Bamboolib. It was by no means an exhaustive tutorial; I just wanted to show you the idea behind it.

The question remains, is it worth the money?

That is if you decide to go with the paid route. You can still use it for free, provided that you don’t mind sharing your work with others. The library by itself is worth checking out for two main reasons:

  1. It provides a great way to learn Pandas: it’s much easier to learn by doing than by reading, and a GUI tool like this will most certainly help you
  2. It’s great for playing around with data — let’s face it, there are times when you know what you want to do, but you just don’t know how to implement it in code — Bamboolib can assist

Keep in mind — you won’t get any additional features with the paid version — the only real benefit is that your work will be private and that there’s an option for commercial use.

Even if you’re not ready to grab your credit card just yet, it can’t harm you to try out the free version and see if it’s something you can benefit from.

Thanks for reading. Take care.

Jupyter is now a full-fledged IDE

Literate programming is now a reality through nbdev and the new visual debugger for Jupyter.


Notebooks have always been a tool for incremental development of software ideas. Data scientists use Jupyter to journal their work, explore and experiment with novel algorithms, quickly sketch new approaches and immediately observe the outcomes.

However, when the time is ripe, software developers turn to classical IDEs (Integrated Development Environments), such as Visual Studio Code and PyCharm, to convert the ideas into libraries and frameworks. But is there a way to transform Jupyter into a full-fledged IDE, where raw concepts are translated into robust and reusable modules?

To this end, developers from several institutions, including QuantStack, Two Sigma, Bloomberg and fast.ai, developed two novel tools: nbdev and a visual debugger for Jupyter.

Literate Programming and nbdev

In 1983, Donald Knuth came up with a new programming paradigm called literate programming. In his own words, literate programming is “a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer”.

Jeremy Howard and Sylvain Gugger, fascinated by that design, presented nbdev late last year. This framework allows you to compose your code in the familiar Jupyter Notebook environment, exploring and experimenting with different approaches before reaching an effective solution for a given problem. Then, using certain keywords, nbdev permits you to extract the useful functionality into a full-grown Python library.

More specifically, nbdev complements Jupyter by adding support for:

  • automatic creation of python modules from notebooks, following best practices
  • editing and navigation of the code in a standard IDE
  • synchronization of any changes back into the notebooks
  • automatic creation of searchable, hyperlinked documentation from the code
  • pip installers readily uploaded to PyPI
  • testing
  • continuous-integration
  • version control conflict handling

nbdev enables software developers and data scientists to develop well-documented python libraries, following best practices without leaving the Jupyter environment. nbdev is on PyPI so to install it you just run:

pip install nbdev

For an editable install, use the following:

git clone https://github.com/fastai/nbdev
pip install -e nbdev

To get started, read the excellent blog post by its developers, describing the notion behind nbdev and follow the detailed tutorial in the documentation.

The missing piece

Though nbdev covers most of the tools needed for IDE-like development inside Jupyter, there is still a piece missing: a visual debugger.

Therefore, a team of developers from several institutions announced yesterday the first public release of the Jupyter visual debugger. The debugger offers most of what you would expect from an IDE debugger:

  • a variable explorer, a list of breakpoints and a source preview
  • the possibility to navigate the call stack (next line, step in, step out etc.)
  • the ability to set breakpoints intuitively, next to the line of interest
  • flags to indicate where the current execution has stopped

To take advantage of this new tool we need a kernel implementing the Jupyter debug protocol in the back-end. Hence, the first step is to install such a kernel. The only one that implements it so far is xeus-python. To install it just run:

conda install xeus-python -c conda-forge

Then, run Jupyter Lab, find the Extension Manager on the sidebar and enable it, if you haven’t already.

[Image: Enable the extension manager]

A new button will appear on the sidebar. To install the debugger just go to the newly enabled Extension Manager button and search for the debugger extension.

[Image: Enable the debugger]

After installing it, Jupyter Lab will ask you to perform a build to include the latest changes. Accept it, and, after a few seconds, you are good to go.

To test the debugger, we create a new xpython notebook and compose a simple function. We run the function as usual and observe the result. To enable the debugger, press the associated button on the top right of the window.
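Any small function will do for this test. The one below is a hypothetical example (not the function from the original screenshots) with an intermediate variable, so that there is something to inspect in the variable explorer when execution pauses:

```python
def add(a, b):
    # A breakpoint on the next line lets us inspect 'a' and 'b' before the sum.
    total = a + b
    return total


result = add(1, 2)
print(result)
```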

[Image: Enable the debugger]

Now, we are ready to run the function again. Only this time the execution will stop at the breakpoint we set and we will be able to explore the state of the program.

[Image: Debug the code]

We see that the program stopped at the breakpoint. Opening the debugger panel we see the variables, a list of breakpoints, the call stack navigation and the source code.

The new visual debugger for Jupyter offers everything you would expect from an IDE debugger. It is still in development, so new functionality is expected. Some of the features that its developers plan to release in 2020 are:

  • Support for rich mime type rendering in the variable explorer
  • Support for conditional breakpoints in the UI
  • Enable the debugging of Voilà dashboards, from the JupyterLab Voilà preview extension
  • Enable debugging with as many kernels as possible

Conclusion

Jupyter notebooks have always been a great way to explore and experiment with your code. However, software developers usually turn to a full-fledged IDE, copying the parts that work, to produce a production-ready library.

This is not only inefficient but also a loss of what Jupyter offers: literate programming. Moreover, notebooks provide an environment for better documentation, including graphs, images and videos, and sometimes better tools, such as auto-complete functionality.

nbdev and the visual debugger are two projects that aim at closing the gap between notebooks and IDEs. In this story, we saw what nbdev is and how it makes literate programming a reality. Furthermore, we discovered how a new project, the visual debugger for Jupyter, provides the missing piece.

My name is Dimitris Poulopoulos and I’m a machine learning researcher at BigDataStack and PhD(c) at the University of Piraeus, Greece. I have worked on designing and implementing AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA. If you are interested in reading more posts about Machine Learning, Deep Learning and Data Science, follow me on Medium, LinkedIn or @james2pl on twitter.

Handling exceptions in Python a cleaner way, using Decorators

Handling exceptions in Python can get repetitive and ugly in some cases; we can solve that using decorators.

May 17

Functions in Python

Functions in Python are first-class objects, which means they can be assigned to a variable, passed as an argument, returned from another function, and stored in any data structure.

def example_function():
    print("Example Function called")


# Assign the function object itself (no parentheses), then call it via the new name.
some_variable = example_function
some_variable()

Output:

Example Function called
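The other first-class properties mentioned above (passing a function as an argument, returning one from another function and storing functions in a data structure) can be sketched as follows. All names here are hypothetical and chosen only for illustration:

```python
def shout(text):
    return text.upper()


def apply(func, value):
    # 'func' is received as an ordinary argument and called here.
    return func(value)


def make_greeter(greeting):
    # The inner function is created and returned to the caller.
    def greeter(name):
        return f"{greeting}, {name}"
    return greeter


# Functions stored as dictionary values.
registry = {"shout": shout, "hello": make_greeter("Hello")}

a = apply(shout, "hi")
b = registry["hello"]("Ben")
print(a, b)
```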

Decorators

The first-class object property of functions lets us use the concept of decorators in Python. Decorators are functions that take another function as an argument, which enables us to put our logic at the start, at the end, or on both sides of the execution of the argument function.

def decorator_example(func):
    # Runs once, when the decorator is applied to a function.
    print("Decorator called")

    def inner_function(*args, **kwargs):
        print("Calling the function")
        func(*args, **kwargs)
        print("Function's execution is over")
    return inner_function


@decorator_example
def some_function():
    print("Executing the function")
    # Function logic goes here


some_function()

Output:

Decorator called
Calling the function
Executing the function
Function's execution is over
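One limitation of the wrapper above is that it discards the return value of the decorated function and replaces its name and docstring with those of inner_function. A common way around this, sketched here with the standard library's functools.wraps and a hypothetical function double, is:

```python
import functools


def decorator_example(func):
    @functools.wraps(func)  # copy __name__, __doc__ and other metadata from func
    def inner_function(*args, **kwargs):
        result = func(*args, **kwargs)
        return result  # pass the wrapped function's return value through
    return inner_function


@decorator_example
def double(x):
    """Return twice the input."""
    return 2 * x


print(double(4), double.__name__)
```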

Error Handling Using Decorators

You can use decorators for quite a lot of purposes, like logging, validation, or any other common logic that needs to be put in multiple functions. One of the many areas where decorators can be used is exception handling.

def area_square(length):
    try:
        print(length**2)
    except TypeError:
        print("area_square only takes numbers as the argument")


def area_circle(radius):
    try:
        print(3.142 * radius**2)
    except TypeError:
        print("area_circle only takes numbers as the argument")


def area_rectangle(length, breadth):
    try:
        print(length * breadth)
    except TypeError:
        print("area_rectangle only takes numbers as the argument")

The same try/except block is repeated in every function. With a decorator, the handling is written only once:

def exception_handler(func):
    def inner_function(*args, **kwargs):
        try:
            func(*args, **kwargs)
        except TypeError:
            # Also raised when a required argument is missing.
            print(f"{func.__name__} only takes numbers as the argument")
    return inner_function


@exception_handler
def area_square(length):
    print(length * length)


@exception_handler
def area_circle(radius):
    print(3.142 * radius * radius)


@exception_handler
def area_rectangle(length, breadth):
    print(length * breadth)


area_square(2)
area_circle(2)
area_rectangle(2, 4)
area_square("some_str")
area_circle("some_other_str")
area_rectangle("some_other_rectangle")

Output:

4
12.568
8
area_square only takes numbers as the argument
area_circle only takes numbers as the argument
area_rectangle only takes numbers as the argument
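The handler above is tied to TypeError and to functions that print their result. As one possible extension, the sketch below shows a decorator factory that takes the exception types to catch and a fallback return value. The names handle, default and ratio are hypothetical and not part of the original example:

```python
import functools


def handle(*exceptions, default=None):
    """Build a decorator that catches the given exception types."""
    def decorator(func):
        @functools.wraps(func)
        def inner_function(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions as error:
                # Report the problem and fall back to the default value.
                print(f"{func.__name__} failed: {error}")
                return default
        return inner_function
    return decorator


@handle(TypeError, ZeroDivisionError, default=0.0)
def ratio(a, b):
    return a / b


print(ratio(1, 2))
print(ratio(1, 0))
print(ratio("a", 2))
```

Returning a default instead of only printing keeps the decorated function usable inside larger expressions, at the cost of hiding the error from the caller.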

Rank Game Publisher
1 리니지M NCSOFT
2 리니지2M NCSOFT
3 바람의나라: 연 NEXON Company
4 R2M Webzen Inc.
5 기적의 검 4399 KOREA
6 뮤 아크엔젤 Webzen Inc.
7 KartRider Rush+ NEXON Company
8 V4 NEXON Company
9 일루전 커넥트 ChangYou
10 라그나로크 오리진 GRAVITY Co., Ltd.
11 라이즈 오브 킹덤즈 LilithGames
12 블레이드&소울 레볼루션 Netmarble
13 FIFA ONLINE 4 M by EA SPORTS™ NEXON Company
14 그랑삼국 YOUZU(SINGAPORE)PTE.LTD.
15 A3: 스틸얼라이브 Netmarble
16 AFK 아레나 LilithGames
17 메이플스토리M NEXON Company
18 리니지2 레볼루션 Netmarble
19 스테리테일 4399 KOREA
20 PUBG MOBILE PUBG CORPORATION
21 동방불패 모바일 Perfect World Korea
22 가디언 테일즈 Kakao Games Corp.
23 슬램덩크 DeNA HONG KONG LIMITED
24 Lords Mobile: Kingdom Wars IGG.COM
25 Teamfight Tactics: League of Legends Strategy Game Riot Games, Inc
26 Epic Seven Smilegate Megaport
27 Roblox Roblox Corporation
28 Gardenscapes Playrix
29 Brawl Stars Supercell
30 Pmang Poker : Casino Royal NEOWIZ corp
31 왕좌의게임:윈터이즈커밍 YOOZOO GAMES KOREA CO., LTD.
32 Age of Z Origins Camel Games Limited
33 검은사막 모바일 PEARL ABYSS
34 마구마구 2020 Netmarble
35 Rise of Empires: Ice and Fire Long Tech Network Limited
36 Summoners War Com2uS
37 황제라 칭하라 Clicktouch Co., Ltd.
38 Empires & Puzzles: Epic Match 3 Small Giant Games
39 한게임 포커 NHN BIGFOOT
40 케페우스M Ujoy Games
41 안녕엘라 (주)알피지리퍼블릭
42 페이트/그랜드 오더 Netmarble
43 궁3D WISH INTERACTIVE TECHNOLOGY LIMITED
44 FIFA Mobile NEXON Company
45 붕괴3rd miHoYo Limited
46 Random Dice: PvP Defense 111%
47 Homescapes Playrix
48 Lord of Heroes CloverGames
49 컴투스프로야구2020 Com2uS
50 Cookie Run: OvenBreak - Endless Running Platformer Devsisters Corporation
