Data Science and Data Processing

Automating your daily tasks with Python

Jul 11 · 3 min read

Who doesn’t want an extra hour of sleep in the morning? But alas, the same daily office tasks are waiting. The life of a data analyst revolves around executing SQL queries to extract data from a database, saving the result set into a CSV file, doing data manipulation on that file, and presenting the findings.

We often do the same tasks daily: pull data from the database, do some data manipulation in Excel, and then send the final data, in the right format, to other teams so they can make the appropriate decisions based on it. Doing the same thing every day becomes tedious and, frankly speaking, adds no value to your skills.

Here in this article, we will see how to automate that process using Python. So, let’s get started:

The only prerequisite is having Anaconda (or any similar Python environment) installed on your system where you can run your Python code.

First, we need to create a connection to the database from which we want to extract the data. Here I will be creating a connection to Vertica (for a MySQL database, use: import mysql.connector). Below is the code you will need in order to create a connection to the database.

from vertica_python import connect

conn_info = {'host': '123.45.67.89',
             'port': 5433,
             'user': 'Username',
             'password': 'password',
             'database': 'Schema_name',
             'read_timeout': 600,        # 10-minute timeout on queries
             'unicode_error': 'strict',  # throw an error on invalid UTF-8 results
             'ssl': False                # SSL is disabled by default
             }
connection = connect(**conn_info)

Here, host is the IP address of the server where the database resides. Once you execute this code, your connection will be set up and you will be good to go on to the next step, which is to create a cursor. The cursor is used to communicate with the database: it executes SQL statements, calls procedures, and fetches rows from the result set.

cursor = connection.cursor()

Once we have our cursor we can execute our SQL query.

cursor.execute("SELECT * FROM table_name")

Here, cursor.execute will compile your SELECT query and raise a syntax error if there is a mistake in your SQL statement. After this, we need to fetch all the records from the result set and save the data into a CSV file.

import csv

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Write the column names as the header row
    writer.writerow([i[0] for i in cursor.description])
    # Write the data row by row
    for row in cursor.fetchall():
        writer.writerow(row)

To save our result set into a CSV file, we open the file in write mode (this overwrites the file if it exists, and creates a new file otherwise). Passing newline='' prevents the csv module from inserting blank rows on Windows.

cursor.description is used to fetch the header (the column names) of the data, and the for loop writes the data row by row into the CSV file. The output file will appear in your current working directory.
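As a side note, since pandas is used below anyway, the CSV round trip can be skipped entirely by loading the query result straight into a DataFrame with pandas.read_sql. A minimal sketch, using Python’s built-in sqlite3 as a stand-in for the real database connection (the table and column names here are made up for illustration):

```python
import sqlite3
import pandas as pd

# Stand-in database; in practice, pass your Vertica/MySQL connection instead
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [('North', 100), ('South', 150)])

# read_sql runs the query and returns the result set as a DataFrame,
# with column names taken from the cursor description automatically
df = pd.read_sql("SELECT * FROM sales", conn)
print(df.shape)  # → (2, 2)
```

This avoids writing and re-reading an intermediate file when the CSV itself is not needed as a deliverable.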

After fetching the data, we will create a pivot table to represent the final output. To create the pivot table, we first need to load the data into a pandas DataFrame.

import pandas as pd
df = pd.read_csv('output.csv')
df.pivot_table(index=['col_name1'], values='col_name2', columns='col_name3', aggfunc='sum')

In the pivot table call, the parameters do exactly what their names suggest: index picks the rows, values the data being aggregated, columns the columns of the pivot, and aggfunc the aggregation (here, a sum). Now we have our pivot table as the end result.
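Since the end goal is usually to send the result to other teams, the pivot table can be written back out to a file. A small sketch with made-up sample data (the col_name* columns are illustrative, matching the placeholders above):

```python
import pandas as pd

# Illustrative data standing in for the query output
df = pd.DataFrame({
    'col_name1': ['A', 'A', 'B'],
    'col_name2': [10, 20, 30],
    'col_name3': ['x', 'y', 'x'],
})

pivot = df.pivot_table(index=['col_name1'], values='col_name2',
                       columns='col_name3', aggfunc='sum')

# Save the result so it can be attached to a mail or dropped on a shared drive
pivot.to_csv('pivot_output.csv')
```

From here, the same script can be scheduled to run daily, so the whole extract-transform-share routine happens without manual effort.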

Hope this article helps in your data analysis journey and makes your life a little easy.

Thanks for reading!
