Data Science and Data Proicessing

Advanced Python: Consider These 10 Elements When You Define Python Functions

Best practices for function declarations in Python — particularly public APIs

Aug 26 · 12 min read
Image for post
Photo by Ricardo Frantz on Unsplash

No matter what implementation mechanisms programming languages use, all of them have a reserved seat for functions. Functions are essential parts of any code project because they’re responsible for preparing and processing data and configuring user interface elements. Without exception, Python, while positioned as an object-oriented programming language, depends on functions to perform data-related operations. So, writing good functions is critical to building a resilient code base.

It’s straightforward to define a few simple functions in a small project. With the growth of the project scope, the functions can get far more complicated and the need for more functions grows exponentially. Getting all the functions to work together without any confusion can be a headache, even to experienced programmers. Applying best practices to function declarations becomes more important as the scope of your project grows. In this article, I’d like to talk about best practices for declaring functions — knowledge I have accrued over years of coding.

1. General Guidelines

You may be familiar with these general guidelines, but I’d like to discuss them first because they’re high-level, good practices that many programmers don’t appreciate. When developers don’t follow these guidelines, they pay the price — the code is very hard to maintain.

We have to give meaningful names to our functions. As you know, functions are also objects in Python, so when we define a function, we basically create a variable of the function type. So, the variable name (i.e. the name of the function) has to reflect the operation it performs.

Although readability has become more emphasized in modern coding, it’s mostly talked about in regards to comments — it’s much less often discussed in relation to code itself. So, if you have to write extensive comments to explain your functions, it’s very likely that your functions don’t have good names. Don’t worry about having a long function name — almost all modern IDEs have excellent auto-completion hints, which will save you from typing the entire long names.

Function Names

Good naming rules should also apply to the arguments of the function and all local variables within the function. Something else to note is that if your functions are intended to be used within your class or module, you may want to prefix the name with an underscore (e.g., def _internal_fun():) to indicate that these functions are for private usages and they’re not public APIs.

Small and Single Purpose

Your functions should be kept small, so they’re easier to manage. Imagine that you’re building a house (not a mansion). However, the bricks you’re using are one meter cubed. Are they easy to use? Probably not — they’re too large. The same principle applies to functions. The functions are the bricks of your project. If the functions are all enormous in size, your construction won’t progress as smoothly as it could. When they’re small, they’re easier to fit into various places and moved around if the need arises.

It’s also key for your functions to serve single purposes, which can help you keep your functions small. Another benefit of single-purpose functions is that you’ll find it much easier to name such functions. You can simply name your function based on its intended single purpose. The following is how we can refactor our functions to make each of them serve only one purpose each. Another thing to note is that by doing that, you can minimize the comments that you need to write — because all the function names tell the story.

Single Purposes

You don’t have unlimited energy and time to write functions for every operation you need, so it’s essential to be familiar with common functions in standard libraries. Before you define your own functions, think about whether the particular business need is common — if so, it’s likely that these particular and related needs have already been addressed.

For instance, if you work with data in the CSV format, you can look into the functionalities in the CSV module. Alternatively, the pandas library can handle CSV files gracefully. For another instance, if you want to count elements in a list, you should consider the Counter class in the collections module, which is designed specifically for these operations.

2. Default Arguments

When we first define a function, it usually serves one particular purpose. However, when you add more features to your project, you may realize that some closely related functions can be merged. The only difference is that the invocation of the merged function sometimes involves passing another argument or setting slightly different arguments. In this case, you can consider setting a default value to the argument.

The other common scenario is that when you declare a function, you already expect that your function serves multiple purposes, with function calls using differential parameters while some other parameters requiring few variations. You should consider setting a default value to the less varied argument.

The benefit of setting default arguments is straightforward — you don’t need to deal with setting unnecessary arguments in most cases. However, the availability of keeping these parameters in your function signature allows you to use your functions more flexibly when you need to. For instance, for the built-in sorted() function, there are several ways to call the function, but in most cases, we just use the basic form: sorted(the_iterable), which will sort the iterable in the ascending lexicographic order. However, when you want to change the ascending order or the default lexicographic order, we can override the default setting by specifying the reverse and key arguments.

We should apply the same practice to our own function declaration. In terms of what value we should set, the rule of thumb is you should choose the default value that is to be used for most function calls. Because this is an optional argument, you (or the users of your APIs) don’t want to set it in most situations. Consider the following example:

Default Arguments

There is a catch for setting the default argument. If your argument is a mutable object, it’s important that you don’t set it using the default constructor — because functions are objects in Python and they’re created when they’re defined. The side effect is that the default argument is evaluated at the time of function declaration, so a default mutable object is created and becomes part of the function. Whenever you call the function using the default object, you’re essentially accessing the same mutable object associated with the function, although your intention may be having the function to create a brand new object for you. The following code snippet shows you the unwanted side effect of setting a default mutable argument:

Default Mutable Object

As shown above, although we intended to create two distinct shopping lists, the second function call still accessed the same underlying object, which resulted in the Soccer item added to the same list object. To solve the problem, we should use the following implementation. Specifically, you should use None as the default value for a mutable argument:

None As the Default Value for Mutable Argument

3. Consider Returning Multiple Values

When your function performs complicated operations, the chances are that these operations can generate two or more objects, all of which are needed for your subsequent data processing. Theoretically, it’s possible that you can create a class to wrap these objects such that your function can return the class instance as its output. However, it’s possible in Python that a function can return multiple values. More precisely speaking, these multiple values are returned as a tuple object. The following code shows you a trivial example:

Multiple Return Values

As shown above, the returned values are simply separated by a comma, which essentially creates a tuple object, as checked by the type() function.

One thing to note is that although Python functions can return multiple values, you should not abuse this feature. One value (when a function doesn’t explicitly return anything, it actually returns None implicitly) is best — because everything is straightforward and most users usually expect a function to return only one value. In some cases, returning two values is fine, returning three values is probably still OK, but please don’t ever return four values. It can create a lot of confusion for the users over which are which. If it happens, this is a good indication that you should refactor your functions — your functions probably serve multiple purposes and you should create smaller ones with more dedicated responsibilities.

4. Use Try…Except

When you define functions as public APIs, you can’t always assume that the users set the desired parameters to the functions. Even if we use the functions ourselves, it’s possible that some parameters are created out of our control and they’re incompatible with our functions. In these cases, what should we do in our function declaration?

The first consideration is to use the try…except statement, which is the typical exception handling technique. You embed the code that can possibly go wrong (i.e., raise certain exceptions) in the try clause and the possible exceptions are handled in the except clause.

Let’s consider the following scenario. Suppose that the particular business need is that your function takes a file path and if the file exists and is read successfully, your function does some data processing operations with the file and returns the result, otherwise returns -1. There are multiple ways to implement this need. The code below shows you a possible solution:

Try…Except Statement

In other words, if you expect that users of your functions can set some arguments that result in exceptions in your code, you can define functions that handle these possible exceptions. However, this should be communicated with the users clearly, unless it’s part of the feature as shown in the example (return -1 when the file can’t be read).

5. Consider Argument Validation

The previous function using the try…except statement is sometimes referred to as the EAFP (Easier to Ask Forgiveness than Permission) coding style. There is another coding style called LBYL (Look Before You Leap), which stresses the sanity check before running particular code blocks.

Following the previous example, in terms of applying LBYL to function declaration, the other consideration is to validate your function’s arguments. One common use case for argument validation is to check whether the argument is of the right data type. As we all know, Python is a dynamically-typed language, which doesn’t enforce type checking. For instance, your function’s arguments should be integers or floating-point numbers. However, calling the function by setting strings — the invocation itself — won’t prompt any error messages until the function is executed.

The following code shows how to validate the arguments before running the code:

Argument Validation

It should be noted that both EAFP and LBYL can be applied to more than just dealing with function arguments. They can be applied anywhere in your functions. Although EAFP is a preferred coding style in the Python world, depending on your use case, you should also consider using LBYL which can provide more user-friendly function-specific error messages than the generic built-in error messages you get with the EAFP style.

6. Consider Lambda Functions As Alternatives

Some functions can take another function (or are callable, in general terms) to perform particular operations. For instance, the sorted() function has the key argument that allows us to define more custom sorting behaviors. The following code snippet shows you a use case:

Custom Sorting Using Function

Notably, the sorting_grade function was used just once and it’s a simple function — in which case, we can consider using a lambda function.

If you’re not familiar with the lambda function, here’s a brief description. A lambda function is an anonymous function declared using the lambda keyword. It takes zero to more arguments and has one expression for applicable operations with the form: lambda arguments: expression. The following code shows you how we can use a lambda function in the sorted() function, which looks a little cleaner than the solution above:

Custom Sorting Using Lambda

Another common use-case that’s relevant to many data scientists is the use of lambda functions when they work with the pandas library. The following code is a trivial example how a lambda function assists data manipulation using the map() function, which operates each item in a pandas Series object:

Data Manipulation With map() and Lambda

7. Consider Decorators

Decorators are functions that modify the behavior of other functions without affecting their core functionalities. In other words, they provide modifications to the decorated functions at the cosmetic level. If you don’t know too much about decorators, please feel free to refer to my earlier articles (1, 2, and 3). Here’s a trivial example of how decorators work in Python.

Basic Decorator

As shown, the decorator function simply runs the decorated function twice. To use the decorator, we simply place the decorator function name above the decorated function with an @ prefix. As you can tell, the decorated function did get called twice.

For instance, one useful decorator is the property decorator that you can use in your custom class. The following code shows you how it works. In essence, the @property decorator converts an instance method to make it behave like a regular attribute, which allows the access of using the dot notation.

Decorators: Property

Another trivial use case of decorators is the time logging decorator, which can be particularly handy when the efficiency of your functions is of concern. The following code shows you such a usage:

Logging Time

8. Use *args and **kwargs — But Parsimoniously

In the previous section, you saw the use of *args and **kwargs in defining our decorator function, the use of which allows the decorator function to decorate any functions. In essence, we use *args to capture all (or an undetermined number of, to be more general) position arguments while **kwargs to capture all (or an undetermined number of, to be more general) keyword arguments. Specifically, position arguments are based on the positions of the arguments that are passed in the function call, while keyword arguments are based on setting parameters to specifically named function arguments.

If you’re unfamiliar with these terminologies, here’s a quick peek to the signature of the built-in sorted() function: sorted(iterable, *, key=None, reverse=False). The iterable argument is a position argument, while the key and reverse arguments are keyword arguments.

The major benefit of using *args and **kwargs is to make your function declaration looks clean, or less noisy for the same matter. The following example shows you a legitimate use of *arg in function declaration, which allows your function to accept any number of position arguments.

Use of *args

The following code shows you a legitimate use of **kwargs in function declaration. Similarly, the function with **kwargs allows the users to set any number of keyword arguments, to make your function more flexible.

Use of **kwargs

However, in most cases, you don’t need to use *args or **kwargs. Although it can make your declaration a bit cleaner, it hides the function’s signature. In other words, the users of your functions have to figure out exactly what parameters your functions take. So my advice is to avoid using them if you don’t have to. For instance, can I use a dictionary argument to replace the **kwargs? Similarly, can I use a list or tuple object to replace *args? In most cases, these alternatives should work without any problems.

9. Type Annotation for Arguments

As mentioned previously, Python is a dynamically-typed programming language as well as an interpreted language, the implication of which is that Python doesn’t check code validity, including type compatibility, during coding time. Until your code actually executes, will type incompatibility with your function (e.g., send a string to a function when an integer is expected) emerge.

For these reasons, Python doesn’t enforce the declaration of the type of input and output arguments. In other words, when you create your functions, you don’t need to specify what types of parameters they should have. However, it has become possible to do that in recent Python releases. The major benefit of having type annotation is that some IDEs (e.g., PyCharm or Visual Studio Code) could use the annotations to check the type compatibility for you, so that when you or other users use your functions you can get proper hints.

Another related benefit is that if the IDEs know the type of parameter, it can give proper auto-completion suggestions to help you code faster. Certainly, when you write docstrings for your functions, these type annotations will also be informative to the end developers of your code.

Image for post
Helpful Information With Type Annotation (PyCharm)

10. Responsible Documentation

I equate good documentation with responsible documentation. If your functions are for private uses, you don’t have to write very thorough documentation — you can make the assumption that your code tells the story clearly. If anywhere requires some clarification, you can write a very brief comment that can serve as a reminder for yourself or other readers when your code is revisited. Here, the discussion of responsible documentation is more concerned with the docstrings of your function as public APIs. The following aspects should be included:

  • A brief summary of the intended operation of your function. This should be very concise. In most cases, the summary shouldn’t be more than one sentence.
  • Input arguments: Type and explanation. You need to specify what type of your input arguments should be and what they can do by setting particular options.
  • Return Value: Type and explanation. Just as with input arguments, you need to specify the output of your function. If it doesn’t return anything, you can optionally specify None as the return value.

Conclusions

If you’re experienced with coding, you’ll find out that most of your time is spent on writing and refactoring functions. After all, your data usually doesn’t change too much itself— it’s the functions that process and manipulate your data. If you think of data as the trunk of your body, functions are the arms and legs that move you around. So, we have to write good functions to make our programs agile.

I hope that this article has conveyed some useful information that you can use in your coding.

Thanks for reading.

+ Recent posts