Data Science and Data Proicessing

Advanced Python: 9 Best Practices to Apply When You Define Classes

How to make your code more readable and maintainable

Aug 12 · 11 min read
Image for post
Photo by FERESHTEH AZADI on Unsplash

At its core, Python is an object-oriented programming (OOP) language. Being an OOP language, Python handles data and functionalities by supporting various features centered around objects. For instance, data structures are all objects, including primitive types (e.g., integers and strings) which aren’t considered objects in some other languages. For another instance, functions are all objects, and they are merely attributes of other objects where they are defined (e.g., class or module).

Although you can use built-in data types and write a bunch of functions without creating any custom classes, chances are that your code can become harder and harder to maintain when the project’s scope grows. These separate pieces have no shared themes, and there will be lots of hassles to manage connections between them, although much of the information is related.

In these scenarios, it’s worth defining your own classes, which will allow you to group related information and improve the structural design of your project. More importantly, the long-term maintainability of your codebase will be improved, because you’ll be dealing with less pieced code. However, there is a catch — this is only true when your class declaration is done in the right way such that the benefits of defining custom classes can outweigh the overhead of managing them.

In this article, I’d like to review nine important best practices that you should consider applying to your custom classes.

1. Good Names

When you’re defining your own class, you’re adding a new baby to your codebase. You should give the class a very good name. Although the only limit of your class name is the rules of a legal Python variable (e.g., can’t start with a number), there are preferred ways to give class names.

  • Use nouns that are easy to pronounce. It’s especially important if you work on a team project. During a group presentation, you probably don’t want to be the person to say, “in this case, we create an instance of the Zgnehst class.” In addition, being easy to pronounce also means the name shouldn’t be too long. I can barely think of cases when you need to use more than three words to define a class name. One word is best, two words are good, and three words are the limit.
  • Reflect its stored data and intended functionalities. It’s like in our real life — boys are given boy names. When we see boy names, we expect the kids are boys. It applies to class names too (or any other variables in general). The rule is simple — Don’t surprise people. If you’re dealing with the students’ information, the class should be named Student. KiddosAtCampus isn’t making the most common sense.
  • Follow naming conventions. We should use upper-case camel style for class names, like GoodName. The following is an incomplete list of unconventional class names: goodName, Good_Name, good_name, and GOodnAme. Following naming conventions is to make your intention clear. When people read your code, they can safely assume that an object with names like GoodName is a class.

There are also naming rules and conventions that apply to attributes and functions. In the below sections, I’ll briefly mention them where applicable, but the overall principles are the same. The only rule of thumb is simple: Don’t surprise people.

2. Explicit Instance Attributes

In most cases, we want to define our own instance initialization method (i.e., __init__). In this method, we set the initial state of our newly created instances of the class. However, Python doesn’t restrict where you can define instance attributes with custom classes. In other words, you can define additional instance attributes in later operations after the instance has been created. The following code shows you a possible scenario.

Initialization Method

As shown above, we can create an instance of the Student class by specifying a student’s first and last names. Later, when we call the instance method (i.e., verify_registration_status), the Student instance’s status attribute will be set. However, this isn’t the desired pattern, because if you spread various instance attributes throughout the entire class, you’re not making the class clear what data an instance object holds. Thus, the best practice is to place an instance’s attributes in the __init__ method, such that your code’s reader has a single place to get to know your class’s data structure, as shown below.

Better Initialization Method

For those instance attributes that you can’t set initially, you can set them with placeholder values, such as None. Although it’s of less concern, this change also helps prevent the possible error when you forget to call some instance methods to set the applicable instance attributes, causing AttributeError (‘Student’ object has no attribute ‘status_verified’).

In terms of the naming rules, the attributes should be named using lower cases and follow the snake case style, which means that if you use multiple words, connect them with underscores. Moreover, all the names should have meaningful indication regarding what data it holds (e.g., first_name is better than fn).

3. Use Properties — But Parsimoniously

Some people learn Python coding with an existing background of other OOP languages, such as Java, and they’re used to creating getters and setters for attributes of the instances. This pattern can be mimicked with the use of the property decorator in Python. The following code shows you the basic form of using the property decorator to implement getters and setters.

Property Decorator

Once this property is created, we can use it as regular attributes using the dot notation, although it’s implemented using functions under the hood.

Use Properties

As you may know, the advantages of using property implementations include verification of proper value settings (check a string is used, not an integer) and read-only access (by not implementing the setter method). However, you should use properties parsimoniously. It can be very distracting if your custom class looks like the below — there are too many properties!

Abuse of Properties

In most cases, these properties can be replaced with instance attributes, and thus we can access them and set them directly. Unless you have specific needs for the benefits of using properties as discussed (e.g., value verification), using attributes is preferred over creating properties in Python.

4. Define Meaningful String Representations

In Python, functions that have double underscores before and after the name are referred to as special or magic methods, and some people call them dunder methods. They have special usages for basic operations by the interpreter, including the __init__ method that we’ve covered previously. Two special methods, __repr__ and __str__, are essential for creating proper string representations of your custom class, which will give the code readers more intuitive information about your classes.

Between them, the major difference is that the __repr__ method defines the string, using which you can re-create the object by calling eval(repr(“the repr”)), while the __str__ method defines the string that is more descriptive and allows more customization. In other words, you can think that the string defined in the __repr__ method is to be viewed by developers while that used in the __str__ method is to be viewed by regular users. The following shows you an example.

Implementation of String Representations

Please note that in the __repr__ method’s implementation (Line 7), the f-string uses !r which will show these strings with quotation marks, because they’re necessary to construct the instance with strings properly formatted. Without the !r formatting, the string will be Student(John, Smith), which isn’t the correct way to construct a Student instance. Let’s see how these implementations show the strings for us. Specifically, the __repr__ method is called when you access the object in the interactive interpreter, while the __str__ method is called by default when you print the object.

String Representations

5. Instance, Class, and Static Methods

In a class, we can define three kinds of methods: instance, class, and static methods. You need to consider what methods you should use for the functionalities of concern. Here are some general guidelines.

If the methods are concerned with individual instance objects, for example, you need to access or update particular attributes of an instance, in which cases, you should use instance methods. These methods have a signature like this: def do_something(self):, in which the self argument refers to the instance object that calls the method. To know more about the self argument, you can refer to my previous article on this topic.

If the methods are not concerned with individual instance objects, you should consider using class or static methods. Both methods can be easily defined with applicable decorators: classmethod and staticmethod. The difference between these two is that class methods allow you to access or update attributes related to the class, while static methods are independent of any instance or the class itself. A common example of a class method is providing a convenience instantiation method, while a static method can be simply a utility function. The following code shows you some examples.

Different Kinds of Methods

In a similar fashion, you can also create class attributes. Unlike instance attributes that we discussed earlier, class attributes are shared by all instance objects, and they should reflect some characteristics independent of individual instance objects.

6. Encapsulation Using Private Attributes

When you write custom classes for your project, you need to take into account encapsulation, especially if you’re expecting that others will use your classes too. When the functionalities of the class grow, some functions or attributes are only relevant for data processing within the class. In other words, outside the class, these functions won’t be called and other users of your class won’t even care about the implementation details of these functions. In these scenarios, you should consider encapsulation.

One important way to apply encapsulation is to prefix attributes and functions with an underscore or two underscores, as a convention. The subtle difference is that those with an underscore are considered protected, while those with two underscores are considered private, which involves name-mangling after its creation. Differentiating these two categories is beyond the scope of the present article, and one of my previous articles have covered them.

In essence, by naming attributes and functions this way, you’re telling the IDEs (i.e., integrated development environment, such as PyCharm) that they’re not going to be accessed outside the class, although true private attributes don’t exist in Python. In other words, they’re still accessible if we choose so.

Encapsulation

The above code shows you a trivial example of encapsulation. For a student, we may be interested in knowing their average GPA, and we can get the point using the get_mean_gpa method. The user doesn’t need to know how the mean GPA is calculated, such that we can make related methods protected by placing an underscore prefixing the function names.

The key takeaway for this best practice is that you expose only the minimal number of public APIs that are relevant for the users to use your code. For those that are used only internally, make them protected or private methods.

7. Separate Concerns and Decoupling

With the development of your project, you find out that you’re dealing with more data, your class can become cumbersome if you’re sticking to one single class. Let’s continue with the example of the Student class. Suppose that students have lunch at school, and each of them has a dining account that they can use to pay for meals. Theoretically, we can deal with account-related data and functionalities within the Student class, as shown below.

Mixed Functionalities

The above code shows you some pseudocode on checking account balance and loading money to the account, both of which are implemented in the Student class. Imagine that there are more operations that can be related to the account, such as suspending a lost card, consolidating accounts — to implement all of them will make the Student class larger and larger, which make it gradually more difficult to maintain. Instead, you should isolate these responsibilities and make your Student class irresponsible for these account-related functionalities — a design pattern termed as decoupling.

Separated Concerns

The above code shows you how we can design the data structures with an additional Account class. As you can see, we move all account-related operations into the Account class. To retrieve the account information for the student, the Student class will handle the functionality by retrieving information from the Account class. If we want to implement more functions related to the class, we can simply update the Account class only.

The main takeaway for the design pattern is that you want your individual classes to have separate concerns. By having these responsibilities separated, your classes become smaller, which makes future changes easier, because you’ll be dealing with smaller code components.

8. Consider __slots__ For Optimization

If your class is used mostly as data containers for storing data only, you can consider using __slots__ to optimize the performance of your class. It doesn’t only increase the speed of attribute accessing but also saves memory, which can be a big benefit if you need to create thousands or many more instance objects. The reason is that for a regular class, instance attributes are stored through an internally managed dictionary. By contrast, with the use of the __slots__, instance attributes will be stored using array-related data structures implemented using C under the hood, and their performance is optimized with much higher efficiency.

Use of __slots__ in Class Definition

The above code shows you a trivial example of how we implement the __slots__ in a class. Specifically, you list all the attributes as a sequence, which will create a one-to-one match in data storage for faster access and less memory consumption. As just mentioned, regular classes use a dictionary for attribute accessing but not for those with __slots__ implemented. The following code shows you such a fact.

No __dict__ in Classes With __slots__

A detailed discussion of using __slots__ can be found in a nice answer on Stack Overflow, and you can find more information from the official documentation. Regarding the gained benefits of faster access and saved memory, a recent Medium article has a very good demonstration, and I’m not going to expand on this. However, one thing to note is that using __slots__ will have a side effect — it prevents you from dynamically creating additional attributes. Some people propose it as a mechanism for controlling what attributes your class has, but it’s not how it was designed.

9. Documentation

Last, but not least, we have to talk about documentation of your class. Most importantly, we need to understand that writing documents isn’t replacing any code. Writing tons of documents doesn’t improve your code’s performance, and it doesn’t necessarily make your code more readable. If you have to rely on docstrings to clarify your code, it’s very likely that your code has problems. I truly believe that your code should speak all by itself. The following code just shows you a mistake that some programmers can make — using unnecessary comments to compensate for bad code (i.e., meaningless variable names in this case). By contrast, some good code with good names doesn’t even need comments.

Bad Comment Examples

I’m not saying that I’m against writing comments and docstrings, but it really depends on your use cases. If your code is used by more than one person or more than one occasion (e.g., you’re the only one accessing the code but for multiple times), you should consider writing some good comments. They can help yourself or your teammates read your code, but no one should assume that your code does exactly what’s said in the comments. In other words, writing good code is always the top priority that you need to keep in mind.

If particular portions of your code are to be used by end users, you want to write docstrings, because those people aren’t familiar with the relevant codebase. All they want to know is how to use the pertinent APIs, and the docstrings will form the basis for the help menu. Thus, it’s your responsibility as the programmer to make sure that you provide clear instructions on how to use your programs.

Conclusions

In this article, we reviewed important factors that you need to consider when you define your own classes. If you’re new to Python or programming in general, you may not fully understand every aspect that we’ve discussed, which is OK. The more you code, the more you’ll find the importance of having these principles in mind before you define your classes. Practice these guidelines continuously when you work with classes because a good design will save much of your development time later.

+ Recent posts