Data Science and Data Processing

The Beginner’s Guide to Pydantic

A Python package to parse and validate data

Aug 10 · 7 min read

Today’s topic is data validation and settings management using Python type hints. We are going to use a Python package called pydantic, which enforces type hints at runtime and reports user-friendly errors when the data is invalid. According to the official documentation, pydantic is

“… primarily a parsing library, not a validation library. Validation is a means to an end: building a model which conforms to the types and constraints provided.

In other words, pydantic guarantees the types and constraints of the output model, not the input data.”

There are three sections in this tutorial:

  1. Setup
  2. Implementation
  3. Conclusion

Let’s proceed to the next section and start installing the necessary modules.

1. Setup

It is highly recommended to create a virtual environment before you proceed with the installation.

Basic installation

Open up a terminal and run the following command to install pydantic:

pip install pydantic

Upgrade existing package

If you already have an existing package and would like to upgrade it, kindly run the following command:

pip install -U pydantic

Anaconda

For Anaconda users, you can install it as follows:

conda install pydantic -c conda-forge

Optional dependencies

pydantic comes with the following optional dependencies based on your needs:

  • email-validator — Support for email validation.
  • typing-extensions — Support use of Literal prior to Python 3.8.
  • python-dotenv — Support for dotenv file with settings.

You can install them manually:

# install email-validator
pip install email-validator
# install typing-extensions
pip install typing_extensions
# install python-dotenv
pip install python-dotenv

or along with pydantic as follows:

# install email-validator
pip install pydantic[email]
# install typing-extensions
pip install pydantic[typing_extensions]
# install python-dotenv
pip install pydantic[dotenv]
# install all dependencies
pip install pydantic[email,typing_extensions,dotenv]

2. Implementation

In this section, we are going to explore some of the useful functionalities available in pydantic.

Defining an object in pydantic is as simple as creating a new class which inherits from BaseModel. When you create a new object from the class, pydantic guarantees that the fields of the resultant model instance will conform to the field types defined on the model.

Import

Add the following import declaration at the top of your Python file.

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel

User class

Declare a new class which inherits from BaseModel as follows:

class User(BaseModel):
    id: int
    username: str
    password: str
    confirm_password: str
    alias = 'anonymous'
    timestamp: Optional[datetime] = None
    friends: List[int] = []

pydantic uses the built-in type hinting syntax to determine the data type of each variable. Let’s explore one by one what happens behind the scenes.

  • id — An integer that represents an ID. Since no default value is provided, this field is required and must be specified during object creation. Strings, bytes, or floats will be coerced to an integer if possible; otherwise, an exception will be raised.
  • username — A string that represents a username. It is required.
  • password — A string that represents a password. It is required.
  • confirm_password — A string that represents a confirmation password. It is required and will be used for data validation later on.
  • alias — A string that represents an alias. It is not required and will be set to anonymous if it is not provided during object creation.
  • timestamp — A date/time field, which is not required and defaults to None. pydantic will process either a unix timestamp int or a string representing the date/time.
  • friends — A list of integers. It is not required and defaults to an empty list.

Object instantiation

The next step is to instantiate a new object from the User class.

data = {'id': '1234', 'username': 'wai foong', 'password': 'Password123', 'confirm_password': 'Password123', 'timestamp': '2020-08-03 10:30', 'friends': [1, '2', b'3']}
user = User(**data)

You should get the following output when you print out the user variable. You can notice that id has been automatically converted to an integer, even though the input is a string. Likewise, bytes are automatically converted to integers, as shown by the friends field.

id=1234 username='wai foong' password='Password123' confirm_password='Password123' timestamp=datetime.datetime(2020, 8, 3, 10, 30) friends=[1, 2, 3] alias='anonymous'
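To see how required fields and defaults behave, here is a minimal sketch. It assumes the pydantic 1.x API used in this article, and Account is a hypothetical, trimmed-down stand-in for the User class rather than part of the original example.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Account(BaseModel):  # hypothetical stand-in for User
    id: int
    username: str
    alias: str = 'anonymous'
    timestamp: Optional[datetime] = None
    friends: List[int] = []


# Only the required fields are supplied; the rest fall back to their defaults.
minimal = Account(id='42', username='wai foong')
print(minimal.id, minimal.alias, minimal.timestamp, minimal.friends)

# A unix timestamp (seconds since the epoch) is parsed into a datetime.
stamped = Account(id=1, username='wai foong', timestamp=1596450600)
print(type(stamped.timestamp).__name__, stamped.timestamp.year)

# Leaving out a required field raises ValidationError.
try:
    Account(username='wai foong')
    missing = []
except ValidationError as e:
    missing = [error['loc'][0] for error in e.errors()]
print(missing)
```

The input string '42' is left untouched; only the model instance is guaranteed to hold an integer.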

Methods and attributes under BaseModel

Classes that inherit the BaseModel will have the following methods and attributes:

  • dict() — returns a dictionary of the model’s fields and values
  • json() — returns a JSON string representation of dict()
  • copy() — returns a copy of the model (a shallow copy by default)
  • parse_obj() — a utility for loading any object into a model with error handling if the object is not a dictionary
  • parse_raw() — a utility for loading strings of numerous formats
  • parse_file() — similar to parse_raw() but meant for files
  • from_orm() — loads data into a model from an arbitrary class
  • schema() — returns a dictionary representing the model as JSON schema
  • schema_json() — returns a JSON string representation of schema()
  • construct() — a class method for creating models without running validation
  • __fields_set__ — Set of names of fields which were set when the model instance was initialized
  • __fields__ — a dictionary of the model’s fields
  • __config__ — the configuration class for the model
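A few of these methods can be tried out with a short sketch like the one below. It assumes the pydantic 1.x method names listed above, and Profile is a hypothetical model made up for illustration.

```python
import json

from pydantic import BaseModel


class Profile(BaseModel):  # hypothetical model for illustration
    id: int
    username: str
    alias: str = 'anonymous'


profile = Profile(id=1, username='wai foong')

as_dict = profile.dict()   # plain dictionary of fields and values
as_json = profile.json()   # JSON string of the same data
clone = profile.copy()     # a new model instance with the same values

# Class-level helpers that build a model from existing data.
from_dict = Profile.parse_obj({'id': '2', 'username': 'second user'})
from_json = Profile.parse_raw('{"id": 3, "username": "third user"}')

# JSON schema of the model, returned as a dictionary.
schema = Profile.schema()

print(as_dict)
print(json.loads(as_json) == as_dict)
print(clone == profile, clone is profile)
print(from_dict.id, from_json.id)
print(sorted(schema['properties']))
```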

Let’s change the input for id to a string that cannot be converted to an integer, as follows:

data = {'id': 'a random string', 'username': 'wai foong', 'password': 'Password123', 'confirm_password': 'Password123', 'timestamp': '2020-08-03 10:30', 'friends': [1, '2', b'3']}
user = User(**data)

You should get the following error when you run the code.

value is not a valid integer (type=type_error.integer)

ValidationError

To get more details on the error, it is highly recommended to wrap the call inside a try-except block, as follows:

from pydantic import BaseModel, ValidationError

# ... code for the User class

data = {'id': 'a random string', 'username': 'wai foong', 'password': 'Password123', 'confirm_password': 'Password123', 'timestamp': '2020-08-03 10:30', 'friends': [1, '2', b'3']}

try:
    user = User(**data)
except ValidationError as e:
    print(e.json())

It will print out the following JSON, which indicates that the input for id is not a valid integer.

[
  {
    "loc": [
      "id"
    ],
    "msg": "value is not a valid integer",
    "type": "type_error.integer"
  }
]
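If you would rather work with Python objects than a JSON string, the exception also offers an errors() method that returns a list of dictionaries. The sketch below assumes the pydantic 1.x API; Login is a hypothetical model, and it triggers two errors at once to show that all failures are reported together.

```python
from pydantic import BaseModel, ValidationError


class Login(BaseModel):  # hypothetical model for illustration
    id: int
    username: str


try:
    # id is not a valid integer and username is missing entirely.
    Login(id='a random string')
    failed_fields = []
except ValidationError as e:
    # Each entry is a dict with 'loc', 'msg' and 'type' keys.
    failed_fields = [error['loc'][0] for error in e.errors()]
    for error in e.errors():
        print(error['loc'], error['msg'])

print(failed_fields)
```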

Field types

pydantic provides support for most of the common types from the Python standard library. The full list is as follows:

  • bool
  • int
  • float
  • str
  • bytes
  • list
  • tuple
  • dict
  • set
  • frozenset
  • datetime.date
  • datetime.time
  • datetime.datetime
  • datetime.timedelta
  • typing.Any
  • typing.TypeVar
  • typing.Union
  • typing.Optional
  • typing.List
  • typing.Tuple
  • typing.Dict
  • typing.Set
  • typing.FrozenSet
  • typing.Sequence
  • typing.Iterable
  • typing.Type
  • typing.Callable
  • typing.Pattern
  • ipaddress.IPv4Address
  • ipaddress.IPv4Interface
  • ipaddress.IPv4Network
  • ipaddress.IPv6Address
  • ipaddress.IPv6Interface
  • ipaddress.IPv6Network
  • enum.Enum
  • enum.IntEnum
  • decimal.Decimal
  • pathlib.Path
  • uuid.UUID
  • ByteSize
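Here is a small sketch that combines a handful of these types in one model. Order and Color are hypothetical names invented for this example, and the input values are made up; every field is given as a plain string and parsed into the declared type.

```python
from datetime import date
from decimal import Decimal
from enum import Enum
from ipaddress import IPv4Address
from uuid import UUID

from pydantic import BaseModel


class Color(Enum):  # hypothetical enum for illustration
    RED = 'red'
    GREEN = 'green'


class Order(BaseModel):  # hypothetical model for illustration
    order_id: UUID
    placed_on: date
    total: Decimal
    color: Color
    client_ip: IPv4Address


order = Order(
    order_id='12345678-1234-5678-1234-567812345678',
    placed_on='2020-08-10',
    total='19.99',
    color='red',
    client_ip='192.168.0.1',
)

# Print the Python type that each field ended up with.
for name in ('order_id', 'placed_on', 'total', 'color', 'client_ip'):
    print(name, type(getattr(order, name)).__name__)
```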

Constrained types

You can enforce your own restriction via the Constrained Types. Let’s have a look at the following example:

from pydantic import (
    BaseModel,
    NegativeInt,
    PositiveInt,
    conint,
    conlist,
    constr
)

class Model(BaseModel):
    # minimum length of 2 and maximum length of 10
    short_str: constr(min_length=2, max_length=10)
    # regex
    regex_str: constr(regex=r'^apple (pie|tart|sandwich)$')
    # remove whitespace from string
    strip_str: constr(strip_whitespace=True)

    # value must be greater than 1000 and less than 1024
    big_int: conint(gt=1000, lt=1024)

    # value is multiple of 5
    mod_int: conint(multiple_of=5)

    # must be a positive integer
    pos_int: PositiveInt

    # must be a negative integer
    neg_int: NegativeInt

    # list of integers that contains 1 to 4 items
    short_list: conlist(int, min_items=1, max_items=4)

Strict types

If you are looking for rigid restrictions which pass validation if and only if the validated value is of the respective type or is a subtype of that type, you can use the following strict types:

  • StrictStr
  • StrictInt
  • StrictFloat
  • StrictBool

The following example illustrates the proper way to enforce StrictBool in your inherited class.

from pydantic import BaseModel, StrictBool

class StrictBoolModel(BaseModel):
    strict_bool: StrictBool

The string 'False' will raise a ValidationError because StrictBool accepts only True or False as input.
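The difference from a regular bool field can be sketched as follows. LaxModel is a hypothetical model added here only for comparison; it is not part of the original example.

```python
from pydantic import BaseModel, StrictBool, ValidationError


class LaxModel(BaseModel):  # hypothetical model, for comparison only
    flag: bool


class StrictBoolModel(BaseModel):
    strict_bool: StrictBool


# A plain bool field coerces the string 'False' to the boolean False.
lax = LaxModel(flag='False')
print(lax.flag)

# StrictBool accepts a real boolean...
strict_ok = StrictBoolModel(strict_bool=False)
print(strict_ok.strict_bool)

# ...but rejects the string instead of coercing it.
try:
    StrictBoolModel(strict_bool='False')
    strict_rejected = False
except ValidationError:
    strict_rejected = True

print(strict_rejected)
```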

Validator

Furthermore, you can create your own custom validators using the validator decorator inside your inherited class. Let’s have a look at the following example, which checks whether the id has exactly four digits and whether the confirm_password matches the password field.

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel, ValidationError, validator

class User(BaseModel):
    id: int
    username: str
    password: str
    confirm_password: str
    alias = 'anonymous'
    timestamp: Optional[datetime] = None
    friends: List[int] = []

    @validator('id')
    def id_must_be_4_digits(cls, v):
        if len(str(v)) != 4:
            raise ValueError('must be 4 digits')
        return v

    @validator('confirm_password')
    def passwords_match(cls, v, values, **kwargs):
        if 'password' in values and v != values['password']:
            raise ValueError('passwords do not match')
        return v
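Let’s try the validators out with a short sketch. It assumes the pydantic 1.x validator decorator used above; SignUp is a hypothetical, trimmed-down version of the User class that keeps only the fields the validators need, and the input values are made up.

```python
from pydantic import BaseModel, ValidationError, validator


class SignUp(BaseModel):  # hypothetical, trimmed-down version of User
    id: int
    password: str
    confirm_password: str

    @validator('id')
    def id_must_be_4_digits(cls, v):
        if len(str(v)) != 4:
            raise ValueError('must be 4 digits')
        return v

    @validator('confirm_password')
    def passwords_match(cls, v, values):
        # values holds the fields that were validated before this one.
        if 'password' in values and v != values['password']:
            raise ValueError('passwords do not match')
        return v


# Both validators pass.
valid = SignUp(id=1234, password='Password123', confirm_password='Password123')
print(valid.id)

# A two-digit id and a mismatched confirmation password both fail.
try:
    SignUp(id=12, password='Password123', confirm_password='password123')
    failures = []
except ValidationError as e:
    failures = [error['loc'][0] for error in e.errors()]

print(failures)
```

Fields are validated in the order they are defined, so password must be declared before confirm_password for it to show up in values.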

3. Conclusion

Let’s recap what we have learned today.

We started off with a detailed explanation on Pydantic which helps to parse and validate data.

Next, we created a virtual environment and installed Pydantic via pip or conda. It also includes support for three additional dependencies based on our use cases.

Once we were done with the installation, we explored in-depth the basic functionalities provided by the package. The basic building block is to create a new class which inherits from BaseModel.

We learned that Pydantic provides support for most of the common data types in the Python standard library. We tested out both the Constrained Types and Strict Types, which help to enforce our own custom restrictions.

Lastly, we played around with the validator decorator to allow only four-digit input for id and to check that confirm_password matches the password field.

Thanks for reading this piece. Hope to see you again in the next article!
