The Beginner’s Guide to Pydantic
A Python package to parse and validate data
Today’s topic is data validation and settings management using Python type hints. We are going to use a Python package called pydantic, which enforces type hints at runtime and raises user-friendly errors when the data is invalid. According to the official documentation, pydantic is
“… primarily a parsing library, not a validation library. Validation is a means to an end: building a model which conforms to the types and constraints provided.
In other words, pydantic guarantees the types and constraints of the output model, not the input data.”
There are three sections in this tutorial:
- Setup
- Implementation
- Conclusion
Let’s proceed to the next section and start installing the necessary modules.
1. Setup
It is highly recommended to create a virtual environment before you proceed with the installation.
Basic installation
Open up a terminal and run the following command to install pydantic:
pip install pydantic
Upgrade existing package
If you already have pydantic installed and would like to upgrade it, run the following command:
pip install -U pydantic
Anaconda
For Anaconda users, you can install it as follows:
conda install pydantic -c conda-forge
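Once the installation finishes, a quick sanity check is to print the installed version from Python. The snippet below is a minimal sketch that relies only on the VERSION string exported by the package. The examples in this guide follow the 1.x API, so it helps to know which major version you ended up with.

```python
import pydantic

# VERSION is a plain string; the part before the first dot is the major version
installed_version = pydantic.VERSION
major_version = int(installed_version.split(".")[0])

print(f"pydantic {installed_version} (major version {major_version})")
```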
Optional dependencies
pydantic comes with the following optional dependencies based on your needs:
- email-validator: support for email validation.
- typing-extensions: support for the use of Literal prior to Python 3.8.
- python-dotenv: support for dotenv files with settings.
You can install them manually:
# install email-validator
pip install email-validator
# install typing-extensions
pip install typing_extensions
# install python-dotenv
pip install python-dotenv
or along with pydantic as follows:
# install email-validator
pip install pydantic[email]
# install typing-extensions
pip install pydantic[typing_extensions]
# install python-dotenv
pip install pydantic[dotenv]
# install all dependencies
pip install pydantic[email,typing_extensions,dotenv]
2. Implementation
In this section, we are going to explore some of the useful functionalities available in pydantic.
Defining an object in pydantic is as simple as creating a new class which inherits from BaseModel. When you create a new object from the class, pydantic guarantees that the fields of the resultant model instance will conform to the field types defined on the model.
Import
Add the following import declaration at the top of your Python file.
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel
User class
Declare a new class which inherits from BaseModel as follows:
class User(BaseModel):
    id: int
    username: str
    password: str
    confirm_password: str
    alias = 'anonymous'
    timestamp: Optional[datetime] = None
    friends: List[int] = []
pydantic uses the built-in type hinting syntax to determine the data type of each variable. Let’s explore one by one what happens behind the scenes.
- id: An integer representing an ID. Since no default value is provided, this field is required and must be specified during object creation. Strings, bytes, or floats will be coerced to an integer if possible; otherwise, an exception will be raised.
- username: A string representing a username. It is required.
- password: A string representing a password. It is required.
- confirm_password: A string representing a confirmation password. It is required and will be used for data validation later on.
- alias: A string representing an alias. It is not required and will be set to 'anonymous' if it is not provided during object creation.
- timestamp: A date/time field, which is not required and defaults to None. pydantic will process either a Unix timestamp int or a string representing the date/time.
- friends: A list of integers. It is not required and defaults to an empty list.
Object instantiation
The next step is to instantiate a new object from the User class.
data = {'id': '1234', 'username': 'wai foong', 'password': 'Password123', 'confirm_password': 'Password123', 'timestamp': '2020-08-03 10:30', 'friends': [1, '2', b'3']}
user = User(**data)
You should get the following output when you print out the user variable. Notice that id has been automatically converted to an integer, even though the input is a string. Likewise, strings and bytes are automatically converted to integers, as shown by the friends field.
id=1234 username='wai foong' password='Password123' confirm_password='Password123' timestamp=datetime.datetime(2020, 8, 3, 10, 30) friends=[1, 2, 3] alias='anonymous'
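To make the coercion easier to see in isolation, here is a minimal sketch using a smaller model. The Account class and its fields are hypothetical and exist only for this illustration; the point is that the attributes on the instance carry the declared types, not the types of the input.

```python
from typing import List

from pydantic import BaseModel


class Account(BaseModel):  # hypothetical model, for illustration only
    id: int
    friends: List[int] = []


# both the id and one of the friends are passed as strings
account = Account(id='1234', friends=[1, '2'])

# the string inputs have been parsed into integers
print(type(account.id).__name__, account.id)  # int 1234
print(account.friends)                        # [1, 2]
```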
Methods and attributes under BaseModel
Classes that inherit from BaseModel will have the following methods and attributes:
- dict(): returns a dictionary of the model’s fields and values
- json(): returns a JSON string representation of dict()
- copy(): returns a copy of the model (shallow by default; pass deep=True for a deep copy)
- parse_obj(): a utility for loading any object into a model, with error handling if the object is not a dictionary
- parse_raw(): a utility for loading strings of numerous formats
- parse_file(): similar to parse_raw() but meant for files
- from_orm(): loads data into a model from an arbitrary class
- schema(): returns a dictionary representing the model as JSON schema
- schema_json(): returns a JSON string representation of schema()
- construct(): a class method for creating models without running validation
- __fields_set__: the set of names of fields which were set when the model instance was initialized
- __fields__: a dictionary of the model’s fields
- __config__: the configuration class for the model
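A few of these methods are easiest to understand by calling them. The following is a minimal sketch with a hypothetical Point model (not part of the User example); it only uses dict(), json(), copy() and parse_obj() from the list above.

```python
import json

from pydantic import BaseModel


class Point(BaseModel):  # hypothetical model, for illustration only
    x: int
    y: int = 0


point = Point(x='3')

as_dict = point.dict()                      # plain dictionary of fields and values
as_json = point.json()                      # JSON string with the same content
moved = point.copy(update={'y': 5})         # new instance; point itself is unchanged
loaded = Point.parse_obj({'x': 1, 'y': 2})  # build a model from a dictionary

print(as_dict)
print(json.loads(as_json) == as_dict)
print(moved.y, point.y)
print(loaded == Point(x=1, y=2))
```

Note that copy() returns a new instance, so the original point keeps its own value for y.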
Let’s change the input for id to a string that cannot be converted to an integer, as follows:
data = {'id': 'a random string', 'username': 'wai foong', 'password': 'Password123', 'confirm_password': 'Password123', 'timestamp': '2020-08-03 10:30', 'friends': [1, '2', b'3']}
user = User(**data)
You should get the following error when you run the code.
value is not a valid integer (type=type_error.integer)
ValidationError
To get better details on the error, it is highly recommended to wrap the call inside a try-except block, as follows:
from pydantic import BaseModel, ValidationError
# ... code for the User class
data = {'id': 'a random string', 'username': 'wai foong', 'password': 'Password123', 'confirm_password': 'Password123', 'timestamp': '2020-08-03 10:30', 'friends': [1, '2', b'3']}
try:
    user = User(**data)
except ValidationError as e:
    print(e.json())
It will print out the following JSON, which indicates that the input for id is not a valid integer.
[
  {
    "loc": [
      "id"
    ],
    "msg": "value is not a valid integer",
    "type": "type_error.integer"
  }
]
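If you would rather work with the errors in Python than print them as JSON, ValidationError also exposes an errors() method that returns the same information as a list of dictionaries. The sketch below uses a hypothetical Item model and collects the names of the fields that failed; the exact wording of each message depends on the installed pydantic version, so it is printed but not relied upon.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):  # hypothetical model, for illustration only
    id: int
    name: str


failed_fields = []
try:
    Item(id='not a number')  # invalid id, and name is missing
except ValidationError as e:
    # each entry is a dict with 'loc', 'msg' and 'type' keys
    for error in e.errors():
        failed_fields.append(error['loc'][0])
        print(error['loc'], error['msg'])

print(failed_fields)
```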
Field types
pydantic provides support for most of the common types from the Python standard library. The supported types include:
- bool
- int
- float
- str
- bytes
- list
- tuple
- dict
- set
- frozenset
- datetime.date
- datetime.time
- datetime.datetime
- datetime.timedelta
- typing.Any
- typing.TypeVar
- typing.Union
- typing.Optional
- typing.List
- typing.Tuple
- typing.Dict
- typing.Set
- typing.FrozenSet
- typing.Sequence
- typing.Iterable
- typing.Type
- typing.Callable
- typing.Pattern
- ipaddress.IPv4Address
- ipaddress.IPv4Interface
- ipaddress.IPv4Network
- ipaddress.IPv6Address
- ipaddress.IPv6Interface
- ipaddress.IPv6Network
- enum.Enum
- enum.IntEnum
- decimal.Decimal
- pathlib.Path
- uuid.UUID
- ByteSize
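As a rough illustration of how some of these types behave, the sketch below declares a hypothetical Device model with UUID, IPv4Address, Decimal and Enum fields and feeds it plain strings. The model, field names and values are assumptions made up for this example.

```python
from decimal import Decimal
from enum import Enum
from ipaddress import IPv4Address
from uuid import UUID

from pydantic import BaseModel


class Color(Enum):
    RED = 'red'
    GREEN = 'green'


class Device(BaseModel):  # hypothetical model, for illustration only
    device_id: UUID
    address: IPv4Address
    price: Decimal
    color: Color


# every value is passed as a string and parsed into the declared type
device = Device(
    device_id='12345678-1234-5678-1234-567812345678',
    address='192.168.0.1',
    price='19.99',
    color='red',
)

for name in ('device_id', 'address', 'price', 'color'):
    value = getattr(device, name)
    print(name, type(value).__name__, value)
```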
Constrained types
You can enforce your own restrictions via the Constrained Types. Let’s have a look at the following example:
from pydantic import (
    BaseModel,
    NegativeInt,
    PositiveInt,
    conint,
    conlist,
    constr
)

class Model(BaseModel):
    # minimum length of 2 and maximum length of 10
    short_str: constr(min_length=2, max_length=10)
    # regex
    regex_str: constr(regex=r'^apple (pie|tart|sandwich)$')
    # remove whitespace from string
    strip_str: constr(strip_whitespace=True)
    # value must be greater than 1000 and less than 1024
    big_int: conint(gt=1000, lt=1024)
    # value is multiple of 5
    mod_int: conint(multiple_of=5)
    # must be a positive integer
    pos_int: PositiveInt
    # must be a negative integer
    neg_int: NegativeInt
    # list of integers that contains 1 to 4 items
    short_list: conlist(int, min_items=1, max_items=4)
Strict types
If you are looking for rigid restrictions which pass validation if and only if the validated value is of the respective type or is a subtype of that type, you can use the following strict types:
- StrictStr
- StrictInt
- StrictFloat
- StrictBool
The following example illustrates the proper way to enforce StrictBool in your inherited class.
from pydantic import BaseModel, StrictBool

class StrictBoolModel(BaseModel):
    strict_bool: StrictBool
Passing the string 'False' will raise a ValidationError, as the field only accepts True or False as input.
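The difference is clearest when a strict field sits next to a regular one. In the sketch below, LaxBoolModel is a hypothetical counterpart that uses a plain bool; it coerces the string 'False', whereas StrictBoolModel rejects it.

```python
from pydantic import BaseModel, StrictBool, ValidationError


class StrictBoolModel(BaseModel):
    strict_bool: StrictBool


class LaxBoolModel(BaseModel):  # hypothetical counterpart using a plain bool
    lax_bool: bool


# a real boolean is accepted by the strict field
accepted = StrictBoolModel(strict_bool=False)

# a plain bool field coerces the string 'False' into the boolean False
lax = LaxBoolModel(lax_bool='False')
print(lax.lax_bool)

# StrictBool refuses the same string
strict_rejected = False
try:
    StrictBoolModel(strict_bool='False')
except ValidationError:
    strict_rejected = True

print(strict_rejected)
```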
Validator
Furthermore, you can create your own custom validators using the validator decorator inside your inherited class. Let’s have a look at the following example, which checks whether the id has four digits and whether the confirm_password matches the password field.
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel, ValidationError, validator

class User(BaseModel):
    id: int
    username: str
    password: str
    confirm_password: str
    alias = 'anonymous'
    timestamp: Optional[datetime] = None
    friends: List[int] = []

    @validator('id')
    def id_must_be_4_digits(cls, v):
        if len(str(v)) != 4:
            raise ValueError('must be 4 digits')
        return v

    # values holds the fields that were validated before this one
    @validator('confirm_password')
    def passwords_match(cls, v, values, **kwargs):
        if 'password' in values and v != values['password']:
            raise ValueError('passwords do not match')
        return v
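Here is a short sketch of how these validators behave at runtime. To keep it self-contained, it uses Signup, a trimmed-down and hypothetical version of the User class with the same two validators, written against the same validator decorator. A ValueError raised inside a validator surfaces as a regular ValidationError.

```python
from pydantic import BaseModel, ValidationError, validator


class Signup(BaseModel):  # trimmed-down, hypothetical version of User
    id: int
    password: str
    confirm_password: str

    @validator('id')
    def id_must_be_4_digits(cls, v):
        if len(str(v)) != 4:
            raise ValueError('must be 4 digits')
        return v

    @validator('confirm_password')
    def passwords_match(cls, v, values):
        # values contains the fields validated so far, including password
        if 'password' in values and v != values['password']:
            raise ValueError('passwords do not match')
        return v


# valid input passes both validators
ok = Signup(id=1234, password='Password123', confirm_password='Password123')

failed = []
try:
    # id has two digits and the passwords differ in case
    Signup(id=12, password='Password123', confirm_password='password123')
except ValidationError as e:
    failed = [error['loc'][0] for error in e.errors()]

print(ok.id)
print(failed)
```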
3. Conclusion
Let’s recap what we have learned today.
We started off with an explanation of Pydantic, which helps to parse and validate data.
Next, we created a virtual environment and installed Pydantic via pip or conda. Pydantic also supports three optional dependencies, which we can install based on our use cases.
Once we were done with the installation, we explored in depth the basic functionalities provided by the package. The basic building block is a new class which inherits from BaseModel.
We learned that Pydantic provides support for most of the common data types in the Python standard library. We tested out both the Constrained Types and Strict Types, which help to enforce our own custom restrictions.
Lastly, we played around with the validator decorator to allow only four-digit input for id, and to check that confirm_password matches the password field.
Thanks for reading this piece. Hope to see you again in the next article!