(Pre-)Commit to Better Code
Stefanie Molin
Prerequisites
- Comfort writing Python code and working with Git on the command
  line using basic commands (e.g., clone, status, diff, add, commit,
  and push)
- Have Python and Git installed on your computer, as well as a text
  editor for writing code (e.g., Visual Studio Code)
- Fork and clone this repository:
  github.com/stefmolin/pre-commit-workshop
Setting Up Pre-Commit Hooks
Overview of Git hooks
- Scripts triggered when taking certain actions on a repository
  (e.g., committing changes)
- Can be client-side or server-side
- Stored in the .git/hooks/ directory of your repository, but
  excluded from version control
.git
└── hooks
    ├── commit-msg.sample
    ├── pre-commit.sample
    ├── pre-merge-commit.sample
    ├── pre-push.sample
    └── [...]
Overview of pre-commit hooks
- One pre-commit hook executable per repository
- Triggered by Git when you run git commit
- Often used to enforce coding standards (e.g., linting and
  formatting)
- Should only include checks that run quickly
- The commit fails if any of the checks fail
How to enable pre-commit hooks
- Create an executable script that runs any checks you want to
  include and save it as .git/hooks/pre-commit,
- Or, use a tool like pre-commit and include the hook configuration
  in version control.
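The first option can be sketched as a small Python script. This is a minimal illustration rather than the approach used in the rest of the workshop: the check (rejecting leftover merge conflict markers) and the helper names has_conflict_markers and staged_files are assumptions made for the example. Saved as .git/hooks/pre-commit and marked executable, a script like this would be run by Git on every git commit, and a non-zero exit code would abort the commit.

```python
#!/usr/bin/env python3
"""Hypothetical hand-written hook (would be saved as .git/hooks/pre-commit)."""
import subprocess
from pathlib import Path

# Prefix that opens a merge conflict block (illustrative check only).
CONFLICT_PREFIX = '<<<<<<< '


def has_conflict_markers(text: str) -> bool:
    """Return True if any line starts with a merge conflict marker."""
    return any(line.startswith(CONFLICT_PREFIX) for line in text.splitlines())


def staged_files() -> list[str]:
    """Ask Git for files that are added, copied, or modified in the index."""
    result = subprocess.run(
        ['git', 'diff', '--cached', '--name-only', '--diff-filter=ACM'],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    failed = False
    for filename in staged_files():
        try:
            # Simplification: reads the working tree copy, not the staged blob.
            text = Path(filename).read_text(encoding='utf-8')
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if has_conflict_markers(text):
            print(f'Merge conflict marker found: {filename}')
            failed = True
    return 1 if failed else 0  # a non-zero exit code aborts the commit


# As an actual hook script, the file would end with:
#     raise SystemExit(main())
```

A script like this lives outside version control, so every contributor would have to copy it by hand, which is the gap the second option fills.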
Pre-commit hooks != pre-commit
- pre-commit is "a multi-language package manager for pre-commit
  hooks" (source).
- pre-commit installs its own script as the pre-commit hook
  executable for your repository.
- The installed pre-commit hook then runs pre-commit, which in turn
  runs any checks you specify using a YAML file.
Setup
- Install the pre-commit package in your repository.
- Define the checks you want to use in a file called
  .pre-commit-config.yaml.
- Have pre-commit install itself as your pre-commit hook.
- Work on your code as usual.
Package installation
$ python3 -m pip install pre-commit
Basic configuration
Defined in .pre-commit-config.yaml at the repository root:
The repos section lists the repositories to use:
This represents the configuration for a single repository:
The URL of the repository to clone:
The tag or commit hash to clone at:
The hooks to run from that repository:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-toml
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
Source: How to Set Up Pre-Commit Hooks by Stefanie Molin
Git hook installation
While the .pre-commit-config.yaml file lives in version control,
each contributor must run this locally:
$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
Add the configuration to version control
The check-yaml hook we included will make sure that the
.pre-commit-config.yaml file (and any other YAML files in this
repository) is valid YAML. Try committing it:
$ git add .pre-commit-config.yaml
$ git commit -m "Add pre-commit config"
[INFO] Initializing environment for [...]/pre-commit-hooks.
[INFO] Installing environment for [...]/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
check toml..........................(no files to check)Skipped
check yaml..............................................Passed
fix end of files........................................Passed
trim trailing whitespace................................Passed
Exercise
There are more hooks provided by the repository that we are
currently using. Take a moment to look through their documentation,
and add some additional hooks to the setup that are relevant to your
work. Be sure to commit your changes.
github.com/pre-commit/pre-commit-hooks
Example solution
Adding these two hooks ensures that your scripts are actually
executable:
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.5.0
     hooks:
+      - id: check-executables-have-shebangs
+      - id: check-shebang-scripts-are-executable
       - id: check-toml
       - id: check-yaml
       - id: end-of-file-fixer
       - id: trailing-whitespace
Third-party hooks
We are free to use hooks from multiple repositories. A
non-exhaustive list of popular hooks along with tips for searching
for compatible repositories can be found at
pre-commit.com/hooks.html.
Let's add ruff to lint and format our Python code, since it is
"10-100x faster than existing linters (like Flake8) and formatters
(like Black)," and speed is very important with pre-commit hooks.
Adding ruff to .pre-commit-config.yaml
Create a new entry under the repos key:
The ruff pre-commit hook is in a separate repository:
The tag or commit hash to clone at:
The hooks to run from that repository:
Use args to pass command line arguments to the hook:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-toml
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.10
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix, --show-fixes]
      - id: ruff-format
Commit your changes
This once again triggers an update to the environment used for
running the checks:
All hooks that were added or modified need to be updated in the
environment:
This will be slower the first time, but the environment persists
until the hook configuration is modified:
Hook order in the configuration file matters, since hooks run in
the order you specify and may make conflicting changes:
$ git add .pre-commit-config.yaml
$ git commit -m "Add ruff pre-commit hooks"
[INFO] Initializing environment for [...]/ruff-pre-commit.
[INFO] Installing environment for [...]/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
check toml..........................(no files to check)Skipped
check yaml..............................................Passed
fix end of files........................................Passed
trim trailing whitespace................................Passed
ruff................................(no files to check)Skipped
ruff-format.........................(no files to check)Skipped
Testing the ruff pre-commit hooks
Create a new file called example.py with the following contents,
and try to commit it:
import re

def my_function(a):
    """My function."""
    pass
The commit fails
No TOML or YAML files to check:
No issues with these checks:
ruff modified a file, which is an automatic failure:
ruff found an unused import when linting:
ruff summarizes its findings (sometimes we have to fix it):
ruff also made a change while formatting the file:
check toml.........................(no files to check)Skipped
check yaml.........................(no files to check)Skipped
fix end of files.......................................Passed
trim trailing whitespace...............................Passed
ruff...................................................Failed
- hook id: ruff
- exit code: 1
- files were modified by this hook
Fixed 1 error:
- example.py:
1 × F401 (unused-import)
Found 1 error (1 fixed, 0 remaining).
ruff-format............................................Failed
- hook id: ruff-format
- files were modified by this hook
1 file reformatted
Verify the changes before trying again
$ git diff example.py
diff --git a/example.py b/example.py
index ebb29eb..ad4c89e 100644
--- a/example.py
+++ b/example.py
@@ -1,5 +1,3 @@
-import re
-
 def my_function(a):
     """My function."""
     pass
The commit succeeds now
$ git add example.py
$ git commit -m "Add example.py"
check toml..........................(no files to check)Skipped
check yaml..........................(no files to check)Skipped
fix end of files........................................Passed
trim trailing whitespace................................Passed
ruff....................................................Passed
ruff-format.............................................Passed
[example 79b0f28] Add example.py
1 file changed, 3 insertions(+)
create mode 100644 example.py
Exercise
Look at the setup information for the numpydoc-validation hook at
numpydoc.readthedocs.io/en/latest/validation.html, and add it to
your configuration. Be sure to commit your changes.
Tip: You can run the following to check for any formatting errors:
$ pre-commit validate-config .pre-commit-config.yaml
Example solution
Adding the numpydoc-validation hook:
  - repo: https://github.com/numpy/numpydoc
    rev: v1.7.0
    hooks:
      - id: numpydoc-validation
Running hooks on demand
Run all hooks on staged changes:
$ pre-commit run
Run all hooks on example.py:
$ pre-commit run --files example.py
Run only the numpydoc-validation hook on all files:
$ pre-commit run numpydoc-validation --all-files
Validating docstrings
Run the numpydoc-validation hook on example.py:
example.py fails the docstring checks:
Without any additional configuration, all numpydoc checks are run:
$ pre-commit run numpydoc-validation --files example.py
numpydoc-validation.....................................Failed
- hook id: numpydoc-validation
- exit code: 1
+---------------------+-------+------------------------------+
| item                | check | description                  |
+=====================+=======+==============================+
| example             | GL08  | The object missing docstring |
+---------------------+-------+------------------------------+
| example.my_function | ES01  | No extended summary found    |
+---------------------+-------+------------------------------+
| example.my_function | PR01  | Param {'a'} not documented   |
+---------------------+-------+------------------------------+
| example.my_function | SA01  | See Also section not found   |
+---------------------+-------+------------------------------+
| example.my_function | EX01  | No examples section found    |
+---------------------+-------+------------------------------+
Output abbreviated.
Modifying hook behavior
Two options:
- via command line arguments
- via a configuration file
Modifying hook behavior with command line arguments
For behavior that you only want to happen when the tool is run as a
pre-commit hook (we want the issues fixed automatically here, but
perhaps when we run ruff on our own, we just want it to report on
what it would have changed):
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.10
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix, --show-fixes]
      - id: ruff-format
Modifying hook behavior with a configuration file
For settings that should be applied for every invocation of the
tool:
# pyproject.toml (the tool must support this file)
[tool.ruff]
line-length = 88

[tool.ruff.format]
indent-style = "space"
quote-style = "single"

[tool.ruff.lint]
select = [
    "B",  # flake8-bugbear rules
    "C",  # mccabe rules
    "E",  # pycodestyle error rules
    "F",  # pyflakes rules
    "I",  # isort rules
    "W",  # pycodestyle warning rules
]
ignore = [
    "C901",  # max-complexity-10
    "E501",  # line-too-long
]
Example solution
First, specify which checks to report in the configuration file:
# pyproject.toml (this tool supports pyproject.toml)
[tool.numpydoc_validation]
checks = [
    "all",   # report on all checks
    "ES01",  # but don't require an extended summary
    "EX01",  # or examples
    "SA01",  # or a see also section
    "SS06",  # and don't require the summary to fit on one line
]
Then, update the docstrings in example.py:
"""Utility functions."""

def my_function(a):
    """
    My function.

    Parameters
    ----------
    a : int
        The value to use.
    """
    pass
Excluding files
The docstring checks are strict, but things like the test suite
won't end up in our documentation, so validating their docstrings
just creates extra work. Let's restrict when the docstring checks
will run:
   - repo: https://github.com/numpy/numpydoc
     rev: v1.7.0
     hooks:
       - id: numpydoc-validation
+        exclude: (tests|docs)/.*
Tip: Inclusive filtering on the file name is also supported with
files. File type filtering is also available. See the pre-commit
documentation for all supported options.
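To see which paths a pattern such as the one above would skip, it can help to try it in Python first. pre-commit documents files and exclude as Python regular expressions; the sketch below assumes they are applied with re.search (so they can match anywhere in the path), and the helper name is_excluded and the sample paths are made up for illustration.

```python
import re

# Same pattern as the `exclude` value in the configuration above.
EXCLUDE = re.compile(r'(tests|docs)/.*')


def is_excluded(path: str, pattern: re.Pattern = EXCLUDE) -> bool:
    """Return True if the pattern matches anywhere in the path (assumed re.search semantics)."""
    return pattern.search(path) is not None


# Hypothetical paths, relative to the repository root.
for path in [
    'tests/test_cli.py',
    'docs/conf.py',
    'src/pkg/cli.py',
    'src/pkg/tests/helpers.py',
]:
    print(f'{path}: {"excluded" if is_excluded(path) else "checked"}')
```

Under that assumption the pattern is unanchored, so src/pkg/tests/helpers.py is excluded as well; prefixing the pattern with ^ would limit it to the top-level tests and docs directories.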
Keeping hooks up-to-date
Run pre-commit autoupdate to update all hooks to their latest
versions (the rev key in the YAML):
$ pre-commit autoupdate
Updating [...]/pre-commit-hooks ... updating v4.5.0 -> v4.6.0.
Updating [...]/ruff-pre-commit ... updating v0.4.10 -> v0.5.1.
Updating [...]/numpy/numpydoc ... already up to date.
Warning: Make sure these new versions work with your setup by
running pre-commit run --all-files.
Good to know
- Pass --no-verify when committing to bypass the checks.
- Hooks can be configured to run at different Git stages (e.g.,
  pre-push).
- Check out pre-commit.com/hooks.html for tips on finding hooks.
- Use pre-commit run --all-files to run your hooks in CI/CD
  workflows (or you can use pre-commit.ci).
Creating a Pre-Commit Hook
Recipe
- Implement the check(s) in a function.
- Wrap the function in a CLI.
- Make it installable.
- Configure it as a hook.
1. Implement the check(s) in a function
from typing import Sequence

def perform_check(filenames: Sequence[str]) -> int:
    """
    Given a sequence of filenames,
    perform the check(s),
    return 1 if there is a failure, 0 otherwise.
    """
    failure = False  # TODO: replace with your logic
    return 1 if failure else 0
Toy example
Only allow committing one file at a time:
from typing import Sequence

def perform_check(filenames: Sequence[str]) -> int:
    """
    Given a sequence of filenames,
    check the number of files,
    return 1 if there is a failure, 0 otherwise.
    """
    failure = len(filenames) > 1
    return 1 if failure else 0
Simplified example from exif-stripper
This hook removes metadata from images, and the check is implemented
to run on a single file at a time.
from PIL import Image, UnidentifiedImageError

def process_image(filename: str) -> bool:  # given a file
    has_changed = False
    try:
        with Image.open(filename) as im:
            if exif := im.getexif():  # run check
                exif.clear()  # edit file to fix the issue
                im.save(filename)
                has_changed = True
                print(f'Stripped metadata from {filename}')
    except (FileNotFoundError, UnidentifiedImageError):
        pass  # not an image
    return has_changed  # report result
Simplified to only strip EXIF data.
Source: stefmolin/exif-stripper
What makes a helpful hook?
- Runs quickly
- Tells you what is wrong and where the issue is (the file and,
  potentially, line number)
- Fixes the file for you, if possible, or, if not, guides you to the
  fix
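As a sketch of the second point, a check can report each problem as file:line so that the location is unambiguous. The check itself (flagging leftover breakpoint() calls) and the name find_breakpoints are illustrative assumptions, not part of the workshop material.

```python
import tempfile
from pathlib import Path


def find_breakpoints(filename: str) -> list[str]:
    """Return one 'file:line: message' string per line that calls breakpoint()."""
    problems = []
    lines = Path(filename).read_text(encoding='utf-8').splitlines()
    for lineno, line in enumerate(lines, start=1):
        # Naive text match; a real check might parse the file with `ast` instead.
        if 'breakpoint()' in line and not line.lstrip().startswith('#'):
            problems.append(f'{filename}:{lineno}: remove breakpoint() call')
    return problems


# Usage: write a throwaway file and run the check on it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.py'
    sample.write_text('x = 1\nbreakpoint()\n# breakpoint()\n', encoding='utf-8')
    problems = find_breakpoints(str(sample))
    print('\n'.join(problems))
```

The message names the file, the line, and the fix, which covers the "what", "where", and "how to fix" of a helpful hook even when the hook does not edit the file itself.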
Exercise
Come up with your own check. It doesn't matter if something already
exists for your idea. If you can't think of anything, try creating a
check that enforces a file naming convention that you follow.
Write your code in the src/your_pkg/your_module.py file, renaming
both the directory and file as you see fit.
Example solution
Require filenames to be at least three characters (without the
extension):
from pathlib import Path

def validate_filename(filename: str, min_len: int = 3) -> int:
    # extract the name so that `/my/repo/x.py` becomes `x`
    name = Path(filename).stem
    # check the length
    if failure := len(name) < min_len:
        print(f'Name too short ({min_len=}): {filename}')
    # convert to an exit code for later
    return int(failure)
Contents of src/filename_validation/validate_filename.py
2. Wrap your function in a CLI
We will use argparse for this example:
At a minimum, we need to accept filenames:
We can then parse the received arguments:
Run your check(s) on each item in args.filenames:
Return 1 if there were any failures, 0 otherwise:
import argparse
from typing import Sequence

def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog='your-hook')
    parser.add_argument(
        'filenames',
        nargs='*',
        help='Filenames to process.',
    )
    args = parser.parse_args(argv)
    failures = False  # TODO: run your check(s) on `args.filenames`
    return 1 if failures else 0  # must be 0 or 1 this time
Simplified example from exif-stripper
Create the ArgumentParser:
Parse the received arguments:
Process each file, storing the outcomes (these are Boolean values):
Aggregate the results, returning 1 if there were any failures, 0
otherwise:
import argparse
from typing import Sequence

def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog='strip-exif')
    parser.add_argument(
        'filenames',
        nargs='*',
        help='Filenames to process.',
    )
    args = parser.parse_args(argv)
    results = [
        process_image(filename)
        for filename in args.filenames
    ]
    return int(any(results))
Source: stefmolin/exif-stripper
Exercise
Create a CLI for the check function you created in the previous
exercise. Add an optional command line argument to modify how your
check behaves.
You can put this in the same file from the previous exercise or in
another file in the same directory (e.g., src/your_pkg/cli.py).
Example solution
Here, we continue with the hook requiring filenames of a certain
length:
Most of this is identical to the example:
But, this time we add the minimum length as an optional argument:
When we run the checks on each file, we also pass it in:
import argparse
from typing import Sequence

from .validate_filename import validate_filename

def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        prog='validate-filename'
    )
    parser.add_argument(
        'filenames',
        nargs='*',
        help='Filenames to process.',
    )
    parser.add_argument(
        '--min-len',
        default=3,
        type=int,
        help='Minimum length for a filename.',
    )
    args = parser.parse_args(argv)
    results = [
        validate_filename(filename, args.min_len)
        for filename in args.filenames
    ]
    return int(any(results))
Contents of src/filename_validation/cli.py
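Because main() accepts argv, the CLI can be exercised from Python without starting a subprocess, which is handy for quick checks and for tests. The sketch below repeats condensed copies of validate_filename() and main() from the example solution so that it runs on its own; the SystemExit line in the final comment is a common convention for running a module as a script, not something the workshop code requires.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def validate_filename(filename: str, min_len: int = 3) -> int:
    """Condensed copy of the check from the example solution."""
    if failure := len(Path(filename).stem) < min_len:
        print(f'Name too short ({min_len=}): {filename}')
    return int(failure)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Condensed copy of the CLI from the example solution."""
    parser = argparse.ArgumentParser(prog='validate-filename')
    parser.add_argument('filenames', nargs='*', help='Filenames to process.')
    parser.add_argument(
        '--min-len', default=3, type=int, help='Minimum length for a filename.'
    )
    args = parser.parse_args(argv)
    # A list (not a generator) so that every file is checked and reported.
    results = [validate_filename(name, args.min_len) for name in args.filenames]
    return int(any(results))


# Passing a list mimics what would arrive on the command line.
print(main(['x.py']))                    # name is too short for the default
print(main(['--min-len', '1', 'x.py']))  # a lower minimum lets it pass
print(main([]))                          # no files means nothing can fail

# To also run the module directly, the file could end with:
#     raise SystemExit(main())
```

When argv is None, argparse falls back to sys.argv[1:], so the same function serves both the installed executable and direct calls like these.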
3. Make it installable
- It can be a package or standalone script.
- You must state all dependencies.
- Installation should create an executable.
Package example with exif-stripper
Let's look at the file structure for a Python package:
For this example, the repository name is exif-stripper:
Configuration file to have pre-commit run hooks on our code:
Configuration file for our new hook (for the strip-exif hook, here):
Important files for any codebase:
This package uses the src-layout:
Tests for the logic to make this easier to maintain:
The pyproject.toml file is used to install this:
exif-stripper
├── .pre-commit-config.yaml
├── .pre-commit-hooks.yaml
├── LICENSE
├── README.md
├── pyproject.toml
├── src
│   └── exif_stripper
│       ├── __init__.py
│       └── cli.py
└── tests
    └── test_cli.py
Based on stefmolin/exif-stripper
pyproject.toml
This is a template pyproject.toml file:
Name your distribution (likely the name of your repository):
Provide a version (can also be done dynamically):
Provide information about your project:
Include all dependencies required to use your hook here:
Optional dependencies for development of the hook go here, instead:
Creating a package is recommended for testing and reuse:
Create an executable for calling your hook's CLI:
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "your-pre-commit-hook"
version = "0.1.0"
authors = [{name = "Your Name", email = "email@example.com"},]
description = "TODO"
readme = "README.md"
license = {file = "LICENSE"}
classifiers = [
    "Development Status :: 1 - Planning",  # update later
    "Programming Language :: Python"
]
urls.Homepage = "https://example.com"
urls.Documentation = "https://example.com/docs"
dependencies = ["your-hook-dependencies"]
requires-python = ">=3.10"
optional-dependencies.dev = [
    "pre-commit",  # so you can run hooks on the codebase
    "pytest",  # remember to add tests for your hook
]
# TODO: update `your-script-name` and the path
scripts.your-script-name = "your_pkg.your_module:hook_function"

# optional (if you are making a package)
[tool.setuptools.packages.find]
where = ["src"]
Based on stefmolin/pre-commit-example
Example of specifying the entry point
When the exif_stripper package is installed, an entry point
(executable) called strip-exif is created automatically, which,
when run, calls the main() function in the exif_stripper.cli module:
scripts.strip-exif = "exif_stripper.cli:main"
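Conceptually, the generated executable imports the module named before the colon, looks up the function named after it, and exits with that function's return value. The sketch below mimics the lookup; resolve_entry_point is a hypothetical helper written for illustration (it is not part of pip or setuptools), and it is demonstrated on a standard library function so that it runs without exif-stripper installed.

```python
import importlib
from typing import Callable


def resolve_entry_point(spec: str) -> Callable:
    """Turn a 'package.module:function' string into the function object."""
    module_name, _, function_name = spec.partition(':')
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Stand-in for 'exif_stripper.cli:main', using the standard library instead.
basename = resolve_entry_point('os.path:basename')
print(basename('/my/repo/example.py'))
```

With the real package installed, running strip-exif would amount to calling sys.exit() on the result of the function resolved from exif_stripper.cli:main, which is why the CLI function returns an exit code.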
Exercise
Using the provided template, populate the pyproject.toml file for
your hook from the previous exercise. Be sure to create an entry
point that will hit the CLI. Confirm that you can pip install it
(note that you will need to create any files referenced in the
pyproject.toml, such as README.md).
Example solution
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "filename-validation"
version = "0.1.0"
authors = [
    {name = "Stefanie Molin", email = "email@example.com"},
]
description = "Validates that filenames meet criteria."
readme = "README.md"
license = {file = "LICENSE"}
classifiers = [
    "Development Status :: 3 - Alpha",
    "Programming Language :: Python"
]
dependencies = []
requires-python = ">=3.10"
optional-dependencies.dev = [
    "pre-commit",
    "pytest",
]
scripts.validate-filename = "filename_validation.cli:main"

[project.urls]
Homepage = "https://github.com/stefmolin/validate-filename"
Documentation = "https://stefaniemolin.com/validate-filename"

[tool.setuptools.packages.find]
where = ["src"]
4. Configure it as a hook
Configuration goes in .pre-commit-hooks.yaml as a list of hooks:
The hook ID (we can use this to run just this hook):
Display name for the hook when the checks are run:
(Optional) Description of what your hook does:
The entry point or executable to run, which can also include
arguments:
The language your hook is written in (so pre-commit can install it):
(Optional) Restrict this hook to only run on certain file types:
- id: strip-exif
  name: strip-exif
  description: This hook strips image metadata.
  entry: strip-exif
  language: python
  types: [image]
Source: stefmolin/exif-stripper
Exercise
Create a .pre-commit-hooks.yaml file for configuring your hook from
the previous exercise. Note that, by default, hooks will run in
parallel; if you need yours to run serially, specify
require_serial: true. Be sure to consult
pre-commit.com/#creating-new-hooks for a listing of the
configuration options.
Tip: You can run the following to check for any formatting errors:
$ pre-commit validate-manifest .pre-commit-hooks.yaml
Example solution
Continuing the example from previous exercises, we can use the
following as our .pre-commit-hooks.yaml file for the hook to
validate the length of the filename:
- id: validate-filename
  name: validate-filename
  description: This hook checks filename length.
  entry: validate-filename
  language: python
Testing the hook configuration
- Run pre-commit try-repo.
  (pre-commit.com/#developing-hooks-interactively)
- Add your new hook to the .pre-commit-config.yaml file in another
  repository, and set the rev value to the most recent commit hash.
  Then, you can run your hook with pre-commit run.
Exercise
Test that your hook is properly configured.
Example solution
Start by creating a file that will fail the check:
Run pre-commit try-repo passing in the file:
Now, pre-commit will generate a configuration for us:
Think of this as the .pre-commit-config.yaml file that will be used:
The repo points to the path we passed to pre-commit try-repo:
The rev is set to the most recent commit hash:
The hooks are pulled out of .pre-commit-hooks.yaml:
Just like before, pre-commit will need to set up the environment:
As expected, x.py fails the validate-filename check:
$ touch x.py
$ pre-commit try-repo . --files x.py
[INFO] Initializing environment for ..
=============================================================
Using config:
=============================================================
repos:
  - repo: .
    rev: e11041f74c8a0f074f0633138506dce2efe9c5e7
    hooks:
      - id: validate-filename
=============================================================
[INFO] Installing environment for ..
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
validate-filename......................................Failed
- hook id: validate-filename
- exit code: 1
Name too short (min_len=3): x.py
Maintaining your hook
There are several points of failure when building and maintaining a
hook, so it is best to set up testing in layers:
- Build a test suite that exercises the underlying logic and the
  CLI.
- Run your test suite on different operating systems and Python
  versions.
- Use pre-commit to run your hooks in your CI/CD workflows.
Tip: See the references slide at the end for some examples.
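As a starting point for the first layer, tests for the check logic can be plain functions with assert statements, which pytest collects automatically when they are named test_*. The sketch below tests the toy perform_check() from earlier (repeated here so that the block runs on its own); the test names and the file location tests/test_check.py are assumptions.

```python
# tests/test_check.py (hypothetical location)
from typing import Sequence


# In a real suite, this would be imported from your package instead.
def perform_check(filenames: Sequence[str]) -> int:
    """Toy check from earlier: only allow committing one file at a time."""
    failure = len(filenames) > 1
    return 1 if failure else 0


def test_single_file_passes():
    assert perform_check(['example.py']) == 0


def test_multiple_files_fail():
    assert perform_check(['a.py', 'b.py']) == 1


def test_no_files_passes():
    # An empty list should not crash the check or count as a failure.
    assert perform_check([]) == 0
```

Running pytest from the repository root would then report three passing tests; tests for the CLI can follow the same pattern by calling main() with a list of arguments.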
Document your hook
Make it easy for people to get started with your hook by including a
setup snippet in your README. If you have command line arguments
and/or support configuration files, mention some common
configurations. Note that you should create tags so people don't
have to use commit hashes.
Sharing your hook with the world
- Once your repository is public on GitHub (or similar), your hook
  is available for use.
- Help users find your hook by talking about it: conference talks,
  blog posts, tweets, etc.
- Consider publishing it to PyPI if it is useful beyond just being a
  hook.
Thank you!
I hope you enjoyed the workshop. You can follow my work on these
platforms: