License: CC BY-NC-SA 4.0

(Pre-)Commit to Better Code



Stefanie Molin

Prerequisites

  • Comfort writing Python code and working with Git on the command line using basic commands (e.g., clone, status, diff, add, commit, and push)
  • Have Python and Git installed on your computer, as well as a text editor for writing code (e.g., Visual Studio Code)
  • Fork and clone this repository: github.com/stefmolin/pre-commit-workshop

Agenda

  1. Setting Up Pre-Commit Hooks
  2. Creating a Pre-Commit Hook

Setting Up Pre-Commit Hooks

Overview of Git hooks

  • Scripts triggered when taking certain actions on a repository (e.g., committing changes)
  • Can be client-side or server-side
  • Stored in the .git/hooks/ directory of your repository, but excluded from version control
              
                .git
                └── hooks
                    ├── commit-msg.sample
                    ├── pre-commit.sample
                    ├── pre-merge-commit.sample
                    ├── pre-push.sample
                    └── [...]
              
            

Overview of pre-commit hooks

  • One pre-commit hook executable per repository
  • Triggered by Git when you run git commit
  • Often used to enforce coding standards (e.g., linting and formatting)
  • Should only include checks that run quickly
  • The commit fails if any of the checks fail

How to enable pre-commit hooks

  1. Create an executable script that runs any checks you want to include and save it as .git/hooks/pre-commit,
  2. Or, use a tool like pre-commit and include the hook configuration in version control.
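
For the first option, the hook can be any executable, including a Python script. The following is a minimal sketch, not part of the workshop materials, of what a hand-written .git/hooks/pre-commit could look like. The function names are made up for illustration, and it makes a simplifying assumption: it reads the working-tree copy of each staged file rather than the staged content itself.

```python
#!/usr/bin/env python3
"""Hand-written pre-commit hook sketch: reject trailing whitespace."""
import subprocess
from typing import Iterable, List


def staged_files() -> List[str]:
    """Return the paths of staged files that were added, copied, or modified."""
    result = subprocess.run(
        ['git', 'diff', '--cached', '--name-only', '--diff-filter=ACM'],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def lines_with_trailing_whitespace(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of the lines that end in whitespace."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.rstrip('\r\n') != line.rstrip()
    ]


def main() -> int:
    failed = False
    for path in staged_files():
        try:
            with open(path, encoding='utf-8') as file:
                bad_lines = lines_with_trailing_whitespace(file)
        except (UnicodeDecodeError, OSError):
            continue  # skip anything that can't be read as text
        for number in bad_lines:
            print(f'{path}:{number}: trailing whitespace')
            failed = True
    return 1 if failed else 0  # a non-zero exit code makes Git abort the commit


# To use this as a hook, end the script with:
#     raise SystemExit(main())
```

Every contributor would have to copy a script like this into their own .git/hooks/ directory and make it executable, which is the maintenance burden that the second option avoids.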

Pre-commit hooks != pre-commit

  • pre-commit is "a multi-language package manager for pre-commit hooks" (source).
  • pre-commit installs its own script as the pre-commit hook executable for your repository.
  • The installed pre-commit hook then runs pre-commit, which in turn runs any checks you specify using a YAML file.

Setup

  1. Install the pre-commit package in your repository.
  2. Define the checks you want to use in a file called .pre-commit-config.yaml.
  3. Have pre-commit install itself as your pre-commit hook.
  4. Work on your code as usual.

Package installation

            
              $ python3 -m pip install pre-commit
            
          

Basic configuration

Defined in .pre-commit-config.yaml at the repository root:

The repos section lists the repositories to use:

This represents the configuration for a single repository:

The URL of the repository to clone:

The tag or commit hash to clone at:

The hooks to run from that repository:

              
                repos:
                  - repo: https://github.com/pre-commit/pre-commit-hooks
                    rev: v4.5.0
                    hooks:
                      - id: check-toml
                      - id: check-yaml
                      - id: end-of-file-fixer
                      - id: trailing-whitespace
              
            
Source: How to Set Up Pre-Commit Hooks by Stefanie Molin

Git hook installation

While the .pre-commit-config.yaml file lives in version control, each contributor must run this locally:

            
              $ pre-commit install
              pre-commit installed at .git/hooks/pre-commit
            
          

Add the configuration to version control

The check-yaml hook we included will make sure that the .pre-commit-config.yaml file (and any other YAML files in this repository) is valid YAML. Try committing it:

            
              $ git add .pre-commit-config.yaml
              $ git commit -m "Add pre-commit config"
              [INFO] Initializing environment for [...]/pre-commit-hooks.
              [INFO] Installing environment for [...]/pre-commit-hooks.
              [INFO] Once installed this environment will be reused.
              [INFO] This may take a few minutes...
              check toml..........................(no files to check)Skipped
              check yaml..............................................Passed
              fix end of files........................................Passed
              trim trailing whitespace................................Passed
            
          

Exercise

There are more hooks provided by the repository that we are currently using. Take a moment to look through their documentation, and add some additional hooks to the setup that are relevant to your work. Be sure to commit your changes.

github.com/pre-commit/pre-commit-hooks

Example solution

Adding these two hooks ensures that your scripts are actually executable:

            
               repos:
                 - repo: https://github.com/pre-commit/pre-commit-hooks
                   rev: v4.5.0
                   hooks:
              +      - id: check-executables-have-shebangs
              +      - id: check-shebang-scripts-are-executable
                     - id: check-toml
                     - id: check-yaml
                     - id: end-of-file-fixer
                     - id: trailing-whitespace
            
          

Third-party hooks

We are free to use hooks from multiple repositories. A non-exhaustive list of popular hooks along with tips for searching for compatible repositories can be found at pre-commit.com/hooks.html.

Let's add ruff to lint and format our Python code, since it is "10-100x faster than existing linters (like Flake8) and formatters (like Black),"* and speed is very important with pre-commit hooks.


* Source: ruff homepage as of June 16, 2024

Adding ruff to .pre-commit-config.yaml

Create a new entry under the repos key:

The ruff pre-commit hook is in a separate repository:

The tag or commit hash to clone at:

The hooks to run from that repository:

Use args to pass command line arguments to the hook:

            
              repos:
                - repo: https://github.com/pre-commit/pre-commit-hooks
                  rev: v4.5.0
                  hooks:
                    - id: check-toml
                    - id: check-yaml
                    - id: end-of-file-fixer
                    - id: trailing-whitespace

                - repo: https://github.com/astral-sh/ruff-pre-commit
                  rev: v0.4.10
                  hooks:
                    - id: ruff
                      args: [--fix, --exit-non-zero-on-fix, --show-fixes]
                    - id: ruff-format
            
          

Commit your changes

This once again triggers an update to the environment used for running the checks:

All hooks that were added or modified need to be updated in the environment:

This will be slower the first time, but the environment persists until modification:

Hook order in the configuration file matters – hooks run in the order you specify and may make conflicting changes:

            
              $ git add .pre-commit-config.yaml
              $ git commit -m "Add ruff pre-commit hooks"
              [INFO] Initializing environment for [...]/ruff-pre-commit.
              [INFO] Installing environment for [...]/ruff-pre-commit.
              [INFO] Once installed this environment will be reused.
              [INFO] This may take a few minutes...
              check toml..........................(no files to check)Skipped
              check yaml..............................................Passed
              fix end of files........................................Passed
              trim trailing whitespace................................Passed
              ruff................................(no files to check)Skipped
              ruff-format.........................(no files to check)Skipped
            
          

Testing the ruff pre-commit hooks

Create a new file called example.py with the following contents, and try to commit it:

            
              import re

              def my_function(a):
                  """My function."""
                  pass
            
          

The commit fails

No TOML or YAML files to check:

No issues with these checks:

ruff modified a file, which is an automatic failure:

ruff found an unused import when linting:

ruff summarizes its findings (sometimes we have to fix the issues ourselves):

ruff also made a change while formatting the file:

            
              check toml..........................(no files to check)Skipped
              check yaml..........................(no files to check)Skipped
              fix end of files........................................Passed
              trim trailing whitespace................................Passed
              ruff....................................................Failed
              - hook id: ruff
              - exit code: 1
              - files were modified by this hook

              Fixed 1 error:
              - example.py:
                  1 × F401 (unused-import)

              Found 1 error (1 fixed, 0 remaining).

              ruff-format.............................................Failed
              - hook id: ruff-format
              - files were modified by this hook

              1 file reformatted
            
          

Verify the changes before trying again

              
                $ git diff example.py
                diff --git a/example.py b/example.py
                index ebb29eb..ad4c89e 100644
                --- a/example.py
                +++ b/example.py
                @@ -1,5 +1,3 @@
                -import re
                -
                def my_function(a):
                    """My function."""
                    pass
              
            

The commit succeeds now

            
              $ git add example.py
              $ git commit -m "Add example.py"
              check toml..........................(no files to check)Skipped
              check yaml..........................(no files to check)Skipped
              fix end of files........................................Passed
              trim trailing whitespace................................Passed
              ruff....................................................Passed
              ruff-format.............................................Passed
              [example 79b0f28] Add example.py
               1 file changed, 3 insertions(+)
               create mode 100644 example.py
            
          

Exercise

Look at the setup information for the numpydoc-validation hook at numpydoc.readthedocs.io/en/latest/validation.html, and add it to your configuration. Be sure to commit your changes.

Tip: You can run the following to check for any formatting errors:

            
              $ pre-commit validate-config .pre-commit-config.yaml
            
          

Example solution

Adding the numpydoc-validation hook:

            
              - repo: https://github.com/numpy/numpydoc
                rev: v1.7.0
                hooks:
                  - id: numpydoc-validation
            
          

Running hooks on demand

Run all hooks on staged changes:

Run all hooks on example.py:

Run only the numpydoc-validation hook on all files:

            
              $ pre-commit run
              $ pre-commit run --files example.py
              $ pre-commit run numpydoc-validation --all-files
            
          

Validating docstrings

Run numpydoc-validation hook on example.py:

example.py fails the docstring checks:

Without any additional configuration, all numpydoc checks are run:

            
              $ pre-commit run numpydoc-validation --files example.py
              numpydoc-validation.....................................Failed
              - hook id: numpydoc-validation
              - exit code: 1

              +---------------------+-------+------------------------------+
              | item                | check | description                  |
              +=====================+=======+==============================+
              | example             | GL08  | The object missing docstring |
              +---------------------+-------+------------------------------+
              | example.my_function | ES01  | No extended summary found    |
              +---------------------+-------+------------------------------+
              | example.my_function | PR01  | Param {'a'} not documented   |
              +---------------------+-------+------------------------------+
              | example.my_function | SA01  | See Also section not found   |
              +---------------------+-------+------------------------------+
              | example.my_function | EX01  | No examples section found    |
              +---------------------+-------+------------------------------+

            
          
Output abbreviated.

Modifying hook behavior

Two options:

  • via command line arguments
  • via a configuration file

Modifying hook behavior with command line arguments

Use command line arguments for behavior that you only want when the tool runs as a pre-commit hook. Here, we want ruff to fix issues automatically, but when we run ruff on our own, we may only want it to report what it would have changed:

            
              - repo: https://github.com/astral-sh/ruff-pre-commit
                rev: v0.4.10
                hooks:
                  - id: ruff
                    args: [--fix, --exit-non-zero-on-fix, --show-fixes]
                  - id: ruff-format
            
          

Modifying hook behavior with a configuration file

For settings that should be applied for every invocation of the tool:

            
              # pyproject.toml (the tool must support this file)
              [tool.ruff]
              line-length = 88

              [tool.ruff.format]
              indent-style = "space"
              quote-style = "single"

              [tool.ruff.lint]
              select = [
                  "B", # flake8-bugbear rules
                  "C", # mccabe rules
                  "E", # pycodestyle error rules
                  "F", # pyflakes rules
                  "I", # isort rules
                  "W", # pycodestyle warning rules
              ]
              ignore = [
                  "C901", # max-complexity-10
                  "E501", # line-too-long
              ]
            
          

Exercise

Configure the numpydoc-validation hook to ignore the checks ES01, EX01, SA01, and SS06. Then, address any issues in example.py so that it passes the checks.

numpydoc.readthedocs.io/en/latest/validation.html

Example solution

First, specify which checks to report in the configuration file:

            
              # pyproject.toml (this tool supports pyproject.toml)
              [tool.numpydoc_validation]
              checks = [
                  "all",  # report on all checks
                  "ES01", # but don't require an extended summary
                  "EX01", # or examples
                  "SA01", # or a see also section
                  "SS06", # and don't require the summary to fit on one line
              ]
            
          

Then, update the docstrings in example.py:

            
              """Utility functions."""


              def my_function(a):
                  """
                  My function.

                  Parameters
                  ----------
                  a : int
                      The value to use.
                  """
                  pass
            
          

Excluding files

The docstring checks are strict, but things like the test suite won't end up in our documentation, so validating their docstrings creates extra work. Let's restrict when the docstring checks will run:

            
              - repo: https://github.com/numpy/numpydoc
                rev: v1.7.0
                hooks:
                  - id: numpydoc-validation
            +       exclude: (tests|docs)/.*
            
          

Tip: Inclusive filtering on the file name is also supported with files. File type filtering is also available. See the pre-commit documentation for all supported options.
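
The pre-commit documentation states that the files and exclude patterns are Python regular expressions matched with re.search, so the pattern does not have to match from the start of the path. Here is a small sketch of what that means for the pattern above. The is_excluded() helper and the example paths are made up for illustration and are not part of pre-commit.

```python
import re

# The pattern from the configuration above
EXCLUDE = re.compile(r'(tests|docs)/.*')

# A stricter alternative that only matches top-level directories
ANCHORED = re.compile(r'^(tests|docs)/')


def is_excluded(path: str, pattern: re.Pattern = EXCLUDE) -> bool:
    """Mimic an unanchored regex search on a repository-relative path."""
    return pattern.search(path) is not None


print(is_excluded('tests/test_cli.py'))              # True
print(is_excluded('src/your_pkg/cli.py'))            # False
print(is_excluded('src/your_pkg/tests/helpers.py'))  # True (nested directory)
print(is_excluded('mydocs/index.md'))                # True ("docs/" appears in the path)
print(is_excluded('mydocs/index.md', ANCHORED))      # False
```

If matching nested or similarly named directories is not what you want, anchoring the pattern with ^ is one way to be more precise.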

Keeping hooks up-to-date

Run pre-commit autoupdate to update all hooks to their latest versions (the rev key in the YAML):

            
              $ pre-commit autoupdate
              Updating [...]/pre-commit-hooks ... updating v4.5.0 -> v4.6.0.
              Updating [...]/ruff-pre-commit ... updating v0.4.10 -> v0.5.1.
              Updating [...]/numpy/numpydoc ... already up to date.
            
          

Warning: Make sure these new versions work with your setup by running pre-commit run --all-files.

Good to know

  • Pass --no-verify when committing to bypass the checks.
  • Hooks can be configured to run at different Git stages (e.g., pre-push).
  • Check out pre-commit.com/hooks.html for tips on finding hooks.
  • Use pre-commit run --all-files to run your hooks in CI/CD workflows (or you can use pre-commit.ci).

Creating a Pre-Commit Hook

Recipe

  1. Implement the check(s) in a function.
  2. Wrap the function in a CLI.
  3. Make it installable.
  4. Configure it as a hook.

1. Implement the check(s) in a function

            
              from typing import Sequence

              def perform_check(filenames: Sequence[str]) -> int:
                  """
                  Given a sequence of filenames,
                  perform the check(s),
                  return 1 if there is a failure, 0 otherwise.
                  """
                  failure = False  # TODO: replace with your logic

                  return 1 if failure else 0
            
          

Toy example

Only allow committing one file at a time:

            
              from typing import Sequence

              def perform_check(filenames: Sequence[str]) -> int:
                  """
                  Given a sequence of filenames,
                  check the number of files,
                  return 1 if there is a failure, 0 otherwise.
                  """
                  failure = len(filenames) > 1

                  return 1 if failure else 0
            
          

Simplified example from exif-stripper

This hook removes metadata from images, and the check is implemented to run on one file at a time.

            
              from PIL import Image, UnidentifiedImageError

              def process_image(filename: str) -> bool:  # given a file
                  has_changed = False
                  try:
                      with Image.open(filename) as im:
                          if exif := im.getexif():  # run check
                              exif.clear()  # edit file to fix the issue
                              im.save(filename)
                              has_changed = True
                              print(f'Stripped metadata from {filename}')
                  except (FileNotFoundError, UnidentifiedImageError):
                      pass  # not an image

                  return has_changed  # report result
            
          
Simplified to only strip EXIF data. Source: stefmolin/exif-stripper

What makes a helpful hook?

  1. Runs quickly
  2. Tells you what is wrong and where the issue is (the file and, potentially, line number)
  3. Fixes the file for you, if possible, or, if not, guides you to the fix
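
As an illustration of the second and third points, here is a sketch of a check that reports the file and line number of each problem and tells the user how to resolve it. The check itself (looking for leftover breakpoint() calls with a naive substring match) and the function names are assumptions made for this example. A real hook would likely parse the code instead.

```python
from typing import Sequence


def find_breakpoints(filename: str) -> int:
    """Report each `breakpoint()` call in a file; return 1 if any were found."""
    failure = False
    with open(filename, encoding='utf-8') as file:
        for line_number, line in enumerate(file, start=1):
            if 'breakpoint()' in line:
                # say what is wrong, where it is, and how to fix it
                print(
                    f'{filename}:{line_number}: found breakpoint(); '
                    'remove it before committing'
                )
                failure = True
    return int(failure)


def perform_check(filenames: Sequence[str]) -> int:
    """Run the check on every file and return 1 if any file failed."""
    results = [find_breakpoints(filename) for filename in filenames]
    return int(any(results))
```

Building the full list of results before calling any() means every file is checked and reported on, rather than stopping at the first failure.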

Exercise

Come up with your own check. It doesn't matter if something already exists for your idea. If you can't think of anything, try creating a check that enforces a file naming convention that you follow.

Write your code in the src/your_pkg/your_module.py file, renaming both the directory and file as you see fit.

Example solution

Require filenames to be at least three characters (without the extension):

          
            from pathlib import Path

            def validate_filename(filename: str, min_len: int = 3) -> int:
                # extract the name so that `/my/repo/x.py` becomes `x`
                name = Path(filename).stem

                # check the length
                if failure := len(name) < min_len:
                    print(f'Name too short ({min_len=}): {filename}')

                # convert to an exit code for later
                return int(failure)
          
        

2. Wrap your function in a CLI

We will use argparse for this example:

At a minimum, we need to accept filenames:

We can then parse the received arguments:

Run your check(s) on each item in args.filenames:

Return 1 if there were any failures, 0 otherwise:

            
              from __future__ import annotations  # allow `X | None` hints on Python 3.8 and 3.9

              import argparse
              from typing import Sequence

              def main(argv: Sequence[str] | None = None) -> int:
                  parser = argparse.ArgumentParser(prog='your-hook')
                  parser.add_argument(
                      'filenames',
                      nargs='*',
                      help='Filenames to process.',
                  )

                  args = parser.parse_args(argv)

                  failures = False  # TODO: run your check(s) on `args.filenames`
                  return 1 if failures else 0  # must be 0 or 1 this time
            
          

Simplified example from exif-stripper

Create the ArgumentParser:

Parse the received arguments:

Process each file, storing the outcomes (these are Boolean values):

Aggregate the results, returning 1 if there were any failures, 0 otherwise:

            
              from __future__ import annotations  # allow `X | None` hints on Python 3.8 and 3.9

              import argparse
              from typing import Sequence

              def main(argv: Sequence[str] | None = None) -> int:
                  parser = argparse.ArgumentParser(prog='strip-exif')
                  parser.add_argument(
                      'filenames',
                      nargs='*',
                      help='Filenames to process.',
                  )

                  args = parser.parse_args(argv)

                  results = [
                      process_image(filename)
                      for filename in args.filenames
                  ]
                  return int(any(results))
            
          
Source: stefmolin/exif-stripper

Exercise

Create a CLI for the check function you created in the previous exercise. Add an optional command line argument to modify how your check behaves.

You can put this in the same file from the previous exercise or in another file in the same directory (e.g., src/your_pkg/cli.py).

Example solution

Here, we continue with the hook requiring filenames of a certain length:

Most of this is identical to the example:

But, this time we add the minimum length as an optional argument:

When we run the checks on each file, we also pass it in:

          
            from __future__ import annotations  # allow `X | None` hints on Python 3.8 and 3.9

            import argparse
            from typing import Sequence

            def main(argv: Sequence[str] | None = None) -> int:
                parser = argparse.ArgumentParser(
                    prog='validate-filename'
                )
                parser.add_argument(
                    'filenames',
                    nargs='*',
                    help='Filenames to process.',
                )
                parser.add_argument(
                    '--min-len',
                    default=3,
                    type=int,
                    help='Minimum length for a filename.',
                )

                args = parser.parse_args(argv)

                results = [
                    validate_filename(filename, args.min_len)
                    for filename in args.filenames
                ]
                return int(any(results))
          
        

3. Make it installable

  • It can be a package or standalone script.
  • You must state all dependencies.
  • Installation should create an executable.

Package example with exif-stripper

Let's look at the file structure for a Python package:

For this example, the repository name is exif-stripper:

Configuration file to have pre-commit run hooks on our code:

Configuration file for our new hook (for the strip-exif hook, here):

Important files for any codebase:

This package uses the src-layout:

Tests for the logic to make this easier to maintain:

The pyproject.toml file is used to install this:

            
              exif-stripper
              ├── .pre-commit-config.yaml
              ├── .pre-commit-hooks.yaml
              ├── LICENSE
              ├── README.md
              ├── pyproject.toml
              ├── src
              │   └── exif_stripper
              │       ├── __init__.py
              │       └── cli.py
              └── tests
                  └── test_cli.py
            
          
Based on stefmolin/exif-stripper

pyproject.toml

This is a template pyproject.toml file:

Name your distribution (likely the name of your repository):

Provide a version (can also be done dynamically):

Provide information about your project:

Include all dependencies required to use your hook here:

Optional dependencies for development of the hook go here, instead:

Creating a package is recommended for testing and reuse:

Create an executable for calling your hook's CLI:

            
              [build-system]
              requires = ["setuptools", "setuptools-scm"]
              build-backend = "setuptools.build_meta"

              [project]
              name = "your-pre-commit-hook"
              version = "0.1.0"
              authors = [{name = "Your Name", email = "email@example.com"},]
              description = "TODO"
              readme = "README.md"
              license = {file = "LICENSE"}
              classifiers = [
                  "Development Status :: 1 - Planning",  # update later
                  "Programming Language :: Python"
              ]
              urls.Homepage = "https://example.com"
              urls.Documentation = "https://example.com/docs"

              dependencies = ["your-hook-dependencies"]
              requires-python = ">=3.8"
              optional-dependencies.dev = [
                  "pre-commit",  # so you can run hooks on the codebase
                  "pytest",  # remember to add tests for your hook
              ]

              # TODO: update `your-script-name` and the path
              scripts.your-script-name = "your_pkg.your_module:hook_function"

              # optional (if you are making a package)
              [tool.setuptools.packages.find]
              where = ["src"]
            
          
Based on stefmolin/pre-commit-example

Example of specifying the entry point

When the exif_stripper package is installed, an entry point (executable) called strip-exif is created automatically. Running it calls the main() function in the exif_stripper.cli module:

            
              scripts.strip-exif = "exif_stripper.cli:main"
            
          
Source: stefmolin/exif-stripper
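
To make the "module:function" syntax more concrete, here is a rough sketch of how such a string can be resolved to a callable. The load_entry_point() helper is hypothetical and written only for this illustration. The real executable is generated by the installer, and, broadly speaking, it imports the function and passes its return value to sys.exit().

```python
import importlib
from typing import Callable


def load_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:function' string to the function it names."""
    module_name, _, function_name = spec.partition(':')
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Demonstrated with a standard library function, so no install is required
dumps = load_entry_point('json:dumps')
print(dumps([1, 2]))  # [1, 2]
```

With the package installed, load_entry_point('exif_stripper.cli:main') would return the main() function in the same way. Since the return value ends up as the process exit code, main() needs to return 0 on success and a non-zero value on failure.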

Exercise

Using the provided template, populate the pyproject.toml file for your hook from the previous exercise. Be sure to create an entry point that calls your CLI.

Confirm that you can pip install it (note that you will need to create any files referenced in the pyproject.toml, such as README.md).

Example solution

          
            [build-system]
            requires = ["setuptools", "setuptools-scm"]
            build-backend = "setuptools.build_meta"

            [project]
            name = "filename-validation"
            version = "0.1.0"
            authors = [
              {name = "Stefanie Molin", email = "email@example.com"},
            ]
            description = "Validates that filenames meet criteria."
            readme = "README.md"
            license = {file = "LICENSE"}
            classifiers = [
                "Development Status :: 3 - Alpha",
                "Programming Language :: Python"
            ]

            dependencies = []
            requires-python = ">=3.8"
            optional-dependencies.dev = [
                "pre-commit",
                "pytest",
            ]

            scripts.validate-filename = "filename_validation.cli:main"

            [project.urls]
            Homepage = "https://github.com/stefmolin/validate-filename"
            Documentation = "https://stefaniemolin.com/validate-filename"

            [tool.setuptools.packages.find]
            where = ["src"]
          
        

4. Configure it as a hook

Configuration goes in .pre-commit-hooks.yaml as a list of hooks:

The hook ID (we can use this to run just this hook):

Display name for the hook when the checks are run:

(Optional) Description of what your hook does:

The entry point or executable to run, which can also include arguments:

The language your hook is written in (so pre-commit can install it):

(Optional) Restrict this hook to only run on certain file types:

            
              - id: strip-exif
                name: strip-exif
                description: This hook strips image metadata.
                entry: strip-exif
                language: python
                types: [image]
            
          
Source: stefmolin/exif-stripper
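
It can help to picture the command that ends up being run for a hook. The sketch below is a simplified model based on the documented meaning of entry and args, not pre-commit's actual code: the entry (which may contain arguments) is followed by any args from the user's configuration, and then by the filenames that matched the hook's filters. The build_command() helper is made up for this illustration.

```python
import shlex
from typing import List, Sequence


def build_command(
    entry: str, args: Sequence[str], filenames: Sequence[str]
) -> List[str]:
    """Combine a hook's entry, its configured args, and the matching files."""
    return [*shlex.split(entry), *args, *filenames]


print(build_command('strip-exif', [], ['photo.jpg', 'logo.png']))
# ['strip-exif', 'photo.jpg', 'logo.png']
```

This is why the CLI accepts the filenames as positional arguments. Keep in mind that pre-commit may split a long list of files across several invocations, so a hook should not assume that it receives every file in a single call.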

Exercise

Create a .pre-commit-hooks.yaml file for configuring your hook from the previous exercise. Note that, by default, hooks run in parallel; if your hook needs to run serially, specify require_serial: true. Be sure to consult pre-commit.com/#creating-new-hooks for a listing of the configuration options.

Tip: You can run the following to check for any formatting errors:

            
              $ pre-commit validate-manifest .pre-commit-hooks.yaml
            
          

Example solution

Continuing the example from previous exercises, we can use the following as our .pre-commit-hooks.yaml file for the hook to validate the length of the filename:

            
              - id: validate-filename
                name: validate-filename
                description: This hook checks filename length.
                entry: validate-filename
                language: python
            
        

Testing the hook configuration

The best way to test that your hook is working correctly is from another repository:

  • Run pre-commit try-repo from another repository without committing. (pre-commit.com/#developing-hooks-interactively)
  • Add your new hook to the .pre-commit-config.yaml file in another repository, and set the rev value to the most recent commit hash. Then, you can run your hook with pre-commit run.

Exercise

Test that your hook is properly configured.

Maintaining your hook

There are several points of failure when building and maintaining a hook, so it is best to set up testing in layers:

  • Build a test suite that exercises the underlying logic and the CLI.
  • Run your test suite on different operating systems and Python versions.
  • Use pre-commit to run your hooks in your CI/CD workflows.

Tip: See the references slide at the end for some examples.
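
As a sketch of the first layer, the tests below exercise both the logic and the CLI of the filename validation hook from the earlier example solutions. Because main() accepts argv, the CLI can be tested by passing a list of arguments directly. The test names are made up, and the functions are repeated here in condensed form only to keep the snippet self-contained. In a real project, tests/test_cli.py would import them from the package, and the tests would be run with pytest.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def validate_filename(filename: str, min_len: int = 3) -> int:
    """Return 1 if the name (without the extension) is too short, else 0."""
    name = Path(filename).stem
    if failure := len(name) < min_len:
        print(f'Name too short ({min_len=}): {filename}')
    return int(failure)


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog='validate-filename')
    parser.add_argument('filenames', nargs='*')
    parser.add_argument('--min-len', default=3, type=int)
    args = parser.parse_args(argv)
    results = [validate_filename(name, args.min_len) for name in args.filenames]
    return int(any(results))


# Tests for the underlying logic
def test_validate_filename_accepts_long_enough_names():
    assert validate_filename('src/cli.py') == 0


def test_validate_filename_rejects_short_names():
    assert validate_filename('src/x.py') == 1


# Tests for the CLI
def test_main_passes_with_no_files():
    assert main([]) == 0


def test_main_respects_min_len():
    assert main(['src/cli.py', '--min-len', '5']) == 1
    assert main(['src/cli.py', '--min-len', '2']) == 0


def test_main_fails_if_any_file_fails():
    assert main(['src/cli.py', 'x.py']) == 1
```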

Document your hook

Make it easy for people to get started with your hook by including a setup snippet in your README. If you have command line arguments and/or support configuration files, mention some common configurations.

Sharing your hook with the world

  • Once your repository is public on GitHub (or similar), your hook is available for use.
  • Help users find your hook by talking about it: conference talks, blog posts, tweets, etc.
  • Consider publishing it to PyPI if it is useful beyond just being a hook.

References

Thank you!

I hope you enjoyed the workshop. You can follow my work on the following platforms: