Shape Creation#

This tutorial walks you through the process of creating a new shape for use as a target in the morphing process.


Select the appropriate base class#

All Data Morph shapes are defined as classes inside the shapes subpackage. Data Morph uses a hierarchy of shapes that all descend from an abstract base class (Shape), which defines the basics of how a shape needs to behave (i.e., it must have a distance() method and a plot() method).

UML diagram showing the hierarchy of the shape classes.

Any new shape must inherit from Shape or one of its child classes:

  • If your shape is composed of lines, inherit from LineCollection (e.g., Star).

  • If your shape is composed of points, inherit from PointCollection (e.g., Heart).

  • If your shape isn’t composed of lines or points you can inherit directly from Shape (e.g., Circle). Note that, in this case, you must define both the distance() and plot() methods (this is done for you if you inherit from LineCollection or PointCollection).

Define the scale and placement of the shape based on the dataset#

Each shape will be initialized with a Dataset instance. Use the dataset to determine where in the xy-plane the shape should be placed and also to scale it to the data. If you take a look at the code for the existing shapes, you will see that they use various bits of information from the dataset, such as the automatically-calculated bounds (e.g., Dataset.data_bounds, which form the bounding box of the starting data, and Dataset.morph_bounds, which define the limits of where the algorithm can move the points) or percentiles using the data itself (see Dataset.data). For example, the XLines shape inherits from LineCollection and uses the morph bounds (Dataset.morph_bounds) to calculate its position and scale:

from data_morph.data.dataset import Dataset
from data_morph.shapes.bases.line_collection import LineCollection

class XLines(LineCollection):

  def __init__(self, dataset: Dataset) -> None:
     (xmin, xmax), (ymin, ymax) = dataset.morph_bounds

     super().__init__([[xmin, ymin], [xmax, ymax]], [[xmin, ymax], [xmax, ymin]])

Since we inherit from LineCollection here, we don’t need to define the distance() and plot() methods (unless we want to override them).

Test out the shape#

Defining how your shape should be generated from the input dataset will require a few iterations. Be sure to test out your shape on different datasets:

from data_morph.data.loader import DataLoader
from data_morph.morpher import DataMorpher

dataset = DataLoader.load_dataset('panda')
target_shape = YourShape(dataset) # TODO replace with your class

morpher = DataMorpher(
    decimals=2,
    in_notebook=False,  # whether you are running in a Jupyter Notebook
    output_dir='data_morph/output',  # where you want the output to go
)

result = morpher.morph(start_shape=dataset, target_shape=target_shape)

Some shapes will work better on certain datasets, and that’s fine. However, if your shape only works well on one of the built-in datasets (see the DataLoader), then you need to keep tweaking your implementation.

(Optional) Contribute the shape#

If you think that your shape would be a good addition to Data Morph, create an issue in the Data Morph repository proposing its inclusion. Be sure to consult the contributing guidelines before doing so.

If and only if you are given the go ahead, work through this section to contribute your shape.

1. Create a new module for your shape#

Note

If you haven’t already, fork and clone the Data Morph repository and follow the instructions in the contributing guidelines to install Data Morph in editable mode and configure pre-commit.

Save your shape in src/data_morph/shapes/<base>/<your_shape>.py. In the case of the example in this tutorial (XLines), it inherits from LineCollection, and its module is called x_lines, so the file is src/data_morph/shapes/lines/x_lines.py.

Add type annotations and prepare a docstring for your shape following what the other shapes have. Be sure to change the plotting code in the docstring (in the .. plot:: block) to use your shape. Here’s how the x_lines module looks in the package:

"""X lines shape."""

from ...data.dataset import Dataset
from ..bases.line_collection import LineCollection


class XLines(LineCollection):
    """
    Class for the X shape consisting of two crossing, perpendicular lines.

    .. plot::
       :scale: 75
       :caption:
            This shape is generated using the panda dataset.

        from data_morph.data.loader import DataLoader
        from data_morph.shapes.lines import XLines

        _ = XLines(DataLoader.load_dataset('panda')).plot()

    Parameters
    ----------
    dataset : Dataset
        The starting dataset to morph into other shapes.
    """

    name = 'x'

    def __init__(self, dataset: Dataset) -> None:
        (xmin, xmax), (ymin, ymax) = dataset.morph_bounds

        super().__init__([[xmin, ymin], [xmax, ymax]], [[xmin, ymax], [xmax, ymin]])

Notice that we set the name attribute here since the default will result in a value of xlines and x makes more sense for use in the documentation (see ShapeFactory). Check out some of the other modules inheriting from the same base as your shape to make sure you are following the project’s conventions, such as using relative imports within the package.

Note

If your shape inherits from PointCollection, try to create your shape with as few points as possible because each additional point requires another calculation per iteration of the morphing algorithm. Take a look at how many points existing shapes in the points module use as a guideline.

At this point, your shape should pass all the pre-commit checks. If you haven’t set up your development environment for Data Morph or aren’t sure how to run these checks, please consult the contributing guidelines.

2. Register the shape#

For the Data Morph CLI to find your shape, you need to register it with the ShapeFactory:

  1. Add your shape to __all__ in the __init__.py closest to the module you created in the previous step (e.g., use src/data_morph/shapes/lines/__init__.py for a new shape inheriting from LineCollection).

  2. Add an entry to the ShapeFactory._SHAPE_CLASSES tuple in src/data_morph/shapes/factory.py, preserving alphabetical order.

3. Create test cases for the shape#

Data Morph uses pytest for the test suite, and all tests are located in the tests/ directory, with a folder structure that mirrors the actual package. The test cases for your shape will go in tests/shapes/<base>/test_<your_shape>.py. In the case of the example in this tutorial (XLines), it inherits from LineCollection, and its module is called x_lines, so the test file is tests/shapes/lines/test_x_lines.py.

There are test bases for each type of shape in tests/shapes/<base>/bases.py, which handle most of the logic for running the tests. For shapes inheriting from LineCollection, this base is LinesModuleTestBase, which can be used as follows:

"""Test the x_lines module."""

import numpy as np
import pytest

from .bases import LinesModuleTestBase

pytestmark = [pytest.mark.shapes, pytest.mark.lines]


class TestXLines(LinesModuleTestBase):
    """Test the XLines class."""

    shape_name = 'x'
    distance_test_cases = (
        ((8, 83), 0),  # edge of X line
        ((20, 65), 0),  # middle of X (intersection point)
        ((19, 64), 0.277350),  # off the X
        ((10, 20), 27.073973),  # off the X
    )
    expected_line_count = 2
    expected_slopes = (-1.5, 1.5)

    def test_lines_form_an_x(self, shape):
        """Test that the lines form an X."""
        lines = np.array(shape.lines)

        # check perpendicular
        xs, ys = lines.T
        runs = np.diff(xs, axis=0)
        rises = np.diff(ys, axis=0)
        assert np.dot(rises, runs.T) == 0

        # check that the lines intersect in the middle
        midpoints = np.mean(lines.T, axis=1)[0].T
        assert np.unique(midpoints).size == 1

Note that the class variables provide the test cases for LinesModuleTestBase to use. To get distance_test_cases, which is a tuple of test cases of the form ((x, y), expected_distance), for example, you will need to come up with a few points that have distance zero to the shape, and a few points that have a non-zero distance. You can come up with these by using the instantiated shape’s distance() method, or by inspecting the instantiated shape’s attributes like PointCollection.points on shapes inheriting from PointCollection.

Note

The XLines shape also defines its own test case to make sure that the lines form an X. It’s only necessary to add additional test methods like this to test aspects not covered by the base class.

You should now be able to run the test suite with pytest. Make sure your test cases pass before moving on. If you haven’t set up your development environment for Data Morph or aren’t sure how to run these checks, please consult the contributing guidelines.

4. Confirm that your shape works via the CLI#

Run the following on the command line replacing <your shape> with the value you set for the name attribute of your shape class to generate three animations:

$ data-morph --start-shape panda music soccer --target-shape <your shape> --workers 3

Review the animations. Remember, some shapes will work better on certain datasets, and that’s fine. However, if your shape only works well on one of the built-in datasets (see the DataLoader), then you need to keep tweaking your implementation.

Tip

If you decide to run with multiple datasets, you can set --workers 0 to run as many transformations in parallel as possible on your computer. In the above example, we only have three transformations, so --workers 3 will run all three in parallel, assuming your machine has at least three CPU cores.

5. Submit your pull request#

If your shape works well on different datasets and your code passes all the checks and tests cases, you are ready to make a pull request. If you aren’t sure how to do this, please consult the contributing guidelines.