Write a task#

Starting from the project structure in the previous tutorial, you will learn how to write your first task.

pytask will look for tasks in modules whose name starts with task_. Tasks are functions in these modules whose name also begins with task_.

Our first task, task_create_random_data, will be defined in src/my_project/task_data_preparation.py, and it will generate artificial data stored in bld/data.pkl.

my_project
├───pyproject.toml
│
├───src
│   └───my_project
│       ├────config.py
│       └────task_data_preparation.py
│
├───setup.py
│
├───.pytask.sqlite3
│
└───bld
    └────data.pkl

Here, we define the function.

# Content of task_data_preparation.py.

import pytask
import numpy as np
import pandas as pd

from my_project.config import BLD


@pytask.mark.produces(BLD / "data.pkl")
def task_create_random_data(produces):
    rng = np.random.default_rng(0)
    beta = 2

    x = rng.normal(loc=5, scale=10, size=1_000)
    epsilon = rng.standard_normal(1_000)

    y = beta * x + epsilon

    df = pd.DataFrame({"x": x, "y": y})
    df.to_pickle(produces)

To let pytask track the product of the task, you need to use the @pytask.mark.produces decorator.

Now, execute pytask to collect tasks in the current and subsequent directories.

$ pytask
──────────────────────────── Start pytask session ────────────────────────────
Platform: win32 -- Python <span style="color: var(--termynal-blue)">3.10.0</span>, pytask <span style="color: var(--termynal-blue)">0.3.0</span>, pluggy <span style="color: var(--termynal-blue)">1.0.0</span>
Root: C:\Users\pytask-dev\git\my_project
Collected <span style="color: var(--termynal-blue)">1</span> task.

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Task                                              ┃ Outcome ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ <span class="termynal-dim">task_data_preparation.py::</span>task_create_random_data │ <span class="termynal-success">.</span>       │
└───────────────────────────────────────────────────┴─────────┘

<span class="termynal-dim">──────────────────────────────────────────────────────────────────────────────</span>
<span class="termynal-success">╭───────────</span> <span style="font-weight: bold;">Summary</span> <span class="termynal-success">────────────╮</span>
<span class="termynal-success">│</span> <span style="font-weight: bold;"> 1  Collected tasks </span>           <span class="termynal-success">│</span>
<span class="termynal-success">│</span> <span class="termynal-success-textonly"> 1  Succeeded        (100.0%) </span> <span class="termynal-success">│</span>
<span class="termynal-success">╰────────────────────────────────╯</span>
<span class="termynal-success">───────────────────────── Succeeded in 0.06 seconds ──────────────────────────</span>

Customize task names#

Use the @pytask.mark.task decorator to mark a function as a task regardless of its function name. You can optionally pass a new name for the task. Otherwise, pytask uses the function name.

# The id will be ".../task_data_preparation.py::create_random_data".


@pytask.mark.task
def create_random_data():
    ...


# The id will be ".../task_data_preparation.py::create_data".


@pytask.mark.task(name="create_data")
def create_random_data():
    ...

Customize task module names#

Use the configuration value task_files if you prefer a different naming scheme for the task modules. task_*.py is the default. You can specify one or multiple patterns to collect tasks from other files.