Write a task
Starting from the project structure in the previous tutorial, you will learn how to write your first task.
pytask looks for tasks in modules whose names start with task_. Tasks are functions in these modules whose names also begin with task_.
Our first task, task_create_random_data, will be defined in src/my_project/task_data_preparation.py, and it will generate artificial data stored in bld/data.pkl.
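The naming rule above can be illustrated with a small sketch. It is not pytask's actual collection code; the helper collect_task_functions and the in-memory module are hypothetical and only mimic the "name starts with task_" check for modules and functions.

```python
import types


def collect_task_functions(module):
    """Return the names of callables in `module` whose name starts with 'task_'.

    Hypothetical helper for illustration; pytask's real collection is more involved.
    """
    return sorted(
        name
        for name, obj in vars(module).items()
        if name.startswith("task_") and callable(obj)
    )


def task_create_random_data():
    ...


def helper():
    ...


# Build an in-memory stand-in for src/my_project/task_data_preparation.py.
module = types.ModuleType("task_data_preparation")
module.task_create_random_data = task_create_random_data
module.helper = helper

# The module qualifies because of its name; only the prefixed function is a task.
is_task_module = module.__name__.startswith("task_")
collected = collect_task_functions(module)
print(is_task_module, collected)
```

Under these assumptions, helper is ignored because its name lacks the task_ prefix, even though it lives in a task module.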
my_project
├───pyproject.toml
│
├───src
│   └───my_project
│       ├────config.py
│       └────task_data_preparation.py
│
├───setup.py
│
├───.pytask.sqlite3
│
└───bld
    └────data.pkl
Here, we define the function.
# Content of task_data_preparation.py.
import numpy as np
import pandas as pd
import pytask

from my_project.config import BLD


@pytask.mark.produces(BLD / "data.pkl")
def task_create_random_data(produces):
    # Seed the generator so the artificial data is reproducible.
    rng = np.random.default_rng(0)
    beta = 2

    # Simulate a linear relationship y = beta * x + noise.
    x = rng.normal(loc=5, scale=10, size=1_000)
    epsilon = rng.standard_normal(1_000)
    y = beta * x + epsilon

    # Store the data at the path declared as the product.
    df = pd.DataFrame({"x": x, "y": y})
    df.to_pickle(produces)
To let pytask track the product of the task, you need to use the @pytask.mark.produces decorator. The path given to the decorator is passed to the function through the produces argument.
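Because a task is an ordinary function that receives the product path as an argument, it can also be called by hand, for example while debugging. The following is a minimal sketch under stated assumptions: it is a standard-library stand-in for the tutorial task (random and pickle instead of numpy and pandas, no pytask decorator), and the temporary directory takes the place of bld.

```python
import pickle
import random
import tempfile
from pathlib import Path


def task_create_random_data(produces):
    # Stand-in for the tutorial task: same linear model, standard library only.
    rng = random.Random(0)
    beta = 2
    x = [rng.gauss(5, 10) for _ in range(1_000)]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    with open(produces, "wb") as f:
        pickle.dump({"x": x, "y": y}, f)


with tempfile.TemporaryDirectory() as tmp:
    # The temporary path plays the role of BLD / "data.pkl".
    target = Path(tmp) / "data.pkl"
    task_create_random_data(produces=target)

    # Check that the product exists and read it back before cleanup.
    product_exists = target.exists()
    with open(target, "rb") as f:
        data = pickle.load(f)

n_rows = len(data["x"])
print(product_exists, n_rows)
```

When pytask runs the real task, it supplies the produces argument itself, so no manual call is needed.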
Now, execute pytask to collect and run the tasks in the current directory and its subdirectories.
$ pytask
──────────────────────────── Start pytask session ────────────────────────────
Platform: win32 -- Python 3.10.0, pytask 0.3.0, pluggy 1.0.0
Root: C:\Users\pytask-dev\git\my_project
Collected 1 task.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Task                                              ┃ Outcome ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ task_data_preparation.py::task_create_random_data │ .       │
└───────────────────────────────────────────────────┴─────────┘
──────────────────────────────────────────────────────────────────────────────
╭─────────── Summary ────────────╮
│ 1  Collected tasks             │
│ 1  Succeeded (100.0%)          │
╰────────────────────────────────╯
───────────────────────── Succeeded in 0.06 seconds ──────────────────────────
Customize task names
Use the @pytask.mark.task decorator to mark a function as a task regardless of its function name. You can optionally pass a new name for the task. Otherwise, pytask uses the function name.
import pytask


# The id will be ".../task_data_preparation.py::create_random_data".
@pytask.mark.task
def create_random_data():
    ...


# The id will be ".../task_data_preparation.py::create_data".
@pytask.mark.task(name="create_data")
def create_random_data():
    ...
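The naming rule in the two comments above can be written down as a short sketch. The function task_id is hypothetical and only mirrors the described behavior (an explicit name wins, otherwise the function name is used); it is not part of pytask.

```python
def task_id(module_path, func, name=None):
    """Build an id of the form '<module path>::<task name>' (illustrative only)."""
    return f"{module_path}::{name or func.__name__}"


def create_random_data():
    ...


# Without an explicit name, the function name is used.
default_id = task_id("task_data_preparation.py", create_random_data)

# With an explicit name, the function name is ignored.
custom_id = task_id("task_data_preparation.py", create_random_data, name="create_data")

print(default_id)
print(custom_id)
```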
Customize task module names
Use the configuration value task_files if you prefer a different naming scheme for the task modules. task_*.py is the default. You can specify one or multiple patterns to collect tasks from other files.
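To get a feel for how one or several patterns select task modules, here is a minimal sketch that assumes glob-style matching on file names. The helper is_task_module, the file names, and the extra pattern *_tasks.py are hypothetical; pytask's own matching may differ in detail.

```python
from fnmatch import fnmatch


def is_task_module(filename, patterns=("task_*.py",)):
    """Return True if `filename` matches at least one glob-style pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


files = ["task_data_preparation.py", "config.py", "analysis_tasks.py"]

# Default pattern: only modules with the task_ prefix are picked up.
default_matches = [f for f in files if is_task_module(f)]

# Two patterns: the default plus a hypothetical suffix-based scheme.
custom_patterns = ("task_*.py", "*_tasks.py")
custom_matches = [f for f in files if is_task_module(f, custom_patterns)]

print(default_matches)
print(custom_matches)
```

In this sketch, config.py is never selected, while analysis_tasks.py is only selected once the second pattern is added.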