Functional Interface

pytask offers a functional interface for users who need more flexibility than the command-line interface provides. It also lets you run pytask from a Python interpreter or a Jupyter notebook, such as the one this article is written in.

Let’s see how it works!

[1]:
from pathlib import Path
from typing import Annotated

import pytask
from pytask import task

Here is a small workflow in which two tasks each create a text file and a third task merges both files into one.

Note that the second task is created from a lambda function, so you can build tasks from dynamically defined functions.

The same pattern makes it easy to wrap a third-party function whose signature you cannot change and still use it as a pytask task.

[2]:
def task_create_first_file() -> Annotated[str, Path("first.txt")]:
    return "Hello, "


task_create_second_file = task(
    name="task_create_second_file", produces=Path("second.txt")
)(lambda *x: "World!")


def task_merge_files(
    first: Path = Path("first.txt"), second: Path = Path("second.txt")
) -> Annotated[str, Path("hello_world.txt")]:
    return first.read_text() + second.read_text()

Now, let us execute this little workflow.

[3]:
session = pytask.build(
    tasks=[task_create_first_file, task_merge_files, task_create_second_file]
)
────────────────────────────────────────────── Start pytask session ───────────────────────────────────────────────
Platform: linux -- Python 3.11.5, pytask 0.4.0, pluggy 1.3.0
Root: /home/tobia/git/pytask
Collected 3 tasks.
╭─────────────────────────┬─────────╮
│ Task                    │ Outcome │
├─────────────────────────┼─────────┤
│ task_create_first_file  │ .       │
│ task_create_second_file │ .       │
│ task_merge_files        │ .       │
╰─────────────────────────┴─────────╯
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭─────────── Summary ────────────╮
│  3  Collected tasks            │
│  3  Succeeded        (100.0%)  │
╰────────────────────────────────╯
──────────────────────────────────────────── Succeeded in 0.19 seconds ────────────────────────────────────────────

All information about the executed workflow is stored in the returned session object.

[4]:
session
[4]:
Session(config={'pm': <pluggy._manager.PluginManager object at 0x7f3c1b438090>, 'markers': {'depends_on': 'Add dependencies to a task. See this tutorial for more information: [link https://bit.ly/3JlxylS]https://bit.ly/3JlxylS[/].', 'filterwarnings': 'Add a filter for a warning to a task.', 'persist': 'Prevent execution of a task if all products exist and even if something has changed (dependencies, source file, products). This decorator might be useful for expensive tasks where only the formatting of the file has changed. The state of the files which have changed will also be remembered and another run will skip the task with success.', 'produces': 'Add products to a task. See this tutorial for more information: [link https://bit.ly/3JlxylS]https://bit.ly/3JlxylS[/].', 'skip': 'Skip a task and all its dependent tasks.', 'skip_ancestor_failed': 'Internal decorator applied to tasks if any of its preceding tasks failed.', 'skip_unchanged': 'Internal decorator applied to tasks which have already been executed and have not been changed.', 'skipif': 'Skip a task and all its dependent tasks if a condition is met.', 'task': 'Mark a function as a task regardless of its name. Or mark tasks which are repeated in a loop. 
See this tutorial for more information: [link https://bit.ly/3DWrXS3]https://bit.ly/3DWrXS3[/].', 'try_first': 'Try to execute a task a early as possible.', 'try_last': 'Try to execute a task a late as possible.'}, 'config': None, 'database_url': sqlite:////home/tobia/git/pytask/.pytask/.pytask.sqlite3, 'editor_url_scheme': 'file', 'export': <_ExportFormats.NO: 'no'>, 'ignore': ['.codecov.yml', '.gitignore', '.pre-commit-config.yaml', '.readthedocs.yml', '.readthedocs.yaml', 'readthedocs.yml', 'readthedocs.yaml', 'environment.yml', 'pyproject.toml', 'tox.ini', '.git/*', '.venv/*', '*.egg-info/*', '.ipynb_checkpoints/*', '.mypy_cache/*', '.nox/*', '.tox/*', '_build/*', '__pycache__/*', 'build/*', 'dist/*', 'pytest_cache/*'], 'paths': [], 'layout': 'dot', 'output_path': 'dag.pdf', 'rank_direction': <_RankDirection.TB: 'TB'>, 'expression': '', 'marker_expression': '', 'nodes': False, 'strict_markers': False, 'directories': False, 'exclude': [None, '.git/*'], 'mode': <_CleanMode.DRY_RUN: 'dry-run'>, 'quiet': False, 'capture': <CaptureMethod.NO: 'no'>, 'debug_pytask': False, 'disable_warnings': False, 'dry_run': False, 'force': False, 'max_failures': inf, 'n_entries_in_table': 15, 'pdb': False, 'pdbcls': None, 's': False, 'show_capture': True, 'show_errors_immediately': False, 'show_locals': False, 'show_traceback': True, 'sort_table': True, 'trace': False, 'verbose': 1, 'stop_after_first_failure': False, 'check_casing_of_paths': True, 'pdb_cls': '', 'tasks': [<function task_create_first_file at 0x7f3c1b55f6a0>, <function task_merge_files at 0x7f3c1b407e20>, <function <lambda> at 0x7f3c1b407d80>], 'task_files': ['task_*.py'], 'command': 'build', 'root': PosixPath('/home/tobia/git/pytask'), 'filterwarnings': []}, hook=<pluggy._hooks.HookRelay object at 0x7f3c3c31bbc0>, collection_reports=[CollectionReport(outcome=<CollectionOutcome.SUCCESS: 1>, node=TaskWithoutPath(name='task_create_first_file', function=<function task_create_first_file at 0x7f3c1b55f6a0>, depends_on={}, 
produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/first.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/first.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.0767577, 1696055304.077608)}), exc_info=None), CollectionReport(outcome=<CollectionOutcome.SUCCESS: 1>, node=TaskWithoutPath(name='task_merge_files', function=<function task_merge_files at 0x7f3c1b407e20>, depends_on={'first': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/first.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/first.txt')), 'second': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/second.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/second.txt'))}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/hello_world.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/hello_world.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.123595, 1696055304.1244528)}), exc_info=None), CollectionReport(outcome=<CollectionOutcome.SUCCESS: 1>, node=TaskWithoutPath(name='task_create_second_file', function=<function <lambda> at 0x7f3c1b407d80>, depends_on={}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/second.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/second.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.025182, 1696055304.0267167)}), exc_info=None)], tasks=[TaskWithoutPath(name='task_create_first_file', function=<function task_create_first_file at 0x7f3c1b55f6a0>, depends_on={}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/first.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/first.txt'))}, 
markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.0767577, 1696055304.077608)}), TaskWithoutPath(name='task_merge_files', function=<function task_merge_files at 0x7f3c1b407e20>, depends_on={'first': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/first.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/first.txt')), 'second': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/second.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/second.txt'))}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/hello_world.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/hello_world.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.123595, 1696055304.1244528)}), TaskWithoutPath(name='task_create_second_file', function=<function <lambda> at 0x7f3c1b407d80>, depends_on={}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/second.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/second.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.025182, 1696055304.0267167)})], dag=<networkx.classes.digraph.DiGraph object at 0x7f3c1b440810>, resolving_dependencies_report=None, execution_reports=[ExecutionReport(task=TaskWithoutPath(name='task_create_second_file', function=<function <lambda> at 0x7f3c1b407d80>, depends_on={}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/second.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/second.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.025182, 1696055304.0267167)}), outcome=<TaskOutcome.SUCCESS: 1>, exc_info=None, sections=[]), 
ExecutionReport(task=TaskWithoutPath(name='task_create_first_file', function=<function task_create_first_file at 0x7f3c1b55f6a0>, depends_on={}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/first.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/first.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.0767577, 1696055304.077608)}), outcome=<TaskOutcome.SUCCESS: 1>, exc_info=None, sections=[]), ExecutionReport(task=TaskWithoutPath(name='task_merge_files', function=<function task_merge_files at 0x7f3c1b407e20>, depends_on={'first': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/first.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/first.txt')), 'second': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/second.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/second.txt'))}, produces={'return': PathNode(name='/home/tobia/git/pytask/docs/source/how_to_guides/hello_world.txt', path=PosixPath('/home/tobia/git/pytask/docs/source/how_to_guides/hello_world.txt'))}, markers=[Mark(name='task', args=(), kwargs={})], report_sections=[], attributes={'duration': (1696055304.123595, 1696055304.1244528)}), outcome=<TaskOutcome.SUCCESS: 1>, exc_info=None, sections=[])], exit_code=<ExitCode.OK: 0>, collection_start=1696055303.989013, collection_end=1696055303.9959698, execution_start=1696055304.0121965, execution_end=1696055304.207084, n_tasks_failed=0, scheduler=TopologicalSorter(dag=<networkx.classes.digraph.DiGraph object at 0x7f3c1b4a76d0>, priorities={'task_create_first_file': 0, 'task_merge_files': 0, 'task_create_second_file': 0}, _dag_backup=<networkx.classes.digraph.DiGraph object at 0x7f3c1b4a2e10>, _is_prepared=True, _nodes_out=set()), should_stop=False, warnings=[])
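
The repr above is hard to read, so it can help to pull out only the fields you care about. The following is a minimal sketch, not part of pytask: the helper names `summarize_session` and `build_succeeded` are hypothetical, and the code relies only on attributes visible in the output above (`execution_reports`, `task.name`, `outcome`, `task.attributes["duration"]`, and `exit_code`). It assumes that the `duration` tuple holds a start and an end timestamp in seconds.

```python
from typing import Any, Optional


def summarize_session(session: Any) -> dict[str, dict[str, Any]]:
    """Map each executed task's name to its outcome and run time.

    Hypothetical helper: it reads only attributes that appear in the
    session repr shown above.
    """
    summary: dict[str, dict[str, Any]] = {}
    for report in session.execution_reports:
        # Assumption: "duration" is a (start, end) tuple of timestamps.
        duration = report.task.attributes.get("duration")
        seconds: Optional[float] = None
        if duration:
            seconds = duration[1] - duration[0]
        summary[report.task.name] = {
            "outcome": report.outcome.name,  # e.g. "SUCCESS"
            "seconds": seconds,
        }
    return summary


def build_succeeded(session: Any) -> bool:
    """Return True if the session ended with the exit code named OK."""
    return session.exit_code.name == "OK"
```

Called with the session from cell [3], `summarize_session(session)` should return one entry per task, and `build_succeeded(session)` should be true because the exit code shown above is `ExitCode.OK`.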

Configuring the build

To configure the build, pytask.build accepts many more options, which mirror the ones available on the command line.

[5]:
pytask.build?
Signature:
pytask.build(
    *,
    capture: "Literal['fd', 'no', 'sys', 'tee-sys'] | CaptureMethod" = <CaptureMethod.NO: 'no'>,
    check_casing_of_paths: 'bool' = True,
    config: 'Path | None' = None,
    database_url: 'str' = '',
    debug_pytask: 'bool' = False,
    disable_warnings: 'bool' = False,
    dry_run: 'bool' = False,
    editor_url_scheme: "Literal['no_link', 'file', 'vscode', 'pycharm'] | str" = 'file',
    expression: 'str' = '',
    force: 'bool' = False,
    ignore: 'Iterable[str]' = (),
    marker_expression: 'str' = '',
    max_failures: 'float' = inf,
    n_entries_in_table: 'int' = 15,
    paths: 'str | Path | Iterable[str | Path]' = (),
    pdb: 'bool' = False,
    pdb_cls: 'str' = '',
    s: 'bool' = False,
    show_capture: 'bool' = True,
    show_errors_immediately: 'bool' = False,
    show_locals: 'bool' = False,
    show_traceback: 'bool' = True,
    sort_table: 'bool' = True,
    stop_after_first_failure: 'bool' = False,
    strict_markers: 'bool' = False,
    tasks: 'Callable[..., Any] | PTask | Iterable[Callable[..., Any] | PTask]' = (),
    task_files: 'str | Iterable[str]' = 'task_*.py',
    trace: 'bool' = False,
    verbose: 'int' = 1,
    **kwargs: 'Any',
) -> 'Session'
Docstring:
Run pytask.

This is the main command to run pytask which usually receives kwargs from the
command line interface. It can also be used to run pytask interactively. Pass
configuration in a dictionary.

Parameters
----------
capture
    The capture method for stdout and stderr.
check_casing_of_paths
    Whether errors should be raised when file names have different casings.
config
    A path to the configuration file.
database_url
    A URL to the database that tracks the status of tasks.
debug_pytask
    Whether debug information should be shown.
disable_warnings
    Whether warnings should be disabled and not displayed.
dry_run
    Whether a dry-run should be performed that shows which tasks need to be rerun.
editor_url_scheme
    A URL scheme that allows you to click on task names, node names, and filenames
    and jump right into your preferred editor at the right line.
expression
    Same as ``-k`` on the command line. Select tasks via expressions on task ids.
force
    Run tasks even though they would be skipped since nothing has changed.
ignore
    A pattern to ignore files or directories. Refer to ``pathlib.Path.match``
    for more info.
marker_expression
    Same as ``-m`` on the command line. Select tasks via marker expressions.
max_failures
    Stop after some failures.
n_entries_in_table
    How many entries to display in the table during the execution. Tasks which are
    running are always displayed.
paths
    A path or collection of paths where pytask looks for the configuration and
    tasks.
pdb
    Start the interactive debugger on errors.
pdb_cls
    Start a custom debugger on errors. For example:
    ``--pdbcls=IPython.terminal.debugger:TerminalPdb``
s
    Shortcut for ``pytask.build(capture="no")``.
show_capture
    Choose which captured output should be shown for failed tasks.
show_errors_immediately
    Show errors with tracebacks as soon as the task fails.
show_locals
    Show local variables in tracebacks.
show_traceback
    Choose whether tracebacks should be displayed or not.
sort_table
    Sort the table of tasks at the end of the execution.
stop_after_first_failure
    Stop after the first failure.
strict_markers
    Raise errors for unknown markers.
tasks
    A task or a collection of tasks that is passed to ``pytask.build(tasks=...)``.
task_files
    A pattern to describe modules that contain tasks.
trace
    Enter debugger in the beginning of each task.
verbose
    Make pytask verbose (>= 0) or quiet (= 0).

Returns
-------
session : pytask.Session
    The session captures all the information of the current run.
File:      ~/git/pytask/src/_pytask/build.py
Type:      function
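
As an illustration of how these options can be combined, here is a small sketch that bundles a few keyword arguments from the signature above and forwards them to pytask.build. The names `DEBUG_OPTIONS`, `debug_options`, and `build_for_debugging` are hypothetical and chosen only for this example. The option names themselves (`force`, `stop_after_first_failure`, `show_locals`) are taken from the signature shown above.

```python
from typing import Any, Callable, Iterable

# Keyword arguments taken from the signature of pytask.build shown above.
DEBUG_OPTIONS: dict[str, Any] = {
    "force": True,  # run tasks even though nothing has changed
    "stop_after_first_failure": True,  # stop after the first failure
    "show_locals": True,  # show local variables in tracebacks
}


def debug_options(**overrides: Any) -> dict[str, Any]:
    """Return a copy of DEBUG_OPTIONS with caller overrides applied."""
    return {**DEBUG_OPTIONS, **overrides}


def build_for_debugging(
    tasks: Iterable[Callable[..., Any]], **overrides: Any
) -> Any:
    """Run pytask.build on the given tasks with the debugging options."""
    # Imported inside the function so that the helpers above can be
    # defined and inspected before pytask is needed.
    import pytask

    return pytask.build(tasks=list(tasks), **debug_options(**overrides))
```

With the tasks defined earlier, a call such as `build_for_debugging([task_create_first_file], verbose=0)` would presumably rerun the task even if its product is up to date, since `force=True` is passed through.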
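
Finally, remove the files that the workflow created.

[6]:
# Clean up the files created by the workflow above.
for name in ("first.txt", "second.txt", "hello_world.txt"):
    # missing_ok=True avoids an error if a file was already removed.
    Path(name).unlink(missing_ok=True)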