API#

Command line interface#

To extend pytask’s command line interface and set the right types for your options, pytask offers the following functionalities.

Classes#

class pytask.ColoredCommand(name: str | None, context_settings: MutableMapping[str, Any] | None = None, callback: Callable[[...], Any] | None = None, params: List[Parameter] | None = None, help: str | None = None, epilog: str | None = None, short_help: str | None = None, options_metavar: str | None = '[OPTIONS]', add_help_option: bool = True, no_args_is_help: bool = False, hidden: bool = False, deprecated: bool = False)[source]#: A command with colored help pages.

class pytask.ColoredGroup(*args, **kwargs)[source]#: A command group with colored help pages.

class pytask.EnumChoice(enum_type: type[Enum], case_sensitive: bool = True)[source]#

An enum-based choice type.

The implementation is copied from https://github.com/pallets/click/pull/2210 and related discussion can be found in https://github.com/pallets/click/issues/605.

In contrast to using click.Choice, using this type ensures that the error message does not show the enum members.

In contrast to the implementation proposed in the PR, this implementation uses the values of the enum rather than its members.
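The difference between matching on member names and matching on values can be sketched with the standard library alone. The enum `ShowCapture` and the helper `convert` below are hypothetical names invented for this illustration; they are not part of pytask or click, and the real type additionally integrates with click's parameter handling.

```python
from __future__ import annotations

from enum import Enum


class ShowCapture(Enum):
    """Hypothetical enum used only for this illustration."""

    NO = "no"
    STDOUT = "stdout"
    ALL = "all"


def convert(value: str, enum_type: type[Enum], case_sensitive: bool = True) -> Enum:
    """Map a command-line string to an enum member by comparing enum values."""
    for member in enum_type:
        candidate = str(member.value)
        if candidate == value:
            return member
        if not case_sensitive and candidate.lower() == value.lower():
            return member
    # The error message lists the values, not the member objects.
    choices = ", ".join(str(member.value) for member in enum_type)
    raise ValueError(f"{value!r} is not one of: {choices}")


selected = convert("stdout", ShowCapture)
```

Because the comparison runs against the values, a user types `stdout` on the command line rather than the member name `STDOUT`.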

Compatibility#

pytask.check_for_optional_program(name: str, extra: str = '', errors: str = 'raise', caller: str = 'pytask') → bool | None[source]#: Check whether an optional program exists.

pytask.import_optional_dependency(name: str, extra: str = '', errors: str = 'raise', min_version: str | None = None, caller: str = 'pytask') → module | None[source]#

Import an optional dependency.

By default, if a dependency is missing, an ImportError with a helpful message is raised. If a dependency is present but too old, an error is raised as well.

Parameters:

name – The module name.
extra – Additional text to include in the ImportError message.
errors –
What to do when a dependency is not found or its version is too old.
- raise : Raise an ImportError
- warn : Only applicable when a module’s version is too old. Warns that the version is too old and returns None
- ignore: If the module is not installed, return None, otherwise, return the module, even if the version is too old. It’s expected that users validate the version locally when using errors="ignore" (see io/html.py)
min_version – Specify a minimum version that is different from the global pandas minimum version required.
caller – The caller of the function.

Returns:

The imported module, when found and the version is correct. None is returned when the package is not found and errors is 'ignore', or when the package’s version is too old and errors is 'warn'.

Return type:

types.ModuleType | None
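The behavior of the `errors` argument for a missing module can be sketched as follows. The function `import_optional` is a hypothetical stand-in written for this illustration, not pytask's implementation; it deliberately leaves out the `min_version` check and only covers the case where the module cannot be imported.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def import_optional(
    name: str, extra: str = "", errors: str = "raise"
) -> ModuleType | None:
    """Hypothetical stand-in that covers only the missing-module case."""
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unknown value for errors: {errors!r}")
    try:
        return importlib.import_module(name)
    except ImportError:
        if errors == "raise":
            message = f"Missing optional dependency {name!r}. {extra}".strip()
            raise ImportError(message) from None
        # With "warn" or "ignore", a missing module yields None.
        return None


json_module = import_optional("json")
missing = import_optional("module_that_does_not_exist_123", errors="ignore")
```

Callers that pass `errors="ignore"` have to check for `None` before using the result.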

Console#

To write to the terminal, use pytask’s console.

class pytask.console#

Marks#

pytask uses marks to attach additional information to task functions which is processed by the host or by plugins. The following marks are available by default.

Marks#

pytask.mark.depends_on(objects: Any | Iterable[Any] | dict[Any, Any])#

Specify dependencies for a task.

Parameters:: objects (Any | Iterable[Any] | dict[Any, Any]) – Can be any valid Python object or an iterable of any Python objects. To be valid, it must be parsed by some hook implementation for the _pytask.hookspecs.pytask_collect_node() entry-point.

pytask.mark.parametrize(arg_names, arg_values, *, ids)#

Parametrize a task function.

Parametrizing a task allows you to execute the same task with different arguments.

Parameters:

arg_names (str | list[str] | tuple[str, ...]) – The names of the arguments which can either be given as a comma-separated string, a tuple of strings, or a list of strings.
arg_values (Iterable[Sequence[Any] | Any]) – The values which correspond to names in arg_names. For one argument, it is a single iterable. For multiple argument names it is an iterable of iterables.
ids (None | (Iterable[None | str | float | int | bool] | Callable[..., Any])) –
This argument can either be a list with ids or a function which is called with every value passed to the parametrized function.

If you pass an iterable with ids, make sure to only use bool, float, int, or str as values which are used to create task ids like "task_dummy.py::task_dummy[first_task_id]".

If you pass a function, the function receives each value of the parametrization and may return a boolean, number, string or None. For the latter, the auto-generated value is used.
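How `arg_names` and `arg_values` pair up can be illustrated with a small helper. `pair_arguments` is a hypothetical function written for this sketch and is not how pytask implements parametrization; it only mirrors the contract described above: names may be a comma-separated string, a tuple, or a list, and values are a single iterable for one name or an iterable of iterables for several.

```python
from __future__ import annotations

from typing import Any, Iterable, Sequence


def pair_arguments(
    arg_names: str | list[str] | tuple[str, ...],
    arg_values: Iterable[Sequence[Any] | Any],
) -> list[dict[str, Any]]:
    """Hypothetical helper: build one kwargs dictionary per parametrization."""
    if isinstance(arg_names, str):
        names = [name.strip() for name in arg_names.split(",") if name.strip()]
    else:
        names = list(arg_names)

    combinations = []
    for values in arg_values:
        # With a single argument name, each element is the value itself.
        row = (values,) if len(names) == 1 else tuple(values)
        if len(row) != len(names):
            raise ValueError(f"Expected {len(names)} values, got {len(row)}.")
        combinations.append(dict(zip(names, row)))
    return combinations


single = pair_arguments("i", range(3))
multiple = pair_arguments("i, produces", [(0, "a.txt"), (1, "b.txt")])
```

Each resulting dictionary corresponds to one execution of the task function with those keyword arguments.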

pytask.mark.persist()#: A marker for a task which should be persisted.

pytask.mark.produces(objects: Any | Iterable[Any] | dict[Any, Any])#

Specify products of a task.

Parameters:: objects (Any | Iterable[Any] | dict[Any, Any]) – Can be any valid Python object or an iterable of any Python objects. To be valid, it must be parsed by some hook implementation for the _pytask.hookspecs.pytask_collect_node() entry-point.

pytask.mark.skipif(condition: bool, *, reason: str)#

Skip a task based on a condition and provide a necessary reason.

Parameters:

condition (bool) – A condition for when the task is skipped.
reason (str) – A reason why the task is skipped.

pytask.mark.skip_ancestor_failed(reason: str = 'No reason provided')#

An internal marker for a task which is skipped because an ancestor failed.

Parameters:: reason (str) – A reason why the task is skipped.

pytask.mark.skip_unchanged()#

An internal marker for a task which is skipped because nothing has changed.

Parameters:: reason (str) – A reason why the task is skipped.

pytask.mark.skip()#: Skip a task.

pytask.mark.task(name, *, id, kwargs)#

The task decorator allows you to mark any function as a task regardless of its name, or to assign a new task name.

It also allows you to repeat tasks in for-loops by adding a specific id or keyword arguments via kwargs.

Parameters:

name (str | None) – The name of the task.
id (str | None) – An id for the task if it is part of a parametrization.
kwargs (dict[Any, Any] | None) – A dictionary containing keyword arguments which are passed to the task when it is executed.

pytask.mark.try_first()#

Indicate that the task should be executed as soon as possible.

This indicator is a soft measure to influence the execution order of pytask.

Important

This indicator is not intended for general use to influence the build order and to overcome misspecification of task dependencies and products.

It should only be applied to situations where it is hard to define all dependencies and products and automatic inference may be incomplete like with pytask-latex and latex-dependency-scanner.

pytask.mark.try_last()#

Indicate that the task should be executed as late as possible.

This indicator is a soft measure to influence the execution order of pytask.

Important

This indicator is not intended for general use to influence the build order and to overcome misspecification of task dependencies and products.

It should only be applied to situations where it is hard to define all dependencies and products and automatic inference may be incomplete like with pytask-latex and latex-dependency-scanner.

Custom marks#

Marks are created dynamically using the factory object pytask.mark and applied as a decorator.

For example:

@pytask.mark.timeout(10, "slow", method="thread")
def task_function():
    ...

This creates a Mark object and attaches it to the markers attribute of the collected Task. The mark object will have the following attributes:

mark.args == (10, "slow")
mark.kwargs == {"method": "thread"}

Example for using multiple custom markers:

@pytask.mark.timeout(10, "slow", method="thread")
@pytask.mark.slow
def task_function():
    ...

Classes#

class pytask.Mark(name: str, args: tuple[Any, ...], kwargs: Mapping[str, Any])[source]#: A class for a mark containing the name, positional and keyword arguments.

pytask.mark#: alias of <_pytask.mark.structures.MarkGenerator object>

class pytask.MarkDecorator(mark: Mark)[source]#

A decorator for applying a mark to a task function.

Decorators are created with pytask.mark.

mark1 = pytask.mark.NAME  # Simple MarkDecorator
mark2 = pytask.mark.NAME(name1=value)  # Parametrized MarkDecorator

and can then be applied as decorators to task functions

@mark2
def task_function():
    pass

When a MarkDecorator is called it does the following:

If called with a single function as its only positional argument and no additional keyword arguments, it attaches the mark to the function, containing all the arguments already stored internally in the MarkDecorator.
When called in any other case, it returns a new MarkDecorator instance with the original MarkDecorator’s content updated with the arguments passed to this call.

Notes

The rules above prevent decorators from storing only a single function or class reference as their positional argument with no additional keyword or positional arguments. You can work around this by using MarkDecorator.with_args().
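The two call rules and the `with_args()` workaround can be sketched with a simplified re-implementation. `SimpleMark` and `SimpleMarkDecorator` are hypothetical stand-ins, not pytask's classes, and storing the marks in a `markers` attribute on the function is an assumption made to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SimpleMark:
    """Stand-in for a mark: a name plus positional and keyword arguments."""

    name: str
    args: tuple[Any, ...] = ()
    kwargs: dict[str, Any] = field(default_factory=dict)


class SimpleMarkDecorator:
    """Stand-in that follows the two call rules described above."""

    def __init__(self, mark: SimpleMark) -> None:
        self.mark = mark

    def with_args(self, *args: Any, **kwargs: Any) -> SimpleMarkDecorator:
        # Always store the arguments, even if the only one is a function.
        merged = SimpleMark(
            self.mark.name, self.mark.args + args, {**self.mark.kwargs, **kwargs}
        )
        return SimpleMarkDecorator(merged)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if len(args) == 1 and not kwargs and callable(args[0]):
            # Rule 1: a lone callable is decorated with the stored mark.
            function = args[0]
            function.markers = [*getattr(function, "markers", []), self.mark]
            return function
        # Rule 2: anything else returns a new decorator with updated content.
        return self.with_args(*args, **kwargs)


timeout = SimpleMarkDecorator(SimpleMark("timeout"))


@timeout(10, "slow", method="thread")
def task_function():
    ...
```

In this sketch, `timeout(task_function)` decorates the function, whereas `timeout.with_args(task_function)` returns a new decorator that stores the function as a positional argument.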

class pytask.MarkGenerator[source]#

Factory for MarkDecorator objects.

Exposed as a pytask.mark singleton instance.

Example

>>> import pytask

>>> @pytask.mark.skip
... def task_function():
...    pass

applies a ‘skip’ Mark on task_function.

Functions#

These functions help you to handle marks.

pytask.get_all_marks(obj_or_task: Any | Task) → list[Mark][source]#: Get all marks from a callable or task.

pytask.get_marks(obj_or_task: Any | Task, marker_name: str) → list[Mark][source]#: Get marks from callable or task.

pytask.has_mark(obj_or_task: Any | Task, marker_name: str) → bool[source]#: Test if callable or task has a certain mark.

pytask.remove_marks(obj_or_task: Any | Task, marker_name: str) → tuple[Any | Task, list[Mark]][source]#: Remove marks from callable or task.

pytask.set_marks(obj_or_task: Any | Task, marks: list[Mark]) → Any | Task[source]#: Set marks on a callable or task.

Exceptions#

All exceptions inherit from

class pytask.PytaskError[source]#: Base exception for pytask which should be inherited by all other exceptions.

The following exceptions can be used to interrupt pytask’s flow, emit reduced tracebacks and return the correct exit codes.

class pytask.CollectionError[source]#: Exception during collection.

class pytask.ConfigurationError[source]#: Exception during the configuration.

class pytask.ExecutionError[source]#: Exception during execution.

class pytask.ResolvingDependenciesError[source]#: Exception during resolving dependencies.

The remaining exceptions convey specific errors.

class pytask.NodeNotCollectedError[source]#: Exception for nodes which could not be collected.

class pytask.NodeNotFoundError[source]#: Exception for missing dependencies.

General classes#

class pytask.Session(config: dict[str, Any] = _Nothing.NOTHING, hook: _HookRelay | None = None, collection_reports: list[CollectionReport] = _Nothing.NOTHING, tasks: list[Task] = _Nothing.NOTHING, dag: nx.DiGraph | None = None, resolving_dependencies_report: DagReport | None = None, execution_reports: list[ExecutionReport] = _Nothing.NOTHING, exit_code: ExitCode = ExitCode.OK, collection_start: float | None = None, collection_end: float | None = None, execution_start: float | None = None, execution_end: float | None = None, n_tasks_failed: int = 0, scheduler: Any = None, should_stop: bool = False, warnings: list[WarningReport] = _Nothing.NOTHING)[source]#: The session of pytask.

Nodes#

Nodes are the interface for different kinds of dependencies or products. They inherit from pytask.MetaNode.

class pytask.MetaNode[source]#: Meta class for nodes.

Then, different kinds of nodes can be implemented.

class pytask.FilePathNode(*, name: str, value: Path, path: Path)[source]#: The class for a node which is a path.

To parse dependencies and products from nodes, use the following functions.

pytask.depends_on(objects: Any | Iterable[Any] | dict[Any, Any]) → Any | Iterable[Any] | dict[Any, Any][source]#

Specify dependencies for a task.

Parameters:: objects – Can be any valid Python object or an iterable of any Python objects. To be valid, it must be parsed by some hook implementation for the _pytask.hookspecs.pytask_collect_node() entry-point.

pytask.parse_nodes(session: Session, path: Path, name: str, obj: Any, parser: Callable[..., Any]) → Any[source]#: Parse nodes from object.

pytask.produces(objects: Any | Iterable[Any] | dict[Any, Any]) → Any | Iterable[Any] | dict[Any, Any][source]#

Specify products of a task.

Parameters:: objects – Can be any valid Python object or an iterable of any Python objects. To be valid, it must be parsed by some hook implementation for the _pytask.hookspecs.pytask_collect_node() entry-point.

Tasks#

Tasks are currently represented by the following class:

class pytask.Task(*, base_name: str, path: Path, function: Callable[..., Any], depends_on: dict[str, MetaNode] = _Nothing.NOTHING, produces: dict[str, MetaNode] = _Nothing.NOTHING, markers: list[Mark] = _Nothing.NOTHING, kwargs: dict[str, Any] = _Nothing.NOTHING, report_sections: list[tuple[str, str, str]] = _Nothing.NOTHING, attributes: dict[Any, Any] = _Nothing.NOTHING)[source]#: The class for tasks which are Python functions.

Currently, there are no different types of tasks since changing the .function attribute with a custom callable proved to be sufficient.

To carry over information from user-defined tasks like task functions to pytask.Task objects, use a metadata object that is stored in a .pytask_meta attribute of the task function.

class pytask.CollectionMetadata(id_: str | None = None, kwargs: dict[str, Any] = _Nothing.NOTHING, markers: list[Mark] = _Nothing.NOTHING, name: str | None = None)[source]#: A class for carrying metadata from functions to tasks.
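The idea of carrying metadata on the function itself can be sketched as follows. `Metadata` is a hypothetical stand-in that copies the field names from the signature above, and `attach_metadata` is an invented helper; how pytask reads the attribute during collection is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Metadata:
    """Stand-in with the same field names as the signature above."""

    id_: str | None = None
    kwargs: dict[str, Any] = field(default_factory=dict)
    markers: list[Any] = field(default_factory=list)
    name: str | None = None


def attach_metadata(function: Callable[..., Any], **fields: Any) -> Callable[..., Any]:
    """Hypothetical helper: store metadata in a .pytask_meta attribute."""
    function.pytask_meta = Metadata(**fields)
    return function


def task_example(number):
    return number * 2


attach_metadata(task_example, name="double", kwargs={"number": 21})

# Later, the stored keyword arguments can be read back and applied.
meta = task_example.pytask_meta
result = task_example(**meta.kwargs)
```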

Outcomes#

The exit code of pytask is determined by

class pytask.ExitCode(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Exit codes for pytask.

OK = 0#: Tasks were executed successfully.

FAILED = 1#: Failed while executing tasks.

COLLECTION_FAILED = 3#: Failed while collecting tasks.

DAG_FAILED = 4#: Failed while building the DAG.
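A script that launches pytask as a subprocess could translate the return code back into a readable status. The enum below is a local mirror of the values documented above, defined here so the sketch runs without pytask installed; `describe` is a hypothetical helper.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Local mirror of the documented exit codes."""

    OK = 0
    FAILED = 1
    COLLECTION_FAILED = 3
    DAG_FAILED = 4


def describe(returncode: int) -> str:
    """Translate a process return code into a short description."""
    try:
        code = ExitCode(returncode)
    except ValueError:
        return f"unknown exit code {returncode}"
    return code.name.lower().replace("_", " ")


status = describe(3)
```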

Collected items can have the following outcomes

class pytask.CollectionOutcome(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Outcomes of collected files or tasks.

FAIL#: Outcome for failed collected files or tasks.

SUCCESS#: Outcome for task which was executed successfully.

Tasks can have the following outcomes

class pytask.TaskOutcome(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Outcomes of tasks.

FAIL#: Outcome for failed tasks.

PERSISTENCE#: Outcome for tasks which should persist. Even if dependencies or products have changed, skip the task, update all hashes to the new ones, mark it as successful.

SKIP#: Outcome for skipped tasks.

SKIP_PREVIOUS_FAILED#: Outcome for tasks where a necessary preceding task has failed and, thus, this task could not have been executed.

SKIP_UNCHANGED#: Outcome for tasks which do not need to be executed since all dependencies, source files and products have not changed.

SUCCESS#: Outcome for task which was executed successfully.

The following exceptions are used to abort the execution of a task with an appropriate outcome.

class pytask.Exit(msg: str = 'unknown reason', returncode: int | None = None)[source]#: Raised for immediate program exits (no tracebacks/summaries).

class pytask.Persisted[source]#: Outcome if task should persist.

class pytask.Skipped[source]#: Outcome if task is skipped.

class pytask.SkippedAncestorFailed[source]#: Outcome if an ancestor failed.

class pytask.SkippedUnchanged[source]#: Outcome if task has run before and is unchanged.

Functions#

pytask.count_outcomes(reports: Sequence[CollectionReport | ExecutionReport], outcome_enum: type[CollectionOutcome] | type[TaskOutcome]) → dict[Enum, int][source]#

Count how often an outcome occurred.

Examples

>>> from _pytask.outcomes import CollectionOutcome, TaskOutcome
>>> count_outcomes([], CollectionOutcome)
{<CollectionOutcome.SUCCESS: 1>: 0, <CollectionOutcome.FAIL: 2>: 0}
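The counting logic can be sketched with stand-in types. `Outcome`, `Report`, and `count` are hypothetical names for this illustration and not pytask's implementation; the sketch only assumes that every report exposes an `outcome` attribute and that outcomes which never occur are still listed with a count of zero, as in the example above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    """Stand-in for an outcome enum."""

    SUCCESS = auto()
    FAIL = auto()


@dataclass
class Report:
    """Stand-in for a report that carries an outcome."""

    outcome: Outcome


def count(reports, outcome_enum):
    """Return a count for every member of the enum, including zeros."""
    counts = Counter(report.outcome for report in reports)
    return {outcome: counts[outcome] for outcome in outcome_enum}


reports = [Report(Outcome.SUCCESS), Report(Outcome.SUCCESS), Report(Outcome.FAIL)]
summary = count(reports, Outcome)
```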

Programmatic Interfaces#

pytask.build_dag(raw_config: dict[str, Any]) → DiGraph[source]#

Build the DAG.

This function is the programmatic interface to pytask dag and returns a preprocessed pygraphviz.AGraph which makes plotting easier than with matplotlib.

To change the style of the graph, it might be easier to convert the graph back to networkx, set attributes, and convert back to pygraphviz.

Parameters:: raw_config (Dict[str, Any]) – The configuration usually received from the CLI. For example, use {"paths": "example-directory/"} to collect tasks from a directory.
Returns:: A preprocessed graph which can be customized and exported.
Return type:: pygraphviz.AGraph

pytask.main(raw_config: dict[str, Any]) → Session[source]#

Run pytask.

This is the main command to run pytask which usually receives kwargs from the command line interface. It can also be used to run pytask interactively. Pass configuration in a dictionary.

Parameters:: raw_config (dict[str, Any]) – A dictionary with options passed to pytask. In general, this dictionary holds the information passed via the command line interface.
Returns:: session – The session captures all the information of the current run.
Return type:: _pytask.session.Session

Reports#

There are some classes to handle different kinds of reports.

class pytask.CollectionReport(outcome: CollectionOutcome, node: MetaNode | None = None, exc_info: ExceptionInfo | None = None)[source]#: A collection report for a task.

class pytask.ExecutionReport(task: Task, outcome: TaskOutcome, exc_info: ExceptionInfo | None = None, sections: list[tuple[str, str, str]] = _Nothing.NOTHING)[source]#: A report for an executed task.

class pytask.DagReport(exc_info: Tuple[Type[BaseException], BaseException, TracebackType | None])[source]#: A report for an error during the creation of the DAG.

Tracebacks#

pytask.format_exception_without_traceback(exc_info: Tuple[Type[BaseException], BaseException, TracebackType | None]) → str[source]#: Format an exception without displaying the traceback.

pytask.remove_internal_traceback_frames_from_exc_info(exc_info: Tuple[Type[BaseException], BaseException, TracebackType | None]) → Tuple[Type[BaseException], BaseException, TracebackType | None][source]#

Remove internal traceback frames from exception info.

If a non-internal traceback frame is found, return the traceback from the first occurrence downwards.

pytask.remove_traceback_from_exc_info(exc_info: Tuple[Type[BaseException], BaseException, TracebackType | None]) → Tuple[Type[BaseException], BaseException, TracebackType | None][source]#: Remove traceback from exception.

pytask.render_exc_info(exc_type: type[BaseException], exc_value: BaseException, traceback: str | TracebackType, show_locals: bool = False) → str | Traceback[source]#: Render an exception info.
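What formatting an exception without its traceback and dropping the traceback from exception info mean can be sketched with the standard library. The helpers `format_without_traceback` and `drop_traceback` are hypothetical and only approximate the idea; pytask's own functions may differ in detail.

```python
import sys
import traceback


def format_without_traceback(exc_info) -> str:
    """Render only the final 'ExceptionType: message' part."""
    exc_type, exc_value, _ = exc_info
    return "".join(traceback.format_exception_only(exc_type, exc_value))


def drop_traceback(exc_info):
    """Return exception info with the traceback replaced by None."""
    return (exc_info[0], exc_info[1], None)


try:
    1 / 0
except ZeroDivisionError:
    exc_info = sys.exc_info()

text = format_without_traceback(exc_info)
stripped = drop_traceback(exc_info)
```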

Warnings#

Classes#

class pytask.WarningReport(message, fs_location, id_)[source]#

Functions#

pytask.parse_warning_filter(arg: str, *, escape: bool) → tuple[warnings._ActionKind, str, type[Warning], str, int][source]#

Parse a warnings filter string.

This is copied from warnings._setoption with the following changes:

Does not apply the filter.
Escaping is optional.
Raises UsageError so we get nice error messages on failure.
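The filter string follows the `action:message:category:module:lineno` layout used by Python's `-W` option, where trailing fields may be omitted. The function below is a hypothetical, heavily simplified sketch of splitting such a string; unlike the documented function, it neither escapes the message, resolves the category to a class, nor validates the action.

```python
from __future__ import annotations


def split_warning_filter(arg: str) -> tuple[str, str, str, str, int]:
    """Hypothetical, simplified parser that keeps the category as a string."""
    parts = arg.split(":")
    if len(parts) > 5:
        raise ValueError(f"too many fields (max 5): {arg!r}")
    # Missing trailing fields are treated as empty.
    parts += [""] * (5 - len(parts))
    action, message, category, module, lineno = (part.strip() for part in parts)
    return (
        action or "default",
        message,
        category or "Warning",
        module,
        int(lineno or 0),
    )


parsed = split_warning_filter("ignore::DeprecationWarning")
```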

pytask.warning_record_to_str(warning_message: WarningMessage) → str[source]#: Convert a warnings.WarningMessage to a string.