API¶

Command line interface¶

To extend pytask’s command line interface and set the right types for your options, pytask offers the following functionalities.

Classes¶

class pytask.ColoredCommand(name: str | None, context_settings: MutableMapping[str, Any] | None = None, callback: Callable[[...], Any] | None = None, params: List[Parameter] | None = None, help: str | None = None, epilog: str | None = None, short_help: str | None = None, options_metavar: str | None = '[OPTIONS]', add_help_option: bool = True, no_args_is_help: bool = False, hidden: bool = False, deprecated: bool = False)[source]¶: A command with colored help pages.

class pytask.ColoredGroup(*args, **kwargs)[source]¶: A command group with colored help pages.

class pytask.EnumChoice(enum_type: type[Enum], case_sensitive: bool = True)[source]¶

An enum-based choice type.

The implementation is copied from https://github.com/pallets/click/pull/2210 and related discussion can be found in https://github.com/pallets/click/issues/605.

In contrast to using click.Choice, using this type ensures that the error message does not show the enum members.

In contrast to the proposed implementation in the PR, this implementation does not use the member names but rather the values of the enum.
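The difference between matching on member names and matching on values can be illustrated without click. The following is a standalone sketch, not the implementation from the PR or from pytask: the `Color` enum and the `convert_choice` helper are hypothetical names introduced only for this illustration.

```python
from enum import Enum


class Color(Enum):
    """Hypothetical enum whose member names differ from its values."""

    RED = "red"
    DARK_BLUE = "dark-blue"


def convert_choice(value: str, enum_type: type[Enum], case_sensitive: bool = True) -> Enum:
    """Return the enum member whose value matches the user input."""
    for member in enum_type:
        candidate = str(member.value)
        if candidate == value:
            return member
        if not case_sensitive and candidate.lower() == value.lower():
            return member
    # The error lists the accepted values, not the member names.
    choices = ", ".join(str(member.value) for member in enum_type)
    raise ValueError(f"{value!r} is not one of: {choices}")


# The input is matched against "dark-blue", not against "DARK_BLUE".
selected = convert_choice("dark-blue", Color)
relaxed = convert_choice("RED", Color, case_sensitive=False)
```

Matching on values lets the command line accept spellings such as `dark-blue` that are not valid Python identifiers.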

Compatibility¶

pytask.check_for_optional_program(name: str, extra: str = '', errors: str = 'raise', caller: str = 'pytask') → bool | None[source]¶: Check whether an optional program exists.
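A check of this kind can be sketched with the standard library. The helper below is hypothetical and only assumes that `errors` accepts 'raise', 'warn', and 'ignore' in the same spirit as the import helper in this section; it is not pytask's implementation and, unlike the documented function, it always returns a bool.

```python
import shutil
import warnings


def program_exists(name: str, extra: str = "", errors: str = "raise") -> bool:
    """Hypothetical helper: report whether ``name`` is an executable on PATH."""
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unknown value for errors: {errors!r}")

    found = shutil.which(name) is not None
    if not found:
        message = f"The program {name!r} could not be found. {extra}".strip()
        if errors == "raise":
            raise RuntimeError(message)
        if errors == "warn":
            warnings.warn(message, stacklevel=2)
    return found


# With errors="ignore", a missing program is reported without raising.
missing = program_exists("surely-not-an-installed-program-xyz", errors="ignore")
```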

pytask.import_optional_dependency(name: str, extra: str = '', errors: str = 'raise', min_version: str | None = None, caller: str = 'pytask') → types.ModuleType | None[source]¶

Import an optional dependency.

By default, if a dependency is missing, an ImportError with a helpful message is raised. If a dependency is present but too old, an ImportError is raised as well.

Parameters:

name – The module name.
extra – Additional text to include in the ImportError message.
errors –
What to do when a dependency is not found or its version is too old.
- raise : Raise an ImportError
- warn : Only applicable when a module’s version is too old. Warns that the version is too old and returns None
- ignore: If the module is not installed, return None, otherwise, return the module, even if the version is too old. It’s expected that users validate the version locally when using errors="ignore" (see io/html.py)
min_version – Specify a minimum version that is different from the global pandas minimum version required.
caller – The caller of the function.

Returns:

The imported module, when found and the version is correct. None is returned when the package is not found and errors is 'ignore', or when the package’s version is too old and errors is 'warn'.

Return type:

types.ModuleType | None
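The handling of a missing module can be sketched with importlib. This is a simplified, hypothetical helper (`import_optional` is not a pytask function) that leaves out the version check entirely and only shows how the `errors` values described above change the result when the import fails.

```python
import importlib
import types
from typing import Optional


def import_optional(name: str, extra: str = "", errors: str = "raise") -> Optional[types.ModuleType]:
    """Hypothetical helper: import ``name`` or handle its absence per ``errors``."""
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unknown value for errors: {errors!r}")
    try:
        return importlib.import_module(name)
    except ImportError:
        if errors == "raise":
            message = f"Missing optional dependency {name!r}. {extra}".strip()
            raise ImportError(message) from None
        # 'warn' only concerns outdated versions, so a missing module yields None.
        return None


available = import_optional("json")
absent = import_optional("surely_missing_module_xyz", errors="ignore")
```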

Console¶

To write to the terminal, use pytask’s console.

class pytask.console¶

Exceptions¶

All exceptions inherit from

class pytask.PytaskError[source]¶: Base exception for pytask which should be inherited by all other exceptions.

The following exceptions can be used to interrupt pytask’s flow, emit reduced tracebacks and return the correct exit codes.

class pytask.CollectionError[source]¶: Exception during collection.

class pytask.ConfigurationError[source]¶: Exception during the configuration.

class pytask.ExecutionError[source]¶: Exception during execution.

class pytask.ResolvingDependenciesError[source]¶: Exception during resolving dependencies.

The remaining exceptions convey specific errors.

class pytask.NodeNotCollectedError[source]¶: Exception for nodes which could not be collected.

class pytask.NodeNotFoundError[source]¶: Exception for missing dependencies.

General classes¶

class pytask.Session(*, config: dict[str, Any] = NOTHING, collection_reports: list[CollectionReport] = NOTHING, dag: nx.DiGraph = NOTHING, hook: HookRelay = NOTHING, tasks: list[PTask] = NOTHING, dag_report: DagReport | None = None, execution_reports: list[ExecutionReport] = NOTHING, exit_code: ExitCode = ExitCode.OK, collection_start: float = inf, collection_end: float = inf, execution_start: float = inf, execution_end: float = inf, n_tasks_failed: int = 0, scheduler: Any = None, should_stop: bool = False, warnings: list[WarningReport] = NOTHING)[source]¶

The session of pytask.

Parameters:

config (dict[str, Any]) – Configuration of the session.
collection_reports (list[CollectionReport]) – Reports for collected items.
dag (nx.DiGraph) – The DAG of the project.
hook (HookRelay) – Holds all hooks collected by pytask.
tasks (list[PTask]) – List of collected tasks.
dag_report (DagReport | None) – Report for a failure while resolving dependencies.
execution_reports (list[ExecutionReport]) – Reports for executed tasks.
n_tasks_failed (int) – Number of tasks which have failed.
should_stop (bool) – Indicates whether the session should be stopped.
warnings (list[WarningReport]) – A list of warnings captured during the run.

class pytask.DataCatalog(*, default_node: type[~_pytask.node_protocols.PNode] = <class '_pytask.nodes.PickleNode'>, entries: dict[str, ~_pytask.node_protocols.PNode | ~_pytask.node_protocols.PProvisionalNode] = NOTHING, name: str = 'default', path: ~pathlib.Path | None = None, session_config: dict[str, ~typing.Any] = NOTHING, instance_path: ~pathlib.Path = NOTHING)[source]¶

A data catalog.

Parameters:

default_node (type[_pytask.node_protocols.PNode]) – A default node for loading and saving values. By default, PickleNode is used to serialize any Python object with the pickle module.
entries (dict[str, _pytask.node_protocols.PNode | _pytask.node_protocols.PProvisionalNode]) – A collection of entries in the catalog. Entries can be PNode or a DataCatalog itself for nesting catalogs.
name (str) – The name of the data catalog. Use it when you are working with multiple data catalogs that store data under the same keys.
path (pathlib.Path | None) – A path where automatically created files are stored. By default, it will be .pytask/data_catalogs/default.

add(name: str, node: PNode | PProvisionalNode | None = None) → None[source]¶: Add an entry to the data catalog.

Marks¶

pytask uses marks to attach additional information to task functions that the host or plugins process. The following marks are available by default.

Built-in marks¶

pytask.mark.persist()¶: A marker for a task which should be persisted.

pytask.mark.skipif(condition: bool, *, reason: str)¶

Skip a task based on a condition and provide a necessary reason.

Parameters:

condition (bool) – A condition for when the task is skipped.
reason (str) – A reason why the task is skipped.

pytask.mark.skip_ancestor_failed(reason: str = 'No reason provided')¶

An internal marker for a task which is skipped because an ancestor failed.

Parameters:: reason (str) – A reason why the task is skipped.

pytask.mark.skip_unchanged()¶

An internal marker for a task which is skipped because nothing has changed.

Parameters:: reason (str) – A reason why the task is skipped.

pytask.mark.skip()¶: Skip a task.

pytask.mark.try_first()¶

Indicate that the task should be executed as soon as possible.

This indicator is a soft measure to influence the execution order of pytask.

Important

This indicator is not intended for general use to influence the build order and to overcome misspecification of task dependencies and products.

It should only be applied to situations where it is hard to define all dependencies and products and automatic inference may be incomplete like with pytask-latex and latex-dependency-scanner.

pytask.mark.try_last()¶

Indicate that the task should be executed as late as possible.

This indicator is a soft measure to influence the execution order of pytask.

Important

This indicator is not intended for general use to influence the build order and to overcome misspecification of task dependencies and products.

It should only be applied to situations where it is hard to define all dependencies and products and automatic inference may be incomplete like with pytask-latex and latex-dependency-scanner.

Custom marks¶

Marks are created dynamically using the factory object pytask.mark and applied as a decorator.

For example:

@pytask.mark.timeout(10, "slow", method="thread")
def task_function(): ...

This creates a Mark object and attaches it to the markers attribute of the collected Task. The mark object will have the following attributes:

mark.args == (10, "slow")
mark.kwargs == {"method": "thread"}

Example for using multiple custom markers:

@pytask.mark.timeout(10, "slow", method="thread")
@pytask.mark.slow
def task_function(): ...

Classes¶

class pytask.Mark(name: str, args: tuple[Any, ...], kwargs: Mapping[str, Any])[source]¶: A class for a mark containing the name, positional and keyword arguments.

pytask.mark¶: alias of <_pytask.mark.structures.MarkGenerator object>

class pytask.MarkDecorator(mark: Mark)[source]¶

A decorator for applying a mark on a task function.

Decorators are created with pytask.mark.

mark1 = pytask.mark.NAME  # Simple MarkDecorator
mark2 = pytask.mark.NAME(name1=value)  # Parametrized MarkDecorator

and can then be applied as decorators to task functions

@mark2
def task_function():
    pass

When a MarkDecorator is called it does the following:

If called with a single function as its only positional argument and no additional keyword arguments, it attaches the mark to the function, containing all the arguments already stored internally in the MarkDecorator.
When called in any other case, it returns a new MarkDecorator instance with the original MarkDecorator’s content updated with the arguments passed to this call.

Notes

The rules above prevent decorators from storing only a single function or class reference as their positional argument with no additional keyword or positional arguments. You can work around this by using MarkDecorator.with_args().
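The two call rules and the with_args() workaround can be made concrete with a toy model. The classes below (`ToyMark`, `ToyMarkDecorator`, `ToyMarkGenerator`) are hypothetical stand-ins written for this illustration; they store marks in a plain `markers` attribute on the function and are not pytask's classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToyMark:
    """Stand-in for a mark: a name plus positional and keyword arguments."""

    name: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


class ToyMarkDecorator:
    def __init__(self, mark: ToyMark) -> None:
        self.mark = mark

    def with_args(self, *args: Any, **kwargs: Any) -> "ToyMarkDecorator":
        # Always store the arguments, even if the only one is a function.
        merged = ToyMark(
            self.mark.name, self.mark.args + args, {**self.mark.kwargs, **kwargs}
        )
        return ToyMarkDecorator(merged)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Rule 1: a single callable and nothing else means "decorate it".
        if len(args) == 1 and callable(args[0]) and not kwargs:
            func = args[0]
            func.markers = [*getattr(func, "markers", []), self.mark]
            return func
        # Rule 2: otherwise return a new decorator with the added arguments.
        return self.with_args(*args, **kwargs)


class ToyMarkGenerator:
    def __getattr__(self, name: str) -> ToyMarkDecorator:
        if name.startswith("_"):
            raise AttributeError(name)
        return ToyMarkDecorator(ToyMark(name))


mark = ToyMarkGenerator()


@mark.timeout(10, "slow", method="thread")
@mark.slow
def task_function(): ...


def callback(): ...


# Calling the decorator with only a function would decorate it (rule 1),
# so with_args() is needed to store the function as an argument instead.
stored = mark.on_failure.with_args(callback)
```

In this toy model, decorators are applied bottom-up, so the `slow` mark is attached before the `timeout` mark.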

class pytask.MarkGenerator[source]¶

Factory for MarkDecorator objects.

Exposed as a pytask.mark singleton instance.

Example

>>> import pytask

>>> @pytask.mark.skip
... def task_function():
...    pass

applies a ‘skip’ Mark on task_function.

Functions¶

These functions help you to handle marks.

pytask.get_all_marks(obj_or_task: Any | PTask) → list[Mark][source]¶: Get all marks from a callable or task.

pytask.get_marks(obj_or_task: Any | PTask, marker_name: str) → list[Mark][source]¶: Get marks from callable or task.

pytask.has_mark(obj_or_task: Any | PTask, marker_name: str) → bool[source]¶: Test if callable or task has a certain mark.

pytask.remove_marks(obj_or_task: Any | PTask, marker_name: str) → tuple[Any | PTask, list[Mark]][source]¶: Remove marks from callable or task.

pytask.set_marks(obj_or_task: Any | PTask, marks: list[Mark]) → Any | PTask[source]¶: Set marks on a callable or task.

Protocols¶

Protocols define how tasks and nodes for dependencies and products have to be set up.

protocol pytask.PNode[source]¶

Bases: Protocol

Protocol for nodes.

This protocol is runtime checkable.

Classes that implement this protocol must have the following methods / attributes:

__callable_proto_members_only__ = False¶

load(is_product: bool = False) → Any[source]¶

Return the value of the node that will be injected into the task.

Parameters:: is_product – Indicates whether the node is loaded as a dependency or as a product. It can be used to return a different value when the node is loaded with a product annotation. Then, we usually want to insert the node itself to allow the user to call PNode.load().

name: str¶

save(value: Any) → Any[source]¶: Save the value that was returned from a task.

property signature: str[source]¶: Return the signature of the node.

state() → str | None[source]¶

Return the state of the node.

The state can be something like a hash or a last modified timestamp. If the node does not exist, you can also return None.
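A class satisfying these members might look as follows. To keep the sketch runnable without pytask, it defines a local `NodeLike` protocol that mirrors the members listed above; `NodeLike` and `TextFileNode` are hypothetical names, and using the modification time as state is just one possible choice.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class NodeLike(Protocol):
    """Local stand-in that lists the members documented for PNode."""

    name: str

    @property
    def signature(self) -> str: ...

    def state(self) -> Optional[str]: ...

    def load(self, is_product: bool = False) -> Any: ...

    def save(self, value: Any) -> Any: ...


class TextFileNode:
    """Hypothetical node that stores a string in a text file."""

    def __init__(self, name: str, path: Path) -> None:
        self.name = name
        self.path = path

    @property
    def signature(self) -> str:
        # Any stable, unique string works for this sketch.
        raw = f"{type(self).__name__}:{self.path.as_posix()}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def state(self) -> Optional[str]:
        # Modification time as state; None signals a missing file.
        if self.path.exists():
            return str(self.path.stat().st_mtime)
        return None

    def load(self, is_product: bool = False) -> Any:
        # As a product, hand over the node itself instead of the content.
        if is_product:
            return self
        return self.path.read_text()

    def save(self, value: str) -> None:
        self.path.write_text(value)


node = TextFileNode("greeting", Path(tempfile.mkdtemp()) / "greeting.txt")
state_before = node.state()  # the file does not exist yet
node.save("hello")
```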

protocol pytask.PPathNode[source]¶

Bases: PNode, Protocol

Nodes with paths.

Nodes with paths receive special handling when it comes to printing their names.

This protocol is runtime checkable.

Classes that implement this protocol must have the following methods / attributes:

path: Path¶

protocol pytask.PTask[source]¶

Bases: Protocol

Protocol for tasks.

This protocol is runtime checkable.

Classes that implement this protocol must have the following methods / attributes:

__callable_proto_members_only__ = False¶

attributes: dict[Any, Any]¶

depends_on: dict[str, PyTree[PNode | PProvisionalNode]]¶

execute(**kwargs: Any) → Any[source]¶: Execute the task with the given keyword arguments and return its result.

function: Callable[..., Any]¶

markers: list[Mark]¶

name: str¶

produces: dict[str, PyTree[PNode | PProvisionalNode]]¶

report_sections: list[tuple[str, str, str]]¶

property signature: str[source]¶: Return the signature of the task.

state() → str | None[source]¶

Return the state of the node.

The state can be something like a hash or a last modified timestamp. If the node does not exist, you can also return None.

protocol pytask.PTaskWithPath[source]¶

Bases: PTask, Protocol

Tasks with paths.

Tasks with paths receive special handling when it comes to printing their names.

This protocol is runtime checkable.

Classes that implement this protocol must have the following methods / attributes:

path: Path¶

protocol pytask.PProvisionalNode[source]¶

Bases: Protocol

A protocol for provisional nodes.

This type of node is provisional since it resolves to actual nodes, PNode, right before a task is executed when it is a dependency and right after the task is executed when it is a product.

Provisional nodes are nodes that define what the actual nodes look like. They can be useful when, for example, a task produces an unknown number of nodes because it downloads some files.

This protocol is runtime checkable.

Classes that implement this protocol must have the following methods / attributes:

__callable_proto_members_only__ = False¶

collect() → list[Any][source]¶: Collect the objects that are defined by the provisional nodes.

load(is_product: bool = False) → Any[source]¶

Load a provisional node.

A provisional node will never be loaded as a dependency since it would be collected before.

It is possible to load a provisional node as a product so that it can inject basic information about it in the task. For example, pytask.DirectoryNode.load() injects the root directory.

name: str¶

property signature: str[source]¶: Return the signature of the node.

Nodes¶

Nodes are the interface for different kinds of dependencies or products.

class pytask.PathNode(*, path: Path, name: str = '')[source]¶

The class for a node which is a path.

name¶

Name of the node which makes it identifiable in the DAG.

Type:: str

path¶

The path to the file.

Type:: Path

classmethod from_path(path: Path) → PathNode[source]¶: Instantiate class from path to file.

load(is_product: bool = False) → Path[source]¶: Load the value.

save(value: bytes | str) → None[source]¶: Save strings or bytes to file.

property signature: str[source]¶: The unique signature of the node.

state() → str | None[source]¶

Calculate the state of the node.

The state is given by the modification timestamp.

class pytask.PickleNode(path: Path, name: str = '')[source]¶

A node for pickle files.

name¶

Name of the node which makes it identifiable in the DAG.

Type:: str

path¶

The path to the file.

Type:: Path

classmethod from_path(path: Path) → PickleNode[source]¶: Instantiate class from path to file.

load(is_product: bool = False) → Any[source]¶

Return the value of the node that will be injected into the task.

Parameters:: is_product – Indicates whether the node is loaded as a dependency or as a product. It can be used to return a different value when the node is loaded with a product annotation. Then, we usually want to insert the node itself to allow the user to call PNode.load().

save(value: Any) → None[source]¶: Save the value that was returned from a task.

property signature: str[source]¶: The unique signature of the node.

state() → str | None[source]¶

Return the state of the node.

The state can be something like a hash or a last modified timestamp. If the node does not exist, you can also return None.

class pytask.PythonNode(*, name: str = '', value: Any | NoDefault = <no_default>, hash: bool | Callable[[Any], bool] = False, node_info: NodeInfo | None = None)[source]¶

The class for a node which is a Python object.

name¶

The name of the node.

Type:: str

value¶

The value of the node.

Type:: Any | NoDefault

hash¶

Whether the value should be hashed to determine the state. Use True for objects that are hashable like strings and tuples. For dictionaries and other non-hashable objects, you need to provide a function that can hash these objects.

Type:: bool | Callable[[Any], bool]

node_info¶

The information acquired while collecting the node.

Type:: NodeInfo | None

Examples

To allow a PythonNode to hash a dictionary, you need to pass your own hashing function, for example, one built on the deepdiff library.

>>> from deepdiff import DeepHash
>>> from pytask import PythonNode
>>> node = PythonNode(name="node", value={"a": 1}, hash=lambda x: DeepHash(x)[x])

Warning

Hashing big objects can require some time.

load(is_product: bool = False) → Any[source]¶: Load the value.

save(value: Any) → None[source]¶: Save the value.

property signature: str[source]¶: The unique signature of the node.

state() → str | None[source]¶

Calculate state of the node.

If hash = False, the function returns "0", a constant hash value, so the PythonNode is ignored when checking for a changed state of the task.

If hash is a callable, then use this function to calculate a hash.

If hash = True, the builtin hash() function is used for all types except strings and bytes.

The hash for strings and bytes is calculated using hashlib because hash("asd") returns a different value in every Python session: the hash of strings is salted with a random value, which would confuse users. See object.__hash__() for more information.
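The three cases can be summarized in a short sketch. The function `python_value_state` is a hypothetical helper, not the method of PythonNode, and the choice of SHA-256 for strings and bytes is an assumption; the text above only states that hashlib is used.

```python
import hashlib
from typing import Any, Callable, Union


def python_value_state(value: Any, hash_: Union[bool, Callable[[Any], Any]] = False) -> str:
    """Hypothetical helper mirroring the three documented cases."""
    if hash_ is False:
        # Constant state: the value never causes the task to rerun.
        return "0"
    if callable(hash_):
        # A user-supplied function handles non-hashable objects.
        return str(hash_(value))
    # hash_ is True: strings and bytes get a stable digest from hashlib.
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return hashlib.sha256(value).hexdigest()
    return str(hash(value))


ignored = python_value_state({"a": 1})
stable = python_value_state("asd", True)
custom = python_value_state({"a": 1}, lambda d: sorted(d.items()))
```

The digest of a string stays the same across interpreter sessions, which is what makes it usable as a persisted state.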

class pytask.DirectoryNode(*, name: str = '', pattern: str = '*', root_dir: Path | None = None)[source]¶

The class for a provisional node that works with directories.

name¶

The name of the node.

Type:: str

pattern¶

Patterns are the same as for fnmatch, with the addition of ** which means “this directory and all subdirectories, recursively”.

Type:: str

root_dir¶

The pattern is interpreted relative to the path given by root_dir. If root_dir = None, it is the directory where the path is defined.

Type:: pathlib.Path | None

collect() → list[Path][source]¶: Collect paths defined by the pattern.

load(is_product: bool = False) → Path[source]¶: Inject a path into the task when loaded as a product.

property signature: str[source]¶: The unique signature of the node.
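How a pattern and a root directory resolve to concrete paths can be sketched with pathlib. The helper `collect_paths` is hypothetical; it uses pathlib's glob as a stand-in for the pattern matching and falls back to the working directory when root_dir is None, which is a simplification of the behavior described above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def collect_paths(pattern: str = "*", root_dir: Optional[Path] = None) -> list[Path]:
    """Hypothetical helper: resolve a glob pattern relative to ``root_dir``."""
    # Simplification: fall back to the working directory when no root is given.
    base = root_dir if root_dir is not None else Path.cwd()
    return sorted(path for path in base.glob(pattern) if path.is_file())


# Simulate a task that produced files whose number was not known in advance.
root = Path(tempfile.mkdtemp())
(root / "nested").mkdir()
for relative in ("a.txt", "b.csv", "nested/c.txt"):
    (root / relative).write_text("data")

top_level = collect_paths("*.txt", root)
recursive = collect_paths("**/*.txt", root)
```

The recursive pattern also matches files directly inside the root, since ** covers the directory itself and all subdirectories.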

To parse dependencies and products from nodes, use the following functions.

pytask.parse_dependencies_from_task_function(session: Session, task_path: Path | None, task_name: str, node_path: Path, obj: Any) → dict[str, Any][source]¶: Parse dependencies from task function.

pytask.parse_products_from_task_function(session: Session, task_path: Path | None, task_name: str, node_path: Path, obj: Any) → dict[str, Any][source]¶

Parse products from task function.

Raises:: NodeNotCollectedError – If multiple ways to parse products from the return of the task function are used.

Tasks¶

To mark any callable as a task, use the pytask.task decorator.

Decorate a task function.

This decorator declares every callable as a pytask task.

The function also attaches some metadata to the function like parsed kwargs and markers.

Parameters:

name – Use it to override the name of the task that is, by default, the name of the task function. Read Customize task names for more information.
after – An expression or a task function or a list of task functions that need to be executed before this task can be executed. See Depending on a task for more information.
is_generator – An indicator whether this task is a task generator.
id – An id for the task if it is part of a repetition. Otherwise, an automatic id will be generated. See The id for more information.
kwargs – Use a dictionary to pass any keyword arguments to the task function which can be dependencies or products of the task. Read @task(kwargs=...) for more information.
produces – Use this argument if you want to parse the return of the task function as a product, but you cannot annotate the return of the function. See this how-to guide or @task(produces=...) for more information.

Examples

To mark a function without the task_ prefix as a task, attach the decorator.

from pathlib import Path
from typing import Annotated

from pytask import task

@task()
def create_text_file() -> Annotated[str, Path("file.txt")]:
    return "Hello, World!"

Tasks are currently represented by the following classes:

class pytask.Task(*, base_name: str, path: Path, function: Callable[..., Any], depends_on: dict[str, PyTree[PNode | PProvisionalNode]] = NOTHING, produces: dict[str, PyTree[PNode | PProvisionalNode]] = NOTHING, markers: list[Mark] = NOTHING, report_sections: list[tuple[str, str, str]] = NOTHING, attributes: dict[Any, Any] = NOTHING)[source]¶

The class for tasks which are Python functions.

base_name¶

The base name of the task.

Type:: str

path¶

Path to the file where the task was defined.

Type:: Path

function¶

The task function.

Type:: Callable[…, Any]

name¶

The name of the task.

Type:: str

depends_on¶

A dictionary of the dependencies of the task.

Type:: dict[str, PyTree[PNode | PProvisionalNode]]

produces¶

A dictionary of the products of the task.

Type:: dict[str, PyTree[PNode | PProvisionalNode]]

markers¶

A list of markers attached to the task function.

Type:: list[Mark]

report_sections¶

Reports with entries for when, what, and content.

Type:: list[tuple[str, str, str]]

attributes¶

A dictionary to store additional information of the task.

Type:: dict[Any, Any]

class pytask.TaskWithoutPath(*, name: str, function: Callable[..., Any], depends_on: dict[str, PyTree[PNode | PProvisionalNode]] = NOTHING, produces: dict[str, PyTree[PNode | PProvisionalNode]] = NOTHING, markers: list[Mark] = NOTHING, report_sections: list[tuple[str, str, str]] = NOTHING, attributes: dict[Any, Any] = NOTHING)[source]¶

The class for tasks without a source file.

Tasks may have no source file because
- they are dynamically created in a REPL.
- they are created in a Jupyter notebook.

name¶

The name of the task.

Type:: str

function¶

The task function.

Type:: Callable[…, Any]

depends_on¶

A dictionary of the dependencies of the task.

Type:: dict[str, PyTree[PNode | PProvisionalNode]]

produces¶

A dictionary of the products of the task.

Type:: dict[str, PyTree[PNode | PProvisionalNode]]

markers¶

A list of markers attached to the task function.

Type:: list[Mark]

report_sections¶

Reports with entries for when, what, and content.

Type:: list[tuple[str, str, str]]

attributes¶

A dictionary to store additional information of the task.

Type:: dict[Any, Any]

Currently, there are no different types of tasks since changing the .function attribute with a custom callable proved to be sufficient.

To carry over information from user-defined tasks like task functions to pytask.Task objects, use a metadata object that is stored in a .pytask_meta attribute of the task function.

class pytask.CollectionMetadata(after: str | list[Callable[..., Any]] = NOTHING, attributes: dict[str, Any] = NOTHING, is_generator: bool = False, id_: str | None = None, kwargs: dict[str, Any] = NOTHING, markers: list[Mark] = NOTHING, name: str | None = None, produces: PyTree[Any] | None = None, id: UUID = NOTHING)[source]¶

A class for carrying metadata from functions to tasks.

after¶

An expression or a task function or a list of task functions that need to be executed before this task can be executed.

Type:: str | list[Callable[…, Any]]

id_¶

An id for the task if it is part of a parametrization. Otherwise, an automatic id will be generated. See this tutorial for more information.

Type:: str | None

is_generator¶

An indicator for whether a task generates other tasks or not.

Type:: bool

kwargs¶

A dictionary containing keyword arguments which are passed to the task when it is executed.

Type:: dict[str, Any]

markers¶

A list of markers that are attached to the task.

Type:: list[Mark]

name¶

Use it to override the name of the task that is, by default, the name of the callable.

Type:: str | None

produces¶

Definition of products to parse the function returns and store them. See this how-to guide for more information.

Type:: PyTree[Any] | None

Outcomes¶

The exit code of pytask is determined by

class pytask.ExitCode(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶

Exit codes for pytask.

OK = 0¶: Tasks were executed successfully.

FAILED = 1¶: Failed while executing tasks.

COLLECTION_FAILED = 3¶: Failed while collecting tasks.

DAG_FAILED = 4¶: Failed while building the DAG.
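One use of these values is interpreting the return code of a pytask process, for example in a CI script. The sketch below mirrors only the four values listed above in a local enum; `ExitCodeSketch` and `describe_returncode` are hypothetical names, and treating the codes as integers is an assumption.

```python
from enum import IntEnum


class ExitCodeSketch(IntEnum):
    """Local mirror of the values listed above; not imported from pytask."""

    OK = 0
    FAILED = 1
    COLLECTION_FAILED = 3
    DAG_FAILED = 4


def describe_returncode(returncode: int) -> str:
    """Translate a process return code into a readable label."""
    try:
        return ExitCodeSketch(returncode).name
    except ValueError:
        # Codes that are not part of the local mirror are reported as unknown.
        return f"UNKNOWN({returncode})"


label = describe_returncode(4)
```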

Collected items can have the following outcomes

class pytask.CollectionOutcome(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶

Outcomes of collected files or tasks.

FAIL¶: Outcome for failed collected files or tasks.

SUCCESS¶: Outcome for files or tasks which were collected successfully.

Tasks can have the following outcomes

class pytask.TaskOutcome(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶

Outcomes of tasks.

FAIL¶: Outcome for failed tasks.

PERSISTENCE¶: Outcome for tasks which should persist. Even if dependencies or products have changed, skip the task, update all hashes to the new ones, mark it as successful.

SKIP¶: Outcome for skipped tasks.

SKIP_PREVIOUS_FAILED¶: Outcome for tasks where a necessary preceding task has failed and, thus, this task could not be executed.

SKIP_UNCHANGED¶: Outcome for tasks which do not need to be executed since all dependencies, source files and products have not changed.

SUCCESS¶: Outcome for task which was executed successfully.

The following exceptions are used to abort the execution of a task with an appropriate outcome.

class pytask.Exit(msg: str = 'unknown reason', returncode: int | None = None)[source]¶: Raised for immediate program exits (no tracebacks/summaries).

class pytask.Persisted[source]¶: Outcome if task should persist.

class pytask.Skipped[source]¶: Outcome if task is skipped.

class pytask.SkippedAncestorFailed[source]¶: Outcome if an ancestor failed.

class pytask.SkippedUnchanged[source]¶: Outcome if task has run before and is unchanged.

Functions¶

pytask.count_outcomes(reports: Sequence[CollectionReport | ExecutionReport], outcome_enum: type[CollectionOutcome | TaskOutcome]) → dict[Enum, int][source]¶

Count how often an outcome occurred.

Examples

>>> from pytask import CollectionOutcome, count_outcomes
>>> count_outcomes([], CollectionOutcome)
{<CollectionOutcome.SUCCESS: 1>: 0, <CollectionOutcome.FAIL: 2>: 0}

Path utilities¶

pytask.path.import_path(path: Path, root: Path) → ModuleType[source]¶

Import and return a module from the given path.

The functions are taken from pytest when the import mode is set to importlib. It was assumed to be the new default import mode but insurmountable tradeoffs caused the default to be set to prepend. More discussion and information can be found in #373.
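Importing a module from a file path without modifying sys.path can be sketched with importlib.util. The helper below is a hypothetical, simplified illustration of the importlib-style approach and not the code taken from pytest; in particular, deriving the module name from the path relative to root is an assumption.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def import_module_from_path(path: Path, root: Path) -> ModuleType:
    """Hypothetical helper: import a file without modifying ``sys.path``."""
    # Derive a dotted module name from the location relative to the root.
    module_name = ".".join(path.relative_to(root).with_suffix("").parts)
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot import module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it so that it can be looked up by name.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


root = Path(tempfile.mkdtemp())
task_file = root / "task_sketch_example.py"
task_file.write_text("VALUE = 42\n")
module = import_module_from_path(task_file, root)
```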

pytask.path.hash_path(path: Path, modification_time: float, digest: str = 'sha256') → str[source]¶

Compute the hash of a file.

The function is connected to a cache that is warmed up with previous hashes during the configuration phase.
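The role of modification_time in the signature becomes clearer with a small sketch: the timestamp is not hashed itself but serves as part of the cache key, so an unchanged file is not read again. The helper `cached_file_hash` is hypothetical, and the in-memory functools.lru_cache is an assumption standing in for the cache described above.

```python
import hashlib
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def cached_file_hash(path: Path, modification_time: float, digest: str = "sha256") -> str:
    """Hypothetical helper: hash file contents, cached per (path, mtime, digest)."""
    hasher = hashlib.new(digest)
    hasher.update(path.read_bytes())
    return hasher.hexdigest()


data_file = Path(tempfile.mkdtemp()) / "data.txt"
data_file.write_bytes(b"abc")
first = cached_file_hash(data_file, 1.0)

data_file.write_bytes(b"changed")
stale = cached_file_hash(data_file, 1.0)  # same key: served from the cache
fresh = cached_file_hash(data_file, 2.0)  # new modification time: recomputed
```

Passing the real modification time, for example from `path.stat().st_mtime`, ensures that a changed file leads to a new cache key.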

Programmatic Interfaces¶

pytask.build_dag(raw_config: dict[str, Any]) → DiGraph[source]¶

Build the DAG.

This function is the programmatic interface to pytask dag and returns a preprocessed networkx.DiGraph, which can be converted to a pygraphviz.AGraph for plotting.

To change the style of the graph, set attributes on the networkx graph before converting it to pygraphviz.

Parameters:: raw_config (dict[str, Any]) – The configuration usually received from the CLI. For example, use {"paths": "example-directory/"} to collect tasks from a directory.
Returns:: A preprocessed graph which can be customized and exported.
Return type:: networkx.DiGraph

pytask.build(*, capture: Literal['fd', 'no', 'sys', 'tee-sys'] | CaptureMethod = CaptureMethod.FD, check_casing_of_paths: bool = True, config: Path | None = None, database_url: str = '', debug_pytask: bool = False, disable_warnings: bool = False, dry_run: bool = False, editor_url_scheme: Literal['no_link', 'file', 'vscode', 'pycharm'] | str = 'file', expression: str = '', force: bool = False, ignore: Iterable[str] = (), marker_expression: str = '', max_failures: float = inf, n_entries_in_table: int = 15, paths: Path | Iterable[Path] = (), pdb: bool = False, pdb_cls: str = '', s: bool = False, show_capture: Literal['no', 'stdout', 'stderr', 'all'] | ShowCapture = ShowCapture.ALL, show_errors_immediately: bool = False, show_locals: bool = False, show_traceback: bool = True, sort_table: bool = True, stop_after_first_failure: bool = False, strict_markers: bool = False, tasks: Callable[..., Any] | PTask | Iterable[Callable[..., Any] | PTask] = (), task_files: Iterable[str] = ('task_*.py',), trace: bool = False, verbose: int = 1, **kwargs: Any) → Session[source]¶

Run pytask.

This is the main command to run pytask which usually receives kwargs from the command line interface. It can also be used to run pytask interactively. Pass the configuration as keyword arguments.

Parameters:

capture – The capture method for stdout and stderr.
check_casing_of_paths – Whether errors should be raised when file names have different casings.
config – A path to the configuration file.
database_url – A URL to the database that tracks the status of tasks.
debug_pytask – Whether debug information should be shown.
disable_warnings – Whether warnings should be disabled and not displayed.
dry_run – Whether a dry-run should be performed that shows which tasks need to be rerun.
editor_url_scheme – A URL scheme that allows you to click on task names, node names, and filenames and jump right into your preferred editor at the right line.
expression – Same as -k on the command line. Select tasks via expressions on task ids.
force – Run tasks even though they would be skipped since nothing has changed.
ignore – A pattern to ignore files or directories. Refer to pathlib.Path.match for more info.
marker_expression – Same as -m on the command line. Select tasks via marker expressions.
max_failures – Stop after some failures.
n_entries_in_table – How many entries to display in the table during the execution. Tasks which are running are always displayed.
paths – A path or collection of paths where pytask looks for the configuration and tasks.
pdb – Start the interactive debugger on errors.
pdb_cls – Start a custom debugger on errors. For example: --pdbcls=IPython.terminal.debugger:TerminalPdb
s – Shortcut for capture="no".
show_capture – Choose which captured output should be shown for failed tasks.
show_errors_immediately – Show errors with tracebacks as soon as the task fails.
show_locals – Show local variables in tracebacks.
show_traceback – Choose whether tracebacks should be displayed or not.
sort_table – Sort the table of tasks at the end of the execution.
stop_after_first_failure – Stop after the first failure.
strict_markers – Raise errors for unknown markers.
tasks – A task or a collection of tasks which can be callables or instances following pytask.PTask.
task_files – A pattern to describe modules that contain tasks.
trace – Enter debugger in the beginning of each task.
verbose – Make pytask verbose (> 0) or quiet (= 0).

Returns:

session – The session captures all the information of the current run.

Return type:

pytask.Session

Reports¶

Reports are classes that handle successes and errors during collection, DAG resolution, and execution.

class pytask.CollectionReport(outcome: CollectionOutcome, node: PTask | PNode | PProvisionalNode | None = None, exc_info: OptionalExceptionInfo | None = None)[source]¶: A collection report for a task.

class pytask.ExecutionReport(task: PTask, outcome: TaskOutcome, exc_info: OptionalExceptionInfo | None = None, sections: list[tuple[str, str, str]] = NOTHING)[source]¶: A report for an executed task.

class pytask.DagReport(exc_info: Tuple[Type[BaseException], BaseException, TracebackType | None] | Tuple[None, None, None])[source]¶: A report for an error during the creation of the DAG.
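The exc_info arguments in the signatures above have the same shape as the triple returned by sys.exc_info(). The following is a minimal sketch using only the standard library; the helper capture_exc_info and the function failing_task are hypothetical, and no pytask report is constructed here.

```python
import sys
import traceback


def capture_exc_info(func):
    """Call func and return the sys.exc_info() triple if it raises, else None."""
    try:
        func()
    except Exception:
        # (exception type, exception instance, traceback object)
        return sys.exc_info()
    return None


def failing_task():
    """Hypothetical task body that always fails."""
    raise ValueError("something went wrong")


exc_info = capture_exc_info(failing_task)
exc_type, exc_value, tb = exc_info

# The triple can be rendered with the standard traceback module.
formatted = "".join(traceback.format_exception(exc_type, exc_value, tb))
```

A triple captured this way is the kind of value that the exc_info parameters are typed to accept.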

Tree utilities¶

class pytask.tree_util.PyTree[source]¶

Generic PyTree type.

>>> import torch
>>> from optree.typing import PyTree
>>> TensorTree = PyTree[torch.Tensor]
>>> TensorTree  
typing.Union[torch.Tensor,
             typing.Tuple[ForwardRef('PyTree[torch.Tensor]'), ...],
             typing.List[ForwardRef('PyTree[torch.Tensor]')],
             typing.Dict[typing.Any, ForwardRef('PyTree[torch.Tensor]')],
             typing.Deque[ForwardRef('PyTree[torch.Tensor]')],
             optree.typing.CustomTreeNode[ForwardRef('PyTree[torch.Tensor]')]]

pytask.tree_util.tree_flatten_with_path(tree: PyTree[T], is_leaf: Callable[[T], bool] | None = None, *, none_is_leaf: bool = True, namespace: str = 'pytask') → tuple[list[tuple[Any, ...]], list[T], PyTreeSpec]¶

Flatten a pytree and additionally record the paths.

See also tree_flatten(), tree_paths(), and treespec_paths().

The flattening order (i.e., the order of elements in the output list) is deterministic, corresponding to a left-to-right depth-first tree traversal.

>>> tree = {'b': (2, [3, 4]), 'a': 1, 'c': None, 'd': 5}
>>> tree_flatten_with_path(tree)  
(
    [('a',), ('b', 0), ('b', 1, 0), ('b', 1, 1), ('d',)],
    [1, 2, 3, 4, 5],
    PyTreeSpec({'a': *, 'b': (*, [*, *]), 'c': None, 'd': *})
)
>>> tree_flatten_with_path(tree, none_is_leaf=True)  
(
    [('a',), ('b', 0), ('b', 1, 0), ('b', 1, 1), ('c',), ('d',)],
    [1, 2, 3, 4, None, 5],
    PyTreeSpec({'a': *, 'b': (*, [*, *]), 'c': *, 'd': *}, NoneIsLeaf)
)
>>> tree_flatten_with_path(1)
([()], [1], PyTreeSpec(*))
>>> tree_flatten_with_path(None)
([], [], PyTreeSpec(None))
>>> tree_flatten_with_path(None, none_is_leaf=True)
([()], [None], PyTreeSpec(*, NoneIsLeaf))

For the unordered dictionary types dict and collections.defaultdict, the order depends on the sorted keys of the dictionary. Use collections.OrderedDict if you want to keep the keys in insertion order.

>>> from collections import OrderedDict
>>> tree = OrderedDict([('b', (2, [3, 4])), ('a', 1), ('c', None), ('d', 5)])
>>> tree_flatten_with_path(tree)  
(
    [('b', 0), ('b', 1, 0), ('b', 1, 1), ('a',), ('d',)],
    [2, 3, 4, 1, 5],
    PyTreeSpec(OrderedDict([('b', (*, [*, *])), ('a', *), ('c', None), ('d', *)]))
)
>>> tree_flatten_with_path(tree, none_is_leaf=True)  
(
    [('b', 0), ('b', 1, 0), ('b', 1, 1), ('a',), ('c',), ('d',)],
    [2, 3, 4, 1, None, 5],
    PyTreeSpec(OrderedDict([('b', (*, [*, *])), ('a', *), ('c', *), ('d', *)]), NoneIsLeaf)
)

Parameters:

tree (pytree) – A pytree to flatten.
is_leaf (callable, optional) – An optionally specified function that will be called at each flattening step. It should return a boolean, with True stopping the traversal and the whole subtree being treated as a leaf, and False indicating the flattening should traverse the current object.
none_is_leaf (bool, optional) – Whether to treat None as a leaf. If False, None is a non-leaf node with arity 0. Thus None is contained in the treespec rather than in the leaves list. (default: True, per the signature above; the wrapped optree function defaults to False, which is the behavior shown by the examples that omit none_is_leaf)
namespace (str, optional) – The registry namespace used for custom pytree node types. (default: 'pytask', per the signature above)

Returns:

A triple (paths, leaves, treespec). The first element is a list of the paths to the leaf values, where each path is a tuple of indices or keys. The second element is a list of leaf values, and the last element is a treespec representing the structure of the pytree.
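The traversal order and the role of none_is_leaf can be imitated in plain Python for intuition. This is a minimal sketch, not the optree implementation that pytask.tree_util wraps: the helper flatten_with_path_sketch is hypothetical, it handles only dict, OrderedDict, list, and tuple, and it returns paths and leaves without a treespec.

```python
from collections import OrderedDict
from typing import Any


def flatten_with_path_sketch(tree: Any, none_is_leaf: bool = True):
    """Return (paths, leaves) for nested dicts, lists, and tuples.

    Hypothetical helper for illustration only; it does not build a treespec.
    """
    paths = []
    leaves = []

    def visit(node: Any, path: tuple) -> None:
        if node is None and not none_is_leaf:
            # None is a non-leaf node with arity 0: nothing is recorded.
            return
        if isinstance(node, dict):
            # OrderedDict keeps insertion order; other dicts use sorted keys.
            keys = list(node) if isinstance(node, OrderedDict) else sorted(node)
            for key in keys:
                visit(node[key], (*path, key))
        elif isinstance(node, (list, tuple)):
            for index, child in enumerate(node):
                visit(child, (*path, index))
        else:
            # Anything else is a leaf, including None when none_is_leaf=True.
            paths.append(path)
            leaves.append(node)

    visit(tree, ())
    return paths, leaves


tree = {"b": (2, [3, 4]), "a": 1, "c": None, "d": 5}
paths, leaves = flatten_with_path_sketch(tree, none_is_leaf=False)
paths_with_none, leaves_with_none = flatten_with_path_sketch(tree, none_is_leaf=True)
```

With none_is_leaf=False the key 'c' contributes neither a path nor a leaf; with none_is_leaf=True it contributes the path ('c',) and the leaf None.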

pytask.tree_util.tree_leaves(tree: PyTree[T], is_leaf: Callable[[T], bool] | None = None, *, none_is_leaf: bool = True, namespace: str = 'pytask') → list[T]¶

Get the leaves of a pytree.

See also tree_flatten() and tree_iter().

>>> tree = {'b': (2, [3, 4]), 'a': 1, 'c': None, 'd': 5}
>>> tree_leaves(tree)
[1, 2, 3, 4, 5]
>>> tree_leaves(tree, none_is_leaf=True)
[1, 2, 3, 4, None, 5]
>>> tree_leaves(1)
[1]
>>> tree_leaves(None)
[]
>>> tree_leaves(None, none_is_leaf=True)
[None]

Parameters:

tree (pytree) – A pytree to flatten.
is_leaf (callable, optional) – An optionally specified function that will be called at each flattening step. It should return a boolean, with True stopping the traversal and the whole subtree being treated as a leaf, and False indicating the flattening should traverse the current object.
none_is_leaf (bool, optional) – Whether to treat None as a leaf. If False, None is a non-leaf node with arity 0. Thus None is contained in the treespec rather than in the leaves list. (default: True, per the signature above; the wrapped optree function defaults to False, which is the behavior shown by the examples that omit none_is_leaf)
namespace (str, optional) – The registry namespace used for custom pytree node types. (default: 'pytask', per the signature above)

Returns:

A list of leaf values.

pytask.tree_util.tree_map(func: Callable[..., U], tree: PyTree[T], *rests: PyTree[S], is_leaf: Callable[[T], bool] | None = None, none_is_leaf: bool = True, namespace: str = 'pytask') → PyTree[U]¶

Map a multi-input function over pytree args to produce a new pytree.

See also tree_map_(), tree_map_with_path(), tree_map_with_path_(), and tree_broadcast_map().

>>> tree_map(lambda x: x + 1, {'x': 7, 'y': (42, 64)})
{'x': 8, 'y': (43, 65)}
>>> tree_map(lambda x: x + 1, {'x': 7, 'y': (42, 64), 'z': None})
{'x': 8, 'y': (43, 65), 'z': None}
>>> tree_map(lambda x: x is None, {'x': 7, 'y': (42, 64), 'z': None})
{'x': False, 'y': (False, False), 'z': None}
>>> tree_map(lambda x: x is None, {'x': 7, 'y': (42, 64), 'z': None}, none_is_leaf=True)
{'x': False, 'y': (False, False), 'z': True}

If multiple inputs are given, the structure of the tree is taken from the first input; subsequent inputs need only have tree as a prefix:

>>> tree_map(lambda x, y: [x] + y, [5, 6], [[7, 9], [1, 2]])
[[5, 7, 9], [6, 1, 2]]

Parameters:

func (callable) – A function that takes 1 + len(rests) arguments, to be applied at the corresponding leaves of the pytrees.
tree (pytree) – A pytree to be mapped over, with each leaf providing the first positional argument to function func.
rests (tuple of pytree) – A tuple of pytrees, each of which has the same structure as tree or has tree as a prefix.
is_leaf (callable, optional) – An optionally specified function that will be called at each flattening step. It should return a boolean, with True stopping the traversal and the whole subtree being treated as a leaf, and False indicating the flattening should traverse the current object.
none_is_leaf (bool, optional) – Whether to treat None as a leaf. If False, None is a non-leaf node with arity 0. Thus None is contained in the treespec rather than in the leaves list, and None will remain in the result pytree. (default: True, per the signature above; the wrapped optree function defaults to False, which is the behavior shown by the examples that omit none_is_leaf)
namespace (str, optional) – The registry namespace used for custom pytree node types. (default: 'pytask', per the signature above)

Returns:

A new pytree with the same structure as tree but with the value at each leaf given by func(x, *xs) where x is the value at the corresponding leaf in tree and xs is the tuple of values at corresponding nodes in rests.
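The mapping rule, including how rests only needs tree as a prefix, can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the optree implementation: the helper tree_map_sketch is hypothetical and supports only dict, list, and tuple containers.

```python
from typing import Any, Callable


def tree_map_sketch(
    func: Callable[..., Any], tree: Any, *rests: Any, none_is_leaf: bool = True
) -> Any:
    """Apply func at every leaf of tree, pairing it with matching nodes in rests."""
    if tree is None and not none_is_leaf:
        # None is kept as structure and is never passed to func.
        return None
    if isinstance(tree, dict):
        return {
            key: tree_map_sketch(
                func, value, *(r[key] for r in rests), none_is_leaf=none_is_leaf
            )
            for key, value in tree.items()
        }
    if isinstance(tree, (list, tuple)):
        mapped = [
            tree_map_sketch(
                func, value, *(r[i] for r in rests), none_is_leaf=none_is_leaf
            )
            for i, value in enumerate(tree)
        ]
        # Rebuild the same container type (list or tuple).
        return type(tree)(mapped)
    # Leaf: the nodes of rests at this position may themselves be subtrees.
    return func(tree, *rests)


incremented = tree_map_sketch(lambda x: x + 1, {"x": 7, "y": (42, 64)})
combined = tree_map_sketch(lambda x, y: [x] + y, [5, 6], [[7, 9], [1, 2]])
```

In the second call the leaves 5 and 6 are paired with the sublists [7, 9] and [1, 2], which is what "has tree as a prefix" permits.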

pytask.tree_util.tree_map_with_path(func: Callable[..., U], tree: PyTree[T], *rests: PyTree[S], is_leaf: Callable[[T], bool] | None = None, none_is_leaf: bool = True, namespace: str = 'pytask') → PyTree[U]¶

Map a multi-input function over pytree args as well as the tree paths to produce a new pytree.

See also tree_map(), tree_map_(), and tree_map_with_path_().

>>> tree_map_with_path(lambda p, x: (len(p), x), {'x': 7, 'y': (42, 64)})
{'x': (1, 7), 'y': ((2, 42), (2, 64))}
>>> tree_map_with_path(lambda p, x: x + len(p), {'x': 7, 'y': (42, 64), 'z': None})
{'x': 8, 'y': (44, 66), 'z': None}
>>> tree_map_with_path(lambda p, x: p, {'x': 7, 'y': (42, 64), 'z': {1.5: None}})
{'x': ('x',), 'y': (('y', 0), ('y', 1)), 'z': {1.5: None}}
>>> tree_map_with_path(lambda p, x: p, {'x': 7, 'y': (42, 64), 'z': {1.5: None}}, none_is_leaf=True)
{'x': ('x',), 'y': (('y', 0), ('y', 1)), 'z': {1.5: ('z', 1.5)}}

Parameters:

func (callable) – A function that takes 2 + len(rests) arguments, to be applied at the corresponding leaves of the pytrees with extra paths.
tree (pytree) – A pytree to be mapped over, with each leaf providing the second positional argument and the corresponding path providing the first positional argument to function func.
rests (tuple of pytree) – A tuple of pytrees, each of which has the same structure as tree or has tree as a prefix.
is_leaf (callable, optional) – An optionally specified function that will be called at each flattening step. It should return a boolean, with True stopping the traversal and the whole subtree being treated as a leaf, and False indicating the flattening should traverse the current object.
none_is_leaf (bool, optional) – Whether to treat None as a leaf. If False, None is a non-leaf node with arity 0. Thus None is contained in the treespec rather than in the leaves list, and None will remain in the result pytree. (default: True, per the signature above; the wrapped optree function defaults to False, which is the behavior shown by the examples that omit none_is_leaf)
namespace (str, optional) – The registry namespace used for custom pytree node types. (default: 'pytask', per the signature above)

Returns:

A new pytree with the same structure as tree but with the value at each leaf given by func(p, x, *xs) where (p, x) are the path and value at the corresponding leaf in tree and xs is the tuple of values at corresponding nodes in rests.

pytask.tree_util.tree_structure(tree: PyTree[T], is_leaf: Callable[[T], bool] | None = None, *, none_is_leaf: bool = True, namespace: str = 'pytask') → PyTreeSpec¶

Get the treespec for a pytree.

Typing¶

class pytask.Product¶

An indicator to mark arguments of tasks as products.

>>> def task_example(path: Annotated[Path, Product]) -> None:
...     path.write_text("Hello, World!")
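How a marker inside Annotated can be read back from a function signature may be easier to see with a stand-in. The sketch below assumes a hypothetical ProductMarker class in place of pytask.Product and a hypothetical helper find_product_arguments; it relies only on typing.get_type_hints and makes no claim about how pytask itself collects products.

```python
from pathlib import Path
from typing import Annotated, get_type_hints


class ProductMarker:
    """Hypothetical stand-in for pytask.Product, used only in this sketch."""


def task_example(path: Annotated[Path, ProductMarker], source: Path) -> None:
    path.write_text("Hello, World!")


def find_product_arguments(func) -> list[str]:
    """Return the names of arguments whose annotation carries ProductMarker."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    return [
        name
        for name, hint in hints.items()
        if ProductMarker in getattr(hint, "__metadata__", ())
    ]


product_arguments = find_product_arguments(task_example)
```

Only path is reported, because source is annotated with a plain Path and carries no marker.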

pytask.is_task_function(obj: Any) → bool[source]¶: Check if an object is a task function.

Tracebacks¶

class pytask.Traceback(exc_info: Tuple[Type[BaseException], BaseException, TracebackType | None] | Tuple[None, None, None], show_locals: bool = NOTHING)[source]¶

Warnings¶

Classes¶

class pytask.WarningReport(message, fs_location, id_)[source]¶

Functions¶

pytask.parse_warning_filter(arg: str, *, escape: bool) → tuple[warnings._ActionKind, str, type[Warning], str, int][source]¶

Parse a warnings filter string.

This is copied from warnings._setoption with the following changes:

Does not apply the filter.
Escaping is optional.
Raises UsageError so we get nice error messages on failure.
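A warnings filter string follows the format of Python's -W option, action:message:category:module:lineno, where trailing fields may be omitted. The sketch below only splits such a string into its five fields; the helper split_warning_filter is hypothetical and, unlike pytask.parse_warning_filter, it does not resolve the category to a class, does not escape the message or module, and raises ValueError rather than UsageError.

```python
def split_warning_filter(arg: str) -> tuple[str, str, str, str, int]:
    """Split 'action:message:category:module:lineno' into its five fields.

    Hypothetical helper for illustration; it performs no validation of the
    action or category names.
    """
    parts = arg.split(":")
    if len(parts) > 5:
        raise ValueError(f"too many fields (max 5): {arg!r}")
    # Missing trailing fields default to the empty string.
    parts += [""] * (5 - len(parts))
    action, message, category, module, lineno = (part.strip() for part in parts)
    # An empty lineno means "any line", which the warnings module encodes as 0.
    return action, message, category, module, (int(lineno) if lineno else 0)


short_filter = split_warning_filter("error::DeprecationWarning")
full_filter = split_warning_filter("ignore:unclosed file:ResourceWarning:mymodule:12")
```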

pytask.warning_record_to_str(warning_message: WarningMessage) → str[source]¶: Convert a warnings.WarningMessage to a string.
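A comparable conversion can be sketched with the standard library alone. The helper record_to_str_sketch below is hypothetical and is not pytask's implementation; it records a warning with warnings.catch_warnings(record=True) and formats it with warnings.formatwarning.

```python
import warnings


def record_to_str_sketch(record: warnings.WarningMessage) -> str:
    """Format a recorded warning with the standard library's formatter."""
    return warnings.formatwarning(
        str(record.message),
        record.category,
        record.filename,
        record.lineno,
        record.line,
    )


with warnings.catch_warnings(record=True) as records:
    # Make sure the warning is recorded regardless of the active filters.
    warnings.simplefilter("always")
    warnings.warn("this API is deprecated", DeprecationWarning)

text = record_to_str_sketch(records[0])
```

The resulting string starts with the file name and line number of the warning, followed by the category name and the message.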