Nodes and Tasks

Protocols

pytask.PNode

Bases: Protocol

Protocol for nodes.

signature property

Return the signature of the node.

load(is_product=False)

Return the value of the node that will be injected into the task.

Parameters:

Name Type Description Default
is_product bool

Indicates whether the node is loaded as a dependency or as a product. It can be used to return a different value when the node is loaded with a product annotation. In that case, we usually want to insert the node itself so that the user can call PNode.load.

False

save(value)

Save the value that was returned from a task.

state()

Return the state of the node.

The state can be something like a hash or a last modified timestamp. If the node does not exist, you can also return None.
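
As a rough illustration of the protocol, the sketch below defines a hypothetical in-memory node that provides the four members listed above (`signature`, `load`, `save`, `state`). The class name `InMemoryNode`, the `name` and `value` attributes, and the use of SHA-256 are assumptions made for this example. It relies only on the standard library and does not import pytask.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Any


@dataclass
class InMemoryNode:
    """Hypothetical node that keeps its value in memory."""

    name: str
    value: Any = None

    @property
    def signature(self) -> str:
        # Assumption: a stable identifier derived from the node's name.
        return hashlib.sha256(self.name.encode()).hexdigest()

    def load(self, is_product: bool = False) -> Any:
        # As a product, hand over the node itself instead of its value.
        if is_product:
            return self
        return self.value

    def save(self, value: Any) -> None:
        self.value = value

    def state(self) -> str | None:
        # None signals that the node does not exist yet.
        if self.value is None:
            return None
        return hashlib.sha256(repr(self.value).encode()).hexdigest()


node = InMemoryNode(name="greeting")
missing_state = node.state()  # no value has been saved yet
node.save("Hello")
loaded_value = node.load()
```

Returning the node itself when `is_product=True` mirrors the behavior described for the `is_product` parameter. Whether a real node should do so depends on how the task is expected to write its output.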

pytask.PPathNode

Bases: PNode, Protocol

Nodes with paths.

Nodes with paths receive special handling when it comes to printing their names.

pytask.PTask

Bases: Protocol

Protocol for tasks.

signature property

Return the signature of the task.

execute(**kwargs)

Execute the task.

state()

Return the state of the node.

The state can be something like a hash or a last modified timestamp. If the node does not exist, you can also return None.
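
The members above can be sketched with a small wrapper around a plain function. `FunctionTask` and its attributes are hypothetical names chosen for this example, and hashing the function's bytecode is only one assumed way of deriving a state. The sketch uses the standard library and is not pytask's implementation.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FunctionTask:
    """Hypothetical task that wraps a plain Python function."""

    name: str
    function: Callable[..., Any]

    @property
    def signature(self) -> str:
        # Assumption: identify the task by a hash of its name.
        return hashlib.sha256(self.name.encode()).hexdigest()

    def execute(self, **kwargs: Any) -> Any:
        # Forward the keyword arguments to the wrapped function.
        return self.function(**kwargs)

    def state(self) -> str | None:
        # Assumption: the function's bytecode is a cheap stand-in for
        # "has the task changed since the last run".
        return hashlib.sha256(self.function.__code__.co_code).hexdigest()


def add(a: int, b: int) -> int:
    return a + b


task = FunctionTask(name="add", function=add)
result = task.execute(a=1, b=2)
```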

pytask.PTaskWithPath

Bases: PTask, Protocol

Tasks with paths.

Tasks with paths receive special handling when it comes to printing their names.

pytask.PProvisionalNode

Bases: Protocol

A protocol for provisional nodes.

This type of node is provisional because it resolves to actual nodes, pytask.PNode, right before a task is executed when it is used as a dependency, and after the task is executed when it is used as a product.

Provisional nodes define what the actual nodes look like. They are useful when, for example, a task produces an unknown number of nodes because it downloads some files.

signature property

Return the signature of the node.

collect()

Collect the objects that are defined by the provisional nodes.

load(is_product=False)

Load a provisional node.

A provisional node will never be loaded as a dependency since it would be collected before.

It is possible to load a provisional node as a product so that it can inject basic information about itself into the task. For example, pytask.DirectoryNode.load injects the root directory.

Nodes

pytask.PathNode dataclass

Bases: PPathNode

The class for a node which is a path.

Attributes:

Name Type Description
name str

Name of the node which makes it identifiable in the DAG.

path NodePath

The path to the file.

attributes dict[Any, Any]

A dictionary to store additional information about the node.

signature property

The unique signature of the node.

from_path(path) classmethod

Instantiate class from path to file.

load(is_product=False)

Load the value.

save(value)

Save strings or bytes to file.

state()

Calculate the state of the node.

The state is given by the modification timestamp.
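
A minimal sketch of a timestamp-based state, assuming the state is the file's modification time converted to a string and `None` when the file is missing. The helper name `modification_state` is hypothetical and the exact format pytask uses may differ.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def modification_state(path: Path) -> str | None:
    """Return the modification time as a string, or None if the file is missing."""
    if not path.exists():
        return None
    return str(path.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "file.txt"
    state_before = modification_state(file_path)  # file does not exist yet
    file_path.write_text("Hello, World!")
    state_after = modification_state(file_path)  # timestamp of the new file
```

A timestamp is cheap to compute but changes whenever the file is rewritten, even if the content is identical. A content hash avoids that at the cost of reading the file.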

pytask.PickleNode dataclass

Bases: PPathNode

A node for pickle files.

Attributes:

Name Type Description
name str

Name of the node which makes it identifiable in the DAG.

path NodePath

The path to the file.

attributes dict[Any, Any]

A dictionary to store additional information about the node.

serializer Callable[[Any, BinaryIO], None]

A function to serialize the object. Defaults to pickle.dump.

deserializer Callable[[BinaryIO], Any]

A function to deserialize the object. Defaults to pickle.load.

signature property

The unique signature of the node.

from_path(path) classmethod

Instantiate class from path to file.

load(is_product=False)

Load the value or return the node when used as a product.

save(value)

Serialize and save the value to disk.

state()

Return the current state of the node.
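
The `serializer` and `deserializer` attributes accept any pair of callables with the signatures shown above. The sketch below compares the documented defaults, `pickle.dump` and `pickle.load`, with a hypothetical JSON-based pair. `dump_json`, `load_json`, and `round_trip` are names invented for this example, and the round trip uses an in-memory buffer rather than a node.

```python
import io
import json
import pickle
from typing import Any, BinaryIO, Callable


def dump_json(obj: Any, file: BinaryIO) -> None:
    """Serializer with the same signature as pickle.dump."""
    file.write(json.dumps(obj).encode("utf-8"))


def load_json(file: BinaryIO) -> Any:
    """Deserializer with the same signature as pickle.load."""
    return json.loads(file.read().decode("utf-8"))


def round_trip(
    obj: Any,
    serializer: Callable[[Any, BinaryIO], None],
    deserializer: Callable[[BinaryIO], Any],
) -> Any:
    """Write obj to an in-memory binary buffer and read it back."""
    buffer = io.BytesIO()
    serializer(obj, buffer)
    buffer.seek(0)
    return deserializer(buffer)


data = {"a": 1, "b": [1, 2, 3]}
via_pickle = round_trip(data, pickle.dump, pickle.load)
via_json = round_trip(data, dump_json, load_json)
```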

pytask.PythonNode dataclass

Bases: PNode

The class for a node which is a Python object.

Attributes:

Name Type Description
name str

The name of the node.

value Any | NoDefault

The value of the node.

hash bool | Callable[[Any], int | str]

Whether the value should be hashed to determine the state. Use True for objects that are hashable like strings and tuples. For dictionaries and other non-hashable objects, you need to provide a function that can hash these objects. The function should return either an integer or a string.

node_info NodeInfo | None

The information acquired while collecting the node.

attributes dict[Any, Any]

A dictionary to store additional information about the node.

Examples:

To allow a pytask.PythonNode to hash a dictionary, you need to pass your own hashing function, for example one from the deepdiff library.

>>> from deepdiff import DeepHash
>>> from pytask import PythonNode
>>> node = PythonNode(name="node", value={"a": 1}, hash=lambda x: DeepHash(x)[x])

Warning

Hashing big objects can require some time.

signature property

The unique signature of the node.

load(is_product=False)

Load the value.

save(value)

Save the value.

state()

Calculate state of the node.

If hash = False, the function returns "0", a constant hash value, so the pytask.PythonNode is ignored when checking for a changed state of the task.

If hash is a callable, this function is used to calculate the hash and is expected to return an integer or a string.

If hash = True, the builtin hash() function (https://docs.python.org/3.11/library/functions.html?highlight=hash#hash) is used for all types except strings and bytes.

The hash for strings and bytes is calculated using hashlib because hash("asd") returns a different value in every Python process: the hash of strings is salted with a random integer, which would confuse users. See object.__hash__ for more information.
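
The three cases can be summarized in a short sketch. `sketch_state` is a hypothetical helper, not pytask's implementation, and the choice of SHA-256 is an assumption since the text only says that hashlib is used.

```python
from __future__ import annotations

import hashlib
from typing import Any, Callable


def sketch_state(value: Any, hash_: bool | Callable[[Any], int | str]) -> str:
    """Mimic the documented rules for deriving a state from a value."""
    if hash_ is False:
        # Constant state: the value never triggers a re-run.
        return "0"
    if callable(hash_):
        # User-supplied function returning an integer or a string.
        return str(hash_(value))
    # hash_ is True from here on.
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        # Stable across processes, unlike the salted builtin hash().
        return hashlib.sha256(value).hexdigest()
    return str(hash(value))


ignored = sketch_state({"a": 1}, False)
stable = sketch_state("asd", True)
custom = sketch_state({"a": 1}, lambda d: str(sorted(d.items())))
```

The custom function in the last line is only suitable for flat dictionaries with sortable keys. Nested or heterogeneous data needs a more careful hashing strategy.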

pytask.DirectoryNode dataclass

Bases: PProvisionalNode

The class for a provisional node that works with directories.

Attributes:

Name Type Description
name str

The name of the node.

pattern str

Patterns are the same as for fnmatch, with the addition of ** which means "this directory and all subdirectories, recursively".

root_dir Path | None

The pattern is interpreted relative to the path given by root_dir. If root_dir = None, it is the directory where the path is defined.

attributes dict[Any, Any]

A dictionary to store additional information about the node.

signature property

The unique signature of the node.

collect()

Collect paths defined by the pattern.

load(is_product=False)

Inject a path into the task when loaded as a product.
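
To illustrate how a pattern relative to `root_dir` might resolve to concrete paths, the sketch below uses `pathlib.Path.glob`, which also supports `**` for recursive matching. The helper `collect_paths` and the file names are hypothetical. This is not the code behind `collect()`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def collect_paths(root_dir: Path, pattern: str) -> list[Path]:
    """Return all files below root_dir that match the glob pattern, sorted."""
    return sorted(path for path in root_dir.glob(pattern) if path.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for relative in ("a.txt", "c.csv", "sub/b.txt"):
        (root / relative).write_text("data")

    # Only files directly inside the root directory.
    top_level = [p.relative_to(root).as_posix() for p in collect_paths(root, "*.txt")]
    # Files in the root directory and all subdirectories.
    recursive = [p.relative_to(root).as_posix() for p in collect_paths(root, "**/*.txt")]
```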

pytask.parse_dependencies_from_task_function(session, task_path, task_name, node_path, obj)

Parse dependencies from task function.

pytask.parse_products_from_task_function(session, task_path, task_name, node_path, obj)

Parse products from task function.

Raises:

Type Description
NodeNotCollectedError

If multiple ways to parse products from the return of the task function are used.

Tasks

pytask.task(name=None, *, after=None, is_generator=False, id=None, kwargs=None, produces=None)

task(name: T) -> TaskDecorated[T]
task(name: str | None = None, *, after: str | Callable[..., Any] | list[Callable[..., Any]] | None = None, is_generator: bool = False, id: str | None = None, kwargs: dict[Any, Any] | None = None, produces: Any | None = None) -> Callable[[T], TaskDecorated[T]]

Decorate a task function.

This decorator declares any callable as a pytask task.

The decorator also attaches metadata to the function, such as parsed kwargs and markers.

Parameters:

Name Type Description Default
name str | T | None

Use it to override the name of the task, which defaults to the name of the task function. Read customize task names for more information.

None
after str | Callable[..., Any] | list[Callable[..., Any]] | None

An expression, a task function, or a list of task functions that need to be executed before this task can be executed. See after for more information.

None
is_generator bool

Indicates whether this task is a task generator.

False
id str | None

An id for the task if it is part of a repetition. Otherwise, an automatic id will be generated. See this section for more information.

None
kwargs dict[Any, Any] | None

A dictionary containing keyword arguments which are passed to the task function. These can be dependencies or products of the task. Read task kwargs for more information.

None
produces Any | None

Use this argument to parse the return of the task function as a product. See this how-to guide or task produces for more information.

None

Examples:

To mark a function without the task_ prefix as a task, attach the decorator.

from pathlib import Path
from typing import Annotated

from pytask import task


@task()
def create_text_file() -> Annotated[str, Path("file.txt")]:
    return "Hello, World!"

pytask.Task dataclass

Bases: PTaskWithPath

The class for tasks which are Python functions.

Attributes:

Name Type Description
base_name str

The base name of the task.

path Path

Path to the file where the task was defined.

function Callable[..., Any]

The task function.

name str

The name of the task.

depends_on dict[str, PyTree[PNode | PProvisionalNode]]

The dependencies of the task.

produces dict[str, PyTree[PNode | PProvisionalNode]]

The products of the task.

markers list[Mark]

A list of markers attached to the task function.

report_sections list[tuple[str, str, str]]

Reports with entries for when, what, and content.

attributes dict[Any, Any]

A dictionary to store additional information about the task.

signature property

The unique signature of the node.

execute(**kwargs)

Execute the task.

state()

Return the state of the node.

pytask.TaskWithoutPath dataclass

Bases: PTask

The class for tasks without a source file.

Tasks may have no source file because they are dynamically created in a REPL or in a Jupyter notebook.

Attributes:

Name Type Description
name str

The name of the task.

function Callable[..., Any]

The task function.

depends_on dict[str, PyTree[PNode | PProvisionalNode]]

The dependencies of the task.

produces dict[str, PyTree[PNode | PProvisionalNode]]

The products of the task.

markers list[Mark]

A list of markers attached to the task function.

report_sections list[tuple[str, str, str]]

Reports with entries for when, what, and content.

attributes dict[Any, Any]

A dictionary to store additional information about the task.

execute(**kwargs)

Execute the task.

state()

Return the state of the node.

pytask.CollectionMetadata dataclass

A class for carrying metadata from functions to tasks.

Attributes:

Name Type Description
after str | list[Callable[..., Any]]

An expression, a task function, or a list of task functions that need to be executed before this task can be executed.

id_ str | None

An id for the task if it is part of a parametrization. Otherwise, an automatic id will be generated. See this tutorial for more information.

is_generator bool

An indicator for whether a task generates other tasks or not.

kwargs dict[str, Any]

A dictionary containing keyword arguments which are passed to the task when it is executed.

annotation_locals dict[str, Any] | None

A snapshot of the local variables captured during decoration, which helps evaluate deferred annotations later on.

markers list[Mark]

A list of markers that are attached to the task.

name str | None

Use it to override the name of the task, which defaults to the name of the callable.

produces PyTree[Any] | None

Definition of products to parse the function returns and store them. See this how-to guide for more information.