Nodes and Tasks
Protocols
pytask.PNode
Bases: Protocol
Protocol for nodes.
signature
property
Return the signature of the node.
load(is_product=False)
Return the value of the node that will be injected into the task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `is_product` | `bool` | Indicates whether the node is loaded as a dependency or as a product. It can be used to return a different value when the node is loaded with a product annotation. Then, we usually want to insert the node itself to allow the user calling | `False` |
save(value)
Save the value that was returned from a task.
state()
Return the state of the node.
The state can be something like a hash or a last modified timestamp. If the node
does not exist, you can also return None.
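As a rough illustration of the protocol, the sketch below defines a hypothetical `TextFileNode` class that provides the four members listed above (`signature`, `state`, `load`, `save`). The class name, the `attributes` field, and the use of SHA-256 for the signature and the state are assumptions made for this example. The sketch uses only the standard library and does not import pytask.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class TextFileNode:
    """Hypothetical node that stores a string in a text file."""

    name: str
    path: Path
    attributes: Dict[Any, Any] = field(default_factory=dict)

    @property
    def signature(self) -> str:
        # Assumption: any stable string derived from name and path is unique enough.
        raw = f"{self.name}:{self.path}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def state(self) -> Optional[str]:
        # None signals that the node does not exist yet.
        if not self.path.exists():
            return None
        # Here the state is a hash of the file content.
        return hashlib.sha256(self.path.read_bytes()).hexdigest()

    def load(self, is_product: bool = False) -> Any:
        # As a product, hand over the node itself so the task can call save().
        if is_product:
            return self
        return self.path.read_text()

    def save(self, value: str) -> None:
        # Store the value that a task returned.
        self.path.write_text(value)
```

A content hash detects changes even when the modification time stays the same, at the cost of reading the whole file; a timestamp would be cheaper.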
pytask.PPathNode
pytask.PTask
Bases: Protocol
Protocol for tasks.
signature
property
Return the signature of the node.
execute(**kwargs)
Execute the task.
state()
Return the state of the node.
The state can be something like a hash or a last modified timestamp. If the node
does not exist, you can also return None.
pytask.PTaskWithPath
pytask.PProvisionalNode
Bases: Protocol
A protocol for provisional nodes.
This type of node is provisional because it resolves to actual nodes, pytask.PNode, right before a task is executed when used as a dependency and right after the task is executed when used as a product.
Provisional nodes define what the actual nodes look like. They can be useful when, for example, a task produces an unknown number of nodes because it downloads some files.
signature
property
Return the signature of the node.
collect()
Collect the objects that are defined by the provisional nodes.
load(is_product=False)
Load a provisional node.
A provisional node will never be loaded as a dependency since it would be collected before.
It is possible to load a provisional node as a product so that it can inject basic information about itself into the task. For example, pytask.DirectoryNode.load injects the root directory.
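The idea of a provisional node can be sketched with a hypothetical `GlobNode` that stands for "all files matching a pattern below a directory" and only resolves to concrete files when `collect()` is called. The class name, its fields, and the decision to raise `NotImplementedError` when loaded as a dependency are assumptions for illustration. For brevity the sketch returns plain paths from `collect()`, whereas a real provisional node resolves to node objects.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class GlobNode:
    """Hypothetical provisional node: a glob pattern below a root directory."""

    name: str
    root_dir: Path
    pattern: str = "*"

    @property
    def signature(self) -> str:
        # Assumption: name, directory, and pattern together identify the node.
        raw = f"{self.name}:{self.root_dir}:{self.pattern}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def load(self, is_product: bool = False) -> Path:
        # As a product, inject only basic information: the directory to write to.
        if is_product:
            return self.root_dir
        # As a dependency, the node would have been collected before loading.
        raise NotImplementedError("Collect the node before using it as a dependency.")

    def collect(self) -> List[Path]:
        # Resolve the pattern into the files that exist at this moment.
        return sorted(p for p in self.root_dir.glob(self.pattern) if p.is_file())
```

Calling `collect()` before and after a task has written its files gives different results, which is the reason the resolution is deferred.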
Nodes
pytask.PathNode
dataclass
Bases: PPathNode
The class for a node which is a path.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Name of the node which makes it identifiable in the DAG. |
| `path` | `NodePath` | The path to the file. |
| `attributes` | `dict[Any, Any]` | A dictionary to store additional information of the task. |
signature
property
The unique signature of the node.
from_path(path)
classmethod
Instantiate class from path to file.
load(is_product=False)
Load the value.
save(value)
Save strings or bytes to file.
state()
Calculate the state of the node.
The state is given by the modification timestamp.
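A timestamp-based state can be sketched in a few lines. The helper name `modification_state` and the choice to return the timestamp as a string are assumptions for this example; the source only says that the state is given by the modification timestamp.

```python
from pathlib import Path
from typing import Optional


def modification_state(path: Path) -> Optional[str]:
    """Return the modification time of ``path`` as a string, or None if missing."""
    if not path.exists():
        # A missing file has no state.
        return None
    # st_mtime is a float holding seconds since the epoch.
    return str(path.stat().st_mtime)
```

Comparing the value from a previous run with the current one is enough to decide whether the file changed in between.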
pytask.PickleNode
dataclass
Bases: PPathNode
A node for pickle files.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Name of the node which makes it identifiable in the DAG. |
| `path` | `NodePath` | The path to the file. |
| `attributes` | `dict[Any, Any]` | A dictionary to store additional information of the task. |
| `serializer` | `Callable[[Any, BinaryIO], None]` | A function to serialize the object. Defaults to pickle.dump. |
| `deserializer` | `Callable[[BinaryIO], Any]` | A function to deserialize the object. Defaults to pickle.load. |
signature
property
The unique signature of the node.
from_path(path)
classmethod
Instantiate class from path to file.
load(is_product=False)
Load the value or return the node when used as a product.
save(value)
Serialize and save the value to disk.
state()
Return the current state of the node.
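The `serializer` and `deserializer` attributes have the same call shape as `pickle.dump` and `pickle.load`: an object plus a binary file handle, and a binary file handle alone. The sketch below shows a hypothetical pair of functions with matching signatures that write JSON instead of pickle data. The function names are made up for this example, and whether such a pair is a sensible replacement for a given node is an assumption left to the reader.

```python
import io
import json
from typing import Any, BinaryIO


def json_serializer(obj: Any, file: BinaryIO) -> None:
    """Write ``obj`` as UTF-8 encoded JSON; same call shape as pickle.dump."""
    file.write(json.dumps(obj).encode("utf-8"))


def json_deserializer(file: BinaryIO) -> Any:
    """Read UTF-8 encoded JSON; same call shape as pickle.load."""
    return json.loads(file.read().decode("utf-8"))


# Round trip through an in-memory binary buffer.
buffer = io.BytesIO()
json_serializer({"a": [1, 2]}, buffer)
buffer.seek(0)
restored = json_deserializer(buffer)
```

JSON restricts the values to basic types, but the stored file stays readable outside Python.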
pytask.PythonNode
dataclass
Bases: PNode
The class for a node which is a Python object.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the node. |
| `value` | `Any \| NoDefault` | The value of the node. |
| `hash` | `bool \| Callable[[Any], int \| str]` | Whether the value should be hashed to determine the state. Use |
| `node_info` | `NodeInfo \| None` | The information acquired while collecting the node. |
| `attributes` | `dict[Any, Any]` | A dictionary to store additional information of the task. |
Examples:
To allow a pytask.PythonNode to hash a dictionary, you need to pass your
own hashing function, for example one from the deepdiff library.
>>> from deepdiff import DeepHash
>>> from pytask import PythonNode
>>> node = PythonNode(name="node", value={"a": 1}, hash=lambda x: DeepHash(x)[x])
Warning
Hashing big objects can require some time.
signature
property
The unique signature of the node.
load(is_product=False)
Load the value.
save(value)
Save the value.
state()
Calculate state of the node.
If hash = False, the function returns "0", a constant hash value, so the
pytask.PythonNode is ignored when checking for a changed state of the task.
If hash is a callable, then use this function to calculate a hash expecting
an integer or string.
If hash = True, the builtin hash() function
(https://docs.python.org/3.11/library/functions.html?highlight=hash#hash) is
used for all types except strings and bytes.
The hash for strings and bytes is calculated using hashlib because
hash("asd") returns a different value in every Python process: the hash of
strings is salted with a random value at interpreter startup, which would
confuse users. See object.__hash__ for more information.
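The three cases described above can be summarized in a short sketch. The function name `compute_state` and the choice of SHA-256 are assumptions for this example; the source only states that hashlib is used for strings and bytes.

```python
import hashlib
from typing import Any, Callable, Union


def compute_state(
    value: Any, hash_: Union[bool, Callable[[Any], Union[int, str]]]
) -> str:
    """Sketch of the state calculation for the three kinds of ``hash_``."""
    if hash_ is False:
        # Constant state: the value never triggers a re-run.
        return "0"
    if callable(hash_):
        # User-supplied hashing function returning an int or a str.
        return str(hash_(value))
    # hash_ is True from here on.
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        # Stable across processes, unlike the salted builtin hash of strings.
        return hashlib.sha256(value).hexdigest()
    return str(hash(value))
```

With `hash_=True`, an unhashable value such as a dictionary raises a `TypeError` in the last line, which is why a custom hashing function is needed for such objects.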
pytask.DirectoryNode
dataclass
Bases: PProvisionalNode
The class for a provisional node that works with directories.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the node. |
| `pattern` | `str` | Patterns are the same as for |
| `root_dir` | `Path \| None` | The pattern is interpreted relative to the path given by |
| `attributes` | `dict[Any, Any]` | A dictionary to store additional information of the task. |
pytask.parse_dependencies_from_task_function(session, task_path, task_name, node_path, obj)
Parse dependencies from task function.
pytask.parse_products_from_task_function(session, task_path, task_name, node_path, obj)
Parse products from task function.
Raises:
| Type | Description |
|---|---|
| `NodeNotCollectedError` | If multiple ways to parse products from the return of the task function are used. |
Tasks
pytask.task(name=None, *, after=None, is_generator=False, id=None, kwargs=None, produces=None)
task(name: T) -> TaskDecorated[T]
task(name: str | None = None, *, after: str | Callable[..., Any] | list[Callable[..., Any]] | None = None, is_generator: bool = False, id: str | None = None, kwargs: dict[Any, Any] | None = None, produces: Any | None = None) -> Callable[[T], TaskDecorated[T]]
Decorate a task function.
This decorator declares every callable as a pytask task.
The function also attaches some metadata to the function like parsed kwargs and markers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str \| T \| None` | Use it to override the name of the task that is, by default, the name of the task function. Read customize task names for more information. | `None` |
| `after` | `str \| Callable[..., Any] \| list[Callable[..., Any]] \| None` | An expression or a task function or a list of task functions that need to be executed before this task can be executed. See after for more information. | `None` |
| `is_generator` | `bool` | An indicator whether this task is a task generator. | `False` |
| `id` | `str \| None` | An id for the task if it is part of a repetition. Otherwise, an automatic id will be generated. See this section for more information. | `None` |
| `kwargs` | `dict[Any, Any] \| None` | A dictionary containing keyword arguments which are passed to the task function. These can be dependencies or products of the task. Read task kwargs for more information. | `None` |
| `produces` | `Any \| None` | Use this argument to parse the return of the task function as a product. See this how-to guide or task produces for more information. | `None` |
Examples:
To mark a function without the task_ prefix as a task, attach the decorator.
from pathlib import Path
from typing import Annotated

from pytask import task


@task()
def create_text_file() -> Annotated[str, Path("file.txt")]:
    return "Hello, World!"
pytask.Task
dataclass
Bases: PTaskWithPath
The class for tasks which are Python functions.
Attributes:
| Name | Type | Description |
|---|---|---|
| `base_name` | `str` | The base name of the task. |
| `path` | `Path` | Path to the file where the task was defined. |
| `function` | `Callable[..., Any]` | The task function. |
| `name` | `str` | The name of the task. |
| `depends_on` | `dict[str, PyTree[PNode \| PProvisionalNode]]` | A list of dependencies of the task. |
| `produces` | `dict[str, PyTree[PNode \| PProvisionalNode]]` | A list of products of the task. |
| `markers` | `list[Mark]` | A list of markers attached to the task function. |
| `report_sections` | `list[tuple[str, str, str]]` | Reports with entries for when, what, and content. |
| `attributes` | `dict[Any, Any]` | A dictionary to store additional information of the task. |
pytask.TaskWithoutPath
dataclass
Bases: PTask
The class for tasks without a source file.
Tasks may have no source file because
- they are dynamically created in a REPL.
- they are created in a Jupyter notebook.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the task. |
| `function` | `Callable[..., Any]` | The task function. |
| `depends_on` | `dict[str, PyTree[PNode \| PProvisionalNode]]` | A list of dependencies of the task. |
| `produces` | `dict[str, PyTree[PNode \| PProvisionalNode]]` | A list of products of the task. |
| `markers` | `list[Mark]` | A list of markers attached to the task function. |
| `report_sections` | `list[tuple[str, str, str]]` | Reports with entries for when, what, and content. |
| `attributes` | `dict[Any, Any]` | A dictionary to store additional information of the task. |
pytask.CollectionMetadata
dataclass
A class for carrying metadata from functions to tasks.
Attributes:
| Name | Type | Description |
|---|---|---|
| `after` | `str \| list[Callable[..., Any]]` | An expression or a task function or a list of task functions that need to be executed before this task can. |
| `id_` | `str \| None` | An id for the task if it is part of a parametrization. Otherwise, an automatic id will be generated. See this tutorial for more information. |
| `is_generator` | `bool` | An indicator for whether a task generates other tasks or not. |
| `kwargs` | `dict[str, Any]` | A dictionary containing keyword arguments which are passed to the task when it is executed. |
| `annotation_locals` | `dict[str, Any] \| None` | A snapshot of local variables captured during decoration which helps evaluate deferred annotations later on. |
| `markers` | `list[Mark]` | A list of markers that are attached to the task. |
| `name` | `str \| None` | Use it to override the name of the task that is, by default, the name of the callable. |
| `produces` | `PyTree[Any] \| None` | Definition of products to parse the function returns and store them. See this how-to guide for more information. |
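The general pattern of a decorator that carries metadata from a function to a later collection step can be sketched as follows. Everything here is hypothetical and reduced: the `Metadata` class holds only a few of the fields listed above, and the decorator name `mark_as_task` and the attribute name `metadata` are invented for this example. It does not show how pytask stores its metadata internally.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Metadata:
    """Hypothetical, reduced stand-in for a metadata container."""

    name: Optional[str] = None
    id_: Optional[str] = None
    is_generator: bool = False
    kwargs: Dict[str, Any] = field(default_factory=dict)


def mark_as_task(
    name: Optional[str] = None,
    *,
    id: Optional[str] = None,
    is_generator: bool = False,
    kwargs: Optional[Dict[str, Any]] = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Attach a Metadata object to the decorated function and return it unchanged."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        func.metadata = Metadata(
            # Fall back to the function name when no name is given.
            name=name or func.__name__,
            id_=id,
            is_generator=is_generator,
            kwargs=dict(kwargs or {}),
        )
        return func

    return decorator


@mark_as_task(kwargs={"text": "Hello, World!"})
def create_text(text: str) -> str:
    return text


# A collector could later read the metadata and call the function with it.
result = create_text(**create_text.metadata.kwargs)
```

Because the function is returned unchanged, it can still be called directly, and the metadata is only consulted when something chooses to read it.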