lsst.pipe.base  19.0.0-6-gb6b8b0a+10
lsst.pipe.base.graphBuilder._PipelineScaffolding Class Reference

Public Member Functions

def __init__ (self, pipeline, registry)
 
def fillDataIds (self, registry, inputCollections, userQuery)
 
def fillDatasetRefs (self, registry, inputCollections, outputCollection, skipExisting=True, clobberExisting=False)
 
def fillQuanta (self, registry, inputCollections, skipExisting=True)
 
def makeQuantumGraph (self)
 

Public Attributes

 tasks
 
 dimensions
 

Detailed Description

A helper data structure that organizes the information involved in
constructing a `QuantumGraph` for a `Pipeline`.

Parameters
----------
pipeline : `Pipeline`
    Sequence of tasks from which a graph is to be constructed.  Must
    have nested task classes already imported.
registry : `lsst.daf.butler.Registry`
    Registry for the data repository; its dimension universe defines
    all possible dimensions.

Raises
------
GraphBuilderError
    Raised if any task's dimensions are not a subset of the union of the
    pipeline's dataset dimensions.

Notes
-----
The scaffolding data structure contains nested data structures for both
tasks (`_TaskScaffolding`) and datasets (`_DatasetScaffolding`), with the
latter held by `_DatasetScaffoldingDict`.  The dataset data structures are
shared between the pipeline-level structure (which aggregates all datasets
and categorizes them from the perspective of the complete pipeline) and the
individual tasks that use them as inputs and outputs.

`QuantumGraph` construction proceeds in five steps, with each corresponding
to a different `_PipelineScaffolding` method:

1. When `_PipelineScaffolding` is constructed, we extract and categorize
   the DatasetTypes used by the pipeline (delegating to
   `PipelineDatasetTypes.fromPipeline`), then use these to construct the
   nested `_TaskScaffolding` and `_DatasetScaffolding` objects.

2. In `fillDataIds`, we construct and run the "Big Join Query", which
   returns related tuples of all dimensions used to identify any regular
   input, output, and intermediate datasets (not prerequisites).  We then
   iterate over these tuples of related dimensions, identifying the subsets
   that correspond to distinct data IDs for each task and dataset type.

3. In `fillDatasetRefs`, we run follow-up queries against all of the
   dataset data IDs previously identified, populating the
   `_DatasetScaffolding.refs` lists, except for those of prerequisite
   datasets, which cannot be resolved until distinct quanta are
   identified.

4. In `fillQuanta`, we extract subsets from the lists of `DatasetRef` into
   the inputs and outputs for each `Quantum` and search for prerequisite
   datasets, populating `_TaskScaffolding.quanta`.

5. In `makeQuantumGraph`, we construct a `QuantumGraph` from the lists of
   per-task quanta identified in the previous step.
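
The five steps above can be sketched as one driver function. This is only an illustration of the call order, using the method names and signatures listed on this page. `build_quantum_graph` and `RecordingScaffolding` are hypothetical names introduced here; the recording stub stands in for a real `_PipelineScaffolding` so that the sketch runs without the LSST stack.

```python
def build_quantum_graph(scaffolding, registry, input_collections,
                        output_collection, user_query=None,
                        skip_existing=True, clobber_existing=False):
    """Run steps 2-5 on a scaffolding that was already constructed (step 1)."""
    # Step 2: find the data IDs for every task and regular dataset type.
    scaffolding.fillDataIds(registry, input_collections, user_query)
    # Step 3: follow-up queries that populate the dataset reference lists.
    scaffolding.fillDatasetRefs(registry, input_collections, output_collection,
                                skipExisting=skip_existing,
                                clobberExisting=clobber_existing)
    # Step 4: split the references into per-task quanta.
    scaffolding.fillQuanta(registry, input_collections,
                           skipExisting=skip_existing)
    # Step 5: assemble the graph from the per-task quanta.
    return scaffolding.makeQuantumGraph()


class RecordingScaffolding:
    """Hypothetical stub that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def fillDataIds(self, registry, inputCollections, userQuery):
        self.calls.append("fillDataIds")

    def fillDatasetRefs(self, registry, inputCollections, outputCollection,
                        skipExisting=True, clobberExisting=False):
        self.calls.append("fillDatasetRefs")

    def fillQuanta(self, registry, inputCollections, skipExisting=True):
        self.calls.append("fillQuanta")

    def makeQuantumGraph(self):
        self.calls.append("makeQuantumGraph")
        return "graph-placeholder"


stub = RecordingScaffolding()
result = build_quantum_graph(stub, registry=None, input_collections={},
                             output_collection="hypothetical/output")
```

Each later step depends on state populated by the one before it, so the order is not interchangeable.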

Definition at line 341 of file graphBuilder.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.pipe.base.graphBuilder._PipelineScaffolding.__init__ (self, pipeline, registry)

Definition at line 395 of file graphBuilder.py.

Member Function Documentation

◆ fillDataIds()

def lsst.pipe.base.graphBuilder._PipelineScaffolding.fillDataIds (self, registry, inputCollections, userQuery)

Query for the data IDs that connect nodes in the `QuantumGraph`.

This method populates `_TaskScaffolding.dataIds` and
`_DatasetScaffolding.dataIds` (except for those in `prerequisites`).
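
As a rough illustration of the idea of identifying distinct data IDs from joined rows, the sketch below projects each row onto the dimensions of one task or dataset type and drops duplicates. It is not the library's implementation: `distinct_data_ids` is a hypothetical helper, and the rows and dimension names are made-up sample values.

```python
def distinct_data_ids(rows, dimensions):
    """Project each joined row onto `dimensions`, keeping first-seen order."""
    seen = {}
    for row in rows:
        # A tuple of (dimension, value) pairs is hashable, so it can be a key.
        key = tuple((dim, row[dim]) for dim in dimensions)
        seen.setdefault(key, dict(key))
    return list(seen.values())


# Made-up rows standing in for the result of the joined query.
rows = [
    {"visit": 1, "detector": 10, "tract": 5},
    {"visit": 1, "detector": 11, "tract": 5},
    {"visit": 2, "detector": 10, "tract": 5},
]

per_detector = distinct_data_ids(rows, ("visit", "detector"))
per_tract = distinct_data_ids(rows, ("tract",))
```

Here three joined rows yield three distinct (visit, detector) data IDs but only one tract data ID.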

Parameters
----------
registry : `lsst.daf.butler.Registry`
    Registry for the data repository; used for all data ID queries.
inputCollections : `~collections.abc.Mapping`
    Mapping from dataset type name to an ordered sequence of
    collections to search for that dataset.  A `defaultdict` is
    recommended for the case where the same collections should be
    used for most datasets.
userQuery : `str`, optional
    User-provided expression to limit the data IDs processed.
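
A minimal sketch of the recommended `defaultdict` form of `inputCollections`, using only the standard library. All collection and dataset type names are hypothetical placeholders.

```python
from collections import defaultdict
from collections.abc import Mapping

# Hypothetical collection names, for illustration only.
shared = ["hypothetical/raw", "hypothetical/calib"]

# Every dataset type searches the shared collections, in order, unless
# an explicit entry overrides them.
input_collections = defaultdict(lambda: list(shared))
input_collections["hypotheticalRefCat"] = ["hypothetical/refcats"]

default_search = input_collections["someOtherDatasetType"]
override_search = input_collections["hypotheticalRefCat"]
is_mapping = isinstance(input_collections, Mapping)
```

Note that looking up a missing key in a `defaultdict` also stores the default value under that key.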

Definition at line 467 of file graphBuilder.py.

◆ fillDatasetRefs()

def lsst.pipe.base.graphBuilder._PipelineScaffolding.fillDatasetRefs (self, registry, inputCollections, outputCollection, skipExisting=True, clobberExisting=False)

Perform follow-up queries for each dataset data ID produced in
`fillDataIds`.

This method populates `_DatasetScaffolding.refs` (except for those in
`prerequisites`).

Parameters
----------
registry : `lsst.daf.butler.Registry`
    Registry for the data repository; used for all data ID queries.
inputCollections : `~collections.abc.Mapping`
    Mapping from dataset type name to an ordered sequence of
    collections to search for that dataset.  A `defaultdict` is
    recommended for the case where the same collections should be
    used for most datasets.
outputCollection : `str`
    Collection for all output datasets.
skipExisting : `bool`, optional
    If `True` (default), a Quantum is not created if all its outputs
    already exist.
clobberExisting : `bool`, optional
    If `True`, overwrite any outputs that already exist.  Cannot be
    `True` if ``skipExisting`` is.

Raises
------
ValueError
    Raised if both ``skipExisting`` and ``clobberExisting`` are `True`.
OutputExistsError
    Raised if an output dataset already exists in the output collection
    and both ``skipExisting`` and ``clobberExisting`` are `False`.  The
    case where some but not all of a quantum's outputs are present and
    ``skipExisting`` is `True` cannot be identified at this stage, and
    is handled by `fillQuanta` instead.
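
The flag rules described above can be sketched as follows. This only mirrors the documented behavior and is not the library's code: `resolve_existing_output` is a hypothetical helper, and `OutputExistsError` is redefined locally as a stand-in for the real exception.

```python
class OutputExistsError(RuntimeError):
    """Local stand-in for the exception named above (hypothetical definition)."""


def resolve_existing_output(name, skip_existing=True, clobber_existing=False):
    """Decide what to do with an output already in the output collection."""
    if skip_existing and clobber_existing:
        raise ValueError("skipExisting and clobberExisting cannot both be True")
    if clobber_existing:
        return "overwrite"
    if skip_existing:
        # Whether the whole quantum can be skipped is only known later,
        # once quanta are defined, so the decision is deferred.
        return "defer"
    raise OutputExistsError(f"output {name!r} already exists")


deferred = resolve_existing_output("hypotheticalOutput")
overwritten = resolve_existing_output("hypotheticalOutput",
                                      skip_existing=False,
                                      clobber_existing=True)
```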

Definition at line 525 of file graphBuilder.py.

◆ fillQuanta()

def lsst.pipe.base.graphBuilder._PipelineScaffolding.fillQuanta (self, registry, inputCollections, skipExisting=True)

Define quanta for each task by splitting up the datasets associated
with each task data ID.

This method populates `_TaskScaffolding.quanta`.

Parameters
----------
registry : `lsst.daf.butler.Registry`
    Registry for the data repository; used for all data ID queries.
inputCollections : `~collections.abc.Mapping`
    Mapping from dataset type name to an ordered sequence of
    collections to search for that dataset.  A `defaultdict` is
    recommended for the case where the same collections should be
    used for most datasets.
skipExisting : `bool`, optional
    If `True` (default), a Quantum is not created if all its outputs
    already exist.
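
A small sketch of the ``skipExisting`` rule at the level of a single quantum, assuming outputs are identified by simple names. `classify_quantum` is a hypothetical helper; this page does not say how the partial case is resolved, so the sketch only labels it.

```python
def classify_quantum(outputs, existing):
    """Classify a quantum by how many of its outputs already exist."""
    present = [name for name in outputs if name in existing]
    if not present:
        return "create"   # nothing exists yet, so the quantum is needed
    if len(present) == len(outputs):
        return "skip"     # all outputs exist: no quantum is created
    return "partial"      # some but not all exist: handling not shown here


existing = {"a", "b"}
all_present = classify_quantum(["a", "b"], existing)
none_present = classify_quantum(["c"], existing)
some_present = classify_quantum(["a", "c"], existing)
```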

Definition at line 602 of file graphBuilder.py.

◆ makeQuantumGraph()

def lsst.pipe.base.graphBuilder._PipelineScaffolding.makeQuantumGraph (self)

Create a `QuantumGraph` from the quanta already present in
the scaffolding data structure.

Definition at line 683 of file graphBuilder.py.

Member Data Documentation

◆ dimensions

lsst.pipe.base.graphBuilder._PipelineScaffolding.dimensions

Definition at line 407 of file graphBuilder.py.

◆ tasks

lsst.pipe.base.graphBuilder._PipelineScaffolding.tasks

Definition at line 396 of file graphBuilder.py.

