A helper data structure that organizes the information involved in
constructing a `QuantumGraph` for a `Pipeline`.
Parameters
----------
pipeline : `Pipeline`
Sequence of tasks from which a graph is to be constructed. Must
have nested task classes already imported.
universe : `DimensionUniverse`
Universe of all possible dimensions.
Raises
------
GraphBuilderError
Raised if the task's dimensions are not a subset of the union of the
pipeline's dataset dimensions.
Notes
-----
The scaffolding data structure contains nested data structures for both
tasks (`_TaskScaffolding`) and datasets (`_DatasetScaffolding`), with the
latter held by `_DatasetScaffoldingDict`. The dataset data structures are
shared between the pipeline-level structure (which aggregates all datasets
and categorizes them from the perspective of the complete pipeline) and the
individual tasks that use them as inputs and outputs.
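The sharing described above can be sketched with simplified stand-ins (the class and attribute names below are illustrative placeholders for the internal `_TaskScaffolding`/`_DatasetScaffolding` structures, not the actual implementation):

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for _DatasetScaffolding: one object per
# dataset type, shared by reference between the pipeline and its tasks.
@dataclass
class DatasetScaffolding:
    name: str
    refs: list = field(default_factory=list)

# Hypothetical stand-in for _TaskScaffolding: holds references to the same
# DatasetScaffolding objects that the pipeline-level dict holds.
@dataclass
class TaskScaffolding:
    label: str
    inputs: dict
    outputs: dict

# Pipeline-level mapping, analogous in role to _DatasetScaffoldingDict.
pipeline_datasets = {n: DatasetScaffolding(n) for n in ("raw", "calexp")}

task = TaskScaffolding(
    label="isr",
    inputs={"raw": pipeline_datasets["raw"]},
    outputs={"calexp": pipeline_datasets["calexp"]},
)

# Because the objects are shared, populating refs at the pipeline level makes
# them immediately visible from the task (and vice versa).
pipeline_datasets["raw"].refs.append("raw@visit=1")
print(task.inputs["raw"].refs)
```

This sharing is why populating `_DatasetScaffolding.refs` once (at the pipeline level) suffices for every task that consumes or produces that dataset type.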
`QuantumGraph` construction proceeds in five steps, with each corresponding
to a different `_PipelineScaffolding` method:
1. When `_PipelineScaffolding` is constructed, we extract and categorize
the DatasetTypes used by the pipeline (delegating to
`PipelineDatasetTypes.fromPipeline`), then use these to construct the
nested `_TaskScaffolding` and `_DatasetScaffolding` objects.
2. In `fillDataIds`, we construct and run the "Big Join Query", which
returns related tuples of all dimensions used to identify any regular
input, output, and intermediate datasets (not prerequisites). We then
iterate over these tuples of related dimensions, identifying the subsets
that correspond to distinct data IDs for each task and dataset type.
3. In `fillDatasetRefs`, we run follow-up queries against all of the
dataset data IDs previously identified, populating the
`_DatasetScaffolding.refs` lists, except for those for prerequisite
datasets, which cannot be resolved until distinct quanta are
identified.
4. In `fillQuanta`, we extract subsets from the lists of `DatasetRef` into
the inputs and outputs for each `Quantum` and search for prerequisite
datasets, populating `_TaskScaffolding.quanta`.
5. In `makeQuantumGraph`, we construct a `QuantumGraph` from the lists of
per-task quanta identified in the previous step.
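The five steps above can be sketched as a driver class whose method names follow the ones listed; every method body here is an illustrative placeholder, not the real implementation:

```python
# Hypothetical, heavily simplified sketch of the five-step construction flow
# described above; the bodies are placeholders, not the real code.
class PipelineScaffoldingSketch:
    def __init__(self, pipeline):
        # Step 1: extract and categorize dataset types, build nested
        # task/dataset scaffolding objects (placeholder).
        self.tasks = list(pipeline)
        self.quanta = []

    def fillDataIds(self, registry):
        # Step 2: run the "Big Join Query" and split the resulting tuples
        # into per-task and per-dataset-type data IDs (placeholder).
        self.data_ids = [{"visit": v} for v in (1, 2)]

    def fillDatasetRefs(self, registry, collections, run):
        # Step 3: follow-up queries resolving data IDs into DatasetRefs,
        # deferring prerequisites (placeholder).
        self.refs = [f"ref@{d['visit']}" for d in self.data_ids]

    def fillQuanta(self, registry, collections):
        # Step 4: group refs into per-quantum inputs/outputs and search for
        # prerequisite datasets (placeholder).
        self.quanta = [{"inputs": [r]} for r in self.refs]

    def makeQuantumGraph(self):
        # Step 5: assemble the final graph from the per-task quanta.
        return {"quanta": self.quanta}


scaffolding = PipelineScaffoldingSketch(pipeline=["taskA"])
scaffolding.fillDataIds(registry=None)
scaffolding.fillDatasetRefs(registry=None, collections=None, run="run1")
scaffolding.fillQuanta(registry=None, collections=None)
graph = scaffolding.makeQuantumGraph()
print(len(graph["quanta"]))  # 2
```

The ordering matters: data IDs must exist before refs can be resolved, and refs must exist before quanta can be grouped, which is why prerequisite lookups are deferred to step 4.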
Definition at line 351 of file graphBuilder.py.
def lsst.pipe.base.graphBuilder._PipelineScaffolding.fillDatasetRefs(self, registry, collections, run, *, skipExisting=True)
Perform follow-up queries for each dataset data ID produced in
`fillDataIds`.
This method populates `_DatasetScaffolding.refs` (except for those in
`prerequisites`).
Parameters
----------
registry : `lsst.daf.butler.Registry`
Registry for the data repository; used for all data ID queries.
collections : `lsst.daf.butler.CollectionSearch`
Object representing the collections to search for input datasets.
run : `str`, optional
Name of the `~lsst.daf.butler.CollectionType.RUN` collection for
output datasets, if it already exists.
skipExisting : `bool`, optional
If `True` (default), a Quantum is not created if all its outputs
already exist in ``run``. Ignored if ``run`` is `None`.
Raises
------
OutputExistsError
Raised if an output dataset already exists in the output run
and ``skipExisting`` is `False`. The case where some but not all
of a quantum's outputs are present and ``skipExisting`` is `True`
cannot be identified at this stage, and is handled by `fillQuanta`
instead.
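The `skipExisting`/`OutputExistsError` behavior described above might be sketched as follows (the function and parameter names here are hypothetical; `existing` stands in for a registry lookup of refs already present in the output run):

```python
class OutputExistsError(RuntimeError):
    """Hypothetical stand-in for the real OutputExistsError."""

def resolve_outputs(output_ids, existing, run, skip_existing=True):
    # Hypothetical sketch of the per-output check: `existing` maps data IDs
    # to refs already present in `run` (stand-in for a registry query).
    resolved = []
    for data_id in output_ids:
        ref = existing.get(data_id) if run is not None else None
        if ref is not None:
            if not skip_existing:
                raise OutputExistsError(
                    f"Output {data_id!r} already exists in run {run!r}."
                )
            # With skip_existing=True we record the existing ref; whether a
            # whole quantum can be skipped is only known later, once all of
            # its outputs are grouped together (i.e. in fillQuanta).
            resolved.append(ref)
        else:
            resolved.append(("new", data_id))
    return resolved
```

With `skip_existing=False`, any pre-existing output raises immediately; with the default `True`, existing refs are simply collected, deferring the all-outputs-present decision to the quantum-grouping step.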
Definition at line 534 of file graphBuilder.py.