lsst.obs.base  tickets.DM-23835-g2f59a1585e
lsst.obs.base.ingest.RawIngestTask Class Reference

Public Member Functions

def getDatasetType (self)
 
def __init__ (self, Optional[RawIngestConfig] config=None, *Butler butler, **Any kwds)
 
RawFileData extractMetadata (self, str filename)
 
List[RawExposureData] groupByExposure (self, Iterable[RawFileData] files)
 
RawExposureData collectDimensionRecords (self, RawExposureData exposure)
 
RawExposureData expandDataIds (self, RawExposureData data)
 
Iterator[RawExposureData] prep (self, files, Optional[Pool] pool=None, int processes=1)
 
def insertDimensionData (self, Mapping[str, List[DimensionRecord]] records)
 
List[DatasetRef] ingestExposureDatasets (self, RawExposureData exposure, Optional[Butler] butler=None)
 
def run (self, files, Optional[Pool] pool=None, int processes=1)
 

Public Attributes

 butler
 
 universe
 
 instrument
 
 camera
 
 datasetType
 

Static Public Attributes

 ConfigClass = RawIngestConfig
 

Detailed Description

Driver Task for ingesting raw data into Gen3 Butler repositories.

This Task is intended to be runnable from the command-line, but it doesn't
meet the other requirements of CmdLineTask or PipelineTask, and wouldn't
gain much from being one.  It also wouldn't really be appropriate as a
subtask of a CmdLineTask or PipelineTask; it's a Task essentially just to
leverage the logging and configurability that the Task base class provides.

A given `RawIngestTask` instance always writes to the same Butler.  Each
invocation of `RawIngestTask.run` ingests a list of files.

Parameters
----------
config : `RawIngestConfig`
    Configuration for the task.
butler : `~lsst.daf.butler.Butler`
    Butler instance.  Ingested Datasets will be created as part of
    ``butler.run`` and associated with its Collection.
kwds
    Additional keyword arguments are forwarded to the `lsst.pipe.base.Task`
    constructor.

Definition at line 172 of file ingest.py.
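
A minimal usage sketch, assuming the LSST stack is installed and a `Butler` for the target repository has already been constructed.  It relies only on the constructor and `run` signatures documented on this page.  The helper name `ingest_raw_files` is hypothetical, and a real `RawIngestConfig` may need fields set that are left at their defaults here.

```python
from typing import Iterable, List


def ingest_raw_files(butler, files: Iterable[str], processes: int = 1) -> List:
    """Ingest ``files`` through ``butler`` (hypothetical helper, not part of obs_base)."""
    # Imported lazily so this module can be loaded without the LSST stack.
    from lsst.obs.base.ingest import RawIngestConfig, RawIngestTask

    # Defaults only; an actual deployment may need further config fields set.
    config = RawIngestConfig()
    # ``butler`` is passed by keyword, matching the documented constructor.
    task = RawIngestTask(config=config, butler=butler)
    # ``run`` is documented to return DatasetRefs for the ingested raws.
    return task.run(list(files), processes=processes)
```

Because each task instance is bound to one Butler, ingesting into a different repository means constructing a new task rather than reusing this one.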

Constructor & Destructor Documentation

◆ __init__()

def lsst.obs.base.ingest.RawIngestTask.__init__ (   self,
Optional[RawIngestConfig]   config = None,
*Butler  butler,
**Any  kwds 
)

Definition at line 208 of file ingest.py.

Member Function Documentation

◆ collectDimensionRecords()

RawExposureData lsst.obs.base.ingest.RawIngestTask.collectDimensionRecords (   self,
RawExposureData  exposure 
)
Collect the `DimensionRecord` instances that must be inserted into
the `~lsst.daf.butler.Registry` before an exposure's raw files may be
ingested.

Parameters
----------
exposure : `RawExposureData`
    A structure containing information about the exposure to be
    ingested.  Should be considered consumed upon return.

Returns
-------
exposure : `RawExposureData`
    An updated version of the input structure, with
    `RawExposureData.records` populated.

Definition at line 345 of file ingest.py.

◆ expandDataIds()

RawExposureData lsst.obs.base.ingest.RawIngestTask.expandDataIds (   self,
RawExposureData  data 
)
Expand the data IDs associated with a raw exposure to include
additional metadata records.

Parameters
----------
data : `RawExposureData`
    A structure containing information about the exposure to be
    ingested.  Must have `RawExposureData.records` populated. Should
    be considered consumed upon return.

Returns
-------
exposure : `RawExposureData`
    An updated version of the input structure, with
    `RawExposureData.dataId` and nested `RawFileData.dataId` attributes
    containing `~lsst.daf.butler.ExpandedDataCoordinate` instances.

Definition at line 400 of file ingest.py.

◆ extractMetadata()

RawFileData lsst.obs.base.ingest.RawIngestTask.extractMetadata (   self,
str  filename 
)
Extract and process metadata from a single raw file.

Parameters
----------
filename : `str`
    Path to the file.

Returns
-------
data : `RawFileData`
    A structure containing the metadata extracted from the file,
    as well as the original filename.  All fields will be populated,
    but the `RawFileData.dataId` attribute will be a minimal
    (unexpanded) `DataCoordinate` instance.

Notes
-----
Assumes that there is a single dataset associated with the given
file.  Instruments using a single file to store multiple datasets
must implement their own version of this method.

Definition at line 220 of file ingest.py.

◆ getDatasetType()

def lsst.obs.base.ingest.RawIngestTask.getDatasetType (   self)
Return the DatasetType of the Datasets ingested by this Task.

Definition at line 202 of file ingest.py.

◆ groupByExposure()

List[RawExposureData] lsst.obs.base.ingest.RawIngestTask.groupByExposure (   self,
Iterable[RawFileData]  files 
)
Group an iterable of `RawFileData` by exposure.

Parameters
----------
files : iterable of `RawFileData`
    File-level information to group.

Returns
-------
exposures : `list` of `RawExposureData`
    A list of structures that group the file-level information by
    exposure.  The `RawExposureData.records` attributes of elements
    will be `None`, but all other fields will be populated.  The
    `RawExposureData.dataId` attributes will be minimal (unexpanded)
    `DataCoordinate` instances.

Definition at line 319 of file ingest.py.
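
The grouping step can be illustrated with plain data classes.  This is a sketch only: `FileInfo` and `ExposureInfo` are hypothetical stand-ins for `RawFileData` and `RawExposureData`, and the integer `exposure` field is an assumed grouping key, not the library's actual data ID type.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class FileInfo:
    """Hypothetical stand-in for `RawFileData`."""
    filename: str
    exposure: int  # assumed exposure identifier taken from the file's data ID


@dataclass
class ExposureInfo:
    """Hypothetical stand-in for `RawExposureData`."""
    exposure: int
    files: List[FileInfo] = field(default_factory=list)
    records: Optional[dict] = None  # grouping leaves this unset, as documented


def group_by_exposure(files: Iterable[FileInfo]) -> List[ExposureInfo]:
    """Collect file-level entries into one structure per exposure."""
    groups: Dict[int, ExposureInfo] = {}
    for info in files:
        if info.exposure not in groups:
            groups[info.exposure] = ExposureInfo(info.exposure)
        groups[info.exposure].files.append(info)
    # dicts preserve insertion order, so exposures appear in first-seen order.
    return list(groups.values())
```

Every returned structure still has `records` set to `None`; populating it is the job of the next step, `collectDimensionRecords`.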

◆ ingestExposureDatasets()

List[DatasetRef] lsst.obs.base.ingest.RawIngestTask.ingestExposureDatasets (   self,
RawExposureData  exposure,
Optional[Butler]   butler = None 
)
Ingest all raw files in one exposure.

Parameters
----------
exposure : `RawExposureData`
    A structure containing information about the exposure to be
    ingested.  Must have `RawExposureData.records` populated and all
    data ID attributes expanded.
butler : `lsst.daf.butler.Butler`, optional
    Butler to use for ingest.  If not provided, ``self.butler`` will
    be used.

Returns
-------
refs : `list` of `lsst.daf.butler.DatasetRef`
    Dataset references for ingested raws.

Definition at line 535 of file ingest.py.

◆ insertDimensionData()

def lsst.obs.base.ingest.RawIngestTask.insertDimensionData (   self,
Mapping[str, List[DimensionRecord]]  records 
)
Insert dimension records for one or more exposures.

Parameters
----------
records : `dict` mapping `str` to `list`
    Dimension records to be inserted, organized as a mapping from
    dimension name to a list of records for that dimension.  This
    may be a single `RawExposureData.records` dict, or an aggregate
    for multiple exposures created by concatenating the value lists
    of those dictionaries.

Definition at line 502 of file ingest.py.
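
The aggregate form of ``records`` described above, built by concatenating the value lists of several per-exposure dictionaries, could be assembled as follows.  The helper name `merge_records` is hypothetical, and plain integers stand in for `DimensionRecord` instances.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def merge_records(per_exposure: Iterable[Mapping[str, List[Any]]]) -> Dict[str, List[Any]]:
    """Concatenate per-dimension record lists from several exposures."""
    merged: Dict[str, List[Any]] = defaultdict(list)
    for records in per_exposure:
        for dimension, rows in records.items():
            # Extend rather than replace so earlier exposures' records are kept.
            merged[dimension].extend(rows)
    return dict(merged)
```

For example, ``merge_records([{"exposure": [1], "visit": [10]}, {"exposure": [2]}])`` returns ``{"exposure": [1, 2], "visit": [10]}``.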

◆ prep()

Iterator[RawExposureData] lsst.obs.base.ingest.RawIngestTask.prep (   self,
  files,
Optional[Pool]   pool = None,
int   processes = 1 
)
Perform all ingest preprocessing steps that do not involve actually
modifying the database.

Parameters
----------
files : iterable over `str` or path-like objects
    Paths to the files to be ingested.  Will be made absolute
    if they are not already.
pool : `multiprocessing.Pool`, optional
    If not `None`, a process pool with which to parallelize some
    operations.
processes : `int`, optional
    The number of processes to use.  Ignored if ``pool`` is not `None`.

Yields
------
exposure : `RawExposureData`
    Data structures containing dimension records, filenames, and data
    IDs to be ingested (one structure for each exposure).

Definition at line 446 of file ingest.py.
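
The preprocessing flow can be sketched with the four steps passed in as callables.  The ordering is inferred from the preconditions documented on this page (`groupByExposure` leaves ``records`` unset, `collectDimensionRecords` populates it, and `expandDataIds` requires it to be populated); it is not taken from the implementation.  The name `prep_sketch` and its callable parameters are hypothetical.

```python
from multiprocessing.pool import Pool
from typing import Any, Callable, Iterable, Iterator, List, Optional


def prep_sketch(
    files: Iterable[str],
    extract: Callable[[str], Any],
    group: Callable[[List[Any]], List[Any]],
    collect: Callable[[Any], Any],
    expand: Callable[[Any], Any],
    pool: Optional[Pool] = None,
    processes: int = 1,
) -> Iterator[Any]:
    """Yield one fully prepared structure per exposure (illustrative only)."""
    own_pool = None
    if pool is None and processes > 1:
        # ``processes`` only matters when no pool was supplied.
        own_pool = pool = Pool(processes)
    try:
        mapper = map if pool is None else pool.map
        # Per-file step: independent for each file, so it can be parallelized.
        file_data = list(mapper(extract, files))
        # Gather step: needs the metadata of every file at once.
        for exposure in group(file_data):
            # Each step returns the updated structure; its input is treated
            # as consumed.
            yield expand(collect(exposure))
    finally:
        if own_pool is not None:
            own_pool.close()
            own_pool.join()
```

When a pool is used, ``extract`` must be picklable (for example, a module-level function), since it is sent to worker processes.  Nothing in this flow touches the database, which is what lets it run before any transaction is opened.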

◆ run()

def lsst.obs.base.ingest.RawIngestTask.run (   self,
  files,
Optional[Pool]   pool = None,
int   processes = 1 
)
Ingest files into a Butler data repository.

This creates any new exposure or visit Dimension entries needed to
identify the ingested files, creates new Dataset entries in the
Registry and finally ingests the files themselves into the Datastore.
Any needed instrument, detector, and physical_filter Dimension entries
must exist in the Registry before `run` is called.

Parameters
----------
files : iterable over `str` or path-like objects
    Paths to the files to be ingested.  Will be made absolute
    if they are not already.
pool : `multiprocessing.Pool`, optional
    If not `None`, a process pool with which to parallelize some
    operations.
processes : `int`, optional
    The number of processes to use.  Ignored if ``pool`` is not `None`.

Returns
-------
refs : `list` of `lsst.daf.butler.DatasetRef`
    Dataset references for ingested raws.

Notes
-----
This method inserts all records (dimensions and datasets) for an
exposure within a transaction, guaranteeing that partial exposures
are never ingested.

Definition at line 563 of file ingest.py.
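
The all-or-nothing behavior described in the Notes can be illustrated with a generic database transaction.  This sketch uses the standard-library `sqlite3` module directly, not the Butler Registry; the two-table schema and the function names are assumptions made for the illustration.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an illustrative two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exposure (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE raw (exposure_id INTEGER, filename TEXT UNIQUE)")
    return conn


def ingest_exposure(conn: sqlite3.Connection, exposure_id: int, files: List[str]) -> None:
    """Insert one exposure row and all of its file rows atomically."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO exposure (id) VALUES (?)", (exposure_id,))
        conn.executemany(
            "INSERT INTO raw (exposure_id, filename) VALUES (?, ?)",
            [(exposure_id, name) for name in files],
        )
```

If any file row fails to insert (here, because of the ``UNIQUE`` constraint on ``filename``), the exposure row and any file rows already written for that exposure are rolled back together, so no partial exposure remains.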

Member Data Documentation

◆ butler

lsst.obs.base.ingest.RawIngestTask.butler

Definition at line 210 of file ingest.py.

◆ camera

lsst.obs.base.ingest.RawIngestTask.camera

Definition at line 217 of file ingest.py.

◆ ConfigClass

lsst.obs.base.ingest.RawIngestTask.ConfigClass = RawIngestConfig
static

Definition at line 198 of file ingest.py.

◆ datasetType

lsst.obs.base.ingest.RawIngestTask.datasetType

Definition at line 218 of file ingest.py.

◆ instrument

lsst.obs.base.ingest.RawIngestTask.instrument

Definition at line 212 of file ingest.py.

◆ universe

lsst.obs.base.ingest.RawIngestTask.universe

Definition at line 211 of file ingest.py.


The documentation for this class was generated from the following file: