lsst.obs.base
tickets.DM-23835-g2f59a1585e
Public Member Functions

  def getDatasetType(self)
  def __init__(self, config: Optional[RawIngestConfig] = None, *, butler: Butler, **kwds: Any)
  def extractMetadata(self, filename: str) -> RawFileData
  def groupByExposure(self, files: Iterable[RawFileData]) -> List[RawExposureData]
  def collectDimensionRecords(self, exposure: RawExposureData) -> RawExposureData
  def expandDataIds(self, data: RawExposureData) -> RawExposureData
  def prep(self, files, pool: Optional[Pool] = None, processes: int = 1) -> Iterator[RawExposureData]
  def insertDimensionData(self, records: Mapping[str, List[DimensionRecord]])
  def ingestExposureDatasets(self, exposure: RawExposureData, butler: Optional[Butler] = None) -> List[DatasetRef]
  def run(self, files, pool: Optional[Pool] = None, processes: int = 1)

Public Attributes

  butler
  universe
  instrument
  camera
  datasetType

Static Public Attributes

  ConfigClass = RawIngestConfig
Driver Task for ingesting raw data into Gen3 Butler repositories.

This Task is intended to be runnable from the command-line, but it doesn't meet the other requirements of CmdLineTask or PipelineTask, and wouldn't gain much from being one. It also wouldn't be appropriate as a subtask of a CmdLineTask or PipelineTask; it is a Task essentially just to leverage the logging and configurability functionality that base class provides.

Each instance of `RawIngestTask` writes to the same Butler. Each invocation of `RawIngestTask.run` ingests a list of files.

Parameters
----------
config : `RawIngestConfig`
    Configuration for the task.
butler : `~lsst.daf.butler.Butler`
    Butler instance. Ingested Datasets will be created as part of
    ``butler.run`` and associated with its Collection.
kwds
    Additional keyword arguments are forwarded to the
    `lsst.pipe.base.Task` constructor.
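A minimal sketch of the intended call pattern. The repository path, run-collection name, and file paths are illustrative, and the imports are guarded so the sketch stays importable where the LSST stack is not installed:

```python
# Sketch only: assumes a Gen3 repository already exists at the given path
# and the instrument's dimension entries are registered.
try:
    from lsst.daf.butler import Butler
    from lsst.obs.base.ingest import RawIngestTask, RawIngestConfig
except ImportError:
    Butler = RawIngestTask = RawIngestConfig = None  # stack not available

def ingest_raws(repo, files, run="raw/all"):
    """Ingest raw files into ``repo``, writing Datasets into the ``run``
    collection. ``ingest_raws`` is a hypothetical helper, not part of the
    package."""
    butler = Butler(repo, run=run)
    task = RawIngestTask(config=RawIngestConfig(), butler=butler)
    return task.run(files)
```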
def lsst.obs.base.ingest.RawIngestTask.__init__(self, config: Optional[RawIngestConfig] = None, *, butler: Butler, **kwds: Any)
def lsst.obs.base.ingest.RawIngestTask.collectDimensionRecords(self, exposure: RawExposureData) -> RawExposureData
Collect the `DimensionRecord` instances that must be inserted into the
`~lsst.daf.butler.Registry` before an exposure's raw files may be ingested.

Parameters
----------
exposure : `RawExposureData`
    A structure containing information about the exposure to be ingested.
    Should be considered consumed upon return.

Returns
-------
exposure : `RawExposureData`
    An updated version of the input structure, with
    `RawExposureData.records` populated.
def lsst.obs.base.ingest.RawIngestTask.expandDataIds(self, data: RawExposureData) -> RawExposureData
Expand the data IDs associated with a raw exposure to include additional
metadata records.

Parameters
----------
data : `RawExposureData`
    A structure containing information about the exposure to be ingested.
    Must have `RawExposureData.records` populated. Should be considered
    consumed upon return.

Returns
-------
data : `RawExposureData`
    An updated version of the input structure, with
    `RawExposureData.dataId` and nested `RawFileData.dataId` attributes
    containing `~lsst.daf.butler.ExpandedDataCoordinate` instances.
def lsst.obs.base.ingest.RawIngestTask.extractMetadata(self, filename: str) -> RawFileData
Extract and process metadata from a single raw file.

Parameters
----------
filename : `str`
    Path to the file.

Returns
-------
data : `RawFileData`
    A structure containing the metadata extracted from the file, as well
    as the original filename. All fields will be populated, but the
    `RawFileData.dataId` attribute will be a minimal (unexpanded)
    `DataCoordinate` instance.

Notes
-----
Assumes that there is a single dataset associated with the given file.
Instruments using a single file to store multiple datasets must implement
their own version of this method.
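A self-contained sketch of the single-dataset case. `FileData`, `extract_metadata`, and the header keyword names are illustrative stand-ins; the real method reads and translates the header from the file itself:

```python
from dataclasses import dataclass

@dataclass
class FileData:
    """Simplified stand-in for RawFileData: just the fields the sketch fills."""
    filename: str
    dataId: dict   # minimal (unexpanded) data ID
    obsInfo: dict  # remaining observation metadata

def extract_metadata(filename, header):
    """Sketch: pull the data-ID keys out of an already-read header dict."""
    data_id = {
        "instrument": header["INSTRUME"],
        "exposure": header["EXPID"],
        "detector": header["DETECTOR"],
    }
    return FileData(filename=filename, dataId=data_id, obsInfo=header)
```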
def lsst.obs.base.ingest.RawIngestTask.getDatasetType(self)

Return the DatasetType of the Datasets ingested by this Task.
def lsst.obs.base.ingest.RawIngestTask.groupByExposure(self, files: Iterable[RawFileData]) -> List[RawExposureData]
Group an iterable of `RawFileData` by exposure.

Parameters
----------
files : iterable of `RawFileData`
    File-level information to group.

Returns
-------
exposures : `list` of `RawExposureData`
    A list of structures that group the file-level information by
    exposure. The `RawExposureData.records` attributes of elements will
    be `None`, but all other fields will be populated. The
    `RawExposureData.dataId` attributes will be minimal (unexpanded)
    `DataCoordinate` instances.
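The grouping step can be sketched with plain dicts standing in for `RawFileData` and `RawExposureData` (`group_by_exposure` and the dict keys are illustrative, not the real API):

```python
from collections import defaultdict

def group_by_exposure(files):
    """Sketch: bucket per-file entries by the exposure value of their
    data ID, producing one output structure per exposure. Each element of
    ``files`` is a dict with a 'dataId' dict containing an 'exposure' key."""
    buckets = defaultdict(list)
    for f in files:
        buckets[f["dataId"]["exposure"]].append(f)
    # 'records' is None at this stage, as the docstring above notes.
    return [{"exposure": exp, "files": fs, "records": None}
            for exp, fs in buckets.items()]
```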
def lsst.obs.base.ingest.RawIngestTask.ingestExposureDatasets(self, exposure: RawExposureData, butler: Optional[Butler] = None) -> List[DatasetRef]
Ingest all raw files in one exposure.

Parameters
----------
exposure : `RawExposureData`
    A structure containing information about the exposure to be ingested.
    Must have `RawExposureData.records` populated and all data ID
    attributes expanded.
butler : `lsst.daf.butler.Butler`, optional
    Butler to use for ingest. If not provided, ``self.butler`` will be
    used.

Returns
-------
refs : `list` of `lsst.daf.butler.DatasetRef`
    Dataset references for ingested raws.
def lsst.obs.base.ingest.RawIngestTask.insertDimensionData(self, records: Mapping[str, List[DimensionRecord]])
Insert dimension records for one or more exposures.

Parameters
----------
records : `dict` mapping `str` to `list`
    Dimension records to be inserted, organized as a mapping from
    dimension name to a list of records for that dimension. This may be
    a single `RawExposureData.records` dict, or an aggregate for multiple
    exposures created by concatenating the value lists of those
    dictionaries.
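Building the aggregate mapping described above, by concatenating the per-dimension record lists of several exposures, can be sketched as (`merge_records` is a hypothetical helper):

```python
def merge_records(per_exposure):
    """Sketch: combine several exposures' records dicts into one mapping
    from dimension name to the concatenated list of records."""
    merged = {}
    for records in per_exposure:
        for dimension, recs in records.items():
            merged.setdefault(dimension, []).extend(recs)
    return merged
```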
def lsst.obs.base.ingest.RawIngestTask.prep(self, files, pool: Optional[Pool] = None, processes: int = 1) -> Iterator[RawExposureData]
Perform all ingest preprocessing steps that do not involve actually
modifying the database.

Parameters
----------
files : iterable over `str` or path-like objects
    Paths to the files to be ingested. Will be made absolute if they are
    not already.
pool : `multiprocessing.Pool`, optional
    If not `None`, a process pool with which to parallelize some
    operations.
processes : `int`, optional
    The number of processes to use. Ignored if ``pool`` is not `None`.

Yields
------
exposure : `RawExposureData`
    Data structures containing dimension records, filenames, and data IDs
    to be ingested (one structure for each exposure).
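The preprocessing flow can be sketched as a generator in which four callables stand in for `extractMetadata`, `groupByExposure`, `collectDimensionRecords`, and `expandDataIds` (the function and parameter names here are illustrative, and pooling is omitted for brevity):

```python
def prep(files, extract, group, collect, expand):
    """Sketch of the preprocessing pipeline: per-file metadata, then
    grouping by exposure, then records and expanded data IDs. No
    database writes happen here."""
    file_data = [extract(f) for f in files]   # one entry per file
    for exposure in group(file_data):         # one structure per exposure
        yield expand(collect(exposure))       # records, then expanded IDs
```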
def lsst.obs.base.ingest.RawIngestTask.run(self, files, pool: Optional[Pool] = None, processes: int = 1)
Ingest files into a Butler data repository.

This creates any new exposure or visit Dimension entries needed to
identify the ingested files, creates new Dataset entries in the Registry
and finally ingests the files themselves into the Datastore. Any needed
instrument, detector, and physical_filter Dimension entries must exist in
the Registry before `run` is called.

Parameters
----------
files : iterable over `str` or path-like objects
    Paths to the files to be ingested. Will be made absolute if they are
    not already.
pool : `multiprocessing.Pool`, optional
    If not `None`, a process pool with which to parallelize some
    operations.
processes : `int`, optional
    The number of processes to use. Ignored if ``pool`` is not `None`.

Returns
-------
refs : `list` of `lsst.daf.butler.DatasetRef`
    Dataset references for ingested raws.

Notes
-----
This method inserts all records (dimensions and datasets) for an exposure
within a transaction, guaranteeing that partial exposures are never
ingested.
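The per-exposure transactional guarantee can be illustrated with a toy registry (this is not the real `Registry` API, just a minimal stand-in for the rollback behavior):

```python
from contextlib import contextmanager

class ToyRegistry:
    """Minimal stand-in: either every record for an exposure is inserted,
    or none are."""
    def __init__(self):
        self.rows = []

    @contextmanager
    def transaction(self):
        saved = list(self.rows)
        try:
            yield
        except Exception:
            self.rows = saved   # roll back the partial exposure
            raise

def ingest_exposure(registry, records):
    """Insert all of one exposure's records inside a single transaction."""
    with registry.transaction():
        for rec in records:
            if rec is None:
                raise ValueError("bad record")
            registry.rows.append(rec)
```

A failure midway through an exposure leaves the registry exactly as it was before that exposure began.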