lsst.daf.persistence
13.0-17-gd5d205a+2
Public Member Functions
def __init__
def __repr__
def defineAlias
def getKeys
def queryMetadata
def datasetExists
def get
def put
def subset
def dataRef
def __reduce__
Static Public Member Functions
def getMapperClass
Public Attributes
log
datasetTypeAliasDict
storage
persistence
Butler provides a generic mechanism for persisting and retrieving data using mappers.

A Butler manages a collection of datasets known as a repository. Each dataset has a type representing its intended usage and a location. Note that the dataset type is not the same as the C++ or Python type of the object containing the data. For example, an ExposureF object might be used to hold the data for a raw image, a post-ISR image, a calibrated science image, or a difference image. These would all be different dataset types.

A Butler can produce a collection of possible values for a key (or tuples of values for multiple keys) if given a partial data identifier. It can check for the existence of a file containing a dataset given its type and data identifier. The Butler can then retrieve the dataset. Similarly, it can persist an object to an appropriate location when given its associated data identifier.

Note that the Butler has two more advanced features when retrieving a dataset. First, the retrieval is lazy: input does not occur until the dataset is actually accessed. This allows datasets to be retrieved and placed on a clipboard prospectively at little cost, even if the algorithm of a stage ends up not using them. Second, the Butler will call a standardization hook upon retrieval of the dataset. This function, contained in the input mapper object, must perform any necessary manipulations to force the retrieved object to conform to standards, including translating metadata.
Public methods:

__init__(self, root, mapper=None, **mapperArgs)
defineAlias(self, alias, datasetType)
getKeys(self, datasetType=None, level=None)
queryMetadata(self, datasetType, format=None, dataId={}, **rest)
datasetExists(self, datasetType, dataId={}, **rest)
get(self, datasetType, dataId={}, immediate=False, **rest)
put(self, obj, datasetType, dataId={}, **rest)
subset(self, datasetType, level=None, dataId={}, **rest)
dataRef(self, datasetType, level=None, dataId={}, **rest)

Initialization:

The preferred method of initialization is to pass a RepositoryArgs instance, or a list of RepositoryArgs, to inputs and/or outputs.

For backward compatibility, this initialization method signature can take a posix root path, and optionally a mapper class instance or class type that will be instantiated using the mapperArgs input argument. However, for this to work in a backward-compatible way, it creates a single repository that is used as both an input and an output repository. This is NOT preferred, and will likely break any provenance system we have in place.

Parameters
----------
root - string
    .. note:: Deprecated in 12_0
        `root` will be removed in TBD, it is replaced by `inputs` and `outputs` for multiple-repository support.
    A filesystem path. Will only work with a PosixRepository.
mapper - string or instance
    .. note:: Deprecated in 12_0
        `mapper` will be removed in TBD, it is replaced by `inputs` and `outputs` for multiple-repository support.
    Provides a mapper to be used with Butler.
mapperArgs - dict
    .. note:: Deprecated in 12_0
        `mapperArgs` will be removed in TBD, it is replaced by `inputs` and `outputs` for multiple-repository support.
    Provides arguments to be passed to the mapper if the mapper input arg is a class type to be instantiated by Butler.
inputs - RepositoryArgs or string
    Can be a single item or a list. Provides arguments to load an existing repository (or repositories). A string is assumed to be a URI and is used as the cfgRoot (URI to the location of the cfg file). (A local file system URI does not have to start with 'file://' and in this way can be a relative path.)
outputs - RepositoryArgs or string
    Can be a single item or a list. Provides arguments to load one or more existing repositories or create new ones. A string is assumed to be a URI and is used as the repository root.
def lsst.daf.persistence.butler.Butler.__init__(self, root=None, mapper=None, inputs=None, outputs=None, **mapperArgs)
def lsst.daf.persistence.butler.Butler.__reduce__(self)
def lsst.daf.persistence.butler.Butler.dataRef(self, datasetType, level=None, dataId={}, **rest)

Returns a single ButlerDataRef.

Given a complete dataId specified in dataId and **rest, find the unique dataset at the given level specified by a dataId key (e.g. visit or sensor or amp for a camera) and return a ButlerDataRef.

Parameters
----------
datasetType - str
    The type of dataset collection to reference.
level - str
    The level of dataId at which to reference.
dataId - dict
    The data id.
**rest
    Keyword arguments for the data id.

Returns
-------
dataRef - ButlerDataRef
    ButlerDataRef for the dataset matching the data id.
def lsst.daf.persistence.butler.Butler.datasetExists(self, datasetType, dataId={}, **rest)

Determines if a dataset file exists.

Parameters
----------
datasetType - str
    The type of dataset to inquire about.
dataId - DataId, dict
    The data id of the dataset.
**rest
    Keyword arguments for the data id.

Returns
-------
exists - bool
    True if the dataset exists or is non-file-based.
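Several methods on this page accept the data id both as a `dataId` dict and as `**rest` keyword arguments. The following is a minimal standalone sketch of how such a pair might be combined; `combine_data_id` is a hypothetical helper written for illustration, not part of Butler, and the choice to let keyword arguments override the dict is an assumption.

```python
def combine_data_id(data_id=None, **rest):
    """Merge a data id mapping with keyword arguments into a new dict.

    Hypothetical helper for illustration only; not a Butler API.
    """
    # Copy first so the caller's dict (or a shared default) is never mutated.
    combined = dict(data_id or {})
    # Assumption: keyword arguments take precedence over entries in data_id.
    combined.update(rest)
    return combined


partial = {"visit": 1234}
full = combine_data_id(partial, ccd=5, filter="r")
print(full)
```

Copying before updating also sidesteps the usual pitfall of a mutable `{}` default argument being shared between calls.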
def lsst.daf.persistence.butler.Butler.defineAlias(self, alias, datasetType)

Register an alias that will be substituted in datasetTypes.

Parameters
----------
alias - str
    The alias keyword. It may start with @ or not. It may not contain @ except as the first character.
datasetType - str
    The string that will be substituted when @alias is passed into datasetType. It may not contain '@'.
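The alias rules above can be sketched with a small standalone registry. This is an illustration of the documented constraints only: `AliasRegistry`, `define_alias`, and `resolve` are hypothetical names, and the longest-alias-first substitution order is an assumption rather than documented Butler behavior.

```python
class AliasRegistry:
    """Toy alias table following the rules documented for defineAlias.

    Hypothetical class for illustration only; not a Butler API.
    """

    def __init__(self):
        self._aliases = {}

    def define_alias(self, alias, dataset_type):
        # The alias may be given with or without a leading '@'.
        if not alias.startswith("@"):
            alias = "@" + alias
        # '@' is only allowed as the first character of the alias.
        if "@" in alias[1:]:
            raise ValueError("alias may contain '@' only as its first character")
        # The substituted string may not contain '@' at all.
        if "@" in dataset_type:
            raise ValueError("datasetType may not contain '@'")
        self._aliases[alias] = dataset_type

    def resolve(self, dataset_type):
        # Assumption: try longer aliases first so that '@science' is not
        # partially consumed by a shorter alias such as '@sci'.
        for alias in sorted(self._aliases, key=len, reverse=True):
            dataset_type = dataset_type.replace(alias, self._aliases[alias])
        return dataset_type


registry = AliasRegistry()
registry.define_alias("science", "calexp")
print(registry.resolve("@science"))
print(registry.resolve("@science_md"))
```

Because the substituted string can never contain '@', a single pass over the aliases is enough; no alias can expand into another alias.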
def lsst.daf.persistence.butler.Butler.get(self, datasetType, dataId=None, immediate=True, **rest)

Retrieves a dataset given an input collection data id.

Parameters
----------
datasetType - str
    The type of dataset to retrieve.
dataId - dict
    The data id.
immediate - bool
    If False, use a proxy for delayed loading.
**rest
    Keyword arguments for the data id.

Returns
-------
An object retrieved from the dataset (or a proxy for one).
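To illustrate what "a proxy for delayed loading" can mean, here is a minimal standalone sketch of the general lazy-proxy pattern. It does not reproduce Butler's own proxy class; `LazyProxy`, `fetch`, and `load_catalog` are hypothetical names, and the list of strings stands in for a real dataset.

```python
class LazyProxy:
    """Defers a load until an attribute of the target is first used.

    Hypothetical class for illustration only; not Butler's proxy type.
    """

    def __init__(self, loader):
        self._loader = loader
        self._loaded = False
        self._target = None

    def _resolve(self):
        # Run the loader at most once and cache its result.
        if not self._loaded:
            self._target = self._loader()
            self._loaded = True
        return self._target

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        # Private names are never forwarded, which avoids recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)


def fetch(loader, immediate=True):
    """Return the loaded object, or a proxy for it when immediate is False."""
    return loader() if immediate else LazyProxy(loader)


calls = []


def load_catalog():
    calls.append("read")  # record that the (pretend) I/O happened
    return ["src1", "src2", "src3"]


proxy = fetch(load_catalog, immediate=False)
reads_before_access = len(calls)   # nothing has been read yet
n_matches = proxy.count("src1")    # first use triggers the load
proxy.index("src2")                # later uses reuse the cached object
reads_after_access = len(calls)
```

Note that this simple sketch forwards ordinary attribute access only; special methods such as `__len__` are looked up on the type and would need explicit forwarding.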
def lsst.daf.persistence.butler.Butler.getKeys(self, datasetType=None, level=None, tag=None)

Get the valid data id keys at or above the given level of hierarchy for the dataset type or the entire collection if None. The dict values are the basic Python types corresponding to the keys (int, float, str).

Parameters
----------
datasetType - str
    The type of dataset to get keys for, entire collection if None.
level - str
    The hierarchy level to descend to. None if it should not be restricted. Use an empty string if the mapper should look up the default level.
tag - any, or list of any
    Any object that can be tested to be the same as the tag in a dataId passed into butler input functions. Applies only to input repositories: if a tag is specified by the dataId, then a repository will only be read from if the tag in the dataId matches a tag used for that repository.

Returns
-------
Returns a dict. The dict keys are the valid data id keys at or above the given level of hierarchy for the dataset type or the entire collection if None. The dict values are the basic Python types corresponding to the keys (int, float, str).
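The shape of the returned dict, and the meaning of "at or above the given level", can be sketched with a toy hierarchy. Everything here is an assumption for illustration: the `HIERARCHY` ordering, its key names and types, and the `keys_at_or_above` helper are hypothetical, and the empty-string "default level" case is not modeled.

```python
# Hypothetical key hierarchy, ordered from the highest level to the lowest.
HIERARCHY = [("visit", int), ("raft", str), ("sensor", str), ("amp", int)]


def keys_at_or_above(level=None):
    """Return {key: type} for every key at or above the given level.

    Hypothetical helper for illustration only; not a Butler API.
    """
    if level is None:
        return dict(HIERARCHY)  # no restriction: every key is returned
    keys = {}
    for name, key_type in HIERARCHY:
        keys[name] = key_type
        if name == level:
            return keys  # stop once the requested level has been included
    raise KeyError(level)


print(keys_at_or_above("raft"))
```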
getMapperClass (static)

posix-only; gets the mapper class at the path specified by root (if a file _mapper can be found at that location or in a parent location). As we abstract the storage and support different types of storage locations, this method will be moved entirely into Butler Access, or made more dynamic, and the API will very likely change.
def lsst.daf.persistence.butler.Butler.put(self, obj, datasetType, dataId={}, doBackup=False, **rest)

Persists a dataset given an output collection data id.

Parameters
----------
obj -
    The object to persist.
datasetType - str
    The type of dataset to persist.
dataId - dict
    The data id.
doBackup - bool
    If True, rename existing instead of overwriting. WARNING: Setting doBackup=True is not safe for parallel processing, as it may be subject to race conditions.
**rest
    Keyword arguments for the data id.
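The "rename existing instead of overwriting" behavior, and why it is exposed to race conditions, can be sketched with plain file operations. This is a standalone illustration under stated assumptions: `write_with_backup` is a hypothetical helper, and the `~1`, `~2` suffix scheme is chosen here for the example rather than taken from this page.

```python
import os
import tempfile


def write_with_backup(path, data, do_backup=False):
    """Write text to path, optionally renaming an existing file first.

    Hypothetical helper for illustration only; not a Butler API.
    """
    if do_backup and os.path.exists(path):
        # Assumption: backups are numbered path~1, path~2, ...
        n = 1
        while os.path.exists(f"{path}~{n}"):
            n += 1
        # The check above and the rename below are separate steps, so two
        # processes can pick the same backup name. This is the race that
        # makes a backup-by-rename scheme unsafe for parallel processing.
        os.rename(path, f"{path}~{n}")
    with open(path, "w") as handle:
        handle.write(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "dataset.txt")
    write_with_backup(target, "first")
    write_with_backup(target, "second", do_backup=True)
    with open(target) as handle:
        current = handle.read()
    with open(target + "~1") as handle:
        backup = handle.read()
```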
def lsst.daf.persistence.butler.Butler.queryMetadata(self, datasetType, format, dataId={}, **rest)

Returns the valid values for one or more keys when given a partial input collection data id.

Parameters
----------
datasetType - str
    The type of dataset to inquire about.
format - str, tuple
    Key or tuple of keys to be returned.
dataId - DataId, dict
    The partial data id.
**rest
    Keyword arguments for the partial data id.

Returns
-------
A list of valid values or tuples of valid values as specified by the format.
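The relationship between `format`, the partial data id, and the returned list can be sketched over an in-memory table. This is not how Butler looks up metadata; `RECORDS` and `query_values` are hypothetical, and returning bare values for a single key and tuples for several keys is an assumption based on the description above.

```python
# Hypothetical registry contents: one dict per known dataset.
RECORDS = [
    {"visit": 1, "filter": "r", "ccd": 0},
    {"visit": 1, "filter": "r", "ccd": 1},
    {"visit": 2, "filter": "g", "ccd": 0},
    {"visit": 3, "filter": "r", "ccd": 0},
]


def query_values(records, fmt, data_id=None, **rest):
    """Return distinct values of the requested key(s) for matching records.

    Hypothetical helper for illustration only; not a Butler API.
    """
    partial = dict(data_id or {})
    partial.update(rest)
    keys = (fmt,) if isinstance(fmt, str) else tuple(fmt)
    results = []
    for record in records:
        # A record matches when it agrees with every key in the partial id.
        if all(record.get(key) == value for key, value in partial.items()):
            if len(keys) == 1:
                item = record[keys[0]]  # one key: bare values
            else:
                item = tuple(record[key] for key in keys)  # several: tuples
            if item not in results:
                results.append(item)  # keep first-seen order, no duplicates
    return results


r_band_visits = query_values(RECORDS, "visit", filter="r")
r_band_pairs = query_values(RECORDS, ("visit", "ccd"), {"filter": "r"})
```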
def lsst.daf.persistence.butler.Butler.subset(self, datasetType, level=None, dataId={}, **rest)

Return complete dataIds for a dataset type that match a partial (or empty) dataId.

Given a partial (or empty) dataId specified in dataId and **rest, find all datasets that match the dataId. Optionally restrict the results to a given level specified by a dataId key (e.g. visit or sensor or amp for a camera). Return an iterable collection of complete dataIds as ButlerDataRefs.

Datasets with the resulting dataIds may not exist; that needs to be tested with datasetExists().

Parameters
----------
datasetType - str
    The type of dataset collection to subset.
level - str
    The level of dataId at which to subset. Use an empty string if the mapper should look up the default level.
dataId - dict
    The data id.
**rest
    Keyword arguments for the data id.

Returns
-------
subset - ButlerSubset
    Collection of ButlerDataRefs for datasets matching the data id.

Examples
--------
To print the full dataIds for all r-band measurements in a source catalog (note that the subset call is equivalent to: `butler.subset('src', dataId={'filter':'r'})`):

>>> subset = butler.subset('src', filter='r')
>>> for data_ref in subset: print(data_ref.dataId)