Coverage for python/lsst/daf/butler/registries/sqlPreFlight.py : 71%

# This file is part of daf_butler.
#
# Developed for the LSST Data Management System.
# This product includes software developed by the LSST Project
# (http://www.lsst.org).
# See the COPYRIGHT file at the top-level directory of this distribution
# for details of code ownership.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""Recursively scan units and their optional dependencies, return their names"""
"""Filter out DataUnitJoins that summarize other DataUnitJoins.
Parameters ---------- dataUnitJoins : iterable of `DataUnitJoin`
Returns ------- Iterator for DataUnitJoin which do not summarize any of the DataUnitJoins in the input set. """ # If it summarizes some other joins and all those joins are in the # set of joins then we do not need it. continue
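The filtering rule described in the docstring above can be sketched with a minimal stand-in for `DataUnitJoin` (a namedtuple with an assumed `summarizes` attribute; the real class lives in daf_butler and has a richer interface):

```python
from collections import namedtuple

# Hypothetical stand-in for the real DataUnitJoin class.
DataUnitJoin = namedtuple("DataUnitJoin", ["name", "summarizes"])


def filter_summarizes(data_unit_joins):
    """Yield joins that do not summarize other joins in the input set."""
    joins = list(data_unit_joins)
    names = {join.name for join in joins}
    for join in joins:
        summarizes = set(join.summarizes or ())
        # If it summarizes some other joins and all of those joins are
        # present in the input set, the coarser join is redundant.
        if summarizes and summarizes <= names:
            continue
        yield join


fine = DataUnitJoin("VisitSensorRegion", None)
coarse = DataUnitJoin("VisitTractJoin", ("VisitSensorRegion",))
kept = [j.name for j in filter_summarizes([fine, coarse])]
# kept == ["VisitSensorRegion"]
```

Note that the coarser join survives when the joins it summarizes are absent from the input, which matches the "all those joins are in the set" condition.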
"""Filter result rows that have non-overlapping regions.
Result set generated by query in selectDataUnits() method can include set of regions in each row (encoded as bytes). Due to pixel-based matching some regions may not overlap, this generator method filters rows that have disjoint regions. If result row contains more than two regions (this should not happen with our current schema) then row is filtered if any of two regions are disjoint.
Parameters ---------- rowIter : iterable Iterator for rows returned by the query on registry firstRegionIndex : `int` or ``None`` If not ``None`` then this is the starting position of the regions in the row, all columns starting with this position contain region data. All regions are encoded as bytes. """ count = 0 for row in rowIter: total += 1 regions = [Region.decode(region) for region in row[firstRegionIndex:]] for reg1, reg2 in itertools.combinations(regions, 2): if reg1.relate(reg2) == DISJOINT: break else: count += 1 yield tuple(row[:firstRegionIndex]) _LOG.debug("Total %d rows in result set, %d after region filtering", total, count) else:
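The `for`/`else` pattern used above — keep a row only if no pair of its regions is disjoint — can be demonstrated with 1-D intervals standing in for sphgeom regions (`filter_overlapping` and `_disjoint` are illustrative names, not part of the real API):

```python
import itertools


def _disjoint(a, b):
    """Intervals are (lo, hi) tuples; disjoint when they do not overlap."""
    return a[1] < b[0] or b[1] < a[0]


def filter_overlapping(rows, first_region_index):
    """Yield row prefixes whose trailing regions pairwise overlap."""
    for row in rows:
        regions = row[first_region_index:]
        for r1, r2 in itertools.combinations(regions, 2):
            if _disjoint(r1, r2):
                # One disjoint pair is enough to drop the row.
                break
        else:
            # The inner loop completed without a break: all pairs overlap.
            yield row[:first_region_index]


rows = [
    (1, (0, 5), (3, 8)),   # regions overlap -> row kept
    (2, (0, 1), (5, 6)),   # regions disjoint -> row dropped
]
result = list(filter_overlapping(rows, 1))
# result == [(1,)]
```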
"""Class implementing part of preflight solver which extracts units data from registry.
This is an implementation detail only to be used by SqlRegistry class, not supposed to be used anywhere else.
Parameters ---------- schema : `Schema` Schema instance connection : `sqlalchmey.Connection` Connection to use for database access. """
"""Evaluate a filter expression and lists of `DatasetTypes <DatasetType>` and return a set of data unit values.
Returned set consists of combinations of units participating in data transformation from ``neededDatasetTypes`` to ``futureDatasetTypes``, restricted by existing data and filter expression.
Parameters ---------- collections : `list` of `str` An ordered `list` of collections indicating the Collections to search for Datasets. expr : `str` An expression that limits the `DataUnits <DataUnit>` and (indirectly) the Datasets returned. neededDatasetTypes : `list` of `DatasetType` The `list` of `DatasetTypes <DatasetType>` whose DataUnits will be included in the returned column set. Output is limited by the the Datasets of these DatasetTypes which already exist in the registry. futureDatasetTypes : `list` of `DatasetType` The `list` of `DatasetTypes <DatasetType>` whose DataUnits will be included in the returned column set. It is expected that Datasets for these DatasetTypes do not exist in the registry, but presently this is not checked.
Returns ------- header : `tuple` of `tuple` Length of tuple equals the number of columns in the returned result set. Each item is a tuple with two elements - DataUnit name (e.g. "Visit") and unit value name (e.g. "visit"). rows : iterable of `tuple` Result set, this can be a single-pass iterator. Each tuple contains unit values corresponding to units in a header. """
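A caller might consume the ``(header, rows)`` contract described in the docstring above like this (the concrete unit names and values are illustrative, not taken from a real registry):

```python
# header pairs each DataUnit name with its link column name.
header = (("Camera", "camera"), ("Visit", "visit"), ("Sensor", "sensor"))
# rows carry the unit values in header order; may be a single-pass iterator.
rows = [("HSC", 1000, 25), ("HSC", 1000, 26)]

# Turn each row into a data-id-like dict keyed by the link column names.
link_names = [link for _, link in header]
data_ids = [dict(zip(link_names, row)) for row in rows]
# data_ids[0] == {"camera": "HSC", "visit": 1000, "sensor": 25}
```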
# for now only a single collection is supported
raise ValueError("Only single collection is supported by makeDataGraph()")
# Collect unit names in both input and output dataset types
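The "collect unit names" step above can be sketched as follows; `DatasetType` here is a minimal stand-in, and the ``dataUnits`` attribute name is an assumption rather than the confirmed daf_butler API:

```python
from collections import namedtuple

# Hypothetical stand-in for the real DatasetType class.
DatasetType = namedtuple("DatasetType", ["name", "dataUnits"])

neededDatasetTypes = [DatasetType("raw", ("Camera", "Visit", "Sensor"))]
futureDatasetTypes = [DatasetType("calexp", ("Camera", "Visit", "Sensor"))]

# Union of unit names across both the input and output dataset types.
allUnitNames = set()
for dsType in neededDatasetTypes + futureDatasetTypes:
    allUnitNames.update(dsType.dataUnits)
# allUnitNames == {"Camera", "Visit", "Sensor"}
```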
# Build select column list
# take link column names, usually there is one
# Extend the units set with the "optional" superset from the schema, so
# that joins work correctly. This may bring more tables into the query
# than are really needed; this is a potential optimization.
# All DataUnit instances in a subset that we need
# joins for all unit tables
continue
# join with tables that we depend upon
# joins between skymap and camera units
if dataUnitJoin.lhs.issubset(allUnitNames) and
   dataUnitJoin.rhs.issubset(allUnitNames)]
# only use most specific joins
# Some `DataUnitJoin`s have an associated region (e.g. they are spatial) # in that case they shouldn't be joined separately in the region lookup.
# TODO: we do not know yet how to handle MultiCameraExposureJoin;
# skip it for now
# Look at each side of the DataUnitJoin and join it with the
# corresponding DataUnit tables, including making all necessary joins
# for the special multi-DataUnit region table(s).
for connection in (dataUnitJoin.lhs, dataUnitJoin.rhs):
    regionHolder = self._schema.dataUnits.getRegionHolder(*connection)
    if len(connection) > 1:
        # if one of the joins is with Visit/Sensor then also bring the
        # VisitSensorRegion table in and join it with the units
        if regionHolder.name in joinedRegionTables:
            _LOG.debug("region table already joined with units: %s",
                       regionHolder.name)
        else:
            _LOG.debug("joining region table with units: %s",
                       regionHolder.name)
            joinedRegionTables.add(regionHolder.name)
for dataUnitName in connection:
    dataUnit = self._schema.dataUnits[dataUnitName]
    _LOG.debug("  joining region table with %s", dataUnitName)
    for name, col in dataUnit.primaryKeyColumns.items():
        _LOG.debug("    joining on column: %s", name)
        where.append(regionHolder.table.c[name] == col)
# now join the region table with the join table using the PKs of all units
_LOG.debug("join %s with %s", dataUnitJoin.name, connection)
for colName in self._schema.dataUnits.getPrimaryKeyNames(connection):
    _LOG.debug("  joining on column: %s", colName)
    where.append(dataUnitJoin.table.c[colName] ==
                 regionHolder.table.c[colName])
# We also have to include the regions from each side of the join in the
# result set so that we can filter out non-overlapping regions.
firstRegionIndex = len(header)
selectColumns.append(regionHolder.regionColumn)
# join with input datasets to restrict to existing inputs
dsAlias.c["dataset_type_name"] == dsType.name,
dsCollAlias.c["collection"] == collection]
# build full query
# TODO: potentially transform query from user-friendly expression
# execute and return header and result iterator
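The "execute and return header and result iterator" step can be illustrated with the standard-library `sqlite3` module standing in for the SQLAlchemy connection the real code uses (table name and columns here are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (camera TEXT, visit INTEGER)")
conn.executemany("INSERT INTO visit VALUES (?, ?)",
                 [("HSC", 1), ("HSC", 2)])

cursor = conn.execute("SELECT camera, visit FROM visit ORDER BY visit")
# Column names come from the cursor metadata, analogous to the
# (DataUnit name, link name) header built by selectDataUnits().
header = tuple(desc[0] for desc in cursor.description)
# The cursor itself is a single-pass iterator; rows are fetched lazily.
rows = iter(cursor)
# header == ("camera", "visit")
```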