lsst.dax.apdb g4122c88930+b1b0d3f0f8
|
Public Member Functions | |
__init__ (self, ApdbCassandraConfig config) | |
None | __del__ (self) |
Optional[Table] | tableDef (self, ApdbTables table) |
None | makeSchema (self, bool drop=False) |
pandas.DataFrame | getDiaObjects (self, sphgeom.Region region) |
Optional[pandas.DataFrame] | getDiaSources (self, sphgeom.Region region, Optional[Iterable[int]] object_ids, dafBase.DateTime visit_time) |
Optional[pandas.DataFrame] | getDiaForcedSources (self, sphgeom.Region region, Optional[Iterable[int]] object_ids, dafBase.DateTime visit_time) |
list[ApdbInsertId]|None | getInsertIds (self) |
None | deleteInsertIds (self, Iterable[ApdbInsertId] ids) |
ApdbTableData | getDiaObjectsHistory (self, Iterable[ApdbInsertId] ids) |
ApdbTableData | getDiaSourcesHistory (self, Iterable[ApdbInsertId] ids) |
ApdbTableData | getDiaForcedSourcesHistory (self, Iterable[ApdbInsertId] ids) |
pandas.DataFrame | getSSObjects (self) |
None | store (self, dafBase.DateTime visit_time, pandas.DataFrame objects, Optional[pandas.DataFrame] sources=None, Optional[pandas.DataFrame] forced_sources=None) |
None | storeSSObjects (self, pandas.DataFrame objects) |
None | reassignDiaSources (self, Mapping[int, int] idMap) |
None | dailyJob (self) |
int | countUnassociatedObjects (self) |
ConfigurableField | makeField (cls, str doc) |
Public Attributes | |
config | |
Static Public Attributes | |
partition_zero_epoch = dafBase.DateTime(1970, 1, 1, 0, 0, 0, dafBase.DateTime.TAI) | |
ConfigClass = ApdbConfig | |
Protected Member Functions | |
AuthProvider|None | _make_auth_provider (self, ApdbCassandraConfig config) |
Mapping[Any, ExecutionProfile] | _makeProfiles (self, ApdbCassandraConfig config) |
pandas.DataFrame | _getSources (self, sphgeom.Region region, Optional[Iterable[int]] object_ids, float mjd_start, float mjd_end, ApdbTables table_name) |
ApdbTableData | _get_history (self, ExtraTables table, Iterable[ApdbInsertId] ids) |
None | _storeInsertId (self, ApdbInsertId insert_id, dafBase.DateTime visit_time) |
None | _storeDiaObjects (self, pandas.DataFrame objs, dafBase.DateTime visit_time, ApdbInsertId|None insert_id) |
None | _storeDiaSources (self, ApdbTables table_name, pandas.DataFrame sources, dafBase.DateTime visit_time, ApdbInsertId|None insert_id) |
None | _storeDiaSourcesPartitions (self, pandas.DataFrame sources, dafBase.DateTime visit_time, ApdbInsertId|None insert_id) |
None | _storeObjectsPandas (self, pandas.DataFrame records, Union[ApdbTables, ExtraTables] table_name, Optional[Mapping] extra_columns=None, Optional[int] time_part=None) |
pandas.DataFrame | _add_obj_part (self, pandas.DataFrame df) |
pandas.DataFrame | _add_src_part (self, pandas.DataFrame sources, pandas.DataFrame objs) |
pandas.DataFrame | _add_fsrc_part (self, pandas.DataFrame sources, pandas.DataFrame objs) |
int | _time_partition (self, Union[float, dafBase.DateTime] time) |
pandas.DataFrame | _make_empty_catalog (self, ApdbTables table_name) |
cassandra.query.PreparedStatement | _prep_statement (self, str query) |
Iterator[Tuple[cassandra.query.Statement, Tuple]] | _combine_where (self, str prefix, List[Tuple[str, Tuple]] where1, List[Tuple[str, Tuple]] where2, Optional[str] suffix=None) |
List[Tuple[str, Tuple]] | _spatial_where (self, Optional[sphgeom.Region] region, bool use_ranges=False) |
Tuple[List[str], List[Tuple[str, Tuple]]] | _temporal_where (self, ApdbTables table, Union[float, dafBase.DateTime] start_time, Union[float, dafBase.DateTime] end_time, Optional[bool] query_per_time_part=None) |
Protected Attributes | |
_pixelization | |
_keyspace | |
_cluster | |
_session | |
_schema | |
_partition_zero_epoch_mjd | |
Implementation of APDB database on top of Apache Cassandra. The implementation is configured via the standard ``pex_config`` mechanism using the `ApdbCassandraConfig` configuration class. For examples of different configurations, check the config/ folder. Parameters ---------- config : `ApdbCassandraConfig` Configuration object.
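pandas.DataFrame lsst.dax.apdb.apdbCassandra.ApdbCassandra._add_fsrc_part (self, pandas.DataFrame sources, pandas.DataFrame objs)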
protected |
Add apdb_part column to DiaForcedSource catalog. Notes ----- This method copies the apdb_part value from the matching DiaObject record. The DiaObject catalog needs to have an apdb_part column filled by the ``_add_obj_part`` method, and DiaForcedSource records need to be associated with DiaObjects via the ``diaObjectId`` column. This overrides any existing column in a DataFrame with the same name (apdb_part). The original DataFrame is not changed; a copy of the DataFrame is returned.
pandas.DataFrame lsst.dax.apdb.apdbCassandra.ApdbCassandra._add_obj_part (self, pandas.DataFrame df)
protected |
Calculate spatial partition for each record and add it to a DataFrame. Notes ----- This overrides any existing column in a DataFrame with the same name (apdb_part). Original DataFrame is not changed, copy of a DataFrame is returned.
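pandas.DataFrame lsst.dax.apdb.apdbCassandra.ApdbCassandra._add_src_part (self, pandas.DataFrame sources, pandas.DataFrame objs)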
protected |
Add apdb_part column to DiaSource catalog. Notes ----- This method copies the apdb_part value from the matching DiaObject record. The DiaObject catalog needs to have an apdb_part column filled by the ``_add_obj_part`` method, and DiaSource records need to be associated with DiaObjects via the ``diaObjectId`` column. This overrides any existing column in a DataFrame with the same name (apdb_part). The original DataFrame is not changed; a copy of the DataFrame is returned.
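The copy-and-override behaviour described in the notes can be sketched without pandas. This is only an illustration under stated assumptions, not the library's implementation: plain dicts stand in for DataFrame rows, and `add_src_part` is a hypothetical name.

```python
from typing import Dict, List


def add_src_part(sources: List[dict], objs: List[dict]) -> List[dict]:
    """Return copies of ``sources`` with ``apdb_part`` taken from the matching object."""
    # Map each DiaObject ID to the partition already assigned to that object.
    part_by_object: Dict[int, int] = {obj["diaObjectId"]: obj["apdb_part"] for obj in objs}
    result = []
    for src in sources:
        row = dict(src)  # copy, so the caller's record is left unchanged
        # Overrides any existing apdb_part value; raises KeyError if no object matches.
        row["apdb_part"] = part_by_object[src["diaObjectId"]]
        result.append(row)
    return result


objs = [{"diaObjectId": 1, "apdb_part": 42}, {"diaObjectId": 2, "apdb_part": 7}]
sources = [{"diaSourceId": 10, "diaObjectId": 2, "apdb_part": -1}]
with_part = add_src_part(sources, objs)
```

The input list keeps its stale ``apdb_part`` value, while the returned rows carry the partition of the associated object.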
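Iterator[Tuple[cassandra.query.Statement, Tuple]] lsst.dax.apdb.apdbCassandra.ApdbCassandra._combine_where (self, str prefix, List[Tuple[str, Tuple]] where1, List[Tuple[str, Tuple]] where2, Optional[str] suffix=None)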
protected |
Make cartesian product of two parts of WHERE clause into a series of statements to execute. Parameters ---------- prefix : `str` Initial statement prefix that comes before WHERE clause, e.g. "SELECT * from Table"
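A minimal sketch of the cartesian-product idea, assuming each WHERE part is a list of (expression, parameters) tuples as in the signature. It yields plain query strings rather than Cassandra statement objects, and `combine_where` and the column names are hypothetical, not the library's code.

```python
from itertools import product
from typing import Iterator, List, Optional, Tuple

Where = List[Tuple[str, Tuple]]


def combine_where(
    prefix: str, where1: Where, where2: Where, suffix: Optional[str] = None
) -> Iterator[Tuple[str, Tuple]]:
    """Yield one (query, parameters) pair per combination of the two WHERE parts."""
    # An empty list means "no condition"; use one empty expression so the
    # product still produces a statement.
    parts1 = where1 or [("", ())]
    parts2 = where2 or [("", ())]
    for (expr1, params1), (expr2, params2) in product(parts1, parts2):
        expressions = [expr for expr in (expr1, expr2) if expr]
        query = prefix
        if expressions:
            query += " WHERE " + " AND ".join(expressions)
        if suffix:
            query += " " + suffix
        yield query, params1 + params2


statements = list(
    combine_where(
        'SELECT * from "DiaSource"',
        [("apdb_part = ?", (1,)), ("apdb_part = ?", (2,))],
        [("apdb_time_part = ?", (10,))],
    )
)
```

Two spatial expressions and one temporal expression give two statements, each carrying the concatenated parameter tuple.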
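ApdbTableData lsst.dax.apdb.apdbCassandra.ApdbCassandra._get_history (self, ExtraTables table, Iterable[ApdbInsertId] ids)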
protected |
Return records from a particular table given set of insert IDs.
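pandas.DataFrame lsst.dax.apdb.apdbCassandra.ApdbCassandra._getSources (self, sphgeom.Region region, Optional[Iterable[int]] object_ids, float mjd_start, float mjd_end, ApdbTables table_name)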
protected |
Return catalog of DiaSource instances given set of DiaObject IDs. Parameters ---------- region : `lsst.sphgeom.Region` Spherical region. object_ids : Collection of DiaObject IDs mjd_start : `float` Lower bound of time interval. mjd_end : `float` Upper bound of time interval. table_name : `ApdbTables` Name of the table. Returns ------- catalog : `pandas.DataFrame`, or `None` Catalog containing DiaSource records. Empty catalog is returned if ``object_ids`` is empty.
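AuthProvider|None lsst.dax.apdb.apdbCassandra.ApdbCassandra._make_auth_provider (self, ApdbCassandraConfig config)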
protected |
Make Cassandra authentication provider instance.
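pandas.DataFrame lsst.dax.apdb.apdbCassandra.ApdbCassandra._make_empty_catalog (self, ApdbTables table_name)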
protected |
Make an empty catalog for a table with a given name. Parameters ---------- table_name : `ApdbTables` Name of the table. Returns ------- catalog : `pandas.DataFrame` An empty catalog.
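Mapping[Any, ExecutionProfile] lsst.dax.apdb.apdbCassandra.ApdbCassandra._makeProfiles (self, ApdbCassandraConfig config)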
protected |
Make all execution profiles used in the code.
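cassandra.query.PreparedStatement lsst.dax.apdb.apdbCassandra.ApdbCassandra._prep_statement (self, str query)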
protected |
Convert query string into prepared statement.
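List[Tuple[str, Tuple]] lsst.dax.apdb.apdbCassandra.ApdbCassandra._spatial_where (self, Optional[sphgeom.Region] region, bool use_ranges=False)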
protected |
Generate expressions for spatial part of WHERE clause. Parameters ---------- region : `sphgeom.Region` Spatial region for query results. use_ranges : `bool` If True then use pixel ranges ("apdb_part >= p1 AND apdb_part <= p2") instead of exact list of pixels. Should be set to True for large regions covering very many pixels. Returns ------- expressions : `list` [ `tuple` ] Empty list is returned if ``region`` is `None`, otherwise a list of one or more (expression, parameters) tuples
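The difference between the exact pixel list and the range form can be sketched as follows. The sketch assumes the region has already been converted into half-open pixel index ranges ``[begin, end)``; that input format, the name `spatial_where`, and the ``IN (...)`` form of the exact list are illustrative assumptions, not the library's code.

```python
from typing import List, Optional, Tuple


def spatial_where(
    ranges: Optional[List[Tuple[int, int]]], use_ranges: bool = False
) -> List[Tuple[str, Tuple]]:
    """Build (expression, parameters) tuples from half-open pixel ranges."""
    if ranges is None:
        return []  # no region means no spatial constraint
    if not ranges:
        raise ValueError("region covers no pixels")
    if use_ranges:
        expressions: List[Tuple[str, Tuple]] = []
        for begin, end in ranges:
            if end - begin == 1:
                expressions.append(("apdb_part = ?", (begin,)))
            else:
                # Inclusive bounds, matching "apdb_part >= p1 AND apdb_part <= p2".
                expressions.append(("apdb_part >= ? AND apdb_part <= ?", (begin, end - 1)))
        return expressions
    # Exact list: enumerate every pixel index covered by the ranges.
    pixels = [pix for begin, end in ranges for pix in range(begin, end)]
    if len(pixels) == 1:
        return [("apdb_part = ?", (pixels[0],))]
    placeholders = ",".join("?" * len(pixels))
    return [(f"apdb_part IN ({placeholders})", tuple(pixels))]
```

With many pixels the exact list grows linearly with the number of pixels, while the range form grows only with the number of contiguous ranges, which is why ranges suit large regions.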
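None lsst.dax.apdb.apdbCassandra.ApdbCassandra._storeDiaObjects (self, pandas.DataFrame objs, dafBase.DateTime visit_time, ApdbInsertId|None insert_id)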
protected |
Store catalog of DiaObjects from current visit. Parameters ---------- objs : `pandas.DataFrame` Catalog with DiaObject records visit_time : `lsst.daf.base.DateTime` Time of the current visit.
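None lsst.dax.apdb.apdbCassandra.ApdbCassandra._storeDiaSources (self, ApdbTables table_name, pandas.DataFrame sources, dafBase.DateTime visit_time, ApdbInsertId|None insert_id)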
protected |
Store catalog of DIASources or DIAForcedSources from current visit. Parameters ---------- sources : `pandas.DataFrame` Catalog containing DiaSource records visit_time : `lsst.daf.base.DateTime` Time of the current visit.
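None lsst.dax.apdb.apdbCassandra.ApdbCassandra._storeDiaSourcesPartitions (self, pandas.DataFrame sources, dafBase.DateTime visit_time, ApdbInsertId|None insert_id)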
protected |
Store mapping of diaSourceId to its partitioning values. Parameters ---------- sources : `pandas.DataFrame` Catalog containing DiaSource records visit_time : `lsst.daf.base.DateTime` Time of the current visit.
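None lsst.dax.apdb.apdbCassandra.ApdbCassandra._storeObjectsPandas (self, pandas.DataFrame records, Union[ApdbTables, ExtraTables] table_name, Optional[Mapping] extra_columns=None, Optional[int] time_part=None)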
protected |
Store generic objects. Takes Pandas catalog and stores a bunch of records in a table. Parameters ---------- records : `pandas.DataFrame` Catalog containing object records table_name : `ApdbTables` Name of the table as defined in APDB schema. extra_columns : `dict`, optional Mapping (column_name, column_value) which gives fixed values for columns in each row, overrides values in ``records`` if matching columns exist there. time_part : `int`, optional If not `None` then insert into a per-partition table. Notes ----- If Pandas catalog contains additional columns not defined in table schema they are ignored. Catalog does not have to contain all columns defined in a table, but partition and clustering keys must be present in a catalog or ``extra_columns``.
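The column rules in the notes (unknown columns ignored, ``extra_columns`` overriding record values) can be illustrated with a small statement builder. This is a sketch under assumptions: `build_insert` is a hypothetical helper operating on one record as a dict, and the quoting style is illustrative.

```python
from typing import Any, List, Mapping, Optional, Sequence, Tuple


def build_insert(
    table: str,
    schema_columns: Sequence[str],
    record: Mapping[str, Any],
    extra_columns: Optional[Mapping[str, Any]] = None,
) -> Tuple[str, List[Any]]:
    """Return an INSERT statement and its values for one record."""
    merged = dict(record)
    merged.update(extra_columns or {})  # fixed values win over record values
    # Keep only columns that the table defines, in schema order.
    columns = [col for col in schema_columns if col in merged]
    column_list = ", ".join(f'"{col}"' for col in columns)
    placeholders = ", ".join("?" for _ in columns)
    query = f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})'
    return query, [merged[col] for col in columns]


query, values = build_insert(
    "DiaObject",
    ["diaObjectId", "apdb_part", "ra"],
    {"diaObjectId": 1, "ra": 10.5, "unknown": 0, "apdb_part": 3},
    {"apdb_part": 9},
)
```

The ``unknown`` column is dropped because the schema does not define it, and ``apdb_part`` takes the value from ``extra_columns``.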
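Tuple[List[str], List[Tuple[str, Tuple]]] lsst.dax.apdb.apdbCassandra.ApdbCassandra._temporal_where (self, ApdbTables table, Union[float, dafBase.DateTime] start_time, Union[float, dafBase.DateTime] end_time, Optional[bool] query_per_time_part=None)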
protected |
Generate table names and expressions for temporal part of WHERE clauses. Parameters ---------- table : `ApdbTables` Table to select from. start_time : `dafBase.DateTime` or `float` Starting DateTime or MJD value of the time range. end_time : `dafBase.DateTime` or `float` Ending DateTime or MJD value of the time range. query_per_time_part : `bool`, optional If None then use ``query_per_time_part`` from configuration. Returns ------- tables : `list` [ `str` ] List of the table names to query. expressions : `list` [ `tuple` ] A list of zero or more (expression, parameters) tuples.
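One way to picture the two return values is the sketch below. Everything beyond the documented return shape is an assumption made for illustration: the 30-day partition length, the ``per_partition_tables`` switch, the ``<table>_<partition>`` naming, and the ``apdb_time_part`` column name.

```python
from typing import List, Tuple

EPOCH_MJD = 40587.0  # MJD of 1970-01-01, the date used by partition_zero_epoch


def temporal_where(
    table: str,
    mjd_start: float,
    mjd_end: float,
    partition_days: int = 30,           # assumed partition length
    per_partition_tables: bool = False,  # assumed storage layout switch
    query_per_time_part: bool = False,
) -> Tuple[List[str], List[Tuple[str, Tuple]]]:
    """Return table names and (expression, parameters) tuples for a time range."""
    first = int((mjd_start - EPOCH_MJD) // partition_days)
    last = int((mjd_end - EPOCH_MJD) // partition_days)
    parts = list(range(first, last + 1))
    if per_partition_tables:
        # One table per partition, so no temporal expression is needed.
        return [f"{table}_{part}" for part in parts], []
    if query_per_time_part:
        # One expression, and therefore one query, per partition.
        return [table], [("apdb_time_part = ?", (part,)) for part in parts]
    placeholders = ",".join("?" * len(parts))
    return [table], [(f"apdb_time_part IN ({placeholders})", tuple(parts))]
```

The per-partition-table branch is the case in which the list of expressions is empty.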
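int lsst.dax.apdb.apdbCassandra.ApdbCassandra._time_partition (self, Union[float, dafBase.DateTime] time)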
protected |
Calculate time partition number for a given time. Parameters ---------- time : `float` or `lsst.daf.base.DateTime` Time for which to calculate partition number. Can be a float, interpreted as MJD, or `lsst.daf.base.DateTime`. Returns ------- partition : `int` Partition number for a given time.
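A minimal sketch of such a calculation, assuming partitions are counted in fixed-length blocks of days from the 1970-01-01 epoch given by ``partition_zero_epoch``. The 30-day default is an assumption, a standard-library ``datetime`` stands in for `lsst.daf.base.DateTime`, and the TAI/UTC offset is ignored.

```python
from datetime import datetime
from typing import Union

MJD_EPOCH = datetime(1858, 11, 17)   # MJD 0
PARTITION_ZERO_EPOCH_MJD = 40587.0   # MJD of 1970-01-01


def time_partition(time: Union[float, datetime], partition_days: int = 30) -> int:
    """Return the partition number for an MJD value or a datetime."""
    if isinstance(time, datetime):
        # Convert to MJD: days elapsed since 1858-11-17.
        mjd = (time - MJD_EPOCH).total_seconds() / 86400.0
    else:
        mjd = float(time)
    # Floor division keeps partition boundaries aligned with the epoch.
    return int((mjd - PARTITION_ZERO_EPOCH_MJD) // partition_days)
```

Both input forms land in the same partition when they describe the same instant, for example ``time_partition(40652.0)`` and ``time_partition(datetime(1970, 3, 7))``.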
int lsst.dax.apdb.apdbCassandra.ApdbCassandra.countUnassociatedObjects (self)
Return the number of DiaObjects that have only one DiaSource associated with them. Used as part of ap_verify metrics. Returns ------- count : `int` Number of DiaObjects with exactly one associated DiaSource. Notes ----- This method can be very inefficient or slow in some implementations.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
None lsst.dax.apdb.apdbCassandra.ApdbCassandra.dailyJob (self)
Implement daily activities like cleanup/vacuum. What should be done during daily activities is determined by specific implementation.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
None lsst.dax.apdb.apdbCassandra.ApdbCassandra.deleteInsertIds (self, Iterable[ApdbInsertId] ids)
Remove insert identifiers from the database. Parameters ---------- ids : `iterable` [`ApdbInsertId`] Insert identifiers, can include items returned from `getInsertIds`. Notes ----- This method causes Apdb to forget about specified identifiers. If there are any auxiliary data associated with the identifiers, it is also removed from database (but data in regular tables is not removed). This method should be called after successful transfer of data from APDB to PPDB to free space used by history.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
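The transfer-then-delete workflow mentioned in the notes could look like the sketch below. It calls only methods listed on this page; `transfer_history`, the `export` callback, and the `FakeApdb` stand-in are hypothetical and exist only so the example runs without a database.

```python
from typing import Any, Callable, List, Optional


def transfer_history(apdb: Any, export: Callable[[str, Any], None]) -> int:
    """Export history for all known insert IDs, then delete those IDs."""
    ids: Optional[List[Any]] = apdb.getInsertIds()
    if not ids:
        return 0  # None: history is not stored; empty list: nothing to transfer
    export("DiaObject", apdb.getDiaObjectsHistory(ids))
    export("DiaSource", apdb.getDiaSourcesHistory(ids))
    export("DiaForcedSource", apdb.getDiaForcedSourcesHistory(ids))
    # Delete only after every export succeeded, so a failure leaves history intact.
    apdb.deleteInsertIds(ids)
    return len(ids)


class FakeApdb:
    """In-memory stand-in, used here only to exercise the sketch."""

    def __init__(self, ids):
        self.ids = list(ids)

    def getInsertIds(self):
        return list(self.ids)

    def getDiaObjectsHistory(self, ids):
        return [("object", i) for i in ids]

    def getDiaSourcesHistory(self, ids):
        return [("source", i) for i in ids]

    def getDiaForcedSourcesHistory(self, ids):
        return [("forced", i) for i in ids]

    def deleteInsertIds(self, ids):
        removed = set(ids)
        self.ids = [i for i in self.ids if i not in removed]


exported = {}
fake = FakeApdb([1, 2])
count = transfer_history(fake, lambda table, data: exported.__setitem__(table, data))
```

Deleting last means a failed export can simply be retried, since the identifiers are still known to the database.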
Optional[pandas.DataFrame] lsst.dax.apdb.apdbCassandra.ApdbCassandra.getDiaForcedSources (self, sphgeom.Region region, Optional[Iterable[int]] object_ids, dafBase.DateTime visit_time)
Return catalog of DiaForcedSource instances from a given region. Parameters ---------- region : `lsst.sphgeom.Region` Region to search for DIASources. object_ids : iterable [ `int` ], optional List of DiaObject IDs to further constrain the set of returned sources. If list is empty then empty catalog is returned with a correct schema. If `None` then returned sources are not constrained. Some implementations may not support latter case. visit_time : `lsst.daf.base.DateTime` Time of the current visit. Returns ------- catalog : `pandas.DataFrame`, or `None` Catalog containing DiaSource records. `None` is returned if ``read_forced_sources_months`` configuration parameter is set to 0. Raises ------ NotImplementedError May be raised by some implementations if ``object_ids`` is `None`. Notes ----- This method returns DiaForcedSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaSource history is returned limited by ``read_forced_sources_months`` config parameter, w.r.t. ``visit_time``. If ``object_ids`` is empty then an empty catalog is always returned with the correct schema (columns/types). If ``object_ids`` is `None` then no filtering is performed and some of the returned records may be outside the specified region.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
ApdbTableData lsst.dax.apdb.apdbCassandra.ApdbCassandra.getDiaForcedSourcesHistory (self, Iterable[ApdbInsertId] ids)
Return catalog of DiaForcedSource instances from a given time period. Parameters ---------- ids : `iterable` [`ApdbInsertId`] Insert identifiers, can include items returned from `getInsertIds`. Returns ------- data : `ApdbTableData` Catalog containing DiaForcedSource records. In addition to all regular columns it will contain ``insert_id`` column. Notes ----- This part of API may not be very stable and can change before the implementation finalizes.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
pandas.DataFrame lsst.dax.apdb.apdbCassandra.ApdbCassandra.getDiaObjects (self, sphgeom.Region region)
Return catalog of DiaObject instances from a given region. This method returns only the last version of each DiaObject. Some records in the returned catalog may be outside the specified region; it is up to the client to ignore those records or clean up the catalog before further use. Parameters ---------- region : `lsst.sphgeom.Region` Region to search for DIAObjects. Returns ------- catalog : `pandas.DataFrame` Catalog containing DiaObject records for a region that may be a superset of the specified region.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
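Because the returned catalog may cover a superset of the requested region, a client typically clips it afterwards. The sketch below assumes a circular region and records with ``ra`` and ``dec`` columns in degrees; the column names and both helper functions are hypothetical, and a real client would more likely test membership with the `lsst.sphgeom` region itself.

```python
import math
from typing import List, Mapping


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def clip_to_circle(
    records: List[Mapping[str, float]], ra0: float, dec0: float, radius_deg: float
) -> List[Mapping[str, float]]:
    """Keep only records within ``radius_deg`` of the circle centre."""
    return [
        rec
        for rec in records
        if angular_separation_deg(rec["ra"], rec["dec"], ra0, dec0) <= radius_deg
    ]


catalog = [{"diaObjectId": 1, "ra": 10.0, "dec": 0.0}, {"diaObjectId": 2, "ra": 12.0, "dec": 0.0}]
inside = clip_to_circle(catalog, ra0=10.0, dec0=0.0, radius_deg=1.0)
```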
ApdbTableData lsst.dax.apdb.apdbCassandra.ApdbCassandra.getDiaObjectsHistory (self, Iterable[ApdbInsertId] ids)
Return catalog of DiaObject instances from a given time period including the history of each DiaObject. Parameters ---------- ids : `iterable` [`ApdbInsertId`] Insert identifiers, can include items returned from `getInsertIds`. Returns ------- data : `ApdbTableData` Catalog containing DiaObject records. In addition to all regular columns it will contain ``insert_id`` column. Notes ----- This part of API may not be very stable and can change before the implementation finalizes.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
Optional[pandas.DataFrame] lsst.dax.apdb.apdbCassandra.ApdbCassandra.getDiaSources (self, sphgeom.Region region, Optional[Iterable[int]] object_ids, dafBase.DateTime visit_time)
Return catalog of DiaSource instances from a given region. Parameters ---------- region : `lsst.sphgeom.Region` Region to search for DIASources. object_ids : iterable [ `int` ], optional List of DiaObject IDs to further constrain the set of returned sources. If `None` then returned sources are not constrained. If list is empty then empty catalog is returned with a correct schema. visit_time : `lsst.daf.base.DateTime` Time of the current visit. Returns ------- catalog : `pandas.DataFrame`, or `None` Catalog containing DiaSource records. `None` is returned if ``read_sources_months`` configuration parameter is set to 0. Notes ----- This method returns DiaSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaSource history is returned limited by ``read_sources_months`` config parameter, w.r.t. ``visit_time``. If ``object_ids`` is empty then an empty catalog is always returned with the correct schema (columns/types). If ``object_ids`` is `None` then no filtering is performed and some of the returned records may be outside the specified region.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
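The three documented outcomes (`None`, an empty catalog, a filtered catalog) can be summarised in a small sketch. It models only the return contract, with a list of dicts in place of a DataFrame and without any region or time filtering; `select_sources` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def select_sources(
    records: List[dict],
    object_ids: Optional[Iterable[int]],
    read_sources_months: int,
) -> Optional[List[dict]]:
    """Mimic the documented return contract of the source getters."""
    if read_sources_months == 0:
        return None  # reading sources is disabled by configuration
    if object_ids is None:
        return list(records)  # no filtering by DiaObject ID
    wanted = set(object_ids)
    # An empty ID collection yields an empty catalog.
    return [rec for rec in records if rec["diaObjectId"] in wanted]


records = [{"diaSourceId": 10, "diaObjectId": 1}, {"diaSourceId": 11, "diaObjectId": 2}]
disabled = select_sources(records, [1], read_sources_months=0)
empty = select_sources(records, [], read_sources_months=12)
filtered = select_sources(records, [2], read_sources_months=12)
```

Callers therefore need to distinguish `None` (feature switched off) from an empty catalog (nothing matched), for instance with an explicit ``is None`` check.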
ApdbTableData lsst.dax.apdb.apdbCassandra.ApdbCassandra.getDiaSourcesHistory (self, Iterable[ApdbInsertId] ids)
Return catalog of DiaSource instances from a given time period. Parameters ---------- ids : `iterable` [`ApdbInsertId`] Insert identifiers, can include items returned from `getInsertIds`. Returns ------- data : `ApdbTableData` Catalog containing DiaSource records. In addition to all regular columns it will contain ``insert_id`` column. Notes ----- This part of API may not be very stable and can change before the implementation finalizes.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
list[ApdbInsertId]|None lsst.dax.apdb.apdbCassandra.ApdbCassandra.getInsertIds (self)
Return collection of insert identifiers known to the database. Returns ------- ids : `list` [`ApdbInsertId`] or `None` List of identifiers, they may be time-ordered if database supports ordering. `None` is returned if database is not configured to store insert identifiers.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
pandas.DataFrame lsst.dax.apdb.apdbCassandra.ApdbCassandra.getSSObjects (self)
Return catalog of SSObject instances. Returns ------- catalog : `pandas.DataFrame` Catalog containing SSObject records, all existing records are returned.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
None lsst.dax.apdb.apdbCassandra.ApdbCassandra.makeSchema (self, bool drop=False)
Create or re-create whole database schema. Parameters ---------- drop : `bool` If True then drop all tables before creating new ones.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
None lsst.dax.apdb.apdbCassandra.ApdbCassandra.reassignDiaSources (self, Mapping[int, int] idMap)
Associate DiaSources with SSObjects, dis-associating them from DiaObjects. Parameters ---------- idMap : `Mapping` Maps DiaSource IDs to their new SSObject IDs. Raises ------ ValueError Raised if DiaSource ID does not exist in the database.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
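The meaning of ``idMap`` can be illustrated with an in-memory table keyed by DiaSource ID. This is a sketch of the documented behaviour only: `reassign_dia_sources` is hypothetical, and representing "no DiaObject" as `None` is an assumption.

```python
from typing import Dict, Mapping


def reassign_dia_sources(sources: Dict[int, dict], id_map: Mapping[int, int]) -> None:
    """Point each listed DiaSource at an SSObject and clear its DiaObject link."""
    # Validate first so that a bad ID leaves the table untouched.
    missing = [src_id for src_id in id_map if src_id not in sources]
    if missing:
        raise ValueError(f"Unknown DiaSource IDs: {missing}")
    for src_id, ss_object_id in id_map.items():
        sources[src_id]["ssObjectId"] = ss_object_id
        sources[src_id]["diaObjectId"] = None  # assumed marker for "no DiaObject"


table = {
    10: {"diaObjectId": 1, "ssObjectId": None},
    11: {"diaObjectId": 1, "ssObjectId": None},
}
reassign_dia_sources(table, {10: 5000})
```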
None lsst.dax.apdb.apdbCassandra.ApdbCassandra.store (self, dafBase.DateTime visit_time, pandas.DataFrame objects, Optional[pandas.DataFrame] sources=None, Optional[pandas.DataFrame] forced_sources=None)
Store all three types of catalogs in the database. Parameters ---------- visit_time : `lsst.daf.base.DateTime` Time of the visit. objects : `pandas.DataFrame` Catalog with DiaObject records. sources : `pandas.DataFrame`, optional Catalog with DiaSource records. forced_sources : `pandas.DataFrame`, optional Catalog with DiaForcedSource records. Notes ----- This method takes DataFrame catalogs; their schema must be compatible with the schema of the APDB tables: - column names must correspond to database table columns - types and units of the columns must match database definitions, no unit conversion is performed presently - columns that have default values in database schema can be omitted from catalog - this method knows how to fill interval-related columns of DiaObject (validityStart, validityEnd), so they do not need to appear in a catalog - source catalogs have ``diaObjectId`` column associating sources with objects
Reimplemented from lsst.dax.apdb.apdb.Apdb.
None lsst.dax.apdb.apdbCassandra.ApdbCassandra.storeSSObjects (self, pandas.DataFrame objects)
Store or update SSObject catalog. Parameters ---------- objects : `pandas.DataFrame` Catalog with SSObject records. Notes ----- If SSObjects with matching IDs already exist in the database, their records will be updated with the information from provided records.
Reimplemented from lsst.dax.apdb.apdb.Apdb.
Optional[Table] lsst.dax.apdb.apdbCassandra.ApdbCassandra.tableDef (self, ApdbTables table)
Return table schema definition for a given table. Parameters ---------- table : `ApdbTables` One of the known APDB tables. Returns ------- tableSchema : `felis.simple.Table` or `None` Table schema description, `None` is returned if table is not defined by this implementation.
Reimplemented from lsst.dax.apdb.apdb.Apdb.