Define and execute a calculation on a `ParquetTable`.
The `__call__` method accepts a `ParquetTable` object, and returns the
result of the calculation as a single column. Each functor defines what
columns are needed for the calculation, and only these columns are read
from the `ParquetTable`.
The action of `__call__` consists of two steps: first, loading the
necessary columns from disk into memory as a `pandas.DataFrame` object;
and second, performing the computation on this dataframe and returning the
result.
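The two steps above can be sketched with plain pandas. This is a minimal illustration, not the real implementation: `ToyTable`, its `to_df` method, `ToyFunctor`, and `FluxRatio` are hypothetical stand-ins invented for this example, and the in-memory dataframe takes the place of a Parquet file on disk.

```python
import pandas as pd


class ToyTable:
    """Hypothetical stand-in for a ParquetTable; holds a DataFrame in memory."""

    def __init__(self, df):
        self._df = df
        self.columns_read = []  # records which columns were last requested

    def to_df(self, columns):
        # A real table would read only these columns from disk.
        self.columns_read = list(columns)
        return self._df[list(columns)]


class ToyFunctor:
    """Illustrative base class only; not the real Functor."""

    _columns = ()

    def _func(self, df):
        raise NotImplementedError

    def __call__(self, table):
        df = table.to_df(self._columns)  # step 1: load only the needed columns
        return self._func(df)            # step 2: compute on the dataframe


class FluxRatio(ToyFunctor):
    _columns = ("flux_a", "flux_b")

    def _func(self, df):
        return df["flux_a"] / df["flux_b"]


table = ToyTable(pd.DataFrame({
    "flux_a": [2.0, 9.0],
    "flux_b": [1.0, 3.0],
    "unused": [0.0, 0.0],  # never requested, so never loaded
}))
ratio = FluxRatio()(table)
```

Here `ratio` is a single `pandas.Series`, and the `unused` column is never requested from the table.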
To define a new `Functor`, a subclass must define a `_func` method
that takes a `pandas.DataFrame` and returns the result as a `pandas.Series`.
In addition, it must define the following attributes:
* `_columns`: The columns necessary to perform the calculation
* `name`: A name appropriate for a figure axis label
* `shortname`: A name appropriate for use as a dictionary key
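As a rough sketch of this contract, the class below defines `_func` together with the three attributes. The `MagDiff` class and its column names are assumptions made up for illustration; a real functor would subclass `Functor` and be called on a `ParquetTable` rather than having `_func` invoked directly.

```python
import pandas as pd


class MagDiff:
    """Hypothetical functor-style class; a real one would subclass Functor."""

    _columns = ("mag_psf", "mag_model")   # columns that _func needs
    name = "PSF - model magnitude [mag]"  # suitable for a figure axis label
    shortname = "magDiff"                 # suitable for use as a dictionary key

    def _func(self, df):
        # Takes a DataFrame and returns a Series with the same index.
        return df["mag_psf"] - df["mag_model"]


df = pd.DataFrame({"mag_psf": [20.5, 21.0], "mag_model": [20.0, 21.25]})
functor = MagDiff()

# Store the result under the functor's shortname, as one might when
# collecting the outputs of several functors.
results = {functor.shortname: functor._func(df[list(functor._columns)])}
```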
On initialization, a `Functor` should declare what filter (`filt` kwarg)
and dataset (e.g. `'ref'`, `'meas'`, `'forced_src'`) it is intended to be
applied to. This enables the `_get_cols` method to extract the proper
columns from the parquet file. If not specified, the dataset will fall back
on the `_defaultDataset` attribute. If `filt` is not specified and `dataset`
is anything other than `'ref'`, then an error will be raised when trying to
perform the calculation.
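The fallback and the deferred error can be sketched as follows. This is an assumption-laden illustration: `ToyFunctor` and `check_filter` are hypothetical names, the default of `'ref'` is chosen only for the example, and `ValueError` is a guess, since the text above does not say which exception type is raised.

```python
class ToyFunctor:
    """Illustrative only; mimics the filt/dataset handling described above."""

    _defaultDataset = "ref"  # assumed default for this sketch

    def __init__(self, filt=None, dataset=None):
        self.filt = filt
        # Fall back on the class-level default when no dataset is given.
        self.dataset = dataset if dataset is not None else self._defaultDataset

    def check_filter(self):
        # Called at calculation time, not at construction: a filter is
        # required for every dataset except 'ref'.
        if self.filt is None and self.dataset != "ref":
            raise ValueError(
                f"filt must be specified for dataset {self.dataset!r}"
            )


ok = ToyFunctor()                   # dataset falls back to 'ref', no filter needed
ok.check_filter()

bad = ToyFunctor(dataset="meas")    # constructing is fine...
try:
    bad.check_filter()              # ...but the calculation step fails
    failed = False
except ValueError:
    failed = True
```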
As currently implemented, `Functor` is only set up to expect a
`ParquetTable` of the format of the `deepCoadd_obj` dataset; that is, a
`MultilevelParquetTable` with the levels of the column index being `filter`,
`dataset`, and `column`. This is defined in the `_columnLevels` attribute,
as well as being implicit in the role of the `filt` and `dataset` attributes
defined at initialization. In addition, the `_get_cols` method that reads
the dataframe from the `ParquetTable` will return a dataframe with column
index levels defined by the `_dfLevels` attribute; by default, this is
`column`.
The `_columnLevels` and `_dfLevels` attributes should generally not need to
be changed, unless `_func` needs columns from multiple filters or datasets
to do the calculation.
An example of this is the `lsst.pipe.tasks.functors.Color` functor, for
which `_dfLevels = ('filter', 'column')`, and `_func` expects the dataframe
it gets to have those levels in the column index.
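The effect of the column index levels can be illustrated with a small pandas dataframe. This is only a sketch under assumed data: the `flux` column and its values are invented, and the `xs` selections stand in for whatever `_get_cols` actually does internally.

```python
import pandas as pd

# Column index with the three levels named in `_columnLevels`.
columns = pd.MultiIndex.from_tuples(
    [
        ("g", "meas", "flux"),
        ("g", "ref", "flux"),
        ("r", "meas", "flux"),
        ("r", "ref", "flux"),
    ],
    names=["filter", "dataset", "column"],
)
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    columns=columns,
)

# Default case: one filter and one dataset are fixed, so only the
# 'column' level remains in the resulting dataframe.
single = df.xs(("g", "meas"), axis=1, level=["filter", "dataset"])

# Color-like case: only the dataset is fixed, so the dataframe keeps
# the ('filter', 'column') levels and spans several filters.
multi = df.xs("meas", axis=1, level="dataset")

# A color-style difference between two filters of the same column.
g_minus_r = multi["g"]["flux"] - multi["r"]["flux"]
```

In the second case the function receiving `multi` has to index by filter first and column second, which is the reason a functor working across filters needs a different `_dfLevels`.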
Parameters
----------
filt : str
    Filter upon which to do the calculation.
dataset : str
    Dataset upon which to do the calculation
    (e.g., 'ref', 'meas', 'forced_src').
Definition at line 51 of file functors.py.