lsst.pipe.base  21.0.0-10-g560fb7b+5f2ca89aed
lsst.pipe.base.cmdLineTask.TaskRunner Class Reference
Inherited by lsst.pipe.base.cmdLineTask.ButlerInitializedTaskRunner and lsst.pipe.base.cmdLineTask.LegacyTaskRunner.

Public Member Functions

def __init__ (self, TaskClass, parsedCmd, doReturnResults=False)
 
def prepareForMultiProcessing (self)
 
def run (self, parsedCmd)
 
def makeTask (self, parsedCmd=None, args=None)
 
def precall (self, parsedCmd)
 
def __call__ (self, args)
 
def runTask (self, task, dataRef, kwargs)
 

Static Public Member Functions

def getTargetList (parsedCmd, **kwargs)
 

Public Attributes

 TaskClass
 
 doReturnResults
 
 config
 
 log
 
 doRaise
 
 clobberConfig
 
 doBackup
 
 numProcesses
 
 timeout
 

Static Public Attributes

int TIMEOUT = 3600*24*30
 

Detailed Description

Run a command-line task, using `multiprocessing` if requested.

Parameters
----------
TaskClass : `lsst.pipe.base.Task` subclass
    The class of the task to run.
parsedCmd : `argparse.Namespace`
    The parsed command-line arguments, as returned by the task's argument
    parser's `~lsst.pipe.base.ArgumentParser.parse_args` method.

    .. warning::

       Do not store ``parsedCmd``, as this instance is pickled (when
       multiprocessing is used) and ``parsedCmd`` may contain non-picklable
       elements. It also contains more data than we need to send to each
       instance of the task.
doReturnResults : `bool`, optional
    Should `run` return the collected result from each invocation of the
    task? This is only intended for unit tests and similar use. It can
    easily exhaust memory (if the task returns enough data and you call it
    enough times) and it will fail when using multiprocessing if the
    returned data cannot be pickled.

    Note that even if ``doReturnResults`` is `False`, a struct with a single
    member ``exitStatus`` is returned, with value 0 or 1 to be returned to
    the Unix shell.

Raises
------
ImportError
    Raised if multiprocessing is requested (and the task supports it) but
    the multiprocessing library cannot be imported.

Notes
-----
Each command-line task (subclass of `lsst.pipe.base.CmdLineTask`) has a
task runner. By default it is this class, but some tasks require a
subclass. See the manual :ref:`creating-a-command-line-task` for more
information. See `CmdLineTask.parseAndRun` to see how a task runner is
used.

You may use this task runner for your command-line task if your task has a
``runDataRef`` method that takes exactly one argument: a butler data
reference. Otherwise you must provide a task-specific subclass of
this runner for your task's ``RunnerClass`` that overrides
`TaskRunner.getTargetList` and possibly
`TaskRunner.__call__`. See `TaskRunner.getTargetList` for details.

This design matches the common pattern for command-line tasks: the
``runDataRef`` method takes a single data reference, of some suitable name.
Additional arguments are rare and, if present, require a subclass of
`TaskRunner` that passes these additional arguments by name.

Instances of this class must be picklable in order to be compatible with
multiprocessing. If multiprocessing is requested
(``parsedCmd.numProcesses > 1``) then `run` calls
`prepareForMultiProcessing` to jettison optional non-picklable elements.
If your task runner is not compatible with multiprocessing then indicate
this in your task by setting class variable ``canMultiprocess=False``.
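
The picklability requirement can be illustrated with a minimal sketch. ``RunnerSketch`` and ``stateIsPicklable`` are hypothetical names, not part of ``lsst.pipe.base``, and a `threading.Lock` merely plays the role of an optional non-picklable element. The check pickles the instance's attribute dictionary, which is the state pickle would serialise for a plain instance.

```python
import pickle
import threading


class RunnerSketch:
    """Hypothetical stand-in for a task runner (not the real TaskRunner)."""

    def __init__(self, numProcesses):
        self.numProcesses = numProcesses
        # A lock cannot be pickled; it plays the role of an optional
        # non-picklable element held by the runner.
        self.lock = threading.Lock()

    def prepareForMultiProcessing(self):
        # Jettison the element that would block pickling.
        self.lock = None


def stateIsPicklable(obj):
    """Return True if the attribute dict of ``obj`` can be pickled."""
    try:
        pickle.dumps(vars(obj))
    except (TypeError, pickle.PicklingError):
        return False
    return True


runner = RunnerSketch(numProcesses=2)
before = stateIsPicklable(runner)
if runner.numProcesses > 1:
    # Only needed when the work is about to be sent to other processes.
    runner.prepareForMultiProcessing()
after = stateIsPicklable(runner)
print(before, after)
```

In this sketch the runner's state cannot be pickled until the lock has been dropped.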

Due to a `python bug`__, handling a `KeyboardInterrupt` properly `requires
specifying a timeout`__. This timeout (in sec) can be specified as the
``timeout`` element in the output from `~lsst.pipe.base.ArgumentParser`
(the ``parsedCmd``), if available, otherwise we use `TaskRunner.TIMEOUT`.
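
A rough sketch of the timeout pattern described above, under stated assumptions: ``chooseTimeout`` and ``mapWithTimeout`` are hypothetical helper names, ``DEFAULT_TIMEOUT`` mirrors the value of `TaskRunner.TIMEOUT`, and `multiprocessing.pool.ThreadPool` stands in for a process pool so that the example runs in a single process.

```python
import argparse
from multiprocessing.pool import ThreadPool

DEFAULT_TIMEOUT = 3600 * 24 * 30  # seconds; mirrors TaskRunner.TIMEOUT


def chooseTimeout(parsedCmd):
    """Use parsedCmd.timeout when it is set, else the default."""
    timeout = getattr(parsedCmd, "timeout", None)
    return DEFAULT_TIMEOUT if timeout is None else timeout


def mapWithTimeout(pool, timeout, func, items):
    # Waiting via get(timeout) rather than a bare pool.map() is the
    # workaround the paragraph above refers to.
    return pool.map_async(func, items).get(timeout)


def square(x):
    return x * x


parsedCmd = argparse.Namespace(timeout=60)
# ThreadPool shares the Pool API; it stands in for a process pool here.
with ThreadPool(processes=2) as pool:
    results = mapWithTimeout(pool, chooseTimeout(parsedCmd), square, [1, 2, 3])
print(results)
```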

By default, we disable "implicit" threading, i.e. threading provided by
underlying numerical libraries such as MKL or BLAS. This is designed to
avoid thread contention both when a single command line task spawns
multiple processes and when multiple users are running on a shared system.
Users can override this behaviour by setting the
``LSST_ALLOW_IMPLICIT_THREADS`` environment variable.

.. __: http://bugs.python.org/issue8296
.. __: http://stackoverflow.com/questions/1408356/

Definition at line 94 of file cmdLineTask.py.

Constructor & Destructor Documentation

◆ __init__()

def lsst.pipe.base.cmdLineTask.TaskRunner.__init__ (   self,
  TaskClass,
  parsedCmd,
  doReturnResults = False 
)

Definition at line 174 of file cmdLineTask.py.

Member Function Documentation

◆ __call__()

def lsst.pipe.base.cmdLineTask.TaskRunner.__call__ (   self,
  args 
)
Run the Task on a single target.

Parameters
----------
args
    Arguments for Task.runDataRef()

Returns
-------
struct : `lsst.pipe.base.Struct`
    Contains these fields if ``doReturnResults`` is `True`:

    - ``dataRef``: the provided data reference.
    - ``metadata``: task metadata after execution of run.
    - ``result``: result returned by task run, or `None` if the task
      fails.
    - ``exitStatus``: 0 if the task completed successfully, 1
      otherwise.

    If ``doReturnResults`` is `False` the struct contains:

    - ``exitStatus``: 0 if the task completed successfully, 1
      otherwise.

Notes
-----
This default implementation assumes that the ``args`` is a tuple
containing a data reference and a dict of keyword arguments.

.. warning::

   If you override this method and wish to return something when
   ``doReturnResults`` is `False`, then it must be picklable to
   support multiprocessing and it should be small enough that pickling
   and unpickling do not add excessive overhead.
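
The shape of the returned struct can be sketched as follows. This is an illustration only, not the library's implementation: `types.SimpleNamespace` stands in for `lsst.pipe.base.Struct`, ``EchoTask`` and ``callOnTarget`` are hypothetical names, and the ``metadata`` field is omitted for brevity.

```python
from types import SimpleNamespace


class EchoTask:
    """Hypothetical task whose runDataRef just echoes its input."""

    def runDataRef(self, dataRef, **kwargs):
        if dataRef is None:
            raise ValueError("no data reference")
        return (dataRef, kwargs)


def callOnTarget(task, args, doReturnResults=False):
    """Run ``task`` on one (dataRef, kwargs) target."""
    dataRef, kwargs = args  # args is a tuple: data reference, keyword dict
    result = None
    exitStatus = 0
    try:
        result = task.runDataRef(dataRef, **kwargs)
    except Exception:
        exitStatus = 1  # report the failure through the exit status
    if doReturnResults:
        return SimpleNamespace(dataRef=dataRef, result=result,
                               exitStatus=exitStatus)
    # Small and picklable: only the exit status is sent back.
    return SimpleNamespace(exitStatus=exitStatus)


ok = callOnTarget(EchoTask(), ("visit1", {}))
failed = callOnTarget(EchoTask(), (None, {}))
full = callOnTarget(EchoTask(), ("visit1", {"scale": 2}), doReturnResults=True)
print(ok.exitStatus, failed.exitStatus, full.result)
```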

Definition at line 380 of file cmdLineTask.py.

◆ getTargetList()

def lsst.pipe.base.cmdLineTask.TaskRunner.getTargetList (   parsedCmd,
  **kwargs 
)
static
Get a list of (dataRef, kwargs) for `TaskRunner.__call__`.

Parameters
----------
parsedCmd : `argparse.Namespace`
    The parsed command object returned by
    `lsst.pipe.base.argumentParser.ArgumentParser.parse_args`.
kwargs
    Any additional keyword arguments. In the default `TaskRunner` this
    is an empty dict, but having it simplifies overriding `TaskRunner`
    for tasks whose ``runDataRef`` method takes additional arguments
    (see case 1 below).

Notes
-----
The default implementation of `TaskRunner.getTargetList` and
`TaskRunner.__call__` works for any command-line task whose
``runDataRef`` method takes exactly one argument: a data reference.
Otherwise you must provide a subclass of `TaskRunner` that overrides
`TaskRunner.getTargetList` and possibly `TaskRunner.__call__`.
There are two cases.

**Case 1**

If your command-line task has a ``runDataRef`` method that takes one
data reference followed by additional arguments, then you need only
override `TaskRunner.getTargetList` to return the additional
arguments as an argument dict. To make this easier, your overridden
version of `~TaskRunner.getTargetList` may call
`TaskRunner.getTargetList` with the extra arguments as keyword
arguments. For example, the following adds an argument dict containing
a single key: "calExpList", whose value is the list of data IDs for
the calexp ID argument:

.. code-block:: python

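    @staticmethod
    def getTargetList(parsedCmd):
        # Forward the extra argument as a keyword; the base class pairs it
        # with each data reference.
        return TaskRunner.getTargetList(
            parsedCmd,
            calExpList=parsedCmd.calexp.idList
        )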

It is equivalent to this slightly longer version:

.. code-block:: python

    @staticmethod
    def getTargetList(parsedCmd):
        argDict = dict(calExpList=parsedCmd.calexp.idList)
        # Pair each data reference with the shared argument dict.
        return [(dataRef, argDict) for dataRef in parsedCmd.id.refList]

**Case 2**

If your task does not fit case 1 then you must override both
`TaskRunner.getTargetList` and `TaskRunner.__call__`. You may do this
however you see fit, so long as `TaskRunner.getTargetList`
returns a list, each of whose elements is sent to
`TaskRunner.__call__`, which runs your task.
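
One possible shape for case 2 is sketched below for a task that processes two data references at a time. Everything here is a hypothetical stand-in: ``BaseRunnerSketch`` replaces `TaskRunner`, ``PairTask`` and its ``runPair`` method are invented for illustration, and `types.SimpleNamespace` replaces both the parsed command and `lsst.pipe.base.Struct`.

```python
from types import SimpleNamespace


class BaseRunnerSketch:
    """Minimal hypothetical stand-in for TaskRunner."""

    def __init__(self, TaskClass):
        self.TaskClass = TaskClass

    def run(self, parsedCmd):
        # Each element of the target list is passed to __call__ in turn.
        return [self(target) for target in self.getTargetList(parsedCmd)]


class PairTask:
    """Hypothetical task that needs two data references per call."""

    def runPair(self, first, second):
        return "{}+{}".format(first, second)


class PairRunner(BaseRunnerSketch):
    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        refs = parsedCmd.id.refList
        # Group consecutive references into pairs; a trailing odd one
        # is dropped in this sketch.
        return [(refs[i], refs[i + 1]) for i in range(0, len(refs) - 1, 2)]

    def __call__(self, args):
        first, second = args
        task = self.TaskClass()
        return SimpleNamespace(result=task.runPair(first, second),
                               exitStatus=0)


parsedCmd = SimpleNamespace(id=SimpleNamespace(refList=["a", "b", "c", "d"]))
results = PairRunner(PairTask).run(parsedCmd)
print([r.result for r in results])
```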

Definition at line 253 of file cmdLineTask.py.

◆ makeTask()

def lsst.pipe.base.cmdLineTask.TaskRunner.makeTask (   self,
  parsedCmd = None,
  args = None 
)
Create a Task instance.

Parameters
----------
parsedCmd
    Parsed command-line options (used for extra task args by some task
    runners).
args
    Args tuple passed to `TaskRunner.__call__` (used for extra task
    arguments by some task runners).

Notes
-----
``makeTask`` can be called with either the ``parsedCmd`` argument or
``args`` argument set to None, but it must construct identical Task
instances in either case.

Subclasses may ignore this method entirely if they reimplement both
`TaskRunner.precall` and `TaskRunner.__call__`.
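
A minimal sketch of a ``makeTask`` that meets the "identical instances" requirement by ignoring both arguments and building the task only from attributes stored on the runner. ``ConfiguredTask`` and ``MakeTaskSketch`` are hypothetical names, and the constructor signature is an assumption made for the example.

```python
class ConfiguredTask:
    """Hypothetical task constructed from a config and a log."""

    def __init__(self, config, log=None):
        self.config = config
        self.log = log


class MakeTaskSketch:
    """Hypothetical runner holding what is needed to build a task."""

    def __init__(self, TaskClass, config, log=None):
        self.TaskClass = TaskClass
        self.config = config
        self.log = log

    def makeTask(self, parsedCmd=None, args=None):
        # Neither argument is consulted, so the same kind of instance is
        # produced whichever one the caller supplies.
        return self.TaskClass(config=self.config, log=self.log)


runner = MakeTaskSketch(ConfiguredTask, config={"threshold": 5})
fromParsedCmd = runner.makeTask(parsedCmd=object())
fromArgs = runner.makeTask(args=("dataRef", {}))
print(fromParsedCmd.config == fromArgs.config)
```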

Reimplemented in lsst.pipe.base.cmdLineTask.ButlerInitializedTaskRunner.

Definition at line 315 of file cmdLineTask.py.

◆ precall()

def lsst.pipe.base.cmdLineTask.TaskRunner.precall (   self,
  parsedCmd 
)
Hook for code that should run exactly once, before multiprocessing.

Notes
-----
Must return True if `TaskRunner.__call__` should subsequently be
called.

.. warning::

   Implementations must take care to ensure that no unpicklable
   attributes are added to the TaskRunner itself, for compatibility
   with multiprocessing.

The default implementation writes package versions, schemas and
configs, or compares them to existing files on disk if present.

Definition at line 349 of file cmdLineTask.py.

◆ prepareForMultiProcessing()

def lsst.pipe.base.cmdLineTask.TaskRunner.prepareForMultiProcessing (   self)
Prepare this instance for multiprocessing.

Optional non-picklable elements are removed.

This is only called if the task is run under multiprocessing.

Definition at line 193 of file cmdLineTask.py.

◆ run()

def lsst.pipe.base.cmdLineTask.TaskRunner.run (   self,
  parsedCmd 
)
Run the task on all targets.

Parameters
----------
parsedCmd : `argparse.Namespace`
    The parsed command-line arguments.

Returns
-------
resultList : `list`
    A list of results returned by `TaskRunner.__call__`, or an empty
    list if `TaskRunner.__call__` is not called (e.g. if
    `TaskRunner.precall` returns `False`). See `TaskRunner.__call__`
    for details.

Notes
-----
The task is run under multiprocessing if `TaskRunner.numProcesses`
is more than 1; otherwise processing is serial.
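
The serial control flow, including the empty list returned when ``precall`` declines to proceed, might look like the sketch below. ``RunSketch`` is a hypothetical stand-in, the multiprocessing branch is left out, and the parsed command is reduced to a `types.SimpleNamespace` with a ``refList`` attribute.

```python
from types import SimpleNamespace


class RunSketch:
    """Hypothetical runner showing only the serial control flow of run."""

    def __init__(self, proceed):
        self.proceed = proceed

    def precall(self, parsedCmd):
        # Runs exactly once; returning False skips all targets.
        return self.proceed

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        return [(ref, kwargs) for ref in parsedCmd.refList]

    def __call__(self, args):
        dataRef, kwargs = args
        return SimpleNamespace(dataRef=dataRef, exitStatus=0)

    def run(self, parsedCmd):
        resultList = []
        if self.precall(parsedCmd):
            targetList = self.getTargetList(parsedCmd)
            # Serial processing: call the runner on each target in order.
            resultList = list(map(self, targetList))
        return resultList


parsedCmd = SimpleNamespace(refList=["ref1", "ref2"])
ran = RunSketch(proceed=True).run(parsedCmd)
skipped = RunSketch(proceed=False).run(parsedCmd)
print(len(ran), skipped)
```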

Definition at line 202 of file cmdLineTask.py.

◆ runTask()

def lsst.pipe.base.cmdLineTask.TaskRunner.runTask (   self,
  task,
  dataRef,
  kwargs 
)
Make the actual call to `runDataRef` for this task.

Parameters
----------
task : `lsst.pipe.base.CmdLineTask`
    The task instance to run.
dataRef
    Butler data reference that contains the data the task will process.
kwargs
    Any additional keyword arguments.  See `TaskRunner.getTargetList`
    above.

Notes
-----
The default implementation of `TaskRunner.runTask` works for any
command-line task which has a ``runDataRef`` method that takes a data
reference and an optional set of additional keyword arguments.
This method returns the results generated by the task's `runDataRef`
method.
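
As a sketch of the behaviour described in these notes, the default amounts to forwarding the data reference and unpacking the keyword dict. ``ScaleTask``, ``RunTaskSketch`` and ``OtherEntryPointSketch`` are hypothetical names, and the ``process`` method is an invented entry point used only to show what a subclass override could look like.

```python
class ScaleTask:
    """Hypothetical task with an optional keyword argument."""

    def runDataRef(self, dataRef, scale=1):
        return dataRef * scale

    def process(self, dataRef, scale=1):
        return -dataRef * scale


class RunTaskSketch:
    def runTask(self, task, dataRef, kwargs):
        # kwargs is the dict paired with dataRef in the target list.
        return task.runDataRef(dataRef, **kwargs)


class OtherEntryPointSketch(RunTaskSketch):
    def runTask(self, task, dataRef, kwargs):
        # A subclass can route the call to a differently named method.
        return task.process(dataRef, **kwargs)


task = ScaleTask()
plain = RunTaskSketch().runTask(task, 3, {})
scaled = RunTaskSketch().runTask(task, 3, {"scale": 2})
other = OtherEntryPointSketch().runTask(task, 3, {"scale": 2})
print(plain, scaled, other)
```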

Reimplemented in lsst.pipe.base.cmdLineTask.LegacyTaskRunner.

Definition at line 473 of file cmdLineTask.py.

Member Data Documentation

◆ clobberConfig

lsst.pipe.base.cmdLineTask.TaskRunner.clobberConfig

Definition at line 180 of file cmdLineTask.py.

◆ config

lsst.pipe.base.cmdLineTask.TaskRunner.config

Definition at line 177 of file cmdLineTask.py.

◆ doBackup

lsst.pipe.base.cmdLineTask.TaskRunner.doBackup

Definition at line 181 of file cmdLineTask.py.

◆ doRaise

lsst.pipe.base.cmdLineTask.TaskRunner.doRaise

Definition at line 179 of file cmdLineTask.py.

◆ doReturnResults

lsst.pipe.base.cmdLineTask.TaskRunner.doReturnResults

Definition at line 176 of file cmdLineTask.py.

◆ log

lsst.pipe.base.cmdLineTask.TaskRunner.log

Definition at line 178 of file cmdLineTask.py.

◆ numProcesses

lsst.pipe.base.cmdLineTask.TaskRunner.numProcesses

Definition at line 182 of file cmdLineTask.py.

◆ TaskClass

lsst.pipe.base.cmdLineTask.TaskRunner.TaskClass

Definition at line 175 of file cmdLineTask.py.

◆ TIMEOUT

int lsst.pipe.base.cmdLineTask.TaskRunner.TIMEOUT = 3600*24*30
static

Definition at line 171 of file cmdLineTask.py.

◆ timeout

lsst.pipe.base.cmdLineTask.TaskRunner.timeout

Definition at line 184 of file cmdLineTask.py.


The documentation for this class was generated from the following file: