lsst.pipe.base  13.0-11-gdf6a56c+1
lsst.pipe.base.cmdLineTask.TaskRunner Class Reference
Inherited by lsst.pipe.base.cmdLineTask.ButlerInitializedTaskRunner.

Public Member Functions

def __init__
 Construct a TaskRunner.

def prepareForMultiProcessing
 Prepare this instance for multiprocessing.

def run
 Run the task on all targets.

def makeTask
 Create a Task instance.

def precall
 Hook for code that should run exactly once, before multiprocessing.

def __call__
 Run the Task on a single target.

Static Public Member Functions

def getTargetList
 Return a list of (dataRef, kwargs) for TaskRunner.__call__.

Public Attributes

 TaskClass
 doReturnResults
 config
 log
 doRaise
 clobberConfig
 doBackup
 numProcesses
 timeout

Static Public Attributes

int TIMEOUT = 3600

Detailed Description

Run a command-line task, using multiprocessing if requested.

Each command-line task (subclass of CmdLineTask) has a task runner. By
default it is this class, but some tasks require a subclass. See the
manual "how to write a command-line task" in the pipe_tasks documentation
for more information, and see CmdLineTask.parseAndRun for how a task
runner is used.

You may use this task runner for your command-line task if your task has
a run method that takes exactly one argument: a butler data reference.
Otherwise you must provide a task-specific subclass of this runner for
your task's RunnerClass that overrides TaskRunner.getTargetList and
possibly TaskRunner.__call__. See TaskRunner.getTargetList for details.

This design matches the common pattern for command-line tasks: the run
method takes a single data reference, of some suitable name. Additional
arguments are rare; if present, they require a subclass of TaskRunner that
passes them to run by name (as keyword arguments).

Instances of this class must be picklable in order to be compatible with
multiprocessing. If multiprocessing is requested
(parsedCmd.numProcesses > 1), run() calls prepareForMultiProcessing
to jettison optional non-picklable elements. If your task runner is not
compatible with multiprocessing, indicate this in your task by setting
the class variable canMultiprocess = False.
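A minimal sketch of the pickling requirement, using hypothetical stand-in classes (SketchRunner, ParallelTask, SerialOnlyTask and the helper is_picklable are illustrations, not pipe_base APIs). A threading.Lock stands in for an optional non-picklable element, and the fallback to a single process when canMultiprocess is False is an assumption about how the flag is honored.

```python
import pickle
import threading


class SerialOnlyTask:
    """Hypothetical task that cannot run under multiprocessing."""
    canMultiprocess = False


class ParallelTask:
    """Hypothetical task that supports multiprocessing."""
    canMultiprocess = True


class SketchRunner:
    """Hypothetical TaskRunner stand-in; only the pickling logic is shown."""

    def __init__(self, TaskClass, numProcesses=1):
        self.TaskClass = TaskClass
        # Assumed fallback: honor the task's canMultiprocess class variable.
        if numProcesses > 1 and not getattr(TaskClass, "canMultiprocess", True):
            numProcesses = 1
        self.numProcesses = numProcesses
        # Optional helper that cannot be pickled (a lock stands in for it).
        self.optionalHelper = threading.Lock()

    def prepareForMultiProcessing(self):
        # Drop optional non-picklable elements before this runner is
        # pickled and sent to worker processes.
        self.optionalHelper = None


def is_picklable(obj):
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True
```

Before prepareForMultiProcessing() the lock makes the runner unpicklable; afterwards pickling succeeds, which is the state a process pool needs.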

Due to a Python bug [1], handling a KeyboardInterrupt properly requires
specifying a timeout [2]. This timeout (in seconds) can be specified as the
"timeout" element in the output from ArgumentParser (the "parsedCmd"); if
it is not available, TaskRunner.TIMEOUT is used.

[1] http://bugs.python.org/issue8296
[2] http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
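The timeout workaround from [2] can be sketched as follows. The names resolve_timeout, run_pool and square are hypothetical; ThreadPool is used only so the sketch runs without pickling, and multiprocessing.Pool provides the same map_async() and get(timeout) calls.

```python
import argparse
from multiprocessing.pool import ThreadPool

TIMEOUT = 3600  # default timeout in seconds, mirroring TaskRunner.TIMEOUT


def resolve_timeout(parsedCmd):
    """Use parsedCmd.timeout if present and set, else the class default."""
    timeout = getattr(parsedCmd, "timeout", None)
    return TIMEOUT if timeout is None else timeout


def run_pool(pool, timeout, function, iterable):
    """Map function over iterable, waiting at most `timeout` seconds.

    map_async(...).get(timeout) is used instead of pool.map(): a blocking
    get() with a timeout lets the parent process receive KeyboardInterrupt.
    """
    return pool.map_async(function, iterable).get(timeout)


def square(x):
    """Toy work function; under multiprocessing it must be module-level."""
    return x * x
```

Usage: `run_pool(pool, resolve_timeout(argparse.Namespace(timeout=60)), square, [1, 2, 3])` inside a `with ThreadPool(processes=2) as pool:` block. If the work takes longer than the timeout, get() raises multiprocessing.TimeoutError, so a generous default such as 3600 seconds avoids aborting long-running tasks.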

Definition at line 107 of file cmdLineTask.py.

Constructor & Destructor Documentation

def lsst.pipe.base.cmdLineTask.TaskRunner.__init__ (self, TaskClass, parsedCmd, doReturnResults = False)

Construct a TaskRunner.

Warning
Do not store parsedCmd, as this instance is pickled (if multiprocessing) and parsedCmd may contain non-picklable elements. It certainly contains more data than we need to send to each instance of the task.
Parameters
TaskClass: the class of the task to run.
parsedCmd: the parsed command-line arguments, as returned by the task's argument parser's parse_args method.
doReturnResults: should run return the collected result from each invocation of the task? This is only intended for unit tests and similar use. It can easily exhaust memory (if the task returns enough data and you call it enough times), and it will fail when using multiprocessing if the returned data cannot be pickled.

Note that even if doReturnResults is False, a struct with a single member "exitStatus" is returned, with value 0 or 1 to be returned to the Unix shell.

Exceptions
ImportError: if multiprocessing is requested (and the task supports it) but the multiprocessing library cannot be imported.

Definition at line 145 of file cmdLineTask.py.
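A minimal sketch of the constructor warning: copy only the values the runner needs out of parsedCmd, so the runner stays picklable even when parsedCmd carries non-picklable extras. SketchRunner and ExampleTask are hypothetical, and the parsedCmd attribute names doraise and processes are assumptions, not documented on this page.

```python
import argparse
import pickle
import threading


class ExampleTask:
    """Hypothetical task class; only the canMultiprocess flag is used."""
    canMultiprocess = True


class SketchRunner:
    """Hypothetical TaskRunner stand-in showing what the constructor keeps."""

    def __init__(self, TaskClass, parsedCmd, doReturnResults=False):
        self.TaskClass = TaskClass
        self.doReturnResults = bool(doReturnResults)
        # Copy individual values; never keep a reference to parsedCmd itself.
        # The attribute names read from parsedCmd are assumptions.
        self.config = parsedCmd.config
        self.doRaise = bool(getattr(parsedCmd, "doraise", False))
        self.numProcesses = int(getattr(parsedCmd, "processes", 1))
        if self.numProcesses > 1 and TaskClass.canMultiprocess:
            # Raises ImportError if the multiprocessing module is unavailable.
            import multiprocessing  # noqa: F401


# parsedCmd may carry non-picklable extras (a lock stands in for them here),
# but because the runner never stores parsedCmd, the runner still pickles.
parsedCmd = argparse.Namespace(
    config={"doFoo": True}, doraise=False, processes=2, extra=threading.Lock()
)
runner = SketchRunner(ExampleTask, parsedCmd)
restored = pickle.loads(pickle.dumps(runner))
```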

Member Function Documentation

def lsst.pipe.base.cmdLineTask.TaskRunner.__call__ (self, args)

Run the Task on a single target.

This default implementation assumes that args is a tuple containing a data reference and a dict of keyword arguments.

Warning
If you override this method and wish to return something when doReturnResults is false, then it must be picklable to support multiprocessing, and it should be small enough that pickling and unpickling do not add excessive overhead.
Parameters
args: arguments for Task.run(), as a (dataRef, kwargs) tuple.
Returns
  • if doReturnResults is false: a Struct whose only field is "exitStatus" (0 or 1), as described in the constructor's note
  • if doReturnResults is true: a pipe_base Struct containing these fields:
    • dataRef: the provided data reference
    • metadata: task metadata after execution of run
    • result: result returned by the task's run method, or None if the task fails

Definition at line 347 of file cmdLineTask.py.
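A minimal sketch of the default behavior described above, with hypothetical stand-ins: SimpleNamespace replaces the pipe_base Struct, ScaleTask is a toy task, and the treatment of doRaise (re-raise instead of recording a failure) is an assumption based on the attribute name. Following the constructor's note, exitStatus is always returned; also including it in the full results struct is an additional assumption. Logging and metadata writing are omitted.

```python
from types import SimpleNamespace as Struct  # stand-in for lsst.pipe.base.Struct


class SketchRunner:
    """Hypothetical stand-in reproducing the documented __call__ contract."""

    def __init__(self, TaskClass, doReturnResults=False, doRaise=False):
        self.TaskClass = TaskClass
        self.doReturnResults = doReturnResults
        self.doRaise = doRaise  # assumed: re-raise task failures when True

    def makeTask(self, parsedCmd=None, args=None):
        return self.TaskClass()

    def __call__(self, args):
        dataRef, kwargs = args  # args is a (dataRef, kwargs) tuple
        task = self.makeTask(args=args)
        result = None  # stays None if the task fails
        exitStatus = 0
        try:
            result = task.run(dataRef, **kwargs)
        except Exception:
            if self.doRaise:
                raise
            exitStatus = 1  # the failure is reported through the exit status
        if self.doReturnResults:
            return Struct(exitStatus=exitStatus, dataRef=dataRef,
                          metadata=getattr(task, "metadata", {}), result=result)
        return Struct(exitStatus=exitStatus)


class ScaleTask:
    """Hypothetical task whose run() takes a data reference plus one keyword."""

    def __init__(self):
        self.metadata = {}

    def run(self, dataRef, scale=1):
        if dataRef < 0:
            raise ValueError("negative dataRef")
        self.metadata["scaled"] = True
        return dataRef * scale
```

Usage: `SketchRunner(ScaleTask, doReturnResults=True)((3, {"scale": 2}))` runs `ScaleTask().run(3, scale=2)` and returns a struct whose result is 6.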

def lsst.pipe.base.cmdLineTask.TaskRunner.getTargetList (parsedCmd, **kwargs)
static

Return a list of (dataRef, kwargs) for TaskRunner.__call__.

Parameters
parsedCmd: the parsed command object (an argparse.Namespace) returned by ArgumentParser.parse_args.
**kwargs: any additional keyword arguments. In the default TaskRunner this is an empty dict, but having it simplifies overriding TaskRunner for tasks whose run method takes additional arguments (see case (1) below).

The default implementation of TaskRunner.getTargetList and TaskRunner.__call__ works for any command-line task whose run method takes exactly one argument: a data reference. Otherwise you must provide a variant of TaskRunner that overrides TaskRunner.getTargetList and possibly TaskRunner.__call__. There are two cases:

(1) If your command-line task has a run method that takes one data reference followed by additional arguments, then you need only override TaskRunner.getTargetList to return the additional arguments as an argument dict. To make this easier, your overridden version of getTargetList may call TaskRunner.getTargetList with the extra arguments as keyword arguments. For example, the following adds an argument dict containing a single key, "calExpList", whose value is the list of data IDs for the calexp ID argument:

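    from lsst.pipe.base.cmdLineTask import TaskRunner

    class CalExpTaskRunner(TaskRunner):  # hypothetical runner name
        @staticmethod
        def getTargetList(parsedCmd):
            # Forward the extra run() argument as a keyword argument.
            return TaskRunner.getTargetList(
                parsedCmd,
                calExpList=parsedCmd.calexp.idList,
            )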

It is equivalent to this slightly longer version:

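    from lsst.pipe.base.cmdLineTask import TaskRunner

    class CalExpTaskRunner(TaskRunner):  # hypothetical runner name
        @staticmethod
        def getTargetList(parsedCmd):
            argDict = dict(calExpList=parsedCmd.calexp.idList)
            return [(dataRef, argDict) for dataRef in parsedCmd.id.refList]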
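The equivalence of the short and long versions of case (1) can be checked without the LSST stack using argparse.Namespace stand-ins. default_getTargetList is a hypothetical sketch of the default implementation; that it reads data references from parsedCmd.id.refList is an assumption, and strings stand in for data references.

```python
import argparse


def default_getTargetList(parsedCmd, **kwargs):
    """Sketch of the default: pair every data reference with the same kwargs.

    Reading references from parsedCmd.id.refList is an assumption.
    """
    return [(ref, kwargs) for ref in parsedCmd.id.refList]


def short_version(parsedCmd):
    # Case (1), short form: forward the extra argument as a keyword.
    return default_getTargetList(parsedCmd, calExpList=parsedCmd.calexp.idList)


def long_version(parsedCmd):
    # Case (1), long form: build the argument dict by hand.
    argDict = dict(calExpList=parsedCmd.calexp.idList)
    return [(dataRef, argDict) for dataRef in parsedCmd.id.refList]


# Stand-in for the parsed command: strings play the role of data references.
parsedCmd = argparse.Namespace(
    id=argparse.Namespace(refList=["ref1", "ref2"]),
    calexp=argparse.Namespace(idList=[{"visit": 1}, {"visit": 2}]),
)
```

Both functions return `[("ref1", {"calExpList": [...]}), ("ref2", {"calExpList": [...]})]`; note that every target shares the same argument dict.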

(2) If your task does not meet condition (1) then you must override both TaskRunner.getTargetList and TaskRunner.__call__. You may do this however you see fit, so long as TaskRunner.getTargetList returns a list, each of whose elements is sent to TaskRunner.__call__, which runs your task.

Definition at line 239 of file cmdLineTask.py.
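For case (2), a sketch of a runner for a hypothetical task whose run method takes two data references. PairTask and PairRunner are illustrations; in real code PairRunner would subclass TaskRunner, and a plain class is used here only so the example runs without the LSST stack. The point is the contract: every element returned by getTargetList is passed unchanged to __call__.

```python
import argparse


class PairTask:
    """Hypothetical task whose run() takes two data references."""

    def run(self, refA, refB):
        return f"{refA}-{refB}"


class PairRunner:
    """Hypothetical case (2) runner: overrides getTargetList and __call__."""

    def __init__(self, TaskClass):
        self.TaskClass = TaskClass

    @staticmethod
    def getTargetList(parsedCmd):
        # Group consecutive references into pairs; each pair is one target.
        refs = parsedCmd.id.refList
        return [(refs[i], refs[i + 1]) for i in range(0, len(refs) - 1, 2)]

    def __call__(self, target):
        # Receives exactly one element of the list built by getTargetList.
        refA, refB = target
        return self.TaskClass().run(refA, refB)


parsedCmd = argparse.Namespace(id=argparse.Namespace(refList=["a", "b", "c", "d"]))
runner = PairRunner(PairTask)
results = [runner(target) for target in runner.getTargetList(parsedCmd)]
```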

def lsst.pipe.base.cmdLineTask.TaskRunner.makeTask (self, parsedCmd = None, args = None)

Create a Task instance.

Parameters
[in] parsedCmd: parsed command-line options (used for extra task arguments by some task runners)
[in] args: args tuple passed to TaskRunner.__call__ (used for extra task arguments by some task runners)

makeTask() can be called with either the parsedCmd argument or the args argument set to None, but it must construct identical Task instances in either case.

Subclasses may ignore this method entirely if they reimplement both TaskRunner.precall and TaskRunner.__call__.

Definition at line 292 of file cmdLineTask.py.
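A minimal sketch of the "identical instances either way" rule for a runner whose task needs an extra constructor argument. ButlerTask, SketchRunner and the butler attributes on parsedCmd and on the data reference are hypothetical; they illustrate the rule and do not describe ButlerInitializedTaskRunner.

```python
import argparse


class ButlerTask:
    """Hypothetical task whose constructor needs an extra 'butler' object."""

    def __init__(self, config=None, butler=None):
        self.config = config
        self.butler = butler


class SketchRunner:
    """Hypothetical runner whose makeTask accepts either parsedCmd or args."""

    def __init__(self, TaskClass, config):
        self.TaskClass = TaskClass
        self.config = config

    def makeTask(self, parsedCmd=None, args=None):
        if parsedCmd is not None:
            butler = parsedCmd.butler  # e.g. from precall: parsed command available
        elif args is not None:
            dataRef, _kwargs = args
            butler = dataRef.butler  # e.g. from __call__: recover it from the data reference
        else:
            raise RuntimeError("parsedCmd or args must be specified")
        # Both paths must build an identical task.
        return self.TaskClass(config=self.config, butler=butler)


butler = object()  # stand-in for a shared data butler
runner = SketchRunner(ButlerTask, config={"x": 1})
fromCmd = runner.makeTask(parsedCmd=argparse.Namespace(butler=butler))
fromArgs = runner.makeTask(args=(argparse.Namespace(butler=butler), {}))
```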

def lsst.pipe.base.cmdLineTask.TaskRunner.precall (self, parsedCmd)

Hook for code that should run exactly once, before multiprocessing.

Must return True if TaskRunner.__call__ should subsequently be
called.

Warning
Implementations must take care to ensure that no unpicklable
attributes are added to the TaskRunner itself, for compatibility
with multiprocessing.

The default implementation writes package versions, schemas and
configs, or compares them to existing files on disk if present.

Definition at line 320 of file cmdLineTask.py.
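A minimal sketch of the write-or-compare idea for a single config. SketchRunner, the config.json file name and the JSON format are hypothetical; the default implementation also handles schemas and package versions. Returning False on a mismatch (so that __call__ is skipped) and the meaning given to clobberConfig are assumptions.

```python
import json
import os
import tempfile


class SketchRunner:
    """Hypothetical runner showing a write-or-compare precall for one config."""

    def __init__(self, config, outputDir, clobberConfig=False):
        self.config = config
        self.outputDir = outputDir
        self.clobberConfig = clobberConfig  # assumed: overwrite instead of compare

    def precall(self, parsedCmd=None):
        """Return True if __call__ should run for the targets."""
        path = os.path.join(self.outputDir, "config.json")  # hypothetical file name
        if os.path.exists(path) and not self.clobberConfig:
            with open(path) as f:
                stored = json.load(f)
            # A mismatch with the existing file stops processing.
            return stored == self.config
        with open(path, "w") as f:
            json.dump(self.config, f)
        return True


with tempfile.TemporaryDirectory() as outputDir:
    first = SketchRunner({"doFoo": True}, outputDir).precall()     # writes the file
    same = SketchRunner({"doFoo": True}, outputDir).precall()      # matches
    changed = SketchRunner({"doFoo": False}, outputDir).precall()  # mismatch
    clobbered = SketchRunner({"doFoo": False}, outputDir, clobberConfig=True).precall()
```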

def lsst.pipe.base.cmdLineTask.TaskRunner.prepareForMultiProcessing (self)

Prepare this instance for multiprocessing.

Optional non-picklable elements are removed.

This is only called if the task is run under multiprocessing.

Definition at line 189 of file cmdLineTask.py.

def lsst.pipe.base.cmdLineTask.TaskRunner.run (self, parsedCmd)

Run the task on all targets.

The task is run under multiprocessing if numProcesses > 1; otherwise processing is serial.

Returns
a list of results returned by TaskRunner.__call__, or an empty list if TaskRunner.__call__ is not called (e.g. if TaskRunner.precall returns False). See TaskRunner.__call__ for details.

Definition at line 198 of file cmdLineTask.py.
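A sketch of the control flow described for run(), with hypothetical stand-ins: precall gates everything, targets come from getTargetList, and each target goes through __call__ either serially or on a pool. A ThreadPool replaces the process pool so the sketch runs anywhere; the call to prepareForMultiProcessing that precedes the parallel path is omitted here.

```python
import argparse
from multiprocessing.pool import ThreadPool


class SketchRunner:
    """Hypothetical runner illustrating the control flow described for run()."""

    def __init__(self, numProcesses=1, timeout=3600):
        self.numProcesses = numProcesses
        self.timeout = timeout

    @staticmethod
    def getTargetList(parsedCmd):
        return [(ref, {}) for ref in parsedCmd.id.refList]

    def precall(self, parsedCmd):
        return True  # one-time setup succeeded

    def __call__(self, args):
        dataRef, _kwargs = args
        return dataRef * 10  # toy stand-in for running the task

    def run(self, parsedCmd):
        resultList = []  # stays empty if precall returns False
        if self.precall(parsedCmd):
            targetList = self.getTargetList(parsedCmd)
            if self.numProcesses > 1:
                # Parallel path; ThreadPool stands in for a process pool here.
                with ThreadPool(self.numProcesses) as pool:
                    resultList = pool.map_async(self, targetList).get(self.timeout)
            else:
                resultList = [self(target) for target in targetList]
        return resultList


class RefusingRunner(SketchRunner):
    """precall reports a problem, so __call__ is never invoked."""

    def precall(self, parsedCmd):
        return False


parsedCmd = argparse.Namespace(id=argparse.Namespace(refList=[1, 2, 3]))
```

Usage: `SketchRunner(numProcesses=2).run(parsedCmd)` and `SketchRunner().run(parsedCmd)` produce the same ordered result list, while `RefusingRunner().run(parsedCmd)` returns an empty list.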

Member Data Documentation

lsst.pipe.base.cmdLineTask.TaskRunner.clobberConfig

Definition at line 176 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.config

Definition at line 173 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.doBackup

Definition at line 177 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.doRaise

Definition at line 175 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.doReturnResults

Definition at line 172 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.log

Definition at line 174 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.numProcesses

Definition at line 178 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.TaskClass

Definition at line 171 of file cmdLineTask.py.

int lsst.pipe.base.cmdLineTask.TaskRunner.TIMEOUT = 3600
static

Definition at line 143 of file cmdLineTask.py.

lsst.pipe.base.cmdLineTask.TaskRunner.timeout

Definition at line 180 of file cmdLineTask.py.


The documentation for this class was generated from the following file: