Custom Query (87 matches)

Results (67 - 69 of 87)

Ticket Resolution Summary Owner Reporter
#84 fixed Job service stalls divyashri.bhat@gmail.com divyashri.bhat@gmail.com
Description

While using the job service running at http://emmy9.casa.umass.edu:8003, I ran into the following problem:

  1. When there are many "Running" processes, the job service stalls and puts the jobs in "Pending" status. To try to identify the source of this problem, I looked at the logs of the "Running" processes.

The EC tries to connect to an RC that is either not up or does not exist, and it stays in that state while the job status still shows as "Running".

STDOUT: 11:26:21 INFO  OmfEc::Experiment: Experiment: dbhat-2014-04-11T10-18-13-05-00 starts
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodea-labwikicrashtest' to join 'Source1'
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodeb-labwikicrashtest' to join 'Source2'
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodec-labwikicrashtest' to join 'Source3'

To resolve this problem, I tried:

  1. Deleting all jobs with status "Running" (they were only waiting for an RC to connect).
  2. Restarting the job service on emmy9.

After this, the experiments ran successfully.

I am not sure if all of these resources are listed in the AMQP database.

But, suppose these resources are listed in the AMQP database and are deleted by the experimenter or Aggregate Manager, and at a later time, the experimenter tries to connect to these resources that do not actually exist:

  1. How long will the EC wait for these RCs to connect?
  2. With several such jobs, will the job service continue to block and thus prevent other experiments from running?
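
One way to bound the wait asked about above is a watchdog that fails any "Running" job whose RCs have not joined within a fixed time. The following is only a sketch of that idea, not how the job service or the EC actually behaves: the `Job` record, the `resources_joined` flag, the `expire_stalled_jobs` function, and the 600-second default are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions, not the job service schema."""
    name: str
    status: str             # "Pending", "Running", or "Failed"
    started_at: float       # seconds since epoch when the EC started
    resources_joined: bool  # True once every RC has answered


def expire_stalled_jobs(jobs: List[Job], now: float, rc_timeout: float = 600.0) -> List[Job]:
    """Mark "Running" jobs as "Failed" if their RCs have not joined within rc_timeout seconds.

    Returns the jobs that were expired so the caller can log or clean them up.
    """
    expired = []
    for job in jobs:
        waiting_for_rc = job.status == "Running" and not job.resources_joined
        if waiting_for_rc and now - job.started_at > rc_timeout:
            job.status = "Failed"
            expired.append(job)
    return expired
```

Run periodically, a check like this would free the slots held by jobs that are only waiting for an RC, so pending experiments are not blocked indefinitely.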
#85 fixed Make clean-up of job service easier if something goes wrong jack.hong@nicta.com.au johren@bbn.com
Description

Discussed at the 3/31/14 meeting. When something goes wrong in the job service, it is sometimes hard to tell what is happening without cleaning up some of the jobs. Cleaning up jobs should be made easier.

#86 fixed New requirements for iRODS and git repositories jack.hong@nicta.com.au johren@bbn.com
Description

Currently, the Labwiki scripts are being stored and accessed in git repositories on emmy9. This was done for at least two reasons:

  1. The ODBC issues in iRODS (now resolved)
  2. Performance of listing scripts stored in iRODS

One of the problems with using the git repositories on emmy9 is that users do not have access to their scripts other than through Labwiki. If they want to save them off, they have to cut and paste them from Labwiki.

One of the advantages of using git is that the scripts can be versioned. However, this is not a real benefit currently because the users do not have direct access to the git repository.

A few options were discussed at the 4/15 meeting. This ticket will be used to track the discussions of these requirements to solve these problems.

Option 1: possibly rsync git repository on emmy9 to iRODS

  • this will not really give versioning benefit

Option 2: dump script and config parameters to iRODS at the same time as dumping the data

  • this keeps the scripts archived with the data
  • still does not provide versioning
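
Option 2 could be sketched roughly as follows: copy the script and the data into one per-job directory and write the config parameters alongside them, so a single directory is what later gets pushed to iRODS. This is an illustration only. The function name `archive_experiment`, the `params.json` file name, and the directory layout are assumptions, and the iRODS upload step itself is not shown.

```python
import json
import shutil
from pathlib import Path


def archive_experiment(script: Path, params: dict, data: Path,
                       archive_root: Path, job_name: str) -> Path:
    """Bundle a job's script, config parameters, and data in one directory.

    Hypothetical layout: <archive_root>/<job_name>/{script, data, params.json}.
    Returns the directory that would then be uploaded to the archive.
    """
    dest = archive_root / job_name
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(script, dest / script.name)  # the script as it was run
    shutil.copy2(data, dest / data.name)      # the measurement data
    # Store parameters as sorted JSON so archived copies are easy to diff.
    (dest / "params.json").write_text(json.dumps(params, indent=2, sort_keys=True))
    return dest
```

Because each run gets its own copy of the script, users can compare archived copies by hand, but as noted above this is not real versioning.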

Jack agreed to take a look at effort required for Option #2.
