Custom Query (87 matches)

Results (40 - 42 of 87)

Ticket Resolution Summary Owner Reporter
#62 fixed Labwiki assumes that GENI Portal username is the same as iRODS username jack.hong@nicta.com.au johren@bbn.com
Description

Labwiki uses the GENI Portal username to access iRODS. The GENI Portal tries to obtain the same username in iRODS, but this is not possible when that username is already taken; in that case, it appends a number to the end. We need a way to map the iRODS username to the GENI Portal username.
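One possible shape for such a mapping is sketched below. This is only an illustration, not Labwiki's actual code: the class `UsernameMap`, its methods, and the example usernames are all hypothetical, and where the mapping would really be stored (GENI Portal, iRODS, or Labwiki itself) is left open.

```python
from typing import Dict


class UsernameMap:
    """Hypothetical lookup from GENI Portal username to iRODS username."""

    def __init__(self) -> None:
        self._portal_to_irods: Dict[str, str] = {}

    def register(self, portal_user: str, irods_user: str) -> None:
        # Record the iRODS name that was actually assigned to this portal user.
        self._portal_to_irods[portal_user] = irods_user

    def irods_username(self, portal_user: str) -> str:
        # Fall back to the portal name only when no explicit mapping exists,
        # which matches the current "same username" assumption.
        return self._portal_to_irods.get(portal_user, portal_user)


# Example (made-up names): "jdoe" was taken in iRODS, so a number was appended.
usernames = UsernameMap()
usernames.register("jdoe", "jdoe1")
```

With this in place, a lookup for "jdoe" yields "jdoe1", while a user with no recorded mapping keeps the portal username.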

#96 fixed Job Service VM almost out of memory and disk space divyashri.bhat@gmail.com dbhat@bbn.com
Description

I will take down production job service VM (gimi4.casa.umass.edu) for a short downtime to increase memory and disk space.

http://www.gpolab.bbn.com/experiment-support/gimiservices/

Memory will be increased to 6GB and Disk Space to 10GB.

This downtime will take place between 10:30PM and 11:00PM EDT and should not last more than 10 minutes. I will update this ticket when the task is complete and Job Service is restarted.

#84 fixed Job service stalls divyashri.bhat@gmail.com divyashri.bhat@gmail.com
Description

While using the job service running at http://emmy9.casa.umass.edu:8003, I ran into the following problem:

  1. When there are many "Running" processes, the job service stalls and puts the jobs in "Pending" status. To identify the source of this problem, I looked at the logs of the "Running" processes.

The EC tries to connect to an RC that is either not up or does not exist, and it stays in that state while the job status still shows "Running".

STDOUT: 11:26:21 INFO  OmfEc::Experiment: Experiment: dbhat-2014-04-11T10-18-13-05-00 starts
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodea-labwikicrashtest' to join 'Source1'
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodeb-labwikicrashtest' to join 'Source2'
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodec-labwikicrashtest' to join 'Source3'
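When checking several stalled jobs, it may help to pull the resource and group names out of log lines like the ones above. The following is a rough sketch that assumes the "Configure '...' to join '...'" format shown in this ticket; the function name `configured_joins` is made up here, and the output only says what the EC was asked to configure, not whether the RC ever connected.

```python
import re
from typing import Dict

# Matches lines such as: Configure 'nodea-labwikicrashtest' to join 'Source1'
CONFIGURE_RE = re.compile(r"Configure '([^']+)' to join '([^']+)'")


def configured_joins(log_text: str) -> Dict[str, str]:
    """Return {resource name: group name} for every Configure line in the log."""
    joins: Dict[str, str] = {}
    for line in log_text.splitlines():
        match = CONFIGURE_RE.search(line)
        if match:
            joins[match.group(1)] = match.group(2)
    return joins


# Two of the log lines quoted in this ticket, used as sample input.
SAMPLE_LOG = """\
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodea-labwikicrashtest' to join 'Source1'
STDOUT: 11:26:21 INFO  OmfEc::Experiment: Configure 'nodeb-labwikicrashtest' to join 'Source2'
"""
```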

To resolve this problem, I tried:

  1. Deleting all jobs with status "Running" that were in fact only waiting for an RC to connect.
  2. Restarting the job service on emmy9.

After this, the experiments ran successfully.

I am not sure if all of these resources are listed in the AMQP database.

But, suppose these resources are listed in the AMQP database and are deleted by the experimenter or Aggregate Manager, and at a later time, the experimenter tries to connect to these resources that do not actually exist:

  1. How long will the EC wait for these RCs to connect?
  2. With several such jobs, will the job service continue to block and thus prevent other experiments from running?
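To make the two questions concrete, here is a minimal sketch of a bounded wait, where a job gives up after a deadline instead of waiting forever. It is not how the OMF EC or the job service actually behaves (that is exactly what is being asked); `wait_for_resources`, `is_connected`, and the timeout values are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_resources(
    resources: Iterable[str],
    is_connected: Callable[[str], bool],
    timeout_s: float,
    poll_s: float = 0.5,
) -> Set[str]:
    """Poll until every resource is connected or the timeout expires.

    Returns the set of resources still missing (empty on success).
    """
    deadline = time.monotonic() + timeout_s
    missing = set(resources)
    while missing:
        # Keep only the resources that have not connected yet.
        missing = {name for name in missing if not is_connected(name)}
        if not missing or time.monotonic() >= deadline:
            break
        time.sleep(poll_s)
    return missing
```

A job service using something like this could mark a job as failed when the returned set is non-empty, rather than leaving it in "Running" and holding up the jobs queued behind it.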