Custom Query (87 matches)
Results (10 - 12 of 87)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#84 | fixed | Job service stalls | ||
Description |
While using the job service running at http://emmy9.casa.umass.edu:8003, I ran into the following problem:
The EC tries to connect to an RC which is either not up or does not exist, and stays in that state while still showing the job status as "Running".
STDOUT: 11:26:21 INFO OmfEc::Experiment: Experiment: dbhat-2014-04-11T10-18-13-05-00 starts
STDOUT: 11:26:21 INFO OmfEc::Experiment: Configure 'nodea-labwikicrashtest' to join 'Source1'
STDOUT: 11:26:21 INFO OmfEc::Experiment: Configure 'nodeb-labwikicrashtest' to join 'Source2'
STDOUT: 11:26:21 INFO OmfEc::Experiment: Configure 'nodec-labwikicrashtest' to join 'Source3'
To resolve this problem, I tried:
After this, the experiments ran successfully.
I am not sure if all of these resources are listed in the AMQP database.
But, suppose these resources are listed in the AMQP database and are deleted by the experimenter or Aggregate Manager, and at a later time, the experimenter tries to connect to these resources that do not actually exist:
|
|||
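One way to notice the stall described in #84 is to compare the newest timestamp in the EC output against the current time. The following is only a minimal sketch under stated assumptions: the function names and the 10-minute threshold are hypothetical, it is not part of the job service, and it assumes log lines in the `STDOUT: HH:MM:SS ...` form quoted in the ticket, all from the same day.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "STDOUT: 11:26:21 INFO OmfEc::Experiment: ..."
LOG_TIME = re.compile(r"^STDOUT:\s+(\d{2}:\d{2}:\d{2})\s")


def last_log_time(lines, day):
    """Return the datetime of the last timestamped STDOUT line, or None."""
    latest = None
    for line in lines:
        m = LOG_TIME.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S").time()
            latest = datetime.combine(day, t)
    return latest


def is_stalled(lines, now, timeout=timedelta(minutes=10)):
    """True when the newest log line is older than `timeout`.

    The threshold is a hypothetical value chosen for illustration, and
    logs that cross midnight are not handled.
    """
    latest = last_log_time(lines, now.date())
    return latest is not None and now - latest > timeout
```

A job that still reports "Running" while `is_stalled` returns true would be a candidate for closer inspection, for example checking whether the RC it is waiting for exists.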
#88 | worksforme | LabWiki crash: with bad file descriptor message | ||
Description |
The production LabWiki (emmy9.casa.umass.edu:4000) is currently being used by students to run experiments. We noticed a few crashes with the same error as below:
DEBUG development::LabWiki::LWWidget: Calling 'on_stop_experiment on 'LabWiki::Plugin::Experiment::ExperimentWidget' widget
DEBUG development::LabWiki::Plugin::Experiment::ExperimentWidget: STOP EXPERIMENT as requested>>> {:action=>"stop_experiment", :col=>"execute", :sid=>"s6751972_4585340"}
DEBUG development::LabWiki::Plugin::Experiment::Experiment: SEND job stop request to http://emmy9.casa.umass.edu:8003/jobs/841cc39d-58ef-471d-aad4-7df138b71e60>>>
DEBUG development::LabWiki::Plugin::Experiment::Util::RetryHandler: canceled - #<Proc:0x00000003cea5d8@/var/lib/omfwebapps/lw_gec19/plugins/labwiki_experiment_plugin/lib/labwiki/plugin/experiment/log_adapter.rb:31>
DEBUG development::LabWiki::Plugin::Experiment::Util::RetryHandler: canceled - #<Proc:0x00000003cea1a0@/var/lib/omfwebapps/lw_gec19/plugins/labwiki_experiment_plugin/lib/labwiki/plugin/experiment/ec_adapter.rb:32>
terminate called after throwing an instance of 'std::runtime_error'
what(): unable to add new descriptor: Bad file descriptor
Aborted (core dumped)
The system-wide file descriptor limit on emmy9 seems to be quite high:
gimiadmin@emmy9:~$ cat /proc/sys/fs/file-max
1212548
Is there a maximum limit defined in the gems used by LabWiki? |
|||
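The `file-max` value quoted in #88 is the system-wide ceiling; on Linux each process also has its own limit (`RLIMIT_NOFILE`, what `ulimit -n` reports), which is usually much lower. Whether either limit is related to the crash is not established by the log. The sketch below, written in Python for illustration rather than in LabWiki's own Ruby, shows how the two limits and a process's current descriptor count could be compared; the function names are hypothetical.

```python
import os
import resource


def fd_limits():
    """Return (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide limit on open file handles, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def open_fd_count(pid="self"):
    """Count descriptors currently open in a process via /proc (Linux only)."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"per-process limit: soft={soft} hard={hard}")
    print(f"system-wide file-max: {system_file_max()}")
    print(f"descriptors open in this process: {open_fd_count()}")
```

Passing the LabWiki server's PID to `open_fd_count` (with sufficient privileges) would show how close that process is to its own soft limit while experiments are running.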
#89 | fixed | LabWiki: emmy9:4000 response slows down | ||
Description |
LabWiki on http://emmy9.casa.umass.edu:4000 slows down when more than 10 experiments are running. top returns:
top - 11:46:51 up 6 days, 18:31, 3 users, load average: 9.31, 7.79, 6.78
Tasks: 613 total, 3 running, 609 sleeping, 0 stopped, 1 zombie
Cpu(s): 4.2%us, 2.0%sy, 0.0%ni, 86.5%id, 7.2%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 12291136k total, 11220328k used, 1070808k free, 644620k buffers
Swap: 12569596k total, 1234756k used, 11334840k free, 1227576k cached
The response is considerably slow when many experiments are running. Is there anything specific we should be monitoring or logging when the socket timeout occurs? |
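As one possible answer to the monitoring question in #89, the figures shown in the `top` header (load averages, free memory, swap in use) can be captured programmatically and logged whenever a timeout is observed. This is a minimal sketch, assuming a Linux host with `/proc/meminfo`; the function names and the dictionary layout are hypothetical and not part of LabWiki.

```python
import os
import time


def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: kB}; returns {} if the file is missing."""
    info = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    info[key.strip()] = int(parts[0])
    except OSError:
        pass
    return info


def snapshot():
    """Collect load averages, free memory, and swap in use at this moment."""
    load1, load5, load15 = os.getloadavg()
    mem = read_meminfo()
    swap_used = None
    if "SwapTotal" in mem and "SwapFree" in mem:
        swap_used = mem["SwapTotal"] - mem["SwapFree"]
    return {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "load": (load1, load5, load15),
        "mem_free_kb": mem.get("MemFree"),
        "swap_used_kb": swap_used,
    }


if __name__ == "__main__":
    print(snapshot())
```

Recording a snapshot at each socket timeout would make it possible to check whether the timeouts coincide with high load, I/O wait, or swap use, as the `top` output above suggests they might.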