Opened 9 years ago

Closed 8 years ago

#60 closed task (fixed)

Support long running experiments in Labwiki

Reported by: johren@bbn.com
Owned by: jack.hong@nicta.com.au
Priority: major
Milestone: GEC19
Component: Portal
Version: Sprint6
Keywords:
Cc:
Dependencies:

Description

We have had many questions on how to access graphs and status for long running experiments after logging out and logging back in.

Email discussion on this topic:

On 19/09/2013, at 8:46 AM, Jeanne Ohren wrote:

>
> Hi all,
>
> We have had a few questions lately (one from Josh Smift and recently from Niky and Manu) about
> being able to have a long running experiment and occasionally check in on graphs of the experiment
> results to see how things are going rather than wait until the experiment completes to view the
> full results.  This would likely mean logging out of Labwiki one day and logging back in the next
> to watch the graphs.
>
> I'm trying to figure out if there is a way to get this to work with the current Labwiki functionality.
> It currently doesn't work to just log back in and find the executing experiment.

This is one of the things which is high on our feature list and we are half-way there. We have a plan on how to do it, but haven't fully implemented it. There is a quick solution which would survive the re-login, but not a restart of Labwiki. The solution we have in mind would also survive a LW restart.
>
> One of the ideas batted around was being able to start the experiment (call it ExpX) and then later run another
> OEDL script that only has the graph(s) defined which accesses the data from ExpX.  Is this currently possible?
> If not, is there any other way to get this sort of functionality other than dumping the data periodically to
> iRODS and using some other tool to create graphs from that data?

One of the major features of OMF6 is the clean decoupling of EC and the experiment. I'm using OMF6 to manage a production system which needs to work 24/7 and has been doing that for a while. Thierry, Jack, and Christoph just finished an extension to that for an embedded, distributed system.

Now the short answer to what can be done with a minor step is to persist the "experiment proxy" in the LW OMF plug-in. The current limitation is that you cannot easily change the graph description AFTER an experiment has started. However, the graph descriptions themselves are stored inside LW and stay there even after an experiment finishes. Right now those objects are associated with a session, and not a user. What we would need to do, and that should be relatively easy, is to associate them with the originating user.

I'll talk with Jack about that.

Cheers,
-max
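
As a rough illustration of the suggestion above (associating the stored experiment objects with the originating user instead of the login session), here is a minimal Python sketch. It is not Labwiki code: `ProxyRegistry`, `register`, and `for_user` are hypothetical names, and a plain dictionary stands in for the real "experiment proxy" object.

```python
from collections import defaultdict


class ProxyRegistry:
    """Hypothetical registry keyed by user ID rather than session ID."""

    def __init__(self):
        # user_id -> {exp_id: proxy}; survives any number of login sessions
        self._by_user = defaultdict(dict)

    def register(self, user_id, exp_id, proxy):
        """Remember a proxy under the user who started the experiment."""
        self._by_user[user_id][exp_id] = proxy

    def for_user(self, user_id):
        """Return a copy of all proxies owned by this user."""
        return dict(self._by_user.get(user_id, {}))


registry = ProxyRegistry()
# Session 1: the user starts an experiment, then logs out.
registry.register("alice", "exp-1", {"state": "running"})
# Session 2: a fresh login looks the experiment up by user, not by session.
experiments = registry.for_user("alice")
```

Because the lookup key is the user, a new session for the same user finds the running experiment, which matches the "survives the re-login, but not a restart" case described in the email; surviving a restart would additionally need persistent storage.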

Change History (6)

comment:1 Changed 8 years ago by johren@bbn.com

Owner: changed from somebody to Thierry.Rakotoarivelo@nicta.com.au
Version: changed from Backlog to Sprint5

comment:2 Changed 8 years ago by thierry.rakotoarivelo@nicta.com.au

Owner: changed from Thierry.Rakotoarivelo@nicta.com.au to thierry.rakotoarivelo@nicta.com.au
Status: changed from new to assigned

We discussed this ticket with Jack on 09/01/14.

The scenarios discussed in this ticket are a bit different from (but not incompatible with) the scenarios for long-running experiments that we previously discussed within the OMF team.

In the context of OMF, a long-running experiment is one where the Experiment Controller (EC) instance is stopped sometime after the experiment has started, but the resources in that experiment keep doing whatever they are supposed to do. Then at a later time, another instance of EC is started and configured so that it can communicate with these resources, and effectively continue to 'run' the experiment, e.g. send new commands to them or request new information from them.

In the context of this ticket, a long-running experiment is one where the user logs out of LW and the experiment keeps going; then at a later time the user can log back in to LW and see the experiment's progress again through any defined graph and log. In that description, nothing actually requires the EC instance to be stopped.

Our discussion concluded that we can implement the requested feature as follows:

  • LW starts an EC instance for each new experiment requested by a user
  • LW does not stop the EC instance if the user requesting the experiment is logging out
  • LW does not stop the EC instance if it is shutting down, i.e. the EC instance is run in a forked process
  • LW does store in its internal DB:
    • the experiment metadata (e.g. ID, configured properties and their values, script names, path to the EC log file)
    • the graph description and related SQL queries required to get the graph data
  • User may log out and/or LW may reboot
  • User logs back in and enters the ID of the long-running experiment in the 'Execution' panel
  • LW retrieves the metadata for that experiment from its internal DB, along with the graph defs and SQL queries
  • LW displays the metadata and redraws the graphs

The above steps will be implemented and tested before the code freeze for GEC19.
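
The store-and-retrieve steps in the list above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the Labwiki implementation: the table layout, column names, and the functions `open_store`, `save_experiment`, and `load_experiment` are all hypothetical, and SQLite stands in for whatever internal DB LW actually uses.

```python
import json
import sqlite3

# Hypothetical schema: one row per experiment, one row per graph definition.
SCHEMA = """
CREATE TABLE IF NOT EXISTS experiments (
    exp_id     TEXT PRIMARY KEY,
    owner      TEXT NOT NULL,
    script     TEXT NOT NULL,
    properties TEXT NOT NULL,   -- JSON-encoded property/value pairs
    ec_log     TEXT NOT NULL    -- path to the EC log file
);
CREATE TABLE IF NOT EXISTS graphs (
    exp_id TEXT NOT NULL REFERENCES experiments(exp_id),
    title  TEXT NOT NULL,
    query  TEXT NOT NULL        -- SQL query used to fetch the graph data
);
"""


def open_store(path=":memory:"):
    """Open (or create) the metadata store; a file path survives restarts."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def save_experiment(db, exp_id, owner, script, properties, ec_log, graphs):
    """Persist experiment metadata and graph definitions in one transaction."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO experiments VALUES (?, ?, ?, ?, ?)",
            (exp_id, owner, script, json.dumps(properties), ec_log),
        )
        db.executemany(
            "INSERT INTO graphs VALUES (?, ?, ?)",
            [(exp_id, g["title"], g["query"]) for g in graphs],
        )


def load_experiment(db, exp_id):
    """Return metadata and graph definitions for exp_id, or None if unknown."""
    row = db.execute(
        "SELECT owner, script, properties, ec_log "
        "FROM experiments WHERE exp_id = ?",
        (exp_id,),
    ).fetchone()
    if row is None:
        return None
    owner, script, properties, ec_log = row
    graphs = [
        {"title": title, "query": query}
        for title, query in db.execute(
            "SELECT title, query FROM graphs WHERE exp_id = ? ORDER BY rowid",
            (exp_id,),
        )
    ]
    return {
        "exp_id": exp_id,
        "owner": owner,
        "script": script,
        "properties": json.loads(properties),
        "ec_log": ec_log,
        "graphs": graphs,
    }
```

With a store like this, the 'Execution' panel would only need the experiment ID to call `load_experiment` and then re-run each stored query to redraw the graphs. Using a file path instead of `:memory:` is what would let the data outlive an LW reboot.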

For reference the long-running experiment feature tickets for OMF are tracked here: http://mytestbed.net/issues/1574

comment:3 Changed 8 years ago by thierry.rakotoarivelo@nicta.com.au

Owner: changed from thierry.rakotoarivelo@nicta.com.au to jack.hong@nicta.com.au
Status: changed from assigned to new

Assigning this to Jack for now as he is implementing the steps described above.

comment:4 Changed 8 years ago by thierry.rakotoarivelo@nicta.com.au

Here are the tickets on the LW dev page which track this feature:

comment:5 Changed 8 years ago by johren@bbn.com

Version: changed from Sprint5 to Sprint6

comment:6 Changed 8 years ago by johren@bbn.com

Resolution: fixed
Status: changed from new to closed

This was demonstrated at GEC19
