[[PageOutline]]

= GENI Storage and Archive Service (GSAS) =

Last updated by Harry on 5/23/13 [[br]]

== Participating Projects and Organizations ==

[http://groups.geni.net/geni/wiki/DigitalObjectRegistry Project 1663: Digital Object Registry (CNRI)] [[BR]]
[http://groups.geni.net/geni/wiki/GIMI Project 1856: GIMI I&M Tools (UMass Amherst, RENCI and others)] [[BR]]

== Technical Contacts ==

Giridhar Manepalli (CNRI) (mailto:gmanepalli@cnri.reston.va.us / 703 620 8990) [[BR]]
Shu Huang (RENCI) (mailto:shuang@renci.org) [[BR]]
'''GPO System Engineer:''' [mailto:hmussman@bbn.com Harry Mussman] [[BR]]
'''GPO Software Engineer:''' [mailto:johren@BBN.COM Jeanne Ohren] [[BR]]

== 1) GENI Policy for Sharing Research Results ==

CNRI is drafting a GENI policy for sharing research results with the community that follows evolving community and NSF practice. [[br]]
The GSAS will provide the mechanisms necessary to implement the policy. [[br]]

Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 1 || Establish GENI policy on sharing research results || || || || || ||
|| 1a || Draft policy (2 pages) || Larry/Giridhar || 6/15/13 || || || ||
|| 1b || Review with GPO || Larry/Giridhar || 6/20/13 || || || ||
|| 1c || Review with community || Larry/Giridhar || GEC17 (how?) || || || ||
|| 1d || || || || || || ||
|| 1e || || || || || || ||

== 2) Goals for GENI Storage and Archive Service (GSAS) ==

Goals for GSAS: [[BR]]
1. A structured place to store all of the objects (artifacts) for an experiment, with descriptors (metadata), that is easy to access, with short to medium term storage, and the ability to search. (Note: this goes well beyond just measurement data objects.) [[BR]]
2. A separate long-term archive, with controlled access from the outside world, using a DOI (handle) as a persistent identifier. [[BR]]
3. Include most of the functionality provided by the Measurement Data Archive (MDA) prototype, built by CNRI. [[BR]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/082112b_MDASrvc_Figures.vsd MeasurementDataArchive prototype] [[br]]
4. Establish multiple federated iRODS services, starting at RENCI and UMass Amherst, and operate for GENI users (experimenters). [[BR]]
5. Establish persistent accounts for each user, and use icommands to store and retrieve objects (artifacts) for each user in the storage service. [[BR]]
6. Establish authentication for each user based on username/password, certificates, and also proxy (delegated) certificates. [[BR]]
7. Establish a directory structure in the storage service for each user to accommodate multiple experiments, and a directory structure for each experiment (consider "bag") to include all objects (artifacts) associated with that experiment, including one or more descriptors (metadata) within XML files (following the GENI descriptor schema). [[BR]]
8. Provide multiple interfaces (including icommand and web) to allow an authenticated user to view, search and curate their objects (artifacts). [[BR]]
9. Provide an interface to allow a user to define an object (artifact) to be archived (where the object (artifact) may range from a large directory to a single file), include a descriptor (following the GENI !ObjectDescriptor Schema), assign a persistent Digital Object Identifier (DOI, or "handle"), and decide when to push it to the archive service. [[BR]]
10. Establish an archive service that provides long-term and reliable storage, with public access via a DOI from the global handle service. [[BR]]
11. Include a search function in the archive service, so that an outside user can search for and then retrieve an object, but allow the object’s owner to disable search, so that an outside user needs the DOI of the object to retrieve it. [[BR]]

Goals for GENI !ObjectDescriptor Schema: [[BR]]
12. Useful for all types of objects, not just !MeasurementData objects. [[BR]]
13. Keep it simple, with the minimum number of mandatory fields. [[BR]]
14. Where possible, values for fields should be automatically generated by Experiment Management Tools. [[BR]]

Use of “!DataCite Schema for the Publication and Citation of Research Data”: [[BR]]
15. When an object (artifact) is archived in the Archive Service with public access from the outside world via the Internet, use a DOI (handle) as a persistent identifier, and include descriptors (metadata) that follow the !DataCite Schema (ref). [[BR]]

== 2) Overview of the GENI Storage and Archive Service (GSAS) ==

Document: An overview of the structure and use of the GSAS is contained in this document: [[BR]]
GENI Storage and Archive Service: Configuration of Service, Structure of Directories and Files, and Use Cases [[br]]
1) Goals [[BR]]
2) Configuration [[BR]]
3) Use Cases [[BR]]
4) Structure of Directories and Files in the GSAS [[BR]]
5) Access to the Structure in GSAS [[BR]]
6) Adding Descriptors [[BR]]
7) Searching Structure in GSAS [[BR]]
8) Creating a Bag and a .tar File [[BR]]
9) Archiving an Object [[BR]]
10) Overview of v1.x GENI !ObjectDescriptor Schemas [[BR]]

This document is based on the early Measurement Data Archive (MDA) service prototype developed by CNRI, and many discussions within the GENI I&M community.
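
The overview document covers creating a bag and a .tar file for an experiment directory (item 8 in the outline above). A minimal Python sketch of that step, using only the standard library, might look like the following. It assumes the BagIt layout (a bagit.txt declaration, a data/ payload directory, and a checksum manifest). The function names, the example directory name, and the choice of SHA-256 are illustrative assumptions and not part of the GSAS design; in the deployed service the bagit command would be expected to do this work.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def make_bag(src_dir: Path, bag_dir: Path) -> Path:
    """Copy every file under src_dir into bag_dir/data and write the BagIt tag files."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dest = data_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        payload = src.read_bytes()  # fine for a sketch; large files should be streamed
        dest.write_bytes(payload)
        # One manifest line per payload file: "<checksum>  <path inside the bag>"
        manifest.append(f"{hashlib.sha256(payload).hexdigest()}  data/{rel.as_posix()}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
    )
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(manifest) + "\n")
    return bag_dir


def make_tar(bag_dir: Path, tar_path: Path) -> Path:
    """Pack the whole bag directory into a single uncompressed .tar file."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(bag_dir, arcname=bag_dir.name)
    return tar_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        experiment = root / "experiment1"  # hypothetical experiment directory
        experiment.mkdir()
        (experiment / "metadata.xml").write_text("<experiment/>")
        bag = make_bag(experiment, root / "experiment1_bag")
        tar_file = make_tar(bag, root / "experiment1_bag.tar")
        with tarfile.open(tar_file) as tar:
            print("\n".join(tar.getnames()))
```

Because the manifest is computed from the payload at bagging time, any later change to the experiment directory means the bag and .tar file have to be recreated, which is the concern raised in Issue 8.2.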
[[BR]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/051613_iRODS_Figures.vsd GSAS Structure and Use Cases figure (visio)] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/Visio-051613_iRODS_Figures.pdf GSAS Structure and Use Cases figure (pdf)] [[br]]

Versions of document: [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/042613%20v1.1_GENIObjectDescriptor%20Schema.docx 042613 v1.1 GENI ObjectDescriptor Schema] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/042913%20%20v1.2_GENIObjectDescriptor%20Schema.docx 042913 v1.2 GENI ObjectDescriptor Schema] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/042913b%20%20v1.2_GENIObjectDescriptor%20Schema.docx 042913b v1.2 GENI ObjectDescriptor Schema] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/042913c%20%20v1.2_GENIObjectDescriptor%20Schema.docx 042913c v1.2 GENI ObjectDescriptor Schema] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/060613%20%20v1.3_GENIObjectDescriptor%20Schema.docx 060613 v1.3 GENI ObjectDescriptor Schema] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/060613schemaonly%20%20v1.3_GENIObjectDescriptor%20Schema.docx 060613 (schema only) v1.3 GENI ObjectDescriptor Schema] [[br]]

Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 2 || "Configuration, Structure and Use Cases" document || || || || || ||
|| 2a || Review v1.1 || Group || 4/29/13 || completed || || ||
|| 2b || Draft v1.2 || Harry || 5/1/13 || completed || || ||
|| 2c || Review v1.2 || Group || 5/8/13 || completed || || ||
|| 2d || Review with GPO || Harry/Jeanne || 5/30/13 || || || ||
|| 2e || Review with GEMINI (status mtg) || Harry/Jeanne || 5/28/13 || || || ||
|| 2f || Review with GIMI (Cong, Keileigh) || Harry/Jeanne || 5/28/13 || || || ||
|| 2g || || || || || || ||
|| 3 || Resolve design issues || || || || || ||
|| 3a || Issue 2.1: How are persistent accounts established for each user in iRODS? || Shu, Jeanne, Tom/Aaron || 5/22/13 || completed || || Per GIMI workflow discussion: use POST to restful interface on iRODS ||
|| 3b || Issue 2.2: How are storage capacity limits established and enforced for each iRODS user? Are older objects (artifacts) flagged for removal? || Shu || || || || ||
|| 3c || Issue 2.3: How are archive capacity limits established and enforced for each iRODS user? Are older objects (artifacts) flagged for removal? || Shu || || || || ||
|| 3d || Issue 5.1: Where is the proxy certificate created? How is the proxy certificate transferred to the GEMINI Measurement Store service? || Ezra || || || || ||
|| 3e || Issue 5.2: What happens if the proxy certificate expires? Is the user notified? How can they load an updated proxy certificate? || Ezra || || || || ||
|| 3f || Issue 5.3: How is the target information transferred to the service? || Ezra || || || || ||
|| 3g || Issue 5.4: How is the iticket transferred to the GIMI portal service? || Cong || 5/22/13 || completed || || Per GIMI workflow discussion: init script gets, pushes ||
|| 3h || Issue 5.5: What happens if the iticket certificate expires? Is the user notified? How can they load an updated proxy certificate? || Cong || || || || ||
|| 3i || Issue 5.6: How is all of the target information transferred to the service agent? || Cong || 5/22/13 || completed || || Per GIMI workflow discussion: init script gets, pushes ||
|| 3j || Issue 6.1: Need to establish rules if there is a discrepancy in descriptors. || Shu || || || || ||
|| 3k || Issue 6.2: Need to establish rules for changing or removing metadata.xml files. || Shu || || || || ||
|| 3l || Issue 7.1: When using a browser in the Experiment Management Environment (or elsewhere) to view artifacts (files and directories) in the GSAS, how will the associated descriptors (metadata) be displayed? || Shu || || || || ||
|| 3m || Issue 8.1: After the bag and .tar file have been created and used, is there some cleanup that should be done? || Shu || || || || ||
|| 3n || Issue 8.2: After changes have been made to directories and files, what is the process for recreating the bag and .tar file? || Shu || || || || ||
|| 3o || Issue 10.1: Is there a way to derive the descriptors in an archive.xml file from descriptors in the other types of metadata.xml files, or at least an initial set of descriptors for the archive.xml file? || Giridhar || || || || ||
|| 3p || || || || || || ||
|| 3q || || || || || || ||

== 3) GENI !ObjectDescriptor Schema ==

CNRI is developing a comprehensive GENI !ObjectDescriptor Schema that fully implements the schema outlined in the GSAS overview document. [[br]]

Versions of .xsd schema: [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/GENIObject.xsd v1.1 GENIObject] [[br]]

Examples of v1.1 metadata.xml files: [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/project1.xml Project1] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/Experiment1.xml Experiment1] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/projectSerialized.xml ProjectSerialized] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/Step1.xml Step1] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/Artifact1.xml Artifact1] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/ArchiveOfProject.xml ArchiveofProject] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/ArchiveOfExperiment.xml ArchiveofExperiment] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS?action=new ArchiveofStep] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/ArchiveOfArtifact.xml ArchiveofArtifact] [[br]]

Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 4 || "GENI !ObjectDescriptor Schema" || || || || || ||
|| 4a || Review v1.1 || Group || 4/29/13 || completed || || ||
|| 4b || Issue v1.2 || Giridhar || 6/1/13 (or earlier?) || || || Need to include controlled vocabularies; need to update metadata.xml file examples ||
|| 4c || Review with group || Giridhar || ? || || || ||
|| 4d || Review with GPO || Giridhar || ? || || || ||
|| 4e || Review with community || Giridhar || GEC17 || || || ||
|| 4f || || || || || || ||
|| 3o || Issue 10.1: Is there a way to derive the descriptors in an archive.xml file from descriptors in the other types of metadata.xml files, or at least an initial set of descriptors for the archive.xml file? || Giridhar || || || || ||

== 4) !DataCite Schema ==

When GENI research results are shared with the research community, they will use the !DataCite metadata schema, which has been established for the research community. [[br]]

References: [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/DataCite-MetadataKernel_v2.2.pdf document] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/DataCite-metadata.xsd metadata example] [[br]]

== 5) iRODS Design and Deployment ==

As part of the GIMI project, RENCI is developing and deploying iRODS, with special extensions to implement the GSAS, as specified in the overview document.
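
The task list for this section includes a rule that optionally validates incoming descriptor files and verifies mandatory elements (task 5h). In the deployed service this check would presumably be written as an iRODS rule; the Python sketch below only illustrates the check itself. The element names in MANDATORY_FIELDS are hypothetical placeholders, since the real list has to come from the GENI !ObjectDescriptor Schema (.xsd), and the sketch ignores XML namespaces and does not perform full schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical mandatory fields; replace with the ones the schema actually requires.
MANDATORY_FIELDS = ("title", "owner", "created")


def missing_mandatory(xml_text: str, mandatory=MANDATORY_FIELDS) -> list:
    """Return the mandatory child elements that are absent or empty.

    Raises xml.etree.ElementTree.ParseError if the descriptor is not
    well-formed XML, which covers the basic syntax check.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for name in mandatory:
        node = root.find(name)  # direct children of the root element only
        if node is None or not (node.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    descriptor = "<object><title>Run 1</title><owner>experimenter</owner></object>"
    print(missing_mandatory(descriptor))
```

A descriptor that returns a non-empty list could be rejected or flagged before its values are stored in iCAT.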
[[br]]
References: [[br]]
[http://groups.geni.net/geni/attachment/wiki/GIMI/iRODS_Fact_Sheet-0907c.pdf iRODS fact sheet] [[BR]]
[http://groups.geni.net/geni/attachment/wiki/GIMI/iRODS_Overview_0903.pdf iRODS overview] [[BR]]
[http://groups.geni.net/geni/attachment/wiki/GIMI/irods-gec14-1.pptx iRODS configuration] [[BR]]
[http://groups.geni.net/geni/attachment/wiki/GIMI/041712%20%20gimi_use_cases.pptx iRODS use cases] [[BR]]

Plan: [[br]]
[http://groups.geni.net/geni/attachment/wiki/GIMI/011513%20%20iRODS%20GENI%20plan.docx 011513 iRODS plan] [[BR]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/gimi-irods-4-15-13b.docx 041513 addendum to the iRODS plan] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/gimi-irods-5-1-13b.docx 050113 addendum to the iRODS plan] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/gimi-irods-5-9-13b.docx 050913 addendum to the iRODS plan] [[br]]

Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 5 || "iRODS" || || || || || ||
|| 5a || Establish multiple federated iRODS services, starting at RENCI and UMass Amherst, and operate for GENI users (experimenters) || Shu and Cong || GEC14 || completed || || ||
|| 5b || Establish persistent accounts for each user, use icommands to store and retrieve measurement data objects for each user in a storage service || Shu and Cong || GEC14 || completed || || ||
|| 5c || Establish authentication for each user based on certificates, and also proxy (delegated) certificates. || Shu and Jeanne || GEC15 || completed || || ||
|| 5d || Establish directory structure in storage service for each user to accommodate multiple experiments, and directory structure for each experiment (consider "bag") to include all objects (artifacts) associated with that experiment, including one or more descriptors (metadata) within XML files (following the GENI descriptor schema) || Shu, Harry and Giridhar || GEC15 || completed || || ||
|| 5e || Provide multiple interfaces (including icommand, restful http, and web) to allow authenticated user to view, search and curate their objects (artifacts) || Shu || GEC15 || completed || || restful http can come later ||
|| 5f || Provide interface to allow user to define an object to be archived, where the object may range from a large directory to a single file; use bagit command to form object; include a descriptor following the GENI descriptor schema; then push it to archive service; finally receive assigned persistent Digital Object Identifier (DOI, or "handle"), and include it in the storage service descriptor. || Shu || GEC16 || in progress || || ||
|| 5g || Establish an archive service that provides long-term and reliable storage, with public access via a DOI from the global handle service, with the option to make the object searchable (or not) from a public interface on the archive service. || Shu || GEC17 || in progress, discussed with Antoine || || ||
|| 5h || Add rules to process incoming descriptors in xml files, and store info in iCAT (3-4 weeks) || Shu, Antoine || GEC17 || ticket [http://groups.geni.net/gimi/ticket/18 #18] || S1, S2 || Need example metadata.xml files, including key values; start with v1.1 examples, then update to v2 examples. Optionally, have iRODS validate xml schema (syntax) and verify mandatory elements ||
|| 5i || Add rules to move object to public archive service, and make any changes (3-5 weeks) || Shu, Antoine || GEC18 || || || ||
|| 5j || Integrate archive service into handle service (1 week) || Shu, Antoine || GEC18 || || || ||
|| 5k || Provide interface to iRODS to allow a User who is authenticated to GENI CH Portal to create an iRODS user account for themselves || Shu, Mike, Tom/Aaron || GEC17 || ticket [http://groups.geni.net/gimi/ticket/19 #19] || S2 || Work with Tom/Aaron to define API; GENI CH Portal acts as a GENI user; use POST to restful interface, and provide only basic functions by GEC17; later, add more functions ||
|| 5l || Provide navigation from GENI CH Portal to iRODS GUI User Interface, identifying User, forwarding other information (what?), and allowing SSO || Shu, Tom/Aaron || GEC17 || ticket [http://groups.geni.net/gimi/ticket/20 #20] || S3 || ||
|| 5m || || || || || || ||
|| 5n || || || || || || ||
|| 3a || Issue 2.1: How are persistent accounts established for each user in iRODS? || Shu, Jeanne, Tom/Aaron || 5/22/13 || completed || || Per GIMI workflow discussion: use POST to restful interface on iRODS ||
|| 3b || Issue 2.2: How are storage capacity limits established and enforced for each iRODS user? Are older objects (artifacts) flagged for removal? || Shu || || || || ||
|| 3c || Issue 2.3: How are archive capacity limits established and enforced for each iRODS user? Are older objects (artifacts) flagged for removal? || Shu || || || || ||
|| 3j || Issue 6.1: Need to establish rules if there is a discrepancy in descriptors. || Shu || || || || ||
|| 3k || Issue 6.2: Need to establish rules for changing or removing metadata.xml files. || Shu || || || || ||
|| 3l || Issue 7.1: When using a browser in the Experiment Management Environment (or elsewhere) to view artifacts (files and directories) in the GSAS, how will the associated descriptors (metadata) be displayed? || Shu || || || || ||
|| 3m || Issue 8.1: After the bag and .tar file have been created and used, is there some cleanup that should be done? || Shu || || || || ||
|| 3n || Issue 8.2: After changes have been made to directories and files, what is the process for recreating the bag and .tar file? || Shu || || || || ||

== 6) Access to GSAS from !UserWorkspace using Experiment Management Tools ==

As explained in the overview of the GSAS, an experimenter (user) can access the GSAS from their !UserWorkspace using Experiment Management Tools. [[br]]

An "Experiment Artifact Management Tool" is required that includes:
 * iclient to access the GSAS
 * ability to manage the experimenter's artifacts, and push them to the GSAS
 * ability to formulate descriptors, put them into metadata.xml files, and push them to the GSAS
 * ability to create a bag and .tar file to form an object, and update them as needed
 * ability to archive an object, and update archived objects

Who is going to prototype such a tool? Will it be part of the GIMI project? When will it be available? [[br]]

Versions: [[br]]

Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 6 || "Experiment Artifact Management Tool" || || || || || ||
|| 6a || || Cong || || || || see GIMI ||
|| 6b || || || || || || ||
|| 6c || || || || || || ||
|| 6d || || || || || || ||

== 7) Access to GSAS from GEMINI I&M Tools ==

As explained in the overview of the GSAS, a service acting on behalf of an experimenter (user) can access the GSAS using an agent holding a proxy certificate. This is the arrangement that is being used within the GEMINI I&M tools.
[[br]]
An early version of the agent is available, but it must now be extended to push all measurement data objects, and associated metadata, as specified in the overview of the GSAS. The issues identified in the overview must be resolved. [[br]]

[http://groups.geni.net/geni/attachment/wiki/GSAS/040313_GEMINI_Figures.vsd Access from GEMINI to GSAS] [[br]]

Versions: [[br]]

Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 7 || "GEMINI agent to access GSAS" || || || || || ||
|| 7a || || Ezra || || || || ||
|| 7b || || || || || || ||
|| 7c || || || || || || ||
|| 7d || || || || || || ||
|| 3d || Issue 5.1: Where is the proxy certificate created? How is the proxy certificate transferred to the GEMINI Measurement Store service? || Ezra || || || || ||
|| 3e || Issue 5.2: What happens if the proxy certificate expires? Is the user notified? How can they load an updated proxy certificate? || Ezra || || || || ||
|| 3f || Issue 5.3: How is the target information transferred to the service? || Ezra || || || || ||

== 8) Access to GSAS from GIMI I&M Tools ==

As explained in the overview of the GSAS, a service acting on behalf of an experimenter (user) can access the GSAS using an agent holding an iticket. This is the arrangement that is being used within the GIMI I&M tools. [[br]]
A prototype version of the agent must be built, to push all measurement data objects, and associated metadata, as specified in the overview of the GSAS. The issues identified in the overview must be resolved. [[br]]
Who will do this? Cong?
[[Br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/091012d_GIMI_Figures.vsd Access from GIMI to GSAS] [[br]]

Versions: [[br]]

Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 8 || "GIMI agent to access GSAS" || || || || || ||
|| 8a || Use iclient in Experiment Management Environment to access iRODS, using GENI User credentials (cert and priv key) || Cong, Keileigh, Jeanne || GEC17 || || || ||
|| 8b || Use iclient to set up directory structure in iRODS for experiment || Cong, Keileigh, Jeanne || GEC17 || || || ||
|| 8c || Use iclient to push basic descriptor files to iRODS: project, experiment, step || Cong, Keileigh, Jeanne || GEC17 || || || Start with v1.1 schema, then update to v2 schema ||
|| 8d || Use iclient to get iticket from iRODS for experiment directory || Cong, Keileigh, Jeanne || GEC17 || || || ||
|| 8e || Use iclient to push artifact and descriptor files from GIMI Portal agent to iRODS, using an iticket, where GIMI Portal agent is a registered user of iRODS, authenticated with its own cert/priv key. || Cong, Keileigh, Jeanne || GEC17 || || || ||
|| 8f || || || || || || ||
|| 8g || || || || || || ||
|| 3g || Issue 5.4: How is the iticket transferred to the GIMI portal service? || Cong || 5/22/13 || completed || || Per GIMI workflow discussion: init script gets, pushes ||
|| 3h || Issue 5.5: What happens if the iticket certificate expires? Is the user notified? How can they load an updated proxy certificate? || Cong || || || || ||
|| 3i || Issue 5.6: How is all of the target information transferred to the service agent? || Cong || 5/22/13 || completed || || Per GIMI workflow discussion: init script gets, pushes ||

== 9) Acceptance Tests ==

Integrated acceptance tests must be done between all tools/agents and the GSAS. [[br]]
Jeanne will coordinate. [[br]]
Task list: [[br]]

|| '''ID''' || '''Description''' || '''Who''' || '''Due''' || '''Status''' || '''Demos''' || '''Notes''' ||
|| 9 || "Integration of tools/agents with GSAS" || || || || || ||
|| 9a || || Jeanne || || || || ||
|| 9b || || || || || || ||
|| 9c || || || || || || ||
|| 9d || || || || || || ||

== 10) Tutorials and Experimenter Support ==

== A) Technical References ==

[http://groups.geni.net/geni/attachment/wiki/GSAS/082112b_MDASrvc_Figures.vsd MeasurementDataArchive prototype] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/042913_iRODS_Figures.vsd GSAS Configuration and Use Cases Figure (visio)] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/Visio-042913_iRODS_Figures.pdf GSAS Configuration and Use Cases Figure (pdf)] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/040313_GEMINI_Figures.vsd Access from GEMINI to GSAS] [[br]]
[http://groups.geni.net/geni/attachment/wiki/GSAS/091012d_GIMI_Figures.vsd Access from GIMI to GSAS] [[br]]
[wiki:TestTutorialExperimentStoryboard I&M Tools: Basic Test/Tutorial/Experiment Storyboard] [[BR]]
[wiki:TestTutorialExperimentWorkflow I&M Tools: Basic Test/Tutorial/Experiment Workflow] [[BR]]