This draft of an MDOD is an interpretation based
on the initial draft MDOD presented by Harry Mussmann
at GEC11 plus suggested modifications by IU.
This draft version has also been influenced by comments
on the MDOD from the Instrumentation and Measurement
group at GEC11, draft MDODs by Jason Zurawski, and
documentation from the IF-MAP group.
Version 0.2 of this schema has also been influenced by
discussions at GEC13 with the Instrumentation and Measurement
group, as well as general feedback and a summary of MDOD
status and issues by Giridhar Manepalli (CNRI) at GEC13, and
the NetKarma provenance repository and schema.
author: Scott Jensen, Indiana University
Should the MDOD be a single measurement stream (e.g., from a single MP),
a collection of measurements, or measurements plus derived products
such as analysis or presentations prepared based on transformations of
the measurement data? Initial drafts described the MDOD as all of the
measurements for an experiment.
This draft assumes that the MDOD is a collection of measurements,
provenance, and derived products or transformations of the measurement data,
that describe an experiment or related set of experiments. There could
be multiple data objects in the MDOD that are derived from
subsets of the measurement data, and some of those subsets may overlap, so
the relationships between data objects are not strictly a tree (i.e., an MDO
could have multiple results derived from it within the MDOD).
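As a rough illustration of why the relationships are not a tree, the sketch
below records which MDOs each derived product was built from and inverts that
mapping to list the results derived from each MDO. The names (derived_products,
mdo_1, analysis_a, and so on) are hypothetical and are not part of the schema.

```python
from collections import defaultdict
from typing import Dict, Set


def derived_products(derived_from: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert a product -> source-MDOs mapping into MDO -> derived products."""
    results: Dict[str, Set[str]] = defaultdict(set)
    for product, sources in derived_from.items():
        for source in sources:
            results[source].add(product)
    return dict(results)


# Hypothetical example: two derived products built from overlapping subsets.
example = {
    "analysis_a": {"mdo_1", "mdo_2"},
    "plot_b": {"mdo_2", "mdo_3"},
}
by_mdo = derived_products(example)
```

Here mdo_2 feeds both products and each product has more than one source, so
no single parent/child hierarchy describes the collection.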
This schema currently has 5 top-level elements, but only minimal identification is required:
identification: This is the identification for the MDOD as a collection.
provenance: This is an OPM provenance graph - such as the graph
generated by NetKarma for an experiment. For an aggregate
this can represent how the MDOD itself is created.
security: optional element for setting policies. Can also be set
at the underlying dataDescriptors or inherited there from
the MDOD level.
dataDescriptor: There would be a separate instance of this element for
each MDO or derived data product described within the MDOD.
mdodReference: An MDOD could include other MDODs by reference either
locally or based on a URL.
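The five top-level elements listed above could be pictured as the following
data structure. This is only a sketch under assumptions: the class and field
names are hypothetical, the provenance graph and security policies are reduced
to placeholders, and only the identification is treated as required.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identification:
    identifier: str  # a DOI or an internal ID (assumed to be a plain string)


@dataclass
class DataDescriptor:
    identification: Identification  # describes one MDO or derived product


@dataclass
class MdodReference:
    target: str  # local path or URL of another MDOD


@dataclass
class Mdod:
    identification: Identification            # the only required element
    provenance: Optional[str] = None          # placeholder for an OPM graph
    security: Optional[dict] = None           # placeholder for policies
    data_descriptors: List[DataDescriptor] = field(default_factory=list)
    mdod_references: List[MdodReference] = field(default_factory=list)


# A minimal MDOD carries nothing but its identification.
minimal = Mdod(identification=Identification("example.net:lab1+mdod+run42"))
```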
When an MDOD represents a bundle that is archived and shared
it would be assigned a DOI identifier. Otherwise it can
have an internal ID.
For an internal ID, the format could be required to be consistent
with the draft at GEC11 where an ID follows the format:
domain:subdomain+object_type+object_name
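If that format were enforced, a check along the following lines could split an
internal ID into its parts. This is a minimal sketch that assumes all four
parts are required and that none of them contains ':' or '+'; the function
name and the example value are hypothetical.

```python
from typing import NamedTuple


class InternalId(NamedTuple):
    domain: str
    subdomain: str
    object_type: str
    object_name: str


def parse_internal_id(text: str) -> InternalId:
    """Split 'domain:subdomain+object_type+object_name' into its parts."""
    parts = text.split("+")
    if len(parts) != 3:
        raise ValueError(f"expected two '+' separators in {text!r}")
    authority, object_type, object_name = parts
    domain, sep, subdomain = authority.partition(":")
    # Assumption: every part must be present and non-empty.
    if not sep or not all((domain, subdomain, object_type, object_name)):
        raise ValueError(f"missing part in {text!r}")
    return InternalId(domain, subdomain, object_type, object_name)


parsed = parse_internal_id("example.net:lab1+mdod+run42")
```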
The dataDescriptor is for measurement or other data objects
local to the MDOD that are not stored or accessible from their
own MDOD. The data descriptor could represent data stored in
another location.
The dataDescriptor maps to the descriptor in the draft MDOD version 0.2.1.
All of the content of the descriptorSecurity element is optional.
Should there instead be only nested MDODs?
Both the MDOD itself and the descriptor have identification elements. The
identification section within the data descriptor would pertain to a single
data object whereas the MDOD identification section relates to the MDOD as a
whole, which can represent a set of measurements and derived data products.
Since the dataDescriptor's identification should be populated automatically
if possible (users do not enter metadata), the abstract and subject are moved
to the MDOD level and eliminated here.
The original MDOD kept the type and value separate, with path, url,
and other as the types and text for the value. Here they are a
choice, and the type is indicated by which element is used.
Contact was made optional. If the data being described is local,
the contact could be redundant.
Is the scope necessary? Would a path be local and any other value
be external?
Are locators other than paths or URLs needed?
The original MDOD schema had three options for the scope of the locator:
global, per_association, and within_holder. Are these needed? A path-based
locator is local and a URL-based locator would be global. Is there a need
for other alternatives such as association?
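One way to picture the locator as a choice, with the scope implied by which
alternative is used rather than stated separately, is sketched below. The
class names, the implied_scope function, and the "unspecified" result for
other locators are assumptions made for illustration; whether a separate scope
is still needed remains an open question.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PathLocator:
    path: str


@dataclass(frozen=True)
class UrlLocator:
    url: str


@dataclass(frozen=True)
class OtherLocator:
    value: str


# The locator type is indicated by which alternative is used.
Locator = Union[PathLocator, UrlLocator, OtherLocator]


def implied_scope(locator: Locator) -> str:
    """Infer a scope from the locator type instead of storing one."""
    if isinstance(locator, PathLocator):
        return "local"
    if isinstance(locator, UrlLocator):
        return "global"
    return "unspecified"  # assumption: other locators carry no implied scope
```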
In this draft the policy and method elements are sourced strings. This
approach would accommodate standardized policies within GENI that could be
specified based on a controlled vocabulary, but the ability to express
more complex policies may be desirable.
The MDOD can describe both measurements and transformations such
as the analysis or a presentation generated from the measurement
data, or even an external publication of the results.
If these different types are described by fundamentally different
metadata, the dataDescription should contain additional alternatives
other than the measurement event or analysis event.
A prior version of the measurement event was based on the MDOD
version 0.2.1 draft discussed at GEC11. There was discussion as
to whether it should be extended to capture different measurement
tools and vendor extensions that allow for future development.
Initial versions included more detailed elements such as flowrate and size.
The MDOD and dataDescriptor should describe what was captured, not
the measurements themselves, so do we need that level of detail?
The interpretation method is included as a sourced string, where the source is
a controlled vocabulary. Do we need to extend this further to be machine readable?
Would the interpretation method need to be able to specify configuration parameters?
The analysis event would capture derived products such as an
analysis based on measurement data or a presentation.
The path based reference should include the ID of the referenced MDOD.
Is there an ID available to identify users in GENI?
Should there be a project ID or other association?
Do we need an enum for type of contact (e.g., user, operator, aggregate provider)?
Policies in the draft MDOD from GEC11 had an enum as to whether there is a policy
and an accompanying optional description. Instead of a description, should
users be able to provide a URL where the policy is located, so that the MDOD
points to the policy instead of containing the policy itself?
A version element is included in case a policy URL only reflects the
current policy, or contains multiple versions of a policy.
The sourced string type allows keywords, data types, and
other elements in the schema to be specified based on
controlled vocabularies created by communities internal
or external to GENI. To the extent that a value is based
on a controlled vocabulary, the defining source, preferably
a resolvable URI, would be included.
If not based on a controlled vocabulary, should the
attribute instead be optional or should the value "NONE" be
used as the source?
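A sourced string could be modeled roughly as follows. The sketch is
deliberately neutral on the open question above: it accepts either an absent
source or the literal "NONE" as meaning that the value is not drawn from a
controlled vocabulary. The class name, method name, and example URI are
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourcedString:
    value: str
    source: Optional[str] = None  # preferably a resolvable URI

    def is_controlled(self) -> bool:
        """True when the value is tied to a defining vocabulary."""
        return self.source is not None and self.source != "NONE"


keyword = SourcedString("throughput", "http://example.org/vocab/metrics")
free_text = SourcedString("my ad hoc label")
explicit_none = SourcedString("my ad hoc label", "NONE")
```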