Changes between Version 31 and Version 32 of URNConsolidation

07/13/16 16:53:52 (3 years ago)



  • URNConsolidation

    v31 v32  
    1 [[PageOutline(1-3)]]
    3 = GENI Internet2 Switch Consolidation Procedure =
    5 This page defines the steps required to update stitching to handle PoP device consolidation that is taking place in Internet2 AL2S. This consolidation effort will replace existing AL2S Brocade devices with Juniper devices, and will converge the two distinct devices that currently provide L2 and L3 services into a single converged Juniper device in locations where AL2S services exist.  These steps outline the actions required at the GENI rack, AL2S AM, and at the SCS servers to incorporate URN changes (due to port changes) resulting from the consolidation.
    7 The steps include examples based on details from a previous switch consolidation and its effect on the GENI stitching sites connected to that switch.
    9 == 1. Generate Tickets and check for conflicts with upcoming GENI events ==
    11 === Create GMOC tickets ===
    12 Open tickets with GMOC for the scheduled maintenance events, listing all affected GENI resources, as soon as we receive notice of the scheduled days (this comes in an email from Eric Boyd no later than 1 week before the outage).  Confirm that GMOC generates corresponding requests to Internet2 Engineering (GRNOC).  GMOC tickets should notify the operators and experimenters lists.  Although the outage is only scheduled for 1 day on the Internet2 schedule, the GENI ticket may require a longer outage.  The ticket should include a warning that the outage may be extended at the end of the day if there are issues with any updates (include this in your initial email to GMOC).  Adam Williams will coordinate efforts for GMOC, but initial requests should go to the usual address.
    14 Note that Internet2 schedules both an IP and an AL2S outage (usually on different days) for each PoP consolidation.  The IP event has no related GENI URN work needed, and will simply result in the GENI resources being unreachable (because the entire device is disconnected).  The GMOC should create tickets for both events, since they both have GENI impact, and the rack admins should see the tickets if they read their GENI operators email.
    16 Note that GMOC should check the GENI calendar for any conflicting events that are scheduled to overlap, and send followup email if they find any.  Rather than wait for this to happen, check the existing GENI tickets or calendar yourself for conflicts and notify any affected event coordinator directly via email.
    18 === Check the test SCS for affected sites and generate warning emails ===
    20 The GMOC may not have records of GENI connections for some nodes that are only supported on the test SCS (e.g. CloudLab).  If the scheduled maintenance will affect a test SCS node (rack, switch etc.), email the contact for that node directly and cc: gpo-infra, informing them of the scheduled outage, and asking them to be available to test connectivity after the update.  Add the test SCS node owner contact to any status update emails you send during the outage.  (Sometimes test SCS nodes are no longer in use, so the owner may indicate they can be retired instead of revised.  Include retiring resources as part of the outage work.)
    22 === Changing the Schedule and Escalation ===
    24 Internet2 won't change their schedule, but we can work with affected sites and event contacts to try to prioritize the work to minimize the outage impact for priority sites if needed.
    26 If the consolidation event goes longer than the scheduled outage ticket lasts, be sure to email updates to the GMOC and to anyone who was contacted via email (from the test SCS or event lists) as soon as you know an extension is needed.  Update the same list if you need to extend more than once.  If the event will continue to the next day, indicate in your update when work will start again on the next day.  You should send updates to the GMOC ticket for any significant events or changes that happen during the maintenance (version updates, bad hardware etc.).  Do not send these types of updates only to an ops email list, because that info won't get out to resource owners.
    28 If there are any significant problems during the event, escalate to Heidi Dempsey ( while you work on them (in addition to noting them in the ticket).
    30 == 2. Identify Affected Stitching Endpoints ==
    32 The GENI aggregate advertisement includes a ''stitching'' section which defines how VLANs are to be connected and which VLANs are associated with that stitching site.  To determine the impact of a consolidation on stitching, you must start by collecting the AL2S aggregate advertisement and reviewing its stitching definitions using the omni tool:
    33 {{{
    34    omni -a al2s listresources -o
    35 }}}
    36 Review the content of the stitching section in the output file rspec-al2s-internet2-edu.xml and see if there are any sites affected for the switch being consolidated. 
    38 For example there were several stitching endpoints for "" in the AL2S Advertisement:
    39 {{{
    40  <stitch:node id="">
    41  <stitch:port id="">
    42  <stitch:link id="">
    43  <stitch:port id="">
    44  <stitch:link id="">
    45  <stitch:link id="">
    46  <stitch:link id="">
    47  <stitch:link id="">
    48  <stitch:link id="">
    49  <stitch:port id="">
    50  <stitch:link id="">
    51 }}}
    53 From the above list we will request the "stitch:link id" to be updated. The "stitch:port id" transitions are implicit. In this example, there are 7 stitching endpoints requiring updates (2 InstaGENI, 2 ExoGENI, 1 OpenGENI, 1 network aggregate (iMinds), and 1 fixed endpoint (host-gpolab)).
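The review of the stitching section can be partly automated. The following is a minimal sketch, not part of the GENI tools, that lists the "stitch:link" ids mentioning a given switch. The sample advertisement, its namespace URI, the URNs, and the function names are all hypothetical; a real AL2S advertisement will differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical advertisement fragment. The namespace URI and every URN here
# are placeholders, not values taken from a real AL2S advertisement.
SAMPLE_AD = """\
<rspec xmlns:stitch="http://example.org/stitch">
  <stitch:stitching>
    <stitch:node id="urn:publicid:IDN+example.net+node+sw-old">
      <stitch:port id="urn:publicid:IDN+example.net+interface+sw-old:eth1/1">
        <stitch:link id="urn:publicid:IDN+example.net+interface+sw-old:eth1/1:site-a"/>
      </stitch:port>
    </stitch:node>
    <stitch:node id="urn:publicid:IDN+example.net+node+sw-other">
      <stitch:port id="urn:publicid:IDN+example.net+interface+sw-other:eth2/1">
        <stitch:link id="urn:publicid:IDN+example.net+interface+sw-other:eth2/1:site-b"/>
      </stitch:port>
    </stitch:node>
  </stitch:stitching>
</rspec>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_link_ids(xml_text, switch_name):
    """Return ids of all 'link' elements whose id mentions switch_name."""
    root = ET.fromstring(xml_text)
    return [
        el.get("id")
        for el in root.iter()
        if local_name(el.tag) == "link" and switch_name in el.get("id", "")
    ]


print(find_link_ids(SAMPLE_AD, "sw-old"))
```

To apply the same idea to the omni output, read rspec-al2s-internet2-edu.xml into a string and pass the name of the switch being consolidated. Matching on the local tag name avoids depending on the exact stitching namespace.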
    55 In GENI Network Stitching, a fixed endpoint is a resource that is not
    56 a GENI aggregate but still supports stitching.  Fixed endpoints are
    57 statically configured in the SCS servers to capture stitching
    58 information and are generally set up for specific demonstrations, or
    59 peering points. Fixed endpoints require no special SCS update or
    60 configuration; simply update the AL2S Advertisement and the fixed
    61 endpoint change will take effect.
    63 == 3. Define Stitching Configuration Changes ==
    65 Review Internet2 announced changes for switch names and ports. Based on the information, identify the changes to be made to stitching definitions for the stitching endpoints identified in the previous step.
    67 For example, using details from the consolidation email from Internet2 for the New York Switch:
    69 {{{
    70  Old Hostname:
    71  New Hostname:
    72         'Old Interface'                       'New Interface'
    73  100GigabitEthernet1/1   100GE                   et-3/1/0.0
    74  100GigabitEthernet1/2   100GE                   et-3/3/0.0
    75  100GigabitEthernet3/1   100GE                   et-7/1/0.0
    76  100GigabitEthernet5/2   100GE                   et-7/3/0.0
    77  100GigabitEthernet7/1   100GE                   et-4/1/0.0
    78  100GigabitEthernet7/2   100GE                   et-4/3/0.0
    79  10GigabitEthernet15/1   10GE                    xe-3/0/0.0
    80  10GigabitEthernet15/4   10GE                    xe-3/0/1.0
    81  10GigabitEthernet15/5   10GE                    xe-3/0/2.0
    82  10GigabitEthernet15/7   10GE                    xe-3/0/3.0
    83 }}}
    84 From the check of the AL2S stitching Advertisement, we know that there are seven stitching sites impacted by this URN transition. Define a list of each of the expected changes. The table below highlights each of the transitions:
    85 ||''' Old URN ''' ||''' New URN ''' ||
    86 ||      || ||
    87 ||      || ||
    88 ||      || ||
    89 ||      || ||
    90 || || ||
    91 ||    || ||
    92 || || ||
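One way to build the list of expected transitions is to apply the announced interface mapping to each old link URN. This is a sketch under stated assumptions: the URN layout (`...+host:interface:site`), the hostnames, and the function name are hypothetical, and real URNs may spell interface names differently from the Internet2 email. Only the three interface pairs are copied from the excerpt above.

```python
# Interface pairs copied from the Internet2 email excerpt (three rows only).
INTERFACE_MAP = {
    "100GigabitEthernet1/1": "et-3/1/0.0",
    "100GigabitEthernet1/2": "et-3/3/0.0",
    "10GigabitEthernet15/1": "xe-3/0/0.0",
}


def rewrite_urn(old_urn, old_host, new_host, interface_map):
    """Return the new link URN for old_urn.

    Assumes (hypothetically) that the last '+'-separated field of the URN
    has the form 'host:interface:site'.
    """
    prefix, _, tail = old_urn.rpartition("+")
    host, iface, site = tail.split(":", 2)
    if host != old_host:
        raise ValueError(f"URN is not on {old_host}: {old_urn}")
    if iface not in interface_map:
        raise KeyError(f"no announced mapping for interface {iface}")
    return f"{prefix}+{new_host}:{interface_map[iface]}:{site}"


# Hypothetical hostnames and site name, for illustration only.
old = "urn:publicid:IDN+example.net+interface+old-sw:100GigabitEthernet1/1:site-a"
print(rewrite_urn(old, "old-sw", "new-sw", INTERFACE_MAP))
```

Raising an error on an unmapped interface is deliberate: a URN with no announced mapping is worth a question to Internet2 rather than a guess.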
    94 Note that Internet2 may change the port assignments or port names in the course of their work with the hardware, which happens before the GENI scheduled maintenance begins.  Internet2 engineering ops team must notify the respective GENI operations teams as soon as possible via the GMOC ticket when such a change occurs. GENI operations teams will then update the various configurations to reflect this new change. The GMOC is responsible for coordinating with Internet2's engineering ops team for changes such as this.
    96 == 4. Request Stitching Changes from GENI Aggregates Operations Teams ==
    98 URN transition requires co-ordination with various teams. Get positive confirmation in email before the scheduled outage that at least one person from any affected ops team will be available at the time the scheduled outage begins through the end of the scheduled outage.  Remember to get confirmations from the contacts for any resources affected in the test SCS as well, because they may not be part of any of the usual ops teams.  Following are the teams/contributors that handle the transition based on the type of aggregate:
    99   * InstaGENI: ( Request is handled by Hussam Nasir (
    100   * ExoGENI: ( Request is handled by Mert Cevik ( or Ilya Baldin (
    101   * OpenGENI: Marshall Brinn ( and Regina Rosales-Hain (
    102   * iMinds - Brecht Vermeulen (>
    103   * host-gpolab - Was configured locally, so changes are to be made to those local definitions.
    104   * AL2S GENI Aggregate: GMOC requests( AL2S Advertisement updates handled by Luke Fowler ( or AJ Ragusa (, cc: both on initial request.
    105   * Internet2 Production SCS: GMOC requests( Updates handled by Luke Fowler ( or AJ Ragusa (, cc: both on initial request.
    106   * Test SCS - Xi Yang (
    108 Note: All aggregates' advertisements '''must''' be updated before the SCS servers. The SCS discovers the new stitching path information from the aggregates' stitching advertisements.  SCS is statically configured for fixed endpoints.
    110 === 4a. Change Request Details ===
    112 Based on the existing Stitching information and the announced changes, generate a list of new link IDs to be used at each site.
    114 Following is an example from the New York transition, where GPO IG and NYSERNet URNs changes were requested from InstaGENI Team:
    115 {{{
    116 Link ID:
    117 Remote Link ID:
    118 VLAN Range:       3596-3600,3706-3732,3746-3749
    120 Link ID:
    121 Remote Link ID:
    122 VLAN Range:       1700-1719
    123 }}}
    125 GPO EG URNs changes were requested from the ExoGENI Team:
    126 {{{
    127 Link ID:
    128 Remote Link ID:
    129 VLAN Range:       3741,3736-3739
    130 }}}
    132 GPO OG URN changes were requested from the OpenGENI Team:
    133 {{{
    134 Link ID:
    135 Remote Link ID:
    136 VLAN Range:       2611-2630
    137 }}}
    139 Wall2 iMinds URN changes were requested from the iMinds Team:
    140 {{{
    141 Link ID:
    142 Remote Link ID:
    143 VLAN Range:       1125-1164
    144 }}}
    146 AL2S Aggregate URN changes were requested from Internet2 via the GMOC:
    147 {{{
    148 Link ID:
    149 Remote Link ID:
    150 VLAN Range:       1125-1164
    152 Link ID:
    153 Remote Link ID:
    154 VLAN Range:       2611-2630
    156 Link ID:
    157 Remote Link ID:
    158 VLAN Range:       3741,3736-3739
    160 Link ID:
    161 Remote Link ID:
    162 VLAN Range:       3596-3600,3706-3732,3746-3749
    164 Link ID:
    165 Remote Link ID:
    166 VLAN Range:       1700-1719
    168 Link ID:
    169 Remote Link ID:
    170 VLAN Range:       2646
    172 Link ID:
    173 Remote Link ID:
    174 VLAN Range:       3581-3595
    176 }}}
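Each VLAN range appears twice: once in the request to the aggregate team and once in the request for the AL2S aggregate. Before sending, it can help to confirm that both sides describe the same set of VLANs. The helper below is an illustrative sketch; the function names are not part of any GENI tool.

```python
def parse_vlan_range(text):
    """Expand a range string such as '3741,3736-3739' into a set of VLAN ids."""
    vlans = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"reversed range: {part}")
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def same_vlans(aggregate_side, al2s_side):
    """True when both range strings cover exactly the same VLAN ids."""
    return parse_vlan_range(aggregate_side) == parse_vlan_range(al2s_side)


# Ranges taken from the ExoGENI and AL2S requests in the example above.
print(same_vlans("3741,3736-3739", "3741,3736-3739"))
```

Comparing sets rather than strings means that two requests written in a different order (for example "3736-3739,3741") still compare as equal.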
    178 === 4b. Submit Change Requests to Teams ===
    180 Send Email to each of the teams to request the above changes. For example, for the New York Switch updates change request, emails were sent to these aggregate teams: IG, EG, OG, iMinds and Internet2 AL2S.
    182 As a courtesy, copy the rack admin contact(s) or email list from the [ Operators] page on these requests.  They don't have to take any action, but they may want to know that their racks will be potentially unable to stitch for a period of time during the scheduled outage.
    184 Also copy the GENI Monitoring team (, and With the exception of the ATLA consolidation, this work should not require any immediate action for monitoring, but the folks at UKY may want to note the "retired" URNs in their database, and to pay extra attention to their monitoring site during these transitions.
    186 '''''Once the requested changes are completed, verify that the requested changes appear in each GENI aggregate's stitching advertisement.'''''
    187 {{{
    188 $ for i in gpo-ig gpo-og gpo-eg nysernet-ig al2s wall2 ; do stitcher listresources -a $i -o; done
    189 }}}
    192 Review all listresources output files to verify that the correct URN is in place for each advertisement.
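The file review can be supported by a simple text check for each advertisement: no old URN should remain, and every expected new URN should be present. This is a minimal sketch using plain substring matching. The function name and the sample URNs are hypothetical, and the result is a hint for the manual review, not a replacement for it.

```python
def check_advertisement(ad_text, old_urns, new_urns):
    """Report old URNs still present and new URNs not yet present."""
    return {
        "stale": [urn for urn in old_urns if urn in ad_text],
        "missing": [urn for urn in new_urns if urn not in ad_text],
    }


# Hypothetical advertisement text and URNs, for illustration only.
ad = '<stitch:link id="urn:publicid:IDN+example.net+interface+new-sw:et-3/1/0.0:site-a">'
report = check_advertisement(
    ad,
    old_urns=["urn:publicid:IDN+example.net+interface+old-sw:eth1/1:site-a"],
    new_urns=["urn:publicid:IDN+example.net+interface+new-sw:et-3/1/0.0:site-a"],
)
print(report)
```

An advertisement is ready when both lists come back empty. A non-empty "stale" list suggests the aggregate has not been updated yet.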
    194 ==== InstaGENI Update Details ====
    196 InstaGENI updates follow this approach:
    198   1. Ask, which maps to Hussam ( running the commands below on the rack boss node.
    199   2. Or, a site contact may be asked to log into the boss node and run these commands (Hussam does not maintain a few dev racks, or racks that are being provisioned by GPO and are not yet completed).
    200   3. Or, the engineer coordinating the scheduled maintenance can request an admin account on the boss node for this work from the rack owner (via the web UI for the site, e.g. for gpo-ig).  Once that account is approved, you can run the necessary commands for the update remotely.
    202 ''Note: Options 2 and 3 are not likely to happen, as option 1 has always taken place as expected. '' 
    204 The InstaGENI changes that will be made to the external network definition for AL2S will be used for the stitching configuration. Below is an old example of the commands used to modify the URN for the uwashington-ig external network.  These commands are executed on the InstaGENI '' '''boss''' '' node:
    205 {{{
    206  mysql tbdb -e 'update external_networks set external_interface="" where network_id="ion"'
    208  mysql tbdb -e 'update external_networks set external_wire="" where network_id="ion"'
    209 }}}
    211 ''Note: Be aware of potential line wrapping pitfalls.''
    213 In the above example, these commands assume that in the "external_networks" table on the boss node, there exists an entry named "ion" in the "network_id" column and an associated URN ending in "<InstaGENI Sitename>-ig". On most racks, these values will be either "ion" (for racks existing during the ION age) or "al2s" (for racks existing during the AL2S age). To determine the value for "network_id" for a given InstaGENI rack, use the following:
    215 {{{
    216 mysql tbdb -e 'select * from external_networks'
    217 }}}
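Because line wrapping in email can corrupt these commands, one option is to generate each command as a single, correctly quoted line before sending it. The sketch below assumes only the table and column names shown above (external_networks, external_interface, external_wire, network_id). The function name and the sample URN are hypothetical, and the script only builds the command text; it does not run mysql.

```python
import shlex

ALLOWED_COLUMNS = ("external_interface", "external_wire")


def build_update_command(column, urn, network_id):
    """Build a one-line 'mysql tbdb -e ...' command for the boss node."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    for value in (urn, network_id):
        # Reject characters that would break the SQL or shell quoting.
        if any(ch in value for ch in "'\"\\\n"):
            raise ValueError(f"unsafe character in value: {value!r}")
    sql = (
        f'update external_networks set {column}="{urn}" '
        f'where network_id="{network_id}"'
    )
    # shlex.join (Python 3.8+) quotes the SQL as a single shell argument.
    return shlex.join(["mysql", "tbdb", "-e", sql])


# Hypothetical URN; the network_id value must be checked on the rack first.
cmd = build_update_command(
    "external_interface",
    "urn:publicid:IDN+example.net+interface+new-sw:et-3/1/0.0:site-ig",
    "al2s",
)
print(cmd)
```

Sending the generated line as an attachment, or inside a preformatted block, keeps a mail client from re-wrapping it.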
    221 == 5. Request SCS Servers Update ==
    223 In order for GENI Network Stitching to pick up these path configuration changes, an SCS update must be run.  There are two SCS systems:
    224   - Test SCS maintained by Xi Yang (
    225   - Production SCS maintained by the Internet2 ( Updates handled by Luke Fowler ( or AJ Ragusa (
    227 The Production and Test SCS include stitching information for different sets of aggregates.  To find out which SCS knows about which aggregates, issue the following GENI tools commands:
    229 For the Production SCS:
    230 {{{
    231  python ~/gcf/src/gcf/omnilib/stitch/ --listaggregates --scs_url >scs-prod
    232 }}}
    233 Look for the aggregates identified in the earlier steps. For example, for the New York switch consolidation effort, the 'listaggregates' function shows that the GPO IG, GPO EG, and NYSERNet IG sites are known to the Production SCS.
    235 For the Test SCS:
    236 {{{
    237 python gcf/gcf-current/src/gcf/omnilib/stitch/ --listaggregates --scs_url > scs-test
    238 }}}
    239 Look for the aggregates identified in the earlier steps. For example, for the New York switch consolidation effort, the 'listaggregates' function shows that sites GPO IG, GPO EG, GPO OG, NYSERNet IG, iMinds, and UMass are known to the test SCS.
    241 Send a request to:
    242  - the GMOC to update the Production SCS
    243  - Xi to update the Test SCS.
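To decide which requests are needed, the two saved listings (scs-prod and scs-test) can be compared against the list of affected aggregates. The sketch below assumes, without confirmation, that each listing is plain text containing an identifier for every aggregate the SCS knows about. The identifiers and function names are hypothetical, and plain substring matching is used.

```python
def scs_coverage(aggregates, prod_listing, test_listing):
    """Map each aggregate id to whether each SCS listing mentions it."""
    return {
        agg: {"production": agg in prod_listing, "test": agg in test_listing}
        for agg in aggregates
    }


def update_requests(coverage):
    """Return who needs an update request, based on the coverage map."""
    requests = []
    if any(c["production"] for c in coverage.values()):
        requests.append("GMOC (Production SCS)")
    if any(c["test"] for c in coverage.values()):
        requests.append("Xi (Test SCS)")
    return requests


# Hypothetical listings; in practice read the scs-prod and scs-test files.
prod = "site-a.example.net\nsite-b.example.net\n"
test = "site-a.example.net\nsite-b.example.net\nsite-c.example.net\n"
coverage = scs_coverage(["site-a.example.net", "site-c.example.net"], prod, test)
print(update_requests(coverage))
```

An aggregate that appears in neither listing is worth a second look, since it may be listed under a different identifier.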
    246 == 6. Validate Updated Stitching ==
    248 When the updates are completed for all Aggregates and for the SCS servers, testing takes place to verify the URN changes. Validation includes:
    250  * Verify the Advertisement for AL2S and for each GENI aggregate that was updated. If the new URN is missing from the '' '''stitching''' '' section, contact the appropriate aggregate team. [[BR]]
    252  * Create stitched slivers with the production SCS that use each of the rack aggregates that were updated and connect them to a remote stitching site.   Log in to one node for each sliver and leave some ping traffic running.  '''DO NOT''' delete these slivers; they are used later in monitoring verification. If the Production SCS reports an unknown path, contact Luke or AJ about updating the production SCS. [[BR]]
    254  * Create stitched slivers with the test SCS (select it with the omni/stitcher option '' "--scsURL" '') that use each of the rack aggregates that were updated and connect them to a remote stitching site.    Log in to one node for each sliver and leave some ping traffic running.  '''DO NOT''' delete these slivers; they are used later in monitoring verification. If the Test SCS reports an unknown path, contact Xi about updating the Test SCS. [[BR]]
    256  * Update the GENI aggregate page for GENI Aggregate ( to capture the new stitching ''' ''link'' ''' details. [[BR]]
    258  * Review the [ Operators] page to replace any instances of old URNs or old switch/port names.  Check the network drawings as well as the text.  It is OK to add notes to the network drawing section, because revising the drawings usually requires getting a new drawing from the site, which takes longer than the scheduled outage.[[BR]]
    260  * Review the GENI VLAN Delegation page at, to make sure that instances of the old switch name no longer appear. If old instances appear, send email to GPO Infrastructure group ( and cc: Ali Sydney ( to make corrections.
    262  * Update " to replace any modified interface information, see example from Salt Lake update: [[BR]] 
    264     ||''' Router'''           ||''' Interface''' ||''' Site ''' ||''' VLAN Range''' ||
    265     ||  || eth7/1          || utah-stitch || 2100-3499        ||
    267 With the new switch details:
    269     ||''' Router'''           ||''' Interface''' ||''' Site ''' ||''' VLAN Range''' ||
    270     ||  || et-4/3/0        || utah-stitch || 2100-3499      ||[[BR]]
    272  * GENI Monitoring URN Validation. Login into and use the '' '''search''' '' feature to find all data relating to the new AL2S switch, for example "". Make sure the following are returned:
    273   * a switch is listed with the new name "",
    274   * interface statistics are available for the new switch,
    275   * VLANs are being reported for the new switch[[BR]]   
    277  * Report back about test findings or any outstanding/unresolved issues.
    280 == 7. Update and Close Tickets ==
    282 Assuming all tests are successful, update and close all tickets by emailing the GMOC and any individual resource owners who were contacted but not included in the tickets.  If there are outstanding issues that are significant, leave the ticket open until they are resolved.  If there are smaller outstanding issues, close the maintenance tickets, and open new tickets with the appropriate owners to track and resolve, ideally before the next maintenance.
    284 If this process needs revision to account for events that occurred during the maintenance, email the ops teams and follow up with discussion or revision as appropriate.
     2This page moved to