Artur Barczyk's notes from the Architecture Workshop

 

[Summary of the discussion; the presentations are posted separately and are not commented on here.]

Review of the current LHC schedule:

  •   2012: LHC running until November 2012.

  •   2013-2014: LHC shutdown, restart in late 2014.

    In 2012, the expectation is that bandwidth demand may grow by up to a factor of two with respect to 2011. The impact of the 2013/2014 shutdown will fall mainly on the LHCOPN, as no new data will be produced; however, analysis of the 2011-2012 data samples will continue, including reprocessing of data sets.

    In Europe, the local Layer 2 domain implemented by GEANT and four NRENs (RENATER, DFN, GARR, RedIRIS) is already considered production infrastructure by the sites using it. Efforts to interconnect this VPLS domain with other parts of the LHCONE at Layer 2 have so far failed; the same applies to the VPLS implementation in Internet2.

    It quickly became clear that both a long-term strategy and a short-term solution for global connectivity are needed. For the short term, the current Layer 2 approach will be complemented with virtual router instances to interconnect the Layer 2 domains: while GEANT and Internet2 have each successfully implemented Layer 2 domains through VPLS, it is not clear at this point why these domains cannot be connected to one another at Layer 2, so interconnecting them at Layer 3 is the quick fix. To preserve the VPN aspect of the LHCONE, virtual routing and forwarding (VRF) instances will be used.
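    To make the VRF approach concrete, here is a minimal Python sketch of the key property it provides (an illustration only, not a router configuration; the VRF names, prefixes, and next hops are invented):

        # Illustrative sketch: one routing table per VRF instance, so
        # LHCONE routes are looked up separately from the global table.
        import ipaddress

        class Router:
            def __init__(self):
                # Independent routing table per VRF (VPN) instance.
                self.vrfs = {"global": {}, "lhcone": {}}

            def add_route(self, vrf, prefix, next_hop):
                self.vrfs[vrf][ipaddress.ip_network(prefix)] = next_hop

            def lookup(self, vrf, dst):
                # Longest-prefix match restricted to one VRF: traffic in
                # the LHCONE VPN never consults the global table.
                addr = ipaddress.ip_address(dst)
                matches = [p for p in self.vrfs[vrf] if addr in p]
                if not matches:
                    return None
                return self.vrfs[vrf][max(matches, key=lambda p: p.prefixlen)]

        r = Router()
        r.add_route("lhcone", "192.0.2.0/24", "geant-vrf-peer")      # invented names,
        r.add_route("global", "192.0.2.0/24", "commodity-upstream")  # RFC 5737 prefix
        print(r.lookup("lhcone", "192.0.2.10"))  # -> geant-vrf-peer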

    For the long term, several new technologies were presented (see the slides). Carrier Ethernet in particular (based at the lower layers on PBB-TE and/or MPLS-TP) is seen as the enabling technology for global Layer 2 services. IEEE's Shortest Path Bridging (802.1aq) and IETF's TRILL are interesting technologies for solving the multipath problem, but they have so far been aimed at data center environments and would require more R&D. Software-Defined Networking has high potential to solve all of the above issues, but here too the technology is brand new, albeit with strong industry support. There was agreement that these are the right directions for the LHCONE in the long term, and an SDN-based LHCONE is the target.
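    As a rough illustration of the multipath problem that 802.1aq and TRILL address, the Python sketch below (topology and flow identifiers are invented) finds the equal-cost shortest paths that a classic spanning tree would prune down to a single path, and hashes flows across them so that all links stay in use:

        # Illustrative sketch of Layer 2 multipath: keep all equal-cost
        # shortest paths and spread flows over them, instead of blocking
        # redundant links as spanning tree does.
        from collections import deque

        def equal_cost_paths(graph, src, dst):
            # All shortest paths from src to dst (unit link cost), via BFS.
            paths, best = [], None
            queue = deque([[src]])
            while queue:
                path = queue.popleft()
                if best is not None and len(path) > best:
                    continue
                node = path[-1]
                if node == dst:
                    best = len(path)
                    paths.append(path)
                    continue
                for nbr in graph[node]:
                    if nbr not in path:
                        queue.append(path + [nbr])
            return paths

        # Invented four-switch topology with two equal-cost paths A->D.
        topology = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
        paths = equal_cost_paths(topology, "A", "D")
        for flow_id in ("site1->site2", "site3->site4"):
            # Hash each flow onto one path so its packets stay in order.
            print(flow_id, "->", paths[hash(flow_id) % len(paths)])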

    Overall, five activities have been defined, plus one overarching activity (the names in brackets indicate the advocates of each activity, i.e. those charged with organizing it).

  1. Quick fix using VRF (Eric Boyd, Mian Usman)
    A VRF infrastructure will be established as soon as possible. Based on the experience with the GEANT LHCONE node, once the VRF mesh is working, LHCONE will likely become "production" infrastructure for the LHC very quickly. The initial VRF peers will be the current Layer 2 islands: Internet2 at Starlight and MANLAN, GEANT, and CERN (ESnet's participation is still to be confirmed), with NetherLight attaching to several of the VRF routers over circuits that they supply; that is, the Dutch/Nordic LHCONE node will not have a VRF device. It was agreed from the beginning that only networks that can provide transit will provide the VRF infrastructure.

  2. Layer 2 multipath using 802.1aq or TRILL (Ronald van der Pol, Gerben van Malenstein)
    There was broad agreement that activities 2 and 3 (OpenFlow) are the likely future of LHCONE, but even their advocates agreed that this lies some way in the future.

  3. OpenFlow (Eric Boyd)
    This is seen as a likely adjunct to, or even an implementation of, multipath (activity 2), as well as of the lower layers of dynamic circuits (activity 4 below). OpenFlow is also one of the technologies behind Software-Defined Networking; a minimal sketch of its flow-table model follows after this list.

  4. Point-to-point virtual circuits pilot (Lars Fischer, Gerben van Malenstein)
    There was general agreement that there will be production IDCs in GEANT and several NRENs within the next six months, in addition to those in ESnet and Internet2 (and USLHCNet). The pilot will initially use IDCP, migrating to NSI as soon as it is available (the NSI effort is making fast progress; cf. Lars' slides).

  5. Diagnostic infrastructure (Eric Boyd, Richard Hughes-Jones)
    This is seen as a deployment and interoperability testing exercise. The goal is for each site connected to LHCONE to be able to test end-to-end with all other LHCONE sites.

  Overarching activity: Determine what impact LHCONE will have, if any, on the LHC software stacks and sites, involving the experiments' technical leads (Bill Johnston, Artur Barczyk).
    Information is needed on how LHCONE will impact and integrate with the LHC software and with the sites. For many sites, LHCONE is expected to be transparent to the software, but not to the site network configuration. How will ephemeral (bandwidth-on-demand) virtual circuits be used? How will addressing be handled in that case?
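As an illustration of the OpenFlow model referred to in activity 3, the following minimal Python sketch shows the match/action flow-table idea through which a controller programs a switch. This is a conceptual sketch only, not the OpenFlow protocol itself; the field names, port names, and priorities are invented:

    # Illustrative sketch of the OpenFlow match/action model: the
    # controller installs flow entries; the switch only matches packet
    # fields against its table (control/data plane separation).

    class FlowTable:
        def __init__(self):
            self.entries = []  # (priority, match fields, action)

        def install(self, priority, match, action):
            # Controller-side call: push a rule into the switch's table.
            self.entries.append((priority, match, action))
            self.entries.sort(key=lambda e: -e[0])  # highest priority first

        def forward(self, packet):
            # Data plane: first matching entry wins; fields absent from
            # the match are wildcards. A miss goes to the controller.
            for _, match, action in self.entries:
                if all(packet.get(k) == v for k, v in match.items()):
                    return action
            return "send-to-controller"

    table = FlowTable()
    # Steer an LHCONE VLAN towards a chosen inter-domain port,
    # independently of the technology implementing the path underneath.
    table.install(100, {"vlan": 3000}, "output:port-to-geant")
    table.install(10, {}, "output:default-route")
    print(table.forward({"vlan": 3000, "dst": "t2-site"}))  # output:port-to-geant
    print(table.forward({"vlan": 42}))                      # output:default-route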

Activity 1 will provide the short-term implementation, or "fix", of the current pilot, while activities 2-5 will lead to prototype implementations and will eventually need to merge into a "next generation" service pilot.

The target date for the start of the migration away from the VRF-based solution is mid-2013 to early 2014 (cf. the milestones below). Potential roadblocks and necessary changes in direction have to be identified within one year from now.

Milestones

To set the milestones, we worked backwards from the target of full deployment in time for the LHC restart in late 2014, i.e. production readiness by early-to-mid 2014. Given the R&D nature of activities 2-5, more detailed milestones will be defined in Berkeley. In chronological order:

Jan 2012: VRF solution operational. At the LHCONE meeting in Berkeley, the activity leaders are to report on the timescales for their respective pilots: what can be achieved by a) July 1, 2013, and b) January 1, 2014.

End of 2012: identify potential show stoppers and changes in direction.

Mid-2013 to early 2014: phased migration from VRF to the new infrastructure.

Late 2014: LHC restart; full production use of the new infrastructure.

 

Submitted by David Foster on Mon, 12/05/2011 - 09:57