For Completeness: My Notes from the Washington Meeting

 

See slides for the detailed presentation content. Here are some of the things that were said and discussed.

 

http://indico.cern.ch/conferenceDisplay.py?confId=131550

 

Dave Lambert

  • I2 is interested in LHCONE, which represents the future of global science networking.
  • The globalization of science and networking brings great opportunities.

 

Rob Vietzke

  • A 100 Gbps/wave, 8.8 Tbps network is being constructed by I2 on fiber they own: 20 fiber pairs with a long-term (20-30 year) IRU.
  • It will be a native 40/100 Gbps network, with 10 Gbps services muxed up onto it.
  • IP/MPLS backbone.
  • Sliceable network with OpenFlow, but the commitment is for a production L2 service.
  • Partnering with ESnet on a testbed facility.
  • The new L2 service is instantiated as a distributed open exchange: 30 US nodes, 40 nodes total, are being constructed.
  • How this affects the business model is to be studied, but I2 is committed to this approach.

 

How do we maintain the path diversity we have today once we move to 100G waves? There will of course be many fewer 100G waves.

  • Mix of 10/10 and 4/10 on many paths.
  • Will mix and match to spread over transponder paths.
  • It will be a while before 10G waves run across 100G paths.
  • This is a valid concern and problem.

 

How do we ensure the network is actually used and bring it to the community?

  • One of the purposes of the ESnet testbed.
  • I2 is trying to demystify the network and make sure it is used. Partly this is done through lower access costs, but it will take time.
  • Substantial outreach is needed; it is starting, with various workshops planned and getting under way.
  • Labs have to re-architect their networks to separate out the large science flows from the general purpose IP flows.

 

LHCONE Status

  • Route servers are at CERN and MANLAN, with two more being tested in GEANT.
  • CERN has assigned an address block for the GEANT connectors for LHCONE.
  • There will need to be a clear governance strategy for maintaining the address block (one possible allocation scheme is sketched below).
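
As an illustration only: the actual LHCONE address block and per-connector prefix sizes are not recorded in these notes, so the block, prefix lengths and connector names below are all hypothetical. The sketch shows how an assigned block could be carved into per-connector subnets with Python's standard ipaddress module, which is the kind of bookkeeping the governance strategy would need to own.

    import ipaddress

    # Hypothetical values only: the real LHCONE block assigned by CERN is not
    # reproduced here.  Carve a /22 into /27 subnets, one per GEANT connector.
    block = ipaddress.ip_network("10.100.0.0/22")                     # placeholder block
    connectors = ["DE-aggregator", "IT-aggregator", "FR-aggregator"]  # made-up names

    allocations = dict(zip(connectors, block.subnets(new_prefix=27)))
    for name, subnet in allocations.items():
        print(f"{name}: {subnet}")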

 

Open Exchange Points

  • SURFnet is waiting for the Ciena dual-carrier 100G transponders for the GVA-AMS connectivity.
  • 2 European prototypes
    • The Open Exchange prototype
    • The GEANT prototype (an aggregator network or distributed exchange?)
    • It was agreed in Paris that these should be connected.
  • Exchange points allow many different types of connectors, potentially including commercial connectors.
    • Commercial companies may offer services.
  • Anyone can connect to an exchange but not necessarily to LHCONE as a project.
  • For interconnected exchanges anyone can connect to anyone else by mutual agreement. The exchange point owner does not dictate this.
  • Does the link provider for connectivity between exchanges dictate who can exchange traffic?
    • No, but links can have a restricted purpose, e.g. “LHC Only”.
    • These links will be a scarce resource, and there are issues of policy, cost and scalability.
    • User communities (e.g. LHC) could have links dedicated to their purposes between exchange points. This is essentially a traffic-engineering problem; the exchange point infrastructure has to be able to enforce such policies.
  • Is an open exchange non-blocking?
    • For a single exchange point, yes; a distributed exchange is a more difficult problem because the resources interconnecting exchange points are scarce.
    • Transparency from the client's viewpoint is the real requirement.
  • What is LHCONE?
    • LHCONE is not an open exchange point in this context as it is only open to the LHC community.
    • LHCONE is an overlay on existing open exchange points with specific policy.
    • Some resources may be dedicated to LHCONE (e.g. TA circuits).

 

LHCONE – North American Components

  • 3 nodes in North America: Starlight, MANLAN and McLean.
  • Interconnected via VPLS and L2 VPNs on existing networks.
  • Many T2s are already connected for reaching the T1s.
  • I2 will also act as an aggregator network, allowing sites to come in over that path as well.
  • The architecture allows both shared VLAN and P2P dynamic services, and I2 would use a combination of these.
  • Chicago
    • FNAL will provide a 6509 for policy implementation
    • LHCOPN should be untouched and not be broken
  • North America is building a transparent distributed exchange.
    • If you come into separate nodes, it is, from a policy standpoint, as though you both came into the same switch.
  • Washington
  • The switches in the exchange points not only implement the policies but are also fundamental to the services.
    • Some sites will come in through the general-purpose part of an exchange point and will need to be patched to the LHCONE switch if there is a dedicated one per exchange point.
  • Unresolved questions
    • If a link looks, policy-wise, like a node, how is contention handled?
    • If we are using a general purpose exchange how do we implement the policies for the community?
    • How is that handled across exchange points?
  • Need to revisit the architecture document and look at the policy associated with each service.
  • How Canada and Asian aggregation would be stitched into the US infrastructure has yet to be clearly defined.
    • Toronto is the first target; the rest of the T2s in Canada will then be aggregated.
  • Additional hardware in the exchange points, which can be removed later, is appropriate to get the work going.
  • In Starlight the commercial providers don't interwork with each other; they are there to serve the community.

 

TA Connectivity in LHCONE

  • An additional TA circuit (NY-Paris), on behalf of GARR, DFN and RENATER, is foreseen to be implemented by DANTE for circuit purposes.
  • Experiments are performing baseline tests today before moving to LHCONE.
  • The idea is to have multiple exchange points in Europe and US with multiple links for redundancy and multipoint services.
    • But technology currently limits this when using layer 2 between regions; L3 would not be a problem.
      • FabricPath from Cisco is available now and handles a scalable mesh of L2 links.
      • Other pre-standard technologies are available.
    • Is the problem the internal network or how the services are presented at the edges?
      • This would insulate the core from the end customers.
    • The idea of a distributed exchange point with the TA in between was not really considered in the architecture document.
  • Some countries have stated that the routed IP services are sufficient and they should be able to continue to do the physics.
    • But will they continue to be competitive?
  • The interest in LHCONE has grown and there is growing demand for sites to be connected.
    • But how can this be managed and controlled to make sure the priority is to satisfy the science?
    • What are the success criteria?

 

LHCONE Implementation in GEANT

  • New switches in the GEANT PoPs will be available around a year from now.
  • perfSONAR work is being done in collaboration with the US.
  • First priority is the shared VLAN service.
  • VPLS is a problem across the TA links because link instability will cause topology instability.
    • VPLS is therefore constrained to inside Europe.
  • Costs are minimal inside Europe for the prototype
    • Ports, route servers.
    • Operations costs are yet to be made clear.
  • All connectors need to peer with all route servers on all continents (as it is a shared VLAN service).
    • Why do we need so many route servers?
      • GEANT-connected sites will rely heavily on them.
    • Essentially, the route servers are the control plane for the L3 service (a back-of-envelope session-count sketch follows this list).
    • A more in-depth discussion is needed.
    • Two issues:
      • announcing routes and availability;
      • but this does not eliminate the need for agreement between the sites as to how the traffic will in fact be exchanged.
  • Germany has a 10G ring around the 5 core DE T2s.
  • This is integrated into the MPLS infrastructure around all sites.
  • Prototype connectivity for the shared VLAN service is proposed to be bonded 2x10G links, one from GEANT and one from USLHCNet.
  • There are some significant shortcomings in the loop-free shared VLAN service in terms of resilience.
    • What is the fix? Move to P2P only? To be decided.
    • Another issue is that there is a lot of potential bandwidth that cannot be used.
    • Should the T2-T1 be a P2P service and T2-T2 be a routed infrastructure?
      • FTS dynamic probing is inadequate.
    • Are we simply moving towards a private IP infrastructure?
      • This may eliminate some intractable problems.
      • Port costs may become prohibitive.
  • Are the different approaches in Europe being presented fairly?
    • P2P can use any of the available TA circuits.
    • The shared VLAN can only use a subset of what is available, proposed as the 2x10G bonded circuits.
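
To illustrate why the route servers matter for the shared VLAN service (as noted above, they are effectively its L3 control plane): instead of every connector maintaining a BGP session with every other connector, each connector peers only with the route servers. The numbers below are illustrative, not actual LHCONE counts.

    # Back-of-envelope sketch: BGP session counts on a shared VLAN, with and
    # without route servers.  Both numbers below are hypothetical.
    connectors = 30        # assumed number of connectors on the shared VLAN
    route_servers = 4      # e.g. CERN, MANLAN and two in GEANT, per the notes

    full_mesh_sessions = connectors * (connectors - 1) // 2   # every pair peers directly
    route_server_sessions = connectors * route_servers        # each connector peers with each server

    print(f"full mesh: {full_mesh_sessions} sessions")            # 435
    print(f"via route servers: {route_server_sessions} sessions") # 120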

 

LHCONE in Asia

  • If the ASGC network connects in both Chicago and Amsterdam, we would introduce a loop if it were used for the shared VLAN service (see the small sketch below).
  • The ASGC network could instead be considered an aggregator network and connect as such.
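
A minimal sketch of the loop problem mentioned above, with invented nodes and links: on a single shared VLAN the L2 topology must stay loop-free, so once a second transatlantic attachment closes a cycle, one of the links has to be blocked and its capacity sits idle (the resilience and unused-bandwidth concerns raised in the GEANT section).

    # Illustrative only: nodes and links are invented for the example.
    links = [
        ("ASGC", "Starlight"),       # Chicago attachment
        ("ASGC", "NetherLight"),     # Amsterdam attachment
        ("Starlight", "MANLAN"),
        ("MANLAN", "NetherLight"),   # TA circuit
    ]
    nodes = {n for link in links for n in link}

    # A loop-free (tree) topology over these nodes can use at most len(nodes) - 1
    # links; any extra link must be blocked, e.g. by spanning tree.
    blocked = len(links) - (len(nodes) - 1)
    print(f"{len(nodes)} nodes, {len(links)} links -> {blocked} link(s) must be blocked")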

 

Challenges of the Shared VLAN service – Edoardo Martelli

  • The original idea was that the shared VLAN should be small, but the ambition has now grown much bigger. This is a risk to stability.
  • There should be a small core network run by one operator.
    • 2 nodes in Europe, 2 in the US
  • All the rest are aggregation networks.
  • Separate devices collocated at open exchange points dedicated to LHCONE.
  • This applies only to the shared VLAN service; the P2P services across many open exchanges remain valid.
  • Aggregator networks only connect to a single core node.
  • The core could be a shared VLAN or a routed network.
    • The problem of unwanted traffic between sites in an L3 core could be fixed with BGP communities (an illustrative sketch follows this list).
  • The proposal is to replace a single global shared VLAN with a hierarchy of shared VLANs.
  • So the whole thing could in fact be a layer 3 network.
    • This re-casts LHCONE as a private internet. But this would be a major change.
  • There is a natural LHC break in mid 2013.
    • Can engineer something now and review later but it must be future proof and we need to keep the confidence of the user communities.
  • USLHCNet could run the core and create a resilient TA infrastructure based on layer-1 as it does today.
    • Requires dedicated TA capacity to the core.
  • Possibly I2 could run the core infrastructure?
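
As a purely illustrative sketch of the BGP-community idea above: the community value, prefixes and policy are invented for the example and are not taken from any LHCONE configuration. The point is simply that routes tagged as belonging to LHCONE participants are accepted and everything else is dropped, which keeps unwanted traffic off an L3 core.

    # Hypothetical policy: accept only prefixes carrying an "LHCONE participant"
    # community.  The tag value 65000:100 is made up for this example.
    LHCONE_COMMUNITY = (65000, 100)

    def accept_route(prefix, communities):
        """Accept a prefix only if it carries the LHCONE participant community."""
        return LHCONE_COMMUNITY in communities

    announcements = [
        ("192.0.2.0/24",    [(65000, 100)]),  # tagged as LHCONE -> accepted
        ("198.51.100.0/24", []),              # untagged general-purpose prefix -> rejected
    ]
    for prefix, comms in announcements:
        print(prefix, "accepted" if accept_route(prefix, comms) else "rejected")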

 

Open Lightpath Governance – Bill St Arnaud

  • Next meeting for discussing this will be in Rio.
  • Establishing an Open Lightpath exchange in London is a priority.
    • Provides an additional landing point for international capacity.
  • Important to raise the awareness with funding bodies.
  • Open exchanges are relevant to other sciences as well, as more and more data-intensive sciences get started.

 

Discussion

  • There are 2 main issues
    • Governance
    • Success Criteria
  • What is the starting point for the prototype?
  • Governance
    • Governance of what?
      • What is the change management process?
      • What is the capacity planning process?
        • Who is using what capacity.
        • How capacity is being used is more complex to track at L2 and with P2P services.
        • Needed to defend future funding requests.
      • There are many stakeholders.
      • Activities have been consensus driven so far.
    • Architecture document identifies
      • Policy and requirements for participation
      • Cost sharing
      • Resource allocation
    • Steering committee (can this be done?)
    • Technical committee (how this can be done)
    • Capacity will have to be used equitably by ATLAS and CMS.
      • How would circuit reservations be balanced?
        • Could introduce a number of other complex issues.
      • Need information gathering on usage.
    • Need to agree on the committees.
      • Steering
      • Tech architecture
      • Operations
      • Users and stakeholders
    • Need to agree on the questions to the committees.
    • Need to agree on the constituent groups.
      • Network Providers
      • Exchange Operators
      • Experiments
      • Tiers
      • Continents

 

Decision: Everyone to sign up for their favorite committee on the whiteboard

 

Success criteria

  • Experiments want to benchmark today's performance and show improvements.
  • Certain European sites have problems with some US sites today.
  • Need to separate the network component from other end-to-end effects. Active monitoring, e.g. perfSONAR, is important in order to drill down and isolate problems.
  • Funding agencies need to know how it impacts the science.
    • Is it a good service from the end site's point of view?
    • Is it a reasonable service from the network provider's point of view?
  • Need to establish a rough consensus on the goals.
    • Long term objective
    • Short term goals
  • E.g. a set of target services and an implementation plan on the timescale of a few months.
  • LHCONE was also proposed as a defensive measure to protect GP IP services from high science traffic.
  • Measures of service availability are important.
  • The criteria need to be defined as a function of the specific services being offered.
  • There is a difficulty in matching network issues to the user experience.
  • Need a one page document on success criteria.
    • A one page brief is being prepared by ATLAS. Probably equally applicable to CMS.
    • This can be the starting point. To be circulated to the mailing list.
    • Objective is to have this worked out before the next LHCOPN meeting.

 

Transatlantic resilience

  • Be careful with the term “prototype”. Users who engage with it will expect it to continue and be reliable, even if they will accept some problems.
  • If this is not a prototype leading to something, then it is more of a testbed, and it will be hard to get user involvement in that.
  • New core architecture from I2 showing a SONET mesh connected to the general R&E edge routers.
  • USLHCNet proposes creating a L1 redundant infrastructure for TA connectivity to which the exchange points can connect.
    • Nordunet says they can provide that
    • I2 could also provide that.
    • GEANT proposes some variant
  • This is for the P2MP service (shared VLAN). The P2P service would be able to do anything.
  • This is a “now” solution. In the medium term it could be replaced.
  • GEANT can also provide resilience on TA links.
  • Could we imagine a situation with a single infrastructure and operation run by one entity, with different parties contributing to one architecture?
  • There is a lot of self-interest in terminating links on one's own boxes. The open exchanges were supposed to fix this.
    • But there are multiple “open” exchanges: NetherLight and the GEANT PoP in Amsterdam (in the context of LHCONE, the GEANT PoP is considered to be “open”).
  • GEANT proposal combines USLHCNET infrastructure and NREN owned resources as well as connection to Netherlight.
  • It is important that the shared VLAN infrastructure be managed and controlled by a single entity.
  • But this does not scale, and we need distributed management and responsibility; the shared VLAN may require this as an exception.
  • There are big operational questions in such complex, multi-domain networks.

 

Shared VLAN Service

  • Do we really need it?
  • If not, dynamic capabilities would be available in the US and for Europe it would be static for the next 9 months or so.
    • Uptake would be slower and have to work with the sites.
  • Shared VLAN poses a lot of technical issues
    • Concentrate on dynamic IP service
    • Build an IP backbone instead
  • Another option is to use local VLANs and then route between the local VLANs.
    • Would support multiple paths between segments.
  • Charge to the technical architecture group
    • P2P Service
    • The interconnected local VLAN service (Joe's solution)
  • But … we re-introduce routers
    • Potentially adds additional cost
  • The target, by the next LHCOPN meeting, is to have major progress on the prototypes, including Joe's solution, and also to make progress on P2P.
    • Technical architecture group to work on this.

Science DMZ

  • The T1 sites in the US and some other sites, including T2s, are taking the Science DMZ approach.
  • Talking to and convincing the right people at the sites is a challenge.
    • It adds complexity, including DNS services that need to respond differently to internal and external queries (a split-horizon sketch follows this list).
    • It can lead to a big reduction and simplification of firewall rules.
    • All IPv6 hosts can run a firewall, and the security issues should be moved back to the end hosts.
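
A small sketch of the split-horizon DNS idea mentioned above; the names, networks and addresses are invented for the example and are not from any site's configuration. The same hostname resolves to an internal address for on-site clients and to the Science-DMZ-facing address for everyone else.

    import ipaddress

    # Hypothetical internal network and records for a data-transfer node (DTN).
    INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
    RECORDS = {
        "dtn.example.org": {"internal": "10.1.2.3", "external": "192.0.2.10"},
    }

    def resolve(name, client_ip):
        """Return the internal or external address depending on where the query comes from."""
        view = "internal" if ipaddress.ip_address(client_ip) in INTERNAL_NET else "external"
        return RECORDS[name][view]

    print(resolve("dtn.example.org", "10.1.5.9"))      # internal client -> 10.1.2.3
    print(resolve("dtn.example.org", "203.0.113.7"))   # external client -> 192.0.2.10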

Submitted by David Foster on Tue, 07/12/2011 - 17:01