WP 6 - High Bandwidth on Demand - draft of MS601

This page is a work-area for the WP6 participants to work on the Milestone-601 document. It is not (yet) the final document.

“Milestone 601 will announce the “Overall BoD system architecture” which describes the BoD application in terms of requirements, functionality and evaluation criteria. This will include the selection of the underlying BoD systems based on scalability, reach (participating NRENs and paths to radio telescopes), openness (public specification and standard track), interoperation with other BoD systems and expected robustness.”

MS601 draft document

1.0 Introduction

In the EXPReS project, NRENs together with radio astronomical institutes built a network for real-time e-VLBI observations with EVN telescopes. This network is carried by the NREN backbones and GÉANT, with support from GLIF for the intercontinental lightpaths. These connections, most of them lightpaths, are currently permanently allocated to e-VLBI and run from the various observatories to JIVE. With new telescope backends such as DBBCs, many EVN telescopes will be able to support higher data rates than the current 1 Gb/s, which in turn will increase the sensitivity of the observations. Keeping these connections permanently allocated, especially at the bandwidths e-VLBI will need in the future, would be an inefficient use of networking resources. More flexibility in the topology of the network would enable the use of other telescopes and correlators, and enable experiments with distributed correlation. The high bandwidths required for e-VLBI mean that this traffic cannot generally be handled by regular routed networking connections.

At the same time, many NRENs are deploying 'bandwidth-on-demand' systems that allow end-users to provision a dedicated path within their network themselves. This presents an opportunity to build a much more flexible e-VLBI network that is only instantiated as needed for observations. The interesting challenge here is that the international character of e-VLBI observations requires networking paths that each span several administrative domains. Inter-domain Bandwidth-on-Demand services are just starting to be deployed, and e-VLBI will be one of their first major users.

This document describes the architecture of the BoD system that will be developed and deployed for managing the on-demand connections for e-VLBI and LOFAR LTA data. We evaluate the currently available BoD systems and compare them against the current and future requirements of e-VLBI.

2.0 Requirements

We identified the following requirements:

  • Ability to schedule reservations for Bandwidth-on-Demand paths, both at very short notice (about one hour) and months in advance.
  • Guaranteed bandwidth with low packet loss
  • Connectivity for e-VLBI: 1.024 Gb/s (per station) initially, increasing to 4 Gb/s during the NEXPReS project.
  • Current e-VLBI locations:
    • Netherlands (WSRT)
    • Sweden (Onsala Space Observatory)
    • Finland (Metsähovi)
    • Spain (Yebes)
    • UK (e-MERLIN telescopes)
    • Germany (Effelsberg)
    • Italy (Medicina, possibly Noto, Sardinia)
    • Australia (ATNF telescopes: Parkes, ATCA, Mopra)
    • Puerto Rico (Arecibo)
    • South Africa (Hartebeesthoek)
    • Poland (Torun)
    • China (Sheshan/Shanghai; it would be desirable to also connect Urumqi, Kunming, etc.)
  • Connectivity for LOFAR LTA stations: 10 Gb/s per station
  • Planned LOFAR LTA locations:
    • Groningen
    • Dwingeloo
    • Jülich
    • Amsterdam
  • The testing system (task 3) should be able to generate traffic at 1024 Mb/s, 4 Gb/s and 10 Gb/s
  • The testing system should generate traffic that mimics real e-VLBI traffic as closely as possible

3.0 Overview of current BoD systems (as applicable to this project)

A brief description of each entry should be contributed. Entries should also include the following information: who is developing the system in question, where (in which NRENs) it is deployed, how the specifications are developed and published (openly or not), and interoperation with other BoD systems (where applicable).

3.1 Transport technologies

Several technologies exist that can support dynamic paths; they are introduced only briefly here for background, with links to more thorough descriptions.

  • SDH

Synchronous Digital Hierarchy (SDH) and its North-American variant SONET currently carry most of the lightpaths used by JIVE. In an SDH system, the networking capacity on a wavelength is subdivided into timeslots, and paths are established by assigning fixed timeslots to a connection. The advantages of this approach are guaranteed bandwidth and very low latency, as en-route equipment does not have to inspect the content of the traffic being transported.
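
As an illustration of how a station data stream maps onto SDH timeslots (assuming the virtual concatenation of VC-4 containers commonly used for e-VLBI lightpaths; the container sizes available on any particular path are an assumption here), a 1.024 Gb/s stream requires

\[
n = \left\lceil \frac{1024\ \mathrm{Mb/s}}{149.76\ \mathrm{Mb/s}} \right\rceil = 7
\]

concatenated VC-4 containers, i.e. a VC-4-7v path of roughly \(7 \times 149.76 \approx 1048\ \mathrm{Mb/s}\).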

  • Metro Ethernets

Although 'regular' Ethernet is not well suited to long-haul data communication, the ubiquity of Ethernet-enabled equipment makes it very attractive to adapt it for wide-area networking. Metro Ethernet is an emerging standard that describes an Ethernet variant specifically targeted at metropolitan and wide-area networks.

  • OTN
  • MPLS

Multiprotocol Label Switching (MPLS) encapsulates networking packets by prefixing a destination 'label' to the payload. En-route networking equipment only has to decode the label to make a fast routing decision. In this way, MPLS can offer both short latencies and the advantages of statistical multiplexing.
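
As a minimal illustration of what 'prefixing a label' means at the packet level, the sketch below packs and decodes the standard 32-bit MPLS shim header defined in RFC 3032 (20-bit label, 3-bit traffic class, bottom-of-stack flag, TTL); it is not code from any of the BoD systems discussed in this document.

<code python>
import struct

def pack_mpls_shim(label: int, tc: int, bottom_of_stack: bool, ttl: int) -> bytes:
    """Pack the 32-bit MPLS shim header (RFC 3032): 20-bit label,
    3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL."""
    word = ((label & 0xFFFFF) << 12) | ((tc & 0x7) << 9) \
         | ((1 if bottom_of_stack else 0) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)

def unpack_mpls_shim(shim: bytes) -> dict:
    """Decode a shim header; en-route equipment only needs the label
    field to make its forwarding decision."""
    (word,) = struct.unpack("!I", shim)
    return {"label": word >> 12,
            "tc": (word >> 9) & 0x7,
            "bottom_of_stack": bool((word >> 8) & 0x1),
            "ttl": word & 0xFF}

if __name__ == "__main__":
    shim = pack_mpls_shim(label=16, tc=0, bottom_of_stack=True, ttl=64)
    print(unpack_mpls_shim(shim))
</code>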

3.2 BoD management systems and interfaces

  • OpenDRAC

OpenDRAC is a Network Resource Manager (NRM) created by Ciena (formerly Nortel) in collaboration with SURFnet. It has been used inside SURFnet6 since December 2008 as a building block of SURFnet's Dynamic Lightpath (BoD) service. OpenDRAC was originally created as a single-domain NRM, and has been open source since April 2010.

Fig. 1: OpenDRAC

In order to cope with inter-domain BoD reservations, OpenDRAC participated in the Automated GOLE project to demonstrate inter-domain capabilities in October 2010. This demonstration showed multiple GLIF Open Lightpath Exchanges (GOLEs) creating inter-domain lightpaths dynamically. At present, OpenDRAC configures both Ciena OME6500 (SONET/SDH) and Force10 (Ethernet) equipment, and due to the modular nature of the software it is relatively simple to add support for other Layer 1 or Layer 2 network elements. OpenDRAC can be invoked by a user via a web GUI, or via Web Services, for which an example application is available.

  • OSCARS

The On-demand Secure Circuits and Advance Reservation System (OSCARS) is developed by the US Energy Sciences Network (ESnet). OSCARS can schedule virtual circuits on a Layer 2/Layer 3 infrastructure. At this moment ESnet uses OSCARS version 0.5 in a production environment. A much more modular version of OSCARS is under development and will be released as version 0.6 in mid-2011. The aim of this new version is to remain usable in a production environment while also serving as a vehicle for network research (for example, the testing of new path computation algorithms). To this end, the OSCARS Reservation Manager consists of several modular components, each with its own specific task (for example Scheduling, Topology, Path Finding, and AAA), all of which are exposed externally as web services.

Fig. 2: OSCARS

OSCARS uses OSPF-TE for topology and resource discovery, RSVP-TE for signaling and provisioning, and MPLS for packet switching. When OSCARS has calculated a virtual circuit, RSVP is used for the hop-by-hop provisioning and signaling. ESnet developed the IDC protocol (IDCP) together with DANTE, Internet2 and Caltech as part of the DICE collaboration, to enable inter-domain circuit setup in a homogeneous OSCARS network. OSCARS uses the Fenius API to talk to other network resource management systems such as OpenDRAC.

  • DCN

3.3 Inter-domain BoD management systems

  • Autobahn

The Automated Bandwidth Allocation across Heterogeneous Networks (AutoBAHN) NRM was developed during the GÉANT2 project. AutoBAHN aims to couple existing NRMs to make inter-domain circuit provisioning possible, and supports both Layer 1 and Layer 2 circuits. AutoBAHN revolves around the Inter-Domain Manager (IDM), which is responsible for all inter-domain operations such as path finding, scheduling and authorization. Not all functionality is fully implemented yet; for example, the authorization is based on eduGAIN, which is still under heavy development. The IDM communicates with the local Domain Manager (DM); the DM in turn implements domain-specific proxies to interact with a range of generic or vendor-specific technologies for data-plane control.

Fig. 3: AutoBAHN

Since the release of the AutoBAHN software by the GÉANT2 project in mid-2008, there has not been much uptake in actual installations inside network domains. As part of the GÉANT3 project, however, the use of AutoBAHN is being promoted; reportedly, four or five European National Research and Education Networks plan to deploy AutoBAHN sometime during 2011.

  • IDCP

The InterDomain Controller Protocol (IDCP) was developed as part of the DICE collaboration. The protocol has been implemented and is in use by multiple organizations for dynamically provisioning network resources across multiple administrative domains. It supports an architecture for dynamic networking, the concept by which network resources (e.g. bandwidth, VLAN numbers) are requested by users, automatically provisioned by software, and released when they are no longer needed. This contrasts with more traditional “static” networking, where network configurations are made manually by network operators and usually stay in place for long periods of time.

This work was originally developed as part of the DICE Control Plane Working Group. DICE is an acronym for DANTE, Internet2, CANARIE, and ESnet. However, a larger set of organizations is now involved in this work.

As the name suggests, the IDC protocol specifically addresses issues related to dynamically requested resources that traverse domain boundaries. In both the static and the dynamic case there must be extensive coordination between the domains to provision resources. In the static case this requires frequent communication between network operators making manual configurations, and can take weeks to complete depending on the task. In the dynamic case, the IDC protocol automates this coordination and allows for provisioning in seconds or minutes. Interactions between domains are handled using messages defined in the protocol.
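
To make the request/provision/release lifecycle described above concrete, the following is a purely hypothetical sketch: the class and method names are illustrative placeholders invented for this document, not the actual IDCP or OSCARS API.

<code python>
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CircuitRequest:
    # All fields are illustrative; real IDC requests also carry security
    # tokens, topology identifiers, VLAN ranges, etc.
    src_endpoint: str
    dst_endpoint: str
    bandwidth_mbps: int
    start: datetime
    end: datetime

class HypotheticalBodClient:
    """Placeholder client showing the reserve/provision/release lifecycle;
    none of these method names belong to a real NRM API."""

    def reserve(self, request: CircuitRequest) -> str:
        # A real client would send a (SOAP) message to its local controller,
        # which coordinates the reservation with the downstream domains.
        print(f"reserve {request.bandwidth_mbps} Mb/s "
              f"{request.src_endpoint} -> {request.dst_endpoint} "
              f"from {request.start} to {request.end}")
        return "reservation-0001"          # dummy reservation identifier

    def provision(self, reservation_id: str) -> None:
        # Triggered at (or shortly before) the reserved start time.
        print(f"provision circuit for {reservation_id}")

    def release(self, reservation_id: str) -> None:
        # Resources are handed back once the observation has finished.
        print(f"release resources of {reservation_id}")

if __name__ == "__main__":
    client = HypotheticalBodClient()
    req = CircuitRequest("telescope-A:eth1", "jive-correlator:eth3", 1024,
                         start=datetime.now() + timedelta(hours=1),
                         end=datetime.now() + timedelta(hours=13))
    rid = client.reserve(req)
    client.provision(rid)
    client.release(rid)
</code>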

The IDC protocol defines messages for reserving network resources, signaling resource provisioning, gathering information about previously requested resources, and basic topology exchange. These messages are defined in a SOAP web service format. Since all messages are defined using SOAP, the protocol also utilizes a few external web service protocols and XML descriptions for features such as security and topology description. The complete list of supported messages defined by the IDC protocol is contained within a Web Services Description Language (WSDL) document. At present only the OSCARS NRM fully implements the IDCP. In the past, translators were written to allow inter-domain provisioning calls to be made to other NRMs, but these translators needed to be rewritten every time changes were made to the IDCP or to one of the NRMs. To overcome this problem, the Fenius protocol was introduced, which is now implemented by all major NRMs. It is expected that once Fenius is superseded by the NSI protocol, the IDCP will become obsolete.

  • Fenius

The standardization of the Network Service Interface (NSI) progressed slowly during the past two years. Within the GLIF community people were ready to show that they were capable of handling inter-domain dynamic services in a pre-production environment. As a result, the Automated GOLE Pilot was set up in February 2010. The goal of this pilot was to demonstrate the inter-domain capabilities of GOLEs and networks with an intermediate inter-domain protocol solution: Fenius. Fenius was designed and built for demonstration purposes and to provide the NSI working group with input based on practical experience; for example, topology distribution and decentralized (chain) path finding were not implemented. All GOLEs and networks in this project (MANLAN, StarLight, NORDUnet, NetherLight, CERNLight, ESnet, Internet2, JGN2, UvA) participated with the understanding that NSI would be the final solution for handling inter-domain reservations.

  • NSI

The Network Service Interface (NSI) is an inter-domain protocol that is being standardized within the Open Grid Forum (OGF). It is expected that a preliminary draft of this protocol will be available in the first half of 2011, after which it is likely that all current Automated GOLE NRMs that implement Fenius will adopt this new standard soon thereafter.

  • The Automated GOLE

As a proof of concept, the use of Fenius in a heterogeneous inter-domain NRM setup was demonstrated in the Automated GOLE pilot during GLIF 2010 and Supercomputing 2010. The Automated GOLE Pilot is a project initiated by the GLIF community to create an infrastructure of multiple GOLEs that allows automated user agents to request VLAN connections from a terminus at any one of the GOLEs, across the multi-domain GOLE fabric, to another edge terminus likewise attached to some other GOLE. Automated agents within the application and within the networks, communicating to realize the end-to-end connection, perform the entire process. Participating Automated GOLEs (A-GOLEs) are Ethernet switching nodes with control software that re-configures the GOLE switches along a selected path to establish a dedicated VLAN between the two end points. This VLAN can be reserved in advance for a specified time, and is provisioned with dedicated capacity and performance characteristics guaranteed between the two end points. The A-GOLE control plane currently uses the open-source Fenius software (developed by ESnet) to translate between a diverse set of provisioning packages such as DCN/OSCARS, AutoBAHN, Argia/UCLP, G-lambda, and DRAC. The project will inform and incorporate the OGF Network Service Interface (NSI) standards as they are finalized and released.

The participating A-GOLE Pilot operators are:

  • NetherLight (NL – Amsterdam)
  • StarLight (US – Chicago)
  • MANLAN (US – New York City)
  • JGN2Light (JP – Tokyo)
  • CernLight (CH – Geneva)
  • NorthernLight (DK – Copenhagen)
  • CzechLight (CZ – Prague)
  • PSNCLight (PL – Poznań)
  • University of Amsterdam (NL – Amsterdam)

Participating networks attached to the GOLEs and integrated into the A-GOLE control framework are:

  • Internet2 ION (US)
  • KDDI (JP)
  • CESNET (CZ)
  • AIST (JP)
  • Essex (UK)

3.4 Mapping of JIVE/LOFAR sites to current and expected BoD systems

4.0 Evaluation criteria

4.1 BoD Path testing

From our experience with e-VLBI, it is important to test a path in advance of an observation, to detect any abnormalities while there is still enough time for possible repairs. The traffic used for testing should mimic the real e-VLBI traffic as closely as possible, but the VLBI recording equipment needed to generate 'real' traffic is often not available at testing time, as it might be in use for another kind of observation at the telescope. A dedicated bandwidth-testing system will help alleviate this problem and make more thorough and standardized testing of paths possible. The test-traffic generator should be situated close to the VLBI recording equipment, so that the test traffic utilizes as much of the same networking path as the real traffic would. The simulated traffic should be the same in terms of packet size, source/destination IP and port numbers, and packet spacing. One of the decisions we still have to make is whether an FPGA-based system or a high-performance PC would be best suited to generate this traffic, and to receive it at the other end.
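
A minimal sketch of such a generator is shown below, assuming UDP transport; the payload size, destination address/port and target rate are illustrative values only, and a production tool (such as UDPMon, or an FPGA-based generator) would pace packets far more precisely than a simple Python loop can.

<code python>
import socket
import time

def generate_test_traffic(dst_ip: str, dst_port: int,
                          payload_bytes: int = 8192,   # assumed packet size
                          rate_mbps: float = 1024.0,   # target sending rate
                          duration_s: float = 10.0) -> None:
    """Send constant-rate UDP packets that roughly mimic an e-VLBI stream:
    fixed payload size, fixed destination and even packet spacing."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytes(payload_bytes)
    # Inter-packet gap needed to reach the target rate (payload bits only;
    # UDP/IP/Ethernet overhead is ignored for simplicity).
    gap_s = (payload_bytes * 8) / (rate_mbps * 1e6)
    sent = 0
    start = time.perf_counter()
    next_send = start
    while time.perf_counter() - start < duration_s:
        sock.sendto(payload, (dst_ip, dst_port))
        sent += 1
        next_send += gap_s
        # Busy-wait pacing; a production generator needs far better timing.
        while time.perf_counter() < next_send:
            pass
    elapsed = time.perf_counter() - start
    print(f"sent {sent} packets, ~{sent * payload_bytes * 8 / elapsed / 1e6:.0f} Mb/s")

if __name__ == "__main__":
    # Destination address and port are placeholders.
    generate_test_traffic("192.0.2.10", 46227, rate_mbps=1024.0, duration_s=5.0)
</code>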

There are two modes of operation: real-time e-VLBI, in which data are transported as they arrive from the telescope's receiver to the central processor (the correlator at JIVE) for correlation and subsequent processing; and storage mode, in which data are recorded at the remote telescope (or in the cloud, see WP8) and subsequently transported to the correlator. Both modes require verification of the network capability on timescales that are sufficient to allow corrections to be made before the radio astronomy data are transmitted. The first mode is very time-critical and will also need the link characteristics to be checked on timescales of minutes to around one hour before the observations start. The second is not so time-critical and can be planned well in advance.

The time at which the BoD path is needed is generally planned well in advance (weeks), i.e. for scheduled observations; however, target-of-opportunity observations could result in very short notice (hours or less). Both possibilities should be allowed for when scheduling tests and checks on the network.

Tests are needed to:

  • Confirm connectivity
  • Measure the RTT and keep a log (useful for discovering path changes); a minimal sketch of such a logger is given after this list
  • Check the available bandwidth and performance: throughput, latency and packet loss
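
The RTT logging mentioned above could look like the sketch below, assuming a Unix-like host where the system ping utility is available and ICMP is permitted on the path (which is an assumption; some lightpath configurations may require a different probe).

<code python>
import subprocess
import time
from datetime import datetime, timezone

def log_rtt(host: str, logfile: str = "rtt.log", interval_s: float = 300.0) -> None:
    """Periodically ping a host and append timestamped RTTs to a log file;
    sudden jumps in the logged RTT can reveal that the path has changed."""
    while True:
        result = subprocess.run(["ping", "-c", "1", host],
                                capture_output=True, text=True)
        rtt_ms = None
        for token in result.stdout.split():
            if token.startswith("time="):        # e.g. "time=12.3" (ms)
                rtt_ms = token.split("=", 1)[1]
        timestamp = datetime.now(timezone.utc).isoformat()
        with open(logfile, "a") as f:
            f.write(f"{timestamp} {host} {rtt_ms}\n")
        time.sleep(interval_s)

if __name__ == "__main__":
    log_rtt("192.0.2.10")   # placeholder destination
</code>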

The main point is to confirm that the BoD links will give the performance demanded by the VLBI observations, primarily the required bandwidth: this ultimately dictates the signal-to-noise ratio of the observations, and observations of weak sources will require the maximum bandwidth available.

Achieving this bandwidth may result in increased latency and packet loss. Links with excessive or variable latency will cause problems during correlation. In practice, delays of up to 1 second can be tolerated by the correlator, but the allowable variation between packets is much smaller. Packet loss rates of up to around 1% can be acceptable, since loss only proportionately affects the final signal-to-noise ratio (Spencer et al. 2004).
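
As a rough illustration of the previous two paragraphs (using the standard radiometer scaling, assuming the extra data rate goes into observing bandwidth at the same bit depth, and using the proportionality quoted above; exact factors depend on the correlator and observing mode):

\[
\mathrm{SNR} \propto \sqrt{\Delta\nu\,\tau}, \qquad
\mathrm{SNR}_{\mathrm{loss}} \approx (1 - p)\,\mathrm{SNR}_{0},
\]

so going from 1 Gb/s to 4 Gb/s per station improves the sensitivity by roughly a factor of \(\sqrt{4} = 2\), while a packet loss rate of \(p = 1\%\) costs only about 1% in signal-to-noise.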

Fig. 4: A suitable networking configuration for testing

The switches (fitted with 10 GE ports) could be ones already in operation at the sites; there will need to be both CX-4 and optical interfaces. Switch availability will be checked before implementation of the test system; several sites, including JBO and Onsala, already have suitable switches. The interface with the storage systems is part of WP8.

There are two possible ways to implement the testing system, depending on how the project proceeds:

  1. A PC-based system using a high-specification PC running Linux with 10 GE output cards (see e.g. Hughes-Jones and Kershaw 2007). The main programme is UDPMon, which gives suitable diagnostics. The PC must be able to sustain high-speed real-time data transfers, so some testing of commercial systems will be required; the inter-packet timing estimate after this list illustrates the rates involved.
  2. Using a specialised FPGA device, e.g. iNetTest (Hughes-Jones and Hargreaves 2009). Additional firmware and a transfer to a more capable platform (ROACH boards produced by UC Berkeley) may be required to give full functionality. Commercial devices also exist.
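
To give a feeling for the real-time constraint in option 1 (assuming, purely for illustration, a UDP payload of 8192 bytes; the actual packet size of the test system is still to be decided), the required inter-packet spacing at rate R is

\[
\Delta t = \frac{8192 \times 8\ \mathrm{bit}}{R} \approx
\begin{cases}
64\ \mu\mathrm{s} & R = 1.024\ \mathrm{Gb/s} \\
16\ \mu\mathrm{s} & R = 4\ \mathrm{Gb/s} \\
6.6\ \mu\mathrm{s} & R = 10\ \mathrm{Gb/s}
\end{cases}
\]

Sustaining spacing that is both this small and this regular on a commodity PC is part of what makes the PC-versus-FPGA decision non-trivial.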

The plan is to first investigate suitable PCs and implement a trial system. Examination of the detailed requirements of the iNetTest system will also take place, with the decision whether to implement new firmware or to buy commercial devices to be taken in the light of the knowledge gained.

4.2 Quality metrics

A listing of possible metrics to track:

  • Network availability of the BoD paths
  • Failed reservations
  • Aggregate e-VLBI throughput

4.3 User feedback

Incorporating user feedback into the evaluation of the BoD architecture will be difficult. The end users are the astronomers using the VLBI array, but as they only receive the correlated data, they are quite far removed from the data as they are transported across the network. Another view would be that the end user is JIVE, which runs the correlator and has a direct view of the incoming data stream. At JIVE, however, we can generate the more quantitative metrics listed above under 4.2.

5.0 System Architecture

5.1 Functionality

5.2 User Interface

5.3 External Interfaces

5.4 Supported BoD/IDC systems

6.0 References
