WP8 - Provisioning High-Bandwidth, High-Capacity Networked Storage on Demand

WP8 defines high-capacity networked storage usable on demand for e-VLBI and other high data volume applications.

Testing 2011-05-31 With Supermicro

Metsähovi obtained the recommended 36*2TB disk system (see below), and quick testing with shell scripts shows no serious bottlenecks in the system: 36 concurrent processes can write to the disks at an aggregate 40 Gbps, and a simple test through the local network interface shows 19 Gbps. The 19 Gbps test carried extra overhead due to extra memory copies, so it may be representative of the final speed of the system with an optimized stack. This increases our confidence that at least 8 Gbps is achievable with the system. 16 Gbps or even more is potentially reachable, but that would require several parallel 10 Gbps network interfaces and is likely beyond the scope of the project.
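For illustration, the kind of shell-script test described above can be as simple as one dd writer process per disk, all running in parallel. This is a minimal sketch only; the device list, block size and count are placeholders to be adapted, and writing to raw devices destroys their contents:

  #!/bin/sh
  # Rough aggregate streaming-write test: one writer process per disk.
  # WARNING: writes raw data to the listed devices, destroying their
  # contents; the device list is a placeholder for the 36 test disks.
  DISKS="sdb sdc sdd sde"
  for d in $DISKS; do
      # oflag=direct bypasses the page cache, so we measure the disks
      # rather than main memory.
      dd if=/dev/zero of=/dev/$d bs=16M count=256 oflag=direct \
         2> /tmp/dd-$d.log &
  done
  wait
  # Each log ends with a per-disk throughput summary line.
  grep -h copied /tmp/dd-*.log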

Quarterly report, NeXPRES WP8 / details of the results in March

Deliverable D8.2

The main activity was deliverable D8.2, “Hardware design document for simultaneous I/O storage element”, released 28-Feb-2011 (available at: http://www.jive.nl/dokuwiki/lib/exe/fetch.php?media=nexpres:2011-02-28_wp8-d8.2.pdf).

Deliverable D8.2 describes the reasoning for different architectures and suggests two possible concrete designs. However, testing done in March suggests some major modifications to the design. The main points are:

1. It turned out that the Highpoint RocketRaid cards in the design do not work well with new Linux kernels; in particular, the kernel does not receive hot-swap notifications. We now recommend using disk controller cards based on the LSI2008 chip, which has good Linux support. One LSI2008 board has been preliminarily tested and it worked as expected. Note that delivery times for these cards seem to be several weeks everywhere.

2. The motherboard Asus Crosshair IV Extreme turned out to be a little suspect in its PCIe support (as documented in D8.2), but preliminary testing in March suggests it may work fine. It has not been tested with the full configuration yet, and some cards did not work with it.

3. We found one more promising, reasonably priced, almost-COTS solution that we will likely test: a Supermicro 36-disk case with a dual-processor Intel Xeon motherboard and 3+4 true PCIe slots (both CPUs have access to all PCIe slots, but each slot is directly connected to one CPU and only indirectly to the other).

4. A redundant power supply is not feasible to arrange with standard PC power supplies as described in the deliverable. The Supermicro system achieves it with two powerful redundant 12V power supplies and one non-redundant 12V→5V converting and wiring unit.

We plan to revise the deliverable D8.2 during the next quarter.

Co-operation during quarter

Deliverable D8.1 was discussed in a telephone meeting with the other work packages on 12 January. There was some discussion of the details, but there were no concrete suggestions to change it. WP8 had some internal discussions of the authentication model presented in D8.1, and it was reiterated that using SSH with the public key of the central site installed in the authorized_keys file on the buffer (as written in D8.1) is by far the simplest solution and should be secure enough for the intended use. There is, however, a need to write detailed instructions on how to maintain security in some special situations: when a host key has changed, how to distribute the public key in a secure way (an https site), and what to do if the secret key of the central site is compromised.
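As an illustration of this model (a sketch under our assumptions, not text from D8.1; the URL and paths are placeholders), installing the central site's public key on a buffer machine could look like this:

  # Fetch the central site's public key over an authenticated channel
  # (an https site, as suggested above); the URL is a placeholder.
  curl -o central.pub https://central.example.org/keys/central.pub

  # Append it to the buffer account's authorized_keys file with the
  # permissions sshd requires.
  mkdir -p ~/.ssh && chmod 700 ~/.ssh
  cat central.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys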

WP8 leader Ari Mujunen gave a colloquium talk on 18 February 2011 at ASTRON/JIVE: “Storage in Astronomy: Beyond the limits of bandwidth, capacity, and location” (PDF).

Planned co-operation

A meeting with the D8.1, D8.2 and D8.3 teams is planned to verify that D8.3 work can start. A meeting is also planned for May in Aveiro, at the VLBI/SKA workshop, where all WP8 participants will meet.

Details of findings in testing the D8.2 design in March

Disk controllers

Although they were determined to work fine in our EXPReS tests, recent tests have revealed that none of the Highpoint RocketRaid 2xxx series proprietary binary-only drivers we tested work well, or at all, in recent Linux kernels (>= 2.6.32).

Multiple Highpoint cards installed in a single PC seem to have trouble showing their BIOS setup screens. Highpoint cards cannot partition new disks; other controllers (such as motherboard controllers) must be used for the initial setup, because Highpoint insists on using the disks in its proprietary RAID format. Only if a Highpoint controller sees a “legacy” partition table pre-made on a disk will it allow “JBOD”-style usage (a sketch of such pre-partitioning follows the list below). The Highpoint RocketRaid 3xxx series with Intel IOP chips has functional Linux drivers in the standard kernel (hptiop.ko) and decent performance, but there are several serious problems:

1. No hot swap.

2. No access to drive serial number / model info.

3. No access to drive SMART health status info.

4. Relatively expensive; its on-board RAID processor is not really that useful to us.
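For reference, pre-making the “legacy” partition table mentioned above (while the disk is attached to, e.g., a motherboard SATA port) can be done as in this sketch; /dev/sdX is a placeholder for the disk being prepared:

  # Create an ordinary MS-DOS ("legacy") partition table and one
  # primary partition, so that a Highpoint controller will later
  # allow JBOD-style access to the disk.
  parted -s /dev/sdX mklabel msdos
  parted -s /dev/sdX mkpart primary 1MiB 100%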

Currently we are counting on LSI2008-based controllers being the right choice for us (driven by mpt2sas.ko in the standard Linux kernel). This choice was made principally on the recommendation of:

http://blog.zorinaq.com/?e=10

which has an excellent summary of high-performance, low-cost disk controllers for Linux with high disk counts. The LSI2008 chip is also frequently found on-board on server motherboards from manufacturers like Supermicro and Tyan.
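A quick sanity check that an LSI2008 board is indeed driven by mpt2sas.ko, and that hot-swap events reach the kernel (the failing of the Highpoint drivers noted above), can be sketched as follows; the exact output varies per system:

  # Which kernel driver is bound to the controller?
  lspci -k | grep -A 3 -i LSI

  # Is the mpt2sas module loaded?
  lsmod | grep mpt2sas

  # Watch kernel block-device events while pulling/inserting a disk;
  # with working hot-swap support, add/remove events appear immediately.
  udevadm monitor --kernel --subsystem-match=block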

In our past testing, previous-generation LSI controllers did not perform well. An HP-branded LSI1068E controller (driven by mptsas.ko) showed lackluster performance (8 disks at 1.8 Gbps total). We found that its old LSI BIOS left the disk write caches disabled, destroying performance at ~35 MB/s per disk. An LSI BIOS update changed the default to “write cache enabled” and performance was restored (85..142 MB/s per disk, depending on disk model).
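The write-cache pitfall is easy to check for from Linux; a sketch (the device name is a placeholder):

  # Query the drive's current write-cache setting.
  hdparm -W /dev/sdX

  # Enable the write cache if a controller BIOS has left it disabled.
  hdparm -W1 /dev/sdX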

Motherboards

In D8.2, we presented the “Asus Crosshair IV Extreme” as a low-cost, high-performance motherboard for the FlexBuff, albeit admitting that we had done actual testing only with the “Asus Crosshair IV Formula”.

The selection of either board from the “Asus Crosshair IV” series was driven by the fact that their AMD 890FX/SB850 chipset combines the largest number of PCIe Gen 2.0 lanes (to drive the maximum number of PCIe x8 expansion cards) found in a modern COTS PC motherboard design; none of the Intel (Core i7) chipsets (X58, P67) can match that. The ability to choose either non-ECC or ECC memory and the six native 6 Gbps SATA-III disk ports also contributed to this selection, and a six-core CPU at 3 GHz seems cost-effective and powerful enough. A surprise with the Extreme motherboard compared with the previously tested Formula motherboard is that it is over 20 mm deeper, so it does not fit into some of our old test chassis. It still fits well into the chassis we recommend in the D8.2 design.

The three x8 and one x4 PCIe slots of the 200€ “Asus Crosshair IV Formula” (the physical slots are larger but have only this many PCIe lanes connected) are the maximum that the AMD 890FX/SB850 chipset will natively support. Equipped with 16 GB of DDR3-1333 ECC memory and an AMD Phenom II X6 1090T (as shown in D8.2), we haven't (yet) been able to saturate its PCIe or memory bus(es), though we have gotten in excess of 16 Gbps out of a set of 24 disks, using 24 simultaneous processes, each handling its “own” disk with large-buffer O_DIRECT write() calls. The 300€ “Asus Crosshair IV Extreme”, however, has only one x16 slot natively connected to the chipset. The remaining PCIe lanes are connected to a PCIe switch chip, “HydraLogix” (http://www.lucidlogix.com/products_hydra200.html), whose effect on the throughput of PCIe boards connected via it is as yet unknown. With this chip, the board is supposed to offer four slots at (x16 x16 x8 x8) plus one PCIe x1 and one PCI.

We have seen compatibility issues: the old LSI1068E-based controller does not work at all (it does not appear in 'lspci') in any of the slots offered via this HydraLogix chip. However, all Highpoint RocketRaid 2xxx cards, for instance, do appear in 'lspci' (though we don't have any working Linux drivers for them). The board has 6+8 on-board SATA connectors (2 more than the Formula) and a better, more reliable on-board gigabit Ethernet (than the Formula's sky2), so it is probably a fine choice. We have verified that (one) LSI2008 controller works fine in any PCIe slot, including those driven by the HydraLogix.
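When testing controllers behind the HydraLogix chip, it is worth checking not only that a card enumerates at all but also its negotiated PCIe link width; a sketch (the bus address is a placeholder):

  # Does the controller enumerate at all?
  lspci | grep -i SAS

  # Compare the card's link capability (LnkCap) with the link it
  # actually negotiated (LnkSta); 02:00.0 is a placeholder address.
  lspci -vv -s 02:00.0 | grep -E 'LnkCap|LnkSta'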

We have also been looking into a Supermicro X8DTH-6F Intel Xeon motherboard with seven PCIe x8 slots. More about that in the Chassis section.

Chassis

In D8.2 there are two scenarios, the “main AMD in a modified 4U case” (3k€) and the “alternative Intel in Supermicro 24-disk 4U case” SC847A-R1400LPB (4k€). Both of these allow for >= 24 disks (for >=8Gbps; the disk cost is not included in the estimate).

We have now looked into a third chassis alternative, a Supermicro 36-disk 4U case (the highest 3.5" disk density commercially available). The main motivation is that, in the near future, the slowest streaming performance of a single 3.5" SATA drive is going up from ~400 Mbps to 450 Mbps or more; then 36 * 0.45 == 16.2 Gbps, which could well allow simultaneous read and write at 8 Gbps with a dual-port 10GE card. This box, however, has only 2U of height for the motherboard, so all seven PCIe cards have to be low-profile cards and the CPU cooling has to be low-profile-compatible. Using regular Asus motherboards in it might get tricky.

If we consider that a 10GE board costs between 500–1000€ (depending on CX4/RJ45/fiber, single/dual port) and that 24–28–36 disks cost approximately 50€ per terabyte (i.e. starting from 24*1TB == 1400€ up to 36*2TB == 3600€), and take these as the “fixed base costs”, then

a) modded case w/ pwr 1000€, Asus Crosshair IV Formula +CPU+mem 200+200+200, two LSI2008+cables 600 == ~2200€

b) modded case w/ pwr 1000€, Asus Crosshair IV Extreme +CPU+mem 300+200+200, three LSI2008+cables 900 == ~2600€

c) Supermicro 846TQ-R1200B 1600€, Asus Crosshair IV Formula +CPU+mem 200+200+200, two LSI2008+cables 600 == ~2800€

d) Supermicro 846TQ-R1200B 1600€, Asus Crosshair IV Extreme +CPU+mem 300+200+200, three LSI2008+cables 900 == ~3200€

e) Supermicro 847A-R1400LPB (1869€), Supermicro X8DTH-6F (689€), 2x Xeon E5620 (2*382€), heatsinks 76€, 6x 4GB DDR3-1333 ECC (6*67==402€), 4x LSI2008 (4*265==1060€), cables 198€, install 80€, warranty 3y 230€ == ~5400€ (we have an offer from a local Supermicro dealer quoting this price (including 23% Finnish VAT) for a system for delivery in 1–2 weeks).

Given the minuscule price difference between the modded Chieftec 4U case (28 disks) and the Supermicro 846TQ-R1200B (24 disks), modding will not make any sense at all unless:

1. it turns out that Supermicro disk trays (6 mounting screws per tray) do not dampen vibrations enough and disks slow down because of that

2. it turns out that 24 disks are not enough but 28 would be

However, going to the 36-disk 847A-R1400LPB represents a rather large price increase, and it is combined with complete uncertainty as to whether the dual-Xeon motherboard can actually deliver high PCIe and memory bandwidth in the real world. On the other hand, even the possibility of getting much more than 8 Gbps (that is, 16 Gbps), plus the possibility of adding 2×8-port external mini-SAS controller boards to the system to support 16-disk Mark6 “disk cubes”, is relatively exciting.

WP8 Progress Meeting

2011 May 26
Aveiro, Portugal

This meeting was held in conjunction with the workshop “The Growing Demands on Connectivity and Information Processing in Radio Astronomy from VLBI to the SKA.” Full details of that meeting (agenda and presentations) are available from the workshop's website.
