Binaryspiral

It’s like “Hee Haw” with lasers.


My original post included a quick and dirty test of raw hard drive performance using HD Tach to give me an idea of what I was working with on my new DroboPro. As with any benchmark, there are many more metrics I could have tested to get a clearer picture, but I didn’t have the luxury of time to do so.

After publishing my testing, I felt that it was worth going back and getting more data. It’s obvious from the hit count I’ve seen that a few people actually find this topic interesting. I’m in a unique position as a consumer to be able to test this unit in a great development environment against some really good equipment that others can identify with.

So, without further ado… here’s what I did.


Recently I was able to purchase and test two Drobo storage appliances.

A DroboS with five disks and a DroboPro with eight disks. All the disks are the same size, speed, and manufacturer. The specs are listed further below for your consumption.

The DroboS is attached via USB 2.0 to my desktop workstation – a Dell OptiPlex 745 with 4GB RAM and running Windows 7 (64bit). I have an eSATA card on order. I’ll be posting more details on this specific setup at a later date.

The DroboPro is the device I’m most interested in. Depending on the test results that I get, it may live in a remote campus building attached to two small ESX hosts with six virtual servers on them: two domain controllers, two file servers, a print server, and an application server.

Currently our environment is hosted on three MPC DataFrame 120s. These units are no longer covered by manufacturer’s warranty due to MPC’s crash-and-burn bankruptcy. I have two HP/LeftHand 2120 storage modules that could replace them, but that may be overkill. The HP units each have 12 disks and use much more power than our existing MPC units or the Drobo.

Read on for the gory details.

Test Environment

VMware Host:

· HP DL360 G6 (Sixth Generation)

CPU: 1 x Intel Xeon E5530 Quad Core 2.4 GHz with Hyper-Threading enabled (8 logical cores available)
Memory: 3 x 4GB PC3-10600R RDIMMs DDR3
Network: Six total 1Gbps Ethernet ports (2 on-board, 4 port PCIe NIC)

· Local Storage:

HP P400 Raid Controller with 512MB Cache and Battery Backup Unit (for full read/write caching)
4 x 72GB 10k RPM SAS (2 x RAID1)

· NAS Storage

2 x LeftHand (HP) NSM 2120 Storage Modules (mirrored but not load balanced)
2 x RAID 6 arrays combined into one storage pool using SAN/iQ

· VMFS volumes:
3 x 1TB (VMware default configurations)
1 x 250GB (VMware default configurations)

DroboPro

1 x 1Gbps Ethernet connection
Latest firmware (1.4.1)
8 x 500GB 7200RPM SATA
Single drive failure protection (dual disk disabled for testing)

· VMFS volumes:
2 x 2TB (8MB blocks, per best practice documentation from Drobo)

· Network:

1 x HP 3500 YL gigabit Ethernet switch (24 port – 101 Gbps backplane)

· Testing Software:

Host OS: vSphere 4.0
VM OS: Windows XP SP3 (1GB RAM, 9GB HDD)
Notes: VMware tools and latest virtual hardware upgrades applied.
HD Tach 3.0.4.0

Notes:

HD Tach uses base 10 for conversion of bytes to megabytes and gigabytes. I use base 2 to reflect the OS and application measurements.
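The conversion described in the note above can be sketched as follows. This is only an illustration: the function names and the 100 MB/s sample reading are made up for the example, and it assumes the base 10 unit is 10^6 bytes and the base 2 unit is 2^20 bytes.

```python
# Convert throughput figures between base 10 and base 2 megabytes.
# Assumption: HD Tach's "MB" is 10**6 bytes, the OS's "MB" is 2**20 bytes.
DECIMAL_MB = 10**6   # bytes per megabyte, base 10
BINARY_MB = 2**20    # bytes per megabyte, base 2

def decimal_to_binary_mb(mb_per_s: float) -> float:
    """Convert a base 10 MB/s figure into base 2 MB/s."""
    return mb_per_s * DECIMAL_MB / BINARY_MB

def binary_to_decimal_mb(mb_per_s: float) -> float:
    """Convert a base 2 MB/s figure back into base 10 MB/s."""
    return mb_per_s * BINARY_MB / DECIMAL_MB

sample = 100.0  # hypothetical benchmark reading, not a result from this post
print(f"{sample:.1f} MB/s (base 10) = {decimal_to_binary_mb(sample):.2f} MB/s (base 2)")
```

The gap is a little under 5% at the megabyte level, which is small but enough to matter when comparing numbers from different tools.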

First Test:

Test VM on LeftHand 1TB VMFS volume.

[Image: drobo test 1 - LeftHand Throughput]

We start off slow but maintain high throughput with minimal latency, averaging 8.4ms on the random access tests. Average read speed is 51.5 MB/s, with bursts above 61 MB/s throughout the test. Throughput ranges from 28.6 MB/s to over 66 MB/s, mostly staying above 42 MB/s for the entire test.

Second Test:

Test VM on Drobo 2TB VMFS volume.

[Image: drobo test 1 - DroboPro Throughput]

After migrating the same test machine over to the Drobo, I ran the exact same test. We see a much lower top end and very deep valleys in throughput. The test ranges from 10 MB/s at the lowest to a hair over 45 MB/s on the bursts. Throughput averaged 27.56 MB/s with a latency of over 450ms. I reran the test at a later time and had a better result.

Third Test:

Test VM on Drobo 2TB VMFS volume (#2)

[Image: drobo test 2 - DroboPro Throughput]

I reran the HD Tach test on the Drobo, this time using 32MB blocks. The random access latency was much better at 14.4ms. The throughput remained about the same, averaging 29.85 MB/s; the small difference is easily attributed to the larger block size.
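To put the averages from the first three tests side by side, here is a rough sketch that expresses each one as a percentage of the LeftHand result. The dictionary labels and the `relative_to` helper are names invented for this example; the MB/s figures are the averages quoted above.

```python
# Average read throughput reported in the first three tests (MB/s).
results = {
    "LeftHand (first test)": 51.5,
    "DroboPro (second test)": 27.56,
    "DroboPro (third test)": 29.85,
}

def relative_to(baseline: float, value: float) -> float:
    """Return value as a percentage of baseline."""
    return 100.0 * value / baseline

baseline = results["LeftHand (first test)"]
for name, mb_per_s in results.items():
    pct = relative_to(baseline, mb_per_s)
    print(f"{name}: {mb_per_s:.2f} MB/s ({pct:.0f}% of LeftHand)")
```

Both DroboPro runs land somewhere between half and three fifths of the LeftHand average.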

Fourth Test:

Test VM on Drobo 2TB VMFS volume during a storage vmotion:

[Image: drobo test 2 - drobo and vmotion]

I started a Storage vMotion of a powered-off clone of the test VM. This copy was being svmotioned to the second VMFS volume on the DroboPro, and the move finished during the first third of the test. As you can see, disk access for the running machine was crushed by the svmotion process on the vSphere host.

Fifth Test:

Test VM on HP local storage.

[Image: XP VM DAS RAID1]

I also wanted a baseline to prove the virtual machine and the host are not the bottlenecks in these tests. I’ll let the graph speak for itself.

The sustained throughput and burst speeds are more than enough to prove that the vSphere host and virtual hardware have opened up the floodgates to high I/O applications.

Final Thoughts:

I’m expecting too much from this SMB storage device. While I point out the performance issues I found, it really isn’t fair – I’m comparing apples to oranges in this environment.

The Pro is really a good unit and has a lot of potential for SMBs or workgroups. It’s brain-dead simple to use and manage. The Drobo line is made up of devices that “just work” out of the box. Plug it in, slap a pair of hard drives in, and turn it on.

I don’t think I can recommend this unit for the original purpose it was purchased for: shared storage for two ESX hosts and 4-6 virtual servers in a remote campus. Even though it is certified by VMware, it’s apparent from the tests that it would not support more than a few virtual machines, and certainly not any file servers that require a decent storage subsystem to keep users from complaining about slow file access or delays.

This is not to say “don’t buy it” because I really do think it’s a worthy product if you need a lot of storage that can be upgraded over time with very few technical skills. What other device out there can you yank a 500GB drive out and slap in a 1TB and start using it right away? That’s the power of these units.

I can think of a dozen situations here at work where this unit would be a perfect fit. Just not this one.

The lack of monitoring (remote or self) also removes it from our production environment. I brought this question up with Drobo support. They were very prompt in their turnaround: I received a reply with a follow-up question in less than an hour, and a recommendation 45 minutes after that.

They recommended I install the Drobo Dashboard software on a virtual machine and set up email alerts from within the software. The catch is that the dashboard is an application, not a service. I would have to remain logged in for the dashboard to be running. This is not an option.

Alternatives:

The DroboPro is not the biggest dog in Drobo’s kennel. They recently released the Drobo Elite, which offers a faster processor, a second gigabit NIC, and some additional file system features to allow for multiple computer access from the network. The cost is about double what the Pro was.

However, I was hoping for more performance from an 8 drive unit than I got. It’s just not fast enough for my environment. I will post additional test information from my DroboS when I get the eSATA card installed. I’m looking forward to that!

-Update-
I reviewed my test data and methods and wanted a clearer picture of where the DroboPro was on my chart of storage. So I retested the DroboPro with additional software and tests for additional data. Read on to DroboPro Testing, Part 2.

Nehalem quietly


I love deliveries. Especially this time of year. UPS, FedEx, AirTrans, carrier pigeon… I don’t care – it’s usually something expensive and always something that is going to make my job easier.

Today, FedEx delivered a pallet of new HP servers and parts for our new campus. The pallet also contained a few parts for servers we had just got last week.

[Image: Xeon5500]

So I’ve got five new sixth-generation HP DL360 servers. HP just released this line last month with the new Intel Xeon 5500 processors. I like to think of them as the pro version of the consumer i7 chips: four Hyper-Threading cores and an onboard memory controller per chip. And even though it matches the clock speed of our existing G5 servers, it’s smoking fast.

Opening the little 1U chassis shows a lot of room for expansion, given the amount of gear this unit already has. It has an onboard RAID controller that can address up to eight 2.5” SAS or SATA drives, plus an IDE controller for optical media. It also includes a USB port and an SD card slot on the motherboard… great for those moronic copy protection dongles, emergency boot drives, or utilities.

I’m not going to sit here and try to sell you a server by just spewing specs… what HP really did to impress me is cut the noise and power usage so drastically I seriously thought there was something wrong with it.

These servers are usually so loud I can’t build them at my desk – I had to take them and bench build them in our staging room. Not anymore.

I actually had this DL360 G6 installing Windows 2008 64-bit from DVD on a bench next to a Dell OptiPlex 755 sitting idle. When I placed my head between the two to check if the fans were actually spinning on the HP, the Dell was louder. I had never actually heard a SAS drive until today… amazing.

After diving into the onboard monitoring systems, I found out how they are able to keep the fans spinning at 19% while keeping cool: 28 onboard temp sensors watching everything in the box… if a section gets warmer, only the fans dedicated to that area increase their speed, and only as much as needed to move more air to cool it.

With a single quad core processor, three 2GB memory cards, four 10,000 rpm hard drives, and a four-port gigabit NIC PCI-x card, this server only pulled 130 watts of power out of both power supplies at its peak. When it was idling it sat at 93 watts. The only time I ever heard the fans was when I started the server; after that, near silence.

Yes, I’m that impressed with this new line – I’m looking forward to the next year when we upgrade our ESX environment to G6 host servers… Maybe I won’t be able to hear the server room from down the hall.

The college recently purchased a new Sophos Email Security appliance. It was very easy to set up, and I’m looking forward to having PureMessage filter our spam and crapmail attacks. It’ll be a good thing.

The Active Directory integration is not as polished as that of their Web Security appliances. We have two WS1000 appliances, also from Sophos. Both hooked right into AD and pulled down both student and staff accounts without issue. They even indicated what sub-domains they found during the process. Top notch, no-brainer installation.

The problem I’m writing about is the ES4000 appliance’s inability to detect our second domain, which is in the same forest as the domain our service account is in. First off, it couldn’t even automatically detect settings with that service account using the “Detect Settings…” feature. An otherwise undocumented bug was described on experts-exchange.com, with the workaround being to use an account with Schema Admin privileges in the domain’s original Users OU. Once detected, you could move the user and modify the DN used to authenticate.

Okay, that one was fixed. But I still couldn’t sync both staff and students – even if I pointed the Base DN to the top domain or left it blank.

I opened a case with Sophos and went through first level support. After 48 hours (plus a weekend) of remote support they kicked me to second tier.

Second tier connected remotely and continued the troubleshooting. After an hour or so they found a workaround and had me test it. Success.

Fix: Replace the Base DN for users/groups with a single space. Done and now it works. I’m not much of an LDAP junkie, but I would consider that a bug.

Anyway, it works for me and I hope it helps someone else out there scratching their head wondering why the eff their ES4000 is not working.

Side note: All in all, Sophos support is pretty good. I just wish they would read my entire email before firing back a first canned response that essentially told me to do exactly what I had already done. For anyone absolutely buried with this product, I can highly recommend leveraging their consulting services. Well worth the small price to get it done right the first time.

In the last few months at my new job I’ve been squashing small and medium bugs to get systems up to par: service packs, patching, firmware updates, software upgrades, or just getting things organized to make life easier for everyone involved.

Our server room has been one of those infested areas… I’ve been squashing the easy bugs, but the room is frankly a disaster waiting to happen. It’s not large by data center measurements; it’s just a small classroom with three ceiling-mounted cooling units and seven racks of equipment: three two-post network racks and four cabinets for servers. The problem is that almost nothing is labeled. Power cables, random colors of Ethernet, and dayglo orange fiber cables are intertwined in a quilt of chaos behind the cabinets. The cable ladder above the racks is about 12 inches too far away and has a large power bus bar below it. But that’s not difficult to fix. Yes, it’s a time-consuming job – but working for a college has advantages.

The biggest problem is: I don’t know how much power I have to work with. 30 circuits of power and I haven’t a clue what goes where or how much I’m using.

Today was the big day I was waiting for. An electrician arrived and performed a detailed analysis and audit of our power usage. He started from the UPS inputs, worked through the distribution panel, and finally labeled and measured the outlets in the server room. This is where my worst fears were realized…

This Close, man!

We were this close to a massive cascading power failure. Three circuits have been identified as being over 75% utilized; one is at 96%…

Bad news: Nine servers are connected to this circuit.

Worse news: Three servers are totally reliant on it; both of their power supplies are connected to it.

Even worse news: Two of those servers are part of a three-node ESX cluster with twenty-two virtual machines hosted on them.

Worse bad news: If that circuit trips, it’ll force the other six servers to pull power from another circuit almost as loaded, which will most likely put it over the top and trip that second breaker.
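The cascade described above can be sketched as a small simulation. Everything numeric here is hypothetical: the per-server amp draws, the circuit names "A" and "B", and the stand-in load on circuit B are chosen only so that A sits at 96% of a 20A breaker with nine servers attached, three of them with both power supplies on A. The sketch also assumes a dual-supply server splits its draw evenly across whichever of its feeds are still live, which is a simplification.

```python
from dataclasses import dataclass

BREAKER_AMPS = 20.0  # the room's circuits are 20A

@dataclass
class Server:
    name: str
    amps: float    # total draw in amps (hypothetical)
    feeds: tuple   # circuits its two power supplies plug into

def circuit_loads(servers, tripped=frozenset()):
    """Sum amps per live circuit, splitting each server's draw across its live feeds."""
    loads = {}
    for s in servers:
        live = [c for c in set(s.feeds) if c not in tripped]
        for c in live:
            loads[c] = loads.get(c, 0.0) + s.amps / len(live)
    return loads

def simulate_trip(servers, first_trip):
    """Trip one circuit, then keep tripping any circuit pushed past its breaker rating."""
    tripped = {first_trip}
    while True:
        loads = circuit_loads(servers, tripped)
        over = {c for c, amps in loads.items() if amps > BREAKER_AMPS}
        if not over:
            break
        tripped |= over
    down = [s.name for s in servers if all(c in tripped for c in s.feeds)]
    return tripped, down

# Three servers with both supplies on A, six split across A and B,
# plus a stand-in for the other gear already on B.
servers = [Server(f"single-{i}", 3.2, ("A", "A")) for i in range(3)]
servers += [Server(f"shared-{i}", 3.2, ("A", "B")) for i in range(6)]
servers.append(Server("other-gear-on-B", 8.0, ("B", "B")))

before = circuit_loads(servers)
for circuit in sorted(before):
    print(f"circuit {circuit}: {before[circuit] / BREAKER_AMPS:.0%} loaded")

tripped, down = simulate_trip(servers, "A")
print("tripped circuits:", sorted(tripped))
print("servers without power:", len(down))
```

With these made-up figures, losing circuit A moves the six shared servers entirely onto B, which pushes B past its rating and takes everything down. The real outcome depends on the actual measurements from the audit.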

UPS Truck Fire

And, to top it all off: Our UPS load is really unbalanced, but not in a way we can fix with medication. You see, this room is fed with three feeds of electricity called “phases” or “legs”. Equipment like large appliances or electric motors run more efficiently using more than one phase. In this case, the UPS (our battery backup device for the servers) pulls electricity equally from all three phases, conditions it, charges its batteries, and then feeds it to a breaker box. In this breaker box are thirty 20A circuits. Each is connected to one of those phases. Our core switches are large units, so they get two circuits (and two phases) for each of their power connections. It’s a bit complicated, but the simple rule is – load the boat evenly and it won’t capsize.

Right now, phase one is running 3% over, phase two is 33% under, and phase three is 24% over average. So the deviation between L2 and L3 is 58%! It’s no wonder the UPSs have only been living for two or three years. A UPS performs better when the load across all of its connections is close to the same. Deviations up or down simply chew up UPS components and spit them out. Oh, and there is no UPS maintenance bypass switch, so if the UPS dies, the room dies. If we want to replace the UPS, we have to kill the room until the hardwired connection is bypassed by an electrician.
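For anyone wanting to run the same kind of check on their own room, here is a minimal sketch of the arithmetic: each phase’s deviation from the average load, and the gap between the most and least loaded phases. The amp readings are invented for the example and are not the electrician’s measurements; the function names are also just for illustration.

```python
def phase_deviation(amps_by_phase):
    """Return each phase's percent deviation from the mean load."""
    mean = sum(amps_by_phase.values()) / len(amps_by_phase)
    return {p: 100.0 * (a - mean) / mean for p, a in amps_by_phase.items()}

def spread(deviations):
    """Gap, in percentage points, between the most and least loaded phases."""
    return max(deviations.values()) - min(deviations.values())

# Hypothetical per-phase readings in amps
readings = {"L1": 50.0, "L2": 35.0, "L3": 65.0}

deviations = phase_deviation(readings)
for phase, pct in deviations.items():
    print(f"{phase}: {pct:+.0f}% vs. average")
print(f"spread between extremes: {spread(deviations):.0f} points")
```

Moving a load from the heaviest phase to the lightest one shrinks the spread twice as fast as moving it to the middle phase, which is the idea behind “load the boat evenly.”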

But all is not lost.

Now that I have a detailed map of our power usage and outlets that are labeled, I’m throwing together an emergency change plan to migrate servers onto other circuits to reduce the load on the heavily loaded circuits AND to balance the load across phases.

In August we plan on installing new three phase power distribution units from APC with onboard monitoring and access to all three phases on the PDU. This will make balancing and loading a lot easier. Until then, I’m juggling power cables to anonymous power strips… but at least NOW they’re labeled.

Knowing is half the battle.
