Thursday, May 10, 2012

Isilon IQ 4800, how I love to hate you


A couple of months ago, when I started my new job, I became the intermittently proud administrator of a 4-node cluster of Isilon IQ 4800s and a 4-node cluster of Isilon IQ 6000i storage appliances.

[2012-05-11 correction: I figured out how to get restriping info out of the web UI; see the bottom of this post.]

There are a lot of things that are cool about these systems, which I won't go into right now. Let's just say that the other day, when my office lost power for a moment, I discovered (OH CRAP MOMENT) that the 4-node cluster of IQ 4800s was not on UPS. One of the nodes wouldn't rejoin the cluster, yet the cluster kept on chugging, so I was happy.

Until I tried to figure out what was wrong, and then I was unhappy, because the documentation on these systems is lacking, even in the Isilon-authored command reference and user guide. Isilon folks (who are now EMC), please, pretty please: when you write documentation, remember that the average sysadmin can follow more instructions than you assume, and that "call Isilon Technical Support" is not a good alternative in so many of your examples.

Here is what I found after connecting a serial console cable to the node that wouldn't rejoin, and how I figured out it had a failed hard drive.

First thing you may need to know: connecting requires a standard (and now hard-to-find) console cable with female-to-female ends, as I recall, plus these serial settings:

Baud rate: 115200
Data bits: 8
Parity: None
Stop bits: 1
Flow control: Hardware (but dtr/dsr worked for me in SecureCRT)

Using, say, a baud rate of 9600 will yield Klingon text, which I am not adept at reading, but am very annoyed by.
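For what it's worth, here's roughly how I get to the console from another machine. The device names below are guesses that depend on your serial adapter, so adjust accordingly:

From a Linux laptop with a USB-to-serial adapter:
$ screen /dev/ttyUSB0 115200

Or from another FreeBSD box:
$ cu -l /dev/cuaU0 -s 115200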

After skimming the dozen pages of useless commands, these are the useful ones. First, "isi devices" lists each drive and its current status:

isiloncluster-1# isi devices
Node 1, [DOWN]
Bay 1 Lnum N/A [REPLACE] SN:N/A N/A
Bay 2 Lnum 8 [HEALTHY] SN:XXX /dev/twed3
Bay 3 Lnum 5 [HEALTHY] SN:XXX /dev/twed6
Bay 4 Lnum 2 [HEALTHY] SN:XXX /dev/twed9
Bay 5 Lnum 10 [HEALTHY] SN:XXX /dev/twed1
Bay 6 Lnum 7 [HEALTHY] SN:XXX /dev/twed4
Bay 7 Lnum 4 [HEALTHY] SN:XXX /dev/twed7
Bay 8 Lnum 1 [HEALTHY] SN:XXX /dev/twed10
Bay 9 Lnum 9 [HEALTHY] SN:XXX /dev/twed2
Bay 10 Lnum 6 [HEALTHY] SN:XXX /dev/twed5
Bay 11 Lnum 3 [HEALTHY] SN:XXX /dev/twed8
Bay 12 Lnum 0 [HEALTHY] SN:XXX /dev/twed11
Unavailable drives:
Lnum 11 [SMARTFAIL] Last Known Bay 1
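A quick way to pick out just the sad drives without eyeballing all twelve lines (plain old grep, nothing Isilon-specific):

isiloncluster-1# isi devices | grep -v HEALTHY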

Next, per-node status:

isiloncluster-1# isi status -n
Node LNN: 1
Node ID: 1
Node Name: isiloncluster-1
Node IP Address: X.X.X.191
Node Health: D-------
Node SN:  XXXXX
Node Capacity: n/a
Available: n/a (n/a)
Used: n/a (n/a)
Network Status:
See 'isi networks list interfaces -v' for more detail or man(8) isi.
Internal: 2 GbE network interfaces (1 up, 1 down)
External: 2 GbE network interfaces (1 up, 1 down)
1 Aggregated network interfaces (0 up, 1 down)
Disk Drive Status:
Bay 1        Bay 2 <8>    Bay 3 <5>    Bay 4 <2>
0b/s         0b/s         0b/s         531Kb/s
[REPLACE]    [HEALTHY]    [HEALTHY]    [HEALTHY]

Bay 5 <10>   Bay 6 <7>    Bay 7 <4>    Bay 8 <1>
531Kb/s      0b/s         0b/s         531Kb/s
[HEALTHY]    [HEALTHY]    [HEALTHY]    [HEALTHY]

Bay 9 <9>    Bay 10 <6>   Bay 11 <3>   Bay 12 <0>
0b/s         531Kb/s      0b/s         0b/s
[HEALTHY]    [HEALTHY]    [HEALTHY]    [HEALTHY]


Okay, so based on this, I know I have a failed hard drive in Bay 1. Now how do I get a new hard drive when 1) I don't have support and 2) well, see #1? In my case I lucked out and had a spare Isilon (yep, what are the odds?), so I cannibalized a drive out of that one. If you don't, I believe the drives are plain SATA; just make sure the replacement is the same size or larger, and report back here on whether that worked. Good luck!

Back to the failed drive. I powered off the node using "shutdown -h now" because, guess what, under the hood this box is FreeBSD. One extra point to Isilon for using FreeBSD. There didn't seem to be an easier, obvious way to shut the box down, aside from the front-panel LED menu's "shutdown" option once you navigate to it with the arrows. Have fun with that.
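And since it is FreeBSD under there, I'd guess (untested on the Isilon, so hedge accordingly) that the stock FreeBSD power-off flag works too:

isiloncluster-1# shutdown -p now

On stock FreeBSD, -p actually powers the machine off, while -h only halts it.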

So, if by chance you put in a drive that is smaller than the current ones, you may see something like [TOO SMALL] when you run "isi devices".

When I put in a used hard drive and powered the node back on (there's a button on the back of the chassis, by the way), running "isi devices" told me the drive in Bay 1 was [USED].
Yep, it sure is! Through a bit of trial and error, my coworker and I figured out how to make it less used:

First, figure out which device the Isilon thinks the drive is:
isiloncluster-1# isi devices --action=status --device=x:y
In my case x=1 for node 1 and y=1 for bay 1, but run this first to make sure you know which disk it is.
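For my dead drive, that works out to the following (1:1 is node:bay):

isiloncluster-1# isi devices --action=status --device=1:1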


Then, FORMAT!
isiloncluster-1# isi devices --action=format --device=1:1
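If you want to keep an eye on the format without retyping the command, a lazy shell loop works (plain sh, nothing Isilon-specific; the trailing space in the grep pattern keeps Bays 10-12 out of the match):

isiloncluster-1# while true; do isi devices | grep 'Bay 1 '; sleep 60; done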

After about 15 minutes, running "isi devices" showed the drive as [HEALTHY], and after 12 hours or so the cluster had all four nodes back online. However, all the nodes in the cluster now showed [ATTN] when running "isi status":

isiloncluster-1# isi status
Cluster Name:     isiloncluster
Cluster Health:   [ATTN]
Available:        9.3T (53%)

                        Health    Throughput (bits/s)
 ID | IP Address      |D-A--S-R|   In     Out    Total |  Used  / Capacity
----+-----------------+--------+-------+-------+-------+-----------------------
  1 | x.x.x.191   |--A-----|     0 |     0 |     0 |  2.0T  /  4.3T (46%)
  2 | x.x.x.192   |--A-----|   41K |  1.3M |  1.3M |  2.0T  /  4.3T (46%)
  3 | x.x.x.193   |--A-----|   n/a |   n/a |   n/a |   n/a  /   n/a (n/a)
  4 | x.x.x.194   |--A-----|     0 |     0 |     0 |  2.0T  /  4.3T (46%)
-------------------------------+-------+-------+-------+-----------------------
 Cluster Totals:               |   n/a |   n/a |   n/a |  8.0T  /   17T (46%)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Critical Alerts:

Finished Operations (2):
 OPERATION (ID)             POLICY   START        COMPLETE     ELAPSED
 FlexProtect (1)            MEDIUM   05/08 17:33  05/08 20:39  3:05:55
 Collect (2)                LOW      05/08 20:40  05/09 06:54  10:14:37

Active Operations (1):
 OPERATION (ID)             POLICY   START        ELAPSED    PCT  LAST COMPLETED
 AutoBalance (3)            LOW      05/09 06:54  1d 7:36     0%  Chunk 1120 (1000 lins, 4.1GB)

No waiting operations.
No failed operations.

After searching and searching, and finding no, I mean NO, info in the logs, I saw the obvious. The AutoBalance job (which redistributes data near-equally across all the nodes, and is supposed to run after FlexProtect, the job that runs when drives or nodes fail) had been running for over a day but was at zero percent. Running "isi status" a few more times showed that the chunk number and data size were incrementing, but at that rate it was going to take a very long time.
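To watch just that line without the rest of the status screen, grep does the trick:

isiloncluster-1# isi status | grep AutoBalance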

A bit more digging suggested that I could adjust the priority of the AutoBalance job, but how?! Isilon, I beg you, tell me how?!?

Through trial and error, this command did the trick:
isiloncluster-1# isi restripe update autobalance medium

which basically means: assign the AutoBalance process the medium impact policy, i.e., priority. A few other useful tidbits about the isi restripe command, which has only this help info:

isiloncluster-1# isi restripe --help
usage: isi restripe [-wD] [action [-lda] [-n] [-o order] (operation | -i id) [-p ] [-R ] ["cron_time"] [policy]]

isiloncluster-1# isi restripe
Valid actions are: start, pause, stop, update, resume.

isiloncluster-1# isi restripe update
Valid operations are: collect, flexprotect, autobalance, mediascan, upgrade, setprotection, quotascan, treedelete, snapshotdeletelins, integrityscan, avscan.

isiloncluster-1# isi restripe update autobalance
Valid impact policies are: low, medium, high.
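Going by that usage line, the other actions look like they take an operation name the same way, though I've only personally run "update". So if you ever need restriping out of the way for a bit, I'd expect something like this to work (untested by me, so proceed carefully):

isiloncluster-1# isi restripe pause autobalance
(do whatever needed the I/O)
isiloncluster-1# isi restripe resume autobalance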

Oh, and my final thoughts:
1) This advice comes with no warranty, expressed or implied.  Caveat emptor!
2) Running "isi restripe" shows just the status of the restriping, without the clutter of "isi status".
3) I'm running OneFS v. 5.5.4.21
4) If you have more info than I do, please share.  If you want a copy of the command reference or config guide, post here with your email and I'll send it to you.

And good luck!

2012-05-11 Addition/correction - how to get restriping info & set priorities in the web UI.
I'll admit it, I was wrong. I thought the restriping info was not easily viewable or modifiable in the web UI. Here's how to do pretty much the same thing in the UI that I showed above in the CLI.

In the Isilon web interface, select
File System > File System Settings > Restriper Operations
The output here is similar to what you see in the CLI with "isi restripe", with an added bonus: progress is measured to one decimal place (e.g., 1.3% vs. 1% in the CLI). On this page you can also modify the priority of the jobs displayed by clicking "Edit" next to the job name.