
3ware RAID and tw_cli

by Darrell Kingsley last modified Mar 13, 2014 02:05 PM
tw_cli is a command-line utility for monitoring and maintaining your 3ware RAID array

We use RAID-1 mirroring, so all of the information here relates to RAID-1, though some of it may be useful for other RAID types.

You can only run tw_cli as root, so either su or sudo it.

You can either run it as a program with its own command line, i.e. cd to wherever you've installed it and then

[user@box name]# ./tw_cli

to get the command line, and then run the commands, e.g.

//box name> show

Or you can run it as a shell utility e.g. 

[user@box name]# ./tw_cli show

We'll assume from here on in that we're running it at the shell command line.

Some commands and tw_cli's responses explained

First of all, let's get the big picture

[user@box name]# ./tw_cli show

or

[user@box name]# ./tw_cli info


which both return, for example:

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    8006-2LP     2         2        1       1       3       -      -

This means controller c0 has two drives on two ports, making up one unit, and that one unit is not optimal (the NotOpt column), i.e. it has a problem.
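The NotOpt column is the quickest thing to check from a script. Here's a rough sketch of doing that with awk, using the sample output above as a stand-in for a live call (the column positions are an assumption based on that output, so check them against your own controller first):

```shell
#!/bin/sh
# Sketch: warn when any controller reports a not-optimal unit.
# In real use, replace the heredoc with:  tw_cli show
output=$(cat <<'EOF'
Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    8006-2LP     2         2        1       1       3       -      -
EOF
)
# Column 1 is the controller name, column 6 is NotOpt.
echo "$output" | awk '$1 ~ /^c[0-9]+$/ && $6 > 0 {
    print "WARNING: controller " $1 " has " $6 " not-optimal unit(s)"
}'
```

Wired up to a real `tw_cli show` and dropped into cron, something like this can mail you when an array degrades.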

Now try

[user@box name]# ./tw_cli info c0

This asks for info on controller c0:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       139.735   ON     -      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     DEGRADED         u0     139.73 GB   293046768     WD-WMAP41084290     
p1     OK               u0     139.73 GB   293046768     WD-WXC0CA9D2877    

Very similar to this is

[user@box name]# ./tw_cli info c0 u0

This produces the same info as above but in a slightly more compact form:

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-1    DEGRADED       -       -       -     -       139.735   
u0-0     DISK      DEGRADED       -       -       p0    -       139.735   
u0-1     DISK      OK             -       -       p1    -       139.735  

Here u0-0 means unit u0, port p0

Both of the above outputs show there are two drives in our RAID-1 array. Our array has only one unit - u0, which I think is standard for RAID-1. Other RAID configurations such as RAID-10 or RAID-6 might have more than one unit per controller, but as this doesn't affect us, I haven't paid too much attention.
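When scripting, the Port table is the handy bit, because it tells you which physical disk to look at. A sketch of pulling out any port that isn't OK, again using the sample output above in place of a live `tw_cli info c0` call:

```shell
#!/bin/sh
# Sketch: list any port whose status is not OK, with its serial number
# so you know which physical disk to pull.
# In real use, replace the heredoc with:  tw_cli info c0
ports=$(cat <<'EOF'
Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     DEGRADED         u0     139.73 GB   293046768     WD-WMAP41084290
p1     OK               u0     139.73 GB   293046768     WD-WXC0CA9D2877
EOF
)
# Column 1 is the port, column 2 the status, last column the serial.
echo "$ports" | awk '$1 ~ /^p[0-9]+$/ && $2 != "OK" {
    print $1 " is " $2 " (serial " $NF ")"
}'
```

The serial number is worth logging: it's what you match up against the sticker on the drive when someone has to go and swap it.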

The RAID array is degraded, i.e. it is not functioning properly. In this case it is because the disk on port p0 is itself degraded. This probably means it has errors, but it may just mean it has stopped working properly for another reason, so it may be worth trying to rebuild the array. You do this as follows:

[user@box name]# ./tw_cli maint remove c0 p0

This removes the degraded disk from the array, producing the following output:

Removing port /c0/p0 ... Done.

If we now run:

[user@box name]# ./tw_cli info c0 u0

we get a slightly different result

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-1    DEGRADED       -       -       -     -       139.735   
u0-0     DISK      DEGRADED       -       -       -     -       139.735   
u0-1     DISK      OK             -       -       p1    -       139.735  

The only difference here is that disk u0-0 is no longer assigned to port 0. Now you have to find the disk again...

[user@box name]# ./tw_cli maint rescan c0

This produces the following output if it finds the disk, i.e. if it hasn't stopped spinning or failed outright:

Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p0].

If you run it again, it won't find anything the second time around, because it has already found it once! At this point it's possible that info c0 u0 still won't show the degraded disk as being on port p0, but

[user@box name]# ./tw_cli info c0

gives

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       139.735   ON     -      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               -      139.73 GB   293046768     WD-WMAP41084290     
p1     OK               u0     139.73 GB   293046768     WD-WXC0CA9D2877    

so perhaps it's a latency thing, or perhaps they just differ in what they show at this stage.
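If you're scripting this, the rescan output itself tells you whether the drive came back, so you don't need to re-query and compare tables. A sketch of checking it (the message format is taken straight from the output above):

```shell
#!/bin/sh
# Sketch: decide from the rescan output whether the missing drive was
# found. In real use, replace the heredoc with:  tw_cli maint rescan c0
rescan=$(cat <<'EOF'
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p0].
EOF
)
case "$rescan" in
    *'drive(s): [none]'*) echo "no drives found - disk may be dead or unseated" ;;
    *)                    echo "drive found - try a rebuild next" ;;
esac
```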

Anyway, if the disk is still usable, we now have to rebuild the array as follows:

[user@box name]# ./tw_cli maint rebuild c0 u0 p0

If it's happy with your request, this returns:

Sending rebuild start request to /c0/u0 on 1 disk(s) [0] ... Done.

You now need to check whether the rebuild is actually in progress:

[user@box name]# ./tw_cli /c0/u0 show rebuildstatus

which in our case returned

/c0/u0 is not rebuilding, its current state is DEGRADED

Which means it didn't work. I think the next step is to run fsck to try to repair the disk, and then try to rebuild the array.
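Rather than eyeballing rebuildstatus by hand, you can parse the status line and poll it. A sketch, with the wording of the status line taken from the output above (in real use you'd wrap this in a loop with a sleep between polls; the interval is up to you):

```shell
#!/bin/sh
# Sketch: interpret one rebuildstatus line and report.
# In real use, replace the heredoc with:
#   tw_cli /c0/u0 show rebuildstatus
status=$(cat <<'EOF'
/c0/u0 is not rebuilding, its current state is DEGRADED
EOF
)
# Check 'is not rebuilding' first, since that string also contains
# the word 'rebuilding'.
case "$status" in
    *'is not rebuilding'*) echo "rebuild did not start (state: ${status##* })" ;;
    *'is rebuilding'*)     echo "rebuild in progress" ;;
esac
```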

A different kind of problem

On another of our boxes, running:

[user@box name]# ./tw_cli info c0

gives the following:

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    DEGRADED       -       -       -       139.688   ON     OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     NOT-PRESENT      -      -           -             -
p1     OK               u0     139.73 GB   293046768     WD-WMAP41398693    

Here the disk at port p0 is listed as NOT-PRESENT, which suggests it has failed altogether and may have stopped spinning. Interestingly (I use this word in a fairly loose sense),

[user@box name]# ./tw_cli info c0 u0

gives the following:

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-1    DEGRADED       -       -       -     -       139.688   
u0-0     DISK      OK             -       -       p1    -       139.688   
u0-1     DISK      DEGRADED       -       -       -     -       139.688   
u0/v0    Volume    -              -       -       -     -       139.688  

where the offending disk is listed as degraded, with no port assigned.

Anyway, running:

[user@box name]# ./tw_cli maint rescan c0

Gives the bleak output:

Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].

This suggests it hasn't just lost track of the disk, but it really has failed. It may be unseated of course, so get someone to remove it and plug it in again, if possible. Trying the following:

[user@box name]# ./tw_cli maint remove c0 p0

Gives the output:

Removing port /c0/p0 ... Failed.
(0x0B:0x002E): Port empty

Yes. It's really not there, and it really can't find it. So either it has become unseated or it is dead.

I *think* that another meaning for NOT-PRESENT might be that there is a disk there but it hasn't been added to any array, or that it has been failed out of the array but is otherwise still okay. In that case do this:

[user@box name]# ./tw_cli /c0/p0 export

This comes back with:

Removing /c0/p0 will take the disk offline.
Do you want to continue ? Y|N [N]:

Respond Y and if the disk is okay, you'll get:

Exporting port /c0/p0 ... Done.

Then you can add it to the array again with a maint rescan followed by a maint rebuild.
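Putting the cases together, the port's Status field tells you which recovery step to try next. A rough decision sketch (the status strings are the ones seen in the outputs above, and the NOT-PRESENT handling follows our guesswork rather than official documentation, so treat it with care):

```shell
#!/bin/sh
# Sketch: map a port status to the next tw_cli step to try.
# The NOT-PRESENT advice is our best guess, as discussed above.
next_step() {
    case "$1" in
        OK)          echo "nothing to do" ;;
        DEGRADED)    echo "maint remove, maint rescan, then maint rebuild" ;;
        NOT-PRESENT) echo "try export/rescan/rebuild; if 'Port empty', reseat or replace the disk" ;;
        *)           echo "unknown status: $1" ;;
    esac
}
next_step DEGRADED
```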

In our case it responded with:

Removing port /c0/p0 ... Failed.
(0x0B:0x002E): Port empty

Which confirms the deadness of the disk. There's loads of other stuff you can do with tw_cli; here are some useful links where I managed to grab most of this information. I'd also like to add the rider that both of the boxes used in the examples above had disks which were dead, so we haven't yet had a situation where we've managed to rescue an array and bring a failed disk back to life. We can't vouch for the Lazarus techniques listed above from personal experience; they're here so we know what to try next time it happens.


BB4 - New LSI controller

We now use

sas2ircu 0 display

for info on the RAID in BB4. This utility comes courtesy of Supermicro and there's a link here:

http://www.natecarlson.com/2010/08/23/lsi-command-line-utility-for-sas2-non-raid-controllers/