Linux raid1 issue

ayu
Staff
Staff
Posts: 8109
Joined: 27 Aug 2005, 16:00

Post by ayu »

I haven't used RAID much on Linux, so I'm not sure what to make of this error.

Code:

This is an automatically generated mail message from mdadm
running on Teresa

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdc1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] 
md0 : active raid1 sdc1[0](F) sdd1[1]
      488384400 blocks super 1.2 [2/1] [_U]
      [=======>.............]  check = 39.9% (194954496/488384400) finish=405339.6min speed=12K/sec
      
unused devices: <none>

Should I be worried?
"The best place to hide a tree, is in a forest"

bad_brain
Site Owner
Site Owner
Posts: 11636
Joined: 06 Apr 2005, 16:00
Location: In your eye floaters.

Post by bad_brain »

Hm, not yet. ^^
Check the corresponding syslog entry for that event, and check the SMART status with smartctl to make sure the HDD is not about to die. But I bet it's a random error related to the mainboard drivers: on the old suck-o server one HDD regularly dropped out of the RAID because of such driver issues. If the resync finishes properly, I'm pretty sure that's what it is.

P.S. If the resync speed is VERY low, it's best to reboot and let the resync start again. I had the same behavior with the old suck-o server too.
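A minimal sketch of the "is the resync speed VERY low" check, assuming the speed is taken from the speed=...K/sec field that /proc/mdstat shows while a check or resync is running. The function names and the 1000 K/sec threshold are made up for illustration and are not part of mdadm. The sample line is the one from the mail in the first post.

```shell
#!/bin/sh
# Print the sync speed (in K/sec) found in mdstat-style text on stdin.
# Prints nothing when no check/resync/recovery line is present.
sync_speed_kps() {
    sed -n 's/.*speed=\([0-9][0-9]*\)K\/sec.*/\1/p' | head -n 1
}

# Succeed when the given speed is below a threshold.
# The default of 1000 K/sec is an arbitrary assumption, not an mdadm value.
is_slow() {
    s=$1
    limit=${2:-1000}
    [ -n "$s" ] && [ "$s" -lt "$limit" ]
}

# Progress line copied from the mdadm mail in the first post.
sample='      [=======>.............]  check = 39.9% (194954496/488384400) finish=405339.6min speed=12K/sec'

speed=$(printf '%s\n' "$sample" | sync_speed_kps)
if is_slow "$speed"; then
    echo "resync is slow: ${speed}K/sec"
fi
```

On a live system the same function could read the real file instead of the sample: sync_speed_kps < /proc/mdstat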

Post by ayu »

Well this is the status right now at least ^^

Code:

Personalities : [raid1]
md0 : active raid1 sdc1[0](F) sdd1[1]
      488384400 blocks super 1.2 [2/1] [_U]

unused devices: <none>

It seems that one device is down.
smartctl gives me:

Code:

root@Teresa:/media/backup/daily/# smartctl -a /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               /2:0:0:0
Product:
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Post by bad_brain »

OK, to initiate a resync you can either simply reboot, or remove the faulty device and then re-add it:

Code:

mdadm --manage /dev/md0 --remove /dev/sdc1
mdadm --manage /dev/md0 --add /dev/sdc1

If those commands fail, it points to a controller error, and then you really have to reboot (and add the device manually with the 2nd command if I remember right, though it may be added automatically). But it SHOULD work, because the device just dropped out of the RAID and did not disappear completely.

P.S. There are 2 ways to see that a device has "left" the RAID:
[_U] == 1st device gone; with both in the RAID it's [UU]
sdc1[0](F) == device marked as faulty (F)
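The two signs listed above can be picked out of mdstat-style text with a bit of awk. This is only a sketch: the function names are invented here, and it assumes the layout seen in this thread (an "mdX : ..." line followed by a "blocks" line ending in the [UU]-style status field).

```shell
#!/bin/sh
# Print the name of every member device flagged (F) in the text on stdin.
faulty_members() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\(F\)$/) {
                m = $i
                sub(/\[.*/, "", m)   # strip the [slot](F) suffix
                print m
            }
        }
    }'
}

# Print every array whose status field contains "_" (a missing member).
degraded_arrays() {
    awk '
        $2 == ":" && $1 ~ /^md/ { name = $1 }
        # The status field is the last one on the "blocks" line, e.g. [_U]
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    '
}

# Sample copied from the mdadm mail in the first post.
sample='Personalities : [raid1]
md0 : active raid1 sdc1[0](F) sdd1[1]
      488384400 blocks super 1.2 [2/1] [_U]

unused devices: <none>'

printf '%s\n' "$sample" | faulty_members    # prints: sdc1
printf '%s\n' "$sample" | degraded_arrays   # prints: md0
```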

Post by ayu »

Yeah, I read about the states in the manual, but now it looks like this after a reboot:

Code:

root@Teresa:/home/cats# cat /proc/mdstat 
Personalities : [raid1] 
md127 : active raid1 sdc1[0]
      488384400 blocks super 1.2 [2/1] [U_]
      
md0 : active raid1 sdd1[1]
      488384400 blocks super 1.2 [2/1] [_U]

This is ... strange? : |
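What the output above shows is the mirror split in two: each disk came up in its own degraded array. A rough sketch for listing which members ended up in which array, assuming the same mdstat layout as above (array_members is a made-up helper name):

```shell
#!/bin/sh
# Print one "array: member member ..." line per md array in the text on stdin.
array_members() {
    awk '$2 == ":" && $1 ~ /^md/ {
        line = $1 ":"
        for (i = 3; i <= NF; i++) {
            # Member fields look like sdc1[0] or sdc1[0](F)
            if ($i ~ /\[[0-9]+\]/) {
                m = $i
                sub(/\[.*/, "", m)
                line = line " " m
            }
        }
        print line
    }'
}

# Sample copied from the post above.
sample='Personalities : [raid1]
md127 : active raid1 sdc1[0]
      488384400 blocks super 1.2 [2/1] [U_]

md0 : active raid1 sdd1[1]
      488384400 blocks super 1.2 [2/1] [_U]'

printf '%s\n' "$sample" | array_members
# prints:
# md127: sdc1
# md0: sdd1
```

To confirm that both partitions really belong to the same mirror, the Array UUID lines printed by mdadm --examine /dev/sdc1 and mdadm --examine /dev/sdd1 could be compared. That was not done in this thread, so treat it as a suggestion only.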

Post by bad_brain »

hm, have you tried to remove sdc1 from the weird RAID and then add it to the real one?

Post by ayu »

bad_brain wrote: hm, have you tried to remove sdc1 from the weird RAID and then add it to the real one?

Tried the following:

Code:

root@Teresa:/home/cats# mdadm --manage /dev/md127 --remove /dev/sdc1
mdadm: hot remove failed for /dev/sdc1: Device or resource busy

Post by bad_brain »

hm, let's see if that works:

Code:

mdadm --stop /dev/md127

Post by ayu »

bad_brain wrote: hm, let's see if that works:

Code:

mdadm --stop /dev/md127

Did that, and then re-added it to the correct array, and at first I got this:

Code:

root@Teresa:/home/cats# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdc1[2] sdd1[1]
      488384400 blocks super 1.2 [2/1] [_U]
      [====>................]  recovery = 23.3% (114060736/488384400) finish=79.5min speed=78382K/sec

Which looked pretty good, but now I just get this:

Code:

root@Teresa:/home/cats# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdc1[2](F) sdd1[1]
      488384400 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>

Seems like the HDD is dead or something?
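For reference, the stop-and-re-add sequence used above can be wrapped in a dry-run helper so the commands are printed before anything touches the arrays. This is a sketch under assumptions: the exact re-add command that was run is not shown in the thread, so the --add form from bad_brain's earlier post is used here, and the run and rejoin_member helpers plus the RUN variable are invented for illustration.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Stop the stray array, then add its member back into the real one.
# Arguments: stray array, target array, member partition.
rejoin_member() {
    stray=$1
    target=$2
    member=$3
    run mdadm --stop "$stray" &&
    run mdadm --manage "$target" --add "$member"
}

# Dry run with the device names from this thread (nothing is executed).
rejoin_member /dev/md127 /dev/md0 /dev/sdc1
# prints:
# + mdadm --stop /dev/md127
# + mdadm --manage /dev/md0 --add /dev/sdc1
```

Running it as root with RUN=1 would execute the two mdadm commands for real. Since the member went faulty again during recovery here, re-adding it only makes sense once the disk itself has been checked.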

Post by bad_brain »

Hmmm... it's best to remove the faulty drive from the RAID again and then run fsck on it.
Have you checked the syslog?