Converting to Software RAID

  1. What is RAID?
  2. The Software-RAID HOWTO
  3. Man pages for Raidtools package: raidtab(8), mkraid(8), raidstart(8) and raidstop(8)
  4. Partitioning with fdisk


Computer memory is typically more than 50 times faster than hard drive throughput, which makes hard drive I/O a major bottleneck. Linux software RAID0 can dramatically improve a computer's overall throughput, and it is stable and easy to set up. And of course it's free, unlike hardware RAID.

Many distributions will allow you to configure a fresh install with RAID, but this page describes how to transition to RAID once a Linux distribution is already installed on a non-RAID partition.

For the typical home user, RAID0 (parallel throughput with no redundancy) works well because it requires the fewest drives. The setup in this example uses one hard drive on IDE0, one on IDE1 (that is, the drives are on separate cables) and one SCSI drive. It is important that each drive be on a different channel.

In my experience, Linux software RAID is very efficient and you should expect nearly twice the throughput for two drives and three times the throughput for three drives, etc.

A RAID partition is made up of a collection, or array, of hard drive partitions. A RAID0 array is limited by the smallest partition in the collection, so when setting up the partitions with fdisk, make each partition in the collection as close to the same size as you can to avoid wasting space. For example, a two-partition RAID0 with hda1 set to 20 Gb and hdc1 set to 30 Gb will leave 10 Gb of hdc1 completely inaccessible.

The size of a RAID0 partition is simply the size of the smallest partition times the number of partitions in the collection. For the above example, the RAID0 would be 40 Gb.
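This sizing rule can be sketched in shell; the 20 Gb and 30 Gb figures are the hypothetical two-partition example above:

```shell
# RAID0 usable size = (smallest member) * (number of members)
sizes="20 30"                                     # partition sizes in Gb
min=$(printf '%s\n' $sizes | sort -n | head -n1)  # smallest member
count=$(printf '%s\n' $sizes | wc -l)             # number of members
echo "RAID0 size: $((min * count)) Gb"            # 20 * 2 = 40 Gb
echo "Wasted on the larger disk: $((30 - min)) Gb"
```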

Detailed Example

What follows below is a step-by-step account of how I moved a Slackware 10.0 Linux distribution installed on a single partition to a root RAID0 partition. The steps involved are generic and will work with any Linux distribution.

  1. Plan Partitioning

    If you're unsure of the sizes of your drives, have a look at the Linux boot up messages. It's best to look right after a reboot to ensure dmesg will display all the boot messages.

    dmesg -s65536 | less

      hdb: 55704096 sectors (28520 MB) w/512KiB Cache, CHS=3467/255/63, UDMA(33)
      hdc: 37615536 sectors (19259 MB) w/418KiB Cache, CHS=37317/16/63, UDMA(33)
      SCSI device sda: 17783240 512-byte hdwr sectors (9105 MB)
    Since the SCSI drive is the smallest, I decided to use all of it for the RAID0 partition, then configure a 9 Gb partition on each of hdb and hdc. The resulting RAID partition will be 27 Gb, which is plenty for any Linux distribution.
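    The reported sizes can be cross-checked from the sector counts: a sector is 512 bytes, and the "MB" figures dmesg prints here are decimal megabytes (10^6 bytes):

```shell
# Sector count * 512 bytes, in decimal megabytes (matches dmesg above)
echo $((55704096 * 512 / 1000000))   # hdb: 28520 MB
echo $((37615536 * 512 / 1000000))   # hdc: 19259 MB
echo $((17783240 * 512 / 1000000))   # sda: 9105 MB
```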

  2. Partition the First Drive

    While logged in as root use fdisk to set up the RAID partitions. Starting with the SCSI drive:
      root@bultaco:~# fdisk /dev/sda      

    NOTE: Be sure to set all partitions in the RAID collection to type "fd". Otherwise the RAID partition will not be recognized by the kernel at boot up.

    /dev/sda1, the first RAID0 partition in the collection, looks like this (the listing still shows type 83 here; be sure to change it to fd):

    Disk /dev/sda: 9105 MB, 9105018880 bytes
    64 heads, 32 sectors/track, 8683 cylinders
    Units = cylinders of 2048 * 512 = 1048576 bytes
       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1               1        8683     8891376   83  Linux
    Note also that the number of bytes can be computed fairly closely from the drive geometry:

        8683 cyl * (64 track/1 cyl) * (32 sect/1 track) * (512 byte/1 sect) = 9104785408 bytes

        8891376 blocks * (1024 byte/1 block) = 9104769024 bytes

    The difference of 9104785408 - 9104769024 = 16384 bytes is exactly one 32-sector track: the first track of the disk, which holds the partition table and is not part of /dev/sda1.
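    The same arithmetic can be checked directly in the shell:

```shell
# Geometry size: cylinders * heads * sectors/track * bytes/sector
echo $((8683 * 64 * 32 * 512))                    # 9104785408 bytes
# fdisk's block count, at 1024 bytes per block
echo $((8891376 * 1024))                          # 9104769024 bytes
# Difference: exactly one 32-sector track
echo $((8683 * 64 * 32 * 512 - 8891376 * 1024))   # 16384 bytes
```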

  3. Partition the Second Drive

    Next create a raid partition on /dev/hdb as near to the same size as /dev/sda1 as possible (within the nearest cylinder).
      root@bultaco:~# fdisk /dev/hdb      
    The end result is the creation of /dev/hdb2 as shown in the following listing:
    Command (m for help): p
    Disk /dev/hdb: 28.5 GB, 28520497152 bytes
    255 heads, 63 sectors/track, 3467 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
       Device Boot      Start         End      Blocks   Id  System
    /dev/hdb1               1          32      257008+  82  Linux swap
    /dev/hdb2              33        1139     8891977+  fd  Linux raid autodetect
    /dev/hdb3            1140        3467    18699660   83  Linux
    The Start and End cylinder numbers were calculated as follows:

        N0 = CE0 - CS0 + 1 = 8683 - 1 + 1 = 8683

        N1 = H0*N0*S0 / (H1*S1) = 64*8683*32 / (255*63) = 1106.927, rounded up to 1107

        CE1 = CS1 + N1 - 1 = 33 + 1107 - 1 = 1139

    Where N0:   Number of cylinders in the initial partition
          CS0:  Starting cylinder number of the initial partition
          CE0:  End cylinder number of the initial partition
          H0:   Number of logical heads on the initial drive
          S0:   Number of sectors/track on the initial drive
          N1:   Computed number of cylinders for the next partition
          CS1:  Starting cylinder number of the next drive's partition
          CE1:  End cylinder number of the next drive's partition
          H1:   Number of logical heads on the next drive
          S1:   Number of sectors/track on the next drive

    This matches the End cylinder of 1139 shown in the fdisk listing above, and makes /dev/hdb2 (8891977 blocks) just slightly larger than /dev/sda1 (8891376 blocks), so none of sda1 is wasted.
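    The geometry conversion can be done with a one-line awk program; note that it reproduces the End cylinder of 1139 shown in the fdisk listing above:

```shell
# Convert 8683 cylinders of sda's geometry (64 heads, 32 sect/track)
# into hdb's geometry (255 heads, 63 sect/track), then compute the end
# cylinder for a partition starting at cylinder 33.
awk 'BEGIN {
    n1  = 64 * 8683 * 32 / (255 * 63)   # 1106.927 cylinders
    n1  = int(n1) + (n1 > int(n1))      # round up -> 1107
    ce1 = 33 + n1 - 1                   # end = start + count - 1
    printf "N1=%d CE1=%d\n", n1, ce1    # N1=1107 CE1=1139
}'
```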

  4. Partition the Third Drive

    The final partition in the collection is on /dev/hdc, which I fdisk'ed as follows:
    Disk /dev/hdc: 19.2 GB, 19259154432 bytes
    255 heads, 63 sectors/track, 2341 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Device Boot      Start         End      Blocks   Id  System
    /dev/hdc1               1          32      257008+  82  Linux swap
    /dev/hdc2              33        1139     8891977+  fd  Linux raid autodetect
    /dev/hdc3            1140        2341     9655065   83  Linux
    The calculation works out the same for this drive as it did for /dev/hdb.

  5. Create /etc/raidtab

    The man page for /etc/raidtab talks a little about optimizing for speed by adjusting the "chunk" size: the unit of data used to stripe the disks. One chunk is written to the first drive, the next chunk to the next drive, and so on. I did some file transfers and found that a 32k chunk size was fastest on my old P5 166 MHz machine.

    The raid array of interest here is /dev/md0 as it is the RAID0 array. /dev/md1 is just a linear array to lump the left over bits into one contiguous partition.

    root@bultaco:/etc# cat raidtab
    # chunk-size shootout:
    #       16 = 4.8 Mb/s
    #       32 = 5.1 Mb/s
    #       64 = 4 Mb/s
    raiddev /dev/md0
    	raid-level              0
    	nr-raid-disks           3
    	persistent-superblock   1
    	chunk-size              32
    	device                  /dev/hdb2
    	raid-disk               0
    	device                  /dev/hdc2
    	raid-disk               1
    	device                  /dev/sda1
    	raid-disk               2
    raiddev /dev/md1
    	raid-level              linear
    	nr-raid-disks           2
    	persistent-superblock   1
    	chunk-size              32
    	device                  /dev/hdc3
    	raid-disk               0
    	device                  /dev/hda2
    	raid-disk               1
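    One rough way to reproduce the chunk-size shootout in the comments above is to time a large sequential write on the mounted array with dd, then rebuild the array with a different chunk-size and run it again. A minimal sketch (TESTDIR is a hypothetical path; point it at the array's mount point, and conv=fsync assumes GNU dd):

```shell
# Write 64 Mb sequentially and let dd report its statistics;
# conv=fsync makes sure the data actually reaches the disks.
TESTDIR=${TESTDIR:-/tmp}
dd if=/dev/zero of="$TESTDIR/chunktest" bs=1024k count=64 conv=fsync
rm -f "$TESTDIR/chunktest"
```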

  6. Create the RAID0 Array

    The RAID0 array can now be made with mkraid (run it again for /dev/md1 if you defined one):

    root@bultaco:/etc# mkraid /dev/md0

  7. Check Out the New RAID0 Array

    Make sure it's up and running by looking at /proc/mdstat:

    root@bultaco:/etc# cat /proc/mdstat

    Personalities : [linear] [raid0] [raid1] [raid5] 
    read_ahead 1024 sectors
    md1 : active linear hda2[1] hdc3[0]
          12751488 blocks 32k rounding
    md0 : active raid0 sda1[2] hdc2[1] hdb2[0]
          26675072 blocks 32k chunks
    unused devices: <none>

  8. Format /dev/md0

    Now the new raid array (partition) /dev/md0 can be formatted just like any other partition. I recommend using a journaling filesystem such as ReiserFS, ext3 or XFS. For example, use mkreiserfs(8) for a Reiser filesystem:

    root@bultaco:/etc# mkreiserfs -l root /dev/md0

  9. Reboot With A Rescue Disk

    With the new RAID0 array formatted, the existing Linux distribution needs to be transferred to it. Although not required, I find this is most easily done with a Linux live CD or rescue CD (a boot floppy will work too). There are many sources of these disks. Your Linux distribution probably has a rescue CD or, as is the case with SuSE, a rescue mode as part of the bootable installation CD. Disk 2 of the Slackware CD set is a bootable rescue CD. Many of the biz-card sized (50 Mb) rescue CDs will work fine too.

  10. Copy Linux to /dev/md0

    Once booted into the alternate Linux, create mount points and copy your Linux distribution to the new raid array as follows:
        cd /
        mkdir old new
        mount /dev/hda2 old
        mount /dev/md0 new
        cp -a old/* new
        reboot

    The "reboot" command should get you back into Linux on the old partition.

  11. Update the new /etc/fstab

    The version of fstab on the new raid array needs to be updated. First mount the new raid array with:
        mount /dev/md0 /mnt

    Change /mnt/etc/fstab to look something like this:

    /dev/hdb1        none             swap        pri=42
    /dev/hdc1        none             swap        pri=42
    /dev/md0         /                reiserfs    defaults         1   1
    /dev/hda6        /boot            ext2        defaults         1   2
    /dev/md1         /r               ext3        defaults         1   2
    /dev/hda2        /old             reiserfs    defaults         1   2	
    Setting both swap partitions to the same priority tells the kernel to stripe swap across them, much like a RAID0 array. I don't know how to verify that it really works, though.

    The last line lets you look at the old Linux distribution on the single partition. For this to work you have to create the /old directory with the command:

    root@bultaco:/etc# mkdir /old

  12. Reconfigure LILO

    Lastly, the boot loader you use must be configured to boot to the new raid array. I still prefer LILO, so here is how my /mnt/etc/lilo.conf looks:

    # LILO configuration file
    # Start LILO global section
    boot = /dev/hda
    message = /boot/boot_message.txt
    timeout = 150
    # Override dangerous defaults that rewrite the partition table:
    # VESA framebuffer console @ 1024x768x256
    vga = 773
    default = Raid
    # End LILO global section
    # Windows
    other = /dev/hda1
      label = Windows
      table = /dev/hda
    # Linux RAID0
    image = /boot/vmlinuz.s1
      root = /dev/md0
      label = Raid
    # Linux - Original install in single partition
    image = /boot/vmlinuz.s1
      root = /dev/hda2
      label = linux-old
    Simply copying the files to /dev/md0 and adding a new section to lilo.conf leaves the original Linux installation intact. This way you can always boot the original installation until you get the new one working (or keep it as a complete, working backup installation).
  13. Rerun LILO

    To make lilo use the new lilo.conf under /mnt, run lilo with the -r (chroot) option:

    root@bultaco:/etc# lilo -r /mnt

    If lilo runs successfully, it will list the boot labels from your /mnt/etc/lilo.conf file and place a star after the default system to boot (in this case "Raid").

    If all seems to be in order, Go For It and Reboot!

    If your new RAID setup does not come up, you can reboot to the old one and fix things. Email me at davidbtdt at if you need help.

This page serves as a good reminder for when I need to do this once every few years. I hope it can help others too. If it needs more explanation, let me know. Hopefully though, if you read the man pages and this page, it should all start to make sense. Please drop me a quick email if this helps or if you have comments.