Manual Linux Rescue System/en
Using the Linux Rescue system
Activating the Linux Rescue system
You have to activate the Linux Rescue system via the customer service center. The following Wiki guide shows you how to do this:
Connecting to the Linux Rescue system
After the activation you can connect to the Linux Rescue system. Here you can find a Wiki guide covering that topic:
Connect with the Rescue system
Resetting the root password
Preparation
You have to connect with the Linux Rescue system to change the root password. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
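If you are not sure which partition contains the root filesystem of your installed system, you can first list the partition layout from the Rescue system before you continue with the steps below, for example with one of the following commands (lsblk may not be available on every Rescue image; fdisk -l works as an alternative):
lsblk -f
fdisk -l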
Implementation
Please proceed as follows to change the root password:
- Enter your installed system in a chroot environment (replace X with the relevant partition number):
mount /dev/sdaX /mnt/custom        # root partition
mount /dev/sdaX /mnt/custom/boot   # boot partition
cd /mnt/custom
mount --bind /dev dev
mount --bind /sys sys
mount --bind /proc proc
chroot . /bin/bash
- Enter the following command as root:
passwd
- Enter the new password.
- Enter the new password again.
- Exit the chroot environment and unmount the partitions:
exit
umount dev sys proc boot
cd ..
umount custom
- Deactivate the Rescue system via the customer service center.
- Perform a web reset via the customer service center.
You have successfully changed the root password. You can now connect to your system with the newly assigned password.
Disabling the firewall
Preparation
You have to connect to the Linux Rescue system in order to disable the firewall. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
Implementation
Please proceed as follows to disable the firewall:
- Enter your installed system in a chroot environment (replace X with the relevant partition number):
mount /dev/sdaX /mnt/custom        # root partition
mount /dev/sdaX /mnt/custom/boot   # boot partition
cd /mnt/custom
mount --bind /dev dev
mount --bind /sys sys
mount --bind /proc proc
chroot . /bin/bash
CentOS/Red Hat/Fedora
Enter the following command as root user:
chkconfig --level 2345 iptables off
Debian/Ubuntu
Enter the following command as root user:
update-rc.d -f iptables remove
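To verify on the next boot of the installed system that no firewall rules are loaded anymore, you can, for example, list the active rules as root:
iptables -L -n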
Restoring a faulty software RAID
Checking the partition table format (MBR / GPT)
To check if the partition tables of the hard disk drives are formatted as MBR or GPT, proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
- Check the format of the partition table of your hard disk drive with the following command (replace X with the hard disk drive to be checked):
parted -s /dev/sdX print
If the partition table has the MBR format, parted reports it as msdos:
Partition Table: msdos
If the partition table has the GPT format, parted reports:
Partition Table: gpt
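If you are only interested in this single line, you can, for example, filter the parted output directly:
parted -s /dev/sdX print | grep "Partition Table"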
RAID1 with MBR partition
Preparation
You have to connect to the Linux Rescue system to restore a faulty software RAID. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
- Check the format of the partition table: Checking the partition table format (MBR / GPT)
- Check the state of the software RAID with the following command:
cat /proc/mdstat
An intact RAID1 partition has the status 'U'. That means all involved partitions are ok.
Example output:
Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
      1847608639 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      524276 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      8387572 blocks super 1.2 [2/2] [UU]
unused devices: <none>
A faulty RAID1 array shows the status '_'. This means that either a hard disk drive is missing or faulty, or that a RAID partition has failed.
Example output:
Personalities : [raid1]
md3 : active raid1 sda4[0]
      1843414335 blocks super 1.2 [2/1] [U_]
md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0]
      12581816 blocks super 1.2 [2/1] [U_]
unused devices: <none>
In this example the partitions of the second hard disk drive sdb are missing from the output. This indicates either a faulty hard disk drive or a faulty RAID partition.
Implementation
In order to restore the RAID, please proceed as follows:
- Enter the following command (please mind the order: sfdisk -d <source disk> | sfdisk <target disk>):
sfdisk -d /dev/sda | sfdisk /dev/sdb
- Enter the following command to rescan the partition table:
sfdisk -R /dev/sdb
- Use the following command to check if the hard disk drives sda and sdb have the same partition sizes:
cat /proc/partitions
- If all partitions are present, you can assemble them into the RAID:
mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3
mdadm /dev/md3 -a /dev/sdb4
Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:
cat /proc/mdstat
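If you want to follow the rebuild progress continuously, you can, for example, refresh the status every few seconds:
watch -n 5 cat /proc/mdstat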
RAID1 with GPT partition
Preparation
You have to connect to the Linux Rescue system to restore a faulty software RAID. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
- Check the format of the partition table: Checking the partition table format (MBR / GPT)
- Check the state of the software RAID with the following command:
cat /proc/mdstat
An intact RAID1 partition has the status 'U'. That means all involved partitions are ok.
Example output:
Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
      1847608639 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      524276 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      8387572 blocks super 1.2 [2/2] [UU]
unused devices: <none>
A faulty RAID1 array shows the status '_'. This means that either a hard disk drive is missing or faulty, or that a RAID partition has failed.
Example output:
Personalities : [raid1]
md3 : active raid1 sda4[0]
      1843414335 blocks super 1.2 [2/1] [U_]
md2 : active raid1 sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda2[0]
      524276 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0]
      12581816 blocks super 1.2 [2/1] [U_]
unused devices: <none>
In this example the partitions of the second hard disk drive sdb are missing from the output. This indicates either a faulty hard disk drive or a faulty RAID partition.
Implementation
To restore the RAID, proceed as follows:
- Enter the following command to copy the partition table of sda to the new hdd sdb:
sgdisk -R /dev/sdb /dev/sda
- Assign a new random UUID to the hdd:
sgdisk -G /dev/sdb
- Now the hdd can be added to the RAID:
mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3
mdadm /dev/md3 -a /dev/sdb4
Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:
cat /proc/mdstat
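More details about a single array, including the rebuild progress in percent, can be displayed with mdadm, for example (replace md0 with the relevant array):
mdadm --detail /dev/md0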
RAID5/6 with MBR partition
Preparation
You have to connect to the Linux Rescue system in order to restore a RAID5/6. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
- Check the format of the partition table: Checking the partition table format (MBR / GPT)
- Check the state of the software RAID with the following command:
cat /proc/mdstat
An intact RAID5 partition has the status 'U'. That means all involved partitions are ok.
Example output:
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
      5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
A faulty RAID5/6 array shows the status '_'. This means that either a hard disk drive is missing or faulty, or that a RAID partition has failed.
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
      5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
In this example the fourth hard disk drive in RAID5 is missing.
Implementation
In order to restore the RAID, please proceed as follows:
- Enter the following command (please mind the order: sfdisk -d <source disk> | sfdisk <target disk>):
sfdisk -d /dev/sda | sfdisk /dev/sdd
- Enter the following command to rescan the partition table:
sfdisk -R /dev/sdd
- Use the following command to check if the hard disk drives sda and sdd have the same partition sizes:
cat /proc/partitions
- If all partitions are present, you can add them to the RAID:
mdadm /dev/md0 -a /dev/sdd1
mdadm /dev/md1 -a /dev/sdd2
mdadm /dev/md2 -a /dev/sdd3
mdadm /dev/md3 -a /dev/sdd4
Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:
cat /proc/mdstat
RAID5/6 with GPT partition
Preparation
You have to connect to the Linux Rescue system in order to restore a RAID5/6. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
- Check the format of the partition table: Checking the partition table format (MBR / GPT)
- Check the state of the software RAID with the following command:
cat /proc/mdstat
An intact RAID5 partition has the status 'U'. That means all involved partitions are ok.
Example output:
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
      5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
A faulty RAID5/6 array shows the status '_'. This means that either a hard disk drive is missing or faulty, or that a RAID partition has failed.
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sda7[0] sdc7[2] sdb7[1]
      5842954752 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
In this example the fourth hard disk drive in RAID5 is missing.
Implementation
To restore the RAID, proceed as follows:
- Enter the following command to copy the partition table of sda to the new hdd sdd:
sgdisk -R /dev/sdd /dev/sda
- Assign a new random UUID to the hdd:
sgdisk -G /dev/sdd
- Now the hdd can be added to the RAID:
mdadm /dev/md0 -a /dev/sdd1
mdadm /dev/md1 -a /dev/sdd2
mdadm /dev/md2 -a /dev/sdd3
mdadm /dev/md3 -a /dev/sdd4
Now the partitions will be restored. This process may take some time depending on the partition size. The status can be queried with the following command:
cat /proc/mdstat
Checking / Restoring a faulty filesystem
Checking / Restoring a filesystem of a physical hard disk drive
In order to check the filesystem of a physical hard disk drive you have to connect with the Linux Rescue system. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
Enter the following command to start the check of the filesystem (Replace X with the relevant partition):
fsck /dev/sdX
fsck performs checking and repairing of a Linux file system.
Important: Don't run fsck on a mounted filesystem!
Checking the filesystem type
Enter the following command to check which filesystem type is used (Replace X with the relevant partition):
parted -s /dev/sdX print
ext2/3/4
To restore a faulty filesystem of type ext2/3/4, enter the following command (Replace X with the relevant partition):
fsck.ext2 /dev/sdX
fsck.ext3 /dev/sdX
...
xfs
To restore a faulty filesystem of type xfs, enter the following commands (Replace X with the relevant partition):
xfs_check /dev/sdX
xfs_repair /dev/sdX
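Please note: on newer versions of xfsprogs the xfs_check tool has been removed. In that case a read-only check can be performed with xfs_repair in no-modify mode, for example:
xfs_repair -n /dev/sdX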
Checking the filesystem of a software RAID
In order to check the filesystem of a software RAID you have to connect with the Linux Rescue system. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
Enter the following command to start the check of the filesystem (Replace X with the relevant partition):
fsck /dev/mdX
fsck performs checking and repairing of a Linux file system.
Important: Don't run fsck on a mounted filesystem!
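If the device /dev/mdX does not exist yet in the Rescue system, the software RAID arrays can usually be assembled first, for example:
mdadm --assemble --scan
cat /proc/mdstat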
Checking the filesystem of a hardware RAID
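A hardware RAID array is presented to the operating system as a single block device (for example /dev/sda), so the check works in the same way as for a physical hard disk drive. Assuming the array appears as /dev/sdX, enter the following command (again, only on an unmounted filesystem):
fsck /dev/sdX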
Converting a filesystem
Converting from ext3 to ext4
Preparation
To convert an existing ext3 filesystem to an ext4 filesystem, you'll have to add the corresponding filesystem features. After changing the options you'll have to run a filesystem check.
For preparation, proceed as follows:
- Perform a data backup of all(!) important files (including your configuration files under /etc)
- Determine the exact name of the partition which has to be converted (using the following commands):
sudo fdisk -l sudo blkid
The first command will give you a listing of all hard disks and their partitions. The blkid command prints the unique ID of each of those partitions.
Please note: the partition that is going to be converted must be unmounted!
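If the partition is still mounted in the Rescue system, unmount it first, for example:
sudo umount /dev/<device_file>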
Implementation
To convert an existing ext3 filesystem to an ext4 filesystem, proceed as follows:
- Add the ext4-specific options with the following command:
sudo tune2fs -O extents,uninit_bg,dir_index /dev/<device_file>
Replace <device_file> with the just determined partition name.
- Perform a filesystem check with the following command (please note: the partition that is going to be converted must be unmounted!):
sudo fsck -fCVD /dev/<device_file>
- Mount the partition as ext4 with the following command:
sudo mount /dev/<device_file> /mnt
Replace <device_file> with the name of the partition you just converted.
- Open the file /mnt/etc/fstab with an editor.
- Replace the entry "ext3" with "ext4" for the corresponding partition:
/dev/<device_file> / ext4 relatime 0 1
Replace <device_file> with the name of the partition you just converted.
- Save the file, disable the Rescue System and perform a 'normal' system start.
Now the ext4 filesystem will be used.
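After the restart you can verify that the partition is actually mounted as ext4, for example with one of the following commands:
df -T
mount | grep ext4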
Checking the hard disk drives
In order to check the hard disk drives you have to connect with the Linux Rescue system. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
Hard drive check with smartctl / smartmontools
Hard disk drive check with smartctl / smartmontools for normal hard disk drives
Please proceed as follows in order to check your hard disk drives with smartmontools:
- Start a short hard disk drive check with the following command (Replace X with the relevant hard disk drive):
smartctl -t short /dev/sdX
- Start a long hard disk drive check with the following command (Replace X with the relevant hard disk drive):
smartctl -t long /dev/sdX
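The long test runs on the drive in the background. An estimate of how long the short and the long test will take can be displayed, for example, with:
smartctl -c /dev/sdX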
Hard disk drive check with smartctl / smartmontools for hard disk drives on hardware RAID controllers
In order to perform a short check for hard disk drives on 3ware hardware RAID controllers please proceed as follows:
- Enter the following command to start a short test (Replace X with the number of the relevant controller port on which the hard disk drive is connected. Please note: The first hard disk drive is connected on port 0.):
smartctl -d 3ware,X -t short /dev/twa0
- Enter the following command to start a long test:
smartctl -d 3ware,X -t long /dev/twa0
Evaluation of the results
Enter the following command to display the results of the hard disk drive tests:
smartctl -l selftest /dev/sdX
The following output example shows that the hard disk drive health is ok:
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        4970
# 2  Long offline      Completed without error  00%        4972
The following output example shows that the hard disk drive health is not ok ("read failure"):
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed: read failure  20%        717              555027747
Reporting errors to the support
Reporting errors of normal hard disk drives
In order to report errors of the hard disk drive to the support, please specify the output of the following command:
smartctl -a /dev/sdX
Reporting errors of hard disk drives behind hardware RAID controllers
In order to report errors of the hard disk drive on 3ware RAID controllers to the support, please specify the output of the following command (Replace X with the number of the relevant controller port on which the hard disk drive is connected.):
smartctl -d 3ware,X -a /dev/twa0
Hardware RAID
Basics / General information
Checking the status of the controller
3ware RAID controllers
In order to check the status of 3ware RAID controllers, you have to be connected to the Linux Rescue system. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
- Enter the following command to identify the ID of the controller (usually it is 0):
dmesg | grep 3ware
The following output is displayed (the controller ID is the number following "scsi"):
[ 5.487015] scsi4 : 3ware 9000 Storage Controller
- Enter the following command to read the hardware RAID controller information (Replace X with the relevant controller ID):
tw_cli /cX show
The following example output is possible:
Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK      -       -       -       149.001   RiW    ON

VPort  Status  Unit  Size       Type  Phy  Encl-Slot  Model
------------------------------------------------------------------------------
p0     OK      u0    149.05 GB  SATA  0    -          SAMSUNG HD160JJ
p1     OK      u0    149.05 GB  SATA  1    -          SAMSUNG HD160JJ
In this case the RAID is in a perfect condition.
Unit   UnitType  Status      %RCmpl  %V/I/M  Port  Stripe  Size(GB)
--------------------------------------------------------------------
u0     RAID-1    REBUILDING  23%     -       -     -       149.001
u0-0   DISK      DEGRADED    -       -       p0    -       149.001
u0-1   DISK      OK          -       -       p1    -       149.001
u0/v0  Volume    -           -       -       -     -       149.001
In this case the RAID performs a rebuild. The faulty hard disk drive is the one that is connected on port 0.
LSI RAID controllers
In order to check the status of LSI RAID controllers, you have to be connected to the Linux Rescue system. Please proceed as follows:
- Activate the Rescue system via the customer service center.
- Connect to the Rescue system via SSH.
- Enter the following command:
megacli -AdpAllInfo -aAll
An output of information about the LSI controller is displayed.
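Depending on the tool version installed in the Rescue system, the state of the individual logical drives can also be queried, for example with:
megacli -LDInfo -Lall -aAll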
Checking the status of the hard disk drives
3ware RAID controllers
In order to check the SMART status of hard disk drives behind 3ware RAID controllers with smartmontools, please proceed as follows:
- Enter the following command to display the SMART data of the hard disk drive (Replace X with the relevant controller port on which the hard disk drive is connected. Please note that the first hard disk drive is connected on port 0):
smartctl -d 3ware,X -a /dev/twa0
LSI RAID controllers
In order to start a check of the hard disk drives behind LSI RAID controllers with smartmontools, please proceed as follows:
- Enter the following command to identify the device ID of the hard disk drive:
storcli /c0 /eall /sall show
- You can access your hard disk drive with the following command (Replace <X> with the relevant hard disk drive and <N> with the device ID):
smartctl -a -d megaraid,N /dev/sdX
Reporting errors to the support
3ware RAID controllers
In order to report errors of your hard disk drive behind a 3ware RAID controller to the support, please specify the output of the following command:
smartctl -d 3ware,X -a /dev/twa0
LSI RAID controllers
In order to report errors of your hard disk drive behind an LSI RAID controller to the support, please specify the output of the following command:
smartctl -a -d megaraid,N /dev/sdX
Checking the RAM
In order to perform a check of the server's memory the memtester utility can be used. It's available on the EUserv mirror and can be obtained from the following link:
http://mirror.euserv.net/misc/memtester.tar.gz
To perform the check, please proceed as follows:
- Log in to the Rescue system.
- Download memtester. Use the following command:
wget http://mirror.euserv.net/misc/memtester.tar.gz
- Extract the archive. Use the following command:
tar xfz memtester.tar.gz
- Change to the extracted directory. Use the following command:
cd memtester
- Compile the program. Use the following command:
make
Now you can execute the program with the following pattern:
./memtester <amount of memory in MB> <passes>
The amount of memory can be determined with the command free -m. The respective value can be found under the total column.
Example:
             total       used       free     shared    buffers     cached
Mem:          3821       3444        376          3          1       2717
-/+ buffers/cache:        724       3096
Swap:         1953          5       1947
In order to check the memory two times in a row you can use the following command:
./memtester 3821 2
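memtester tries to lock the requested amount of memory; if the full amount cannot be locked, it automatically falls back to a smaller amount. It can therefore be useful to leave some headroom for the Rescue system itself, for example:
./memtester 3500 2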