These particular machines are 32-way SPARC64 V-based Fujitsu machines (actually 64-way, but each chassis is divided into two partitions). They have numerous I/O controllers (off the top of my head, I'd estimate about 16), split evenly between traditional SCSI and Fibre Channel controllers that were installed at various points during the lifecycle of the old OS. Most controllers have devices visible on them: either locally installed hard drives, or LUNs visible across the SAN.
My plan was to retain the pair of drives holding the old operating system (Solaris 8) and install the new OS onto a fresh pair of drives that were already connected, which would make backing out easy should it be required. However, I could not guarantee that the controllers would not be renumbered during the re-install. If Jumpstart accidentally selected the wrong devices, the old OS image could be overwritten and data lost, or the wrong devices might simply be chosen so that the mirror halves are not split across separate system boards as intended.
Now, armed with the engineering manual for the PrimePOWER server in question, I could have researched the probe order of the PCI cards and correlated it with the existing installation, but to be frank, I'm way too lazy to do that. I'd rather find a less error-prone method of forcing Jumpstart to select the correct devices for the install. Additionally, given that a full boot/POST/reboot cycle on these machines can take over an hour, I was keen to keep the outage window for the work to an absolute minimum, which meant I had to be absolutely sure of getting it right first time.
I think I found a good, simple solution. In fact, I'm surprised that this functionality isn't already supported in the Jumpstart profile syntax.
Within Solaris, we already have a unique device identifier that persists across reboots and OS re-installs: the physical device path. So rather than specifying the intended devices in the traditional cXtYdZ notation, we should be able to specify a physical device path (on this machine type, for example, one looks something like /pci@86,4000/scsi@4/sd@0,0). Strictly speaking, we don't need to specify the entire physical path, only enough of it to uniquely identify one device, but in most cases it's probably safer to specify the full path.
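To make the "enough to uniquely identify one device" point concrete, here is a minimal sketch that lists which logical disks a given (possibly partial) physical path matches. It relies on each /dev/dsk entry being a symlink into /devices. The matchPhysical function and the mock directory of symlinks are my own inventions for illustration, so it can be tried on any machine; on a real host you would point it at /dev/dsk instead.

```shell
#!/bin/sh
# Sketch: which logical disks does a (possibly partial) physical path match?
# The mock directory stands in for /dev/dsk so this runs on any system;
# matchPhysical and the mock links are hypothetical, for illustration only.

# Print the cXtYdZ name of every slice-0 link whose target contains $1
matchPhysical() {
    pattern=$1
    dir=$2
    for link in "$dir"/c*s0; do
        [ -h "$link" ] || continue
        # "ls -l" shows "... name -> target"; the target is the last field
        target=$(ls -l "$link" | awk '{print $NF}')
        case $target in
            *"$pattern"*) basename "$link" s0 ;;
        esac
    done
}

# Mock /dev/dsk: two disks whose physical paths differ only in the PCI node
mock=$(mktemp -d)
trap 'rm -rf "$mock"' EXIT
ln -s ../../devices/pci@86,4000/scsi@4/sd@0,0:a "$mock/c1t0d0s0"
ln -s ../../devices/pci@a4,4000/scsi@4/sd@0,0:a "$mock/c5t0d0s0"

matchPhysical scsi@4/sd@0,0 "$mock"               # ambiguous: c1t0d0 and c5t0d0
matchPhysical /pci@86,4000/scsi@4/sd@0,0 "$mock"  # unique: c1t0d0
```

A trailing fragment such as scsi@4/sd@0,0 matches both disks, whereas the full path picks out exactly one, which is why fully specifying the path is the safer habit.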
To make Jumpstart work from physical paths, I used a Jumpstart derived profile generated by a very basic begin script. The script checks the system hostname and, via a case statement, identifies which physical devices are appropriate for this host. It then looks up each device's logical access path in cXtYdZ format and generates the necessary profile entries.
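For anyone who hasn't used derived profiles before, the wiring looks roughly like this: the rules entry puts = in the profile field, and the begin script writes the profile it generates to the file named by the SI_PROFILE variable that Jumpstart sets. This is a minimal sketch, not my production script; the begin script name, the diskLayout stand-in for the disk-layout logic described above, and the non-disk keyword values (install_type, system_type, partitioning, cluster) are all assumptions to adjust to suit.

```shell
#!/bin/sh
# Minimal sketch of a begin script producing a JumpStart derived profile.
# A rules file entry selects it with "=" in the profile field, e.g.
# (getDiskLayout.begin is a hypothetical script name):
#   hostname pw2500a  getDiskLayout.begin  =  -
#
# JumpStart sets SI_PROFILE; the begin script writes the profile there.
# Outside JumpStart (e.g. for testing) fall back to a temporary file.
SI_PROFILE=${SI_PROFILE:-$(mktemp)}

# Hypothetical stand-in for the host-specific disk layout logic
diskLayout() {
    echo "filesys mirror c1t0d0s0 c5t0d0s0 10240 /"
    echo "filesys mirror c1t0d0s1 c5t0d0s1 4096 swap"
}

writeProfile() {
    # Build the disk lines first so a failure leaves no half-written profile
    layout=$(diskLayout) || return 1
    {
        # Example non-disk keywords (assumed values; adjust per site)
        echo "install_type initial_install"
        echo "system_type standalone"
        echo "partitioning explicit"
        echo "cluster SUNWCXall"
        printf '%s\n' "$layout"
    } > "$SI_PROFILE"
}

writeProfile || exit 1
cat "$SI_PROFILE"
```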
I won't reproduce my actual begin script in full here because it contains references to client config and other functionality that I may write about in the future, but the following code fragments ought to give you the idea (hostnames have been changed to protect the innocent):
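# gameOver() is part of the full script and is not shown here. Assumed
# minimal stand-in: report the message on stderr and abort.
gameOver() {
    print -u2 "$*"
    exit 1
}
# Map this host to the physical paths of its intended root and mirror disks
case `hostname` in
    pw2500a)
        SOL10_ROOT=/pci@86,4000/scsi@4/sd@0,0
        SOL10_MIRROR=/pci@a4,4000/scsi@4/sd@0,0
        ;;
    pw2500b)
        SOL10_ROOT=/pci@8a,4000/scsi@4/sd@0,0
        SOL10_MIRROR=/pci@94,4000/scsi@4/sd@0,0
        ;;
    *)
        gameOver "ERROR: `hostname` not found in config"
        ;;
esac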
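######################
# Translate a physical device path to a logical cXtYdZ name.
# Each /dev/dsk entry is a symlink into /devices, so "ls -l" shows
# "... /dev/dsk/cXtYdZs0 -> ../../devices/<physical path>:a".
translateDevice() {
    if [ $# -ne 1 ] ; then
        gameOver "ERROR: translateDevice() expects exactly one argument"
    fi
    # Field NF-2 of the matching line is the link name, /dev/dsk/cXtYdZs0
    DEVICE=$(ls -l /dev/dsk/c*s0 | grep "$1" | awk '{print $(NF-2)}')
    if [ -z "$DEVICE" ] ; then
        gameOver "ERROR: $1 not found in /dev/dsk/"
    fi
    if [ $(echo $DEVICE | wc -w) -gt 1 ] ; then
        gameOver "ERROR: $1 found, but returned multiple disks ($DEVICE)"
    fi
    # /dev/dsk/c1t0d0s0 -> c1t0d0
    print $DEVICE | cut -d/ -f4 | sed 's/s0$//'
}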
######################
# Main program
if [ -z "$SOL10_ROOT" ] ; then
    gameOver "ERROR: SOL10_ROOT not defined for `hostname`"
fi
# translateDevice runs in a subshell, so an exit inside it only ends the
# subshell; check its exit status and its output here as well
SOL10_ROOT=$(translateDevice "$SOL10_ROOT") || exit 1
if [ -z "$SOL10_ROOT" ] ; then
    exit 1
fi
# XSECTIONX is a placeholder, replaced with the slice number on each line below
SOL10_ROOT="${SOL10_ROOT}sXSECTIONX"
SOL10_DEVS=$SOL10_ROOT
FILESYS="filesys"
if [ -n "$SOL10_MIRROR" ] ; then
    SOL10_MIRROR=$(translateDevice "$SOL10_MIRROR") || exit 1
    if [ -z "$SOL10_MIRROR" ] ; then
        exit 1
    fi
    SOL10_MIRROR="${SOL10_MIRROR}sXSECTIONX"
    SOL10_DEVS="$SOL10_ROOT $SOL10_MIRROR"
    # RAID-1 entries name both slices; a single disk gets a plain filesys entry
    FILESYS="filesys mirror"
fi
print "$FILESYS $SOL10_DEVS 10240 /" | sed 's/XSECTIONX/0/g'
print "$FILESYS $SOL10_DEVS 4096 swap" | sed 's/XSECTIONX/1/g'
print "$FILESYS $SOL10_DEVS 6144 /var" | sed 's/XSECTIONX/3/g'
print "$FILESYS $SOL10_DEVS free /export" | sed 's/XSECTIONX/5/g'
print "metadb $SOL10_ROOT size 8192 count 3" | sed 's/XSECTIONX/7/g'
if [ -n "$SOL10_MIRROR" ] ; then
    print "metadb $SOL10_MIRROR size 8192 count 3" | sed 's/XSECTIONX/7/g'
fi
(Any errors or omissions in that script are probably due to my copying and pasting it into this entry.)
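The XSECTIONX placeholder works, but if you'd rather avoid the sed juggling, the same lines can be generated from a small layout table. This is an alternative sketch rather than what I actually ran: genLayout and its arguments (the cXtYdZ names that translateDevice returns) are hypothetical, and with no mirror disk it emits plain, unmirrored filesys entries.

```shell
#!/bin/sh
# Sketch: generate the filesys/metadb lines from a layout table instead of
# substituting an XSECTIONX placeholder with sed. genLayout is hypothetical;
# $1 is the root disk, $2 the (optional) mirror disk, both in cXtYdZ form.

# slice  size(MB)  mount point
LAYOUT='0 10240 /
1 4096 swap
3 6144 /var
5 free /export'

genLayout() {
    root=$1
    mirror=$2
    printf '%s\n' "$LAYOUT" | while read -r slice size mnt; do
        if [ -n "$mirror" ]; then
            echo "filesys mirror ${root}s$slice ${mirror}s$slice $size $mnt"
        else
            echo "filesys ${root}s$slice $size $mnt"
        fi
    done
    # Slice 7 on each disk holds the state database replicas;
    # $mirror is left unquoted so an empty mirror simply drops out
    for disk in $root $mirror; do
        echo "metadb ${disk}s7 size 8192 count 3"
    done
}

genLayout c1t0d0 c5t0d0
```

Called as genLayout c1t0d0 c5t0d0, it reproduces the six profile lines from the pre-install check output shown below; adding a slice then means adding one line to LAYOUT.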
Importantly, this method also lets you run the script on the host before the Jumpstart is actually performed, and it should still identify the correct devices, reported under their current logical access paths. This makes for a useful pre-install check:
$ ./getDiskLayout
filesys mirror c1t0d0s0 c5t0d0s0 10240 /
filesys mirror c1t0d0s1 c5t0d0s1 4096 swap
filesys mirror c1t0d0s3 c5t0d0s3 6144 /var
filesys mirror c1t0d0s5 c5t0d0s5 free /export
metadb c1t0d0s7 size 8192 count 3
metadb c5t0d0s7 size 8192 count 3
$
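Since the whole point was to avoid clobbering the old Solaris 8 disks, one extra check worth running at this stage is that none of the disks in the generated layout is currently in use. A minimal sketch, assuming you supply the in-use disks yourself (for example, gathered from df and metastat output on the running system); layoutDisks, checkTargets and the in-use disk names are hypothetical.

```shell
#!/bin/sh
# Sketch: refuse a layout that names a disk the running OS is using.
# layoutDisks/checkTargets are hypothetical helpers; the in-use list is
# assumed to be supplied by hand (e.g. gathered from df and metastat).

# Print the distinct cXtYdZ disks named in profile lines (slice stripped).
# Targets may be hex WWNs on Fibre Channel, hence [0-9A-Fa-f]* after "t".
layoutDisks() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        grep '^c[0-9]*t[0-9A-Fa-f]*d[0-9]*s[0-9]*$' |
        sed 's/s[0-9]*$//' | sort -u
}

# Return non-zero if any layout disk appears in the in-use list ($2)
checkTargets() {
    status=0
    for disk in $(layoutDisks "$1"); do
        for busy in $2; do
            if [ "$disk" = "$busy" ]; then
                echo "ERROR: $disk is in use by the running OS" >&2
                status=1
            fi
        done
    done
    return $status
}

SAMPLE='filesys mirror c1t0d0s0 c5t0d0s0 10240 /
filesys mirror c1t0d0s1 c5t0d0s1 4096 swap
metadb c1t0d0s7 size 8192 count 3
metadb c5t0d0s7 size 8192 count 3'

# Hypothetical: the disks holding the old Solaris 8 root and its mirror
IN_USE="c0t0d0 c4t0d0"

if checkTargets "$SAMPLE" "$IN_USE"; then
    echo "OK: no target disk is in use"
fi
```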
The end result: every single machine was built correctly first time, without a significant amount of manual investigation into physical device paths and PCI probe order. With derived profiles there are many ways to implement the same functionality; this illustrates just one of them.
