I've recently finished my fourth article for SysAdmin magazine. This was a bit of a rush job, as they want to publish it in the "Clusters" themed issue in April. It is a discussion of a technique for system failover that I recently put into production, which more-or-less guarantees transparency of failover.
I'll not publish the full article online until after the magazine has put it in print; however, here's the introduction.
Whilst working at a customer site, I recently had a requirement to design a mechanism whereby a very large legacy database server could be failed over from one site to another.
Over the years several methods of doing so had been investigated, and subsequently abandoned for various financial, technical and political reasons.
Years after the initial service was implemented, the system was still a major single point of failure for the customer, and was scheduled to remain in active duty for anything up to eighteen months.
During this time, the size and criticality of the system meant that the risk of implementing a creative solution was deemed unacceptable, until a single, significant event forced the client to reassess the situation.
The solution had to be constructed around the existing system, keeping change to a minimum, and absolutely avoiding any direct interference with the database and application.
We had a healthy budget for the implementation; however, we felt that a competent solution could be delivered whilst utilising a fraction of the available funds.
