Jul 25

After a bit of healthy debate in the office about the merits and implications of using adaptive copy mode (ACp) with EMC's SRDF, I wanted to clarify my own thoughts on how it operates and the benefits of using it.

Firstly, for the uninitiated: what exactly is it? Well, EMC has its SRDF (Symmetrix Remote Data Facility) replication protocol, which allows data to be replicated over distance from one array to another.

Normally, wherever possible, when running replication at the hardware layer like this (where the hardware has no concept of application transactional consistency), you need the replication to run synchronously with the I/Os. This essentially means that when a write operation is performed on the source array, it is also committed to the destination (i.e. remote) array before the operation is acknowledged to the server as complete.

Clearly there is a time overhead with this kind of arrangement, dictated by two factors: the (relatively constant, though dependent on array loading) "rippling" effect of the write having to pass through two arrays before getting a final "commit", and the transmission and acknowledgement time between the arrays, which is bounded by the speed of light and the distance between them (they can be many kilometres apart).

From this (the remote array's write latency plus twice the one-way transmission latency), it is easy to see that distance imposes an immediate bottleneck: not necessarily in bandwidth, but certainly in transaction time. Round-trip times of 20-30ms are not unusual.
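To put rough numbers on this, here is a back-of-the-envelope model of the host-visible write time in synchronous mode. It is a sketch under stated assumptions: light in fibre covers roughly 200 km per millisecond, and the function names and example latencies are hypothetical. It ignores link queueing and any extra protocol round trips, both of which push real-world figures higher.

```python
# Back-of-the-envelope model of synchronous replication write time.
# Assumption: light in fibre travels ~200 km per millisecond (~2/3 of c).
FIBRE_KM_PER_MS = 200.0


def one_way_ms(distance_km: float) -> float:
    """Propagation delay for a single traversal of the inter-array link."""
    return distance_km / FIBRE_KM_PER_MS


def sync_write_ms(distance_km: float, local_write_ms: float, remote_write_ms: float) -> float:
    """Time before the host sees the write acknowledged in synchronous mode:
    local write + remote write + one round trip between the arrays.
    Hypothetical model: ignores queueing and extra protocol round trips."""
    return local_write_ms + remote_write_ms + 2 * one_way_ms(distance_km)


def async_write_ms(local_write_ms: float) -> float:
    """In asynchronous mode the host is acknowledged after the local write."""
    return local_write_ms


if __name__ == "__main__":
    # Hypothetical 0.5 ms local and remote write latencies.
    for km in (10, 100, 1000):
        print(f"{km:>5} km: sync {sync_write_ms(km, 0.5, 0.5):6.2f} ms, "
              f"async {async_write_ms(0.5):.2f} ms")
```

Under these assumptions, at 1,000 km the round trip alone adds about 10 ms to every single write, before any congestion is taken into account.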

As the long-haul links between the arrays become congested, performance degrades rapidly, and because replication is synchronous your application slows down with it (in bad cases I've seen the OS report write times in excess of 300-400ms).

In contrast, an asynchronous transfer mode commits the write to local storage and acknowledges it straight away, while transmitting the write to the remote array in the background. This gives performance comparable to that of a non-SRDF setup, at the risk of losing I/O transactions that have not yet reached the remote array in the event of a link or primary site failure. Because the SRDF transfer is decoupled from the write itself, your application will perform well even when the long-haul links between the two arrays are congested, and the SRDF updates will complete at the next available opportunity.
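One way to picture the asynchronous trade-off is to track the backlog: data already acknowledged to the host but not yet delivered to the remote array, which is exactly what would be lost in a failure. The sketch below is a simplified queue model, not how SRDF/A actually batches its writes; the function name, interval granularity, units and numbers are all hypothetical.

```python
from typing import List


def async_backlog(writes_per_interval: List[float], link_capacity: float) -> List[float]:
    """Backlog (data written locally but not yet on the remote array) at the
    end of each interval, assuming the link ships up to `link_capacity`
    units per interval and any excess is carried forward.
    Hypothetical fluid-queue model; units are arbitrary (e.g. MB)."""
    backlog = 0.0
    history = []
    for written in writes_per_interval:
        # New writes join the queue; the link sends what it can this interval.
        backlog = max(0.0, backlog + written - link_capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # A burst above link capacity builds a backlog, which drains afterwards.
    print(async_backlog([5, 15, 10, 2, 1], link_capacity=8))
```

The host never waits on the link in this model; the non-zero values are the window of exposure, and they shrink back to zero once the burst has passed.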

And so, back to ACp - a compromise between the two.

The trouble with synchronous mode is that you don't necessarily want your application slowing down at every busy spot during the day; perhaps you want to make more efficient use of your bandwidth by playing the odds. You might, for example, feel that running your long-haul link at a very high utilisation is more cost-effective than upgrading its bandwidth. ACp will allow you to do this, with the caveat that during peak load you will be slightly out of sync.

ACp introduces the concept of the skew value: essentially a threshold on how far the remote copy may lag behind, counted in tracks still owed to the remote array. Below the skew value, the device pairing runs asynchronously, and above it the pairing switches to synchronous mode until the backlog falls back under the threshold (the skew value normally defaults to its maximum of 65,535 tracks).
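The skew behaviour can be modelled as a small state machine. This is a toy sketch, not EMC's implementation: the class and method names are hypothetical, the backlog is counted in abstract units, and it ignores details such as repeated writes to data that is already waiting to be sent.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveCopyPair:
    """Toy model of an SRDF pair in adaptive copy mode (hypothetical, not an EMC API).

    skew: how many units may be owed to the remote array before new writes
    fall back to synchronous behaviour."""
    skew: int
    owed: int = 0          # written locally, not yet on the remote array
    async_writes: int = 0  # acknowledged as soon as the local array has them
    sync_writes: int = 0   # had to wait for the remote array (round trip)

    def write(self, units: int = 1) -> str:
        if self.owed + units <= self.skew:
            # Below the skew: acknowledge locally, copy to the remote later.
            self.owed += units
            self.async_writes += 1
            return "async"
        # Skew would be exceeded: send this write synchronously, so the host
        # waits for the remote commit and the backlog does not grow.
        self.sync_writes += 1
        return "sync"

    def drain(self, units: int) -> None:
        """The link has delivered `units` of the backlog to the remote array."""
        self.owed = max(0, self.owed - units)


if __name__ == "__main__":
    pair = AdaptiveCopyPair(skew=3)
    print([pair.write() for _ in range(5)])  # ['async', 'async', 'async', 'sync', 'sync']
    pair.drain(2)                            # the link catches up a little
    print(pair.write(), pair.owed)           # async 2
```

The point of the model is that the host only pays the synchronous round-trip penalty while the backlog is pinned at the skew; as soon as the link catches up, writes go back to being acknowledged locally.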

So, for example, playing the odds: if running your long-haul link at a higher utilisation meant that your local application couldn't run quickly enough because of link contention and latency, ACp may help. You might not be fully consistent one hundred percent of the day, but 98% might be good enough, especially when coupled with the local recovery abilities of journalled filesystems and modern DBMSs.
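To get a feel for a "98% of the day" style figure, you can estimate how often the remote copy lags, given an hourly write profile and a link capacity. This is a rough sketch under assumptions: an hourly fluid-queue model, a hypothetical function name and made-up numbers, and it ignores the synchronous fallback once the skew is reached.

```python
from typing import Sequence


def fraction_in_sync(writes_per_hour: Sequence[float], link_capacity_per_hour: float) -> float:
    """Share of hours that end with nothing owed to the remote array.

    Hypothetical model: each hour the link ships up to `link_capacity_per_hour`
    units; anything it can't ship is carried into the next hour."""
    backlog = 0.0
    hours_in_sync = 0
    for written in writes_per_hour:
        backlog = max(0.0, backlog + written - link_capacity_per_hour)
        if backlog == 0.0:
            hours_in_sync += 1
    return hours_in_sync / len(writes_per_hour)


if __name__ == "__main__":
    # Hypothetical day: 40 GB/h normally, a two-hour 120 GB/h peak, 100 GB/h link.
    day = [40] * 9 + [120] * 2 + [40] * 13
    print(f"in sync {fraction_in_sync(day, 100):.1%} of the day")
```

With these illustrative numbers the remote copy lags for two hours and is in sync for the other 22; sizing the link below the peak rather than for it is exactly the bet ACp lets you make.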

Personally, I have only ever used ACp to mitigate high-bandwidth situations, for example bringing a new pair of arrays online and having to perform a full SRDF establish of many terabytes of data. However, the gotcha here is that, depending on what else is using your long-haul links, you may want to set your other pairings to ACp as well, lest your high-bandwidth transfers have a knock-on effect on other (synchronous) pairings running over the same links.
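A quick calculation shows why a full establish can dominate a link for days. The sketch assumes decimal terabytes (10^12 bytes) and a hypothetical 80% effective link efficiency; the function name, data sizes and link speeds are illustrative, not taken from a real deployment.

```python
def establish_hours(data_tb: float, link_gbit_s: float, efficiency: float = 0.8) -> float:
    """Hours needed to copy `data_tb` terabytes (10**12 bytes) over a link of
    `link_gbit_s` gigabits per second, at the given fraction of line rate.
    The default efficiency is an assumption, not a measured SRDF figure."""
    bits = data_tb * 1e12 * 8
    usable_bits_per_s = link_gbit_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600


if __name__ == "__main__":
    for tb in (5, 20):
        for gbit in (1, 10):
            print(f"{tb} TB over {gbit} Gbit/s: ~{establish_hours(tb, gbit):.1f} h")
```

At 1 Gbit/s, 20 TB keeps the link saturated for over two days under these assumptions, which is plenty of time for any synchronous pairing sharing that link to suffer.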


Posted by Mike Scott


5 Comments

  1. Doug Burns says:

    So is this a decision that can be made at a low enough level to use synchronous for Oracle online redo log files and ACp for data files?

  2. Mike says:

    Yup - as long as you've planned your storage accordingly - it can be set at the LUN level (i.e. where two hypers or metas on different arrays are paired for SRDF). So as long as you keep your data separate, this config is easily possible.

  3. Steve says:

    Can we have SQL Servers on our prod SAN and DR SAN using EMC SRDF async replication alone to mirror the MDFs and LDFs, and not have transactional consistency errors? If we split the link and bring up the DR SQL Servers, would there not be a possibility of torn pages or databases not recovering? SRDF is bit-level replication and doesn't adhere to SQL Server transactional consistency, does it? I know this scenario would work in some situations, but it is not safe enough for a DR scenario, is it?

  4. Mike says:

    Hi Steve,

    Unfortunately, that's the risk that you may run when you're not able to run in a synchronous mode (usually distance constraints force the use of async). DBMS and filesystem journal rollback may form an acceptable risk, depending on the application.

    However, that said, SRDF/A does incorporate a checkpoint mechanism (known as "delta sets"), where I/O transactions are effectively rolled back to the last checkpoint in the event of a disaster - this gives it a fighting chance of being less inconsistent than if it were just a simple sliding window mechanism.

  5. Sam says:

    I believe this post should have placed a greater emphasis on the issue of data consistency.

    SRDF synchronous and SRDF asynchronous are both claimed by EMC to guarantee data consistency in the event of a link failure. (Note that I am not commenting on whether the vendor's assertion is true.)

    SRDF Adaptive Copy, on the other hand, has no such vendor guarantee of consistency.

    In this regard SRDF Adaptive Copy is not, in your words, "a compromise between the two," but is instead less capable than each.
