Friday, November 9, 2012

Sudden large increases in MySQL slave lag caused by clock drift

Just in case this ever helps anyone else, I had a machine where slave lag (as reported by Seconds_Behind_Master in SHOW SLAVE STATUS) would sometimes suddenly jump to 7 hours and then come back, and jump again, and come back.


Turns out, the machine's clock was off by 7 hours and no one had noticed!  After fixing NTP synchronization, the issue remained, I suspect that MySQL keeps a base timestamp in memory that was still off by 7 hours.

The fix was to STOP SLAVE; START SLAVE;

3 comments:

  1. Seconds_Behind_Master is sort of a misnomer; it actually measures the difference between the IO Thread and the SQL Threads on the slave. If for some reason you have a slave, say, on a modem or on the moon, Seconds_Behind_Master could read zero despite being severely delayed. This field is, as you discovered, resilient to time differences between master and slave -- but only if that difference is constant.

    A more reliable means to measure replication delay would be to use a replicated heartbeat table (UPDATE Heartbeat SET LastUpdated = UNIX_TIMESTAMP()).

    ReplyDelete
  2. Well in this case the time difference between both machines' clocks was constant.

    Maybe I should just deploy Percona's pt-heartbeat.

    ReplyDelete
  3. our seconds behind master is in 43000

    After seeing the post I looked at the time of master & slave, I had corrected the time but the replication delay is not at all decreasing, any suggestions would be helpful..please

    ReplyDelete