Friday, November 9, 2012

Sudden large increases in MySQL slave lag caused by clock drift

Just in case this ever helps anyone else, I had a machine where slave lag (as reported by Seconds_Behind_Master in SHOW SLAVE STATUS) would sometimes suddenly jump to 7 hours and then come back, and jump again, and come back.


Turns out, the machine's clock was off by 7 hours and no one had noticed!  After fixing NTP synchronization, the issue remained, I suspect that MySQL keeps a base timestamp in memory that was still off by 7 hours.

The fix was to STOP SLAVE; START SLAVE;

2 comments:

Sean Rees said...

Seconds_Behind_Master is sort of a misnomer; it actually measures the difference between the IO Thread and the SQL Threads on the slave. If for some reason you have a slave, say, on a modem or on the moon, Seconds_Behind_Master could read zero despite being severely delayed. This field is, as you discovered, resilient to time differences between master and slave -- but only if that difference is constant.

A more reliable means to measure replication delay would be to use a replicated heartbeat table (UPDATE Heartbeat SET LastUpdated = UNIX_TIMESTAMP()).

Benoit Sigoure said...

Well in this case the time difference between both machines' clocks was constant.

Maybe I should just deploy Percona's pt-heartbeat.