AppPool/IIS DNS Caching beyond TTL
We're using AWS Redis (ElastiCache) with 3 nodes as a session state store, via the StackExchange Redis session state provider.
The connection is via a CNAME. AWS provides a single DNS entry with a very short TTL that always points to the "master" node, so in the event of a failover, DNS updates, propagates, and systems resume.
In theory.
Last night our master failed. The cluster very happily failed from node 1 -> node 2, but our website stayed offline.
We were alerted about an hour after the event, so with a TTL of under 5 minutes, the site had already been down for more than an hour.
Investigation found DNS was fine. Both nslookup and ping from the web server itself showed the main host resolving to node #2 as expected. By this point node #1 had rebooted but was now a read-only replica.
So the application was throwing an error on every request saying that it can't write to a read-only server, even though DNS was showing the proper IP everywhere.
In the end, I recycled the app pool and everything instantly came back online. From what I can tell, the app pool was essentially caching the DNS lookup beyond the TTL.
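To make the suspected failure mode concrete, here is a minimal sketch in Python (not the actual .NET or StackExchange.Redis code, whose internals I haven't verified). It shows how a long-lived process that resolves a hostname once at startup can keep using the old address while fresh lookups such as nslookup already return the new one. The class name `PinnedEndpoint` and the hostname in the usage note are made up for illustration.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Ask the OS resolver for the current IPv4 addresses of hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # info[4] is the sockaddr tuple; its first element is the IP string.
    return sorted({info[4][0] for info in infos})


class PinnedEndpoint:
    """Hypothetical stand-in for a client that resolves DNS once and keeps the result."""

    def __init__(self, hostname: str,
                 resolver: Callable[[str], List[str]] = resolve_ipv4):
        self.hostname = hostname
        self._resolver = resolver
        # Resolved once at startup and then reused for the process lifetime.
        self.pinned = resolver(hostname)

    def is_stale(self) -> bool:
        """True when a fresh lookup no longer matches the pinned addresses."""
        return self._resolver(self.hostname) != self.pinned

    def refresh(self) -> None:
        """Re-resolve and replace the pinned addresses (the 'recycle' step)."""
        self.pinned = self._resolver(self.hostname)
```

A watchdog could call `is_stale()` on a timer for something like `PinnedEndpoint("my-redis.example.internal")` and trigger a reconnect when it returns true, instead of waiting for a manual app pool recycle. Whether the pinning actually happens in the app pool, the .NET DNS cache, or a long-lived Redis connection is an assumption here, not something this sketch establishes.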
Is there a way to prevent or change this behaviour? I'd like to have the app be properly resilient to a future failover event.