Thursday, May 23, 2019

AppPool/IIS DNS Caching beyond TTL

We're using AWS Redis ("ElastiCache") with 3 nodes as our session state store, via the StackExchange Redis session state provider.

The connection is via a CNAME. AWS provides a single DNS entry with a very short TTL that always points to the "master" node, so in the event of a failover, DNS updates, propagates, and systems resume.

In theory.

Last night our master failed. The cluster very happily failed over from node 1 to node 2, but our website stayed offline.

We got alerted about an hour after the event, so with a TTL of under 5 minutes, we were already more than an hour into the outage.

Investigation found DNS was fine. Both nslookup and "ping" from the web server itself showed the main hostname resolving to node #2 as expected. By this point node #1 had rebooted and come back as a read-only replica.
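
The manual nslookup/ping check can also be scripted, so it can be run repeatedly or from a monitoring job. Below is a minimal Python sketch of that idea, not anything taken from the actual setup: the function names, the default port 6379, and the idea of comparing against a "connected" IP are all assumptions made for illustration. Note that it goes through the operating system resolver (like ping), whereas nslookup queries the DNS server directly.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str, port: int = 6379) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses the OS resolver
    currently gives for hostname (port 6379 is an assumed default)."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return sorted({info[4][0] for info in infos})


def has_moved(
    connected_ip: str,
    hostname: str,
    port: int = 6379,
    resolver: Callable[[str, int], List[str]] = resolve_ipv4,
) -> bool:
    """True if the IP a client is connected to is no longer among the
    addresses the hostname resolves to right now."""
    return connected_ip not in resolver(hostname, port)
```

For example, `has_moved("10.0.0.1", "my-redis.example.com")` (both values are placeholders) would report whether a connection opened against 10.0.0.1 is now pointing at a node the CNAME no longer resolves to.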

So the application was failing outright with an error that it can't write to a read-only server, even though DNS was showing the correct IP everywhere.

In the end, recycling the app pool brought everything back online instantly. From what I can tell, the app pool was essentially caching the DNS lookup beyond the TTL.

Is there a way to prevent or change this behaviour? I'd like to have the app be properly resilient to a future failover event.
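
As a rough illustration of the behaviour being asked for, here is a minimal Python sketch of a client wrapper that drops its connection, re-resolves the hostname, and retries once when a write is rejected as read-only. This is not the StackExchange.Redis API and not a claim about how that library behaves; `ReadOnlyError`, `ReconnectingClient`, and the injected `resolve` and `connect` callables are all hypothetical names used only to show the idea.

```python
from typing import Any, Callable, Optional


class ReadOnlyError(Exception):
    """Hypothetical stand-in for the 'can't write to a read only server' error."""


class ReconnectingClient:
    """Wraps a connection and re-resolves the hostname when a write is
    rejected as read-only, instead of reusing the old address forever."""

    def __init__(
        self,
        hostname: str,
        resolve: Callable[[str], str],   # hostname -> IP (assumed helper)
        connect: Callable[[str], Any],   # IP -> object with .write(key, value)
    ) -> None:
        self._hostname = hostname
        self._resolve = resolve
        self._connect = connect
        self._conn: Optional[Any] = None
        self.connected_ip: Optional[str] = None

    def _ensure_connected(self) -> None:
        # Resolve only when there is no live connection, which is how a
        # stale address can outlive the DNS TTL in the first place.
        if self._conn is None:
            self.connected_ip = self._resolve(self._hostname)
            self._conn = self._connect(self.connected_ip)

    def write(self, key: str, value: str) -> Any:
        self._ensure_connected()
        try:
            return self._conn.write(key, value)
        except ReadOnlyError:
            # The node we are attached to is now a replica: drop the
            # connection, look the name up again, and retry exactly once.
            self._conn = None
            self._ensure_connected()
            return self._conn.write(key, value)
```

The single retry is deliberate: if the freshly resolved node is also read-only, the error propagates to the caller rather than looping, which keeps a half-finished failover visible instead of hiding it.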


The webdev Team