One week ago today, social media accounts for the information-system services at several universities and colleges started lighting up with advisories to students: Duo, Cisco's popular authentication service, was suffering from performance issues, preventing them from logging into their accounts.
"DUO is experiencing a systemwide outage, which may impact the ability to log in to GU systems," stated Georgetown University's Information Services on X, formerly known as Twitter. The school updated students two hours later: "DUO performance is slowly improving, but DUO is still reporting issues affecting the university’s two-factor authentication for all Georgetown systems."
On the other side of the country, San Francisco State University's Information Technology Services posted a similar message for students: "ITS has identified an issue with DUO. ITS is working towards a resolution in collaboration with the vendor to solve this issue as quickly as possible."
The outage lasted just five hours, from approximately 9 a.m. to 2 p.m. ET — caused by application latency that had "increased to service-impacting levels," according to Duo's status log — but that was long enough to highlight why organizations need to make contingency plans in the event their two-factor authentication services are not available. Organizations exploring newer and stronger authentication methods should also include resilience and business continuity in the conversation.
A failure in an authentication service can disrupt operations, says Andras Cser, vice president and principal analyst at Forrester Research.
"Anytime two-factor or multifactor authentication does not have alternative or backup login methods and form factors — biometrics, offline one-time password generators, etc. — MFA can become a bottleneck, regardless of whether it's on-prem or cloud-hosted. When there is no authentication, the company essentially stops working."
Don't Let a Minor Outage Cause Major Disruptions
Duo first received a notification of latency issues at 9:03 a.m. ET on Aug. 21, with the problem escalating until the increased latency caused authentication failures for some customer applications protected by Duo's service. The latency impacted a variety of organizations; among the most vocal were the universities previously cited, as well as the University of Manchester in the United Kingdom, Colorado State University, and Western University, according to comments posted on X by the schools' IT groups.
The outage offers some valuable lessons for enterprises, chief among them: Businesses should walk through their authentication infrastructures and make sure that they do not have single points of failure, says Steve Won, chief product officer at 1Password, an authentication firm. Push-based models, where a device and a back-end authentication service are linked, can cause outages when the back end becomes unavailable.
"Push-based solutions are proprietary and reliant on a back-end service, whether it’s Duo, Okta, [or] Microsoft, receiving requests via network and making authentication decisions based on policy," Won says.
Time-based one-time password (TOTP) authentication does not have a single point of service failure and can be used as a failover mechanism, he adds.
"TOTP solutions are generic because [they're] built on a standard technology and can also work regardless of if the authentication service is up or down," Won says. "That's because there's an algorithm and a timer both on the service side and locally."
While SMS text-based one-time passwords could also be used, current best practices call for avoiding the mechanism due to known ways of bypassing the technology, mainly through SIM swapping attacks.
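To make Won's point about TOTP concrete, here is a minimal sketch of the standard algorithm (RFC 6238, built on the HOTP construction in RFC 4226), using only the Python standard library. The function names `hotp`, `totp`, and `verify` are illustrative and are not taken from any vendor's product. The sketch assumes a shared secret that was provisioned ahead of time and the common defaults of HMAC-SHA-1, a 30-second step, and six digits.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30,
         digits: int = 6) -> str:
    """Time-based one-time password: HOTP with counter = floor(time / step)."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, candidate: str, at: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Check a code locally, tolerating `window` steps of clock drift."""
    now = time.time() if at is None else at
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, counter + drift), candidate)
        for drift in range(-window, window + 1)
        if counter + drift >= 0  # skip counters before the epoch
    )
```

Both the device and the verifier compute the same value from the shared secret and the current time, so no request has to reach a third-party back end when a code is checked. That is the property that makes TOTP usable as a failover, provided the verifying application itself is still running.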
In its "Guide to Business Continuity Preparedness," Duo outlines how applications should fail over during an outage: Organizations can choose to "fail secure" and not allow alternative types of access, or to "fail safe" and allow users to use lesser forms of authentication and bypass two-factor authentication. When the service is reachable but performance is degraded, organizations have to take their own countermeasures, the guide states.
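The two failure modes can be sketched as a small policy function. This is a hypothetical illustration of the decision described in the guide, not Duo's API or configuration format: the names `FailMode` and `allow_login` are invented here, and a second-factor result of `None` is an assumed convention meaning the authentication service could not be reached.

```python
from enum import Enum
from typing import Optional


class FailMode(Enum):
    SECURE = "fail secure"  # deny access when the second factor is unavailable
    SAFE = "fail safe"      # allow access on the primary credential alone


def allow_login(primary_ok: bool, second_factor: Optional[bool],
                mode: FailMode) -> bool:
    """Decide whether to admit a user.

    second_factor is True or False when the service answered, and None
    when it was unreachable (an assumption made for this sketch).
    """
    if not primary_ok:
        return False              # a bad password is never bypassed
    if second_factor is not None:
        return second_factor      # service answered: honor its decision
    return mode is FailMode.SAFE  # service down: apply the chosen policy
```

Note that the sketch only covers an unreachable service. A slow but reachable service, as in the Aug. 21 incident, would have to be mapped onto `None` by the caller, for example through a timeout the organization picks for itself.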
"Continuity is nuanced for customers, individually based on risk tolerance and risk acceptance factors," a Duo spokesperson told Dark Reading.