Service Performance Enhancements for eduroam following Networkshop 38

Introduction

In general terms, Networkshop 38 was a great success for eduroam. The service was available over the 802.11n wireless network throughout the main conference venue and was successfully used by a large number of delegates. Unfortunately, despite pre-conference testing, some delegates experienced difficulties and the service became unavailable for short periods. These problems only became apparent under the high-load conditions experienced during the conference; during previous smaller events at Manchester, the eduroam service had performed perfectly.

The good news is that following an investigation into the causes of the difficulties by Technical Support in conjunction with the conference IT facility organisers at Manchester, a number of beneficial measures have been introduced:

  • a realm-based logic enhancement on the National RADIUS Proxy Servers that prevents authentication loops and improves performance
  • a concerted programme to eradicate RADIUS configuration errors at participating organisation sites

Difficulties experienced and analysis of problem

The problems at Networkshop arose when a large number of RADIUS access-request packets, generated by users attempting to authenticate, were sent from Manchester to the National RADIUS Proxy Servers (NRPS). This occurred due to:

a) a number of home site issues, such as:

   i) accounts being locked after users mis-keyed their credentials, resulting in multiple retries

   ii) misconfigured Organisational RADIUS Proxy Servers (ORPS) at home sites not accepting legitimate variations of username credentials, again resulting in multiple retries

b) the sheer volume of authentication attempts at particular times during the Networkshop days.

It should be remembered that each authentication attempt generates a large number of RADIUS packets. Some EAP types generate an order of magnitude more traffic than others. At Networkshop, there were a number of delegates from sites using systems that generate such high-traffic volumes.

The high volumes of access-requests triggered a protection mechanism on the NRPS on a number of occasions. This packet rate limiter has been in place since the NRPS were first installed and was designed to protect the service from authentication loops, which can render a RADIUS server unavailable. (Although there are three NRPS, the inadvertent ‘denial of service’ effect of such authentication loops could have compromised the whole eduroam service.)

Once the rate-limiter on the NRPS had been triggered, all RADIUS packets coming from Manchester were dropped for a short period of time. However, as eduroam users at Manchester failed to authenticate, they continued to retry, generating even more access-requests from Manchester, which the rate-limiter code on the NRPS interpreted as further dangerous authentication-loop activity. This unfortunately had the effect of making the eduroam service unavailable to all eduroam users at Manchester for short periods of time. (The rate-limiter code contained logic to reset counters over a period of 5 minutes, effectively clearing memory about the potentially harmful site, which is why the problem was intermittent.)
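The feedback effect described above can be illustrated with a minimal Python sketch. It is not the NRPS code: the class name, the threshold value and the way packets are counted per site are assumptions made for illustration, and only the 5-minute counter reset is taken from the description above.

```python
from collections import defaultdict

RESET_PERIOD = 300   # seconds; the 5-minute counter reset described above
THRESHOLD = 1000     # hypothetical number of packets allowed per site per period


class SiteRateLimiter:
    """Toy per-site packet counter (illustrative only, not the NRPS code)."""

    def __init__(self, threshold=THRESHOLD, reset_period=RESET_PERIOD):
        self.threshold = threshold
        self.reset_period = reset_period
        self.counts = defaultdict(int)
        self.window_start = 0.0

    def allow(self, site, now):
        """Return True if a packet from `site` at time `now` is forwarded."""
        # Forget every site once the reset period has elapsed.
        if now - self.window_start >= self.reset_period:
            self.counts.clear()
            self.window_start = now
        # Dropped packets are still counted, so user retries keep the
        # site over the threshold until the next reset.
        self.counts[site] += 1
        return self.counts[site] <= self.threshold


# Small threshold so the effect is visible in a few packets.
limiter = SiteRateLimiter(threshold=3)
results = [limiter.allow("manchester", t) for t in (0, 1, 2, 3, 4)]
after_reset = limiter.allow("manchester", 300)
```

In this sketch the first three packets are forwarded, the retries that follow are dropped, and service only resumes once the counters are cleared, which mirrors the intermittent outages seen at the venue.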


Solution

Following a thorough investigation and careful examination of the RADIUS and rate-limit system logs, the cause of the problems at Networkshop became apparent. The rate-limit mechanism that had been designed to protect the NRPS had itself caused the eduroam service to become periodically unresponsive at Networkshop. This conclusion led to a search for an alternative system.

A realm-based logic enhancement has been designed for the National RADIUS Proxy Servers. In tandem with this, there has been a concerted programme to eradicate RADIUS configuration errors on ORPS at participating organisation sites. The packet-based rate limiter on the NRPS has now been removed and, following successful testing, the new realm-based logic has been deployed in its place, which should improve authentication performance.

Summary

Lessons have been learned from the experiences at Networkshop. The causes of the problems were discovered and new realm-based logic has been developed and put in place on the NRPS. ORPS configuration errors at participating organisation sites have been corrected to eliminate the root cause of authentication loops, and we can confidently look forward to improved authentication performance all round.


Further Reading

Authentication Loops

It is possible for authentication loops to arise during RADIUS authentication because access-request packets have no built-in expiry time; this is a feature of the current RADIUS protocol. A loop can form due to misconfiguration of the realm-handling logic on a RADIUS server. This logic must be able to deal with non-mainstream variations of user ID that users might enter into their supplicant software.

In the context of eduroam, a loop forms if an ORPS receives an authentication request for a user from the realm that the ORPS should deal with but, for whatever reason, does not recognise the realm or rejects the user, and returns the access-request to the NRPS. The NRPS correctly handles the realm and sends the access-request back to the ORPS, which again incorrectly returns it to the NRPS, initiating a loop. Since there is no time-to-live limit, the access-request is sent back and forth, soaking up NRPS and ORPS processing and denying service to other authentication requests.

For example, an ORPS should itself handle authentication attempts from the following users:

realuser@UPPERCASE-local-realm

realuser@valid-sub-realm.local-realm

realuser@local-realm@another-realm

invaliduser@local-realm

If the ORPS does not deal with these and instead returns them to the NRPS, the NRPS correctly identifies 'local-realm' and returns the access-request to the misbehaving ORPS. It was to prevent this situation that the rate-limiter code was originally put in place on the NRPS. The effect was simply to temporarily block RADIUS traffic from errant ORPSs with automatic restoration of service after a short period of time.
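The realm handling expected of an ORPS for the four example user IDs above can be sketched in Python as follows. This is an illustration under stated assumptions, not a configuration for any particular RADIUS server: the function name is hypothetical, 'local-realm' is the placeholder realm from the examples, and treating every "@"-separated part after the username as a candidate realm is one possible reading of the third example.

```python
LOCAL_REALM = "local-realm"   # placeholder realm from the examples above


def handled_locally(user_id, local_realm=LOCAL_REALM):
    """Return True if the ORPS should deal with this user ID itself."""
    local = local_realm.lower()
    # Assumption: every "@"-separated part after the username is a
    # candidate realm, and comparison is case-insensitive.
    realm_parts = user_id.lower().split("@")[1:]
    # Assumption: any sub-realm of the local realm is also kept local;
    # whether the sub-realm is valid is then a local decision.
    return any(
        part == local or part.endswith("." + local)
        for part in realm_parts
    )


examples = [
    "realuser@LOCAL-REALM",
    "realuser@valid-sub-realm.local-realm",
    "realuser@local-realm@another-realm",
    "invaliduser@local-realm",
]
decisions = [handled_locally(u) for u in examples]
visitor = handled_locally("visitor@another-realm")
```

All four examples are kept at the ORPS, including the unknown user, whose authentication should then fail locally rather than being passed back to the NRPS. Only the genuine visitor from another realm is a candidate for proxying.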

Realm-based logic

The new logic tackles the authentication loop problem at the RADIUS proxy level rather than the RADIUS packet-count level of the rate-limiter mechanism. In essence, the NRPS now identifies access-request packets that the sending ORPS should have handled itself, processes them and returns access-rejects containing the reason for the rejection.

The NRPS compares the realm component of the outer identity of the user ID in the access-request with the realm identity tag of the ORPS from which the access-request came. Any access-request in which the realm of the outer ID is the same as the realm identity tag of the sending ORPS (a condition that should never arise in correct normal operation) will be processed by the NRPS and an access-reject returned to the ORPS. The access-reject will contain the reason for the rejection.
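The comparison described above can be sketched in Python. This is a simplified illustration and not the code deployed on the NRPS: the function name, the return values and the wording of the reason are hypothetical, and taking the realm as the text after the last "@" and comparing it case-insensitively are assumptions.

```python
def check_request(outer_identity, sender_realm_tag):
    """Return (action, reason) for an access-request arriving at the NRPS."""
    # Assumption: the realm is the text after the last "@".
    _, sep, realm = outer_identity.rpartition("@")
    # Assumption: realms are compared case-insensitively.
    if sep and realm.lower() == sender_realm_tag.lower():
        # The sending ORPS should have handled this request itself,
        # so it is rejected here instead of being proxied back.
        reason = f"realm '{realm}' belongs to the sending ORPS"
        return ("access-reject", reason)
    # Otherwise normal proxy processing continues (not modelled here).
    return ("continue", "")


looped = check_request("realuser@local-realm", "local-realm")
normal = check_request("visitor@another-realm", "local-realm")
```

Because the request is answered with an access-reject rather than returned to the ORPS, the exchange ends after a single round trip and no loop can form.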

With this new realm-based logic in place on the NRPS, authentication loops and the previous packet-rate based protection mechanism are now things of the past.

eduroam June 2010

 
