The Lync Kerberos Account

This post is going to be about what seems to be one of the least discussed features in Lync. I’m talking about the Lync Kerberos Account… No, don’t you dare close this window. We know it’s there, the Best Practice Analyzer tells us it’s considered ‘best practice’ to enable Kerberos authentication for Web traffic. Let’s keep that in mind while I try to unravel the reasons. To do so, and still make some sense, I have to start by explaining (quite) a bit about Kerberos and how it works. For any of you who are familiar with Exchange probably recognizes the Exchange Alternate Service Account (ASA), this is the Exchange equivalent of the Lync Kerberos Account. This is going to be a long post, and much won’t be specific to Lync, but I believe it’s worth reading and I believe people need to start deploying the Lync Kerberos Account.

Kerberos is complicated and most often you don’t have to care much about it, because it all happens automatically under the hood. While it’s great that it works automatically (most of the time) it shouldn’t be overlooked as it’s very important. It is the preferred authentication protocol for Windows. Most services that you access daily on your LAN are accessible to you because you have authenticated using Kerberos. Lync is no exception, it wants to use Kerberos as much as possible, if you’ll let it.
Today there are three commonly used authentication protocols in Windows. There’s basic authentication which, by itself, offers no security at all, that’s right, no security at all. Then we have NTLM, which was the preferred authentication protocol in pre-Kerberos times. It is still widely used today, because unlike Kerberos authentication, it doesn’t require that the authenticating client has access to a Domain Controller. This is the reason internal Lync clients try to authenticate using Kerberos and external clients fall back to NTLM. Finally, we have Kerberos.

If you aren’t interested in how it works and just want to deploy the Lync Kerberos account there is a TL;DR almost at the end of the post.

Great, you’re still reading. It gets a little complicated but I will do my best to explain it as simply as I can without omitting very important things.

Kerberos
Kerberos is an authentication protocol that is based on tickets that allows users and computers to authenticate in a secure manner over a non-secure network. The tickets that are used to authenticate never contains any passwords which means that a users password is never sent over the network. It uses symmetric key cryptography and requires a 3rd party that is trusted by the entity that is authenticating and the one that requires authentication. This 3rd party is the KDC (Key Distribution Center). In the Microsoft world this is the Domain Controllers. The KDC knows the secret key for all users and all computer accounts. All secret keys are a salted hash of the account password. The KDC is also trusted by all domain members, meaning that it can validate all tickets and any ticket issued by the KCD is trusted by all domain members. This is very important as we go on. As mentioned above Kerberos is based on tickets. There are different types of tickets and they are used for different things. To access a service you need a service ticket, a service ticket proves your identity to a computer that is running a service. To get a service ticket you need a Ticket Granting Ticket (TGT). To get a TGT you must authenticate yourself to the KDC.

In the following example a user wants to access an OWA (It’s a little easier to use as an example than Lync). The whole authentication process starts when the user logs on to the workstation.

  1. The user signs in to the workstation, when they do, the workstation will contact the KDC and request a Ticket Granting Ticket (TGT). The requests consists of a plain text part which includes among other things, the username. A portion of the request is encrypted with the users password. This is called the Authentication Service Request (AS-REQ).
  2. The KDC receives the request and inspects the plain text part where it can see who is this person claiming to be. It then looks up the user and in it’s database where it also finds the users secret key. It uses this secret key to decrypt the encrypted part of the message. If it’s able to decrypt the message it knows that the user is who they claim to be (or at least that the person is using the correct password). Now that the Domain Controller knows who the user is it will go ahead and respond with a TGT. This TGT contains information about the authenticated user and is encrypted with the KDC’s secret key which is not known to the workstation. It stores this TGT in the memory, for the sake of this example let’s call it the Kerberos memory, for as long as it is valid (10 hours by default). The response itself is called the Authentication Service Reply (AS-REP).

Now remember that step 1 and 2 happened when the user logged into their workstation. Now the sequence continues when the user decides to access the OWA. This happens, let’s say, an hour later.

  1. The user tries to access the OWA anonymously. The server responds with a HTTP 401 status code, meaning it requires authentication. In the response there is a header named “WWW-Authenticate” with the value of “Negotiate” which tells the client which type of authentication protocol to use. In this case Kerberos. The workstation looks in its Kerberos memory, where it will notice that it doesn’t have a service ticket for the HTTP service of the machine hosting the OWA. It then contacts the KDC again with a service ticket request. The request states which service the client wants to access. This is specified by including the Service Principal Name (SPN i.e. HTTP/exchange-cas01.contoso.com). The client also attaches the TGT it received in step 2 (from the kerberos memory) to the request. This is called the Ticket Granting Service Request (TGS-REQ).
  2. The KDC receives the TGS-REQ from the client. It reads the SPN in the request and searches it database for a user account, or computer account, with a matching SPN registered. Now that the KDC has verified that the SPN exists, it also knows which account that holds that particular SPN. To determine who the user is, it decrypts the TGT using its own secret key (the same it used to encrypt the ticket in step 2). If it successfully decrypts the ticket, it knows that it was issued by the KDC and therefore the content can be trusted. And the content contains… You’re right, it contains the users verified identity. Now the KDC knows which account that holds the SPN (the Exchange CAS) and the user who wants to access the service. It then responds to the TGS-REQ with a service ticket, which it encrypts with the secret key of the account that holds the SPN. The workstation then receives the service ticket. Note that since the service ticket is encrypted with a secret key unknown to the client, it cannot read the encrypted part of the ticket. The client stores this ticket in the Kerberos memory for as long as it’s valid. The response is called the Ticket Granting Service Reply (TGS-REP).
  3. The workstation now connects to the OWA. It includes the service ticket, taken from the Kerberos memory, and uses it in the “authorization” header of the HTTP request.
  4. The web server hosting OWA receives the request and inspects the authorization header. It extracts the service ticket and attempts to decrypt it using its own secret key. If it can, it means the KDC has encrypted it and it’s content can be trusted. From the ticket it is able to extract the users identity and their group memberships. The request is then authenticated and the user is served the requested contents.

I’ve made a small video that illustrates how the above example works



Important things to remember from the above example.

  • The server running the service (OWA) never contacted the DC. This means less load for the service and each client doing most of the job.
  • No password was sent over the network.
  • When a client has a valid ticket for a service it can be used as long as its valid, meaning no need to go through the authentication process again and again. Each request is authenticated but the client uses the same ticket.
  • The ticket sent to the service also includes group memberships which means authorization decisions can be made directly without contacting a Domain Controller.

NTLM
Now that we understand how Kerberos works (we do, don’t we?)… Well, at least we know the basics of how it works, which is enough for the scope of this post. Before we can get back to Lync we need to understand one more authentication protocol… I promise, its the last one for this post. We need to look into NTLM which is also used by Lync. Without understanding both protocols we can’t understand the pros and cons.

Unlike Kerberos, NTLM is not built on tickets. NTLM happens in three phases. These phases includes three different types, Type 1-3. So let use the same example as above. A client wants to access an OWA that is protected by NTLM authentication.

  1. The client starts his browser and navigates to the OWA address.
  2. The server responds that it requires NTLM authentication.
  3. The client sends an NTLM Type 1 message that includes its NTLM capabilities.
  4. The server responds with an NTLM Type 2 message that contains a generated challenge.
  5. The client extracts the message from the server and uses it’s password to generate a Type 3 message that is sent to the server.
  6. The server receives the Type 3 message, sends the Type 2 and Type 3 message to the DC.
  7. The DC receives the Type 2 and Type 3 message. Validates the messages towards its database and responds back to the server with whether the user has successfully authenticated or not.
  8. The server receives the responds from the DC and if it’s correctly authenticated it lets the client access the requested resource.

Seems a lot easier than Kerberos, right? Well, it has to do this for each request… Yes, each image, each JavaScript and each file… That’s not very efficient. There is NTLM authentication persistence that is meant to help with the above problem but it would work best with one CAS or only one Front End.

So after reading through the above we can quite safely say that we want to use Kerberos over NTLM where possible.

Load balancing
We’re getting closer now to the point of this post, the Lync Kerberos Account, and one of the reasons it’s considered best practice.
While a client dependant load balancing/fail over perhaps isn’t the most elegant solution, it makes it a lot easier to deploy. This works well for SIP traffic where the authentication is connection based. Meaning that when the connection is established and the user is authenticated the connection is kept open and marked as an authenticated connection. This way the client does not have to provide credentials with each request. There is one protocol which is not covered by DNS load balancing and still needs to be load balanced using a hardware load balancer, namely HTTP which is used for Lync Web Services. Lync Web Services is used quite a lot by the clients, which then risks falling into the NTLM trap mentioned above. Meaning each request has to be authenticated towards the DC. While this certainly works it’s not optimal at all. Why put more load on DC’s and network than necessary? Why use an inferior authentication protocol? There really is NO good reason.

How come it’s not deployed everywhere then? Because it works without deploying it and likely because Kerberos authentication is less well known and a little trickier to load balance. Hopefully it won’t be after reading this post.

While explaining Kerberos above I mentioned that it relies on SPNs to function properly. This is because tickets are encrypted with the key for the service that is known to the server hosting the service and the DC. In the following scenario we have lyncpool1.contoso.com that consists of three Front Ends (fe01-03.contoso.com). The overridden name lyncpool1-web.contoso.com DNS record points to the load balancer that is in front of all three Front Ends.

The client will connect to https://lyncpool1-web.contoso.com and from that name it will determine that it needs a service ticket for the HTTP/lyncpool1-web.contoso.com to which the DC will likely reply “no such service principal” because no such SPN exists. On some installations I’ve seen that the SPN does exist in one of the following configurations.

  1. Registered to each Front End Servers computer account. This breaks Kerberos. Each SPN must be unique.
  2. Registered to one Front End. This doesn’t break Kerberos but can’t be load balanced. When a client goes to the load balancer with a valid kerberos ticket and gets directed to a Front End that doesn’t have that SPN registered, it will fail. As you can see in the image below the client’s Kerberos ticket is issued to HTTP/lyncpool1-web.contoso.com. The request ends up at FE01 which does not hold that SPN.

A user accessing internal Web Services with a kerberos ticket containing an incorrect SPN.

So, what to do? Enter the Lync Kerberos Account!

The Lync Kerberos Account
The Lync Kerberos Account is a really smart idea that makes load balancing Kerberos for Lync Web Services a non-issue. Essentially it’s a computer account. When created it’s just that. Now comes the special part; when you assign it to a site the correct SPNs are registered to that account and all Front Ends in the site become aware of the account and can use it for authentication purposes. This means that each server in the pool knows the secret for that account that then can be used to decrypt Kerberos tickets. This also means that whether the client ends up at FE01, 02 or 03 it doesn’t matter because the ticket it sends is encrypted with the service key for HTTP/lyncpool1-web.contoso.com which all of them can decrypt and read because they have the secret key of the Lync Kerberos Account.

The same request with the Lync Kerberos Account deployed. Now all Front Ends can decrypt the ticket.

TL;DR Kerberos is an authentication protocol based on tickets. It’s better than NTLM. Always deploy the Lync Kerberos Account.

Now that we’re all on the same page (hopefully), how do we deploy this account? It’s quite simple.

First we need to create the account itself.
New-CsKerberosAccount -UserAccount "Contoso\KerbAuth" -ContainerDN "CN=Users,DC=contoso,DC=com"

Then we need to assign it to our site.
New-CsKerberosAccountAssignment -UserAccount "Contoso\KerbAuth" -Identity "site:Contoso"

Enable the topology
Enable-CsTopology

Set the password for the account. Do NOT skip this step.
Set-CsKerberosAccountPassword -UserAccount "contoso\KerbAuth"

And that’s it! The account is deployed. You can verify that the correct SPN’s has been added to the account by running.
setspn -L KerbAuth

Verify that you can see the overridden internal Web Services SPN in the list (HTTP/lyncpool1-web.contoso.com). Also make sure that the DNS A record for lyncpool1-web.contoso.com is pointing to VIP on your load balancer for internal Lync Web Services.

If you receive client pop ups after you have enabled the Lync Kerberos Account it means you have missed something in the configuration.

You can verify your Lync Kerberos Account configuration by running the following command from the Lync Server Management Shell.
Test-CsKerberosAccountAssignment -Identity "site:Contoso" -Report "c:\temp\KerberosReport.htm" -Verbose

I know this has been a long post, hopefully you found it interesting. Thank you for reading.

A special thanks to @grahamcropley.