Avoiding Lync 2013 Certificate Prompts

Lync 2013 and the introduction of the lyncdiscover client bootstrapping process has introduced a new world of hurt for many Lync deployments, especially those that contain multiple SIP domains. I won't go in to the guts of why or how this happens - Richard Brynteson and Jeff Schertz do a great job explaining this aspect. The gist of the issue is your users will see a certificate warning in either of these two scenarios:

  • The lyncdiscover.<Your SIP Domain Here> query returns a web services FQDN that doesn't end with <Your SIP Domain Here>.
  • The lyncdiscover query returns a web services FQDN that ends with <Your SIP Domain Here> and succeeds, but then returns a SIP pool FQDN that doesn't end with <Your SIP Domain Here>.

So, how to work around this? You have a few options.

Keep all FQDNs within the SIP Domain Namespace

A typical deployment will put the pool FQDN within the namespace of the Active Directory domain, but you can actually name the pool anything you'd like when adding it to Topology Builder. You avoid the prompt by creating a pool FQDN and web services FQDN that are both within the primary SIP domain namespace. So, to provide an example configuration:

  • Active Directory Domain: ptown.local
  • Primary SIP Domain: confusedamused.com
  • Front End Pool FQDN: pool.confusedamused.com
  • Internal Web Services FQDN: webservicesinternal.confusedamused.com

This option only works if you're deploying an Enterprise Edition pool, and only resolves the warnings for your primary SIP domain. Other SIP domains serviced by this pool will still get the certificate warning. While it's technically possible to modify the internal web services FQDN on a Standard Edition pool you can't change the pool FQDN, so there is no point.

Configure the Trust Model Data

For those running Standard Edition pools or Enterprise Edition pools with multiple SIP domains, you'll need to configure the TrustModelData registry key outlined in KB 2531068.

In Lync 2013 the key no longer exists by default, and the location of the registry settings have shifted slightly from the KB article. They can now be found in HKLM\Software\Microsoft\Office\15.0\Lync.

One other change to watch out for is that the client doesn't actually read a TrustModelData key in the HKLM hive (true at least on Windows 8.) It will only read the HKCU version. In order to work around this issue you can deploy the key to the HKCU hive for all users via GPO preferences. The value specified should be a comma separated list of domains for the pool or web services FQDNs, which will typically be your Active Directory namespace.

Capture1

Fixing VeriSign Certificates on Windows Servers

One item I’ve seen repeatedly cause issues in new Exchange or Lync environments centers around certificates from public providers such as VeriSign, Digicert, or Entrust. These providers generally use multiple tiers of certificates, so when you purchase a certificate it is generally issued by a subordinate, or issuing certificate authority instead of the root certificate authority. The way that SSL certificate chains work require an end client to only need to trust the top most, or root certificate in the chain, in order to accept the server certificate as valid. But in order to properly present the full SSL chain to a client a server must first have the correct trusted root and intermediate certificate authorities loaded. So the bottom line here is that if you haven’t loaded the full certificate chain on the server then you may see clients have trouble connecting.

This becomes especially problematic in the case of VeriSign’s latest chain. If you are using a modern Windows client such as Windows 7 or 2008 R2 you’ll see the VeriSign Class 3 Public Primary Certification Authority – G5 certificate which expires in 2036 with thumbprint ‎4e b6 d5 78 49 9b 1c cf 5f 58 1e ad 56 be 3d 9b 67 44 a5 e5 installed in the Trusted Root Certification Authorities by default. There is some extra confusion generated because there is also a VeriSign Class 3 Public Primary Certification Authority – G5 certificate which expires in 2021 with thumbprint ‎32 f3 08 82 62 2b 87 cf 88 56 c6 3d b8 73 df 08 53 b4 dd 27 installed in the Intermediate Certification Authorities by default. The names of these certificates are identical, but they are clearly different certificates expiring on different dates.

What you’ll find after purchasing a VeriSign certificate is that the CA which actually issues your server certificate, VeriSign Class 3 Secure Server CA – G3, is cross-signed by both of the G5 certificates. This means that there are now 2 different certificate chains you could present to clients, but what is actually presented depends on how you configure the server. The two chain options you can present are displayed below, and while one is a bit longer, both paths are valid.

image

So if a client trusts either of the G5 certificates as a trusted root, it will trust any certificate issued by a subordinate CA such as the G3. What ends up happening is that the certificate chain will look correct when a Windows 7 or 2008 R2 server connects to it, because those operating systems already have the 2036 G5 CA as a trusted root. You’ll see only 3 tier chain presented, and the connection will work just fine.

image

There’s nothing actually wrong with this if all you have are newer clients. In fact, that’s one advantage of cross-signing – that a client can leverage the shortest possible certificate chain. But any kind of downlevel client, such as Lync Phone Edition, does not trust that newer G5 CA by default. This means that when those devices try to connect to the site they are presented with the 2036 G5 certificate as the top-level root CA, and since they do not trust that root they will drop the connection. In order to support the lowest common denominator of devices the chain should actually contain 4 tiers, like in the following screenshot. Older devices typically have the VeriSign Class 3 Public Primary CA already installed as a trusted root, so you may get better compatibility this way.

image

The screenshots have been from the same certificate, but the difference is how the chain is presented. In order for a server to present the full chain you must log on to each server hosting the certificate and open the certificates MMC for the local computer. Locate the VeriSign Class 3 Public Primary Certification Authority – G5 certificate in the Trusted Root Certification Authority node, right-click, and open the Properties. Select Disable all purposes for this certificate and press OK to save your changes.

image

By disabling the incorrect trusted root certificate the server will now be presenting the full chain. The big ‘gotcha’ here is that you can’t easily test this. If you browse to the site from a Windows 7 client and open the Certification Path tab for the certificate it’s still going to look the same as before. The reason for this is that Windows 7 also has the VeriSign Class 3 Public Primary Certification Authority – G5 certificate in the Trusted Root Certification Authorities machine node by default. And because Windows 7 trusts that as a root CA, it will trust any certificate below that point. Certificate testing tools you find on the Internet also aren’t going to be much help here because they also already trust the 2036 G5 certificate. The only way you can verify the full chain is to delete or disable that cert from the client you’re testing on. And no, this is not something you should ever attempt on multiple machines – I’m suggesting this only for testing purposes. If you’re using any kind of SSL decryption at a load balancer to insert cookies for persistence you’ll want to make sure the load balancer admin has loaded the full chain as well.

So now you’ve fixed the chain completely, and after the next Windows Update cycle you’ll probably find the G5 certificate enabled again on the server. The root certificate updates for Windows will actually re-enable this certificate for you (how kind of them!), and result in a broken chain for older clients again. In order to prevent this from occurring you can disable automatic root certificate updates from installing via Windows Update. This can be controlled through a Group Policy setting displayed here:

image

Your OCS A/V Authentication Certificate Subject Name Doesn't Matter

A few months back I was involved in a discussion about what the subject name of an OCS Edge Server's A/V authentication certificate should be. Some folks were saying to use the Edge server's internal FQDN and others were saying to use the external, public FQDN you define for A/V. I was in the camp using the external name, but the odd thing was both sides said their approach worked. There is definitely some confusion about what name you should use and Microsoft has actually published directly conflicting information which further confuses the issue. Some testing I've recently done clears up why so many documents and people contradict each other - the subject name doesn't matter. Really. You could put whatever you want in that subject name, assign it to A/V authentication and it will work flawlessly. The purpose of this certificate per the Technet documentation:

The private key of the A/V authentication certificate is used to generate authentication credentials.

Specifically, it's not used for encryption or MTLS even if that's not made clear anywhere. Let's take a step back and clarify a few things for some background:

  • There are two services that run on the Edge server with "A/V" in the name. If you’re not familiar with the difference, Jeff Schertz’s More on OCS Edge Server Certificates article has a good explanation for some background on what the difference is between the A/V Authentication and A/V Edge services, but basically - the A/V Authentication service is internal facing and A/V Edge Service is external facing.
  • There is no certificate assigned to the A/V Edge service because encryption for external A/V traffic is provided by SRTP.
  • The certificate for A/V Authentication is only used by internal clients when trying to communicate with an external or federated client. This means you can (and should) use an internal certificate authority to issue this certificate. There is no benefit or need to use a public certificate for A/V authentication.

Let's walk through a little example here as if I was trying to figure out what name to use for my A/V authentication certificate. I have the following environment:

  • Public Domain: confusedamused.com
  • Internal AD Domain: ptown.local
  • SIP Domain: confusedamused.com
  • Edge Server Internal FQDN: edge.ptown.local
  • A/V Edge Service FQDN: av.confusedamused.com

So with that information what should I use as the certificate name for the A/V authentication certificate? If you consult the Technet documentation topic Set up Certificates for A/V Authentication you’ll find this note (emphasis is mine):

The subject name should match the fully qualified domain name (FQDN) of the A/V Edge Service published by the external firewall, or the FQDN of the VIP used by the A/V Edge Service array on the external load balancer (that is, if the Edge Servers are load balanced).

So based on that blurb, my A/V authentication certificate subject name should be av.confusedamused.com. Fair enough.

I ran through the OCS 2007 R2 Edge Planning Tool for a sanity check. You can see the result below, but the tool follows the Technet documentation and uses the external FQDN I defined for the A/V Edge Service when it asked.

tool-av
tool-results

A group of MVPs and Microsoft employees published a document called Deploying Certificates in Office Communications Server 2007 which says the following about the A/V authentication certificate (emphasis is mine again):

Must be the FQDN of Audio/Video authentication server in DNS.

Well that calls out the name of the authentication server, not the A/V Edge Service. I think this comes down to really just poor wording in the document which contributes to confusion, but what is the name of our A/V Authentication server? It would be the same name as the internal Edge interface, right? The A/V Authentication server is the Edge server, not the external FQDN. So now we're being told to use the internal FQDN, edge.ptown.local as the subject name.

Also released by Microsoft was a document called OCS 2007 R2 Walkthrough - Scale to Load Balanced Edge Server which completely contradicts Technet and the Edge Planning Tool (emphasis mine):

  • Access Edge Internal (Corporate Certificate). In our sample topology, the subject name would be set to ocsedge.contoso.com, the FQDN of the Edge Server internal interface.
  • A/V Authentication Internal (Corporate Certificate). In our sample topology, the subject name would be set to ocsedge.contoso.com, the FQDN of the Edge Server internal interface.

This seems to match up with the certificates document and is somewhat backed by the exact same Technet article I referenced earlier which says:

As a security precaution, you should not use the same certificate for A/V authentication that you use for the internal interface of the Edge Server.

This begs the question "Why would I ever even try to use the same certificate?" The only logical reason would be perhaps because they use the same subject name. That jives with the Scale to a Load Balanced Edge Server documentation. If we're thinking about this in terms of MTLS connections, you would have to think that this makes the most sense. In your OCS Forest properties if you added an A/V Edge server with the name edge.ptown.local for port 5062, it's reasonable that you'd expect the A/V Authentication service operating on port 5062 of the internal interface to offer a certificate matching this name. If it presented something wrong, say maybe the external FQDN of the A/V Edge service it should fail, right?

Well, the truth is the name doesn't matter. There isn't MTLS validation happening on port 5062 the same way you'd expect MTLS between servers on 5061. I think the reason the certificate requirement issue hasn't been pointed out yet is because it's never caused a problem - it works either way. I can use a certificate with a subject name gobblygook.confusedamused.com and media relay authentication through the Edge server works just fine. It just needs a certificate to generate authentication credentials for the media relay process. Go ahead and try it out - put whatever name you want on the certificate and it will still work.

So while the subject name doesn't really matter, if you're still interested in adhering to best practices I would recommend using the external facing, public A/V Edge name. In the example earlier this would be av.confusedamused.com. Hopefully Microsoft will update the certificate and scaling documents with a clarification and make them more consistent with the rest of Technet.