Fixing VeriSign Certificates on Windows Servers

One item I’ve seen repeatedly cause issues in new Exchange or Lync environments centers on certificates from public providers such as VeriSign, DigiCert, or Entrust. These providers generally use multiple tiers of certificates, so when you purchase a certificate it is usually issued by a subordinate, or issuing, certificate authority instead of the root certificate authority. The way SSL certificate chains work, an end client only needs to trust the top-most, or root, certificate in the chain in order to accept the server certificate as valid. But in order to properly present the full SSL chain to a client, a server must first have the correct trusted root and intermediate certificate authorities loaded. The bottom line here is that if you haven’t loaded the full certificate chain on the server, some clients may have trouble connecting.

This becomes especially problematic in the case of VeriSign’s latest chain. On a modern Windows client such as Windows 7 or Server 2008 R2 you’ll see the VeriSign Class 3 Public Primary Certification Authority – G5 certificate, which expires in 2036, with thumbprint 4e b6 d5 78 49 9b 1c cf 5f 58 1e ad 56 be 3d 9b 67 44 a5 e5 installed in the Trusted Root Certification Authorities store by default. There is some extra confusion generated because there is also a VeriSign Class 3 Public Primary Certification Authority – G5 certificate, which expires in 2021, with thumbprint 32 f3 08 82 62 2b 87 cf 88 56 c6 3d b8 73 df 08 53 b4 dd 27 installed in the Intermediate Certification Authorities store by default. The names of these certificates are identical, but they are clearly different certificates expiring on different dates.
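If you want to check which of these two certificates a given server has in each store, the thumbprints above are enough to tell them apart. Here’s a minimal sketch in Python that enumerates the machine’s Trusted Root and Intermediate stores and flags both G5 thumbprints; it assumes Python is running on the Windows box itself, since ssl.enum_certificates only reads the local system stores.

```python
# Report which VeriSign G5 certificate sits in which machine store.
# Windows-only: ssl.enum_certificates() reads the system certificate stores.
import hashlib
import ssl

# SHA-1 thumbprints from the paragraph above (2036 root vs. 2021 cross-signed)
G5_THUMBPRINTS = {
    "4eb6d578499b1ccf5f581ead56be3d9b6744a5e5": "G5 (expires 2036)",
    "32f30882622b87cf8856c63db873df0853b4dd27": "G5 (expires 2021, cross-signed)",
}

for store in ("ROOT", "CA"):  # Trusted Root / Intermediate CAs
    for cert_der, encoding, _trust in ssl.enum_certificates(store):
        if encoding != "x509_asn":
            continue
        thumbprint = hashlib.sha1(cert_der).hexdigest()
        if thumbprint in G5_THUMBPRINTS:
            print(f"{store}: {G5_THUMBPRINTS[thumbprint]}  thumbprint={thumbprint}")
```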

What you’ll find after purchasing a VeriSign certificate is that the CA which actually issues your server certificate, VeriSign Class 3 Secure Server CA – G3, is cross-signed by both of the G5 certificates. This means there are two different certificate chains you could present to clients, and which one is actually presented depends on how you configure the server. The two chain options are displayed below, and while one is a bit longer, both paths are valid.

[Screenshot: the two possible certificate chains]

So if a client trusts either of the G5 certificates as a trusted root, it will trust any certificate issued by a subordinate CA such as the G3. What ends up happening is that the certificate chain looks correct when a Windows 7 or Server 2008 R2 machine connects, because those operating systems already have the 2036 G5 CA as a trusted root. You’ll see only a 3-tier chain presented, and the connection will work just fine.

[Screenshot: the 3-tier certificate chain]

There’s nothing actually wrong with this if all you have are newer clients. In fact, that’s one advantage of cross-signing: a client can leverage the shortest possible certificate chain. But any kind of downlevel client, such as Lync Phone Edition, does not trust that newer G5 CA by default. This means that when those devices try to connect to the site they are presented with the 2036 G5 certificate as the top-level root CA, and since they do not trust that root they will drop the connection. In order to support the lowest common denominator of devices the chain should actually contain 4 tiers, as in the following screenshot. Older devices typically have the VeriSign Class 3 Public Primary CA already installed as a trusted root, so you’ll get better compatibility this way.

[Screenshot: the full 4-tier certificate chain]

The screenshots so far are all of the same certificate; the difference is how the chain is presented. In order for a server to present the full chain you must log on to each server hosting the certificate and open the Certificates MMC for the local computer. Locate the VeriSign Class 3 Public Primary Certification Authority – G5 certificate in the Trusted Root Certification Authorities node, right-click it, and open the Properties. Select Disable all purposes for this certificate and press OK to save your changes.

[Screenshot: certificate properties with all purposes disabled]

By disabling the incorrect trusted root certificate the server will now present the full chain. The big ‘gotcha’ here is that you can’t easily test this. If you browse to the site from a Windows 7 client and open the Certification Path tab for the certificate, it’s still going to look the same as before. The reason is that Windows 7 also has the VeriSign Class 3 Public Primary Certification Authority – G5 certificate in the Trusted Root Certification Authorities machine node by default, and because Windows 7 trusts that as a root CA, it will trust any certificate below that point. Certificate testing tools you find on the Internet aren’t going to be much help here either, because they too already trust the 2036 G5 certificate. The only way to verify the full chain from a Windows client is to delete or disable that certificate on the client you’re testing from. And no, this is not something you should ever attempt on multiple machines – I’m suggesting it only for testing purposes. If you’re using any kind of SSL decryption at a load balancer to insert cookies for persistence, you’ll want to make sure the load balancer admin has loaded the full chain as well.
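One other way I’ve spot-checked what a server actually sends, without touching any client’s certificate store, is OpenSSL: s_client with -showcerts dumps every certificate in the handshake regardless of what the local machine trusts. Here’s a rough Python wrapper for that; it assumes an openssl binary is on the PATH, and the hostname is a placeholder for your own external FQDN.

```python
# Dump the certificate chain a server actually presents, independent of any
# local trust store. Requires the openssl binary to be available on PATH.
import subprocess

host = "lyncweb.example.com"  # placeholder - use the FQDN on your certificate
port = 443

proc = subprocess.run(
    ["openssl", "s_client", "-connect", f"{host}:{port}",
     "-servername", host, "-showcerts"],
    input="", capture_output=True, text=True, timeout=30,
)

print(f"Certificates presented: {proc.stdout.count('BEGIN CERTIFICATE')}")
for line in proc.stdout.splitlines():
    stripped = line.strip()
    # The "Certificate chain" section lists each cert's subject (s:) and issuer (i:)
    if (stripped[:1].isdigit() and " s:" in stripped) or stripped.startswith("i:"):
        print(" ", stripped)
```

If the cross-signed G5 intermediate never shows up in that list, the server is most likely still handing out the short chain.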

So now you’ve fixed the chain completely, and after the next Windows Update cycle you’ll probably find the G5 certificate enabled again on the server. The root certificate updates for Windows will actually re-enable this certificate for you (how kind of them!), resulting in a broken chain for older clients again. In order to prevent this from occurring you can stop automatic root certificate updates from installing via Windows Update. This can be controlled through the Group Policy setting displayed here:

[Screenshot: Group Policy setting for automatic root certificate updates]
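If you’d rather script this than click through gpedit on every server, my understanding is that the setting above ends up as the DisableRootAutoUpdate value under HKLM\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot. Treat that path as an assumption and verify it in your environment; a quick sketch:

```python
# Set the registry value behind "Turn off Automatic Root Certificates Update".
# Path and value name are my assumption of what the GPO writes - verify first.
import winreg

key_path = r"SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot"

# Run elevated; CreateKeyEx creates the key if it doesn't exist yet.
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "DisableRootAutoUpdate", 0, winreg.REG_DWORD, 1)
print("Automatic root certificate updates disabled (DisableRootAutoUpdate=1).")
```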

Some notes on Lync and Exchange UM QoS

If you haven’t found it yet, the Enabling Quality of Service documentation on TechNet is a fantastic resource for getting started with QoS marking for Lync servers and clients. When planning to enable QoS in your environment you should start there, and I’d also recommend following Elan Shudnow’s posts for step-by-step screenshots of how to configure these policies on Lync servers. What I’d like to cover here is one scenario I don’t see documented at this point: Exchange UM and Lync Edge QoS. When a remote user calls in to UM Subscriber Access or an Auto-Attendant via Lync, the audio stream will not flow through the Front-End servers. Instead, it will be User <> Edge <> UM. So if your QoS policies on the Edge don’t take UM into account, the audio traffic on the Edge-to-UM leg of the call won’t be tagged with a DSCP value.

To get started you can reference the Configure Quality of Service for Unified Messaging documentation. If you’ve only ever used policy-based QoS settings like Lync Server 2010 leverages, then you may find the UM setup a little confusing. The key to getting UM to start marking packets is to enable the QoS feature via a registry key. On each UM server you’ll want to create a new DWORD called QoSEnabled inside HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\RTC\Transport and set the value to 1 (don’t worry if some of those sub-keys don’t exist yet – it’s safe to create them). You can ignore the confusing TechNet note that says you should restart your Lync or OCS servers after this change. The registry key and restart apply to the Exchange UM server you just configured – not your Lync servers.
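Here’s the same change scripted, in case you have more than a couple of UM servers to touch. It’s just a convenience sketch of the key described above; run it elevated on each UM server.

```python
# Create HKLM\SOFTWARE\Microsoft\RTC\Transport\QoSEnabled = 1 (DWORD) to turn
# on QoS marking for Exchange UM, per the paragraph above. Run elevated.
import winreg

key_path = r"SOFTWARE\Microsoft\RTC\Transport"

# CreateKeyEx builds any missing sub-keys along the way, which is fine here.
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "QoSEnabled", 0, winreg.REG_DWORD, 1)
print("QoSEnabled set to 1 - restart the UM services for it to take effect.")
```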

After restarting the UM services you’ll find it marks all outbound audio packets as SERVICETYPE_GUARANTEED. Windows defaults to applying a DSCP value of 40 to this type of traffic, but you may need to modify this to something more standard in the networking (Cisco) world, where audio is typically marked with DSCP 46. To do this you can either apply a Group Policy to the machines or edit the local Group Policy settings on each UM server. You can adjust this value within the Computer Configuration\Administrative Templates\Network\QoS Packet Scheduler\DSCP Value of Conforming Packets section of Group Policy.

[Screenshot: QoS Packet Scheduler settings in Group Policy]

Edit the Guaranteed Service Type value to match the DSCP value your network devices are expecting for audio:

[Screenshot: editing the Guaranteed service type DSCP value]
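If you want to sanity-check what a UM server will actually mark once the policy applies, on the machines I’ve looked at the Guaranteed service type setting shows up as the ServiceTypeGuaranteed value under HKLM\SOFTWARE\Policies\Microsoft\Windows\Psched\DiffservByteMappingConforming. Both the path and the value name are assumptions on my part, so confirm them with gpresult or regedit before relying on this sketch:

```python
# Read back the DSCP value configured for Guaranteed service type traffic.
# Registry path and value name are assumptions - confirm them in your lab first.
import winreg

key_path = r"SOFTWARE\Policies\Microsoft\Windows\Psched\DiffservByteMappingConforming"

try:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        dscp, _type = winreg.QueryValueEx(key, "ServiceTypeGuaranteed")
        print(f"Conforming Guaranteed traffic will be marked DSCP {dscp}")
except FileNotFoundError:
    print("No policy override found - Windows will use its default of DSCP 40")
```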

At this point UM tagging of audio packets should be functional, and you can (and should) verify this with a Wireshark or Netmon capture. What I’ve not seen called out is the fact that in a Lync deployment with Edge servers, UM is just another client, and it will be passing audio traffic through the Edge servers for remote users. UM will not respect the audio ports you limit Lync clients to, and it does not use the same range as Lync servers for audio. UM’s default port range is actually quite large: UDP 1024-65535. If you’re already tagging traffic from your Edge servers to Lync servers, you can simply re-use the same ports by configuring them in the msexchangeum.config file found within C:\Program Files\Microsoft\Exchange\v14\bin on each UM server.
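Wireshark or Netmon is the right tool for that verification, but if you just want a quick spot-check of the DSCP byte on packets hitting a box, a raw socket will do in a pinch. This is a rough sketch only: it has to run elevated, it’s Windows-specific (SIO_RCVALL), and the interface IP is a placeholder.

```python
# Rough DSCP spot-check: sniff IP packets crossing one interface and print the
# DSCP value from each IPv4 header. Run elevated; Windows-only (SIO_RCVALL).
import socket

local_ip = "192.0.2.10"  # placeholder - IP of the NIC you want to watch

s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_IP)
s.bind((local_ip, 0))
s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_ON)   # capture everything on this NIC

try:
    for _ in range(50):                        # look at a handful of packets
        packet = s.recv(65535)
        tos = packet[1]                        # second byte of the IPv4 header
        dscp = tos >> 2                        # DSCP is the top 6 bits
        proto = packet[9]                      # 17 = UDP (where the audio lives)
        if proto == 17 and dscp != 0:
            print(f"UDP packet marked DSCP {dscp}")
finally:
    s.ioctl(socket.SIO_RCVALL, socket.RCVALL_OFF)
```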

If you’d prefer not to adjust the default port range, you’ll want to be sure the UM servers are accounted for on each of your Lync Edge servers as a separate target in your QoS policy. In this example I’ve set up a separate policy towards each UM server and specified the dynamic range UM will be using as the destination port range. This ensures any traffic leaving the internal-facing Edge NIC and heading towards Exchange UM will be marked with DSCP 46.

[Screenshot: Edge QoS policies targeting the UM servers]

I also want to reiterate one point that Elan calls out, since it’s not documented properly at this point: the TechNet docs suggest targeting the MediaRelaySvc.exe application in the QoS policy on the Edge servers. What you’ll find is that if you do specify an executable, the packets leaving the internal-facing Edge interface will not be tagged at all. Your rule probably looks perfect, and you can restart the server as many times as you’d like, but if you specify the executable you will find all packets leaving the server marked DSCP 0. The workaround is either to not specify an executable at all, or, if you want to be more specific, to make sure the source IP in your QoS policy is the internal-facing NIC, like I’ve done in the screenshot above.
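When a policy looks right but packets still leave unmarked, I find it helpful to dump what the Edge server actually has applied and compare it to what I think I configured. On the boxes I’ve checked, machine-level policy-based QoS policies land under HKLM\SOFTWARE\Policies\Microsoft\Windows\QoS; treat that path as an assumption. The sketch below just enumerates whatever is there:

```python
# Dump the policy-based QoS policies applied to this machine so you can confirm
# the Edge-to-UM rule (ports, remote IPs, DSCP value) made it down via GPO.
# The registry location is my assumption of where machine QoS policies land.
import winreg

base = r"SOFTWARE\Policies\Microsoft\Windows\QoS"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, base) as root:
    policy_count = winreg.QueryInfoKey(root)[0]     # number of policy sub-keys
    for i in range(policy_count):
        name = winreg.EnumKey(root, i)
        print(f"Policy: {name}")
        with winreg.OpenKey(root, name) as policy:
            value_count = winreg.QueryInfoKey(policy)[1]
            for j in range(value_count):
                vname, vdata, _vtype = winreg.EnumValue(policy, j)
                print(f"    {vname} = {vdata}")
```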

Office 365 Migration with Cisco IronPort

I ran across an interesting issue recently where a client could not get Autodiscover to work properly during the “rich coexistence” period of their migration from on-premises Exchange 2010 to Office 365. Autodiscover for an on-premises user would work fine, but as soon as a user’s mailbox was moved to Office 365 the Autodiscover process would fail. The DNS records looked fine, and when looking at the log we saw the client would connect to the internal SCP, get a redirect to Office 365 for the correct SMTP address, and then fail. We couldn’t set up a brand new profile for the user internally, but we noticed it worked perfectly well from a client out on the Internet. Must be something internal at that point, right?

After some more testing we learned a Cisco IronPort was being used for outbound web proxy filtering. As soon as we added an exception for the test machine’s IP address we found Autodiscover worked just fine for a cloud user. In the end we added exceptions for the domains .outlook.com and .online.lync.com. Secure web filtering: keeping users safe and admins frustrated. Happy migrating.
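For anyone troubleshooting something similar, one quick way to narrow this down from an internal machine is to hit the Office 365 Autodiscover endpoint directly and see whether the proxy lets the request through at all. This is just a reachability sketch; the SMTP address is a placeholder, the endpoint may differ for your tenant, and an HTTP 401 back from the service still counts as “the proxy isn’t eating it.”

```python
# Minimal reachability test for the Office 365 Autodiscover endpoint from an
# internal client. Any HTTP status back (even 401) means the proxy passed it;
# a timeout or connection reset points back at the web filter.
import urllib.error
import urllib.request

url = "https://autodiscover-s.outlook.com/autodiscover/autodiscover.xml"

request_body = """<?xml version="1.0" encoding="utf-8"?>
<Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/requestschema/2006">
  <Request>
    <EMailAddress>user@contoso.com</EMailAddress>
    <AcceptableResponseSchema>http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a</AcceptableResponseSchema>
  </Request>
</Autodiscover>"""  # placeholder SMTP address - use a cloud mailbox in your tenant

req = urllib.request.Request(url, data=request_body.encode("utf-8"),
                             headers={"Content-Type": "text/xml"})
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(f"Reached Autodiscover, HTTP {resp.status}")
except urllib.error.HTTPError as e:
    print(f"Reached Autodiscover, HTTP {e.code} (401 is expected without creds)")
except urllib.error.URLError as e:
    print(f"Could not reach Autodiscover: {e.reason} - check the proxy/IronPort")
```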