Fixing VeriSign Certificates on Windows Servers

One item I’ve seen repeatedly cause issues in new Exchange or Lync environments centers around certificates from public providers such as VeriSign, Digicert, or Entrust. These providers generally use multiple tiers of certificates, so when you purchase a certificate it is usually issued by a subordinate (issuing) certificate authority rather than the root certificate authority. The way SSL certificate chains work, an end client only needs to trust the topmost, or root, certificate in the chain in order to accept the server certificate as valid. But in order to properly present the full SSL chain to a client, a server must first have the correct trusted root and intermediate certificate authorities loaded. The bottom line: if you haven’t loaded the full certificate chain on the server, you may see clients have trouble connecting.

This becomes especially problematic in the case of VeriSign’s latest chain. On a modern Windows OS such as Windows 7 or Server 2008 R2 you’ll see the VeriSign Class 3 Public Primary Certification Authority – G5 certificate which expires in 2036, with thumbprint 4e b6 d5 78 49 9b 1c cf 5f 58 1e ad 56 be 3d 9b 67 44 a5 e5, installed in the Trusted Root Certification Authorities store by default. There is some extra confusion generated because there is also a VeriSign Class 3 Public Primary Certification Authority – G5 certificate which expires in 2021, with thumbprint 32 f3 08 82 62 2b 87 cf 88 56 c6 3d b8 73 df 08 53 b4 dd 27, installed in the Intermediate Certification Authorities store by default. The names of these certificates are identical, but they are clearly different certificates expiring on different dates.

What you’ll find after purchasing a VeriSign certificate is that the CA which actually issues your server certificate, VeriSign Class 3 Secure Server CA – G3, is cross-signed by both of the G5 certificates. This means that there are now 2 different certificate chains you could present to clients, but what is actually presented depends on how you configure the server. The two chain options you can present are displayed below, and while one is a bit longer, both paths are valid.

image

So if a client trusts either of the G5 certificates as a trusted root, it will trust any certificate issued by a subordinate CA such as the G3. What ends up happening is that the certificate chain looks correct when a Windows 7 or Server 2008 R2 machine connects, because those operating systems already have the 2036 G5 CA as a trusted root. You’ll see only a 3-tier chain presented, and the connection will work just fine.

image

There’s nothing actually wrong with this if all you have are newer clients. In fact, that’s one advantage of cross-signing – that a client can leverage the shortest possible certificate chain. But any kind of downlevel client, such as Lync Phone Edition, does not trust that newer G5 CA by default. This means that when those devices try to connect to the site they are presented with the 2036 G5 certificate as the top-level root CA, and since they do not trust that root they will drop the connection. In order to support the lowest common denominator of devices the chain should actually contain 4 tiers, like in the following screenshot. Older devices typically have the VeriSign Class 3 Public Primary CA already installed as a trusted root, so you may get better compatibility this way.

image

The screenshots above are all of the same certificate; the difference is how the chain is presented. In order for a server to present the full chain you must log on to each server hosting the certificate and open the Certificates MMC for the local computer. Locate the VeriSign Class 3 Public Primary Certification Authority – G5 certificate in the Trusted Root Certification Authorities node, right-click it, and open the Properties. Select Disable all purposes for this certificate and press OK to save your changes.

image
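Since the two certificates share a name, the thumbprints listed earlier are the easiest way to confirm you’re working with the 2036 copy before you disable it. A quick sketch in PowerShell, run on the server (the thumbprints below are just the ones above with the spaces removed):

# Show both identically-named G5 certs and which store each one lives in
$thumbs = @(
    '4EB6D578499B1CCF5F581EAD56BE3D9B6744A5E5'   # 2036 G5 - Trusted Root store
    '32F30882622B87CF8856C63DB873DF0853B4DD27'   # 2021 G5 - Intermediate store
)
Get-ChildItem Cert:\LocalMachine\Root, Cert:\LocalMachine\CA |
    Where-Object { $thumbs -contains $_.Thumbprint } |
    Select-Object PSParentPath, Subject, NotAfter, Thumbprint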

By disabling the incorrect trusted root certificate the server will now present the full chain. The big ‘gotcha’ here is that you can’t easily test this. If you browse to the site from a Windows 7 client and open the Certification Path tab for the certificate, it’s still going to look the same as before. The reason is that Windows 7 also has the VeriSign Class 3 Public Primary Certification Authority – G5 certificate in the Trusted Root Certification Authorities machine store by default, and because it trusts that as a root CA, it will trust any certificate below that point. Certificate testing tools you find on the Internet aren’t going to be much help here either, because they already trust the 2036 G5 certificate. The only way you can verify the full chain is to delete or disable that cert on the client you’re testing from. And no, this is not something you should ever attempt on multiple machines – I’m suggesting this only for testing purposes. If you’re using any kind of SSL decryption at a load balancer to insert cookies for persistence, you’ll want to make sure the load balancer admin has loaded the full chain as well.
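As a side note, if you happen to have OpenSSL handy on a test box you can also look at exactly which chain the server is sending, independent of what the local machine trusts. Something along these lines (the hostname is hypothetical):

# Print every certificate the server presents during the handshake
openssl s_client -connect mail.contoso.com:443 -showcerts

The output lists each certificate the server sends in order, so you can see at a glance whether the cross-signed G5 intermediate is included.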

So now you’ve fixed the chain completely, and after the next Windows Update cycle you’ll probably find the G5 certificate enabled again on the server. The root certificate updates for Windows will actually re-enable this certificate for you (how kind of them!), resulting in a broken chain for older clients again. To prevent this from occurring you can stop automatic root certificate updates from installing via Windows Update. This can be controlled through a Group Policy setting displayed here:

image
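If these are standalone boxes and you’d rather not wait on Group Policy, my understanding is that the setting shown above (Turn off Automatic Root Certificates Update) just writes a single registry value, so something like this should be equivalent – treat the path as an assumption and confirm against your own GPO results:

# Assumed registry equivalent of the 'Turn off Automatic Root Certificates Update' policy
New-Item -Path 'HKLM:\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot' -Force | Out-Null
New-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot' `
    -Name DisableRootAutoUpdate -PropertyType DWord -Value 1 -Force | Out-Null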

Some notes on Lync and Exchange UM QoS

If you haven’t found it yet, the Enabling Quality of Service documentation on TechNet is a fantastic resource to get started on configuring QoS marking for Lync servers and clients. So when planning to enable QoS in your environment you should start there, and I’d also recommend following Elan Shudnow’s posts for step-by-step screenshots of how to configure these policies on Lync servers. What I’d like to cover here is one scenario that I don’t see documented at this point – Exchange UM and Lync Edge QoS. When a remote user calls in to UM Subscriber Access or an Auto-Attendant via Lync, the audio stream does not flow through the Front-End servers. Instead, it will be User <> Edge <> UM. So if your QoS policies on the Edge don’t take UM into account, audio traffic on the Edge > UM leg of the call won’t be tagged with a DSCP value.

To get started you can reference the Configure Quality of Service for Unified Messaging documentation. If you’ve only ever used policy-based QoS settings like Lync Server 2010 leverages, then you may find the UM setup a little confusing. The key to getting UM to start marking packets is to enable the QoS feature via a registry key. On each UM server you’ll want to create a new DWORD called QoSEnabled inside HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\RTC\Transport and set the value to 1 (don’t worry if some of those sub-keys don’t exist yet – it’s safe to create them). You can ignore the confusing TechNet note that says you should restart your Lync or OCS servers after this change. The registry key and restart apply to the Exchange UM server you just configured – not your Lync servers.
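For reference, a quick PowerShell sketch of that registry change, run on each UM server (nothing here beyond the key and value called out above):

# Create the RTC\Transport key if needed and enable QoS marking for UM
New-Item -Path 'HKLM:\SOFTWARE\Microsoft\RTC\Transport' -Force | Out-Null
New-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\RTC\Transport' `
    -Name QoSEnabled -PropertyType DWord -Value 1 -Force | Out-Null
# Restart the UM services afterward so the change takes effect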

After restarting the UM services you’ll find UM marks all outbound audio packets as SERVICETYPE_GUARANTEED. Windows defaults to applying a DSCP value of 40 for this type of traffic, but you may need to modify this to be something more standard in the networking (Cisco) world, where audio is typically marked with DSCP 46. In order to do this you can either apply a Group Policy to the machines or edit the local Group Policy settings on each UM server. You can adjust this value within the Computer Configuration\Administrative Templates\Network\QoS Packet Scheduler\DSCP Value of Conforming Packets section of Group Policy.

image

Edit the Guaranteed Service Type value to match the DSCP value your network devices are expecting for audio:

image
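If you’d rather push this with a script than click through the Group Policy editor, my recollection is that the setting lands under the Psched policy keys – the key and value names below are an assumption on my part, so verify them against gpedit before rolling this out:

# Assumed registry equivalent of 'DSCP value of conforming packets' > Guaranteed service type
New-Item -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\Psched\DiffservByteMappingConforming' -Force | Out-Null
New-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\Psched\DiffservByteMappingConforming' `
    -Name ServiceTypeGuaranteed -PropertyType DWord -Value 46 -Force | Out-Null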

At this point UM tagging of audio packets should be functional, and you can (and should) verify this with a Wireshark or Netmon capture. What I’ve not seen called out is that UM is just another client in the world of Lync with Edge servers, and it will be passing audio traffic through the Edge servers for remote users. UM will not respect the audio port range you limit Lync clients to, and it does not use the same range as Lync servers for audio. UM’s default port range is actually quite large: UDP 1024-65535. If you’re already tagging traffic from your Edge servers to Lync servers, you can simply re-use the same ports by configuring them in the msexchangeum.config file found within C:\Program Files\Microsoft\Exchange\v14\bin on each UM server.

If you’d prefer to not adjust the default port range you’ll want to be sure the UM servers are accounted for on each of your Lync Edge servers as a separate target in your QoS policy. In this example I’ve set up a separate policy towards each UM server and specified the dynamic range UM will be using as the destination port. This ensures any traffic leaving the internal-facing Edge NIC and heading towards Exchange UM will be marked with DSCP 46.

image
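For what it’s worth, if your Edge servers were on Server 2012 or later you could build the same policy with the NetQos cmdlets instead of the Policy-based QoS GPO UI. A rough sketch, one policy per UM server, with a hypothetical UM server IP:

# Mark Edge traffic destined for this UM server's UDP 1024-65535 range as DSCP 46
# (10.0.0.20/32 is a hypothetical UM server address)
New-NetQosPolicy -Name 'Edge to UM Audio' -IPProtocolMatchCondition UDP `
    -IPDstPrefixMatchCondition '10.0.0.20/32' `
    -IPDstPortStartMatchCondition 1024 -IPDstPortEndMatchCondition 65535 `
    -DSCPAction 46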

I also want to reiterate one point that Elan calls out since it’s not documented properly at this point – the TechNet docs suggest targeting the MediaRelaySvc.exe application in the QoS policy on the Edge servers. What you’ll find is that if you do specify an executable the packets leaving the internal-facing Edge interface will not be tagged at all. Your rule probably looks perfect and you can restart the server as many times as you’d like, but if you specify the executable you will find all packets leaving the server as DSCP 0. The workaround here is to either not specify the executable at all, or if you want to be more specific you can make sure the source IP in your QoS policy is the internal-facing NIC like I’ve done in the screenshot above.

Office 365 Migration with Cisco IronPort

I ran across an interesting issue recently where a client could not get Autodiscover to work properly during the “rich coexistence” phase of their migration from on-prem Exchange 2010 to Office 365. Autodiscover for an on-prem user would work fine, but as soon as a user’s mailbox was moved to Office 365 the Autodiscover process stopped working. The DNS records looked fine, and when looking at the logs we saw the client connect to the internal SCP, get a redirect to Office 365 for the correct SMTP address, and then fail. We couldn’t set up a brand new profile for the user internally, but we noticed it would work perfectly fine from an Internet client. Must be something internal at that point, right?

After some more testing we learned a Cisco IronPort was being used for outbound web proxy filtering. As soon as we added an exception for the test machine’s IP address we found Autodiscover worked just fine for a cloud user. In the end we added an exception for the FQDNs .outlook.com and .online.lync.com. Secure web filtering keeping users safe and admins frustrated. Happy migrating.

ExtraTeam is hiring a Microsoft UC Engineer

Sorry for the off topic post, but my company is growing and looking to add another Microsoft UC Consultant to the team. Please reach out to jobs…at…extrateam.com if interested. The full job posting is below.

Microsoft Unified Communications Consultant @ ExtraTeam
We tend to assume success, and for good reason. We’ve built a bleeding-edge technology organization from the ground up. Each and every day we receive validation of our immense value to the world in strategizing and deploying the best of the best technology solutions. Our Microsoft practice has more than doubled over the past year and we continue to expand at a breathtaking pace. You will be joining the top Microsoft consulting team in the Bay Area; our team consists of Microsoft Certified Masters, MVPs, and published authors.

This is high-performanceville and we just can’t wait to have you here.

Standard description of an exceptional opportunity:

This is a fast moving job where you will be working on all the latest technology from Microsoft.

Typical projects you will be working on include designing, deploying, and maintaining:

  • Exchange 2010 including Unified Messaging
  • Lync 2010 with full voice and video integration

Job Responsibilities:

  • Designing: Work closely with our customers to assess their needs and design appropriate solutions as well as being an evangelist for ExtraTeam.
  • Implementing: You will be part of a high level team responsible for meeting our customers’ implementation, configuration, installation and management needs.
  • Troubleshooting: Work closely with customers to resolve networking problems across a wide range of technologies.
  • Documenting: Ensure high quality technical documents are produced quickly and accurately.

Our customer base is a very diverse mix including many household names, defense contractors, retail giants, leading pharmaceuticals as well as local government and education.

Although technical expertise is key, your attitude and aptitude will be far more important. We’re looking for someone with a strong desire to learn from the best, as part of our tightly-knit team.

We are a long standing Microsoft Gold Partner as well as a Cisco Gold Partner.

What’s in it for you:

  • Strong base salary, quarterly bonus, benefits, 401K, and much more.
  • Stable, fun, and team-oriented work environment.
  • Opportunity to innovate with the latest tools at your disposal.
  • Opportunity to work remotely on select projects.
  • Opportunity for growth. This is a full-time, permanent position. We’re thinking long term.

Requirements for you to meet your potential:

  • Microsoft MCITP certification in Exchange 2010 and/or Lync 2010
  • You will need to be able to handle multiple projects concurrently and drive them to completion (yes, we’re very busy)
  • Cisco certification would be desirable

Adding Speech Languages to an Existing Exchange UM Dial Plan

There have been a few instances lately where I’ve needed to add a speech language pack to an Exchange Unified Messaging server after a dial plan and auto-attendants have been created. Installing the language pack is no problem, but what you’ll find is that the new language is only available to new dial plans and any objects tied to them. You cannot simply install the pack and then select the language for an existing user or auto-attendant.

Here is an example case where I’ve installed the Portuguese language pack on a server:

But you can see the pack is not available for a dial plan created prior to the installation:

I’m sure the Microsoft answers are either to A – make sure you install the language packs up front, or B – create a new dial plan and auto-attendants, but in my case A was not possible and I had no interest in the effort involved for B.

So, ADSI Edit to the rescue. You can grab the language codes for installed packs from the UM server object properties at CN=<Server Name>, CN=Servers, CN=Exchange Administrative Group (FYDIBOHF23SPDLT), CN=Administrative Groups, CN=<Exchange Org Name>, CN=Microsoft Exchange, CN=Services, CN=Configuration, DC=<Forest>, DC=<TLD>. The msExchUMAvailableLanguages attribute will list the languages installed on the server (1033 is US English):

Now, armed with the language code for Portuguese (1046), you can modify the existing dial plan or auto-attendant objects in the UM AutoAttendant or UM DialPlan containers to support this language. The containers for these objects are found within CN=Exchange Administrative Group (FYDIBOHF23SPDLT), CN=Administrative Groups, CN=<Exchange Org Name>, CN=Microsoft Exchange, CN=Services, CN=Configuration, DC=<Forest>, DC=<TLD>. You can add the language as an option by modifying the msExchUMAvailableLanguages attribute to include the new language code. Here I have added it to an existing plan called Brazil:

You can now see this language appear as an option for the dial plan within the Exchange Management Console:

You can use this same method for an auto-attendant, but I would add the language first to the dial plan the auto-attendant is associated with. Obviously using ADSI Edit incorrectly has potential for causing some serious issues. Proceed at your own risk.
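If you’d rather script this than browse around in ADSI Edit, something along these lines should work using the Active Directory module – the dial plan name and language code match the example above, but the objectClass filter is my assumption, so test in a lab first:

# Find the Brazil dial plan in the configuration partition and add Portuguese (1046)
Import-Module ActiveDirectory
$configNC = (Get-ADRootDSE).configurationNamingContext
$dialPlan = Get-ADObject -SearchBase $configNC -LDAPFilter '(&(objectClass=msExchUMDialPlan)(cn=Brazil))' `
    -Properties msExchUMAvailableLanguages
Set-ADObject -Identity $dialPlan -Add @{ msExchUMAvailableLanguages = 1046 }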

File Share Witness and Datacenter Failback

This afternoon we ran across an issue with a fairly new Exchange 2010 Database Availability Group made up of 3 nodes, all running SP1 with Update Rollup 3. The primary datacenter had 2 nodes with a local file share witness, while the 3rd node and an alternate file share witness were in a DR site. We had also recently performed a successful datacenter failover and failback test that went swimmingly, so everything was back up and running in the primary datacenter.

What we noticed today was that the cluster quorum and file share witness settings persisted as a node and file share majority after the failback instead of reverting to the node majority model a 3-node DAG should be using. The only time Exchange should use a node and file share majority is when there is an even number of servers in the DAG. Without reproducing this again I can only see it as a timing issue – when one of the primary datacenter nodes gets added back to the DAG the quorum settings are flipped, but once the 3rd and final node joins again the quorum settings are not adjusted. This leaves us with a node and file share majority, with the witness still pointing at our alternate FSW.

You can see here if you open the Cluster MMC our DAG is operating as a node and file share majority model even though all 3 nodes are online:

The fix for the issue is really easy – just run Set-DatabaseAvailabilityGroup against the DAG with no additional parameters. This process does not take the databases or cluster offline, but you’ll see the DAG detect that it is using the wrong model for an odd number of nodes and adjust itself accordingly:
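For reference, this is all it takes from the Exchange Management Shell (the DAG name here is made up):

# Re-evaluates the quorum model for the DAG; safe to run while everything is online
Set-DatabaseAvailabilityGroup -Identity DAG01

# Optionally confirm the witness settings afterward
Get-DatabaseAvailabilityGroup -Identity DAG01 -Status | Format-List Name,WitnessServer,WitnessDirectory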

After the change you can verify in the cluster MMC that the quorum settings have been corrected to be a node majority:

I’m sure there’s a rational reason behind this behavior, but I haven’t quite nailed down why it happens yet. In the meantime it’s just one more step to add to your DR documentation!

Source IP Address Preference with Multiple IPs on a NIC

Something I’m finding myself doing more and more lately is using multiple IP addresses on a single NIC for a Windows server. The reasons vary, but it’s generally in order to support a single server running 2 different services on the same port. This can happen for Lync with your Edge servers (or for skirting the reverse proxy requirement on Front-Ends), or with Exchange when creating multiple receive connectors on a server.

A behavior that changed with the introduction of Server 2008 is that the source IP address on a NIC will always be the lowest numerical IP. So that whole idea of your primary IP being the first one you put on the NIC – throw that idea out the window.

For example, let’s say we build a new Exchange server and configure the NIC with IP 10.0.0.100. This IP is registered in DNS and the server uses this IP as the source when communicating with other servers. Our fantastic network administrator has also created a NAT rule on the firewall to map this IP to a particular public IP for outbound SMTP so that our PTR lookups match up.

But now we want to add another IP for a custom receive connector and the network admin hands you a free IP which happens to be 10.0.0.50. You add this as an additional IP on the NIC and voila – you have a couple issues:

  • You just registered two names for the same server in DNS if dynamic registration is enabled.
  • Your server is now sending all outbound traffic from 10.0.0.50! (because 50 is lower than 100)

One of these is easily solved – just turn off dynamic registration and manually create the DNS records for the server. The other is a little trickier because Server 2008 and 2008 R2 will still send traffic from the 10.0.0.50 IP. In the case of Exchange, this could create some ugliness for outgoing SMTP because now your firewall is not NATing to the correct public IP and you start bouncing mail due to PTR lookup failures.

Fortunately, we have a way to tell Windows not to use the lower-numbered IP as a source address by adding the IP via the netsh.exe command. For Server 2008 SP2 and 2008 R2 RTM we need to apply a hotfix first; 2008 R2 SP1 includes the fix, so nothing extra is required there. Without the hotfix or SP1 you’ll find netsh.exe does not display or recognize the special flag.

Hotfix Downloads:

The key to this is the IP address must be added via netsh.exe with a particular flag. So if you’ve already added the IP address via the GUI you’ll need to remove it first. After that, use this command to add the secondary IP:

netsh int ipv4 add address "Local Area Connection" 1.2.3.4/24 SkipAsSource=true

The SkipAsSource flag does two things – first, it instructs Windows not to use this IP as a source IP for outgoing traffic. And secondly, it prevents the registration of this IP in DNS if dynamic registration is enabled. Two birds with one stone!

You can always view the status of the IPs and their SkipAsSource status with the following command:

netsh int ipv4 show ipaddresses level=verbose
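As a side note, on Server 2012 and later you can skip netsh entirely and use the built-in NetTCPIP cmdlets – a quick sketch with a hypothetical interface name and address:

# Add a secondary IP that won't be used as a source address or registered in DNS
New-NetIPAddress -InterfaceAlias 'Local Area Connection' -IPAddress 1.2.3.4 -PrefixLength 24 -SkipAsSource $true

# Review the flag across existing addresses
Get-NetIPAddress -AddressFamily IPv4 | Select-Object IPAddress, InterfaceAlias, SkipAsSource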

Lync Error: Sharing is not supported with this contact

This morning I tried doing a desktop share with one of my coworkers and received an error I hadn’t seen before:

Sharing is not supported with this contact.

It was odd because the user was Available and not using a mobile client as far as I could tell from the presence. Turns out, the user was signed in to OWA and using the Lync integration there to chat with me. It would be nice if, in a future update (hint, hint), an OWA user’s presence were published as “IM Only” to indicate it does not support the other modalities.

Lync Claims EWS Not Deployed

In the last few Lync deployments I’ve done I’ve run into two different instances where the Lync client was failing to log in to Exchange Web Services to retrieve conversation history and voicemail. In both cases there wasn’t actually the red exclamation mark on those two tabs in the UI like you’d expect if there were an error; the client just hummed along like nothing was wrong. In each scenario, if you viewed the configuration information you would see the client report “EWS Not Deployed”, which was odd because Exchange 2010 was most definitely deployed at both customer sites.

Sidenote: The EWS polling takes roughly 30 seconds to reach this state. If you view the configuration info immediately you’ll see “EWS OK”, which is only because Lync hasn’t tried yet. So be careful when testing this and thinking everything is just fine.

Solution 1: Verify the InternalURL and ExternalURL for the Web Services virtual directory are entered
The first fix was incredibly easy, and after some more digging we determined this was only occurring when a client was external and logging in through an Edge server. When we looked at the Exchange Client Access Server we found this customer had not actually entered an ExternalURL parameter for the Web Services virtual directory. This works just fine for Outlook clients, but Lync expects this value to be filled out. If it’s not entered, Lync assumes EWS is not deployed externally and doesn’t attempt a connection, which is a pretty reasonable action. You might argue that Outlook’s behavior is the one that’s incorrect and that it should treat a missing ExternalURL the same way. But anyway, the fix is to just fill out the ExternalURL and Lync will begin using that value to log in to EWS successfully.
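From the Exchange Management Shell that looks something like this – the server and hostname below are hypothetical, so substitute whatever FQDN you actually publish EWS on:

# Check what's currently set
Get-WebServicesVirtualDirectory | Format-List Identity,InternalUrl,ExternalUrl

# Fill in the missing ExternalUrl on the CAS hosting the virtual directory
Set-WebServicesVirtualDirectory -Identity 'CAS01\EWS (Default Web Site)' `
    -ExternalUrl 'https://mail.contoso.com/EWS/Exchange.asmx'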

Sidenote 2: The information discovered by Lync via Autodiscover is cached in the registry at HKCU\Software\Microsoft\Communicator\<SIP URI>\Autodiscovery. (Can you tell a Lync dev wrote the regkey name? Autodiscovery instead of Autodiscover?) You’ll see entries for the internal and external URLs for the Availability Service, Exchange Control Panel, Exchange Web Services, and Out of Office Assistant. I’ve been able to delete this entire registry key for quick testing and found it recreated with no issues.
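If you want to script that cache flush for testing, a one-liner along these lines does it (the sign-in address is hypothetical; sign out of Lync first):

# Remove the cached Autodiscover values for a specific sign-in address
$sipUri = 'user@contoso.com'   # hypothetical sign-in address
Remove-Item -Path "HKCU:\Software\Microsoft\Communicator\$sipUri\Autodiscovery" -Recurse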

Solution 2: Place https://<Your SMTP domain>/ in the Local Intranet Zone
The second instance of this issue was a little more complicated, and still doesn’t make much sense to me, but I figured I would share. In this case the customer did not have Outlook Anywhere published, so we expected it to fail externally, but this error was actually occurring internally. After verifying the InternalURL was filled out correctly we started doing some traces and noticed the Lync client would make a GET request for /Autodiscover/Autodiscover.xml on Exchange, Exchange would return a 401 Unauthorized challenging for credentials like we expected, and then the trace died. There were no more responses from the Lync client IP address sent to Exchange in the logs. We verified this on multiple machines and operating systems and concluded that the Lync client would never respond to the credential request! For what it’s worth, Autodiscover was working fine for Outlook clients and no special configuration had been done to Exchange.

So we put a call into PSS, and they told me Lync will not read the SCP for Autodiscover in AD even if the Lync client is internal, and that it will do its own Autodiscover lookup (can anyone confirm/deny this?). Therefore, it will fall back to https://domain.com/Autodiscover/Autodiscover.xml, and if that fails it should move on to https://autodiscover.domain.com/Autodiscover/Autodiscover.xml like an Outlook client. This is where it got weird – PSS told me that from the ETL trace Lync was not falling back to the 2nd option, yet I could clearly see it make a request to IIS and not respond. From what they saw, the Lync client was getting stuck on the 1st option, which didn’t really exist. In any event, they had me add http://<domain.com>/ to the Local Intranet Zone on the client. Even though we knew this was not the location of Autodiscover, and I really didn’t think it would make a difference, it did solve the problem. After adding this entry we saw clients then try to resolve autodiscover.domain.com and grab the Autodiscover.xml file correctly from https://autodiscover.domain.com/Autodiscover/Autodiscover.xml. At this point the EWS status in the configuration information returned to EWS OK.
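If you ever need to push that zone entry out rather than add it by hand in Internet Options, per-site zone assignments live under the ZoneMap key – a hedged sketch with a hypothetical domain (zone 1 is Local intranet, and the value name matches the URL scheme you want mapped, so use http or https accordingly):

# Map https://contoso.com to the Local Intranet zone (1) for the current user
$zoneKey = 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains\contoso.com'
New-Item -Path $zoneKey -Force | Out-Null
New-ItemProperty -Path $zoneKey -Name https -PropertyType DWord -Value 1 -Force | Out-Null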

Sidenote 3: There is a thread on the TechNet forums about this issue which suggests editing the applicationhost.config file on the Exchange server. I have to recommend against this, and as you can see in the comments it hasn’t really fixed the problem for anyone. The solution is more likely one of the ones presented here.

Lync Web App and TMG Hangs

Something I’ve been noticing in a few Lync (and Exchange) deployments where Forefront TMG is involved is a significant delay in loading websites through the reverse proxy. I generally use Chrome as my primary browser and noticed sites would be extremely slow to load, or appear entirely unresponsive, when published through TMG. This was happening in both Lync Web App and Outlook Web App scenarios, so I figured the issue just had to be with TMG. After some digging it turns out the problem is Chrome: when browsing to these sites in either Internet Explorer or Firefox the pages load just fine.

The issue here is a new feature in Chrome called SSL False Start which is supposed to speed up your SSL connections. Unfortunately, the end result against sites published by TMG is they don’t ever load unless the user manually refreshes the page a 2nd time. Keep in mind this applies to any SSL website published by TMG and accessed by a user with Chrome, not just Lync Web App or Outlook Web App.

There is also an issue open on Google Code about this problem, http://code.google.com/p/chromium/issues/detail?id=67617, but there is no server-side fix. At this time the only solution is to modify the Google Chrome shortcut to disable the SSL False Start feature. Just modify your shortcut to be “chrome.exe -disable-ssl-false-start” and all is well.