OCS 2007 R2 Cumulative Update 6 and Stored Procedure Mismatches

Something not mentioned in the release notes of Cumulative Update 6 (CU6) is that there is a dependency on running the new OCS2009-DBUpgrade.msi before any server updates. If you try to run ServerUpdateInstaller.exe and apply the server updates without first running the database package, you may see an error like this:

Event ID: 30968
Source: Live Communications User Services
Details: The component Live Communications User Services reported a critical error: code C3EE78F8 (Enterprise Edition Server successfully registered with the back-end, but a stored procedure version mismatch was detected. The service will not start until this problem is resolved. Cause: The database schema and the Enterprise Edition Server were updated by different installation packages. Resolution: Ensure both the Enterprise Edition Server and back-end were installed or modified by the same installation package. The service has to stop.

Obviously the error verbiage is a bit outdated with references to LCS, but the error is correct: there is a mismatch between the stored procedure versions, which causes the Front-End service to fail to start.

To avoid the issue be sure to apply the latest OCS2009-DBUpgrade.msi package before updating any Front-End servers.

OCS 2007 R2 and .NET Framework 4.0

Executive summary here: don’t install .NET Framework 4.0, or at least not before you install your OCS 2007 R2 bits. If you build up a fully patched new Server 2008 R2 server, the .NET 4.0 updates will be included, and when you run an OCS install you’ll get:

Microsoft Office Communications Server 2007 R2, Microsoft Unified Communications Managed API 2.0 Core Redist 64-bit installation requires Microsoft .NET Framework version 3.5. Installation can not continue.

The solution here is to go into Programs and Features, and then remove the Microsoft .NET Framework Extended and Microsoft .NET Client Profile packages. Restart the system and you should be good to continue with your installation.

Activating the OCS 2007 R2 Voice Applications After Installation

Today I had the fun of trying to figure out how to activate the voice applications on a 2007 R2 Front-End after the installation has already occurred. You know that checkbox screen during the install that asks if you want to activate the Conferencing Attendant, Conference Announcement Service, Outside Voice Control, and Response Group Service that everyone leaves checked by default? Well, it was unchecked during this install for some reason and now we needed it on for a dial-in conferencing pilot.

These services are always all installed by default, but just left in an un-activated state if you untick those checkboxes. To activate them you can use LCSCMD.exe following the documentation on Technet, and when you do you’ll see this error:

Failed to activate Microsoft.Rtc.Application.Caa on machine
Error: Unable to determine the location of the manifest file.
Description: The registry key Software\Microsoft\Real-Time Communications\Applications\Microsoft.Rtc.Application.Caa does not exist.

And then you’ll wisely open up Regedit and verify this key does exist and be further confused. After pondering it some more you’ll squint real hard and realize the key is slightly off because the folks who wrote the documentation forgot an “S” in the name of the applications.

Be sure to include the “S” in each application ID to make the activation process succeed. For example, use Microsoft.Rtc.Applications.Caa as the ApplicationID for the Conferencing Attendant.
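To make the one-letter difference concrete, here is a small Python sketch that builds the registry path for an application ID. The helper names are hypothetical, the registry prefix is copied from the error message above, and the Conferencing Attendant ID is the only one taken from this post.

```python
# Registry prefix copied from the LCSCMD error message.
APPLICATIONS_KEY = r"Software\Microsoft\Real-Time Communications\Applications"


def fix_application_id(app_id: str) -> str:
    """Return the ID with the missing "s" restored (hypothetical helper)."""
    wrong_prefix = "Microsoft.Rtc.Application."
    right_prefix = "Microsoft.Rtc.Applications."
    if app_id.startswith(wrong_prefix):
        return right_prefix + app_id[len(wrong_prefix):]
    return app_id


def registry_key_for(app_id: str) -> str:
    """Build the registry path that has to exist for the given application ID."""
    return APPLICATIONS_KEY + "\\" + app_id


documented = "Microsoft.Rtc.Application.Caa"
print(registry_key_for(documented))                      # the key LCSCMD reports as missing
print(registry_key_for(fix_application_id(documented)))  # the key with the "s" restored
```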

Public Certificates for Exchange 2010 Federation

I think that one of the coolest features of Exchange 2010 is the seamless free/busy and calendar federation between organizations. In order to get federation provisioned there are a number of steps you need to take which you can find detailed on Technet.

The first step of this setup involves creating a Federation Trust to the Microsoft Federation Gateway (MFG), but in order to create this trust you need to use a public certificate issued by one of the following Certificate Authorities (the haphazard thumbprint formatting is Technet’s, not mine):

CA certificate friendly name              Thumbprint
Comodo                                    NA
Digicert Global Root CA                   083B:E056:9042:46B1:A175:6AC9:5991:C74A
Digicert High Assurance EV Root CA        91 8d a5 e4 99 c1 5f 7c 62 75 b1 24 fe de 53 35 7c 34 bd 36
Entrust.net CA (2048)                     801D 62D0 7B44 9D5C 5C03 5C98 EA61 FA44 3C2A 58FE
Entrust Secure Server CA                  99A6 9BE6 1AFE 886B 4D2B 8200 7CB8 54FC 317E 1539
Go Daddy Secure Certification Authority   7c46 56c3 061f 7f4c 0d67 b319 a855 f60e bc11 fc44

I recently was involved in an Exchange deployment that included purchasing a SAN certificate from Comodo. One of the certificate authorities Comodo uses to issue SAN certs is the USERTrust Legacy Secure Server CA, which has its own certificate issued by the Entrust.net Secure Server Certification Authority. Bottom line is the certificate you get verifies up to the Entrust certificate you can see below, which the Federation Gateway supports.

image

After trying to create the Federation Trust we were seeing the following error:

image

An error occurred while attempting to provision Exchange to the Partner STS. Detailed information “An error occurred accessing Windows Live. Detailed information “The request failed with HTTP status 403: Forbidden.”.”

Basically this is the MFG’s way of saying “I don’t trust this certificate.” It turns out the MFG is geared to only accept certificates issued directly from one of the certificate authorities listed above, which is not something I saw in the documentation. So if the Entrust Secure Server Certification Authority had issued our webmail certificate it would have been accepted. But if, as in our case, your certificate is issued by a 3rd party intermediate certificate authority, it won’t be accepted even if it technically verifies up to a supported root authority.
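The difference between "chains up to a trusted CA" and "issued directly by a trusted CA" can be sketched in a few lines of Python. This is only a model of the behavior we observed, not how the MFG is implemented; the function names are made up, and the chain is just a list of names ordered from the leaf certificate up to the root.

```python
# Illustrative subset of the accepted issuers, using names from this post.
TRUSTED_ISSUERS = {
    "Entrust.net Secure Server Certification Authority",
    "Go Daddy Secure Certification Authority",
}


def chains_to_trusted_ca(chain, trusted):
    """True if any CA above the leaf certificate is in the trusted set."""
    return any(name in trusted for name in chain[1:])


def issued_directly_by_trusted(chain, trusted):
    """True only if the leaf's immediate issuer is in the trusted set."""
    return len(chain) > 1 and chain[1] in trusted


# Our chain, ordered leaf first: the SAN cert, the intermediate, the Entrust CA.
our_chain = [
    "webmail certificate",
    "USERTrust Legacy Secure Server CA",
    "Entrust.net Secure Server Certification Authority",
]

print(chains_to_trusted_ca(our_chain, TRUSTED_ISSUERS))        # ordinary chain validation
print(issued_directly_by_trusted(our_chain, TRUSTED_ISSUERS))  # the stricter check the MFG appeared to apply
```

For our chain the first check passes and the second does not, which matches the 403 we were getting.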

The good news is a call to PSS resulted in Microsoft making a change on the MFG to accept certificates issued by this particular intermediate CA going forward for everyone. So if you ran into this error previously you should be able to try again with the same certificate and see the trust succeed. As of this writing I’ve requested them to also add support for the AAA Certificate Services intermediate CA that Comodo also issues certificates from.

Exchange 2010 SSL Offloading

One of the deployments I’ve been working on recently involved using F5 BigIP hardware load balancers to do SSL offloading for a two-node Exchange 2010 design. To give some background here usually you would just pass through port 443 (I’m skipping over the RPC Client Access piece since it’s not relevant here) from your load balancer straight to the Exchange servers, letting the servers handle the SSL encryption like in this diagram:

image

The benefit of that approach is it’s simple and a very common deployment method. On the flip side, you can benefit from offloading SSL encryption to the BigIPs and gain some more advanced forms of load balancing. In this case the improved load balancing was the goal along with some internal policies forcing this approach. What happens with SSL offloading is the HTTPS traffic ends at the BigIPs which turn around and pass port 80 clear-text traffic back to the Exchange servers so they have a bit less CPU work to do. That strategy looks more like this:

image

The problem with this configuration is Exchange is really designed to operate with SSL in mind and you have to go out of your way to allow it to operate in clear-text. What you’ll need to configure on each CAS server is:

The issue I ran into is after following all of these steps Autodiscover was still not functional through the load balancing. I could enter https://<CAS Array FQDN>/Autodiscover/Autodiscover.xml into a browser and reach the XML file with no problem, but running the Autodiscover test within Outlook would return a 404 error. Every other service was working just fine:

image

This threw me for a while, and after a bit of searching I ran across KB 980048 where it’s noted that Autodiscover cannot be used on port 80 with an HTTP POST request, which is what Outlook uses. My attempts at accessing the XML directly succeeded because I was only trying to download the file. Supposedly this is going to be fixed in Service Pack 1.

While the KB provides no immediate solution what I found that works is to use the same methodology Technet recommends for the Exchange Web Services web.config file. Go into your /Autodiscover folder and edit the web.config to replace all instances of httpsTransport with httpTransport (a simple search and replace should work). Be sure to save a copy before you make modifications, restart your server after making the change and you should be able to offload SSL for Autodiscover successfully. Since as far as I know this is undocumented today you can try this at your own risk, but it appears to be working.
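If you would rather script the search and replace than do it by hand, a minimal Python sketch could look like the one below. The function name is made up, the file is assumed to be UTF-8 text, and you have to supply the path to the web.config in your own /Autodiscover folder. Like the manual edit, this is undocumented territory, so treat it as at your own risk.

```python
import shutil
from pathlib import Path


def switch_to_http_transport(config_path: str) -> int:
    """Back up web.config, then replace httpsTransport with httpTransport.

    Returns the number of occurrences that were replaced.
    """
    path = Path(config_path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # save a copy before making modifications

    text = path.read_text(encoding="utf-8")  # assumption: the file is UTF-8
    count = text.count("httpsTransport")
    path.write_text(text.replace("httpsTransport", "httpTransport"), encoding="utf-8")
    return count


# Example call (placeholder path, adjust for your server):
# switch_to_http_transport(r"<Autodiscover folder>\web.config")
```

The backup lands next to the original as web.config.bak, so rolling back is just a matter of copying it over the edited file.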

Your OCS A/V Authentication Certificate Subject Name Doesn’t Matter

A few months back I was involved in a discussion about what the subject name of an OCS Edge Server’s A/V authentication certificate should be. Some folks were saying to use the Edge server’s internal FQDN and others were saying to use the external, public FQDN you define for A/V. I was in the camp using the external name, but the odd thing was both sides said their approach worked. There is definitely some confusion about what name you should use and Microsoft has actually published directly conflicting information which further confuses the issue. Some testing I’ve recently done clears up why so many documents and people contradict each other – the subject name doesn’t matter. Really. You could put whatever you want in that subject name, assign it to A/V authentication and it will work flawlessly. The purpose of this certificate per the Technet documentation:

The private key of the A/V authentication certificate is used to generate authentication credentials.

Specifically, it’s not used for encryption or MTLS even if that’s not made clear anywhere. Let’s take a step back and clarify a few things for some background:

  • There are two services that run on the Edge server with "A/V" in the name. If you’re not familiar with them, Jeff Schertz’s More on OCS Edge Server Certificates article has a good explanation of the difference between the A/V Authentication and A/V Edge services, but basically the A/V Authentication service is internal facing and the A/V Edge Service is external facing.
  • There is no certificate assigned to the A/V Edge service because encryption for external A/V traffic is provided by SRTP.
  • The certificate for A/V Authentication is only used by internal clients when trying to communicate with an external or federated client. This means you can (and should) use an internal certificate authority to issue this certificate. There is no benefit or need to use a public certificate for A/V authentication.

Let’s walk through a little example here as if I was trying to figure out what name to use for my A/V authentication certificate. I have the following environment:

  • Public Domain: confusedamused.com
  • Internal AD Domain: ptown.local
  • SIP Domain: confusedamused.com
  • Edge Server Internal FQDN: edge.ptown.local
  • A/V Edge Service FQDN: av.confusedamused.com

So with that information what should I use as the certificate name for the A/V authentication certificate? If you consult the Technet documentation topic Set up Certificates for A/V Authentication you’ll find this note (emphasis is mine):

The subject name should match the fully qualified domain name (FQDN) of the A/V Edge Service published by the external firewall, or the FQDN of the VIP used by the A/V Edge Service array on the external load balancer (that is, if the Edge Servers are load balanced).

So based on that blurb, my A/V authentication certificate subject name should be av.confusedamused.com. Fair enough.

I ran through the OCS 2007 R2 Edge Planning Tool for a sanity check. You can see the result below, but the tool follows the Technet documentation and uses the external FQDN I defined for the A/V Edge Service when it asked.

tool-av
tool-results

A group of MVPs and Microsoft employees published a document called Deploying Certificates in Office Communications Server 2007 which says the following about the A/V authentication certificate (emphasis is mine again):

Must be the FQDN of Audio/Video authentication server in DNS.

Well that calls out the name of the authentication server, not the A/V Edge Service. I think this comes down to really just poor wording in the document which contributes to confusion, but what is the name of our A/V Authentication server? It would be the same name as the internal Edge interface, right? The A/V Authentication server is the Edge server, not the external FQDN. So now we’re being told to use the internal FQDN, edge.ptown.local as the subject name.

Also released by Microsoft was a document called OCS 2007 R2 Walkthrough – Scale to Load Balanced Edge Server which completely contradicts Technet and the Edge Planning Tool (emphasis mine):

  • Access Edge Internal (Corporate Certificate). In our sample topology, the subject name would be set to ocsedge.contoso.com, the FQDN of the Edge Server internal interface.
  • A/V Authentication Internal (Corporate Certificate). In our sample topology, the subject name would be set to ocsedge.contoso.com, the FQDN of the Edge Server internal interface.

This seems to match up with the certificates document and is somewhat backed by the exact same Technet article I referenced earlier which says:

As a security precaution, you should not use the same certificate for A/V authentication that you use for the internal interface of the Edge Server.

This begs the question "Why would I ever even try to use the same certificate?" The only logical reason would be perhaps because they use the same subject name. That jibes with the Scale to Load Balanced Edge Server documentation. If we’re thinking about this in terms of MTLS connections, you would have to think that this makes the most sense. In your OCS Forest properties, if you added an A/V Edge server with the name edge.ptown.local for port 5062, it’s reasonable that you’d expect the A/V Authentication service operating on port 5062 of the internal interface to offer a certificate matching this name. If it presented something wrong, say maybe the external FQDN of the A/V Edge service, it should fail, right?

Well, the truth is the name doesn’t matter. There isn’t MTLS validation happening on port 5062 the same way you’d expect MTLS between servers on 5061. I think the reason the certificate requirement issue hasn’t been pointed out yet is because it’s never caused a problem – it works either way. I can use a certificate with a subject name gobblygook.confusedamused.com and media relay authentication through the Edge server works just fine. It just needs a certificate to generate authentication credentials for the media relay process. Go ahead and try it out – put whatever name you want on the certificate and it will still work.

So while the subject name doesn’t really matter, if you’re still interested in adhering to best practices I would recommend using the external facing, public A/V Edge name. In the example earlier this would be av.confusedamused.com. Hopefully Microsoft will update the certificate and scaling documents with a clarification and make them more consistent with the rest of Technet.

Broadcom NIC Teaming and Hyper-V on Server 2008 R2

The short of this is if you’re trying to use NIC teaming for the virtual adapter on Server 2008 R2, save yourself the headache, pony up a few extra dollars and buy Intel NICs. The Broadcoms have a bug in the driver that prevents this from working correctly on Server 2008 R2 Hyper-V when using a team for the Hyper-V virtual switch. Per the Broadcom driver release notes this is supposed to be a supported configuration now, but it does not work correctly. There are two scenarios so far where I’ve been able to reproduce the problem:

  • VM guest has a static MAC assigned and is running on a VM host. Shut down the VM, assign it a dynamic MAC and start it again on the same host. You’ll find it has no network connectivity.

  • VM guest is running on VM Host A with a dynamic MAC. Live Migrate the VM guest to Host B. It has network connectivity at this point, but if you restart the VM on the opposite host you’ll find it receives a new MAC and no longer has network connectivity.

Take a look at this diagram (only showing NICs relevant to Hyper-V) and you’ll see the setup that causes the issue. We have 2 Broadcom NICs on Dell R710s, each connected to a different physical switch to protect against a port, NIC, or switch failure. They are teamed in an Active/Passive configuration. No load balancing or link aggregation going on here. The virtual adapter composed of the two team members is then passed through as a virtual switch to Hyper-V and it is not shared with the host operating system. The host itself has a team for its own management and for the Live Migration network, which I’ll point out both work flawlessly. The issue here is purely related to Broadcom’s teaming through a Hyper-V virtual switch.

image

Say I have a VM running on Host A where the NIC team has a hypothetical MAC called MAC A. When it boots up, it receives a dynamic MAC address we’ll call MAC C from Host A’s pool. If you try to ping the VM guest’s IP 1.1.1.1 and then look at your ARP table you’ll see something like:

Internet Address    Physical Address    Type
1.1.1.1             MAC A               Dynamic

This is because the NIC team is responsible for answering requests on behalf of the VM. When the NIC team receives traffic for the VM’s IP it will accept it, and then pass it along to the Hyper-V virtual switch. If you were to take a packet trace off the NIC you’ll see the team has modified the Layer 2 destination address to be MAC C, the dynamic MAC the VM got when it booted. This is how the teaming is supposed to work.

Now say I migrate the VM to Host B (where the NIC team has a MAC called MAC B) via Live or Quick migration. The VM retains connectivity and if you take a look at your ARP table you’ll now see something like:

Internet Address    Physical Address    Type
1.1.1.1             MAC B               Dynamic

Yup, the MAC for Host B’s NIC team is now answering requests for the VM’s IP. Again, this is how the teaming is supposed to work. Everything is peachy and you might think your clustering is working out great, until you restart the VM.

image

When the VM restarts, upon booting it receives a new dynamic MAC from Host B’s pool and you’ll find it has no network connectivity. Your ARP table hasn’t changed (it shouldn’t, the same team is still responsible for the VM), but the guest has been effectively dropped. When I pulled out a packet trace, what I noticed was the team was still receiving traffic for the VM’s IP, which ruled out a switching problem, but it was still modifying the packets and sending them to MAC C, when in fact the VM now has MAC D after its restart. It seems somebody (the driver) forgot to notice the VM has a new MAC and is sending packets to the wrong destination, so the VM never receives any traffic.

image

I found that toggling the NIC team within the host actually fixes the problem. If you simply disable the virtual team adapter and then re-enable it the VM will instantly get its connectivity back so it seems that during the startup process the team reads the VM MACs it’s supposed to service. I would think this is something it should be doing constantly to prevent this exact issue, but for now it looks like it’s done only at initialization.
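Here is a toy Python model of the behavior described above, assuming the team only reads the VM MACs when it initializes. The class and its names are made up for illustration and say nothing about how the Broadcom driver is actually written.

```python
class NicTeam:
    """Toy model of the team: maps VM IPs to the MAC it rewrites frames to."""

    def __init__(self, switch):
        self.switch = switch      # live view of the virtual switch: IP -> current VM MAC
        self.forwarding = {}
        self.initialize()

    def initialize(self):
        # Assumption: the team snapshots the VM MACs only when it is (re)enabled.
        self.forwarding = dict(self.switch)

    def deliver(self, ip):
        """Return True if a frame for `ip` is rewritten to the VM's current MAC."""
        return self.forwarding.get(ip) == self.switch.get(ip)


vswitch = {"1.1.1.1": "MAC C"}   # VM boots with dynamic MAC C
team = NicTeam(vswitch)
print(team.deliver("1.1.1.1"))   # frames are rewritten to MAC C and arrive

vswitch["1.1.1.1"] = "MAC D"     # VM restarts and receives a new dynamic MAC
print(team.deliver("1.1.1.1"))   # team still rewrites to the stale MAC C

team.initialize()                # disable and re-enable the team adapter
print(team.deliver("1.1.1.1"))   # snapshot refreshed, connectivity returns
```

A static MAC on the VM means the snapshot can never go stale, which is why that workaround holds up.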

The most practical workaround I’ve found so far is to just set static MAC addresses on the VMs within the Hyper-V settings. If the VM’s MAC never changes, this problem simply doesn’t exist. So while that defeats the purpose of the dynamic MAC pool on a Hyper-V host it allows the teaming failover to operate properly while you restart VMs and move them between cluster nodes.

I’ve raised the issue with Dell/Broadcom and they agree it’s a driver problem. There is supposedly a driver update due mid-March, but no guarantees this will be addressed in that update. The next update isn’t slated until June which is a long time to wait, hence the recommendation to just use Intel NICs.

Other notes for the inquisitive:

  • Disabling the team and using only a single adapter makes this work properly.
  • Happens with the TOE, checksum and RSS features enabled or disabled.
  • No VLAN tagging in use.
  • Issue persists when team members are plugged into the same switch.
  • Latest drivers from Dell/Broadcom (12/15/2009) as of this writing.
  • Happens whether teaming is configured before or after Hyper-V role is installed.

Number Display Formatting in MOC

Something I’ve been working on lately was a Microsoft case involving inconsistent formatting of numbers. It turns out that MOC actually displays numbers for users in your contact list differently the first time you sign in (i.e. No GalContacts.db exists yet) compared to subsequent sign-ins. This isn’t a normalization problem because the underlying Tel URI is always correct, but actually just a display issue in how the number is presented within MOC.

Apparently the first time you sign in because there is a slight delay in the ABS download (even if you force it immediately) MOC has nothing to go on for contact card information other than the presence XML. If you view the presence XML you’ll see at first it doesn’t actually carry the display format, just the Tel URI, tel:+12345678901 so MOC has to use its own logic to figure out how to display that number. The format it chooses is +1 (234) 567-8901 and there is no way to change that. Not by disabling normalization, using only built-in rules, or by using only company-specific rules – the result is always the same and that display format is hard coded into MOC.
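The hard-coded formatting is easy to mimic. The Python below is only a sketch of the result for North American numbers, using the example from this post; the function name is made up and MOC’s real logic is not public, so anything beyond a +1 number is left untouched.

```python
import re


def moc_first_signin_format(tel_uri: str) -> str:
    """Format a +1 tel URI the way MOC displayed it on the first sign-in."""
    match = re.fullmatch(r"tel:\+1(\d{3})(\d{3})(\d{4})", tel_uri)
    if match is None:
        return tel_uri  # this sketch only covers +1 numbers
    area, exchange, line = match.groups()
    return f"+1 ({area}) {exchange}-{line}"


print(moc_first_signin_format("tel:+12345678901"))  # +1 (234) 567-8901
```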

After a lot of back and forth support gave me the ol’ “It’s by design” answer and ended it there. I was a little disappointed because I think MOC should be able to apply the rules immediately after receiving them, but it seems to take another sign-in to take effect. Let me show you what I mean:

Active Directory Fields:
ad

Company Phone Number Normalization file:
normalization

Address Book File Dump:
absdump

First sign-in uses MOC hard-coded logic:
1st

Subsequent sign-ins display the number as formatted in Active Directory:
2nd

Odd, right? Normally this wouldn’t be a problem, but the reason this popped up in the first place was because CUCIMOC was in use and it caches numbers you’ve called previously. So if a user signed in for the first time and dialed another user, CUCIMOC would show you calling with a +1. Even after signing out and back in, CUCIMOC would keep showing the +1 next time you dialed that user because it cached the original number you called, which had the +1. And now any new users you call would not have a +1, creating an ugly inconsistency. We were able to take care of the actual dialing with rules in Call Manager, but it’s just undesirable for end users to see this inconsistency.

Another gotcha I’ll point out is that MOC also tries to respect the access levels here. I say tries because, as we know, if a number is in AD it’s visible to everyone in the organization regardless of access level. Say you have a user’s work and mobile numbers in AD and view them as another user who’s assigned the company level in MOC: you’ll see MOC apply its own formatting to the work number only. Assign that user to the team level and you’ll see MOC also format the mobile number. Bizarre.

Company Access Level on 1st Sign-In:
company

Team Access Level on 1st Sign-In:
team

There are two workarounds here since Microsoft refuses to acknowledge this behavior as a bug:

  • Format all numbers in AD to the format MOC is going to use on the 1st sign-in. That is, +1 (xxx) xxx-xxxx. Then create a normalization rule in your company file to make sure this gets processed to E.164 by the ABS.
  • When a user is signing in to a new PC for the first time (or you had them delete GalContacts.db), have them sign-in once, sign-out after the address book downloads, and finally sign-in again. MOC will now display the numbers from AD instead of formatting them itself. If a user goes to a new PC they need to repeat this process.

Neither of these is a great solution. The first is probably the best, but aren’t we defeating the entire purpose of normalization here? I should be able to put the numbers in any format I want in AD and normalize them with the ABS. Side note before anyone suggests it: This behavior still happens even if you put the AD fields in E.164 (+12345678901) format. For some organizations changing the formatting of the phone field isn’t an option, especially if they have some kind of HR software responsible for syncing phone fields, or other applications dependent on the existing formats.
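For the first workaround, the normalization rule has to turn +1 (xxx) xxx-xxxx back into E.164. The Python below sketches that transformation with a regular expression; it is not the syntax of the company normalization file, just an illustration of what the rule needs to match and produce, and the names are made up.

```python
import re

# Matches the +1 (xxx) xxx-xxxx display format and captures the digit groups.
DISPLAY_FORMAT = re.compile(r"\+1 \((\d{3})\) (\d{3})-(\d{4})")


def to_e164(number: str) -> str:
    """Normalize '+1 (xxx) xxx-xxxx' to E.164, leaving other input unchanged."""
    match = DISPLAY_FORMAT.fullmatch(number.strip())
    if match is None:
        return number
    return "+1" + "".join(match.groups())


print(to_e164("+1 (234) 567-8901"))  # +12345678901
```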

If you want to duplicate the issue yourself, there is a specific use case to make this happen. Most importantly, the user needs to actually be in your contact list.

  1. Enter your numbers in AD for User A and B.
  2. Ensure the numbers are normalized by the ABS.
  3. Sign-in as User A.
  4. Add User B to your contact list.
  5. Sign-in as User B.
  6. Add User A to your contact list.
  7. Sign-out of both accounts.
  8. Delete GalContacts.db and GalContacts.db.idx from both accounts.
  9. Sign-in to User A.
  10. Sign-in to User B.
  11. View User A’s phone numbers from User B’s MOC. You’ll see the MOC internal formatting applied.
  12. Sign-out of User B. (Leave User A signed in)
  13. Sign-in to User B.
  14. View User A’s phone numbers from User B’s MOC. You’ll see the exact format you entered in AD, and for all sign-ins going forward.
  15. You can sign-out of User A, delete GalContacts.db again and sign-in to see the MOC formatting again.

Personally, I think the behavior is wrong and needs to be fixed, but Microsoft says otherwise.

Re-Design, Finally.

I know it’s still not 100% perfect and needs quite a bit of code cleanup, but I think I finally got the site to a point where I felt good pulling out this re-design I’ve been working on for months. In between periods of zero free time, a move to San Francisco, and countless attempts at starting over I managed to put the content in a (hopefully) more usable format and took a stab at using HTML5.

The site looks more appealing in anything except IE (of course) with thanks to TypeKit for giving me a great way to use real fonts on the web. I’ve also added some Twitter, Flickr and Last.FM content here to give this a little more of a personal feel. Maybe one day I’ll even get a more recent photo of myself on here. I’d love to know your thoughts on the change.