Data MCU Unavailable in Lync

While attempting a Lync conference with one of my customers we noticed the options for whiteboarding and PowerPoint were unavailable in the user interface. These items are served up by the Data MCU, and are not displayed if that MCU cannot be contacted for some reason. We took a look at the Lync server logs and found events like this one continuously appearing:

A Create Conference request sent to an Mcu was rejected. It will be retried but if this error continues to occur conferencing functionality will be affected.

Mcu: https://<Server FQDN>:444/liveserver/datamcu/ Conference: sip:<SIP URI>;gruu;opaque=app:conf:focus:id:FPGCC8M9 Error: otherFailure
Cause: Overloaded or incorrectly functioning MCU.
Resolution: Ensure that the Mcu is functioning correctly.

Everything else in the environment seemed OK, but I also noticed some errors around Address Book photo permissions, so I took a peek at the back-end file share used by the web components. The file share NTFS permissions had somehow been reset so that the RTC groups no longer had the necessary read and write access, which explained the errors. The fix was really easy: open Topology Builder, download the existing topology, and re-publish it. Publishing the topology verifies the file share permissions and resets them if needed.
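
If you want to double-check those permissions yourself before publishing, a quick look at the ACLs from PowerShell works just as well. The share path below is only a placeholder for your own Lync file store:

# \\FILESERVER\LyncShare is a placeholder; point this at your web components share.
Get-Acl '\\FILESERVER\LyncShare' |
    Select-Object -ExpandProperty Access |
    Where-Object { $_.IdentityReference -like '*RTC*' } |
    Format-Table IdentityReference, FileSystemRights, AccessControlType -AutoSize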

Monitoring OCS and Lync Peak Call Capacity

Recently I had a customer interested in checking how many concurrent calls a particular OCS Mediation Server was handling. The challenge is that separate perfmon counters exist for inbound calls and for outbound calls, but there is no built-in counter that measures both. So while we could monitor the peak of each counter, we had no guarantee that those peak values were occurring at the same time.

In order to track this usage I’ve come up with a PowerShell script which grabs these two counters, parses their values, adds them together, and dumps the output into a CSV file. At the end of the monitoring period you can pull the CSV into Excel and easily find the peak total call count. A simplified sketch of the approach appears after the notes below.

Here are some notes on the behavior:

  • The CSV output is date and time, inbound calls, outbound calls, and total calls.
  • Data is output to the console and to CSV for real-time monitoring.
  • The default values track usage for a week, polling the counters every 15 seconds. You can change the total number of loops in the script if you need a longer monitoring period.
  • If you run the script again it will detect if previous data exists and rename the old file so you don’t lose anything.
  • I’ve run this under a logged-in user account, but I imagine you could set it up as a scheduled task to run in the background.
  • In order to run the script you should first run Set-ExecutionPolicy Unrestricted
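
Here is a minimal sketch of the approach (not the full script). The counter paths, CSV location, and loop count below are assumptions to adjust for your environment; check perfmon on the Mediation Server for the exact inbound and outbound counter names in your OCS or Lync version.

# Placeholder counter paths; substitute the real Mediation Server counters for your version.
$inCounter  = '\LS:MediationServer - Inbound Calls(*)\- Current'
$outCounter = '\LS:MediationServer - Outbound Calls(*)\- Current'
$csvPath    = 'C:\Temp\MediationCalls.csv'
$interval   = 15                  # seconds between samples
$loops      = 7 * 24 * 60 * 4     # one week of 15-second samples

Add-Content $csvPath 'Time,Inbound,Outbound,Total'
for ($i = 0; $i -lt $loops; $i++) {
    $in  = ((Get-Counter $inCounter).CounterSamples  | Measure-Object CookedValue -Sum).Sum
    $out = ((Get-Counter $outCounter).CounterSamples | Measure-Object CookedValue -Sum).Sum
    $line = '{0},{1},{2},{3}' -f (Get-Date -Format s), $in, $out, ($in + $out)
    Write-Host $line               # real-time view on the console
    Add-Content $csvPath $line     # CSV output for Excel later
    Start-Sleep -Seconds $interval
}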

The caveat with the Lync version is that because a Mediation Server can now use multiple gateways, we can’t see which gateway is handling each inbound or outbound call. But this still gives a good idea of the concurrent call volume flowing through each Mediation role.

I hope to improve this in the future, but wanted to make it available for everyone sooner rather than later.

File Share Witness and Datacenter Failback

This afternoon we ran across an issue with a fairly new Exchange 2010 Database Availability Group consisting of 3 nodes, all running SP1 with Update Rollup 3. The primary datacenter had 2 nodes with a local file share witness, while the 3rd node and an alternate file share witness were in a DR site. We had also recently performed a successful datacenter failover and failback test that went swimmingly, so everything was back up and running in the primary datacenter.

What we noticed today was that the cluster quorum and file share witness settings had persisted as a node and file share majority after the failback, instead of reverting to the node majority model a 3-node DAG should be using. Exchange should only use node and file share majority when there is an even number of servers in the DAG. Without reproducing this again I can only see it as a timing issue: when one of the primary datacenter nodes is added back to the DAG the quorum settings are flipped, but once the 3rd and final node rejoins, the quorum settings are not adjusted back. This leaves us with a node and file share majority, with the witness pointing at our alternate FSW.

If you open the Cluster MMC you can see our DAG is operating as a node and file share majority model even though all 3 nodes are online:

The fix for the issue is really easy: just run Set-DatabaseAvailabilityGroup against the DAG with no additional parameters. This process does not take the databases or cluster offline, but you’ll see the DAG detect it is using the wrong model for an odd number of nodes and adjust itself accordingly:
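
Here is what that looks like from the Exchange Management Shell, with DAG1 standing in as a placeholder for your DAG name:

# DAG1 is a placeholder; running the cmdlet with no other parameters makes Exchange
# re-evaluate the quorum model for the current member count.
Set-DatabaseAvailabilityGroup -Identity DAG1

# Optional sanity check from the Exchange side afterwards:
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, Servers, WitnessServer, WitnessShareInUse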

After the change you can verify in the cluster MMC that the quorum settings have been corrected to be a node majority:
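
If you would rather skip the MMC, the FailoverClusters module on Server 2008 R2 will show the same information from PowerShell:

# Confirm the cluster is back to a node majority quorum model.
Import-Module FailoverClusters
Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType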

I’m sure there’s a rational reason behind this behavior, but I haven’t nailed down exactly why it happens quite yet. In the meantime it’s just one more step to add to your DR documentation!

Source IP Address Preference with Multiple IPs on a NIC

Something I’m finding myself doing more and more lately is using multiple IP addresses on a single NIC for a Windows server. The reasons vary, but it’s generally in order to support a single server running 2 different services on the same port. This can happen for Lync with your Edge servers (or for skirting the reverse proxy requirement on Front-Ends), or with Exchange when creating multiple receive connectors on a server.

A behavior that changed with the introduction of Server 2008 is that the source IP address on a NIC will always be the numerically lowest IP. So throw out the whole idea of your primary IP being the first one you put on the NIC.

For example, let’s say we build a new Exchange server and configure the NIC with IP 10.0.0.100. This IP is registered in DNS and the server uses this IP as the source when communicating with other servers. Our fantastic network administrator has also created a NAT rule on the firewall to map this IP to a particular public IP for outbound SMTP so that our PTR lookups match up.

But now we want to add another IP for a custom receive connector, and the network admin hands you a free IP which happens to be 10.0.0.50. You add this as an additional IP on the NIC and voila, you have a couple of issues:

  • You just registered two names for the same server in DNS if dynamic registration is enabled.
  • Your server is now sending all outbound traffic from 10.0.0.50! (because 50 is lower than 100)

One of these is easily solved: just turn off dynamic registration and manually create the DNS records for the server. The other is a little trickier because Server 2008 and 2008 R2 will still send traffic from the 10.0.0.50 IP. In the case of Exchange, this can create some ugliness for outgoing SMTP because the firewall is no longer NATing to the correct public IP and you start bouncing mail due to PTR lookup failures.

Fortunately, we have a way to tell Windows not to use the lower numbered IP as a source address by adding the IP via the netsh.exe command. For Server 2008 SP2 and 2008 R2 RTM we need to apply a hotfix first. 2008 R2 SP1 included this fix by default so it is no longer required. Without the hotfix or SP1 you’ll find netsh.exe does not display or recognize the special flag.

Hotfix Downloads:

The key is that the IP address must be added via netsh.exe with a particular flag, so if you’ve already added the IP address via the GUI you’ll need to remove it first. After that, use this command to add the secondary IP:

netsh int ipv4 add address "Local Area Connection" 1.2.3.4/24 SkipAsSource=true

The SkipAsSource flag does two things: first, it instructs Windows not to use this IP as a source IP for outgoing traffic, and second, it prevents the registration of this IP in DNS if dynamic registration is enabled. Two birds with one stone!

You can always view the status of the IPs and their SkipAsSource status with the following command:

netsh int ipv4 show ipaddresses level=verbose
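
As a side note, newer Windows versions (2012 and later) expose the same flag through the NetTCPIP cmdlets, so you can check or change SkipAsSource without netsh. This doesn’t apply to 2008/2008 R2, but for reference:

# List IPv4 addresses along with their SkipAsSource setting.
Get-NetIPAddress -AddressFamily IPv4 | Format-Table IPAddress, InterfaceAlias, SkipAsSource

# Flip the flag on an existing address (10.0.0.50 from the example above).
Set-NetIPAddress -IPAddress 10.0.0.50 -SkipAsSource $true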

OCS Create Pool Wizard Error: Invalid database parameter

Recently I had a project where we were moving the OCS databases to a new clustered SQL 2008 (R1) with SP2 Back-End and ran into a lovely new error I’d never seen before, and one that didn’t turn up anywhere on Google!

For starters, we followed the steps outlined on TechNet. After we had successfully detached and attached all databases and run the LCSCMD.exe step, we launched the Create Pool wizard and attempted to plug in the info for the new SQL cluster. We got this error back:

An error occurred during the pool backend detection:

Pool backend discovery failed.

Invalid database parameter.

I double-checked the server name, instance, and FQDN and all looked well. We verified the SQL server was accessible over TCP 1433 and that no firewall rules were blocking access, so the error didn’t make a lot of sense. Obviously there was some kind of parameter the wizard GUI was not cool with. I thought maybe this was the SQL allow updates issue, but that solution had no effect on this error. There was definitely some validation check the UI was failing against our new DB.
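
If you need a quick way to run that kind of port check from PowerShell, a plain .NET TcpClient does the trick; the server name here is the same placeholder used in the command further down:

# Throws an exception if TCP 1433 on the SQL Back-End is unreachable.
$tcp = New-Object System.Net.Sockets.TcpClient
$tcp.Connect('MySQLServer.ptown.local', 1433)
$tcp.Connected
$tcp.Close()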

Since I couldn’t locate anyone else with this issue, I figured my options were to call PSS and extend this process by a few hours, or pull out the ol’ LCSCMD.exe again and try the operation from the command line. The Create Pool wizard really just collects a bunch of information and then uses it to execute the LCSCMD.exe commands in the background, so while doing it manually is not fun, it works just as well.

The entire syntax for LCSCMD.exe can be found on TechNet, but here is the command we ended up running. Please note that conferencing archiving was not implemented, so that parameter is not present.

LCSCMD.exe /Forest /Action:CreatePool /PoolName:MyOCSPool /PoolBE:MySQLServer.ptown.local\OCSInstance /PoolFQDN:MyOCSPool.ptown.local /InternalWebFQDN:MyOCSPool.ptown.local /ExternalWebFQDN:PublicOCSWebComponents.confusedamused.com /RefDomain:ptown.local /ABOutputlocation:\\MyFileServer\AddressBook /MeetingContentPath:\\MyFileServer\MeetingContent /MeetingMetaPath:\\MyFileServer\MeetingMetadata /AppDataLocation:\\MyFileServer\AppData /ClientUpdateLocation:\\MyFileServer\ClientUpdates /DBDataPath:"D:\Databases" /DBLogPath:"L:\Logs" /DynDataPath:"D:\Databases" /DynLogPath:"L:\Logs" /ABSDataPath:"D:\Databases" /ABSLogPath:"L:\Logs" /ACDDataPath:"D:\Databases" /ACDLogPath:"L:\Logs"

After running the command manually it succeeded with absolutely no issues. The new cluster has been running for over a week now without any problems, so I think this is a problem specific to the UI. I’m not sure exactly what causes it, but our environment was running SQL 2008 with SP2 on top of a Windows Server 2008 R2 SP1 operating system.

As a side note, this process seems to undo any changes made by the OCS2009-DBUpgrade.msi patches. You’ll need to re-run the patch version that lines up with your Front-End patch level before the FE services will be able to start.

OCS Create Pool Step Failure Drops Conference Directories

Something to keep in mind before you ever move an OCS database is that you’ll want to grab backups of the user data and conference directories so that you can restore the data just in case anything goes wrong with your move operation. The conference directory objects map conference IDs and passcodes used by PSTN dial-in users to a specific Live Meeting instance. These objects are stored in Active Directory and not in the OCS back-end database like you might expect, but you can still back up all the data these objects hold.

You can export all user data and conference directories with the following command:

dbimpexp.exe /hrxmlfile:everything.xml /sqlserver:SQL.ptown.local\OCS

I usually also grab a separate backup of just the conference directories from the pool:

dbimpexp.exe /hrxmlfile:confdirs.xml /sqlserver:SQL.ptown.local\OCS /restype:confdir

After these run successfully you can copy these files off to a safe place and then proceed with your database operations.

As you are moving the databases around, one of the steps on TechNet will have you re-run the Create Pool wizard, but if this step fails for any reason the installer kicks into its rollback mode and removes any configuration changes it made. What’s not terribly apparent is that part of this rollback process removes all conference directories on the pool without any warning.

So if this step fails on something silly like a file share permission, you’ll suddenly find you’ve dropped all of your conference directories. The end result is that users calling in to meetings via PSTN will no longer be able to enter a conference ID and passcode to join meetings hosted on that pool.

I recently ran a DB move where the user account we used did not explicitly have Full Access rights to one of the OCS file shares (the permission had been removed at some point for an unknown reason), and the result was that the Create Pool operation kicked into rollback mode and removed the pool’s conference directories. We had a solid backup of these to restore from, but this customer had previously lost the directories the first time they tried this operation on their own because of the same problem.

If the directories are dropped and you don’t have a backup via DbImpExp.exe, you’ll need to recreate the conference directories on the admin side, but the big pain point is that all users will need to reschedule their meetings because the previous ID/passcode mappings are no longer valid. It’s a really ugly user experience and not likely to go over very well. If only you had backed these up in advance!

I would imagine you could restore the conference directory objects stored in AD and possibly get them hooked back up to OCS, but your best bet is really to use DbImpExp.exe instead. A general best practice for any OCS environment is to take regular backups of your OCS user data and conference directories via DbImpExp.exe so that you never find yourself in this situation.
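
If you want to automate those backups, a small PowerShell wrapper run as a scheduled task does the job. The dbimpexp.exe location, backup share, and SQL instance below are all placeholders for your own environment:

# Adjust these three values for your installation; they are placeholders only.
$dbimpexp = 'C:\Program Files\Common Files\Microsoft Office Communications Server 2007 R2\dbimpexp.exe'
$backup   = '\\MyFileServer\OCSBackups'
$stamp    = Get-Date -Format 'yyyyMMdd'

# Full user data export plus a separate conference directory export, stamped by date.
& $dbimpexp /hrxmlfile:"$backup\everything-$stamp.xml" /sqlserver:SQL.ptown.local\OCS
& $dbimpexp /hrxmlfile:"$backup\confdirs-$stamp.xml" /sqlserver:SQL.ptown.local\OCS /restype:confdir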

If your Create Pool step fails at least once, you’ll need to restore the directories because they’ve been dumped. After you work out the Create Pool issue and succeed in starting up your Front-End services, you can proceed with the conference directory restore.

The syntax to restore just the conference directories from the pool is:

dbimpexp.exe /import /hrxmlfile:confdirs.xml /sqlserver:SQL.ptown.local\OCS /restype:confdir

After running this you should be able to dial in via PSTN and enter a conference ID and passcode from a pre-existing meeting again.