One change I’ve noticed with the Lync 2013 CU3 (October 2013) Edge server update is how it validates trusted FQDNs found in the topology. Specifically, an FQDN can now be configured as either a hosting provider or a federated partner’s Access Edge address, but not both.
Under normal circumstances this shouldn’t pose an issue, but the Lync Control Panel will not prevent an administrator from creating this conflicting configuration. And until CU3, the services didn’t seem to care either way.
As an example, the proper way to allow Lync Mobile push notifications for Windows Phone clients (and Lync 2010 iOS clients) has always been to define sipfed.online.lync.com as a hosting provider, as in this screenshot.
The partner domain push.lync.com should then be defined as a SIP federated domain with no Access Edge service specified. But if an administrator accidentally specified sipfed.online.lync.com as the Access Edge service FQDN, as in the screenshot here, it previously had no negative impact on the configuration.
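For reference, the correct configuration can also be created from the Lync Server Management Shell. This is a minimal sketch using the FQDNs above; the provider name is just a placeholder, and the remaining parameters are left at their defaults:

# Define Lync Online as a hosting provider (proxy FQDN only)
New-CsHostingProvider -Identity "LyncOnline" -ProxyFqdn sipfed.online.lync.com -Enabled $true

# Define push.lync.com as an allowed federated domain with NO Access Edge FQDN
New-CsAllowedDomain -Identity push.lync.com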
After applying CU3 on an Edge server, though, you’ll find the Access Edge service will no longer start. Event 14517 will be logged after it fails:
The server configuration validation mechanism detected some serious problems.
The server at FQDN [sipfed.online.lync.com] is configured as both type ‘allowed partner server’ and type ‘IM service provider’.
The solution here is to make sure any hosting providers, such as Lync Online, are not also defined as the Access Edge address for a SIP federated domain. This issue isn’t unique to push.lync.com; it applies to any Office 365 tenant specified as a SIP federated domain.
Make sure you validate your hosting provider and federated domain configurations prior to deploying CU3 on an Edge server.
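One quick way to check for conflicts is to compare the two lists in the Lync Server Management Shell. A rough sketch, using the ProxyFqdn property that both Get-CsHostingProvider and Get-CsAllowedDomain expose:

# Collect every hosting provider proxy FQDN
$providerFqdns = Get-CsHostingProvider | Select-Object -ExpandProperty ProxyFqdn

# List any federated domain whose Access Edge FQDN collides with a hosting provider
Get-CsAllowedDomain | Where-Object { $_.ProxyFqdn -and ($providerFqdns -contains $_.ProxyFqdn) }

Anything returned by that second command needs to be fixed before the CU3 update goes on.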
Not sure on the root cause, but I ran into an instance where the LYSS.exe process was consuming 80-90% CPU on a Lync 2013 Front End server, and consequently causing issues with conference joins and other Lync functions. This process name is new to Lync 2013 and represents the Lync Storage Service. LYSS is backed by an additional SQL Express instance on each Front End server, and is responsible for providing the magic pixie dust that allows Lync to leverage either SQL or Exchange 2013 Web Services for contacts and archiving data. LYSS provides a layer of abstraction for the internal Lync components to deliver content to, and then sorts out how to deliver it to the appropriate end-state data store. It’s essentially a glorified queuing service which replaced the need for MSMQ: it temporarily stores the data, and then delivers it to the appropriate destination (either SQL or Exchange).
Back to the intent of the post: LYSS doesn’t run as its own service, so you cannot simply restart it via the Services MMC. If you do run into this problem, you can kill the lyss.exe process. The service will restart itself automatically, and you’ll hopefully see the CPU usage drop.
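If you’d rather do that from an elevated PowerShell prompt, something like this works (assuming the process name is lyss, as it was in my case):

# Kill the Lync Storage Service worker process; it restarts on its own
Stop-Process -Name lyss -Force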
My most recent adventure involved a scenario where files uploaded or attached to a Lync meeting couldn’t be saved by many of the meeting participants. You could press the Save As or Open buttons, but the progress indicator would just sit at 0% and never move. Some users in each meeting could actually download the content, but the behavior did not follow specific users or meetings. Permissions in the meeting were set to allow anyone to download, so it didn’t appear to be an issue specific to the meeting settings. The Lync QoE report submitted by each client gave me a rather unhelpful message:
A resource was unable to be downloaded via HTTP
Time to do some tracing. Firing up Fiddler allowed me to see the client make a download attempt to the external web services site, but I was getting a 404 Not Found response from the server. Sure enough, I was able to find the same hit from my client in the IIS logs on the Front End server.
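If you want to hunt for the same thing, a quick way to find 404 responses in the IIS logs is a search like this. The log path below is the IIS default and is an assumption; adjust it for your environment:

# Search the IIS logs on the Front End server for 404 responses
Select-String -Path "C:\inetpub\logs\LogFiles\W3SVC*\*.log" -Pattern " 404 " | Select-Object -Last 20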
Odd. IIS was unable to locate the file the Lync client was asking for. IIS serves up these files from the Lync back-end file share, which pointed me in the direction of the DFS share. This was configured to replicate between two file servers, but nothing looked obviously wrong after opening the share folder.
After some more thought I opened the individual shares on each server. Bingo. Each had different, unique content, only one of which matched what I saw in the actual shared Lync namespace! The Front End servers’ connections were being distributed across the different DFS members, and the content was not being replicated on the back end, which created scenarios where a file would appear to be missing.
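You can confirm a mismatch like this by diffing the file lists on each member directly; a rough sketch, where FS1, FS2, and LyncShare are hypothetical server and share names:

# Compare the file names present on each DFS member's local share
$a = Get-ChildItem \\FS1\LyncShare -Recurse -File | Select-Object -ExpandProperty Name
$b = Get-ChildItem \\FS2\LyncShare -Recurse -File | Select-Object -ExpandProperty Name
Compare-Object $a $b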
Digging into the DFS logs allowed me to see the servers had stopped replicating almost 90 days earlier. DFS Replication in Windows Server 2008 R2 and later includes a feature that prevents replication from restarting automatically if the service performed an auto recovery after a network or power outage. So a brief issue a few months back actually caused DFS to stop replicating, and it never resolved itself.
You can switch the automatic recovery back to pre-Server 2008 R2 behavior with this command (from an elevated command prompt) on each DFS node:
wmic /namespace:\\root\microsoftdfs path dfsrmachineconfig set StopReplicationOnAutoRecovery=FALSE
Even so, that change does not magically fix DFS. If a node has been unable to replicate for more than 60 days, DFS will also prevent replication from occurring due to the MaxOfflineTimeInDays parameter. You can view that default value with this command:
wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig get MaxOfflineTimeInDays
And if you need to adjust that value you can use this command on each node:
wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig set MaxOfflineTimeInDays=120
This actually still did not resolve my problem. In the interest of time, I used the following steps to get replication flowing again:
1. Disable the secondary node in the Replication Group through the DFS Management Console.
2. On the secondary node, remove the existing file share to close all connections to it.
3. Copy all of the content from the secondary node to a temporary location.
4. Remove all content from the (now unshared) folder on the secondary node.
5. Share the empty folder on the secondary node.
6. Re-enable the secondary node in the DFS Management Console.
7. Copy the temporarily saved data to the primary node’s file share, skipping any duplicate files.
8. Issue the following command on both nodes to poll Active Directory for an update:
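dfsrdiag pollad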
You should see Event Log entries indicating replication is starting and that an initial sync has completed. After validating replication was working, I was able to successfully download newly uploaded meeting content from any connected client.
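If you’d rather not dig through Event Viewer for those entries, you can pull the most recent events from the DFS Replication log in PowerShell. A simple sketch that just lists the latest entries rather than filtering on specific event IDs:

# Show the latest entries from the DFS Replication event log
Get-WinEvent -LogName "DFS Replication" -MaxEvents 20 | Format-Table TimeCreated, Id, Message -AutoSize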