Troubleshooting Offline Site Servers
Last modified on 15 August 2020 01:57 PM
“HELP! I’m a PaperCut system administrator and our Site Server is unavailable! What could be wrong?”
What symptoms are you experiencing?
There are several symptoms which can occur as a result of an unexpectedly offline Site Server - we’ll address the three main ones in this article, which are:
This is usually the least troublesome issue to fix, and can often be rectified after making some quick checks on some of the basics:
Statuses such as the one above indicate that although the Site Server is running successfully on its host server, the Application Server is unable to verify the Site Server’s current status - perhaps due to network connectivity or database issues.
A few quick things to check before we get started with more complex troubleshooting:
(Also, be aware that PaperCut is constantly being improved, so check out the Known Issues pages for details on fixes relating to Site Servers in recent versions.)
Testing connectivity between the Site Server and PaperCut
Network connectivity between the Site Server and the Application Server would be the next thing to check.
We want to make sure that the Site Server is configured with the correct details of your Application Server, and that the Application Server is contactable from the Site Server. The following test will let you verify that the Site Server is configured correctly, and that the PaperCut server is reachable over the network.
You can do that by:
1. On the Site Server browse to the ‘<app path>\server’ folder
Then, using those server details, open a browser and try navigating to the following URLs (replacing ‘PaperCut Server Hostname’ with the details of your PaperCut Application Server) to make sure the PaperCut Application Server is accessible, the ports are open, and the traffic isn’t being blocked or rerouted by a proxy.
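If a browser isn't available on the Site Server, a small script can perform a similar reachability check. The sketch below is illustrative only: the function name `check_tcp` is hypothetical, and it only confirms that a TCP connection can be opened and how long that took. It does not prove that a proxy is leaving the HTTP traffic untouched.

```python
import socket
import time


def check_tcp(host: str, port: int, timeout: float = 3.0):
    """Try to open a TCP connection to host:port.

    Returns (reachable, elapsed_ms). DNS failures, refused connections
    and timeouts are all reported as reachable=False.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers socket.gaierror (name lookup), ConnectionRefusedError
        # and timeouts, which are all subclasses of OSError.
        reachable = False
    elapsed_ms = (time.monotonic() - start) * 1000
    return reachable, elapsed_ms
```

For example, `check_tcp("papercut-server", 9192)` would test the host and port that appear in the log excerpts in this article. Substitute the hostname and ports used in your own environment.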
Have a look at the network latency during your tests, because a very slow connection between the Site Server and the Application Server may make it impossible for the two servers to successfully connect with each other.
If you believe that network speed might be causing your issue, there are some Config Keys in PaperCut that may help. Any changes to these would ideally be made by a certified PaperCut technician!
system.site.keepalive-interval-secs - default is 3
system.site.register-interval-secs - default is 30
Hints - the ‘register’ value must be greater than the ‘keepalive’ value. Changing the values of these keys needs to be done with care, as it will change the frequency with which the Application Server will check for issues on your Site Servers.
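The rule in the hint above can be written as a small sanity check to run before entering new values. This is only a sketch: `validate_intervals` is a hypothetical helper, not part of PaperCut, and it checks nothing beyond the ‘register greater than keepalive’ rule stated here.

```python
def validate_intervals(keepalive_secs: int, register_secs: int) -> None:
    """Raise ValueError if the proposed values break the stated rule.

    keepalive_secs -> system.site.keepalive-interval-secs (default 3)
    register_secs  -> system.site.register-interval-secs (default 30)
    """
    if keepalive_secs <= 0 or register_secs <= 0:
        raise ValueError("both intervals must be positive")
    if register_secs <= keepalive_secs:
        raise ValueError(
            "register interval must be greater than keepalive interval "
            f"(got register={register_secs}, keepalive={keepalive_secs})"
        )


# The documented defaults satisfy the rule, so this call does not raise.
validate_intervals(keepalive_secs=3, register_secs=30)
```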
Have you made any changes in your environment?
Any changes in your environment can have knock-on effects on other things. If you have made any changes in your environment around the time that the Site Server started experiencing issues, try undoing those changes if possible to see if it resolves the problem.
Some specific activities that might cause issues for your Site Server are:
How’s your database?
If the network all looks OK (and it’s very important to confirm that, as it’s by far the most common cause of Site Server issues), and you haven’t made any changes to your environment at all, then you may have a Site Server database issue. Fortunately the Site Server maintains a cache of the database from the Application Server, and this can be recreated if the Site Server can connect with the Application Server.
This is done using the db-tools functions. Follow the steps outlined below (ideally in conjunction with your PaperCut support provider).
WARNING: Depending on the size of the site, this can take a while, so we only recommend doing it if the site is already down (or out of hours). It completely resets the internal Site Server database and re-syncs all the data from the PaperCut Application Server.
The database will now re-sync all data from the Application Server.
Out of ideas?
By their nature, intermittent issues usually indicate that everything works fine when the environment is in its ‘ideal’ state - so we need to find out what could be causing your environment to be intermittently affected.
Where Site Servers are concerned, two dependencies are often the cause of intermittent Site Server connectivity issues:
Speed issues between the Site Server and the Application server are known to sometimes cause the following error:
“Initial setup fails with a 503 Service Unavailable” error:
Although this displays as an error, this is actually PaperCut doing its job - the network between the servers is slow enough that the Application Server is unable to successfully issue its ‘keep-alive’ prompts to the Site Server, so it treats the Site Server as unresponsive and takes it offline.
If you have a Site Server going offline intermittently, you will see the following records in the logs. You can see that the Site Server keeps going into offline mode (called SLAVE_OFFLINE in the logs) and then switching back online almost immediately (called SLAVE_PROXY in the logs). This is likely because of a slow or ‘flapping’ network connection to the main PaperCut Application server:
2020-01-01 13:51:27,934 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_PROXY to SLAVE_OFFLINE [server-state-monitor]
2020-01-01 13:53:32,037 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_OFFLINE to SLAVE_PROXY [server-state-monitor]
2020-01-01 13:57:16,488 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_PROXY to SLAVE_OFFLINE [server-state-monitor]
2020-01-01 13:59:12,544 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_OFFLINE to SLAVE_PROXY [server-state-monitor]
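To gauge how often a Site Server is flapping, you could scan its log for these state changes and measure each offline period. The following is a minimal sketch that assumes the log line layout shown in the excerpt above. The function name `offline_periods` is hypothetical, and milliseconds are ignored for simplicity.

```python
import re
from datetime import datetime

# Matches lines like:
# 2020-01-01 13:51:27,934 INFO ... Server state changed from A to B [...]
STATE_CHANGE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"Server state changed from (\w+) to (\w+)"
)


def offline_periods(lines):
    """Return (went_offline, came_back, seconds) for each offline period."""
    periods = []
    offline_since = None
    for line in lines:
        # Tolerate dates copied from a web page with en-dashes in them.
        match = STATE_CHANGE.search(line.replace("\u2013", "-"))
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        new_state = match.group(3)
        if new_state == "SLAVE_OFFLINE":
            offline_since = when
        elif new_state == "SLAVE_PROXY" and offline_since is not None:
            seconds = (when - offline_since).total_seconds()
            periods.append((offline_since, when, seconds))
            offline_since = None
    return periods


sample = [
    "2020-01-01 13:51:27,934 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_PROXY to SLAVE_OFFLINE [server-state-monitor]",
    "2020-01-01 13:53:32,037 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_OFFLINE to SLAVE_PROXY [server-state-monitor]",
    "2020-01-01 13:57:16,488 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_PROXY to SLAVE_OFFLINE [server-state-monitor]",
    "2020-01-01 13:59:12,544 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_OFFLINE to SLAVE_PROXY [server-state-monitor]",
]
for start, end, seconds in offline_periods(sample):
    print(f"offline from {start} to {end} ({seconds:.0f}s)")
```

Many short offline periods in quick succession would be consistent with the ‘flapping’ connection described above, whereas a single long period suggests a genuine outage.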
Identifying an issue with network latency
In an ideal scenario, the Site Server will quickly detect when a PaperCut Application server is unavailable and will quickly shift into “offline” mode. This works well when the Site Server has a stable connection to the main PaperCut Application server, but what happens if this connection is unreliable or has a high latency? In this scenario, the Site Server might go into offline mode before it should. If your sites have an inconsistent network connection then the Site Server may be stuck in offline mode for most of the day.
The Site Server log below indicates that the Application Server response took 4391 ms for an HTTPS request made by the Site Server - this indicates possible network speed issues:
2020-01-01 10:04:21,508 DEBUG ProxyBase:199 - Response took 4391ms. POST papercut-server:9192/rpc/api/rest/master/setDeviceServer/404b395f-476a-43cd-86d7-bb8b3e19384f returned a response status of 200 OK (POST - /rpc/api/rest/internal/site/setDeviceServer) [https-102]
Compare that with the Application Server’s own log for the same request, which took just 1248 ms:
2020-01-01 10:04:21,486 DEBUG Jetty:936 - <<< 10.7.1.93 HTTP/1.1 POST /rpc/api/rest/master/setDeviceServer/404b395f-476a-43cd-86d7-bb8b3e19384f => Status: 200, Content-Type: application/json, Took 1248ms [https-101]
The above logs indicate that 3143 ms was lost during the network transfer.
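The same subtraction can be automated when comparing many request pairs. This sketch assumes the two line formats shown above (‘Response took Nms’ on the Site Server, ‘Took Nms’ on the Application Server). The helper name `network_overhead_ms` is hypothetical, and matching the two lines for the same request is left to the reader.

```python
import re

SITE_TOOK = re.compile(r"Response took (\d+)\s*ms")  # Site Server line
APP_TOOK = re.compile(r"Took (\d+)\s*ms")            # Application Server line


def network_overhead_ms(site_line: str, app_line: str) -> int:
    """Time seen by the Site Server minus time spent in the Application Server."""
    site = SITE_TOOK.search(site_line)
    app = APP_TOOK.search(app_line)
    if site is None or app is None:
        raise ValueError("could not find a duration in one of the lines")
    return int(site.group(1)) - int(app.group(1))


site_line = (
    "2020-01-01 10:04:21,508 DEBUG ProxyBase:199 - Response took 4391ms. "
    "POST papercut-server:9192/rpc/api/rest/master/setDeviceServer/... "
    "returned a response status of 200 OK [https-102]"
)
app_line = (
    "2020-01-01 10:04:21,486 DEBUG Jetty:936 - <<< 10.7.1.93 HTTP/1.1 POST "
    "/rpc/api/rest/master/setDeviceServer/... => Status: 200, "
    "Content-Type: application/json, Took 1248ms [https-101]"
)
print(network_overhead_ms(site_line, app_line))  # 4391 - 1248 = 3143
```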
So what can we do about this? Some customers with similar issues have reported some success with increasing the time-out period before the Site Server switches to Offline Mode.
To make this change, follow these steps on BOTH the Application Server and the Site Server.
Application Server Steps
Site Server Steps
However, unless the underlying slow or ‘flapping’ network connection can be addressed, it’s likely that end-users will see other issues like slow client pop-ups, long times to log in to copiers, or a delay before receiving scanned files.
Site Server unable to resolve its own hostname:
Every 30 seconds the Site Server updates its registration with the Application Server.
Before making this registration call the Site Server checks its hostname on the network. If this check fails, the Application Server believes that the Site Server is offline.
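To see whether name resolution on the Site Server is slow or failing, you could time a lookup of the machine's own hostname. This is a sketch under assumptions: the article does not describe how the Site Server performs its check, so a plain lookup via Python's `socket` module is used as a stand-in, and `resolve_timed` is a hypothetical helper name.

```python
import socket
import time


def resolve_timed(hostname: str):
    """Resolve hostname to an IPv4 address.

    Returns (address, elapsed_ms); address is None if the lookup failed.
    """
    start = time.monotonic()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        address = None
    elapsed_ms = (time.monotonic() - start) * 1000
    return address, elapsed_ms


if __name__ == "__main__":
    # Look up this machine's own name, as reported by the operating system.
    name = socket.gethostname()
    address, ms = resolve_timed(name)
    print(f"{name} -> {address} in {ms:.0f}ms")
```

Running this repeatedly over a period of time may help show whether lookups fail or slow down intermittently.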
Site Server hostname check - log examples:
Every now and then there might be some packet drops.
2020-01-01 11:57:42,842 DEBUG ServerStateManagerImpl:360 - Master server keep-alive failed on attempt 1 of 2. Took: 2002ms: java.net.SocketTimeoutException: connect timed out [server-state-monitor]
2020-01-02 12:00:19,988 DEBUG ServerStateManagerImpl:360 - Master server keep-alive failed on attempt 1 of 2. Took: 2001ms: java.net.SocketTimeoutException: connect timed out [server-state-monitor]
If this is your situation, it potentially points to an underlying DNS resolution issue in your environment that needs to be addressed.
A quick fix if you’re facing this problem could be to edit the Windows Hosts file on the server, so that it doesn’t have to query the DNS server across the network for this information. However, it may be better to investigate DNS resolution in your environment more generally: with the Hosts file workaround, if the IP address(es) of the PaperCut server(s) change in the future, you will need to remember to edit the Windows Hosts file again to reflect the new IP address.
Out of ideas?