Troubleshooting Issues

Troubleshoot RNS and database sync issues

The Registry Name Service and Database Synchronization are complex, and many factors can affect how smoothly they run. Every host, including the Primary Registry Name Server, has a Registry Name Service Cache, which it uses to look up the Service Group information that applies to it.

All of the servers that provide these services are implemented using the REST services (clients may or may not have REST services installed, but this should not affect the overall service). One of the first checks is therefore to make sure the relevant REST services are running. Use an appropriate ps(1) command to show the process information:

# ps -ef | egrep "pblight|pbconfig"
root 124824 1 0 12:12 pts/1 00:00:00 /usr/lib/beyondtrust/pb/rest/sbin/pblighttpd-svc -d -i #mon (wait)
root 124825 124824 0 12:12 pts/1 00:00:01 /usr/lib/beyondtrust/pb/rest/sbin/pblighttpd-svc -d -i #sched (sleep)
pblight 124826 124824 0 12:12 pts/1 00:00:00 pblighttpd -D -m /usr/lib/beyondtrust/pb/rest/lib -f /usr/lib/beyondtrust/pb/rest/etc/pblighttpd.conf
root 124827 124826 0 12:12 pts/1 00:00:00 /usr/lib/beyondtrust/pb/rest/sbin/pbconfigd
root 124828 124826 0 12:12 pts/1 00:00:00 /usr/lib/beyondtrust/pb/rest/sbin/pbconfigd

The number of pbconfigd processes running differs according to the host role within the enterprise.

If these services are not running, check the corresponding system logs and the pblighttpd/REST logs for errors, and use the platform-specific method to restart the services.
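The process check above can be scripted. The following is a minimal sketch, assuming the process names shown in the sample ps listing (pblighttpd-svc and pbconfigd) and the standard eight-column `ps -ef` layout. The function name is hypothetical and not part of the product.

```python
import re
from typing import List

# Process names taken from the sample ps listing; adjust for your installation.
EXPECTED = ("pblighttpd-svc", "pbconfigd")


def missing_rest_processes(ps_output: str) -> List[str]:
    """Return the expected process names that do not appear in `ps -ef` output."""
    found = set()
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        command = fields[7]
        for name in EXPECTED:
            # Match the name as a whole executable name, with or without a path.
            if re.search(r"(^|/)" + re.escape(name) + r"(\s|$)", command):
                found.add(name)
    return [name for name in EXPECTED if name not in found]
```

Feeding it the text captured from `ps -ef` returns an empty list when both processes are present. A non-empty list names the services to investigate in the logs.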

Next, check that the host has the correct Service Group information.

Find out which Service Groups the host is a member of on the Primary Registry Name Server:

# pbdbutil -P --svc -L -L pbulprimrns
{
    "hostid": 1,
    "cn": "pbulprimrns",
    "uuid": "3d13a9eb-7340-4199-aa47-1570941bd50f",
    "fqdn": "pbulprimrns.org.com",
    "addrs": [
        {
            "family": 4,
            "addr": "192.168.1.1",
            "port": 24351
        }
    ],
    "tnlzone": 0,
    "updated_usec": "2016-11-11 12:03:47",
    "deleted": false,
    "svcs": [
        {
            "svcgid": 1,
            "hostid": 1,
            "role": "primary",
            "sorder": 1,
            "created_usec": "2016-11-11 12:03:47",
            "updated_usec": "2016-11-11 12:03:47",
            "svcgname": "registry_name_service",
            "svc": "registry",
            "deleted": 0
        },
        {
            "svcgid": 4,
            "hostid": 1,
            "role": "secondary",
            "sorder": 2,
            "created_usec": "2016-11-11 12:04:35",
            "updated_usec": "2016-11-11 12:03:47",
            "svcgname": "dflt_sudopolicy_service",
            "svc": "sudopolicy",
            "deleted": 0
        }
    ]
}

Check the IP addresses and the list of service groups.

Then check on the host to make sure the Registry Name Service Cache shows corresponding information:

# pbdbutil -P --scache -l
{
    "svcgname": "registry_name_service",
    "svc": "registry",
    "sorder": 1,
    "cn": "pbulprimrns",
    "uuid": "3d13a9eb-7340-4199-aa47-1570941bd50f",
    "fqdn": "pbulprimrns.org.com",
    "addrs": [
        {
            "family": 4,
            "addr": "192.168.1.1",
            "port": 24351
        }
    ],
    "role": "primary",
    "lastupdated_usec": "2016-11-11 12:03:47"
}
{
    "svcgname": "dflt_sudopolicy_service",
    "svc": "sudopolicy",
    "sorder": 1,
    "cn": "pbtest",
    "uuid": "12345676789",
    "fqdn": "pbtest",
    "addrs": [
        {
            "family": 4,
            "addr": "192.168.1.5",
            "port": 24351
        }
    ],
    "role": "primary",
    "lastupdated_usec": "2016-11-11 12:04:23"
}
{
    "svcgname": "dflt_sudopolicy_service",
    "svc": "sudopolicy",
    "sorder": 2,
    "cn": "pbulprimrns",
    "uuid": "3d13a9eb-7340-4199-aa47-1570941bd50f",
    "fqdn": "pbulprimrns.org.com",
    "addrs": [
        {
            "family": 4,
            "addr": "192.168.1.1",
            "port": 24351
        }
    ],
    "role": "secondary",
    "lastupdated_usec": "2016-11-11 12:04:35"
}
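Comparing the two listings by eye is error-prone, so the comparison can be scripted. The following is a rough sketch, assuming the cache listing is a series of JSON objects printed back to back, as in the sample output, and that the host record from the primary is a single JSON object. The function names are hypothetical. Only the fields visible in the samples (cn, svcgname, role, addrs) are used.

```python
import json
from typing import List


def parse_json_stream(text: str) -> List[dict]:
    """Parse JSON objects that are printed one after another."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


def cache_mismatches(primary_record: dict, cache_entries: List[dict]) -> List[str]:
    """Compare a host record from the primary with that host's cache entries."""
    cn = primary_record["cn"]
    own_entries = [e for e in cache_entries if e.get("cn") == cn]
    expected = {(s["svcgname"], s["role"]) for s in primary_record.get("svcs", [])}
    cached = {(e["svcgname"], e["role"]) for e in own_entries}
    problems = []
    for name, role in sorted(expected - cached):
        problems.append(f"missing from cache: {name} ({role})")
    for name, role in sorted(cached - expected):
        problems.append(f"not on primary: {name} ({role})")
    # The cached addresses for the host should match the primary's record.
    primary_addrs = {(a["addr"], a["port"]) for a in primary_record.get("addrs", [])}
    for entry in own_entries:
        entry_addrs = {(a["addr"], a["port"]) for a in entry.get("addrs", [])}
        if entry_addrs != primary_addrs:
            problems.append(f"address mismatch in {entry['svcgname']}")
    return problems
```

An empty result means the cache agrees with the primary for that host. Any returned message points at the Service Group to inspect.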

If this information differs, and the lastupdated times are significantly different, refresh the cache:

# pbdbutil --scache -R

If this produces an error, then re-initialize the Registry Name Service Cache on the host using:

# pbdbutil --scache -N '{"hostname": "<primaryRNS>", "appid": "<appid>", "appkey": "<appkeys>"}' --force

This clears the existing Registry Name Service Cache database and reloads it from the Primary Registry Name Server.
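The JSON argument to the re-initialization command must be well formed, and quoting mistakes are easy to make when typing it by hand. The following sketch builds the command line programmatically. The key names (hostname, appid, appkey) come from the command above. The function name and the example values are hypothetical.

```python
import json
import shlex


def scache_init_command(primary_rns: str, appid: str, appkey: str) -> str:
    """Build the cache re-initialization command with a valid JSON argument."""
    payload = json.dumps({"hostname": primary_rns, "appid": appid, "appkey": appkey})
    # shlex.quote wraps the JSON so that the shell passes it through unchanged.
    return "pbdbutil --scache -N " + shlex.quote(payload) + " --force"
```

For example, scache_init_command("pbulprimrns", "myapp", "mykey") yields a single command string that can be pasted into a root shell on the affected host.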

If databases are not synchronizing correctly across the servers within a Service Group, first identify the primary server within that Service Group:

# pbdbutil -P --svc -g '{ "primary" : "registry_name_service" }'
{
    "svcgid": 1,
    "svcgname": "registry_name_service",
    "svc": "registry",
    "updated_usec": "2016-11-11 12:03:47",
    "deleted": false,
    "hostid": 1,
    "role": "primary",
    "sorder": 1,
    "created_usec": "2016-11-11 12:03:47",
    "cn": "pbulprimrns",
    "uuid": "3d13a9eb-7340-4199-aa47-1570941bd50f",
    "fqdn": "pbulprimrns.org.com",
    "addrs": [
        {
            "family": 4,
            "addr": "192.168.1.1",
            "port": 24351
        }
    ],
    "tnlzone": 0
}

Next, on that host, list the contents of the Database Synchronization Summary Database:

# pbdbutil --dbsync -l
{"cn":"pbtest","svc":"registry","dbid":1,"dbname":"/etc/pb.db","dbuuid":"9b332c26-b8bb-4546-9ed2-bf93146dd08c","lastupdated":"2016-11-11 15:10:44","lasttid":0}
{"cn":"pbtest","svc":"registry","dbid":2,"dbname":"/opt/pbul/dbs/pbrstkeys.db","dbuuid":"a22e8a75-fc6c-4f0e-aee9-b0d764c7e820","lastupdated":"2016-11-11 15:10:44","lasttid":0}
{"cn":"pbtest","svc":"registry","dbid":257,"dbname":"/opt/pbul/dbs/pbsvc.db","dbuuid":"97423b4f-2c5e-42c2-a87a-aeeaaa826c5b","lastupdated":"2016-11-11 15:12:14","lasttid":40}
{"cn":"pbtest","svc":"registry","dbid":258,"dbname":"/opt/pbul/dbs/pbregclnt.db","dbuuid":"f08b8679-8bcb-4f7d-b431-1bbab688e0c1","lastupdated":"2016-11-11 15:12:14","lasttid":0}

Then, for each individual database, check for outstanding transactions:

# pbdbutil --dbsync -l /opt/pbul/dbs/pbsvc.db
{"path":"/opt/pbul/dbs/pbsvc.db","sz":0}
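When several databases need checking, the per-database output can be collected and scanned in one pass. The sketch below assumes one JSON object per line, as in the sample, and assumes that a non-zero "sz" value indicates outstanding transactions. That interpretation is not confirmed by this guide, so verify it against your own systems. The function name is hypothetical.

```python
import json
from typing import List


def databases_with_backlog(per_db_output: str) -> List[str]:
    """Return database paths whose reported "sz" value is greater than zero."""
    pending = []
    for line in per_db_output.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: "sz" reflects the amount of unsynchronized data.
        if record.get("sz", 0) > 0:
            pending.append(record["path"])
    return pending
```

Databases returned by this function are the candidates for the reset and resynchronization step.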

If necessary, reset the Summary Database information and force a resynchronization:

# pbdbutil --dbsync -R registry

Finally, check that the Scheduling Service is running, and has recently run. This service provides the regular Registry Name Service Cache and Database Synchronization updates:

# pbdbutil --info --sched
{"id":"svccache_update","grp":"system","epoch":"2016-11-11 14:36:57","reoccurs":110,"retry":0,"backoff":0,"retried":0}
{"id":"database_sync","grp":"system","epoch":"2016-11-11 14:35:37","reoccurs":30,"retry":0,"backoff":0,"retried":0}
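Whether a task "has recently run" can be checked by comparing its timestamp with the current time. The following sketch assumes that the "epoch" field holds the time of the task's most recent scheduling event in local time, which this guide does not state explicitly. The acceptable age is left to the caller because the unit of "reoccurs" is not documented here. The function name is hypothetical.

```python
import json
from datetime import datetime, timedelta
from typing import List


def stale_tasks(sched_output: str, now: datetime, max_age: timedelta) -> List[str]:
    """Return ids of scheduled tasks whose "epoch" is older than max_age."""
    stale = []
    for line in sched_output.splitlines():
        line = line.strip()
        if not line:
            continue
        task = json.loads(line)
        # Timestamp format as shown in the sample output.
        stamp = datetime.strptime(task["epoch"], "%Y-%m-%d %H:%M:%S")
        if now - stamp > max_age:
            stale.append(task["id"])
    return stale
```

Calling it with now=datetime.now() and a generous max_age, such as several scheduling intervals, highlights tasks that appear to have stopped.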

Message router troubleshooting

The Message Router provides a method of consuming and storing large volumes of different types of messages from various Endpoint Privilege Management services. For example, the pbmasterd and pblogd services use it to write event data to the event log and to other integrated products such as BeyondInsight and Solr. This improves performance and reduces load when systems are busy, but it also increases the complexity of the overall system. While the service is running smoothly, no maintenance is required. However, if there is a problem with the service, other vital services may be affected.

The pbadmin command is used to monitor normal statistics for the Message Router.

Example

This example output is typical, and shows the Message Router is running and is receiving requests.

# pbadmin -P --info --msgs
{
  "pid": 8486,                             /* Process ID of the Message Router */
  "num_svcs": 2,                           /* The number of services it provides */
  "curr_clnts": 0,                         /* The number of currently connected clients */
  "tot_clnts": 3487,                       /* The total number of clients connected this session */
  "total_msgs": 10635,                     /* The total number of messages routed */
  "svcs": [                                /* Per service statistics */
  {
    "name": "session authenticate",        /* The name of the service */
    "last_active": "2018-10-25 10:13:14",  /* Last event seen by that service */
    "restarts": 1,                         /* The number of process restarts */
    "pid": 8487,                           /* The process ID of the Service Router */
    "queue_len": 0,                        /* The current number of messages in the queue */
    "maxq_len": 200,                       /* The maximum number of entries in the queue */
    "replies": 0,                          /* (not applicable for authentication service) */
    "maxq_sz": 0,                          /* The maximum queue size in bytes */
    "requests": 3487                       /* The number of messages routed by this service */
  },
  {
    "name": "event log",                   /* The name of the service */
    "last_active": "2018-10-25 10:13:01",  /* Last event seen by that service */
    "restarts": 1,                         /* The number of process restarts */
    "pid": 8488,                           /* The process ID of the Service Router */
    "queue_len": 0,                        /* The current number of messages in the queue */
    "maxq_len": 200,                       /* The maximum number of entries in the queue */
    "replies": 0,                          /* The number of replies to services */
    "maxq_sz": 102408,                     /* The maximum queue size in bytes */
    "requests": 0                          /* The number of messages routed by this service */
  } ]
}

If the error "6101.53 Error accessing Message Router statistics - No such file or directory" is displayed, the Message Router is not running and should be restarted to resume normal operations.
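The statistics can also be monitored by a script. The sketch below assumes that the real command output is plain JSON without the explanatory /* ... */ annotations shown in the example, and it flags any service whose queue is close to its maximum length. The 80% threshold and the function name are illustrative choices, not product defaults.

```python
import json
from typing import List


def router_warnings(stats_json: str, fill_ratio: float = 0.8) -> List[str]:
    """Flag services whose queue_len is at or above fill_ratio of maxq_len."""
    stats = json.loads(stats_json)
    warnings = []
    for svc in stats.get("svcs", []):
        limit = svc.get("maxq_len", 0)
        # Skip services that report no queue limit.
        if limit and svc.get("queue_len", 0) >= fill_ratio * limit:
            warnings.append(f'{svc["name"]}: queue {svc["queue_len"]}/{limit}')
    return warnings
```

An empty list indicates that no queue is near its limit at the time of the check.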

ℹ️
Note
The restservice <yes|no> setting in pb.settings can be configured so that pblighttpd-svc does not start the REST service but still allows the Message Router to run.

If the Message Router is not running and messages need to be routed to the various services, the requests are queued in the Message Router directory (normally /opt/pbul/msgrouter) and are similar to those in the example:

Example

# ls -l /opt/pbul/msgrouter/
total 100
-rwx------ 1 root root  5482 Oct 25 10:26 wq_0004
-rwx------ 1 root root 10251 Oct 25 10:26 wq_0255
-rwx------ 1 root root 10251 Oct 25 10:26 wq_0501
-rwx------ 1 root root  5482 Oct 25 10:26 wq_0550
-rwx------ 1 root root 10251 Oct 25 10:26 wq_0684
-rwx------ 1 root root  5482 Oct 25 10:26 wq_0755
-rwx------ 1 root root 10251 Oct 25 10:26 wq_0785
-rwx------ 1 root root  5482 Oct 25 10:26 wq_0858
-rwx------ 1 root root  5482 Oct 25 10:26 wq_0869
-rwx------ 1 root root 10251 Oct 25 10:26 wq_0912

These Message Router Write Queue files are used for temporary storage when the Message Router is unavailable or under severe load. Once the Message Router is available again, it consumes these files and stores the data in the appropriate databases.
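A growing number of Write Queue files suggests that the Message Router is down or overloaded. The following sketch counts the files and their total size. It assumes the wq_ file name prefix seen in the listing above and the default directory mentioned earlier. The function name is hypothetical.

```python
from pathlib import Path
from typing import Tuple


def write_queue_backlog(directory: str = "/opt/pbul/msgrouter") -> Tuple[int, int]:
    """Return (number of wq_* files, their total size in bytes)."""
    files = [p for p in Path(directory).glob("wq_*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)
```

Running this periodically and alerting when the count keeps rising gives early notice that queued data is not being consumed.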

ℹ️
Note
This presents a significant difference from previous EPM-UL functionality in that event logs and other integrated products are not updated while the Message Router is unavailable. The data is stored securely in the Message Router Write Queues until it can be processed in the normal manner when the Message Router is available again.

The Message Router logs all errors in the REST log, in the specified directory, so we recommend monitoring this log regularly. One of the warnings that may be displayed is "WARNING: Out of free slots for the Message Router. Consider increasing 'messagerouterqueuesize' to avoid slowdown."

This warning is not critical. It indicates that the Message Router is under increased load and would run faster if the setting were increased. If your system displays this warning regularly and often experiences heavy load, we recommend that you increase the messagerouterqueuesize setting in increments of roughly 25%. Note that this uses more memory on the system, because larger shared memory queues are used.
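The 25% increments can be worked out as follows. This is simple arithmetic only. The starting value used in the usage note is hypothetical and is not a documented default.

```python
import math


def next_queue_size(current: int, step: float = 0.25) -> int:
    """Return the current size increased by `step`, rounded up to a whole number."""
    return math.ceil(current * (1 + step))
```

Starting from a hypothetical value of 200, successive increases give 250 and then 313. Apply one step at a time and watch the REST log before increasing again.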

ℹ️
Note
pbconfigd has a --call option. This option takes a JSON string parameter and processes it as if the call had been made over REST. This allows the specific calls required for licensing and Message Router queuing to be made.

Troubleshoot BeyondInsight for Unix & Linux

Application logs

Application logs are available. The location differs based on the operating system:

For systemd machines, run journalctl -u pbsmc.
For SysV or Upstart machines, the log is located in /var/log/pbsmc.log.
For Windows machines, the log is located in ProgramFiles (x86)\PBSMC\pbsmc.log.

Common error messages

Hosts section displays credential error when selecting actions

If there are no credentials stored and an action is chosen requiring authentication, an error is displayed.

Oops, No Products Found displayed on Management page

BeyondInsight for Unix & Linux cannot locate either the Endpoint Privilege Management for Unix and Linux or BeyondInsight for Unix & Linux software to deploy.

Unable to install EPM-UL and AD Bridge

ℹ️
Note
For more information, see the Tasks page.

Discover does not locate a host

Verify that the host is available, reachable over the network, and accepting connections on an SSH-enabled port.

Unable to connect to EPM-UL using REST

ℹ️
Note
For more information, see the Tasks page. In most cases, the port is not available. Check the REST port on the Host Details page, and verify your firewall is accepting connections.
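A quick way to confirm that a port is reachable from the BeyondInsight for Unix & Linux server is to attempt a TCP connection. The following sketch uses only the Python standard library. The function name is hypothetical, and the port to test is whichever REST or SSH port is shown on the Host Details page.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A False result points to a closed port, a stopped service, or a firewall between the two systems. It does not distinguish between these causes.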

Troubleshoot Password Safe issues

Certificates

Password Safe is installed with a self-signed certificate. If this is not replaced with a certificate from a trusted issuer, the certificate should be added to the system certificate store on the BeyondInsight for Unix & Linux server so that it is trusted. The following provides high-level steps for importing certificates.

Copy the public certificate from the Password Safe server to the BeyondInsight for Unix & Linux server. This should be a .crt file.
Install the .crt file to the system key store. The process is different depending on the operating system.

macOS

Open Keychain Access, and drag the .crt file into the System node.
Double-click to open and expand the Trust leaf.
Select Always Trust.

Windows

Click Start and type MMC.
From the File menu, select Add/Remove Snap-In > Certificates > Add.
Select Computer Account, and click Next.
Select Local Computer.
After the snap-in is added, expand Certificates and right-click Trusted Root Certification Authorities.
Select All Tasks > Import and add the .crt file.

CentOS and Red Hat Linux

If not available, install ca-certificates

yum install ca-certificates

Enable dynamic configuration

update-ca-trust force-enable

Copy the .crt

cp <cert.crt> /etc/pki/ca-trust/source/anchors/

Update the trusted list

update-ca-trust extract

Debian and Ubuntu

Copy the .crt file

cp <cert.crt> /usr/local/share/ca-certificates

Update the cert list

sudo update-ca-certificates

Refresh the cert list

sudo update-ca-certificates --fresh

ℹ️
Note
For more in-depth system information, see the appropriate operating system documentation.

Troubleshoot RNS deployment in BIUL

This section serves as a reference for some of the more common issues users might experience.

Primaries

Registry Name Service (RNS) synchronizes data every two minutes by default. As a result, changes on the primaries aren’t propagated to secondaries immediately.

Policy mode

Policy mode is not automatically synchronized. Changing from role-based policy (RBP) to script, or vice versa, does not replicate. When the mode changes on the primary, we recommend manually toggling the mode on the secondaries via Settings & Configuration.

Domain names

We strongly recommend using fully qualified domain names with RNS to reduce complexity of name resolution.
