Failover

Configure failover to provide backup user request processing and activity logging.

Configure backup request processing

To configure backup request processing, specify the backup policy server hosts by using the submitmasters keyword in /etc/pb.settings on the submit host. Hosts are tried in the order given in the submitmasters list, in the order specified by DNS, or in random order if the randomizesubmitmasters keyword is set.

Failover for submitmasters means that if a host doesn’t respond within masterdelay milliseconds, a second connection is attempted to the next host in the (possibly randomized) list. Once a connection is made, the EPM-UL protocol is negotiated. Those negotiations include SSL/Kerberos/networkencryption and other protocol information.

If those negotiations succeed, that connection is used for request approval and further failover does not happen.
If the connection is not answered, or the negotiations do not succeed, more hosts are tried in succession until one is successful or the list is exhausted.

Connections are attempted in order. However, depending on the masterdelay value and on how fast or busy the network and hosts are, the connection that finally succeeds may be to a server that appears later in the list.
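The selection logic described above can be sketched in a few lines of Python. This is an illustrative simplification, not product code: the function name, the negotiate callback, and the host names are hypothetical, and the loop tries hosts strictly one after another, whereas the real client can have several connection attempts in flight at once.

```python
import random
from typing import Callable, List, Optional


def pick_policy_server(
    submitmasters: List[str],
    negotiate: Callable[[str], bool],
    randomize: bool = False,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the first host that connects and negotiates, or None."""
    hosts = list(submitmasters)
    if randomize:
        # Stands in for the effect of randomizesubmitmasters.
        (rng or random).shuffle(hosts)
    for host in hosts:
        # negotiate() is a hypothetical stand-in for "connect, then
        # complete the protocol negotiation"; True means success.
        if negotiate(host):
            return host  # no further failover once negotiation succeeds
    return None  # list exhausted


# Usage: only the second (hypothetical) host answers.
reachable = {"pbmaster2.example.com"}
chosen = pick_policy_server(
    ["pbmaster1.example.com", "pbmaster2.example.com", "pbmaster3.example.com"],
    lambda host: host in reachable,
)
```

With the host list above, the sketch skips the first host and settles on the second; the third is never contacted.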

Configure backup logging

To configure backup logging, specify the backup log hosts by using the logservers keyword in /etc/pb.settings on the policy server host.

EPM-UL v22.3 introduces a second level of failover for logservers. It takes place when the protocol negotiations succeed but an error then occurs (for example, the disk is full, or the log file or directory is insecure). This mechanism tries the listed logservers in succession until a logserver reports that it has successfully logged the event. When randomizelogservers is used, only the first logserver attempt is randomized. Diagnostic messages indicating the failed servers are logged. The transparentfailover keyword controls whether the end user sees the diagnostic messages.
Versions before 22.3 behave like the policy server connection: the relevant keywords are logservers, logserverdelay, and randomizelogservers, and connection attempts are made until the protocol negotiations succeed.
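As a rough illustration of the second-level failover, the following Python sketch walks a log server list until one server reports a successful write. The function name, the write_event callback, and the server names are hypothetical. The sketch also assumes that, when the first attempt is randomized, the remaining servers are tried in their listed order; the text above only states that the first attempt is randomized.

```python
import random
from typing import Callable, List, Optional, Tuple


def log_event(
    logservers: List[str],
    write_event: Callable[[str], bool],
    randomize_first: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], List[str]]:
    """Try log servers until one reports success.

    Returns (server_that_logged_or_None, servers_that_failed).
    """
    order = list(logservers)
    if randomize_first and order:
        # Assumption: only the starting server is random; the rest
        # keep their position in the list.
        first = (rng or random).choice(order)
        order.remove(first)
        order.insert(0, first)
    failed: List[str] = []
    for server in order:
        # write_event() is a hypothetical stand-in for "negotiate, then
        # log"; False models errors such as a full disk.
        if write_event(server):
            return server, failed
        failed.append(server)  # would be reported in a diagnostic message
    return None, failed


# Usage: the first two (hypothetical) servers fail to write the event.
logged_on, failed_servers = log_event(
    ["log1.example.com", "log2.example.com", "log3.example.com"],
    lambda server: server == "log3.example.com",
)
```

The list of failed servers corresponds to the diagnostic messages mentioned above.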

ℹ️
For more information on failover, see submitmasters and logservers.

Fine tuning policy server and failover connection timing

masterdelay

Version 4.0.0 and later: masterdelay setting available.

When a request is submitted, the policy server hosts that are listed in the submitmasters line are tried in the order they appear, from left to right. The masterdelay setting enables the administrator to adjust the amount of time between failover attempts.

Without a specified time-out, the client tries the first policy server host on the submitmasters line. If it does not receive a response within 500 milliseconds, then the client adds the second policy server host. If neither responds in the next 500 milliseconds, then the client adds the third policy server host, and so on. By specifying a masterdelay, you can change the 500 millisecond waiting period before the client goes on to the next policy server host.
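The staggered schedule described above can be worked through numerically. The following Python sketch is only an illustration of the arithmetic; the function name and host names are hypothetical.

```python
from typing import Dict, List


def connection_start_offsets(
    submitmasters: List[str], masterdelay_ms: int = 500
) -> Dict[str, int]:
    """Milliseconds after the request at which each host is first tried.

    The first host is tried immediately; each later host is added one
    masterdelay after the previous one, provided nothing has answered.
    """
    return {
        host: index * masterdelay_ms
        for index, host in enumerate(submitmasters)
    }


hosts = ["pbm1.example.com", "pbm2.example.com", "pbm3.example.com"]
default_schedule = connection_start_offsets(hosts)     # 0, 500, 1000
faster_schedule = connection_start_offsets(hosts, 200)  # 0, 200, 400
```

With a masterdelay of 0, every offset is 0, meaning all hosts are contacted at once.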

If each connection fails to connect within the masterdelay number of milliseconds, then the connect code uses select() to wait for any of the connections to complete. The amount of time allowed for this wait depends on whether the client is a standard client or a cached client.

Standard client: The total time allowed for the select() is masterdelay*5.
Cached client: The time allowed is masterdelay*3. The availability of the cached client means that a backup method is in place to prevent the client from failing; therefore a smaller wait for the select() following failure of individual connection attempts is reasonable.
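The two multipliers above translate into concrete wait windows as follows. This Python sketch only restates the arithmetic; the function name is hypothetical.

```python
def final_select_wait_ms(masterdelay_ms: int = 500, cached_client: bool = False) -> int:
    """Time allowed for the final select() wait, in milliseconds."""
    multiplier = 3 if cached_client else 5  # masterdelay*3 vs masterdelay*5
    return masterdelay_ms * multiplier


standard_wait = final_select_wait_ms(500)                    # 2500 ms
cached_wait = final_select_wait_ms(500, cached_client=True)  # 1500 ms
```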

With a masterdelay of 0 milliseconds, you get the fastest possible connection, but the policy server you connect to may not be predictable. You might also increase network traffic, depending on the number of connections that are opened.

With a larger masterdelay, you can increase the predictability, but you might also increase the time needed to form a failover connection. The longer the delay, the more predictable the sequence is.

Example

masterdelay 200

Default

masterdelay 500

Used on

Submit hosts

masterprotocoltimeout

Version 4.0.0 and later: masterprotocoltimeout setting available.

After a connection is established, the programs perform some protocol checks to verify a proper and working connection. Some types of protocol failures could take a long time to determine (for example, wrong service running on the policy server port, or mismatched encryption types/keys).

The masterprotocoltimeout setting enables the administrator to control the maximum time to wait for protocol completion. If a protocol step does not complete within the specified number of milliseconds, then the client continues to try the next policy server host in sequence. A value of -1 indicates no protocol timeout.
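A program that applies such a setting has to treat -1 differently from a millisecond count. The following Python sketch shows one way to do that conversion; the function name is hypothetical, and rejecting other negative values is an assumption, since only -1 is described above.

```python
from typing import Optional


def protocol_timeout_seconds(timeout_ms: int = 500) -> Optional[float]:
    """Convert a protocol timeout in milliseconds to seconds.

    Returns None for -1, meaning "no protocol timeout".
    """
    if timeout_ms == -1:
        return None
    if timeout_ms < 0:
        # Assumption: negative values other than -1 are not meaningful.
        raise ValueError("timeout must be -1 or a non-negative number of milliseconds")
    return timeout_ms / 1000.0


example_timeout = protocol_timeout_seconds(2000)  # 2.0 seconds
no_timeout = protocol_timeout_seconds(-1)         # None
```

In Python, a value of None passed to socket.settimeout() puts the socket in blocking mode, which corresponds to waiting without a protocol timeout.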

Example

masterprotocoltimeout 2000

Default

masterprotocoltimeout 500

Used on

Policy server hosts
Run hosts
Submit hosts

Fine tuning log servers and failover connection timing

logserverdelay

Version 4.0.0 and later: logserverdelay setting available.

When a log request is processed, the log servers that are listed in the logservers line are tried in the order they appear, from left to right. The logserverdelay setting enables the administrator to adjust the amount of time between failover attempts.

Without a specified time-out, the logging program (for example, pbrun, pbmasterd, or pblocald) tries the first log server on the logservers line. If it does not receive a response within 500 milliseconds, then it adds the second log server. If neither responds in the next 500 milliseconds, then it adds the third log server, and so on. By specifying a logserverdelay, you can change the 500 millisecond waiting period before the logging program goes on to the next log server.

With a logserverdelay of 0 milliseconds, you get the fastest possible connection, but the log server that you connect to may not be predictable. You might also increase network traffic, depending on the number of connections that are opened.

With a larger logserverdelay you can increase the predictability, but you might also increase the time needed to form a failover connection. The longer the delay, the more predictable the sequence is.

Example

logserverdelay 2500

Default

logserverdelay 500

Used on

Policy server hosts
Run hosts
Submit hosts (by pbksh and pbsh when a policy server is not available)

logserverprotocoltimeout

Version 4.0.0 and later: logserverprotocoltimeout setting available.

After a connection is established, the programs perform some protocol checks to verify a proper and working connection. Some types of protocol failures can take a very long time to determine (for example, the wrong service running on the log server port, or mismatched encryption types/keys).

The logserverprotocoltimeout setting enables the administrator to control the maximum time to wait for protocol completion. If a protocol step does not complete within the specified number of milliseconds, then the logging program continues to try the next log server in sequence. A value of -1 indicates no protocol timeout.

If the iologack setting is used, then the logserverprotocoltimeout setting also controls how long a submit host should wait for an acknowledgment from the log host.

Example

logserverprotocoltimeout 2000

Default

logserverprotocoltimeout 500

Used on

Log hosts
Policy server hosts
Run hosts
Submit hosts (by pbksh and pbsh when a policy server host is not available)

ℹ️
For more information, see iologack.

randomizelogservers

Version 9.2.0 and earlier: randomizelogservers setting not available.
Version 9.3.0 and later: randomizelogservers setting available.

The randomizelogservers setting forces the policy server/submit host/run host to choose a log server host at random, rather than choosing the first available log server host that is specified in the logservers setting. This feature balances the load among multiple log server hosts.

ℹ️
The use of randomizelogservers can cause accept and finish events to be located on different log servers if the log servers are configured with eventdestinations set to a flat file (authevt=) or an SQLite Database (authevt=db).
However, if eventdestinations is set to authevt= (same ODBC Oracle or MySQL database on all the log servers), then the accept and finish events are stored on the same Oracle or MySQL server. The default randomizelogservers setting is no.

ℹ️
Do not use the randomizelogservers keyword together with DNS SRV lookups. The randomizelogservers keyword can result in accept and finish events being logged on different logservers, which makes it necessary to merge iologs.

Example

randomizelogservers yes

Default

randomizelogservers no

Used on

Submit hosts
Run hosts
Policy servers

Acknowledge failovers

transparentfailover

Version 5.1.1 and earlier: transparentfailover setting not available.
Version 5.1.2 and later: transparentfailover setting available.

A failover occurs when an initial connection to a policy server or logserver host has failed and the program fails over to another available policy server or logserver host in the list. To let the user know that a failover has occurred, error messages from the failed connection can be displayed to the user.

The transparentfailover setting enables you to suppress the following failover error messages:

Any Kerberos initialization error
3084 initMangle failure during startup
3089 Could not send initial protocol header to Policy Server
3090 Did not receive initial protocol header from Policy Server
8534 Policy Server on %s is not SSL enabled
1913 Invalid Policy Server daemon on Policy Server host %s

When transparentfailover is set to yes, the failover error messages listed above are suppressed. To display failover error messages, set transparentfailover to no in the pb.settings file.
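The suppression rule can be summarized as a small decision function. The following Python sketch is illustrative only: the function and constant names are hypothetical, and the only message numbers it uses are the ones listed above.

```python
from typing import Optional

# Message numbers listed above as suppressible failover errors.
SUPPRESSIBLE_CODES = frozenset({3084, 3089, 3090, 8534, 1913})


def should_display(
    code: Optional[int],
    transparentfailover: bool = True,
    kerberos_init_error: bool = False,
) -> bool:
    """Decide whether a failover error message is shown to the user.

    kerberos_init_error marks any Kerberos initialization error, which
    is suppressible regardless of its message number.
    """
    suppressible = kerberos_init_error or code in SUPPRESSIBLE_CODES
    return not (transparentfailover and suppressible)


hidden_by_default = should_display(3089)                         # False
shown_when_disabled = should_display(3089, transparentfailover=False)  # True
```

Messages that are not in the list are displayed regardless of the setting.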

Example

transparentfailover yes

Default

transparentfailover yes

Used on

Submit hosts