Wednesday, December 22, 2010

Receiving SNMP Traps in Nagios

SNMP traps are alerts and notifications generated by SNMP-enabled devices. A trap contains information about the status of, or an event on, an SNMP-enabled device. For example, an authentication event or a change in the status of an interface on a router may generate an SNMP trap that is sent to a management station such as HP OpenView, CiscoWorks, or Nagios.

Pre-requisites:

1. Net-SNMP with snmptrapd configured.
2. SNMPTT, SNMP trap translator.
3. Nagios.
4. MIB definition files for the equipment or software you need to monitor.

Installing Net-SNMP packages:

The Net-SNMP package is available as a series of installable packages on many distributions. Indeed, it may already be installed on your system, or you may be able to install it via your distribution’s package management system, such as yum, apt, or the like. On Red Hat, SuSE, Debian, and Mandrake distributions, the required packages are called net-snmp, net-snmp-libs, and net-snmp-utils.

Installing Net-SNMP packages on CentOS 5.5

# yum install net-snmp net-snmp-libs net-snmp-utils net-snmp-perl perl-Net-SNMP net-snmp-devel

Configuring and Running the snmptrapd Daemon
When incoming traps are received by the snmptrapd daemon, they are passed to the SNMPTT tool. SNMPTT then tries to match each incoming trap against the collection of trap definitions that it has translated. If the trap matches, SNMPTT checks whether the translated trap definition contains logic to output it to Nagios and, if so, executes that logic. The trap is then output to Nagios as a passive check result.

On CentOS 5.5

# vi /etc/sysconfig/snmptrapd.options
#OPTIONS="-On -Lsd -c /etc/snmp/snmptrapd.conf -p /var/run/snmptrapd.pid"
OPTIONS="-On -Lsd -p /var/run/snmptrapd.pid"

Make sure to remove the -c /etc/snmp/snmptrapd.conf part; otherwise you will receive each trap twice, because snmptrapd is already compiled with /etc/snmp/snmptrapd.conf as its default configuration file path.

As the SNMP Trap Translator documentation puts it: "The -On is recommended. This will make snmptrapd pass OIDs in numeric form and prevent SNMPTT from having to translate the symbolic name to numerical form."

# vi /etc/snmp/snmptrapd.conf
traphandle default /usr/sbin/snmptthandler
disableAuthorization yes
#donotlogtraps  yes

The traphandle directive tells the snmptrapd daemon how to handle incoming traps and where to send them. The default keyword makes this the handler for all incoming traps, so every trap is sent to the snmptthandler script located in the /usr/sbin directory. The "disableAuthorization yes" line tells snmptrapd to accept SNMP traps from all sources. You can configure it to authenticate senders instead; for details, refer to the snmptrapd.conf manual page.

Installing SNMPTT (SNMP Trap Translator)
You can get the SNMPTT tool from SourceForge at http://snmptt.sourceforge.net/. Download snmptt_1.3.tgz, which is the latest stable release at the time of writing, and unpack it:

tar -zxvf snmptt_1.3.tgz

The SNMPTT package has no installation script, so a number of manual installation steps need to take place. First, copy the SNMPTT binaries to a suitable directory and mark them as executable. I recommend using the /usr/sbin directory:

# cp snmptt snmptthandler /usr/sbin/
# chmod +x /usr/sbin/snmptt /usr/sbin/snmptthandler

I specified the snmptthandler binary as the value of the traphandle option in the snmptrapd.conf configuration file in the previous section. When a trap is received, this binary is executed by default and the trap is passed on to the snmptt daemon.

Next, copy the SNMPTT configuration file, snmptt.ini, to the /etc/snmp directory, and the snmpttconvertmib utility to /usr/sbin:

# cp snmptt.ini /etc/snmp/
# cp snmpttconvertmib /usr/sbin/

The SNMPTT daemon also needs a user and group to run as:

# groupadd snmptt
# adduser -g snmptt snmptt

# chown snmptt:snmptt /etc/snmp/snmptt.ini

The SNMPTT tool also needs a spool directory to hold the incoming traps. I usually use the default directory, /var/spool/snmptt. It needs to be owned by the user and group that will run SNMPTT. Create the directory and change its ownership like so:

# mkdir /var/spool/snmptt
# chown snmptt:snmptt /var/spool/snmptt

Finally, to start the SNMPTT tool, you can either execute it from the command line or use the init script provided with the package. The following line starts SNMPTT in daemon mode:

# /usr/sbin/snmptt -daemon

Alternatively, copy the init script provided with the package; you can then add it to your startup process.

# cp snmptt-init.d /etc/init.d/snmptt

To start, stop, or reload SNMPTT, run the init script with the corresponding argument:

# /etc/init.d/snmptt {start|stop|reload}

Configuring SNMPTT
The first step is to configure the /etc/snmp/snmptt.ini file. The file contains quite a large number of directives, but I’ll only look at those relevant to the process of translating and transmitting the received traps to Nagios.

mode = daemon
daemon_fork = 1
daemon_uid = snmptt
spool_directory = /var/spool/snmptt/
sleep = 5
dns_enable = 1
strip_domain = 1
log_enable = 1
syslog_enable = 0
exec_enable = 1
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf
END


The sample snmptt.ini file contained in the SNMPTT package has detailed explanations of all the directives and options that you can specify. I recommend reading this file for further information about SNMPTT’s configuration options.

Tip:  If you enable DNS resolution, I recommend you add all the hostnames that need to be resolved to the local /etc/hosts file on your host server. This keeps your DNS server from becoming a bottleneck and keeps SNMPTT functioning if the DNS server is unavailable.

Compiling MIBs
You must gather the MIBs for all monitored equipment and software so you can feed them to SNMPTT. Compiling consists of extracting each OID of type "trap" and its associated comments, and generating a configuration file in SNMPTT format from this information.

Run the following command on each of your MIB files:

snmpttconvertmib --in= --out=/etc/snmp/snmptt.conf. \
--exec='/usr/local/nagios/libexec/eventhandlers/submit_check_result $r TRAP 1'

The resulting SNMPTT configuration file will contain a series of blocks, one per selected OID.
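Since the conversion has to be run once per MIB file, a small loop can save typing. The following is a minimal sketch and not part of the SNMPTT package: the function name convert_mibs, the DRY_RUN variable, and the per-MIB output naming (snmptt.conf. followed by the MIB file's base name) are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run snmpttconvertmib once per MIB file in a directory.
# convert_mibs, DRY_RUN and the output naming scheme are hypothetical.

SUBMIT='/usr/local/nagios/libexec/eventhandlers/submit_check_result'

convert_mibs() {
    mib_dir=$1                      # directory holding the MIB files
    out_dir=${2:-/etc/snmp}         # where the generated snmptt.conf.* files go
    for mib in "$mib_dir"/*; do
        [ -f "$mib" ] || continue   # skip subdirectories and an unmatched glob
        name=$(basename "$mib")
        name=${name%.*}             # strip an extension such as .mib or .txt
        out="$out_dir/snmptt.conf.$name"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Only print the command that would be executed.
            printf "snmpttconvertmib --in=%s --out=%s --exec='%s \$r TRAP 1'\n" \
                "$mib" "$out" "$SUBMIT"
        else
            snmpttconvertmib --in="$mib" --out="$out" --exec="$SUBMIT \$r TRAP 1"
        fi
    done
}

# Example: convert_mibs /path/to/mibs /etc/snmp
# Set DRY_RUN=0 to actually run the conversions.
```

Reviewing the printed commands before setting DRY_RUN=0 makes it easy to catch a wrong directory or an unexpected file name.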


Catchall Trap Definition
SNMPTT also has a regular expression–matching capability that allows you to use an EVENT line that matches multiple incoming traps, a catchall trap definition. This means you don’t need to define individual translated trap definitions for each possible incoming trap.
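To get a feel for which traps a wildcard EVENT line will catch, the prefix match can be approximated with a shell glob. This is only an illustration: SNMPTT performs its own matching as described in its documentation, and the function name oid_matches as well as the sample OIDs in the comments are hypothetical.

```shell
#!/bin/sh
# Approximate a wildcard EVENT match with a shell glob:
# does a numeric trap OID fall under the given pattern?

oid_matches() {
    pattern=$1   # e.g. '.1.*' or a vendor prefix followed by '.*'
    oid=$2       # numeric OID, as passed on when snmptrapd runs with -On
    case $oid in
        $pattern) return 0 ;;   # left unquoted so that * acts as a wildcard
        *)        return 1 ;;
    esac
}

# Example:
# oid_matches '.1.*' '.1.3.6.1.6.3.1.1.5.3' && echo "caught by the catchall"
```

The broader the pattern, the more traps end up in a single definition, so a vendor-specific prefix is usually easier to act on than a bare `.1.*`.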

Catchall Trap Definition
EVENT CatchAll .1.* "SNMP Traps" Critical
FORMAT $D
EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result "$r" "snmp_traps" 2 "$O: $1 $2 $3 $4 $5"

I’ve added a category called SNMP Traps and a severity of Critical. I could also be more selective and match OIDs from a particular vendor or class of trap, using either a wildcard or regular expression pattern matching.

For example, here is a catchall definition for traps from a specific vendor OID:

EVENT CatchAll .1.3.6.1.4.1.20916.* "Status Events" Normal
FORMAT A room-alert-4e-snmp-trap indicates that an alarm $*
EXEC /usr/lib/nagios/plugins/eventhandlers/submit_check_result $r "snmp_traps" 1 "A room-alert-4e-snmp-trap indicates that an alarm $*"
SDESC
A room-alert-4e-snmp-trap indicates that an alarm
condition has occurred on the sensor indicated
by the alarmmessage variable.
Variables:
  1: alarmmessage
EDESC
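Both EXEC lines above call submit_check_result, a helper script from the Nagios contrib event handlers. For orientation, here is a minimal sketch of what such a script does, assuming the Nagios external command file lives at /usr/local/nagios/var/rw/nagios.cmd (adjust this to your installation). The function name submit_passive_result and the NAGIOS_CMD_FILE variable are made up for this example.

```shell
#!/bin/sh
# Sketch of a passive service check submission.
# Arguments: host name, service description, return code (0-3), plugin output.

NAGIOS_CMD_FILE=${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}

submit_passive_result() {
    host=$1; service=$2; code=$3; output=$4
    now=$(date +%s)    # external commands are prefixed with a Unix timestamp
    # Append one PROCESS_SERVICE_CHECK_RESULT command to the command file.
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$code" "$output" >> "$NAGIOS_CMD_FILE"
}

# Example (left as a comment so that sourcing this file has no side effects):
# submit_passive_result AVT-Room-Alert snmp_traps 1 "alarm condition reported"
```

The service description passed here has to match the service_description of the Nagios service that is meant to receive the result, which is snmp_traps in this article.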

When done, add the paths of the compiled configuration files to the SNMPTT configuration file, /etc/snmp/snmptt.ini:

[...]
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf.
/etc/snmp/snmptt.conf.
END


Configuring Nagios
You will use passive checks to receive SNMP traps, and the services will also be volatile. If two traps are received from the same host, and the second arrives before the first has been reset to OK, we want to be notified twice even though there is no state change. That is why we use a volatile service.

You might define (for example) a service template for SNMP traps, inheriting from a generic service template:

define service{
        name                            generic-service    
        active_checks_enabled           1                     
        passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
        parallelize_check               1                     
        obsess_over_service             1                      
        check_freshness                 0                      
        notifications_enabled           1                       ; Service notifications are enabled
        event_handler_enabled           1                       ; Service event handler is enabled
        flap_detection_enabled          1                       ; Flap detection is enabled
        failure_prediction_enabled      1                       ; Failure prediction is enabled
        process_perf_data               1                       ; Process performance data
        retain_status_information       1                       ; Retain status information across program restarts
        retain_nonstatus_information    1                       ; Retain non-status information across program restarts
        is_volatile                     0                       ; The service is not volatile
        check_period                    24x7                    ; The service can be checked at any time of the day
        max_check_attempts              3                       ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10                      ; Check the service every 10 minutes under normal conditions
        retry_check_interval            2                       ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins                  ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r                 ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60                      ; Re-notify about service problems every hour
        notification_period             24x7                    ; Notifications can be sent out at any time
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

define service{
name                    trap-service
use                     generic-service
register                0
service_description     snmp_traps
is_volatile             1
check_command           check-host-alive    ;Used to reset the status to OK when 'Schedule an immediate check of this service' is selected.
flap_detection_enabled  0                               ; Flap detection is disabled
process_perf_data       0                               ; Do not Process performance data
max_check_attempts      1                    ; Leave as 1
normal_check_interval   1                    ; Leave as 1
retry_check_interval    1                    ; Leave as 1
passive_checks_enabled  1                    ; Enables passive checks
check_period            24x7
notification_interval   31536000             ; Notification interval. Set to a very high number (1 year) to prevent repeated pages for previously received traps. Restart Nagios at least once a year, and do not set this to 0!
active_checks_enabled   0                    ; Prevent active checks from occurring as we are only using passive checks.
notification_options    w,u,c                    ; Notify on warning, unknown and critical.
contact_groups          sysadmins
}

define service{
 host_name       AVT-Room-Alert ; hostname is defined in the /etc/hosts file
 use             trap-service
 contact_groups sysadmins
}

TIP: You could also use a wildcard to create this service for all hosts or use the hostgroup_name directive to create the service for all members of a host group or groups.

I’ve defined the service as volatile and set the maximum check attempts to 1. This will cause Nagios to immediately set a HARD service state and trigger any configured notifications or event handlers. I’ve also configured it for passive checks only and disabled active checks.

Putting It All Together
The SNMPTT tool is called via the trap handler that I set up in the snmptrapd.conf configuration file in the “Configuring and Running the snmptrapd Daemon” section. This trap handler calls the /usr/sbin/snmptthandler script. The script reads the trap and then writes it to the spool directory defined by the spool_directory directive in the snmptt.ini configuration file. The script then exits.

From here the SNMPTT daemon takes over. It reads the trap from the spool file and searches for a match in its trap definitions. If it finds a match, it executes the EXEC statement in the matching trap definition. This EXEC statement sends the passive check result to the Nagios server using the submit_check_result script. The daemon then sleeps for the period specified in the sleep directive in the snmptt.ini file and checks the spool directory for additional traps; if it finds matches, it processes them and sends the check results to Nagios.
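The hand-off between snmptthandler and the daemon is a plain spool directory: one side drops files, the other picks them up each time it wakes. The function below is a simplified sketch of a single polling pass in that pattern. It is not SNMPTT's actual implementation; process_spool, its handler argument, and my_handler in the comment are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of one polling pass over a spool directory.
# Usage: process_spool SPOOL_DIR HANDLER
# HANDLER is any command or shell function that takes the path of one spooled file.

process_spool() {
    spool_dir=$1
    handler=$2
    count=0
    for f in "$spool_dir"/*; do
        [ -f "$f" ] || continue     # nothing spooled, or not a regular file
        if "$handler" "$f"; then
            rm -f "$f"              # remove the trap once it has been handled
            count=$((count + 1))
        fi
    done
    echo "$count"                   # number of traps processed in this pass
}

# A daemon-style wrapper would repeat the pass and sleep in between, e.g.:
# while :; do process_spool /var/spool/snmptt my_handler; sleep 5; done
```

Files whose handler fails are left in place, so they are retried on the next pass instead of being lost.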

The Nagios server has to have host objects defined for every host that generates SNMP traps. Additionally, you need to define service objects for those hosts to receive the service check results. You should configure these services to accept passive check results and mark them as volatile.
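Because every trap-sending host needs its own service object, it can help to generate the definitions from a list of host names. The helper below is a sketch that assumes the trap-service template shown earlier is in place; the function name emit_trap_services, the second host name, and the output file name in the comment are invented for the example.

```shell
#!/bin/sh
# Print one Nagios service definition per host name given as an argument.
# Each definition inherits everything else from the trap-service template.

emit_trap_services() {
    for h in "$@"; do
        printf 'define service{\n'
        printf '        host_name       %s\n' "$h"
        printf '        use             trap-service\n'
        printf '        }\n\n'
    done
}

# Example:
# emit_trap_services AVT-Room-Alert core-router > snmp_trap_services.cfg
```

Each host name passed in must match an existing Nagios host object, otherwise Nagios will reject the configuration when it is verified.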