PRM monitors the process table on a given system and matches process id's with set resource limits in the config file or per-process based rules. Process id's that match or exceed the set limits are logged and killed; includes e-mail alerts, kernel loggin
PRM (Process Resource Monitor)
Introduction
PRM monitors the process table on a given system and matches process id's with set resource limits in the config file or per-process based rules. Process id's that match or exceed the set limits are logged and killed; includes e-mail alerts, kernel logging routine and more...
How it works?
PRM works on the basis that once a process id is found matching resource limits; there is a corresponding trigger and wait value. The trigger value increments upwards from zero (0) to the defined value, pausing the duration of seconds defined as wait value. There after the status of the flagged pid is checked again, if still above or equal to resource limits the trigger/wait cycle begins again till the max trigger value is reached. When this trigger value is reached the given process is logged/killed.
This all together has the effect that applications with short burst resource spikes (e.g: apache, mysql etc..) are not killed; but rather on applications with prolonged resource consumption. Using the rule system, you can define different wait/trigger/resource values for any application.
Installation
First we must fetch the package:
wget http://www.rfxnetworks.com/downloads/prm-current.tar.gz
And extract it:
tar xvfz prm-current.tar.gz
The current version of prm as of this writing is 0.3, so lets cd to the 0.3 extracted path:
cd prm-0.3/
And finally run the enclosed install.sh script:
./install.sh
Configuration
The prm installation is located at '/usr/local/prm', and the configuration file is labeled 'conf.prm'.
Open the '/usr/local/prm/conf.prm' file with your preferred editor. There is an array of options in this file but we will only be focusing on the main variables.
Lets skip down to the user e-mail alert's section and set the USR_ALERT value to '1'; enabling alerts.
# enable user e-mail alerts [0=disabled,1=enabled] USR_ALERT="1"
And configure our e-mail addresses for alerts:
# e-mail address for alerts USR_ADDR="root, you@domain.com"
Check the 5,10, or 15 minute load average; relative to the later option below for min. load level.
# check 5,10,15 minute load average. [1,2,3 respective of 5,10,15] LC="1"
PRM optionally has a required load average for running. If the load is not equal to or greater than this value; PRM will not run. Setting this value to zero will force the script to always run but this should not be needed.
# min load level required to run (decimal values unsupported) MIN_LOAD="1"
This is the introduction described wait value, used for pauses between trigger increments. The value of wait multiplied by the value of kill_trig equal the duration of time before a process is killed (10x3=30seconds).
# seconds to wait before rechecking a flagged pid (pid's noted resource # intensive but not yet killed). WAIT="10"
The trigger limit before processes are killed, described in detail in the above 'wait' description and introduction.
# counter limit that a process must reach prior to kill. The counter value # increases for a process flagged resource intensive on rechecks. KILL_TRIG="3"
The max percentage of CPU a process should be allowed to use before PRM flags it for killing.
# Max CPU usage readout for a process - % of all cpu resources (decimal values unsupported) MAXCPU="35"
The max percentage of MEM a process should be allowed to use before PRM flags it for killing.
# Max MEM usage readout for a process - % of system total memory (decimal values unsupported) MAXMEM="15"
That is it; you should tweak the MAXCPU/MAXMEM limits to your desired needs but the defaults should be fine for most.
Usage
The executable program resides in '/usr/local/prm/prm' and '/usr/local/sbin/prm'. The prm executable can receive one of two arguments:
-s Standard run
-q Quiet run
The log path for prm is '/usr/local/prm/prm_log', as well pid specific logs are stored in '/usr/local/prm/killed/'.
A default cronjob for PRM is installed to '/etc/cron.d/prm', and is configured to run once every 5 minutes.
There is a provided ignore file, to ignore processes based on string rules. The ignore file is located at '/usr/local/prm/ignore'. This file supports line separated ignore strings. As a default the strings 'root, named and postgre' are ignored by PRM; this script was not intended to monitor root processes but rather user land tasks. It could easily watch root processes by removing the given line in the ignore file but this is strongly discouraged.
 
No comments:
Post a Comment