System Monitoring

Monitoring nodes and networks helps you gauge performance. Knowing where the bottlenecks are lets you upgrade the corresponding components, which is a key factor in improving overall performance and assuring proper quality of service. Thorough monitoring also helps you detect attacks, because anomalies become visible. For example, monitoring the traffic throughput on the Internet uplink lets you detect infected machines that take part in a DDoS attack, because the increased bandwidth usage will be visible.

There are different ways to perform monitoring. First, you can use various solutions to read performance counters out of the /proc file system. For example, to graph network traffic throughput on a local Ethernet interface, you can read the byte and packet counters from /proc/net/dev and then feed those values into a monitoring/graphing solution.
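As a rough illustration of reading those counters, the following Python sketch parses the text of /proc/net/dev into per-interface values. The function names parse_net_dev and read_net_dev are hypothetical, and the field positions assume the usual kernel layout of eight receive columns followed by eight transmit columns.

```python
from typing import Dict


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the text of /proc/net/dev into per-interface counters.

    Assumes the usual layout: two header lines, then one line per
    interface with 8 receive fields followed by 8 transmit fields.
    """
    counters: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected format, skip rather than guess
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return counters


def read_net_dev(path: str = "/proc/net/dev") -> Dict[str, Dict[str, int]]:
    """Read and parse the counters from the local /proc file system."""
    with open(path) as handle:
        return parse_net_dev(handle.read())


if __name__ == "__main__":
    for iface, values in read_net_dev().items():
        print(iface, values)
```

The counters are cumulative, so a graphing solution would typically sample them at a fixed interval and plot the difference between consecutive samples.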

However, monitoring each node locally does not scale well in large or growing environments. The second way to perform monitoring is to collect performance counter data over a network. The most widespread method is to use Simple Network Management Protocol (SNMP), which you can implement on Linux using the Net-SNMP package. SNMP has the advantage that it is an open standard, and many vendors allow their operating systems or devices to be queried through it.
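Counters collected over a network, such as the 32-bit interface octet counters exposed through SNMP, are cumulative and eventually wrap around to zero. The sketch below shows one way to turn two consecutive samples into a rate while tolerating a single wraparound. The function name counter_rate is hypothetical, and the code does not query SNMP itself; it assumes the samples have already been collected. It also cannot distinguish a wraparound from a counter reset caused by a reboot.

```python
def counter_rate(prev: int, curr: int, interval_s: float, bits: int = 32) -> float:
    """Return the per-second rate between two samples of a wrapping counter.

    Assumes the counter wrapped at most once between the samples and
    counts modulo 2**bits (32 for classic SNMP counters).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << bits
    # Modular subtraction yields the correct delta even after one wrap.
    delta = (curr - prev) % modulus
    return delta / interval_s


if __name__ == "__main__":
    # Two samples taken 30 seconds apart, no wraparound.
    print(counter_rate(1000, 4000, 30))
    # The counter wrapped past 2**32 between these two samples.
    print(counter_rate(4294967000, 704, 10))
```

For fast links, polling more often or using 64-bit counters where the device offers them reduces the chance of more than one wrap between samples.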

A convenient network monitoring solution contains three components: a proper management/configuration facility, a data collection agent, and a graphical visualization of the collected data. RRDTool is a popular solution that turns numerical input data into graphs. You can use it for custom-made solutions, and many existing projects use it as their backend for data storage and visualization.

RRDTool

RRDTool is the round-robin database tool. It was designed to handle time-series data such as network bandwidth, network interface packet counters, CPU, memory, and disk load, and so on. The data is stored in a round-robin database, in which new entries overwrite the oldest ones, so the storage footprint remains constant over time. You can use RRDTool to write monitoring shell scripts. You can even embed RRDTool in applications through its Perl, Python, and PHP bindings.
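The round-robin idea can be illustrated with a small fixed-size buffer. The sketch below is only a conceptual model in Python; it does not use RRDTool's file format or bindings, and the class name RoundRobinArchive is hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple


class RoundRobinArchive:
    """Keep only the newest `slots` samples, so storage stays constant."""

    def __init__(self, slots: int) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._rows: deque = deque(maxlen=slots)

    def update(self, timestamp: int, value: float) -> None:
        """Store one sample, overwriting the oldest one if the archive is full."""
        self._rows.append((timestamp, value))

    def rows(self) -> List[Tuple[int, float]]:
        """Return the stored samples from oldest to newest."""
        return list(self._rows)

    def average(self) -> Optional[float]:
        """Return the mean of the stored values, or None if empty."""
        if not self._rows:
            return None
        return sum(value for _, value in self._rows) / len(self._rows)


if __name__ == "__main__":
    archive = RoundRobinArchive(slots=3)
    for t in range(5):
        archive.update(t, t * 10.0)
    print(archive.rows())
    print(archive.average())
```

However many samples are written, the archive never holds more than the configured number of slots, which is the property that keeps the storage footprint constant.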

Many tools rely upon RRDTool as their data storage engine, for instance, MRTG, Cacti, Munin, and Smokeping.

Multi-Router Traffic Grapher (MRTG) is quite a popular tool, written by the author of RRDTool. Initially, MRTG was intended to graph the traffic of router interfaces, but it was soon used for a large variety of other tasks ranging from graphing other types of computer devices to graphing weather data. You can also use it to monitor and graph CPU load, memory usage, and so on.

Another solution for graphing performance data is Cacti, which is also based on RRDTool. You can configure this application through a web interface, and it allows defining query templates and output templates to gather and graph various types of data. Another useful feature is the built-in user management functionality that allows assigning privileges to user accounts.

Monitoring does not have to focus solely on collecting and analyzing performance data. It also addresses the availability of a system, which is key to business success and one of the confidentiality-integrity-availability (CIA) goals that security professionals are expected to guarantee. Security considerations therefore also need to include measures that affect the availability of a network. In a well-maintained network, the IT staff notices service interruptions before the customers do. Several applications can perform service monitoring and send a notification when an error occurs. A widely used open-source solution is Nagios, as shown in Figure A-5. You can extend it with plug-ins to fit almost any situation. NagiosExchange is a good resource with links to many useful plug-ins and tools that enhance Nagios further. In addition, many closed-source solutions such as Tivoli, OpenView, or Big Brother are available.

Host State Breakdowns

State         Type / Reason        Time              % Total Time   % Known Time
UP            Unscheduled          27d 23h 48m 11s   99.971%        99.971%
              Scheduled            0d 0h 0m 0s       0.000%         0.000%
              Total                27d 23h 48m 11s   99.971%        99.971%
DOWN          Unscheduled          0d 0h 11m 49s     0.029%         0.029%
              Scheduled            0d 0h 0m 0s       0.000%         0.000%
              Total                0d 0h 11m 49s     0.029%         0.029%
UNREACHABLE   Unscheduled          0d 0h 0m 0s       0.000%         0.000%
              Scheduled            0d 0h 0m 0s       0.000%         0.000%
              Total                0d 0h 0m 0s       0.000%         0.000%
Undetermined  Nagios Not Running   0d 0h 0m 0s       0.000%
              Insufficient Data    0d 0h 0m 0s       0.000%
              Total                0d 0h 0m 0s       0.000%
All           Total                28d 0h 0m 0s      100.000%       100.000%

Figure A-5 Nagios reports system availability.
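To give an idea of what a Nagios plug-in looks like, here is a minimal sketch of a disk usage check in Python. It follows the common plug-in convention of printing one status line and returning exit code 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The function names, the "DISK" label, and the 80 and 90 percent thresholds are assumptions chosen for illustration, not part of any existing plug-in.

```python
import shutil
import sys
from typing import Tuple

# Exit codes used by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Map a disk usage percentage to an exit code and a status line.

    The thresholds are illustrative defaults, not recommended values.
    """
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"DISK {LABELS[code]} - {used_pct:.1f}% used"


def main(path: str = "/") -> int:
    """Check one file system and print a single status line."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the threshold logic in a separate function makes the check easy to test without touching a real file system.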
