Using Nagios to Monitor Test Systems

Nagios is a well-known tool among operations teams. It is used to monitor all kinds of operational parameters – from simple machine up/down monitoring to detailed data collection. However, I have rarely seen this useful tool used in test environments. Here are three ways Nagios can benefit a test team.

First, just using Nagios to monitor whether test systems are up and running can provide useful information and possible time savings. Knowing that a database server has gone down can spare the entire test team the time and frustration of tracking down "bugs" which are just the result of a machine outage.
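
Nagios ships with its own checks for this kind of up/down monitoring, but the idea is simple enough to sketch. The Python below is only an illustration, not a replacement for the stock plugins: the function name `check_tcp` and its defaults are my own assumptions. The one firm convention it relies on is the plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a single status line of output.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=3.0):
    """Return (exit_code, status_line) for a basic TCP reachability check.

    check_tcp is a hypothetical helper for illustration only.
    """
    try:
        # create_connection raises OSError (refused, timed out, DNS failure).
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
```

A real plugin would print the status line and exit with the returned code, for example `code, line = check_tcp("db-test-01", 5432)` followed by `print(line)` and `sys.exit(code)`. The host name and port there are placeholders.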

Second, consider basic CPU and memory utilization monitoring of all test systems. This data can be collected and graphed with a variety of tools. I have had success using RRDTool and nagiosgraph. This toolset allows the team to see the variation of CPU utilization, memory utilization and whatever else you decide to measure over time. This view may allow the team to spot potential performance or scaling issues long before formal performance testing begins.
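
To make a measurement graphable, a plugin can append performance data to its status line after a `|` character, in the form `label=value[unit];warn;crit`. The sketch below shows that output format with a simple "higher is worse" threshold check. The names `evaluate` and `check_load` and the threshold values are assumptions chosen for illustration; `os.getloadavg()` is only available on Unix-like systems.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(label, value, warn, crit, unit=""):
    """Compare a value to warn/crit thresholds and build a status line.

    The text after '|' is performance data that graphing tools can parse.
    """
    if value >= crit:
        code, word = CRITICAL, "CRITICAL"
    elif value >= warn:
        code, word = WARNING, "WARNING"
    else:
        code, word = OK, "OK"
    line = (
        f"{label.upper()} {word} - {label} is {value:.2f}{unit}"
        f" | {label}={value:.2f}{unit};{warn};{crit}"
    )
    return code, line


def check_load(warn=4.0, crit=8.0):
    """Check the 1-minute load average (Unix only; thresholds are examples)."""
    load1, _, _ = os.getloadavg()
    return evaluate("load1", load1, warn, crit)
```

For example, `evaluate("cpu", 42.0, 80, 90, "%")` returns exit code 0 and the line `CPU OK - cpu is 42.00% | cpu=42.00%;80;90`.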

Finally, consider writing your own plugins for measurements unique to the system under test. Once you start doing this, you will discover all kinds of ways Nagios can help, not only in monitoring the test environment but in actually testing the application. For example, I once wrote a plugin that ran a database query to verify the number of records processed in the last 15 minutes, and I set appropriate thresholds. Weeks later, I received a monitoring email alerting me that no records had been processed in the last 15 minutes. Even though I was not testing that part of the system, I immediately knew we had a major issue with the latest build.
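
I no longer have that plugin, so what follows is a rough reconstruction of the idea rather than the original code. It assumes a table named `records` with a `processed_at` column holding Unix timestamps, and it uses Python's built-in `sqlite3` module as a stand-in for whatever database the system under test uses. The table, column, function names, and thresholds are all hypothetical.

```python
import sqlite3
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_recent(conn, window_seconds=900, now=None):
    """Count rows in the (hypothetical) records table newer than the window."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT COUNT(*) FROM records WHERE processed_at >= ?",
        (now - window_seconds,),
    ).fetchone()
    return row[0]


def check_recent_records(conn, warn_below=10, crit_below=1,
                         window_seconds=900, now=None):
    """Alert when too FEW records were processed in the window."""
    try:
        count = count_recent(conn, window_seconds, now)
    except sqlite3.Error as exc:
        # A failed query says nothing about throughput, so report UNKNOWN.
        return UNKNOWN, f"RECORDS UNKNOWN - query failed: {exc}"
    if count < crit_below:
        code, word = CRITICAL, "CRITICAL"
    elif count < warn_below:
        code, word = WARNING, "WARNING"
    else:
        code, word = OK, "OK"
    minutes = window_seconds // 60
    line = (
        f"RECORDS {word} - {count} records processed in the last "
        f"{minutes} minutes | records={count}"
    )
    return code, line
```

Note that the thresholds run in the opposite direction from a CPU check: a low count is the problem. With the defaults above, zero records in 15 minutes returns CRITICAL, which is the kind of result that triggered the email I described.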

 

Standing Agenda for Weekly Distributed Team Meetings

I am not currently managing any distributed teams. I used to, however, and we always struggled to run effective team meetings. Through trial and error we came up with the agenda below. This was a waterfall project, and the agenda reflects that.

Development

  • What modules are completed and ready for handoff to QA?
  • What modules are expected to be completed this week?
  • Proposed changes to code already handed off to QA (refactoring, new features, non-QA generated bug fixes, re-organization of repository, etc.)
  • Any additional tasks that have been identified
  • Current target milestone date(s) – reason for delta (if any)

Test

  • For which modules has testing been completed?
  • What modules are expected to be tested this week?
  • Any blocking issues or high priority bugs outstanding?
  • Any additional tasks that have been identified
  • Current target milestone date(s) – reason for delta (if any)

Current Open Topics From Past Meetings

  • First item
  • Second item
  • etc.

 

March 2010 mensming Twitter Posts

10:06 PM Mar 31, 2010
It takes more than 10 years, but OpenSSL finally reaches v1.0.0 – http://www.openssl.org/news/

7:55 AM Mar 31, 2010
When a bug in a beta gets a whole news article – http://bit.ly/bZ2tnl

7:08 AM Mar 30, 2010
Academic paper – "You are Who You Know: Inferring User Profiles in Online Social Networks" – http://bit.ly/9ROgoG

5:45 PM Mar 29, 2010
Powell’s Technical Books by itself is worth a trip to Portland, OR

5:31 PM Mar 28, 2010
Enjoying Netflix streaming on the Wii.

3:14 PM Mar 26, 2010
Grt advice RT@marick: If you are a speaker, ask yourself before talk “what important points am I leaving out?” If none: your talk’s too long

5:33 AM Mar 26, 2010
A handy flow chart of http response codes – http://bit.ly/aRjZ41

7:15 AM Mar 24, 2010
Call for Papers – International Conference on Software Quality – http://bit.ly/9ojGuD

8:57 PM Mar 23, 2010
Dept. of Defense journal article on continuous integration – nothing new – it is interesting to see this in a DOD pub.- http://bit.ly/aGlWTv

6:51 AM Mar 23, 2010
A simple getCSSCount for use with Selenium-RC – http://bit.ly/9aYXVC #selenium

7:21 PM Mar 22, 2010
New business cards today. For the first time – no street address. Name, title, mobile, twitter and linkedin profile.

6:58 AM Mar 22, 2010
University of Haifa – Life and death of online communities – http://bit.ly/bBGzIZ

7:09 AM Mar 18, 2010
A Big Case of …OOPS… (SQL Injection) – http://bit.ly/aBcKRO

6:23 PM Mar 17, 2010
Scott Berkun’s "The 22 minute meeting" – http://bit.ly/bSIPUk

4:17 PM Mar 15, 2010
Capability Immaturity Model (CImM) – http://bit.ly/dc9TZS

8:33 AM Mar 14, 2010
State Of Application Security: Nearly 60 Percent Of Apps Fail First Security Test- I am surprised it is not higher – http://bit.ly/cuN5yA

9:45 AM Mar 13, 2010
RT @QALINKS Dilbert asked to help with software testing: http://bit.ly/aUwR0j

2:29 PM Mar 13, 2010
2010 PNSQC Keynote speakers announced: Tim Lister and Harry Robinson #pnsqc

6:45 AM Mar 13, 2010
Source for podcasts on software testing – http://bit.ly/dll9ae

8:42 PM Mar 11, 2010
What Comes After the iPad? – http://bit.ly/bVVmWl

7:01 AM Mar 11, 2010
Did Microsoft Leave the Social Media Space? – http://bit.ly/cnNlro

7:54 PM Mar 10, 2010
Just finished reading _Blink: The Power of Thinking without Thinking_ by Malcolm Gladwell – http://bit.ly/9PeWxR

6:26 PM Mar 10, 2010
Time Flies Dept.: Dot-com craze peaked 10 years ago – http://bit.ly/b2t9du

6:55 AM Mar 10, 2010
Twitter analysis accurately predicted big ‘Hurt Locker’ win – http://bit.ly/d8I3Jg

8:30 PM Mar 9, 2010
Security Quality Requirements Engineering (SQUARE) – http://bit.ly/ca6lqz

7:00 AM Mar 9, 2010
Google PageRank-like algorithm dates back to 1941 – http://bit.ly/dmSSFy

5:55 AM Mar 8, 2010
PHP and Perl crashing the enterprise party – http://bit.ly/czA8DZ

8:12 PM Mar 5, 2010
How MySpace Tested Their Live Site with 1 Million Concurrent Users – http://bit.ly/dfmGDB

6:55 AM Mar 5, 2010
"Knowing is not enough, we must apply. Willing is not enough, we must do." -Goethe

12:42 PM Mar 3, 2010
RT @fredberinger The Long Tail of Bugs: Should we revisit it ? http://is.gd/9CUTW #softwaretesting