Event Monitoring and Support Tools
I am responsible for both Unix Operations and Windows Operations within our company, and I struggle to find tools that work well in both environments. Both groups work autonomously and do a great job of supporting their respective environments. However, as our responsibilities grow, I would like to find consistent tools for maintaining, monitoring, and supporting our environments.
Two tools that do work in both environments are BMC Patrol for performance gathering and HP OpenView for event notification and monitoring. We are not using OpenView to monitor our network; instead, we use it for system and event monitoring. We have a decent baseline of monitored events, but we struggle with event escalation, especially when a new error or event occurs. We have installed IBM Director on the IBM nodes, and it passes hardware events to OpenView; we have also scripted certain events that hand off to OpenView. Our process still needs a lot of refinement: many events enter OpenView and get no response, either because the criticality level is identified incorrectly or because follow-up and escalation are not defined in OpenView. We are looking to reduce the number of system outages that impact our user community by improving our event escalation and notification.
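To make the escalation gap concrete, here is a minimal sketch of the kind of rule table we are missing. It is not OpenView configuration or any OpenView API; the severity names, time limits, contact roles, and function names are all hypothetical. The idea is that every event maps to some severity (with a safe default for new, unclassified events) and every severity has a defined chain of people to notify while the event stays unacknowledged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical escalation chains: (time since the event was raised, who to notify).
# The severities, delays, and roles are illustrative assumptions only.
ESCALATION_RULES = {
    "critical": [
        (timedelta(minutes=0), "on-call admin"),
        (timedelta(minutes=15), "team lead"),
        (timedelta(minutes=60), "operations manager"),
    ],
    "major": [
        (timedelta(minutes=0), "on-call admin"),
        (timedelta(hours=2), "team lead"),
    ],
    "minor": [
        (timedelta(minutes=0), "daily report"),
    ],
}

# New or misclassified events fall back to this level instead of being ignored.
DEFAULT_SEVERITY = "major"


@dataclass
class Event:
    source: str
    message: str
    severity: Optional[str]
    raised_at: datetime
    acknowledged: bool = False


def effective_severity(event: Event) -> str:
    """Return the event's severity, or the default if it is unknown or missing."""
    if event.severity in ESCALATION_RULES:
        return event.severity
    return DEFAULT_SEVERITY


def contacts_due(event: Event, now: datetime) -> list:
    """List everyone who should have been notified by `now` for an open event."""
    if event.acknowledged:
        return []
    age = now - event.raised_at
    chain = ESCALATION_RULES[effective_severity(event)]
    return [contact for delay, contact in chain if age >= delay]
```

The design choice worth noting is the default severity: an event that nobody has classified yet still enters an escalation chain, which is exactly the case where we currently lose events.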
Another tool in use on our Unix platform is Tripwire. Tripwire is an auditing tool that identifies changes to files and helps you better track changes in your environment. We have started looking at Tripwire on the Wintel side and hope to have it deployed within a month.
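For readers unfamiliar with this style of auditing, the underlying idea can be sketched in a few lines: record a checksum for every file, then compare a later snapshot against that baseline. This is only an illustration of the general technique, not how Tripwire itself is implemented or configured, and the function names `snapshot` and `compare` are my own.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = str(path.relative_to(root))
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline, current):
    """Report files added, removed, or changed since the baseline snapshot."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        p for p in set(baseline) & set(current) if baseline[p] != current[p]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

In practice you would take the baseline right after a known-good change window, store it somewhere the monitored host cannot modify, and run the comparison on a schedule. Because this sketch relies only on the Python standard library, the same script would run on both the Unix and Windows sides.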
I am open to others' thoughts and experiences surrounding event monitoring and event notification in a multi-OS environment and would be glad to discuss this in detail.