Have you ever been troubleshooting a network issue and, in the process of trying to fix things, made things worse? I am unfortunately guilty of having done this to myself far more times than I am proud to admit.
In reality, this is not a technical issue; it is a people issue. In an environment with proper change control and incident management, no one should be making undocumented changes, but as good engineers, we should always design things with the worst in mind, particularly when humans are involved. I believe very strongly that all changes in an environment must be documented and, ideally, approved by another engineer or management before being implemented. As a fail-safe, every environment should have an automatic system that captures each change made to it, allowing for a simple and reliable answer to the oh-so-important troubleshooting question of "What changed?" Here are some methods that I have used to achieve this over the years.
Method One - Session Logging
Session logging is a quick and easy win when it comes to capturing changes, especially when you are a small shop with a limited engineering staff. Session logging varies from one application to another, but the general concept is the same: everything that gets input or output via the command line during a remote session on a piece of network equipment gets logged to a file. I won't go into how to achieve this in this post, as it has been covered quite well by others who are far more eloquent than I. However, I will likely cover how I have logging set up in a future post about my SecureCRT setup, as I am quite fond of the tweaks that I have in place.
When logging sessions, I find it very important to include the following:
- Date and time of session
- Device hostname
- Username of the human interacting with the CLI
You can organize your logs in various ways. One way is to store logs chronologically in a directory tree (year, then month, then day), with each filename containing the username, hostname, and timestamp.
Another method is a device-based directory hierarchy, with one directory per device hostname that holds every session log for that device.
These two examples are by no means exhaustive, and each has its benefits. The device-based directory structure is particularly nice if you are writing these logs to a shared folder that multiple engineers have their logs dumped into automatically, as you then have a nice aggregation of logs for a particular device. This is great if you are trying to see what has changed on a specific piece of networking gear.
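To make the two layouts concrete, here is a minimal Python sketch that builds a log path for each one. The root folder name, the separator characters, and the timestamp format are my own assumptions for illustration; most terminal emulators let you express something similar with their own filename substitution variables.

```python
from datetime import datetime
from pathlib import PurePosixPath

STAMP_FORMAT = "%Y%m%d-%H%M%S"  # assumed timestamp format, sorts chronologically


def chronological_path(root, user, host, when):
    """Date-based tree: <root>/YYYY/MM/DD/<user>_<host>_<timestamp>.log"""
    filename = f"{user}_{host}_{when.strftime(STAMP_FORMAT)}.log"
    return PurePosixPath(root, when.strftime("%Y"), when.strftime("%m"),
                         when.strftime("%d"), filename)


def device_path(root, user, host, when):
    """Device-based tree: <root>/<host>/<timestamp>_<user>.log"""
    filename = f"{when.strftime(STAMP_FORMAT)}_{user}.log"
    return PurePosixPath(root, host, filename)


# Hypothetical session: user "jdoe" connecting to "core-sw01".
session_start = datetime(2024, 3, 15, 14, 30, 0)
print(chronological_path("session-logs", "jdoe", "core-sw01", session_start))
print(device_path("session-logs", "jdoe", "core-sw01", session_start))
```

Putting the timestamp first in the device-based filename keeps each device's logs sorted by time in a plain directory listing.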
This method is very rudimentary and will not catch everything, notably any changes made in a GUI or made by someone who doesn't have logging configured. However, it is a nice tool to have in your arsenal, and something that I always have enabled on my computer.
Method Two - Device Logging
Many networking devices support logging changes on the device itself. In Cisco's IOS, this can be set up using the archive command in global configuration mode. Details about this method can be gleaned from Cisco's documentation, but the gist is that config changes are recorded in a separate log on the device, which can be viewed when trying to determine a timeline of changes and who made them. You can also output this to syslog and do whatever you please with the data from there. In Linux/Unix systems, you can use the history command to view the commands entered in the shell (by the active user). Add a pipe to grep, and you have yourself a nice and easy way to view the changes that you have made.
This method is very simple to set up, requires little or no additional infrastructure, and is supported out of the box by many device types and vendors. The downside is that it does require the device to support it, and it will only catch people making changes in good faith: erasing the log would be trivial if somebody were trying to cover their tracks.
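If you do send the device's change log to syslog, a few lines of Python can turn it into a "who changed what" list. The sketch below assumes the config-change messages look like the PARSER-5-CFGLOG_LOGGEDCMD lines that IOS typically emits; the exact prefix and spacing vary by platform and syslog setup, and the sample lines are made up for illustration, so treat the pattern as a starting point.

```python
import re

# Assumed message shape (verify against your own logs), for example:
# "... %PARSER-5-CFGLOG_LOGGEDCMD: User:jdoe  logged command:shutdown"
CFGLOG = re.compile(
    r"%PARSER-5-CFGLOG_LOGGEDCMD: User:(?P<user>\S+)\s+"
    r"logged command:(?P<command>.*)$"
)


def config_changes(lines):
    """Return (user, command) tuples for every config-change message found."""
    changes = []
    for line in lines:
        match = CFGLOG.search(line)
        if match:
            changes.append((match["user"], match["command"].strip()))
    return changes


# Illustrative sample lines, not captured from a real device.
sample = [
    "Mar 15 14:30:01 core-sw01 42: %PARSER-5-CFGLOG_LOGGEDCMD: User:jdoe  logged command:interface GigabitEthernet0/1",
    "Mar 15 14:30:05 core-sw01 43: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "Mar 15 14:30:09 core-sw01 44: %PARSER-5-CFGLOG_LOGGEDCMD: User:jdoe  logged command:shutdown",
]

for user, command in config_changes(sample):
    print(f"{user}: {command}")
```

Because the messages live on the syslog server rather than on the device, this copy survives even if somebody clears the log on the device itself.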
Method Three - Centralized Configuration Management
Now we have come to my personal favorite, the Macchina magnifica: Centralized Configuration Management. This method has many benefits at the cost of additional complexity, but to me, it is well worth the work to set up and maintain. There are many ways to perform central config management, but my preferred method is to use the FOSS tool Oxidized.
I won't subject you to the details of my setup or the setup process in this post. At a high level, Oxidized connects to a list of network devices that you feed it and, at a defined interval, runs a set of commands against each device to gather information. Once it has gathered information from a device, it diffs that data against what it collected the last time it ran against the same device. If the information has changed, Oxidized stores the full set of gathered information in the storage method of your choice; I use a git repository. The beauty of this tool is that you now have a clean, centralized log of every change that has occurred on any particular device, and you can compare any two configs on a device from any point in time to see what has changed. Oxidized has a very nice (optional) web component that allows you to see a device's most recent changes, full configuration, and diffs from one point in time to another. You can even use event hooks to email you a summary of any change that has occurred, which is a nice way to double-check that you implemented all of the changes according to the CR that you definitely submitted before making the changes, right?
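The gather, diff, and store-on-change cycle described above can be sketched in a few lines of Python. To be clear, this is not how Oxidized is implemented (it is a Ruby tool with its own storage backends); it is only an illustration of the idea, using an in-memory dictionary in place of a git repository and a hypothetical function name.

```python
import difflib


def record_if_changed(history, device, new_config):
    """Store new_config only if it differs from the device's last snapshot.

    Returns the unified diff as a list of lines (empty if nothing changed).
    `history` maps a device name to its list of stored snapshots.
    """
    snapshots = history.setdefault(device, [])
    previous = snapshots[-1] if snapshots else ""
    if new_config == previous:
        return []  # nothing changed, so nothing new is stored
    diff = list(difflib.unified_diff(
        previous.splitlines(),
        new_config.splitlines(),
        fromfile=f"{device} (previous)",
        tofile=f"{device} (current)",
        lineterm="",
    ))
    snapshots.append(new_config)
    return diff


# Hypothetical device and configs, standing in for three polling runs.
history = {}
before = "hostname core-sw01\ninterface Gi0/1\n no shutdown\n"
after = "hostname core-sw01\ninterface Gi0/1\n shutdown\n"

record_if_changed(history, "core-sw01", before)          # first snapshot
unchanged = record_if_changed(history, "core-sw01", before)  # no change
changed = record_if_changed(history, "core-sw01", after)     # interface shut
print("\n".join(changed))
```

Storing only on change is what keeps the history readable: every stored snapshot corresponds to an actual change, so the answer to "What changed?" is just the diff between two neighboring entries.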
This method requires a bit of setup and tweaking, but ultimately I feel that it is the best method for capturing what has changed in a network environment, especially when troubleshooting an issue. Being able to gather changes from a myriad of devices from different vendors, running different software versions, and supporting various features has saved my bacon many times.
These three methods are by no means the only ways to gain visibility into the changes that occur in your environment. Additionally, no one method is completely bulletproof. I actually run a combination of all three in my environment, and each one serves a slightly different purpose. Ultimately, you know what is best for your environment, and hopefully this has helped point you in the right direction, or perhaps introduced you to a tool you have never seen before. If you have used any other methods, please drop them into the comments below. I would love to add another tool to my arsenal. Thanks for taking the time to read this.