Creating an Operations Log

As I mentioned in a previous post, I recently joined the Kiln team at Fog Creek. One consequence of the move is that I have had to learn how to help run Kiln, both internally at Fog Creek and for our customers. Essentially, I have been trying to learn how to be a good operator for Kiln, and I have to admit that I am struggling with the transition. I have been pushing the team to create what I was calling an Operations Handbook. Some of this information is already captured, but I kept thinking we needed to organize it better.

Today I had an epiphany.

Creating an operations handbook is a lot of work, especially at a small company. Handbooks have to be maintained, edited and, I want to say, curated, because that word conveys how much work a handbook really is. What a handbook offers, and what I want, is a collection of information about things like:

  • How to solve specific problems
  • Known scripts and commands operations people use to do their job
  • Architecture information about how the system works
  • Administrative-level information about machines, networks, addresses, memory, etc.
  • Pointers to monitoring pages, tasks, scripts and logs
  • A place for new team members to go to learn about the product's operation

The problem is that all of these things change over time. So a handbook is not only hard to create in the first place; it is also hard to keep current, and it requires constant attention.

My epiphany is that by changing my goal from a handbook to a log, I can get everything I want without the same maintenance burden. My idea is simple: using blog software, the team can document operations issues and their solutions. Each time a problem comes up, the person who investigates it types up a post about what they did and what they found. As the posts accumulate, the log becomes a searchable resource that operations folks can use to find scripts, commands and ideas for solving problems.

We can also start writing log posts that go beyond a simple "found a problem with X and fixed it by running Y." These longer posts, or articles, can document the architecture. Shorter posts by administrators can announce new machines or changes to machine resources.

So why is the log better than the handbook?

A log is a series of entries organized by time. A handbook is an organized collection of information. The handbook has to be revised over time; the log is just added to. Ultimately, I think the log will be easier to create and keep up to date, and easier for the entire team to contribute to in real time.

Update 5/11/2014 - We have implemented an operations log for the Kiln team, and I am looking forward to posting some updates on the Fog Creek site soon.
