Chris's Blog

Devops Shokunin

DNSRR – rewriting DNS for testing web applications


When testing web applications, it is often necessary to rewrite DNS entries to avoid cross-site scripting (XSS) JavaScript warnings.

Building on RubyDNS, my company has open-sourced a quick Ruby script to easily rewrite DNS queries for web testing.

Available on Github
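The core idea can be sketched in a few lines of plain Ruby (the hostnames and addresses below are invented for illustration; the real script wires this lookup into RubyDNS so unmatched names fall through to the upstream resolver):

```ruby
# Hypothetical rewrite table: production hostnames resolve to test hosts.
REWRITES = {
  "www.example.com"    => "127.0.0.1",
  "assets.example.com" => "10.0.0.5",
}

# Return the rewritten address for a queried name, or nil to indicate the
# query should be passed through to the real resolver.
def rewrite(name)
  REWRITES[name.downcase.chomp(".")]
end
```

With a table like this in place, the browser talks to the test host while still using production hostnames, so no cross-domain warnings are triggered.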

Dashboard Example with Sinatra and Mcollective


Having a dashboard to provide real time data to users helps minimize interruptions at work.

The combination of Sinatra handling the incoming HTTP requests and Mcollective pulling real time data from the infrastructure provides the responsiveness and self-service that saves everyone time and effort.

The example code is available on Github
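The glue between the two pieces is small; here is a sketch of the kind of helper such a dashboard uses to turn submitted form fields into an MCollective-style filter (the field names are invented for illustration):

```ruby
# Build a filter hash from submitted form parameters, skipping blanks so
# an empty form means "query all nodes".
def build_filter(params)
  filter = {}
  ["class", "identity", "fact"].each do |key|
    value = params[key].to_s.strip
    filter[key] = value unless value.empty?
  end
  filter
end
```

Sinatra hands a helper like this its params hash, and the result is passed to the MCollective client that fans the query out to the nodes.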

Here are some screenshots of it running on my internal network.

Welcome Screen

Filtering Form

Results from Monitoring Agent

Results from Puppetd Agent

Using Open Source to Provide Infrastructure Services


Operations Teams need to provide eight critical services to the developers and users of their environment.  At my current employer, I use open source software to provide these services that allow our developers to be more productive and our customers to experience stable, responsive service.


Source Code Management

Keep all of our bespoke software, configurations and notes under strict version control.

Software: Git
Pros: Fast
Many developers are familiar with it due to GitHub’s popularity
Cons: Steep learning curve
Somewhat cryptic commands
Option: Subversion

Continuous Integration

Build, test, version and package our software so that it may be quickly and safely deployed to our staging environment

Software: Jenkins
Pros: Easy integration with Git
Nice GUI
Flexible enough to meet our needs
Cons: Configuration limited to GUI
Written in Java*
Option: Cruise Control

Provisioning

Spin up nodes to become part of the processing farm and decommission nodes that are no longer required

Software: Custom scripts using Fog
Pros: Simple scripts
Easy to customize
Support multiple cloud providers
Cons: Custom tool
Option: Cobbler, RightAWS
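The interesting part of such scripts is deciding when to grow or shrink the farm; a toy sketch of that logic (the 60% target utilization is an invented number, and the real scripts call Fog’s API to actually create and destroy servers):

```ruby
# How many nodes to add (positive) or remove (negative) to bring the
# farm back toward the target utilization. Purely illustrative.
def scaling_delta(current_nodes, utilization, target = 0.6)
  desired = (current_nodes * utilization / target).ceil
  desired = 1 if desired < 1   # never scale to zero
  desired - current_nodes
end
```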


Configuration Management

Ensure that all nodes are automatically and correctly configured and remain in a known configured state

Software: Puppet
Pros: Easy configuration language
Well supported
Active community
Cons: Have to learn said configuration language
Requires serious investment of time
Option: Chef

Monitoring

Check on services and nodes to ensure that things are behaving as expected before the customer notices


Software: Icinga
Pros: Can be easily auto-configured by Puppet
Well-understood Nagios syntax
Works well with Nagios checks and plugins
Cons: Requires serious investment of time and constant care
Option: Nagios, Zenoss


Capacity/Performance Management

Collect system metrics for assessing performance and capacity planning.  Some organizations have monitoring perform this role, but I have very strong opinions on this being kept separate.


Software: Collectd/Visage
Pros: Light, fast daemon on each box
Flexible server
Many plugins available
Cons: Separate process to run
Requires a lot of disk and disk I/O
Option: Ganglia


Log Collection

Centrally collect, store and monitor system and application logs

Software: Rsyslog/Graylog2
Pros: Rsyslog provides flexible configs
MongoDB backed server performs well
Easy front end for log viewing
Cons: Takes a while to learn Mongo
Harder to pull/back up than text logfiles
Options: Syslog-ng, Logstash


Deployment Management

Allow developers and technical staff to deploy and monitor application activity.  Since each infrastructure is unique, it makes sense to build a custom solution to this problem.


Software: Mcollective/Sinatra/ActiveMQ
Pros: Sinatra makes it easy to write simple web applications
Mcollective is extremely fast
ActiveMQ is very flexible and resilient
Cons: Sinatra is not as full featured as Rails
Mcollective requires a change of thinking about command/control
ActiveMQ is Java*
Options: Control Tier


* I list Java as a con because we do not have extensive in-house Java expertise and it requires us to install something we otherwise would not.

How Mcollective and Puppet play nice


At work, I have invested a lot of time in two tools that have made configuration and deployment as close to a painless process as I think is possible.

Puppet (available from Puppet Labs) is an amazing configuration tool that I have been working with for over a year. Since my place of work is cloud based, I need to spin up dozens of virtual machines that must be identically and automatically configured. Puppet allows you to achieve consistency over time, as successive runs converge each machine into a known state.

Now I have more than 100 nodes and I want to perform some action on each of them in real time, whether collecting data or making changes. SSH loops are fine for a couple of machines with a static list, but I have many nodes spread across different locations and I am not a patient individual. Mcollective makes it possible to run massively parallel jobs across my infrastructure in seconds as opposed to minutes.

The use case that got me started was a co-worker saying, “We just got a call from customer XYZ and they say there is a problem. Quick – do something.” Because all of the nodes are in Puppet and my configurations are in source control, I can immediately be sure of the state of my system configurations. I could log into monitoring, check each host and wait for information, but instead I just run my Mcollective check that goes out to each box and performs all monitoring checks in real time to see if there is some failure*. Within 30 seconds, I am confident that I can rule out the two main causes of trouble – configuration drift and network/host level issues – and concentrate on the application itself. In the past it might have taken tens of minutes just to ascertain system state, which was most likely the culprit in the outage anyway.

When I’m asked why both Puppet and Mcollective are needed, I use the following shopping analogy to explain the relationship:

Puppet is the weekly shopping trip where you buy necessities and follow a list to ensure you have a well-stocked pantry of basic ingredients and everything you need for dinner.

Mcollective is the quick run to the store to pick up a wine to complement dinner.

The food is great, but the wine puts it over the top; the wine, while certainly nice by itself, lacks the foundation of a good meal.

Mcollective now handles deploys of software, monitoring checks, audits and many other functions on my company’s infrastructure when immediate action is required and is itself installed and configured by Puppet.  It does require a significant upfront investment in time and a change in the way you think about processing requests, but is, in my opinion, necessary to grow your infrastructure and be responsive to business needs.

* For a speed reference: on part of my company’s infrastructure, I can run approximately 1736 monitoring checks across 129 hosts in the following time:

Finished processing 129 / 129 hosts in 3411.43 ms
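The serial-versus-parallel difference behind those numbers can be sketched with plain Ruby threads (the check here is a stub that just sleeps; MCollective itself fans work out over a message bus rather than opening connections from one box):

```ruby
# Stub check: pretend each host takes about 10ms to answer.
def run_check(host)
  sleep 0.01
  [host, "OK"]
end

hosts = (1..129).map { |i| "node#{i}" }

# Run every check at once and collect the results; total wall time is
# roughly one check, not 129 of them.
RESULTS = hosts.map { |h| Thread.new { run_check(h) } }.map(&:value)
```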

How to monitor like a grown up


Go to your monitoring system right now.

What color is everything?

If you even have to look, you need to rethink how you’re monitoring.

The answer is that everything is green or acknowledged.

Here are my rules for making monitoring useful again by monitoring like an adult.

Monitoring is configured automatically

Monitoring configurations should be generated on the fly when a node is added to the pool of available servers. It helps if servers can be tagged as non-operational so that they do not alert until they are added to the pool. I prefer configurations generated by a version-controlled configuration management system over hand-edited files that are merely kept under version control.
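With the node list already in the configuration management system, generating the monitoring config is just templating; a minimal sketch using Ruby’s standard ERB (the host-block format follows Nagios/Icinga convention, and the node list is invented):

```ruby
require "erb"

# Render one Icinga/Nagios host block per node, the way a config
# management run would regenerate the file whenever the pool changes.
TEMPLATE = ERB.new(<<~'TPL', trim_mode: "-")
  <%- nodes.each do |n| -%>
  define host {
    use       generic-host
    host_name <%= n[:name] %>
    address   <%= n[:ip] %>
  }
  <%- end -%>
TPL

def render_hosts(nodes)
  TEMPLATE.result_with_hash(nodes: nodes)
end
```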

Stats collection and monitoring should be separate

Move the statistic collection jobs out of monitoring and let them be handled by something else specific to that task. This makes it easier to turn off any checks that cause trouble on either part of the system.

Checks are periodically culled based on usefulness

“If it can’t stay green, it’s gone”. This step is probably the second most useful of all. Nothing defeats the purpose of monitoring more than false positives. Alerts start getting ignored if a specific check gets a reputation as less than reliable. This slowly undermines confidence in the system as a whole. I have seen many environments where a system was implemented, then utterly ignored to the detriment of all.
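One way to make the culling mechanical rather than a judgment call is to track each check’s recent history and drop anything that cannot hold a green ratio (the 99% threshold here is an invented example):

```ruby
# Decide whether a check has earned its keep. history is an array of
# recent results, true meaning green.
def keep_check?(history, threshold = 0.99)
  return true if history.empty?   # no data yet, give it a chance
  history.count(true).fdiv(history.size) >= threshold
end
```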

End-to-end checks are only useful once every step along the way is already monitored, and should be implemented last

While end-to-end checks can be very useful, they can drive some extremely poor decision making if there is no way to figure out why they are failing. People tend to latch on to a few key metrics and drive decisions from the ones they see frequently. If the end-to-end check slows down because the monitoring box is out of memory, then no number of nodes thrown into the service cluster will improve it. Make sure that every step along the way is monitored and has performance data, or you will end up repeating some of my most regrettable moments. Push hard to implement end-to-end checks last.

Dependencies, escalation paths and response times are clear and reasonable

This is the softest and most important bit of monitoring. If there is a NOC, it should be clear how to escalate issues; otherwise, if one of four nodes is down and your end-to-end check is fine, handle it at a reasonable hour. Also, the development team needs to be part of the escalation procedure. Handling non-critical services in a non-critical manner means that more energy can be focused on revenue-impacting outages.

Like many worthwhile things, you only get out of monitoring what you put into it. View it as an investment in helping you and your team pinpoint problems quickly and efficiently, not as a means to CYA.

Puppet – so now what?? (Part 1 – Git It)


Keep your puppet manifests under some sort of source code management.
There I said it.
Rolling back will save your bacon at least once, probably more than that.

Here’s how to set up puppet under git on a remote host.

Install gitosis (a tool for easily managing git repos). The installation creates the user gitosis with home directory /srv/gitosis:

sudo apt-get install gitosis
cp ~/.ssh/ /tmp/.a
chmod 644 /tmp/.a
sudo su - gitosis -c "gitosis-init  < /tmp/.a"
rm /tmp/.a

Generate an ssh key as the user puppet on your puppet master

sudo su - puppet
ssh-keygen

Gitosis is configured as a git repository, so checkout the admin repo and add in the puppet user and the puppet project.

git clone gitosis@:gitosis-admin.git
cd gitosis-admin/
cat << EOF >> gitosis.conf

[group puppetmasters]
members = chris@chimp puppet@PUPPET_MASTER
writable = puppet
EOF

Copy over the puppet key /home/puppet/.ssh/ into the keydir and push the changes to git:

scp puppet@puppet:/home/puppet/.ssh/ keydir/
git commit -a -m "puppet added to gitosis"
git push

#make sure /etc/puppet is owned by the user puppet on the puppet master

sudo chown -R puppet:puppet /etc/puppet
cd /etc/puppet
git init
git remote add origin gitosis@:puppet.git
git add *
git commit -m "initial add"
git push origin master:refs/heads/master

Finally, add the following to the crontab on your puppet master to make sure changes get checked out every two minutes:

*/2 * * * * cd /etc/puppet && /usr/bin/git pull origin master:refs/heads/master > /tmp/puppetmaster.log 2>&1

This might seem like a lot of trouble to go to, but it will prove useful in the future, especially when I get to the subjects of environments and testing.

Git will also prove useful when managing large file trees and can also be used as a puppet provider.

Puppet – so now what?? (Introduction)


After installing puppet a lot of people ask – “Now what?”

My plan is to write a few posts on things that I have found useful in puppet other than the actual configuration of your nodes.

These are things that have saved me trouble and/or made my infrastructure run more smoothly.

Better EC2 facts for Puppet


I didn’t like the facts that come with standard Facter for EC2, so I wrote a custom fact plugin that returns more detailed information.

It’s available on my GitHub

Sample output is below

ec2_ami_id => ami-cdXXXXXX
ec2_ami_launch-index => 0
ec2_ami_manifest-path => myamis/lenny-XXXXXXX-x86-20101207.manifest.xml
ec2_ancestor_ami-ids => ami-XXXXXXXX,ami-XXXXXXXXXX
ec2_block_device-mapping_ami => sda1
ec2_block_device-mapping_ephemeral0 => sda2
ec2_block_device-mapping_root => /dev/sda1
ec2_block_device-mapping_swap => sda3
ec2_hostname =>
ec2_instance_action => none
ec2_instance_id => i-XXXXXX
ec2_instance_type => m1.small
ec2_kernel_id => aki-XXXXX
ec2_local_hostname =>
ec2_local_ipv4 =>
ec2_placement_availability_zone => us-west-1b
ec2_profile => default-paravirtual
ec2_public_hostname =>
ec2_public_ipv4 => XXX.XXX.XXX.XXX
ec2_ramdisk_id => ari-XXXXXX
ec2_reservation_id => r-XXXX
ec2_security_groups => default,application1,application2
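Under the hood, a fact like these boils down to one HTTP GET against the EC2 instance metadata service plus a naming convention. Here is a sketch of the fetch half (the helper name is mine; in the actual plugin each value is wrapped in Facter’s `Facter.add`/`setcode`):

```ruby
require "net/http"

# Fetch one key from the EC2 instance metadata service at its well-known
# link-local address; returns nil when not running on EC2.
def ec2_metadata(key)
  Net::HTTP.start("169.254.169.254", 80, open_timeout: 1, read_timeout: 1) do |http|
    res = http.get("/latest/meta-data/#{key}")
    res.is_a?(Net::HTTPSuccess) ? res.body : nil
  end
rescue StandardError
  nil
end

# In the plugin this becomes, roughly:
#   Facter.add("ec2_instance_type") { setcode { ec2_metadata("instance-type") } }
```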

Boys in J-town



Time for New Year’s shopping

Puppet Syntax Highlighting


To get nicely formatted Puppet code in your blog like the following:

class test1 {
  file { "/tmp/test":
    ensure  => present,
    owner   => "chris",
    require => Package["test1"],
  }
}

1) Install the WP-GeSHi-Highlight plugin in your WordPress

2) Find your geshi directory

find /usr -name systemverilog.php

3) Download the puppet.php syntax file to the same directory


4) To use it, add the following tag before your puppet code:

<pre lang="puppet">

and close it with </pre>