Chris's Blog

Devops Shokunin

How-To: Mcollective/RabbitMQ on Ubuntu


1 ) Install RabbitMQ Prerequisites

apt-get install -y erlang-base erlang-nox

2 ) Install RabbitMQ from the Download Site

dpkg -i rabbitmq-server_2.8.1-1_all.deb

3 ) Enable the STOMP and AMQP plugins

rabbitmq-plugins enable amqp_client
rabbitmq-plugins enable rabbitmq_stomp

4 ) Create the rabbitmq config file in /etc/rabbitmq/rabbitmq.config

%% Single broker configuration

[
  {rabbitmq_stomp, [{tcp_listeners, [{"127.0.0.1", 6163},
                                     {"::1",       6163}]}]}
].

5 ) Create the MCollective Users

rabbitmqctl add_user mcollective PASSWORD
rabbitmqctl set_user_tags mcollective administrator
rabbitmqctl set_permissions -p / mcollective ".*" ".*" ".*"

(The three ".*" patterns give the mcollective user full configure, write and read permissions on the default vhost.)

6 ) Restart RabbitMQ

/etc/init.d/rabbitmq-server restart

7 ) Edit /etc/mcollective/client.conf to point to the local host

# Example config

topicprefix = /topic/
main_collective = mcollective
collectives = mcollective
libdir = /usr/share/mcollective/plugins
logfile = /dev/null
loglevel = info

# Plugins
securityprovider = psk
plugin.psk = unset

connector = stomp
plugin.stomp.host = localhost
plugin.stomp.port = 6163
plugin.stomp.user = mcollective
plugin.stomp.password = PASSWORD

# Facts
factsource = yaml
plugin.yaml = /etc/mcollective/facts.yaml
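The facts file referenced by plugin.yaml is plain YAML key/value pairs, one fact per line; a minimal example (these keys are made up for illustration):

```yaml
---
country: us
datacenter: east1
role: webserver
```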

8 ) Edit /etc/mcollective/server.cfg to point to the localhost

# Example server.cfg
topicprefix = /topic/
main_collective = mcollective
collectives = mcollective
libdir = /usr/share/mcollective/plugins
logger_type = syslog
loglevel = info
daemonize = 1
registerinterval = 30
classesfile = /var/lib/puppet/state/classes.txt

# Plugins
securityprovider = psk
plugin.psk = unset

connector = stomp
plugin.stomp.host = localhost
plugin.stomp.port = 6163
plugin.stomp.user = mcollective
plugin.stomp.password = PASSWORD

# Facts
factsource = yaml
plugin.yaml = /etc/mcollective/facts.yaml

Load Balancing Puppet with Nginx


Due to the holidays, I’ve had to add a large number of new nodes to our infrastructure. This started putting too much CPU and memory load on the puppet master. Instead of moving to a larger instance, I looked to spread out to multiple boxes.

This presented the problem of how the ops team could run tests against their own environments, how to handle the revocation and issuance of certs, and how to keep the manifests on the backends in sync.

Using nginx as a software load balancer solved all of these issues.

After talking with an ex-colleague (I owe you some ramen, eric0) I took a closer look at the URL paths being requested by the puppet clients.

Certificate requests start with /production/certificate and get routed to the puppet instance that only serves up certificates:

 - - [14/Nov/2011:20:02:03 +0000]
  "GET /production/certificate/ HTTP/1.1" 404 60 "-" "-"

Each ops team member has their own environment for testing, and those URLs start with the environment name:

 - - [14/Nov/2011:17:24:02 +0000]
 "GET /chris/file_metadata/modules/unixbase/fixes/file.conf HTTP/1.1" 200 330 "-" "-"

Everything else gets routed to a group of puppet backend servers.
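The routing described above can be sketched in nginx roughly like this (the upstream names, addresses, and the list of per-engineer environments are illustrative, not from the real config):

```nginx
upstream puppet_certs    { server; }
upstream puppet_dev      { server; }
upstream puppet_backends { server; server; }

server {
    listen 8140;

    # Certificate requests go to the instance that only serves certs
    location ^~ /production/certificate {
        proxy_pass http://puppet_certs;
    }

    # Per-engineer environments go to the ops dev server
    location ~ ^/(chris|eric)/ {
        proxy_pass http://puppet_dev;
    }

    # Everything else is spread across the backend pool
    location / {
        proxy_pass http://puppet_backends;
    }
}
```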

The full nginx.conf file is available from GitHub.

Configurations are tested on the ops dev server then checked into a git repo that is pulled by all of the puppet backend servers.

Packaging – Deploying Ruby Applications Like an Adult – Part 2


Continuing from Part 1

Build gems!


It’s not that hard and your efforts will be rewarded.

Here are my arguments for learning packaging:


What’s running on your system now?

When you’re running hundreds of servers you need a programmatic way of auditing what is running on them, and compiling from source will not give you that for every single package on your system. Git is a wonderful source code management system, but did a4f85e72894895a8269d65cb3fa2ab012804d3ef come before or after aa7c72e6a15ae37db7beb6450f4db3d30069a7dd? And what developer or product manager could give you a git hash as the version they want running in production? Even with tags, going back and forth is hard.

Are all of your dependencies met and consistent?

What if some dependency of a dependency is updated causing a bug? Deploying from source code and running bundler to handle dependencies means you might have different gem versions running between the time you brought up the original server to when you added a new node into the cluster. It happens and it is very time consuming to troubleshoot.

How long does it take you to deploy an application?

It takes me 20 seconds to release across a 100 node cluster. On my old system it could take up to 10 minutes to download and install all of the dependencies, and there were plenty of failures due to network issues or rate-limiting from the upstream server. Internal and external customers don’t do delayed gratification.

Can I give a gem version to a developer and be sure they’re running what’s on production so they can troubleshoot?



I’m still waiting for a good argument against packaging.

There are excellent gem tutorials available

Example Gemspec file for building
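As a sketch of what such a gemspec looks like (the gem name "myapp", its version, and the file list are placeholders, not taken from the post):

```shell
# Write out a minimal gemspec; the vendor/ glob pulls in the bundled
# dependencies produced by "bundle install --deployment" below.
cat > myapp.gemspec << 'EOF'
Gem::Specification.new do |s|
  s.name        = 'myapp'
  s.version     = '0.1.0'
  s.summary     = 'Example application packaged as a gem'
  s.authors     = ['Ops Team']
  s.files       = Dir['lib/**/*.rb'] + Dir['vendor/**/*']
  s.executables = ['myapp']
end
EOF
```

Running `gem build myapp.gemspec` would then produce a versioned myapp-0.1.0.gem ready to push to an internal gem server.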

Before building the gem, I take another step and use

bundle install --deployment

this downloads all of the gems and compiles all of the extensions necessary to run the application into the vendor directory. Now when you start your application with

bundle exec START_COMMAND

it will use only those gems in the vendor folder. You can view the full Rakefile here

Deploying Ruby Applications Like an Adult


“Push button deploy” is something I often hear people requesting, or mentioning as something they would like to have. What’s more important, in my opinion, is to provide a reliable and scalable system for both internal and external developers to deploy code to the staging environment for clients to QA. Staging deployments should be as simple as possible. Production releases are slightly more complicated, as Operations needs to carefully monitor and coordinate with external parties, but they should still use the same base system.


Requirements for a deployment system


  • Deploying a package to 1 server or 100 servers should take the same amount of time and effort.


  • Deploy only sanity checked code.
  • Break loudly.
  • Fit with developer culture.


  • Everyone likes things to happen quickly.
  • Clients don’t do delayed gratification.


  • What is running right now?
  • How can I trace back to a change in the source code?
  • Is what I think really running?
  • Logs, logs, logs


  • It’s Ops, so no one else will be available at 3am to fix it
  • Have to be able to quickly troubleshoot


  • Requirements will change over time.
  • Owned by operations, so changes can be separate from production releases.


Here is what I came up with:


The criteria for the components chosen are described here.
The next posts will go into more detail on individual components.

DNSRR – rewriting DNS for testing web applications


When testing web applications, it is often necessary to rewrite DNS entries to avoid XSS Javascript warnings.

Building on RubyDNS, my company has open sourced a quick Ruby script to easily rewrite DNS queries for web testing.

Available on Github

Dashboard Example with Sinatra and Mcollective


Having a dashboard to provide real time data to users helps minimize interruptions at work.

The combination of Sinatra handling the incoming HTTP requests and Mcollective pulling real time data from the infrastructure provides the responsiveness and self-service that saves everyone time and effort.

The example code is available on Github

Here are the screen shots running on my internal network.

Welcome Screen

Filtering Form

Results from Monitoring Agent

Results from Puppetd Agent

How to monitor like a grown up


Go to your monitoring system right now.

What color is everything?

If you even have to look, you need to rethink how you’re monitoring.

The answer is that everything is green or acknowledged.

Here are my rules for making monitoring useful again by monitoring like an adult.

Monitoring is configured automatically

Monitoring configurations should be generated on the fly when a node is added to the pool of available servers. It helps if servers can be tagged as non-operational so that they do not alert until they are added to the pool. I prefer configurations generated by a version-controlled configuration management system over hand-edited files that are merely version controlled.
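One common way to generate checks automatically is Puppet's exported resources (a sketch; the check name and service details are examples): each node exports a Nagios service describing itself, and the monitoring server collects every exported check into its config.

```puppet
# On every node: export a check for this host into the stored-config DB
@@nagios_service { "check_ssh_${::fqdn}":
  check_command       => 'check_ssh',
  host_name           => $::fqdn,
  service_description => 'SSH',
  use                 => 'generic-service',
}

# On the monitoring server: collect all exported checks
Nagios_service <<| |>>
```

Because the checks come from the same system that configures the node, a node that has not yet been added to the pool simply has nothing exported for it.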

Stats collection and monitoring should be separate

Move the statistic collection jobs out of monitoring and let them be handled by something else specific to that task. This makes it easier to turn off any checks that cause trouble on either part of the system.

Checks are periodically culled based on usefulness

“If it can’t stay green, it’s gone”. This step is probably the second most useful of all. Nothing defeats the purpose of monitoring more than false positives. Alerts start getting ignored if a specific check gets a reputation as less than reliable. This slowly undermines confidence in the system as a whole. I have seen many environments where a system was implemented, then utterly ignored to the detriment of all.

End-to-end checks are useful, but should not be implemented until every step along the way is already monitored

While end-to-end checks can be very useful, if there is no way to figure out why they are failing they can drive some extremely poor decision making. People tend to latch on to a few key metrics and drive decisions from the ones they see frequently. If the end-to-end check slows down because the monitoring box is out of memory, then no number of nodes thrown into the service cluster will improve it. Make sure that every step along the way is monitored and has performance data, or you will end up repeating some of my most regrettable moments. Push hard to implement end-to-end checks last.

Dependencies, escalation paths and response times are clear and reasonable

This is the softest and most important bit of monitoring. If there is a NOC, it should be clear how to escalate issues; otherwise, if one of four nodes is down and your end-to-end check is fine, handle it at a reasonable hour. Also, the development team needs to be part of the escalation procedure. Handling non-critical services in a non-critical manner means that more energy can be focused on revenue-impacting outages.

Like many worthwhile things, you only get out of monitoring what you put into it. View it as an investment in helping you and your team pinpoint problems quickly and efficiently, not as a means to CYA.

Puppet – so now what?? (Part 1 – Git It)


Keep your puppet manifests under some sort of source code management.
There I said it.
Rolling back will save your bacon at least once, probably more than that.

Here’s how you setup puppet under git on a remote host.

Install gitosis (a tool for easily managing git repos). The installation creates the user gitosis with homedir /srv/gitosis.

sudo apt-get install gitosis
cp ~/.ssh/ /tmp/.a
chmod 644 /tmp/.a
sudo su - gitosis -c "gitosis-init < /tmp/.a"
rm /tmp/.a

Generate an ssh key as the user puppet on your puppet master

sudo su - puppet
ssh-keygen -t rsa

Gitosis is configured as a git repository, so checkout the admin repo and add in the puppet user and the puppet project.

git clone gitosis@GIT_SERVER:gitosis-admin.git
cd gitosis-admin/
cat << EOF >> gitosis.conf

[group puppetmasters]
members = chris@chimp puppet@PUPPET_MASTER
writable = puppet
EOF

Copy the puppet user's public key /home/puppet/.ssh/ into the keydir and push the changes to git

scp puppet@puppet:/home/puppet/.ssh/ keydir/
git commit -a -m "puppet added to gitosis"
git push

#make sure /etc/puppet is owned by the user puppet on the puppet master

sudo chown -R puppet:puppet /etc/puppet
cd /etc/puppet
git init
git remote add origin gitosis@GIT_SERVER:puppet.git
git add *
git commit -m "initial add"
git push origin master:refs/heads/master

Finally, add the following to the crontab on your puppet master to make sure changes get checked out every two minutes:

*/2 * * * * cd /etc/puppet && /usr/bin/git pull origin master:refs/heads/master > /tmp/puppetmaster.log 2>&1

This might seem like a lot of trouble to go to, but it will prove useful in the future, especially when I get to the subjects of environments and testing.

Git will also prove useful when managing large file trees and can also be used as a puppet provider.

Puppet – so now what?? (Introduction)


After installing puppet a lot of people ask – “Now what?”

My plan is to write a few posts on things that I have found useful in puppet other than the actual configuration of your nodes.

These are things that have saved me trouble and/or made my infrastructure run more smoothly.

Better EC2 facts for Puppet


I didn’t like the facts that came with the standard Facter for EC2, so I wrote a custom fact plugin that returns more detailed information.

It’s available on my GitHub

Sample output is below

ec2_ami_id => ami-cdXXXXXX
ec2_ami_launch-index => 0
ec2_ami_manifest-path => myamis/lenny-XXXXXXX-x86-20101207.manifest.xml
ec2_ancestor_ami-ids => ami-XXXXXXXX,ami-XXXXXXXXXX
ec2_block_device-mapping_ami => sda1
ec2_block_device-mapping_ephemeral0 => sda2
ec2_block_device-mapping_root => /dev/sda1
ec2_block_device-mapping_swap => sda3
ec2_hostname =>
ec2_instance_action => none
ec2_instance_id => i-XXXXXX
ec2_instance_type => m1.small
ec2_kernel_id => aki-XXXXX
ec2_local_hostname =>
ec2_local_ipv4 =>
ec2_placement_availability_zone => us-west-1b
ec2_profile => default-paravirtual
ec2_public_hostname =>
ec2_public_ipv4 => XXX.XXX.XXX.XXX
ec2_ramdisk_id => ari-XXXXXX
ec2_reservation_id => r-XXXX
ec2_security_groups => default,application1,application2
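The fact names above appear to map straight onto the EC2 metadata paths: an ec2_ prefix, with the first hyphen and every slash turned into underscores. A small sketch of that transform (the naming rule is inferred from the sample output, not taken from the plugin source):

```shell
# Inferred naming rule: add an ec2_ prefix, replace the first '-' and
# every '/' in the metadata path with '_'.
fact_name() {
  echo "$1" | sed -e 's/-/_/' -e 's#/#_#g' -e 's/^/ec2_/'
}

fact_name block-device-mapping/root
fact_name placement/availability-zone
```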