Chris's Blog

Devops Shokunin

How-To: Mcollective/RabbitMQ on Ubuntu

1 ) Install RabbitMQ Prerequisites

apt-get install -y erlang-base erlang-nox

2 ) Install RabbitMQ from the Download Site

dpkg -i rabbitmq-server_2.8.1-1_all.deb

3 ) Enable the STOMP and AMQP client plugins

rabbitmq-plugins enable amqp_client
rabbitmq-plugins enable rabbitmq_stomp

4 ) Create the rabbitmq config file in /etc/rabbitmq/rabbitmq.config

%% Single broker configuration
%% Fill in the first listener address (left empty here) before use.
[
  {rabbitmq_stomp, [{tcp_listeners, [{"", 6163},
                                     {"::1",       6163}]}]}
].

5 ) Create the MCollective Users

rabbitmqctl add_user mcollective PASSWORD
rabbitmqctl set_user_tags mcollective administrator
rabbitmqctl set_permissions -p / mcollective ".*" ".*" ".*"

6 ) Restart RabbitMQ

/etc/init.d/rabbitmq-server restart
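
Before moving on to the MCollective configs, it can help to confirm that the broker accepts the new credentials over STOMP. The following is a minimal sketch using only Ruby's standard library, not part of MCollective itself; the method names are my own, and it assumes the listener on port 6163 and the mcollective user created above.

```ruby
require "socket"

# Build a STOMP 1.0 CONNECT frame: command, headers, blank line, NUL terminator.
def connect_frame(user, password)
  "CONNECT\nlogin:#{user}\npasscode:#{password}\n\n\0"
end

# Hypothetical helper: open a TCP connection, send CONNECT, and report
# whether the broker answered with a CONNECTED frame.
# Host and port defaults mirror the values used in this how-to.
def stomp_login_ok?(user, password, host = "localhost", port = 6163)
  TCPSocket.open(host, port) do |sock|
    sock.write(connect_frame(user, password))
    reply = sock.gets("\0").to_s # read up to the frame terminator
    reply.start_with?("CONNECTED")
  end
end
```

Calling stomp_login_ok?("mcollective", "PASSWORD") on the broker host should return true once the restart has finished; an ERROR frame or a refused connection points back at steps 3 to 6.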

7 ) Edit /etc/mcollective/client.cfg to point to the local host

# Example config

topicprefix = /topic/
main_collective = mcollective
collectives = mcollective
libdir = /usr/share/mcollective/plugins
logfile = /dev/null
loglevel = info

# Plugins
securityprovider = psk
plugin.psk = unset

connector = stomp
plugin.stomp.host = localhost
plugin.stomp.port = 6163
plugin.stomp.user = mcollective
plugin.stomp.password = PASSWORD

# Facts
factsource = yaml
plugin.yaml = /etc/mcollective/facts.yaml

8 ) Edit /etc/mcollective/server.cfg to point to the localhost

# Example server.cfg
topicprefix = /topic/
main_collective = mcollective
collectives = mcollective
libdir = /usr/share/mcollective/plugins
logger_type = syslog
loglevel = info
daemonize = 1
registerinterval = 30
classesfile = /var/lib/puppet/state/classes.txt

# Plugins
securityprovider = psk
plugin.psk = unset

connector = stomp
plugin.stomp.host = localhost
plugin.stomp.port = 6163
plugin.stomp.user = mcollective
plugin.stomp.password = PASSWORD

# Facts
factsource = yaml
plugin.yaml = /etc/mcollective/facts.yaml

Getting Puppet Stats into Graphite


Graphs are awesome.

At work I provide all kinds of graphs to the front end/support teams and Graphite is rapidly becoming my tool of choice.  In the past, I have relied heavily on RRD.  However, Graphite’s easy to use front end, scalability and ease of data injection are unparalleled.

Since Puppet is such a large part of my infrastructure, I want lots of graphs to glance over when I think there is a problem.  Puppet outputs stats to syslog and, instead of changing that, I decided to pull syslog into Graphite.



My company has allowed me to make this code publicly available on our GitHub site, with a short explanation on how to make it work.  It uses Ruby and EventMachine to handle all of the requests and has an example of some simple calculations that can be done to aggregate data.
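
To give a feel for the moving parts, here is a minimal sketch of the idea using plain sockets rather than EventMachine. It is not the published code. The log message pattern, the metric path and the Graphite host name are assumptions for illustration; the line format is Graphite's plaintext protocol (path, value, epoch seconds).

```ruby
require "socket"

# Assumed shape of the syslog message; adjust the pattern to your Puppet version.
RUN_TIME = /Finished catalog run in (\d+(?:\.\d+)?) seconds/

# Build one line of Graphite's plaintext protocol: "<path> <value> <epoch>\n".
def graphite_line(path, value, time = Time.now)
  "#{path} #{value} #{time.to_i}\n"
end

# Turn a syslog message into a Graphite line, or nil when it is not a run summary.
# Use a short host name: dots in it would create extra path levels in Graphite.
def puppet_metric(host, message, time = Time.now)
  match = RUN_TIME.match(message)
  return nil unless match
  graphite_line("puppet.#{host}.catalog_run_seconds", match[1].to_f, time)
end

# Ship a line to carbon. The host is hypothetical; 2003 is carbon's default
# plaintext port.
def send_to_graphite(line, host = "graphite.example.com", port = 2003)
  TCPSocket.open(host, port) { |sock| sock.write(line) }
end
```

Aggregation (averages across hosts, run counts per minute and so on) would sit between puppet_metric and send_to_graphite.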

Tech Talk Slides

Slides from tonight’s tech talk “Getting to Push Button Deploys”


Thanks to everyone who attended.

The Priority 0 Rule

Many years ago when I worked for a Japanese shipping company, Taga-san bought a little red rubber stamp that said “urgent”.  She would stamp documents with it to try and garner more attention.  Soon almost every document had the “urgent” stamp, so the diligent Taga-san went out and bought a “very urgent” stamp.  This too began to appear on documents more and more frequently.

It was only when I joked that I had visited the Kinokuniya stationery store and noticed they were “having a sale on super duper special urgent stamps”, and she was confronted with tears of laughter streaming down her co-workers’ faces, that Taga-san backed down with the stamps.


I apologized for the cruel joke with some very expensive cake from a local bakery and she told me that she was often frustrated trying to communicate her priorities to other people.  So for about $5 in rubber stamps and $25 in cake we learned a valuable lesson – setting priorities is difficult and communicating them is even more difficult.


In my current position as “Operations Team”, I bounce around many different things every day.  Communicating my priorities is still extremely difficult.  Inspired by the Ironport Rule 0, “Don’t do anything stupid”, I have my own Priority 0:


Priority 0: Production Works


Simple, but not easy.

Priority 0 is only one thing and it never changes =>  Our production environment is running and earning revenue for the company.

  • All requests get dropped at a moment’s notice for any Priority 0 issue.
  • All projects get delayed for Priority 0 issues.
  • All other issues are resolved after Priority 0 issues.


Whiz-bang new features, flavor of the month data store test servers and staging issues all have to take a back seat to Priority 0.


Priority 0 costs vary by company.  I’ve worked at places where Priority 0 cost 90% of my time and places where it’s been as low as 20%.  It’s never free and rarely taken into consideration, but setting it and making sure it’s understood by others is critical.


Taga-san, moshiwake gozaimasen



Load Balancing Puppet with Nginx


Due to the holidays, I’ve had to add a large number of new nodes to our infrastructure. This started putting too much CPU and memory load on the puppet master. Instead of moving to a larger instance, I looked to spread out to multiple boxes.

This presented three problems: how the ops team could run tests against their own environments, how to handle the revocation and issuance of certs, and how to keep the manifests on the backends in sync.

Using nginx as a software load balancer solved all of these issues.

After talking with an ex-colleague ( I owe you some ramen eric0 ) I took a closer look at the URL paths being requested by the puppet clients.

Certificate requests start with /production/certificate so get routed to the puppet instance that only serves up certificates.

 - - [14/Nov/2011:20:02:03 +0000] "GET /production/certificate/ HTTP/1.1" 404 60 "-" "-"

Each ops team member has their own environment for testing and the URLs start with the environment name.

 - - [14/Nov/2011:17:24:02 +0000] "GET /chris/file_metadata/modules/unixbase/fixes/file.conf HTTP/1.1" 200 330 "-" "-"

Everything else gets routed to a group of puppet backend servers.
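
The routing decision itself is small. As a rough illustration only (the real work is done by nginx location rules, not Ruby), the three cases above can be sketched like this. The pool names and the environment list are hypothetical, and sending per-person environments to the ops dev server is my reading of the setup rather than something copied from the config.

```ruby
# Hypothetical list of per-person test environments.
OPS_ENVIRONMENTS = %w[chris].freeze

# Map a request path to the name of the pool that should serve it.
def backend_for(path)
  first_segment = path.split("/")[1].to_s
  if path.start_with?("/production/certificate")
    :certificate_authority # the single instance that signs and revokes certs
  elsif OPS_ENVIRONMENTS.include?(first_segment)
    :ops_dev               # personal environments used for testing
  else
    :puppet_backends       # everything else goes to the balanced group
  end
end
```

The order of the checks matters: the certificate prefix has to be tested before the generic fallback, which is the same reason the more specific location blocks come first when reading an nginx config.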

The full nginx.conf file is available from GitHub.

Configurations are tested on the ops dev server then checked into a git repo that is pulled by all of the puppet backend servers.

Mcollective Use Case – Operational Dashboard

I’ve been asked a few times about use cases for mcollective.


One of the biggest wins at my company has been using mcollective to build “Oppy”, an operational dashboard. Oppy allows developers and support staff to perform deploys on staging servers as well as to audit and monitor client environments in real time.  Developers and support staff do not have direct access to production or staging environments for a variety of reasons, so it was necessary to provide a tool that could quickly and efficiently provide all of the information and access that these teams require.





  1. Deployment – Developers can deploy code to staging environments by connecting to an mcollective agent that installs the latest gem packaged version of our software, cleans out the old versions and restarts the application.
  2. Auditing – Support and Developers can run auditing scripts on all nodes in a certain class and check to make sure that software versions, monitoring settings and software settings are as expected.
  3. Corrective Actions – Support and Developers can flush the varnish caches by triggering a run of a varnish agent or reset the application on demand.
  4. Debugging – Developers can run an agent to turn on debug logging for the application and collect the logs by pulling them from the centralized logging server.




Oppy was built on top of Sinatra, which allows for extremely rapid development of basic web applications.   A simple example of using Sinatra to run mcollective agents is on my Github account.

The screenshot above shows the results of running the nrpe agent, which runs every single monitoring check on a group of hosts and reports back.

All actions are logged, so that there is an audit trail.


Plotting Time Series Data with Gnuplot


When dealing with external customers and non-technical people I find it beneficial to provide some sort of visual representation.

Dumping a ton of data on people rarely conveys the message effectively.

My go to tool for generating graphs, especially of time-series data, is gnuplot.  It’s free, flexible and runs everywhere.

Data files httpa.reqs and httpb.reqs are comma separated with the first column as time in epoch seconds and the second the captured value.
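
If you want to try the plot without real traffic data, a file in that shape is easy to fake. This is a small sketch with made-up sample values; the to_reqs helper is a hypothetical name of my own.

```ruby
# Format [Time, value] samples as "epoch_seconds,value" rows, one per line.
def to_reqs(samples)
  samples.map { |time, value| "#{time.to_i},#{value}\n" }.join
end

# Made-up samples, one minute apart.
samples = [
  [Time.at(1296172800), 120],
  [Time.at(1296172860), 135],
  [Time.at(1296172920), 128]
]

puts to_reqs(samples)
```

Redirect the output to httpa.reqs (and again, with different values, to httpb.reqs) to get files the script below can read.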


Create the following file and save it as http.gnuplot

set datafile separator ","
set terminal png size 900,400
set title " Web Traffic"
set ylabel "Requests per second"
set xlabel "Date"
set xdata time
set timefmt "%s"
set format x "%m/%d"
set key left top
set grid
plot "httpa.reqs" using 1:2 with lines lw 2 lt 3 title 'hosta', \
     "httpb.reqs" using 1:2 with lines lw 2 lt 1 title 'hostb'

Generate the graph and save it

gnuplot < http.gnuplot  > requests.png

The breakdown of the lines is as follows:

set datafile separator ","

Set the field delimiter to a comma; the default is whitespace. Leave this line out for space separated data.

set terminal png size 900,400

Have gnuplot output a PNG file with the size specified. You can also run gnuplot in interactive mode, where you do not need this line.

set title " Web Traffic"

Set the title of the graph at the top.

set ylabel "Requests per second"
set xlabel "Date"

Always label your graph so that when it gets passed around to people unfamiliar with the history of the request it is readily apparent what is going on. This will also keep my high school math teacher quiet.

set xdata time

Tell gnuplot that the x axis will be time data. This allows for more flexible time series manipulations.

set timefmt "%s"

Let gnuplot know what format the time string will be in. “%s” is epoch seconds, “%m/%d/%Y:%H:%M:%S” will read 01/28/2011:00:01:14 in as a time value. Not having to convert date formats with some script first on my data is one of the big wins from using gnuplot.

set format x "%m/%d"

Set the date output format for the x axis.

set key left top

Set the legend to the top left corner.

set grid

Turn on grid lines, they are off by default.

plot "httpa.reqs" using 1:2 with lines lw 2 lt 3 title 'hosta', \
     "httpb.reqs" using 1:2 with lines lw 2 lt 1 title 'hostb'

1:2 are the field numbers. The first is the x-axis value and the second the y-axis value.

“with lines” uses lines instead of simply data points.

lw is the line width, with 1 being the default.

lt is the line type or color; gnuplot will pick colors for you automatically if you do not specify them.

title is for the legend.

You can plot multiple data sources on the same graph.

The gnuplot documentation is available here

An excellent gallery of the amazing capabilities of gnuplot is here

Packaging – Deploying Ruby Applications Like an Adult – Part 2

Continuing from Part 1

Build gems!


It’s not that hard and your efforts will be rewarded.

Here are my arguments for learning packaging


What’s running on your system now?

When you’re running hundreds of servers you need a programmatic way of auditing what is running on your systems. Compiling from source will not give you this for every single package. Git is wonderful as a source code management system, but did a4f85e72894895a8269d65cb3fa2ab012804d3ef come before or after aa7c72e6a15ae37db7beb6450f4db3d30069a7dd? And what developer or product manager would be able to give you a git hash for the version they want running on production? Even with tags, going back and forth is hard.
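
Gem versions, by contrast, have an order that both people and scripts can reason about. A tiny sketch, with made-up version numbers and a hypothetical helper name, using the Gem::Version class that ships with RubyGems:

```ruby
require "rubygems"

# True when the candidate version is newer than what is deployed.
# Gem::Version compares segment by segment, so 1.10.0 sorts after 1.9.3
# even though a plain string comparison would say otherwise.
def newer?(candidate, deployed)
  Gem::Version.new(candidate) > Gem::Version.new(deployed)
end

puts newer?("1.10.0", "1.9.3")
```

The same comparison is what makes an audit script across hundreds of nodes possible: collect the installed version from each node and flag anything older than the release you expect.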

Are all of your dependencies met and consistent?

What if some dependency of a dependency is updated causing a bug? Deploying from source code and running bundler to handle dependencies means you might have different gem versions running between the time you brought up the original server to when you added a new node into the cluster. It happens and it is very time consuming to troubleshoot.

How long does it take you to deploy an application?

It takes me 20 seconds to release across a 100 node cluster. On my old system it could take up to 10 minutes to download and install all of the dependencies, and there were plenty of failures due to network issues or rate-limiting from the upstream server. Internal and external customers don’t do delayed gratification.

Can I give a gem version to a developer and be sure they’re running what’s on production so they can troubleshoot?



I’m still waiting for a good argument against packaging.

There are excellent gem tutorials available

Example Gemspec file for building
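
For readers who have not written one before, a gemspec is just a small Ruby file. The sketch below is not the file used in production; the gem name, version, author, file globs and the pinned dependency are all placeholders chosen for illustration.

```ruby
require "rubygems"

# Placeholder values throughout; swap in your own application details.
EXAMPLE_SPEC = Gem::Specification.new do |s|
  s.name        = "myapp"
  s.version     = "1.4.2"
  s.summary     = "Example application packaged as a gem"
  s.authors     = ["Ops Team"]
  s.files       = Dir["lib/**/*.rb"] + Dir["bin/*"]
  s.executables = ["myapp"]
  # Pin dependencies exactly so every node runs the same versions.
  s.add_dependency "sinatra", "= 1.3.2"
end
```

Saved as myapp.gemspec, a file like this is what gem build reads to produce the versioned .gem artifact that gets deployed.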

Before building the gem, I take another step and use

bundle install --deployment

This downloads all of the gems and compiles all of the extensions necessary to run the gem into the vendor directory. Now when you start your application with

bundle exec START_COMMAND

it will use only those gems in the vendor folder. You can view the full Rakefile here

Deploying Ruby Applications Like an Adult

“Push button deploy” is something I often hear people request or mention as something they would like to have. What’s more important, in my opinion, is to provide a reliable and scalable system for both internal and external developers to deploy code to the staging environment for clients to QA. Staging deployments should be as simple as possible. Production releases are slightly more complicated as Operations needs to carefully monitor and coordinate with external parties, but should still use the same base system.

Requirements for a deployment system


  • Deploying a package to 1 server or 100 servers should take the same amount of time and effort.


  • Deploy only sanity checked code.
  • Break loudly.
  • Fit with developer culture.


  • Everyone likes things to happen quickly.
  • Clients don’t do delayed gratification.


  • What is running right now?
  • How can I trace back to a change in the source code?
  • Is what I think really running?
  • Logs, logs, logs


  • It’s Ops, so no one else will be available at 3am to fix it
  • Have to be able to quickly troubleshoot


  • Requirements will change over time.
  • Owned by operations, so changes can be separate from production releases.


Here is what I came up with:

The criteria for the components chosen are described here
The next posts will go into more detail on individual components.

DNSRR – rewriting DNS for testing web applications

When testing web applications, it is often necessary to rewrite DNS entries to avoid XSS JavaScript warnings.

Building on Rubydns, my company has open sourced a quick Ruby script to easily rewrite DNS queries for web testing.

Available on Github