Chris's Blog

Devops Shokunin

Opensource Infrastructure Revisited

In a previous article, I detailed the open source projects that I used to implement a PaaS infrastructure.

Since then, the number of instances in the infrastructure has grown 2.5x, and several of the components needed to be rethought.

Capacity/Performance Management

Previous: Collectd/Visage
Replacement: Collectd/Graphite
Reasons: The collectd backend was too slow and I/O heavy
Graphite graphs are easy to embed in dashboard applications
Ability to easily transform metrics, such as average CPU across a cluster of servers
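
As a rough illustration of the metric transformation mentioned above, the sketch below builds a Graphite render URL that averages one metric across a cluster using Graphite's averageSeries function. The host name and the collectd metric path are hypothetical placeholders, not the ones used in this infrastructure.

```ruby
require "uri"

# Build a Graphite render URL that averages one metric across many hosts.
# graphite_host and metric_glob are placeholders; substitute your own values.
def cluster_average_url(graphite_host, metric_glob, from = "-1h")
  query = URI.encode_www_form(
    "target" => "averageSeries(#{metric_glob})", # one averaged series for all matches
    "from"   => from,                            # relative time window
    "format" => "json"                           # JSON is easy for dashboards to consume
  )
  "http://#{graphite_host}/render?#{query}"
end

# Hypothetical metric path for user CPU on every web server.
puts cluster_average_url("graphite.example.com", "collectd.web*.cpu-0.cpu-user")
```

A dashboard can request this URL directly, which is part of what makes the graphs easy to embed.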

Continuous Integration

Previous: Selenium
Replacement: Custom Tests
Reasons: Selenium tests failed too often for no discernible reason
False positives slowed development too often

Log Collection

Previous: Rsyslog/Graylog2
Replacement: Logstash/ElasticSearch/Kibana
Reasons: MongoDB was too slow in EC2 for storing and searching logs
Logstash offers better parsing and indexing of logs with powerful filters
ElasticSearch is very fast and scales horizontally on EC2
Kibana is simple to use and allows developers to quickly find the relevant information
All of these components are easily integrated into our dashboard application

These changes not only allow the infrastructure to scale, but also provide APIs that allow easy integration with custom dashboards.

Getting Puppet Stats into Graphite


Graphs are awesome.

At work I provide all kinds of graphs to the front end/support teams, and Graphite is rapidly becoming my tool of choice. In the past, I have relied heavily on RRD. However, Graphite's easy-to-use front end, scalability, and ease of data injection are unparalleled.
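
To show what the ease of data injection looks like in practice, here is a minimal sketch that uses Carbon's plaintext protocol: one line per data point, made up of a metric path, a value, and a Unix timestamp. The host and metric name in the usage comment are hypothetical; 2003 is Carbon's default plaintext port.

```ruby
require "socket"

# Format one data point in Carbon's plaintext format: "<path> <value> <timestamp>\n".
def graphite_line(path, value, time = Time.now)
  "#{path} #{value} #{time.to_i}\n"
end

# Open a TCP connection to Carbon, write one data point, and close the socket.
def send_metric(host, path, value, port = 2003)
  TCPSocket.open(host, port) { |sock| sock.write(graphite_line(path, value)) }
end

# Usage (hypothetical host and metric name):
# send_metric("graphite.example.com", "servers.web01.load.shortterm", 0.42)
```

Anything that can open a socket and print a line can feed Graphite this way, with no schema or client library required on the sending side.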

Since Puppet is such a large part of my infrastructure, I want lots of graphs to glance over whenever I suspect a problem. Puppet outputs stats to syslog, and instead of changing that, I decided to pull the syslog data into Graphite.
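
The core of the idea can be sketched in a few lines of plain Ruby: match a Puppet agent line from syslog and turn it into a Graphite data point. The "Finished catalog run in N seconds" message format, the regular expression, and the metric naming scheme are assumptions for illustration, so check them against what your agents actually log. This is not the published code.

```ruby
# Assumed message format; adjust the pattern to match your own syslog output.
RUN_TIME = /puppet-agent\[\d+\]: Finished catalog run in (?<seconds>\d+(?:\.\d+)?) seconds/

# Turn one syslog line into a Graphite plaintext line.
# Returns nil when the line is not a catalog run summary.
def puppet_metric(line, host, now = Time.now)
  match = RUN_TIME.match(line)
  return nil unless match

  # Dots separate path components in Graphite, so flatten them in the host name.
  "puppet.#{host.tr('.', '_')}.catalog_run_seconds #{match[:seconds]} #{now.to_i}"
end

line = "Mar  3 10:15:02 web01 puppet-agent[2112]: Finished catalog run in 12.34 seconds"
puts puppet_metric(line, "web01.example.com")
```

Lines that do not match are simply skipped, so the same parser can be pointed at a full syslog stream.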



My company has allowed me to make this code publicly available on our GitHub site, with a short explanation of how to make it work. It uses Ruby and EventMachine to handle all of the requests and has an example of some simple calculations that can be done to aggregate data.
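
For a sense of what a simple aggregation calculation might look like, the sketch below collapses per-host run times into a few fleet-wide values. It is plain Ruby without EventMachine, and the function name and metric names are hypothetical; it is not taken from the published code.

```ruby
# Collapse per-host samples (host name => run time in seconds) into summary metrics.
def summarize(samples)
  return {} if samples.empty?

  values = samples.values
  {
    "puppet.all.catalog_run_seconds.avg" => (values.sum / values.size.to_f).round(2),
    "puppet.all.catalog_run_seconds.max" => values.max,
    "puppet.all.hosts_reporting"         => values.size
  }
end

# Hypothetical samples collected during one reporting interval.
summary = summarize("web01" => 10.0, "web02" => 14.0, "db01" => 30.0)
summary.each { |path, value| puts "#{path} #{value}" }
```

Each resulting pair can then be sent to Graphite as its own metric, so a single graph shows the health of the whole fleet.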