During my 10% time, I created two simple Clojure tools to aid in basic sysadmin tasks. Today I’m open sourcing them on GitHub and Clojars.

parallel-ssh

The first tool I built is a library for running commands in parallel on multiple servers. It takes a Bash command and a comma-separated list of server names to run the command on. Internally, one Clojure agent is spawned per server, and each agent is responsible for running the command and storing the result. After all the agents have completed, or a specified timeout is reached, the agents are dereferenced and their output is returned. Currently, I just shell out to run ssh, and I assume that password-less login is available.
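The agent-per-server approach described above might look roughly like the sketch below. This is not the library's actual source: the function names `ssh-run` and `run-on-servers`, the `:timeout` marker, and the injectable `run-fn` argument are all assumptions made for illustration.

```clojure
(require '[clojure.java.shell :as shell]
         '[clojure.string :as str])

(defn ssh-run
  "Hypothetical runner: shells out to the local ssh binary.
  BatchMode=yes makes ssh fail instead of prompting for a password,
  which matches the password-less login assumption."
  [server command]
  (:out (shell/sh "ssh" "-o" "BatchMode=yes" server command)))

(defn run-on-servers
  "Runs `command` on every server named in the comma-separated string
  `servers-csv`, using one agent per server. Returns a map of
  server name -> result, where the result is :timeout for any agent
  that had not finished within `timeout-ms`."
  [run-fn servers-csv command timeout-ms]
  (let [servers (map str/trim (str/split servers-csv #","))
        ;; Each agent starts in a ::pending state.
        agents  (into {} (for [s servers] [s (agent ::pending)]))]
    ;; send-off is meant for blocking work such as shelling out.
    (doseq [[server a] agents]
      (send-off a (fn [_]
                    (try
                      (run-fn server command)
                      (catch Exception e
                        {:error (.getMessage e)})))))
    ;; Block until every agent is done or the timeout elapses.
    (apply await-for timeout-ms (vals agents))
    ;; Dereference the agents; anything still pending timed out.
    (into {} (for [[server a] agents]
               [server (let [v @a]
                         (if (= v ::pending) :timeout v))]))))
```

With real hosts (the names here are made up), a call could look like `(run-on-servers ssh-run "web1,web2" "df -h" 5000)`. In a standalone script, call `(shutdown-agents)` before exiting so the agent thread pools do not keep the JVM alive.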

This library also has a command line interface:

 

I found it useful to wrap that in a Bash script that would run a command on all of our servers. This is clearly not a replacement for sophisticated server management tools like Puppet, but it is helpful when you quickly want to answer a question such as “How much disk space is free?” or “How many servers is this process running on?”

server-stats


The second tool is a micro-framework built on top of parallel-ssh. Similar to Python’s Fabric, it allows the user to define custom commands to be run on a specified group of servers. It can also react to the results of a command based on custom triggers. Here’s an example configuration file:

First we define the ssh username to be used and our server groupings:

 

Now we can start adding commands. Here we add a command called ‘top’ that will be run only on web-servers and app-servers:

 


Note that the doc string is used in the auto-generated usage page, so you should never have to open the config file to figure out what a command does. We can now run this command from the command line:

We can make things a little more interesting by adding alert triggers. First we need to define an alert handler function. An alert handler takes three arguments: the alert message, the name of the server, and the output from the command that was run. Here we add a handler called ‘email’ that will send us an email when a trigger condition is met:

Now let’s define a command and trigger that will use this alert:

This command has an extra field called ‘alerts’; this is an array of trigger conditions for this command. The command ‘disk’ has only one trigger, which states “when the Use% column of ‘df -ah’ is greater than 85%, send an email with the message ‘Disk space over 85% full’”. Here’s a breakdown of an alert:

  • ‘column’ is used for commands that return column-formatted output (e.g. df, iostat, top), and it instructs server-stats to look at a specific column for the value. If it is not specified, server-stats assumes the command output is a scalar value.
  • ‘value-type’ tells server-stats how to parse the command result string into a Clojure value. Right now there are only three possible value types: percent, bool, and number.
  • ‘handlers’ is a vector of alert handlers to call when this condition is met. In this case, it is just the email handler.
  • ‘msg’ is the alert message that gets passed as the first argument to the handler function.
  • Finally, ‘trigger’ actually defines the condition that has to be met. It is a tuple of a comparison operator and a value to compare against.
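To make the fields above concrete, here is a rough sketch of how an alert could be evaluated against command output. It is an illustration, not the server-stats implementation: the map syntax, the helper names (`column-values`, `parse-value`, `check-alert`), the stand-in `print-alert` handler, and the sample `df` output are all assumptions. Only the field names and the three-argument handler shape come from the description above.

```clojure
(require '[clojure.string :as str])

(defn column-values
  "Returns the raw strings found under the header `column` in
  whitespace-separated, column-formatted `output`, or nil when the
  header is not present."
  [output column]
  (let [[header & rows] (map #(str/split (str/trim %) #"\s+")
                             (str/split-lines output))
        idx (.indexOf header column)]
    (when-not (neg? idx)
      (keep #(nth % idx nil) rows))))

(defn parse-value
  "Parses string `s` according to `value-type` (:percent, :number or
  :bool). Returns nil for numeric values that cannot be parsed."
  [value-type s]
  (try
    (case value-type
      :percent (Double/parseDouble (str/replace s "%" ""))
      :number  (Double/parseDouble s)
      :bool    (= "true" (str/lower-case s)))
    (catch NumberFormatException _ nil)))

(defn check-alert
  "Calls every handler with (msg server output) when any parsed value
  satisfies the trigger. Returns true if the alert fired."
  [{:keys [column value-type handlers msg trigger]} server output]
  (let [[op threshold] trigger
        ;; Without :column, treat the whole output as one scalar.
        raw    (if column
                 (column-values output column)
                 [(str/trim output)])
        values (keep #(parse-value value-type %) raw)
        fired? (boolean (some #(op % threshold) values))]
    (when fired?
      (doseq [handler handlers]
        (handler msg server output)))
    fired?))

;; Stand-in handler with the three-argument shape described earlier;
;; a real one would send an email instead of printing.
(defn print-alert [msg server output]
  (println (str "[" server "] " msg)))

(def disk-alert
  {:column     "Use%"
   :value-type :percent
   :handlers   [print-alert]
   :msg        "Disk space over 85% full"
   :trigger    [> 85]})

;; Made-up sample output in the layout df prints.
(def sample-df
  (str "Filesystem Size Used Avail Use% Mounted on\n"
       "/dev/sda1 50G 45G 5G 90% /\n"
       "/dev/sdb1 100G 10G 90G 10% /data\n"))

(check-alert disk-alert "web1" sample-df)
```

Keeping the trigger as an operator/value pair means the same evaluation code serves any comparison: `[< 10]` or `[= true]` needs no new logic, only a different tuple.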

Additionally, you can define a global function to be called whenever a command cannot be completed successfully for some reason (e.g. the server times out). Here we send a text message using Twilio whenever that happens:


 

 

conclusion

Currently this is just used for basic server monitoring, but it could easily be used for much more advanced reactive behavior. Building this in Clojure was a lot of fun and pretty easy, since Clojure has macros, easy-to-use concurrency, higher-order functions, and full access to Java libraries. The one downside of using Clojure on the command line is that you have to eat the JVM startup time on every run.

resources

http://clojars.org/server-stats

http://clojars.org/parallel-ssh

https://github.com/RJMetrics/Parallel-SSH

https://github.com/RJMetrics/Server-Stats


