Jeff Longland

Relax, don't worry – have a home brew!

Archive for July 2012

#OSCON – One tiny daemon to harvest your server statistics (and more)

  • Presenter: Brandon Philips, Rackspace
  • External monitoring: ping, nmap, curl. Lest we forget nagios, ganglia, cacti, zenoss, noitd
  • Internal monitoring: scripts gathering stats, pushing to a monitoring server
  • Beware cron’d monitoring agents – they won’t help you much during panic conditions, whereas you might still be able to communicate with a daemon.
  • Overview: agent runs in the cloud, with TLS connection to a stud in each datacentre
  • Rackspace’s monitoring agent is Virgo
    • Designed for low memory usage
    • Simple secure ‘proxyable’ protocol
    • High level scripting language
    • Designed to be statically linked
    • Avoided C++, used Lua instead (JavaScript’s long lost Brazilian cousin)
    • SSL + newline-delimited JSON + JSONRPC-esque
    • libuv
    • luvit
    • luajit
    • Lua code lives in a single zip file. Makes it easy to do upgrades and reduces presence on a client’s host.
    • Sigar for gathering system metrics.
  • Virgo agent uses ~5 MB memory, less than Python (~9 MB), Ruby (~15 MB) or C++ (~20 MB)
  • Slides posted here
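
To make the "newline-delimited JSON, JSONRPC-esque" idea concrete, here is a minimal Python sketch of that kind of framing. It is not Virgo's actual wire format: the `id`/`method`/`params` field names and the `metrics.post` and `heartbeat` method names are assumptions for illustration, and the TLS layer is left out.

```python
import json


def encode_message(msg_id, method, params):
    """Serialize one request as a single line of JSON ending in a newline."""
    # Field names are hypothetical, loosely modelled on JSON-RPC.
    payload = {"id": msg_id, "method": method, "params": params}
    # json.dumps escapes newlines inside strings, so the only raw "\n"
    # in the output is the frame delimiter appended here.
    return (json.dumps(payload) + "\n").encode("utf-8")


class LineDecoder:
    """Reassemble complete JSON messages from arbitrary chunks of bytes."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add received bytes and return the messages completed by them."""
        self._buffer += chunk
        # Everything before the last newline is complete; the tail is kept.
        *lines, self._buffer = self._buffer.split(b"\n")
        return [json.loads(line) for line in lines if line.strip()]


if __name__ == "__main__":
    decoder = LineDecoder()
    wire = encode_message(1, "metrics.post", {"cpu": 0.42})
    wire += encode_message(2, "heartbeat", {})
    # Feed the stream in two uneven pieces, as a socket might deliver it.
    for piece in (wire[:10], wire[10:]):
        for message in decoder.feed(piece):
            print(message["id"], message["method"])
```

The appeal of this framing is that a proxy only has to split on newlines to forward messages, without understanding what is inside them.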

Written by jlongland

July 20, 2012 at 12:27 pm

Posted in OSCON 2012

#OSCON – Assholes Are Killing Your Project

  • Presenter: Donnie Berkholz, RedMonk
  • Based on experiences working with Gentoo.
  • Community is critical because reputation affects your ability to attract more contributors.
  • Ohloh is a great tool for quantifying and visualizing the number of contributors.  You can then correlate drops/peaks to events within (and external to) your community.
  • On the Gentoo project, after the assholes were kicked out of the project – things stabilized.
  • But even after removing these people, the reputation is still out there.  Years later, people still ask “But isn’t there a problem with this community?”
  • The importance of fun is frequently misunderstood.  People work on projects because they want to have fun.
  • Conflict is good.  Ad hominem attacks aren’t.  Intel, for example, has a training program about how to argue/debate/fight constructively.
  • The Debian mailing list for example, used to have a few assholes.  People would subscribe, read a few posts, then leave – because it wasn’t fun.
  • How to find assholes?  Look for patterns.
    • ‘Cookie licking’ – the person who licks a cookie and puts it back on the table so no one else will touch it, i.e. claims a task and then doesn’t do it.  Happens all the time in open source projects.
    • Politicians – people who talk about the ‘community good’ when in fact, it’s about their personal benefit.
    • Backchannelers – people who are subversive in the background.
  • How do you test whether someone’s an asshole?  Two tests:
    • After dealing with the person, does the target feel oppressed, humiliated, de-energized or belittled?
    • Does the asshole target those less powerful?
  • How many good interactions does it take to cancel out one bad one?  On average 5 good : 1 bad
  • Check out ‘The No Asshole Rule‘ for one of the best books on the subject.
  • Assholes have an unequal impact.
  • The diversity of your community affects the extent to which assholes have impact.  Generally, men tend to fight.  Women tend to leave – ‘screw this, I’ll go do something else’.
  • 25% of people targeted by assholes will leave.  Worse still, 20% of witnesses will leave – which has a tremendous impact when your entire community is a witness via mailing lists / IRC.
  • Cascading effects amplify the problem.
  • Unfortunately, assholes tend to attract more assholes.  In a large community, this can form in pockets and quickly spread throughout the project.
  • Measuring the cost:  TCA.  Total Cost of Asshole = Team lead + dev relations team + project leadership + recruiting & training new devs + targets and witnesses
  • Being a good participant in a project means being co-operative and collaborative.
  • How to deal with assholes?
    • You cannot avoid it.
    • You need to talk to them as though they didn’t mean to cause the problem.  They’re trying to accomplish something and may not realize they’re being an asshole.
    • Guide them to the right types of behaviour.
  • Personal interaction is key.  You need a conference or user group meeting – something to get people face-to-face.
  • You need a way for people to report problems – and then act on them.
  • But you may reach a point where something more needs to be done.  We are not psychologists or therapists.  Sometimes people have to be told to leave.
  • How do you prevent asshole problems?
    • Culture is hard to change.
    • Reputation is hard to change.
    • The best thing to do is be quantitative.  Get numbers.  Show that the social things have metrics: mailing lists, forums, community events, etc.
    • Keep your standards high. Not just technically, but socially. You’re not there to have a bad community, so make sure everyone meets your standards.
    • People can learn to write better code easily.  Learning to be a better person is much harder.
    • Provide expectations.  At Gentoo, there’s a 3 strikes rule with escalating consequences – ending with dismissal from the project.
  • Bottom line – dealing with assholes isn’t worth it.
  • Audience comments and insights:
    • Break your project into small pieces to prevent the cookie-lickers.
    • Avoid social single points of failure.
  • Book recommendations:

Written by jlongland

July 20, 2012 at 11:40 am

Posted in OSCON 2012

#OSCON – Wrangling Logs with Logstash and ElasticSearch

  • Nate Jones, David Castro from Media Temple
  • Per week, approximately 1.8 TB uncompressed log data for their mail servers
  • Need to make log access easy for front-line support and ops team
  • Architecture: logstash-agent on each host pushes to RabbitMQ, then to elasticsearch, and searched using Kibana
  • logstash groks the logs, then mutates to JSON
  • Prebuilt patterns allow you to extract more than you’d necessarily get with regex
  • elasticsearch head helps monitor performance of the shards and allows you to browse the data directly, but it’s not front-line support friendly
  • Kibana provides a user friendly front-end for elasticsearch
  • logstash can output gelf for Graylog2.  But not the best approach for Media Temple since they use RHEL and package dependencies can be a PITA.
  • Kibana supports streaming, allowing for real-time searching/monitoring – i.e. as a user is doing something, support can be watching the events come in.
  • logstash hasn’t done much to reduce the size of logs, in fact, they’ve increased by ~50% – but it’s worth it for all the benefits.
  • No more grepping logs for hours. Couple of minutes and you have everything you need.
  • Using statsd to push log metrics into Graphite for visualization
  • Keep ~7 days of data (for mail), plus compressed copies of the raw logs should they ever need something historical.
  • Have kindly provided a VM to kickstart everyone with logstash, rabbitmq, kibana, etc: logwrangler.mtcode.com
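
For anyone who hasn't seen grok, the "grok the logs, then mutate to JSON" step boils down to named pattern matching. The Python sketch below shows the idea with a plain regex. It is not logstash's implementation: the syslog-style pattern, the sample log line and the `parse_failure` tag are all assumptions made up for illustration, and logstash's prebuilt patterns are far more thorough.

```python
import json
import re

# Hypothetical pattern for a syslog-style line; each named group becomes a field.
SYSLOG_LINE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)"
)


def line_to_json(line):
    """Turn one raw log line into a JSON event with named fields."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        # Keep unparsed lines (with a made-up tag) rather than dropping them.
        return json.dumps({"message": line, "tags": ["parse_failure"]})
    # Optional groups that did not match (e.g. a missing pid) are left out.
    event = {k: v for k, v in match.groupdict().items() if v is not None}
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    sample = ("Jul 20 11:01:02 mail01 postfix/smtpd[1234]: "
              "connect from unknown[192.0.2.10]")
    print(line_to_json(sample))
```

Repeating the field names in every event is one plausible reason structured logs end up larger than the raw text, but it is also what makes them searchable by host or program.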

Written by jlongland

July 20, 2012 at 11:01 am

Posted in OSCON 2012

#OSCON – AMQP in Production: Building a High Performance Compute Cluster

  • Presenter: Nicholas Silva, Box
  • Not going to discuss how they decided on RabbitMQ at Box.  But briefly: low latency, high throughput, high reliability, open source, and AMQP libraries for pretty much every platform.
  • Topic exchanges allow for more complicated routing to queues, rather than pushing directly to a queue.
  • Demo of how easily you can get up and running RabbitMQ – yup, that’s easy.
  • For PHP folks making long-poll requests, take a look at rabbitmq_shovel and just run another rabbitmq instance on the webserver.
  • Monitoring with Nagios plugin for rabbitmq
  • Metrics and trending with OpenTSDB
  • Using an exchange for analytics info
  • Over the last year, have introduced clustering and HA queues
  • Need to distribute over more datacentres? Federation Exchanges and Shovel
  • Worker crash recovery mostly works…  but if you have a huge job that kills a worker, it’ll kill the rest of the workers too.
  • Messages are opaque, so you can’t see into them – which you may find problematic at times.
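
To illustrate the topic exchange point: in AMQP, a routing key is a series of dot-separated words, and a binding pattern may use `*` for exactly one word and `#` for zero or more words. The Python sketch below re-implements just that matching rule so you can play with it. It is not RabbitMQ's code or Box's setup, and the `jobs.*` routing keys are hypothetical.

```python
from functools import lru_cache


def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".")

    @lru_cache(maxsize=None)
    def match(i, j):
        # i indexes words of the pattern, j indexes words of the routing key.
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' absorbs zero words (advance the pattern)
            # or one more word (advance the key).
            return match(i + 1, j) or (j < len(k) and match(i, j + 1))
        if j < len(k) and p[i] in ("*", k[j]):
            # '*' or a literal word consumes exactly one word.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    for pattern, key in [
        ("jobs.*.high", "jobs.thumbnail.high"),
        ("jobs.*.high", "jobs.high"),
        ("jobs.#", "jobs.thumbnail.low"),
    ]:
        print(pattern, key, topic_matches(pattern, key))
```

A queue bound with `jobs.#` would see every job, while one bound with `jobs.*.high` would only see high-priority ones, which is the kind of routing you can't get by publishing straight to a queue.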

Written by jlongland

July 20, 2012 at 10:37 am

Posted in OSCON 2012

#OSCON – Develop and Test Configuration Management Scripts with Vagrant

  • Presenter: Mitchell Hashimoto
  • Gave a similar talk at Velocity 2012, but it was a much larger time slot.
  • From the ops perspective
    • Testing configuration management changes is a repetitive, inefficient process of deploy, test, deploy, test…
    • The feedback loop is slow and it discourages incremental development.  Further, it’s not real automation.
  • From the dev perspective:
    • It’s unlikely that you have a setup script for your local sandbox.  So you’re installing in a different way than production.  The configuration gap will be painful later.
    • And what if devs want to work on different platforms than what’s used in production?
    • Sure, you could try an uber readme with instructions on how to get things set up…  but it’s error prone and requires a lot of maintenance.  Not to mention, people will get apathetic about following the instructions.
    • To boot, devs aren’t too keen on spending their time setting up environments – they want to develop.
    • All in all, productivity failure.
  • Development setup is an operations problem.
  • When you’re paying developers, it’s grossly expensive and inefficient to have them stuck in this mess.
  • A solution is to use local VMs.  Developers can use whatever platform they like and ops can model the complexities of the production environment.
  • And….  no more (or fewer) “it works on my machine” type statements.
  • Mitchell’s recommendation is Vagrant (likely, given he’s the author).
  • Configured properly, all a developer needs is one command: vagrant up
    • Create VM
    • Configure VM
    • Configure networking and file system
    • Provision the guest
  • VMs exist in the context of a Vagrant ‘project’, i.e. a Vagrantfile
  • For automated provisioning, Vagrant supports shell, Puppet, Chef and coming soon – CFEngine.
  • vagrant provision makes the iterative testing of configuration management changes sooo much easier.
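
The iterative loop described above (change your config management code, re-provision, check the result) can be wrapped in a few lines of Python. This is only a sketch under assumptions: it presumes the VM has already been created with `vagrant up`, the helper name `provision_and_test` is made up, and the check command you pass in is whatever makes sense for your project.

```python
import subprocess


def provision_and_test(test_command, run=subprocess.run):
    """Re-apply provisioning to a running Vagrant VM, then run a check in it.

    `run` is injectable so the sequence can be exercised without Vagrant.
    """
    steps = [
        # Re-run the configured provisioner (shell, Puppet, Chef...) on the guest.
        ["vagrant", "provision"],
        # Execute the check inside the guest over SSH.
        ["vagrant", "ssh", "-c", test_command],
    ]
    for step in steps:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed provision stops the loop before the check runs.
        run(step, check=True)
    return steps


if __name__ == "__main__":
    # Hypothetical check: is the web server inside the guest answering?
    provision_and_test("curl -fsS http://localhost/ > /dev/null")
```

Run it from the directory containing the Vagrantfile after each change. Because the VM stays up between runs, each pass costs a provisioning run rather than a full rebuild.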

Written by jlongland

July 19, 2012 at 5:42 pm

Posted in OSCON 2012

#OSCON – Performance Tuning with Cheap Drink and Poor Tools

  • Presenter: Kirk Pepperdine, JClarity
  • Why is my application slow?
    • Duh, you’re waiting for something.  Possibly resource oversubscription.
    • Application profilers are good at finding out what your application is doing…  but if the app isn’t doing anything, it doesn’t help much.
    • So you need to broaden your toolset.
    • You need to understand what’s happening within the JVM.
  • Conceptual Model
    • Actors – with usage patterns
    • Application – locks, external systems
    • JVM – memory, process
    • Hardware/OS – CPU, memory, disk i/o, network
  • Things We Need
    • Environment that is just like production
    • Monitoring
    • Data (feed the beast!)
    • Test harness
  • You need to be careful that you don’t get led in the wrong direction when analyzing a performance issue.
  • Beware database caching in your test environment – you won’t be testing with the same execution plans you see in your production environment.
  • You need to have production levels of data and users.
  • If the app is slow and CPU is not 100%, what is keeping the task out of the CPU?
    • Application: execution profiling
    • JVM: memory profiling
    • OS: bad app/OS interaction -> resource monitoring
    • Nothing: thread starvation -> thread inventory
  • Tools

Written by jlongland

July 19, 2012 at 4:58 pm

Posted in OSCON 2012

#OSCON – Scaling Community by Nurturing Leaders

  • Presenter: Meghan Gill from 10gen (i.e. MongoDB)
  • Fostering community is a core goal at 10gen.
  • Identify core people in the community who can broadcast your message – ‘MongoDB Masters’
  • MongoDB Masters get early access to features, news, and other developments.  They’re the go-to group of community leaders.
  • Messages are more powerful coming from a person, rather than a company.
  • Empower and encourage users to share their stories (prizes help)
  • MongoDB User Groups
    • Economically, there is an ROI in nurturing leaders.  But there isn’t really a playbook for executing a community strategy.
    • Comparing costs of a user group ($3,575) vs a trade show ($9,725) – it makes more sense to go with a user group, especially when you look at the value of the return.  To boot, the returns persist longer than a trade show.
    • A user group can eventually become self-sustainable.  So up-front, you have to make a bit of an investment, but longer term you’ll see better returns.
    • 10gen organizes a lot of conferences.  Some cities don’t get much in the way of tech conferences, so this can raise quite a bit of excitement.
    • After the conferences, 10gen watches for the people who arrive early, stay late, actively participate – these are the people you want running your user group.
    • 10gen offers financial and logistical support (meetup.com fees, pizza, finding meeting locations, etc.)
    • Best practices are key.  Guides for running groups; getting other user group leaders to share their experiences on blogs; etc
    • Tools
      • Meetup.com account
      • Mailing lists for organizers
      • Swag
      • Docs
    • Currently: 50 user groups and 8,000 members.
    • Looking at developing regional user group coordinators given the growing size of their user groups.
    • 10gen is looking to engage the user groups in testing activities or mini-development sprints.  But generally, the groups are focused on driving adoption.
  • MongoDB Masters
    • Inspired by Microsoft MVP
    • Started small w/ 30 people
    • Held a Masters Summit to meet face-to-face, an unconference
    • The Summit generated a lot of excitement, but it was hard to maintain that momentum.
    • Lessons learned
      • Keep inviting new people to keep ideas fresh
      • Organize sub-groups over Google Hangout and IRC
      • Regional events and leader meetings

Written by jlongland

July 18, 2012 at 3:07 pm

Posted in OSCON 2012
