Do you have Fun at Work?

What sort of question is that? I don’t know about you, but I feel privileged and happy that I enjoy what I do at work, and would even go as far as to call it fun. I want to be remembered for making work fun for everyone.

Security Groups

I ran into a couple of problems this week involving security groups. A security group acts like a virtual firewall for your instance. It controls what traffic enters and leaves, and is attached to an instance when it starts.

Getting authenticated with Mongo

The challenge this week was to find out why authentication appeared to be broken on the automated MongoDB build. Several weeks ago I had written a Puppet module to build a MongoDB cluster from a number of arguments, such as the number of nodes, node names, certificates, etc. Despite the certificates being generated from a CA (Certificate Authority), and the client presenting its certificate to log on, the user in the command below could do anything, and .auth() was not needed.

mongo admin --ssl --sslCAFile /etc/mongodb/ssl/mongoCA.pem \
    --sslPEMKeyFile /etc/mongodb/ssl/mongo1.pem \
    -u mongoReadony -p mongotest --host mongo1

In the /etc/mongod.conf file, security.clusterAuthMode: x509 was set, but security.authorization was disabled. It was assumed that specifying net.ssl.mode was enough and that the security.authorization setting would be ignored. Sorry, false assumption.
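As a rough illustration, the relevant part of /etc/mongod.conf with authorization switched on might look like the sketch below. The option names are standard mongod settings, but the `requireSSL` value and the reuse of the certificate paths from the client command above on the server side are assumptions, not the actual configuration from the build.

```yaml
# Sketch only: values marked "assumed" are illustrative, not taken from the real build.
net:
  ssl:
    mode: requireSSL                          # assumed; the original value of net.ssl.mode is not stated
    PEMKeyFile: /etc/mongodb/ssl/mongo1.pem   # assumed to match the path used by the client command
    CAFile: /etc/mongodb/ssl/mongoCA.pem      # assumed to match the path used by the client command
security:
  authorization: enabled                      # the setting that had been left disabled
  clusterAuthMode: x509                       # already set in the original configuration
```

With security.authorization enabled, a connected user is limited to the roles it has been granted, rather than being able to do anything once the SSL connection succeeds.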

The Importance of testing backups

Another incident of a tired admin fixing an outage only to cause a bigger outage isn't news as such; however, I have to hand it to GitLab for their open honesty about this week's incident.

After a spam storm created serious (4GB) replication lag on the firm's PostgreSQL database cluster, a very tired on-call team member, trying to fix the replication, deleted the data folder on the active server rather than the replicating one.

The full incident is documented here

I embrace the honesty that they have shown, as it enables the whole community to learn from this and offer better services to our clients. This is very much the message in Black Box Thinking by Matthew Syed. Matthew describes the difference between closed cultures, where mistakes are hidden, and open, honest cultures, where mistakes are out in the open and much learning and prevention occurs as a result.

As shown by the support on Twitter, DevOps and cloud reliability engineers agree.

Lessons so far? Test your backups; you never know when you will really need them.

With my ethos about servers being disposable, I love destroying and rebuilding servers to prove that, in any Disaster Recovery situation, the service can be restored. This relies on well-designed recovery processes and code, shifting the focus from avoiding failure to embracing failure and reducing the mean time to recovery.

WordPress to Jekyll

Publishing a blog with Jekyll

I had an idea. Why not publish the blog using Jekyll and host it on AWS S3? Working with Puppet and Ruby, I’m already very familiar with gems, and getting Jekyll working on my Windows 10 workstation with RubyMine was relatively easy.
