Friday, October 4, 2013

From zero to cloud - guide to setting up a dev process with the Amazon AWS Cloud pt. 1

Hi all,

once again I managed to get some time off to write.

There have been interesting things happening in my career, as I had the opportunity to plan and execute an entire company's move from Rackspace to Amazon AWS, with a very diverse app base: PHP apps on Symfony 1, 1.4 and 2.x, Perl (mod_perl) apps, Java (Red5) apps, Magento stores, and vendor-specific tools like Active Collab.

It has taken me quite some time to figure out some of the problems, as there is no comprehensive resource out there to start easily with the Amazon AWS adventure, and the official Amazon guides are a bit dry.

To bring you closer to the problem domain, I will point out the steps you need to take to set up a complete application development cycle. Later I hope to write a post about every step of the way, so it becomes much easier to understand the big picture of how things work together in an AWS-based application.

I will not write about all the problems we had to overcome because of the nature of the applications themselves, as making an app capable of running in a cluster is beyond the scope of this article. The most important rule, however, is: do not write anything important to the server's local disk. Upload it to Amazon S3, an attached WebDAV share or the database (some databases can store files, like MongoDB's GridFS).
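To make the rule concrete, here is a minimal sketch of pushing a file straight to S3 instead of keeping it on the instance. It assumes the unified AWS CLI is installed and configured; the bucket name and file paths are hypothetical.

```shell
# Sketch: push uploads straight to S3 instead of writing them to the
# instance's disk. The bucket name "my-app-uploads" is hypothetical.
BUCKET="my-app-uploads"

# Build a date-prefixed S3 key so uploads don't pile up under one prefix.
s3_key() {
    echo "uploads/$(date +%Y/%m/%d)/$(basename "$1")"
}

# Copy a local file to S3 and remove the local copy once it is safely stored.
upload() {
    aws s3 cp "$1" "s3://$BUCKET/$(s3_key "$1")" && rm -f -- "$1"
}

# Usage (requires AWS credentials): upload /var/tmp/report.pdf
```

The date prefix is just one way to keep keys organised; the point is that the instance can disappear at any moment without losing anything.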

So the first step in an app's life cycle is the actual development - no surprises here.

1. Version control and simple collaboration (e.g. code reviews).

We use Git to keep the code, collaborate and make it easy to pull from anywhere. I actually pushed the company to move away from CVS, which wasn't very difficult, as CVS is frustrating to work with. GitHub has a fantastic interface, and with the addition of their in-code search it is far more practical than the dreadful search in Eclipse.

All you need is to register with GitHub and follow the simple getting-started instructions. We have a pro account with GitHub, so we can keep our repositories private; small teams can however go to the competition, which has a pay-per-user model and lets you have free private repos.
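For completeness, here is roughly what our branch-and-review flow looks like on the command line - a sketch run against a throwaway local repository so it works anywhere; in real life the branch would be pushed to GitHub and reviewed as a pull request.

```shell
# Sketch of the feature-branch flow, demonstrated in a throwaway local repo.
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name "Example Dev"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Every change goes on a feature branch, which later becomes a pull request.
git checkout -q -b feature/readme-tweaks
echo "more docs" >> README
git commit -q -am "Expand the README"

# In real life this would be: git push -u origin feature/readme-tweaks
git log --oneline feature/readme-tweaks
```

The branch names and commit messages are made up; the point is that review happens on a branch, never directly on the mainline.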

2. Continuous Integration - making sure your code stays in good condition

We use Jenkins to run tests and builds. That is also one of the easy choices; there are some more sophisticated vendor solutions out there, but Jenkins with a few extensions offers everything we needed. I added Git, SVN and Google OpenID login support, so that everyone in the organisation can log in, plus Apache Maven support.

The other cool thing Jenkins gives you is code statistics: phpmd, copy/paste detection and code-style checking, with the history kept for later. We also plan on making Jenkins directly fire up testing infrastructure in Amazon AWS via CloudFormation scripts - basically a file describing exactly your deployment and the relationships between machines. For the sake of simplicity and cost (which will come up later in the post), just put Jenkins locally or on a traditional server.
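To give an idea of what such a build step could look like, here is a very reduced sketch: write a minimal CloudFormation template and fire up a throwaway test stack from it. The AMI ID, instance type and stack name are hypothetical placeholders, not our actual deployment.

```shell
# Sketch of a Jenkins build step: generate a (very reduced) CloudFormation
# template and launch a throwaway test stack. All IDs are placeholders.
cat > test-stack.json <<'EOF'
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Throwaway test stack fired up by a Jenkins job",
  "Resources": {
    "TestAppServer": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "InstanceType": "t1.micro",
        "ImageId": "ami-12345678"
      }
    }
  }
}
EOF

# Launching it requires AWS credentials, so this part stays commented out:
# aws cloudformation create-stack --stack-name jenkins-test \
#     --template-body file://test-stack.json
```

A real template would describe every machine and the relationships between them; this one only shows the shape of the file.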

3. Figuring out if everything belongs in the cloud

Now you have the means to code, collaborate and build a really nice product, so next you need to figure out which parts of your infrastructure should go into Amazon AWS.

As with everything, the cloud is the answer to a very specific question, certainly not to all of them. The true advantage of cloud computing isn't scaling up, as you can do that with traditional infrastructure - it's scaling down in minutes. Being able to scale the number and size of instances only matters if your traffic fluctuates. If the difference in usage over the day amounts to more than one or two servers' worth of computing power, it's a classic lets-go-cloud case.

There are some applications, however, that are not really suited to the cloud or would miss the point of cloud computing. We have found that some applications simply don't belong there:

  • Windows-based stuff - it's just a question of cost. If you don't need scalability for your Windows boxes, it's far less expensive to host them with a traditional provider like OVH.
  • All utility apps - everything that needs to run in single-server mode. In other words, there will never be a need for two or more of them, but having less than one would not be good either. A utility app like Jenkins may or may not need additional worker nodes; if it doesn't, buy a normal server (or two if you like resiliency), put the app there and make regular backups. In our deployment Jenkins went onto a normal server.
  • Everything constantly requiring high computing power. Every app that doesn't need to scale beyond one or two powerful servers and is constantly online is IMHO better off on a traditional server.

What you have to take into account is that AWS instances, even the ones called "large", are nowhere near a £200-per-month server in terms of computing power, yet usually cost the same or more. What you get, however, is the ability to run one tiny instance during the night and twenty of them during heavy daytime traffic, which in the end might save you some serious money.
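A quick back-of-the-envelope calculation illustrates the point. The $0.24/hour price for a "large" instance is a hypothetical figure for the sake of the example; check the current AWS pricing page for real numbers.

```shell
# Hypothetical cost comparison: scale with traffic vs. a fixed fleet.
# Scenario: 1 instance for 12 night hours plus 20 instances for 12 day
# hours, versus keeping 20 instances running around the clock.
awk 'BEGIN {
  price  = 0.24                       # hypothetical $/hour for one instance
  scaled = (1*12 + 20*12) * price     # 252 instance-hours per day
  fixed  = (20*24) * price            # 480 instance-hours per day
  printf "scaled: $%.2f/day, fixed: $%.2f/day\n", scaled, fixed
}'
```

Under these made-up numbers the scaled fleet costs roughly half as much per day, and the gap widens further once seasonal lows are factored in.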

Our traffic not only fluctuates weekly and daily, it is also strongly seasonal, as we are an ed-tech company. During the summer and all school breaks only a minimal infrastructure is needed.

4. Moving domains to Route53

Route53 is Amazon's DNS service. It offers a very nice interface for DNS administration and is certainly easier to configure than bind9. This step of course isn't necessary, but it is good to move your domains over so that you have "one (key)ring to rule them all". With AWS you get API access to almost everything, so automation is much easier when everything is in one place, and it's a lot tidier than logging in somewhere else (or into several places) to control your domains.
We found a small Perl script that runs from cron, checks each instance for a specific tag and creates DNS records based on that tag's value, giving you names that are much simpler to type than the usual public DNS hostnames AWS assigns to instances. In order for the script to work, you need to have the Amazon CLI tools installed.
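Ours is a Perl script, but the idea can be sketched with the unified AWS CLI like this; the hosted zone ID, domain and tag name are hypothetical.

```shell
# Sketch of the tag-to-DNS idea using the unified AWS CLI (the original was
# a Perl script). Zone ID, domain and the "DnsName" tag are hypothetical.
ZONE_ID="Z1EXAMPLE"
DOMAIN="example.com"

# Turn a tag value like "web-1" into "web-1.example.com".
record_name() {
    echo "$1.$DOMAIN"
}

sync_records() {
    # For every running instance, read its "DnsName" tag and public IP...
    aws ec2 describe-instances \
        --filters Name=instance-state-name,Values=running \
        --query 'Reservations[].Instances[].[Tags[?Key==`DnsName`].Value | [0], PublicIpAddress]' \
        --output text |
    while read -r tag ip; do
        [ "$tag" = "None" ] && continue
        # ...and upsert an A record pointing the friendly name at it.
        aws route53 change-resource-record-sets \
            --hosted-zone-id "$ZONE_ID" \
            --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"$(record_name "$tag")\",\"Type\":\"A\",\"TTL\":300,\"ResourceRecords\":[{\"Value\":\"$ip\"}]}}]}"
    done
}

# Run from cron, e.g.: */5 * * * * /usr/local/bin/sync-dns.sh
if command -v aws >/dev/null 2>&1 && [ "${RUN_SYNC:-no}" = "yes" ]; then
    sync_records
fi
```

The script is idempotent because UPSERT either creates or updates the record, so running it every few minutes from cron is safe.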

This concludes the things you need to know to start getting familiar with AWS. In the next posts I will write about:

  • Getting your team to have access to AWS services with IAM
  • Managing some aspects of your deployments with the AWS Console
  • Setting up and managing your workflows with Netflix's Asgard control panel
  • Deploying your applications to your cloud servers with Capistrano
  • Setting up and testing your auto scaling cloud deployments with jMeter and AWS CloudWatch

Hope you enjoyed reading. See you soon.