Saturday, April 5, 2014

Fixing a problem with s3fs, updatedb and high S3 bills

S3FS is a very convenient piece of software. It lets you keep all your data in "the cloud" without ever running out of space. It works well for things like back-end backup copies and infrequently accessed files, but I would never advise using it with a web application that serves files directly from the mounted resource.

In a continuous effort to get our AWS bill as low as possible, I stumbled upon an extraordinary number of Tier1 and Tier2 requests in one of our regions.
Our S3 buckets had almost 87 million Tier1 and over 447 million Tier2 requests per month, which I thought was odd, as we paid more for accessing S3 than for storing several TB of data in it.

The problem

As our AWS infrastructure is quite modest compared to some, there is no way we could have enough traffic to generate this amount of S3 usage.

Amazon Simple Storage Service EU-Requests-Tier1
$0.005 per 1,000 PUT, COPY, POST, or LIST requests - 86,694,315 Requests

Amazon Simple Storage Service EU-Requests-Tier2
$0.004 per 10,000 GET and all other requests - 447,163,447 Requests
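
To put the two billing lines above into dollars, here is a minimal Python sketch of the arithmetic. The request counts and prices are copied from the bill; the function and variable names are only illustrative.

```python
# Request counts and prices copied from the two billing lines above.
TIER1_REQUESTS = 86_694_315
TIER2_REQUESTS = 447_163_447
TIER1_PRICE = 0.005  # USD per 1,000 PUT, COPY, POST, or LIST requests
TIER2_PRICE = 0.004  # USD per 10,000 GET and all other requests


def request_cost(requests, price, per):
    """Return the cost in USD of `requests` billed at `price` per `per` requests."""
    return requests / per * price


tier1_cost = request_cost(TIER1_REQUESTS, TIER1_PRICE, 1_000)
tier2_cost = request_cost(TIER2_REQUESTS, TIER2_PRICE, 10_000)

print(f"Tier1: ${tier1_cost:,.2f}")
print(f"Tier2: ${tier2_cost:,.2f}")
print(f"Total: ${tier1_cost + tier2_cost:,.2f}")
```

That works out to roughly $433 for Tier1 and $179 for Tier2. Tier1 requests are priced per 1,000 rather than per 10,000, so the smaller Tier1 count is still the larger part of the request bill.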

Looking at our traffic on our monitoring TV, powered by ElasticSearch and Kibana (very cool), I could quickly see that 447 million requests is more than our servers get in 6 months. There had to be another explanation for this than just plain traffic.

The search

Amazon gives you a few ways to drill down into your usage of their resources. The easiest is to look at the usage reports under Billing and Cost Management. You get a CSV file in which you can easily check which buckets are responsible for the high usage on a per-month, per-day or per-hour basis.
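
If the CSV is too large to eyeball, a few lines of Python can total the requests per bucket. This is only a sketch: the column names (UsageType, Resource, UsageValue) are an assumption about the report layout, and the sample rows and bucket names are made up, so check them against the header of the file you actually download.

```python
import csv
import io
from collections import defaultdict


def requests_per_bucket(report_file):
    """Sum request counts per bucket from a usage-report CSV.

    Assumes columns named UsageType, Resource and UsageValue; adjust
    these to match the header of your own report.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(report_file):
        # Request usage types look like "EU-Requests-Tier1" on the bill.
        if "Requests" in row["UsageType"]:
            totals[row["Resource"]] += int(row["UsageValue"])
    # Busiest buckets first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows, only to show the shape of the input.
SAMPLE_REPORT = """\
Service,Operation,UsageType,Resource,StartTime,EndTime,UsageValue
AmazonS3,ListBucket,EU-Requests-Tier1,example-legacy-bucket,04/01/14 00:00:00,04/01/14 01:00:00,120000
AmazonS3,HeadObject,EU-Requests-Tier2,example-legacy-bucket,04/01/14 00:00:00,04/01/14 01:00:00,600000
AmazonS3,GetObject,EU-Requests-Tier2,example-web-bucket,04/01/14 00:00:00,04/01/14 01:00:00,1500
AmazonS3,StandardStorage,EU-TimedStorage-ByteHrs,example-web-bucket,04/01/14 00:00:00,04/01/14 01:00:00,987654321
"""

for bucket, count in requests_per_bucket(io.StringIO(SAMPLE_REPORT)):
    print(bucket, count)
```

For a real report, pass an open file instead of the sample string, for example `requests_per_bucket(open("report.csv", newline=""))`, where `report.csv` is a placeholder file name.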

Looking at the usage patterns on the buckets, it was quickly apparent that 95% of the usage was on 2 buckets used by a legacy Perl application, which unfortunately had to use S3FS to connect to S3.

After googling around a bit I found the culprit: it was updatedb! The indexing process had been hammering s3fs by indexing a 4TB bucket every day, all day, for who knows how long...

The solution

On our MAPS cluster, I edited 2 lines in the /etc/updatedb.conf file, adding the s3fs mount path (/mnt/s3) to one and the file system type (fuse.s3fs) to the other:

PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /mnt/s3"

PRUNEFS="NFS nfs nfs4 rpc_pipefs afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre tmpfs usbfs udf fuse.glusterfs fuse.sshfs curlftpfs ecryptfs fusesmb devtmpfs fuse.s3fs"
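
To double-check that a box really carries both exclusions, the config can be parsed with a short Python sketch like the one below. It is not part of updatedb; the helper names are hypothetical, and the parsing only handles simple KEY="value" lines like the two shown above.

```python
# The two settings exactly as shown above.
CONF_TEXT = '''
PRUNEPATHS="/tmp /var/spool /media /home/.ecryptfs /mnt/s3"
PRUNEFS="NFS nfs nfs4 rpc_pipefs afs binfmt_misc proc smbfs autofs iso9660 ncpfs coda devpts ftpfs devfs mfs shfs sysfs cifs lustre tmpfs usbfs udf fuse.glusterfs fuse.sshfs curlftpfs ecryptfs fusesmb devtmpfs fuse.s3fs"
'''


def parse_updatedb_conf(text):
    """Parse simple KEY="value value ..." lines into {KEY: [values]}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop the surrounding quotes, then split the list on whitespace.
        settings[key.strip()] = value.strip().strip("\"'").split()
    return settings


def path_is_pruned(settings, path):
    """True if `path` is listed in PRUNEPATHS (ignoring a trailing slash)."""
    pruned = [p.rstrip("/") for p in settings.get("PRUNEPATHS", [])]
    return path.rstrip("/") in pruned


def fs_is_pruned(settings, fs_type):
    """True if `fs_type` is listed in PRUNEFS.

    Compared case-insensitively here; that is an assumption about how
    your locate implementation matches file system types.
    """
    pruned = [fs.lower() for fs in settings.get("PRUNEFS", [])]
    return fs_type.lower() in pruned


settings = parse_updatedb_conf(CONF_TEXT)
print("path pruned:", path_is_pruned(settings, "/mnt/s3"))
print("fs pruned:  ", fs_is_pruned(settings, "fuse.s3fs"))
```

On a real machine, read the text from your updatedb config file instead of `CONF_TEXT`. Either exclusion on its own should keep updatedb off the mount; having both is a safety net in case the bucket is ever mounted somewhere else.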

The last step was baking new AMIs on the cluster and launching a new one to replace the faulty one. I would also suggest checking the hourly S3 usage report a couple of hours later to confirm that the request count has gone down.

The other thing you absolutely have to do is enable tag-based billing and programmatic access, as they only cover the months after you have enabled them. This will help you get insight into your billing in the future with tools like Netflix's Ice.

Hope you enjoyed this solution.