Monday, December 28, 2009

How to avoid InnoDB counter row level locking?

Let's think about a situation where a counter gets updated very often, for example the view counter on a very popular New York Times post. Every time a user refreshes the page, a SQL query updates the counter. It would look something like this:

UPDATE `posts` SET view_counter = view_counter + 1 WHERE id = 123

Now on a very busy site there can be problems with frequent counter row locks. If you send a query like that above, what actually can happen is that InnoDB locks the row so often it produces some deadlocks. It is a pretty inprobable situation in small systems hovever in some high-traffic production evironments, there can be problems with such counters getting out of sync. Ther is a very simple and efficient solution to this problem:

  1. Create a table with the first field beeing the counter id/number, second field should be the foreign key to the element you really want to count and of course a field with the counter value (default 0),
  2. For every stressed counter, you feel that could get out of sync, create 10-50 rows ( the number is arbitrary) int the created table with remebering which primary key ids are to what counter.
If you inserted the apropriate keys for a counter, now you need to generate a coresponding random id number before updating. Thre read/write operations are based simple trick. When you have more than one counter rows that count the same thing, you just randomly select one of them and increment it. The benefit of having 20 rows to keep the same counter is that the probability of desynchronization is that much reduced.

Now all that's left is getting the numbers added up. We will do that by using a simple aggregate function - GROUP BY + SUM(). Counting 20-50 rows every time, when you want to show the counter, isn't a big load for the database hovewer you still can write a cron that generates a single counter again for select queries.

Saturday, December 26, 2009

How to build a fast, good scaling user notification system pt. 1

Problem definition:

Most social media sites have something called user notifications. Systems specially designed to generate user actions. Every notification, now matter how insignificant can generate a lot of click traffic. But how about the inside of the system? In school, or some book on SQL you would find something like this:

  1. Make a table with users;
  2. Make a table with notifications;
  3. Make a table with many to many relation that describes who has what notification;

What actually happens is You have to make one big or two small queries. The big one is a JOIN query that selects the relations of the notifications and the data itself/ It ultimately looks something like this:

Select * from user_to_notifications u JOIN notifi_data n on u.n_id = WHERE u.user_id = 1 ORDER BY DESC

The two small queries would be a select for relations and a "where id in" select of the data.

There are a lot of problems with this solution. First the data is a unique notification that will be used/viewed only one time and won't ever change "Your friend [friends login] has birthday in [calculated time from friends birth time]". If you have many unique actions like user activity notifications that generates some other user activity, you will have two very large tables, very fast. Event though you can buy another SQL sever and try to scale horizontally, web server scaling is much cheaper, easier and gives better results in such problems.

To summarize the problem definition:

  1. A whole lot of unique notifications - the more notifications, the more clicking;
  2. Big read-write ratio - users are usually notified in the main screen about every action, generally generating a lot of selects and only some inserts;
  3. Limited pagination of notifications - your users don't need to have 50+ pages of notifications, 10 suffice;
  4. Complicated (slow) SQLs with joins and order bys - you don't need those ;-)
Well that's all for now, Christmas if a busy season... In the second part of my post I will discuss some techniques you can use to avoid such problems and build a great notifications system.

Friday, December 25, 2009

Sphinx, search and spamming

Ever tried searching for something on a fulltext search based website?

Each time You write a sentence in the "search" field, the web server most probably communicates with a search engine that priorly indexed some data in which you can search. Some of those sites are social media sites, some are sites in which being higher up the search ladder, can be of some benefit.

My experience is mostly with the sphinx search engine so i'll just go with that. When you search a phrase in sphinx, for example "php is the best", the engine tries to rank the results with the BM25 ranking function depending on the frequency of the words.

So if you have a confirmed sphinx based site, and it can be of some value to be found on some phrases, you can (but shouldn't) spam the hell of the site ;-)