
Thursday, May 19, 2011

SilverStripe CMS for PHP programmers, not for dummies

Hi there!

Recently I came across a pretty standard problem:

1. I had a simple PHP website with very little PHP in it: some routing, contact form handling, templating, nothing fancy.
2. I hate doing boring stuff like CRUDs and CMSes.
3. I make a lot of changes to the website.
4. The website comes in two languages.

If you think "just use some CMS" - well, I had the same idea. I started with the one best known to me - WordPress. It's pretty robust, but since it was written as a blog engine rather than a CMS, it didn't really fit. Of course I could force WP to do what I wanted, but anyone who has developed something in WP knows it's not really for programmers (or by programmers, IMO).

My second choice was the standard CMSes like Joomla or Drupal, but programming with them was still problematic and extending things was hard, since I wanted pretty much nothing from Drupal and had to keep everything from the old page.

Finally a friend of mine introduced me to SilverStripe, a New Zealand-made CMS/framework.
I'm a sceptic by nature, but this one I loved at first sight.



1. You program what you want first (pretty much no prerequisites) and let the framework build the rest. It's really made for programmers: you can't just click things together and compromise modularity or re-usability. Everything is a class that you can create and reuse over and over.

2. Every one of my custom objects (for example ProjectRealisations) is encapsulated in a single class that was pretty easy to set up and extend.

3. Multilingual setup for the CMS (with the option to add any additional language) took 5 minutes.

So what I did was set up a completely custom website with a CMS for multilingual content (with ACL, of course) in a few days, while learning the CMS/framework along the way (ssbits.com was a big help).

I strongly recommend the framework for small websites and for lazy programmers (like myself ;) ).

See you next time,
Peter

Thursday, January 14, 2010

Get a load of your database - paginated caching

Is your site getting awfully slow? Are there just too many reads hitting your database, even though you have already tweaked the performance of every query? In most cases data caching is the solution to your problem!


The idea is to cache all processed data you have retrieved from the database. Let's look at an example. It uses a mockup class that could sit on top of basically any caching system, like memcached or XCache:


php:
//just a mockup wrapping any cache backend (memcached, XCache, ...)
abstract class cache{
    //store $data under the key $pool for $ttl seconds
    static public function save($pool, $data, $ttl){
        //some cache processing
    }
    //return the cached data, or false on a cache miss
    static public function load($pool){
    }
}



Now what we want to do is save time by implementing caching in heavy-load environments. The way to save execution time and/or decrease database load is to check the cache first, before even querying the db:




//first define what we will save/load and the time to live
//the pool name is mainly for verification (whether you even write/read the right thing) and is optional


php:
$pool_name = 'somePoolName_';
$data_name = 'someDataName-userId';
$ttl = 3600;

//load data from cache
$data = cache::load($pool_name.$data_name);

//if there's no entry in the cache, we get a false/null value
if($data === false || $data === null){
    $data = DB::exec($someHeavySql);
    //here's a sensitive point where many people make a mistake:
    //if you just saved the data and went on with your life, you could
    //save a NULL/false value, and every request would hit the database again.
    //We cache a sentinel value instead, so empty results are cached too.
    if(!$data){
        cache::save($pool_name.$data_name, 'thereIsNoData', $ttl);
    }else{
        cache::save($pool_name.$data_name, $data, $ttl);
    }
}elseif($data === 'thereIsNoData'){
    //a cached empty result: skip the database entirely
    $data = null;
}



Every time someone generates a page view, the data is either already cached, or it is retrieved and the cache is filled. Either way we avoid executing "$someHeavySql" for $ttl seconds. That's just the easy part. What we want to accomplish here is to cache almost everything, including paginated results.

It's not hard to imagine the need to paginate query results. Think about a site with products. There are about 500 products in the site's database, and there is no way to show them all on one page (maybe in a 1px font size ;-) ). Because the products page is so popular that our database is barely handling all the requests, we decide to use a caching layer to help the database a little. The problem is pagination: when we cached every page as above, adding the page number to the key, we ran into a problem.


php:
//pool name, data name, page
$pool_name = 'products_';
$data_name = 'mainProducts-'.$page;


Every time we change the price of a product, we need to delete the cached data. The problem is that we never know which page the changed product is on, and therefore which entry to clear. A product could change, it could be deleted, or a new one could be inserted; either way the whole cache becomes obsolete, so we would have to delete all the cached pages. We could iterate over the pages and delete them one by one, but that would be costly and unnecessary. What we want instead is to add one more parameter to the cache name.


php:
$cache_version = cache::load('someCacheWithTheVersion');
$pool_name = 'products_';
$data_name = 'mainProducts-'.$cache_version.'-'.$page;




Now when we want to delete the cache, we just increment the version stored in the cache. All the older cached pages become unused and eventually get deleted by the cache's garbage collector (if present). Unfortunately we need to make an additional cache request for every page view, but it still saves us a lot of resources.
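Putting that together, invalidation is just a version bump. Here is a minimal sketch of the idea; the function names are mine, and the cache class is replaced with a tiny array-backed stand-in so the example is self-contained:

```php
<?php
//a tiny array-backed stand-in for the cache mockup used above
class ArrayCache{
    static private $store = array();
    static public function save($pool, $data, $ttl){
        self::$store[$pool] = $data; //ttl ignored in this illustration
    }
    static public function load($pool){
        return isset(self::$store[$pool]) ? self::$store[$pool] : false;
    }
}

//build a versioned key for one page of the products list
function productsPageKey($page){
    $version = ArrayCache::load('productsVersion');
    if($version === false){
        $version = 1;
        ArrayCache::save('productsVersion', $version, 0);
    }
    return 'products_mainProducts-'.$version.'-'.$page;
}

//invalidate every cached page at once by bumping the version
function invalidateProducts(){
    $version = ArrayCache::load('productsVersion');
    ArrayCache::save('productsVersion', ($version === false ? 1 : $version) + 1, 0);
}
```

After invalidateProducts(), productsPageKey() yields new keys for every page, so each page misses the cache once and gets rebuilt, while the stale entries simply age out.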
Another problem is the development cycle of a site. Say you have an array with 10 keys that you always cache, and you make some changes to the structure of that array. What happens when you sync the new code to your site? You can imagine the consequences, from simple errors to site-wide errors, data inconsistency and so on. You could flush the whole cache, but then your databases would get overwhelmed by all the requests, which of course get cached eventually, but first produce a very nasty vein on your DBA's forehead ;-). The easiest way to ensure the usage of the new data structures is an additional version attribute for each cache pool:



php:
//name => version array, kept in the code and bumped with each change
$caches = array(
    'namesCache' => 1,
    'someOtherCache' => 3,
);
$pool_version = $caches[$cache_name];
//the pool name is then built from the name and version number,
//followed by the data name, pages and so on


You don't even need to increase the version numbers; just make sure they change every time the cached structure does, and include them in the pool names inside your caching layer class.
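As a sketch of what that could look like inside the caching layer (the class and method names here are illustrative, not from the original post):

```php
<?php
//code-level pool versions, edited whenever the cached structure changes
class PoolKeys{
    static private $versions = array(
        'namesCache'     => 1,
        'someOtherCache' => 3,
    );
    //compose a full cache key from the pool name, its code version,
    //the data name and an optional page number
    static public function build($pool, $data_name, $page = null){
        $key = $pool.'_v'.self::$versions[$pool].'_'.$data_name;
        if($page !== null){
            $key .= '-'.$page;
        }
        return $key;
    }
}
```

Bumping a version in the array makes every key for that pool change on the next deploy, so the old entries are never read again.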


Hope this helps someone to get started with caching data :-)


BTW: I'm really starting to hate blogger's WYSIWYG... it is really annoying...







Sunday, January 3, 2010

How to build a fast, good scaling user notification system pt. 2

This post is the second part of my earlier post on How to build a fast, good scaling user notification system, where we discussed the problem areas of said system. This post will be mostly about a strategy for the infrastructure to store all those notifications and retrieve them as fast as possible.
The most common approach to storing data like user notifications would be to create a table containing a PK id, a timestamp of when the notification was added and of course some additional metadata, for instance:

CREATE TABLE IF NOT EXISTS `notifications` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `date_added` timestamp NOT NULL default '0000-00-00 00:00:00',
  `user_id` int(10) unsigned NOT NULL,
  `content` char(255) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

Every user entering his/her notifications page (which, by the way, is the most frequently visited page) will most probably generate an SQL query like this:


SELECT id, date_added, content FROM `notifications` WHERE user_id = 123 ORDER BY date_added DESC limit 0,10

You need date_added to show the user when the event occurred, the PK id to take additional actions on the event, and finally the content of the notification; we also have to apply a limit to the query to paginate the results. To make queries faster, let us create a covering index for the query, e.g. by modifying the primary index or adding a separate one:

ALTER TABLE  `test`.`notifications` DROP PRIMARY KEY ,
ADD PRIMARY KEY (  `id` ,  `user_id` ,  `date_added` )

Don't get me wrong, the above is a perfectly good system for storing user notifications; it is simple and pretty fast up to a certain point, but we can do better.

First, let us think about how chronological data is held. When you insert data in chronological order, the PK id always correlates with the timestamp of when the event occurred. Both fields are always growing, but we don't really need the id; only the timestamp is of any value. What we can do is create an ID before inserting the data:


php:
$micro = explode(' ',microtime());
define('MICRO_ID', $micro[1].substr($micro[0],2,5));


That piece of code defines a constant consisting of 10 digits representing the current timestamp and 5 digits representing the microsecond part. So we have a unique time-related ID that we insert instead of the auto-incremented id and the date_added, and we get a free(!) index on the timestamp value. To prepare the new structure, first we need to change the `notifications` table:

ALTER TABLE  `notifications` DROP  `date_added`;
ALTER TABLE  `notifications` CHANGE  `id`  `id` BIGINT( 15 ) NOT NULL;
ALTER TABLE  `notifications` DROP PRIMARY KEY ,
ADD PRIMARY KEY ( `id`, `user_id` );


Let me show you an example of an insert in an ORM like Doctrine:

php:
$nt = new Notifications();
$nt->id = MICRO_ID;
$nt->content = 'Some event description';
$nt->user_id = 1;
$nt->save();
 


Surely ;-) you wonder: "What about a multi-server environment?" Well, actually I've seen this work in a big (100+ servers) environment with just 3 digits of micro time, and it worked just great. The key here is the combined primary index. The only issue we need to handle is inserting two notifications for the same user within the lifetime of the same process. To be extra careful we can use an ON DUPLICATE KEY clause, or simply increment MICRO_ID by 1 (via a new variable of course, since constants can't change). To construct a proper index, we need to look at the new query retrieving the data:
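One way to sketch that per-request increment (the helper names are mine; the post only describes the idea of bumping the ID for each subsequent insert):

```php
<?php
//15-digit time-based ID: 10 digits of unix time + 5 digits of microseconds
function microId(){
    $micro = explode(' ', microtime());
    return $micro[1].substr($micro[0], 2, 5);
}

//hand out strictly increasing IDs within one request, so two
//notifications for the same user can never collide on the PK
class IdSequence{
    private $next;
    public function __construct(){
        //assumes 64-bit PHP, where a 15-digit ID fits in an integer
        $this->next = (int)microId();
    }
    public function nextId(){
        return $this->next++;
    }
}
```

Each insert then calls nextId() instead of reading the MICRO_ID constant directly.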



SELECT id, user_id, content FROM  `notifications` WHERE user_id = 123 ORDER BY id DESC LIMIT 0,10


Because of the user-specific nature of the query, we have to rebuild the primary index. The trick is to make a combined index with user_id first and id, which is the PK, second. If you try (do try it) to make it the other way around, MySQL will have to make an extra pass to sort the values by id, making it a lot slower (I saw a performance drop of over 10 times!).

ALTER TABLE  `test`.`notifications` DROP PRIMARY KEY ,
ADD PRIMARY KEY (  `user_id` ,  `id` );


The explain plan for the SELECT query above:

select type: simple;
type: ref;
possible keys: PRIMARY;
key: PRIMARY;
key len: 4;
ref: const;
Extra: Using Where

That looks pretty nice. Our query uses the index we created earlier and is very fast; in addition we save a whole lot of space that the timestamp field would take up.


The normalization of the table was left for last on purpose. We need to divide the `content` field into smaller ones. The real advantage of this kind of notification system is that the whole text lives in a template which is used by an object in the program itself (example hierarchy generated with yuml.me). You can of course use some complicated design patterns to enhance the flexibility of the system, but the baseline is pretty simple:


In this example the render() method generates the HTML for each notification row via the template, set separately for each subclass, and a prepareData() method does additional processing of the raw data. We obviously need another field in the notification table containing the type id:
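The baseline hierarchy could be sketched like this (the class and template names are illustrative; the post only fixes the render()/prepareData() contract and the '||'-packed info field described below):

```php
<?php
//base class: one subclass per notification type (mapped by type_id)
abstract class Notification{
    protected $template = '';   //html template with {n} placeholders
    protected $data = array();  //fields parsed from the `info` column

    //split the packed `info` field, e.g. '123||456||some text'
    public function prepareData($info){
        $this->data = explode('||', $info);
    }

    //fill the subclass template with the prepared data
    public function render(){
        $html = $this->template;
        foreach($this->data as $i => $value){
            $html = str_replace('{'.$i.'}', $value, $html);
        }
        return $html;
    }
}

//an example subclass: a "new friend" notification
class FriendNotification extends Notification{
    protected $template = '<li>User {0} added user {1}: {2}</li>';
}
```

Each subclass only supplies its template (and, if needed, its own prepareData() for extra processing), while the base class does the rendering.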

ALTER TABLE `notifications` DROP `content`;
ALTER TABLE `notifications` ADD `type_id` TINYINT NOT NULL ,
ADD `info` CHAR( 100 ) NOT NULL;


The info field stores all the ids and short text information you need, e.g. two ids and some text, '123||456||some text', which you can split later in the mentioned prepareData() method before calling render(). I'm not going to carry on about the implementation specifics; we just need to know the baseline of the system. We have successfully created a table structure and a class hierarchy to select and insert user notifications. The next post will be about inserting multiple notifications (notifying multiple users about an event), and the structures that provide such functionality.

by the way... WYSIWYG in blogger is really lame... thinking about going to wordpress...