Posts Tagged ‘scalability’

Project: RSS

My first project, which I will be starting in about a week, will be a web-based RSS/ATOM feed aggregator with the following properties:

  1. It will store a complete archive of every item it sees.
  2. Queries of the first ~50 items of each feed will be fast and take constant time.
  3. The interface will be responsive as possible, despite server/communication latency.

The most important question thus far is whether to use a fully-fledged database with server (mysql, postgres, etc.), or to use a file based database (Berkeley db, etc.). The crucial factor is the database’s ability to achieve both properties 1 & 2 above. In other words, how well does each approach scale?

I believe that the second property could be achieved by using a table of all feed items sorted by their dates and limiting the first query to the first 20 items, fetching the next 20 items whenever the user reaches the end of the list (a la Google Reader).

The third property may be implemented using asynchronous commands to the server. Since the javascript code will be mostly event driven, I don’t think it’d be a problem making it perform several simultaneous actions (server requests) at once, provided they don’t conflict (e.g., access the same data structures).
Another way is to use a queue of commands waiting to be issued to the server. This has the added benefit of being able to reissue any failed commands at a later time, and being able to work offline.

Read Full Post »

Twitter’s architecture

Rolling on Rails: Under the Hood at Twitter – talks about the “scramble to scale” and the importance of thought out database schemas and queries.

In order to scale your service, it is usually not necessary to change the entire framework. Profiling your code and queries can lead to small incremental changes that solve major bottlenecks (caches, query optimization, etc.).

Read Full Post »