Typically the way heavily visited websites get more bandwidth is with a front end that sends each visitor to one of a group of virtual machines (or real ones). However, in the case of FR, the data viewed isn’t mostly static. People are beating on threads that they expect to see updated in real time. If multiple machines are sharing the load, they all also have to get all updates from one another and merge them. For applications like this, a big hunk of modern IBM iron with its raw, screaming I/O capability might beat a gaggle of Wintel blades running Linux. But the budget isn’t there to buy that million dollar computer.
This might be a really dumb question from a non-techie:
How do chat rooms hold so many people in real time constantly chatting? While the chat would not be saveable like FR threads, wouldn’t it be easier to get into and keep running in fast real time?
Couldn’t FR create a chat room only for special occasions? It would be up to JR to determine which events were worth opening the room for. Surely debates and election nights would be essential.
Would that help? Everyone just wants to be in a room with like-minded people at these times. We want to express ourselves and see how others are feeling. It has to be fast. But it doesn’t have to be a thread for eternity.
What about a special event chat room rather than a live thread?
This is why I think the memcache would help.
I don't know exactly how their database is organized. But, if it is broken down into a row for every posting, then each of those could be cached with a unique key. That way, you can construct a thread from the posts in the cache.
Alternatively, there could be a cache entry for each posting, with the number of posts in the thread embedded in the key. A quick check against a separate cache entry holding the current post count would tell you whether that entry is still valid, and it could be used to render the posting. The cache entry would only become invalid when another post was added to the thread. The idea is simply to use the cache in place of database queries whenever possible.
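Something like the above could be sketched as follows. This is only an illustration of the idea, not FR's actual schema: a plain dict stands in for memcached, and the key names and functions (`post_key`, `post_count_key`, etc.) are invented.

```python
cache = {}  # stand-in for a memcached client's get/set interface

def post_count_key(thread_id):
    # Separate entry that tracks how many posts the thread has
    return f"thread:{thread_id}:count"

def post_key(thread_id, post_no):
    # One cache entry per posting, keyed by thread and post number
    return f"thread:{thread_id}:post:{post_no}"

def add_post(thread_id, post_no, body):
    cache[post_key(thread_id, post_no)] = body
    # Bumping the count is what signals readers that older
    # thread-level entries are stale
    cache[post_count_key(thread_id)] = post_no

def read_thread(thread_id):
    # One lookup for the count, then one per post --
    # no database query needed on a warm cache
    count = cache.get(post_count_key(thread_id), 0)
    return [cache.get(post_key(thread_id, n)) for n in range(1, count + 1)]
```

A real deployment would fall back to the database on a cache miss and repopulate the entry, but the key-per-post structure is the point here.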
Since it really seems to slow down during these live thread periods, I suspect what is really wrong is a table-locking constraint. If they are using MySQL for the back-end, it locks an entire table for an update -- which means that all read queries to that table are blocked. Postgres and Oracle have row-based locking.
If that is the case, memcache would help, but only if it was used to reduce the number of read queries.
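The usual way to get that reduction is a read-through wrapper: hit the cache first and only fall back to the database on a miss, so fewer SELECTs queue up behind the table lock. A minimal sketch, where `fetch_post_from_db` is a hypothetical placeholder for the real query and a dict stands in for memcached:

```python
cache = {}  # stand-in for memcached

def fetch_post_from_db(post_id):
    # Placeholder for the real SELECT against the postings table
    return f"post body {post_id}"

def get_post(post_id):
    key = f"post:{post_id}"
    body = cache.get(key)
    if body is None:
        # Cache miss: one read query hits the database,
        # then the row is cached for subsequent readers
        body = fetch_post_from_db(post_id)
        cache[key] = body
    return body
```

During a live thread, thousands of readers hammering the same handful of posts would then resolve almost entirely from the cache instead of competing with writers for the table.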
But the budget isn't there to buy that million dollar computer.
_________________
Or the talent to keep it running. I wonder if there isn't an interested mainframe owner who has space???
I disagree with you on a few points:
1. Real time shouldn’t be demanded or expected. There should be cacheability for short time bursts, and since people tend to hit the same pages, the potential for cache misses should be low.
2. Transactional consistency shouldn’t be the rule here. No reason for it. Eventual is fine, and even that might be on the strict side.
3. There are plenty of improvements that can be made on the software here. I’m sure John has done an excellent job of making sure a lot of static content is cacheable, though.
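Point 1 above is cheap to implement. A minimal sketch of short-TTL caching, assuming a few seconds of staleness is acceptable (the names and the 5-second TTL are my own choices, not anything FR actually runs): a burst of hits on the same live thread then costs one page rebuild per interval instead of one per request.

```python
import time

TTL = 5.0   # seconds of acceptable staleness during a burst
_cache = {} # key -> (expires_at, value)

def cached(key, rebuild):
    now = time.time()
    hit = _cache.get(key)
    if hit and hit[0] > now:
        return hit[1]          # still fresh: serve the snapshot
    value = rebuild()          # expensive render / DB work happens here
    _cache[key] = (now + TTL, value)
    return value
```

Since people tend to hit the same pages, as point 1 notes, the miss rate during an event should be low and each hot page gets rebuilt at most once every TTL seconds.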
There’s no real reason that the servers should be this slow, and from what John’s indicated, the problem doesn’t seem to be that the database or hardware is stressed. Synchronization points aren’t always obvious, though. In a distributed system, everything pulls on everything else, often in unexpected ways; it’s not a series of isolated components connected by a network. Could be improper backoff. Could be a case of improperly managed blocking queues. Or it could be external altogether, like a DDoS.
Interesting problem to tackle, I’d love to give it a try.