Twitter says they fixed the Memcache calcification problem. Dormando disagrees. (github.com/twitter)
73 points by diwup on July 11, 2012 | hide | past | favorite | 37 comments


Just to clarify things, since I helped write the initial blog post about twemcache: our original post had an error regarding the slab calcification problem we mentioned. That problem ONLY applied to our v1.4.4 fork of memcached. After speaking with the upstream maintainers, we learned that recent memcached versions have addressed some of these problems. These are the kind of conversations we want to have.

At the time we adopted memcached, that's the version we went with, and we made sure it worked well in our production environment as we scaled as a company. We also open sourced twemproxy [https://github.com/twitter/twemproxy], a lightweight proxy for memcached that has worked well for us in combination with twemcache and may work well for others too.

We just want to reiterate that twemcache has worked well for our unique environment, and any team evaluating memcached should try all their options, just like with any other piece of software you adopt in your stack.

One of the reasons for open sourcing our work was to share our ideas with the memcached community, show what worked well for us, and help everyone. For example, this is also how we treat our MySQL fork [https://github.com/twitter/mysql], which we maintain in the open and for which we have signed an OCA with Oracle to help get work pushed upstream so everyone benefits in the long run.


I enjoyed the jab at Twitter's "we are the only website on the planet to have scaling issues" holier-than-thou attitude. Rails is still trying to get over the character assassination by Twitter when they failed to scale it. I know first hand that Rails can scale very well. Do bad carpenters blame their tools?


Admittedly, I am not well versed in Rails at present. However, the question with regard to Twitter isn't so much whether Rails can scale, but whether it could scale when Twitter needed it to, which was a fair number of years ago. Did Mongrel or Unicorn even exist back in, say, 2007?


I believe the main earlier problem was that M:N broadcasting in MySQL didn't scale well, but I could be mistaken.


Mongrel did.


Was it mature enough then to have been used in production at twitter's scale?


Yes. I was on a call with a couple of the Twitter devs around this time. They were running on 30-odd instances at Joyent. So while they were a decent size, they were nothing like they are now.

Mongrel was always pretty solid, actually. It was originally developed for Verisign, as I recall, and even its early versions stuck much more closely to the HTTP specs than other web servers did.

It caught a lot of flak because of application code blowing out the Ruby heap by touching too many objects. Mongrel isn't really to blame for that. With early versions of ActiveRecord it was easy to materialize large result sets without realizing it. A lot of people felt the pain of using associations everywhere without thinking about what would happen when the joins went north of 10k objects. Not really Mongrel's fault.


I had that thought as well. And honestly I'm not entirely sure which version they were running. By now the community has learned a bunch of lessons and we take some of those for granted.

Still, the guys over at Twitter could have said, "We couldn't scale this technology given our requirements and current knowledge, so we went with what we were comfortable with, and that's Java." Instead, they blamed Rails. And now any Rails hater brings up Twitter in a flame war, even though it was some 4-5 years ago.


Twitter is still the largest Rails site on the web as far as I know, so anyone who says that Twitter hates Rails is either being taken out of context or confused.


There may be a difference between scaling well and scaling 9th-on-Alexa well.

Sometimes bad carpenters blame their tools, but sometimes the tools weren't planned for something of that scale.


Regardless of the Twitter/Memcache controversy, bad carpenters do blame their tools. Good carpenters succeed in spite of their tools.


Twitter's attitude is laughable. Yes, you have a decent amount of traffic, but you're also only dealing with ~160 uncompressed bytes (plus whatever overhead) per event. The hurdles you've overcome aren't particularly amazing or challenging.


Your assumptions are laughable. Tweets have considerable metadata that pushes them far beyond 160 bytes. Fragmenting this into a secondary object is counterproductive due to the constant factor of two requests vs. one larger payload.

Someone who's been around the block a few times understands that it's difficult to make pronouncements without informed observation. That you are not willing to extend Twitter's engineering staff the benefit of the doubt, given your lack of visibility into their measurements, speaks loudly.


What does content size matter? The challenge is that every single page of content, except for each individual tweet, is utterly unique to every user. That defeats the vast majority of straightforward caching implementations. You can never cache fully rendered pages, because the chance that one random timeline view at a given time will be identical to any other view (even by the same person at a different time) is about as close to zero as possible. Every view is dynamically generated from up to several hundred or thousand different streams of data, which need to be put in order and have all of the per-user metadata set correctly.

Once you start looking into the actual mathematical constraints of the problem of Twitter, you realize that it's a scaling nightmare: hundreds of millions of updates per day and tens of thousands of views per second (billions per day). There are only a few people in the world who have the right to look down on stats like that.
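As a rough sketch of the read path described above (all names are hypothetical, not Twitter's actual code), assembling one timeline means merging many per-followee streams on every single request:

```python
import heapq
import itertools

def build_timeline(followee_streams, limit=20):
    # Each stream is one followee's recent tweets as (timestamp, tweet)
    # pairs, already sorted newest-first. heapq.merge does a lazy k-way
    # merge, so the cost of every view scales with the follow count.
    merged = heapq.merge(*followee_streams, key=lambda t: t[0], reverse=True)
    return list(itertools.islice(merged, limit))
```

Because the merged result is unique per user and per moment, it's the merge itself, not any rendered page, that has to be made cheap.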


Again, as the parent poster also said, I think you have never worked on large data. Twitter is like a big mailbox, only every mail is just 160 bytes. This was solved 10 years ago.


If you don't understand that the request distribution matters more than payload size, you aren't even seeing the problems.

I encourage you to analyze the infrastructure for a Twitter-style app using inbox duplication. Once you model this against hardware costs, you'll learn something about how utterly expensive write amplification is in a hot data set that must be backed by RAM due to availability requirements.
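A back-of-the-envelope model of that write amplification, with all numbers invented for illustration (they are not Twitter's actual figures):

```python
def fanout_write_bytes_per_day(tweets_per_day, avg_followers, entry_bytes):
    # With inbox duplication ("fan-out on write"), each tweet is copied
    # into every follower's materialized inbox, so daily write volume is
    # the tweet volume multiplied by the average follower count.
    return tweets_per_day * avg_followers * entry_bytes

# Hypothetical inputs: 400M tweets/day, 100 avg followers, 100-byte entries
daily = fanout_write_bytes_per_day(400_000_000, 100, 100)
print(daily / 1e12)  # ~4 TB/day of inbox writes that must stay RAM-resident
```

The multiplier is the whole story: the same tweet volume with 10x the average follower count costs 10x the RAM-backed write throughput.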


Wait, see my comment below. Twitter received 15B (yes, B) API calls/day last July. How does that compare to your typical email client?

I don't want to argue that Twitter is astoundingly hard, but serving ~170K requests/sec can't really be that trivial, even if they're 160 bytes (they're not, since Twitter sends metadata, logs those messages, tracks service metrics, etc. for those messages)


If you treat twitter like a big mailbox, things will work "ok". It's not the worst approach ever, that's for certain. But end-user perceptible performance would be a fraction of what twitter has today.

P.S. How many images does twitter serve up per day at present? That's a tad more than 160 characters of data.


Another key difference is that email users generally contribute directly to their provider's infrastructure costs in providing email as a service. Email infrastructure (and the user experience) is fragmented, and global funding generally scales with global load.

Instead, Twitter must monetize via advertising of some form, and so the percentage of folks who do not respond to ads acts as a really strong factor in your cost calculations. In this sense, email software has it easy, and can be extremely wasteful in the resources it consumes.

It's not just that the availability expectations of twitter are higher than email, it's also that the economic base of the infrastructure is far more sparse.


And yet another key difference is that there are few email server installations that support half a billion users. Saying that the scaling problem is "solved" because all you have to do is copy, say, gmail, is kind of silly.


> Yes, you have a decent amount of traffic, but you're also only dealing with ~160 uncompressed bytes (plus whatever overhead) per event. The hurdles you've overcome aren't particularly amazing or challenging.

>42M uniques last month [0]. Are you really going to assert Twitter hasn't dealt with amazing or challenging hurdles in getting this far?

[0] http://siteanalytics.compete.com/twitter.com/

EDIT: this ignores that twitter.com is not the only Twitter client--they served 15B (!!) requests/day (!!!) as of a year ago.

Not to mention metadata, instrumentation for services, logging, DB backups, and managing configuration of all of those distributed resources. Are we still talking about the ease of 160B?

http://www.readwriteweb.com/hack/2011/07/twitter-serves-more...


Yes.

In 2008/2009, another engineer and I built an ad platform that received around 500M impressions per day and 5M clicks per day. And it wasn't just recording a tweet or publishing out to followers: we took the user's input query, did keyword/relevancy targeting, geofiltering, and matching to advertisers, and delivered back a large result set of adverts. All within 100ms.

Our platform was also Apache, mod_php, memcached, MySQL, and RabbitMQ, so definitely not the most optimal of platforms by any means. We had two colos with ~20 servers (Dell R410s) at each facility.

Twitter just recently announced 400M tweets/day. I'm not trying to brag about my experience, because looking back we made numerous amateur mistakes, but just showing that Twitter's "scale" is a joke compared to the everyday challenges at any large internet ad network.


You understand that 400M tweets a day is the number of tweets posted to their system, right? That speaks not at all to the consumption of those tweets, which is the metric you're using for your ad platform.

Additionally, they don't just deal with 160 characters, because again, somehow you're still talking about data being posted, and not data being consumed. Data is consumed off their site via polling APIs, streaming APIs, and a website, all of which are pushing those 400M tweets a day out to plenty of consumers.

They may not have as ridiculous a scale as they act like they do. But let's be clear: it is nowhere near as trivial as you make it out to be, either. Armchair quarterbacking is always easy, because you aren't exposed to the complexity that arises when you've spent a few months and years hitting the corner cases of the problem you're commenting on.


So you had 500M reads on a relatively static data set plus 5M writes on an unrelated log? Sounds like a fun problem, but I agree it doesn't sound like rocket science. On the other hand, it also doesn't sound like Twitter, which has 400M writes per day and 400x million reads on that very dynamic data set. That just seems like a slightly harder problem.


Adserving is not really static. Cachebusters are named so for good reason. Nowadays ad server developers are clever enough to separate click tracking and impression tracking (the non-Enterprise version of OpenX still deserves a lot of ಠ_ಠ though).

In an RTB environment, there is an additional constraint of having to serve up your ad (or decision) within 60ms (Google ADX sets a hard limit of 80ms), and the fastest best bid wins.
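That hard deadline is usually handled by treating a late bid as a no-bid. A minimal sketch (the function names and timings are assumptions for illustration, not any real exchange's API):

```python
import concurrent.futures

def bid_with_deadline(compute_bid, request, deadline_ms=60):
    # Run the bid computation, but give up (no-bid) once the exchange's
    # deadline passes; a late response would be discarded anyway.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(compute_bid, request)
        return future.result(timeout=deadline_ms / 1000)
    except concurrent.futures.TimeoutError:
        return None  # no-bid
    finally:
        pool.shutdown(wait=False)
```

The design point is that missing the window is a normal outcome to budget for, not an error: every millisecond of the pipeline eats into the time left for the actual decision.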

I don't think that's any less hard a problem than Twitter's, especially at high volumes. You can't just say "scale sideways!".

That said, the first link was totally misleading. I was actually quite shocked to see that Twitter only had 42M uniques per month, because a typical ad network does a lot more.

EDIT: ah, 15B requests/day makes more sense. WTF is with the wrong stats?


Are you talking about 15B vs. the visits chart I linked? If so, the 15B number comes from API calls, which do not have to happen through the website (think of all the Twitter clients).


Requests are requests. 15B is a gigantic amount.


Right, I agree 100%. I just couldn't tell if you were trying to reconcile the 15B with the 42M number from Compete.


See my edit. Your ad impressions reached approximately 3% of Twitter's daily request load last year. Note that those requests can serve up to 200 tweets + metadata.

This doesn't account for Twitter's budding ad service, which one can assume has some of the same functionality (targeted advertising, information retrieval) as traditional ad networks.


You are off by nearly two orders of magnitude from Twitter's scale. They have billions of views per day, and each of those views is a stream composed of hundreds of different sub-streams.


Add to that, the challenges of sub-60ms RTB. All the fun!


Decent amount of traffic?

Sorry, but the only thing laughable is that comment.


tl;dr your memcache branch fragments my consulting clients, you should use my memcache branch instead


Apparently another Twitter engineer says it's doing 23M queries per second:

"23 million queries per second with zero fucks given"

https://twitter.com/timtrueman/status/222793786345013248


That response from manjuraj was kind of evil.


Has open source changed? I remember when people didn't fork projects, rebadge them as their own, and then promote them over the original.

Now don't get me wrong, I think it's amazing that Twitter is opening up these enhancements to the community, but it feels like a kick in the teeth to the memcached folks to slap a Twitter badge on it. Why isn't this a collaboration that benefits the whole community? You know, like open source used to work.

I know at 34 I'm a dinosaur in this industry but I do try to keep up with the new way of doing things... This just feels wrong to me.


I think open source has changed a bit with the rise of the GitHub era. IMHO @mikeal wrote a good post regarding this change, "Apache considered harmful": http://www.mikealrogers.com/posts/apache-considered-harmful....

Even in the old days, it was easier to fork than to work with upstream. These days it's just easier to share those forks with services like GitHub, which should help spread ideas and improved solutions so downstream consumers actually benefit.

In Twitter's case, they are planning to do what works for them at the moment: "While we initially focused on the challenging goal of making Memcached work extremely well within the Twitter infrastructure, we look forward to sharing our code and ideas with the Memcached community in the long term."



