Jeremy Edberg, the first paid employee at reddit, teaches us a lot about how to create a successful social site in a really good talk he gave at the RAMP conference. Watch it here at Scaling Reddit from 1 Million to 1 Billion–Pitfalls and Lessons.
Jeremy uses a virtue and sin approach. He shares examples of the mistakes made in scaling reddit, and it turns out they did a lot of good stuff too. Somewhat of a shocker is that Jeremy is now a Reliability Architect at Netflix, so we get a little Netflix perspective thrown in for free.
Some of the lessons that stood out most for me:
A growing number of industries are trying to reduce or at least curtail carbon footprints and energy use. Emissions standards have been set for the automotive, construction, and even telecommunications industries. Yet the internet’s carbon footprint is growing out of control: a whopping 830 million tons of CO2 annually, which is bigger than that of the entire aviation industry. That amount is set to double by 2020.
The following is an architectural overview of salesforce.com’s core platform and applications. Other systems such as Heroku's Dyno architecture or the subsystems of other products such as work.com and do.com are specifically not covered by this material, although database.com is. The idea is to share with the technology community some insight about how salesforce.com does what it does. Any mistakes or omissions are mine.
This is by no means comprehensive but if there is interest, the author would be happy to tackle other areas of how salesforce.com works. Salesforce.com is interested in being more open with the technology communities that we have not previously interacted with. Here’s to the start of “Opening the Kimono” about how we work.
Since 1999, salesforce.com has been singularly focused on building technologies for business that are delivered over the Internet, displacing traditional enterprise software. Our customers pay via monthly subscription to access our services anywhere, anytime through a web browser. We hope this exploration of the core salesforce.com architecture will be the first of many contributions to the community.
I've been a Microsoft developer for decades now. I weaned myself on various flavors of home computer Microsoft Basic, and I got my first paid programming gigs in Microsoft FoxPro, Microsoft Access, and Microsoft Visual Basic. I have seen the future of programming, my friends, and it is terrible CRUD apps running on Wintel boxes!
Of course, we went on to build Stack Overflow in Microsoft .NET. That's a big reason it's still as fast as it is. So one of the most frequently asked questions after we announced Discourse was:
Why didn't you build Discourse in .NET, too?
Let me be clear about something: I love .NET. One of the greatest thrills of my professional career was getting the opportunity to place a Coding Horror sticker in the hand of Anders Hejlsberg. Pardon my inner fanboy for a moment, but oh man I still get chills. There are maybe fifty world class computer language designers on the planet. Anders is the only one of them who built Turbo Pascal and Delphi. It is thanks to Anders' expert guidance that .NET started out as such a remarkably well designed language – literally what Java should have been on every conceivable level – and has continued to evolve in remarkably practical ways over the last 10 years, leveraging the strengths of other influential dynamically typed languages.
All that said, it's true that I intentionally chose not to use .NET for my next project. So you might expect to find an angry, righteous screed here about how much happier I am leaving the oppressive shackles of my Microsoft masters behind. Free at last, free at last, thank God almighty I'm free at last!
Like any pragmatic programmer, I pick the appropriate tool for the job at hand. And as much as I may love .NET, it would be an extraordinarily poor choice for a 100% open source project like Discourse. Why? Three reasons, mainly:
The licensing. My God, the licensing. It's not so much the money, as the infernal, mind-bending tax code level complexity involved in making sure all your software is properly licensed: determining what 'level' and 'edition' you are licensed at, who is licensed to use what, which servers are licensed... wait, what? Sorry, I passed out there for a minute when I was attacked by rabid licensing weasels.
I'm not inclined to make grand pronouncements about the future of software, but if anything kills off commercial software, let me tell you, it won't be open source software. They needn't bother. Commercial software will gleefully strangle itself to death on its own licensing terms.
The friction. If you want to build truly viable open source software, you need people to contribute to your project, so that it is a living, breathing, growing thing. And unless you can download all the software you need to hack on your project freely from all over the Internet, no strings attached, there's just … too much friction.
If Stack Overflow taught me anything, it is that we now live in a world where the next brilliant software engineer can come from anywhere on the planet. I'm talking places this ugly American programmer has never heard of, where they speak crazy nonsense moon languages I can't understand. But get this. Stand back while I blow your mind, people: these brilliant programmers still code in the same keywords we do! I know, crazy, right?
Getting up and running with a Microsoft stack is just plain too hard for a developer in, say, Argentina, or Nepal, or Bulgaria. Open source operating systems, languages, and tool chains are the great equalizer, the basis for the next great generation of programmers all over the world who are going to help us change the world.
The ecosystem. When I was at Stack Exchange we strove mightily to make as much of our infrastructure open source as we could. It was something that we made explicit in the compensation guidelines, this idea that we would all be (partially) judged by how much we could do in public, and try to leave behind as many useful, public artifacts of our work as we could. Because wasn't all of Stack Exchange itself, from the very first day, built on your Creative Commons contributions that we all share ownership of?
You can certainly build open source software in .NET. And many do. But it never feels natural. It never feels right. Nobody accepts your patch to a core .NET class library no matter how hard you try. It always feels like you're swimming upstream, in a world of small and large businesses using .NET that really aren't interested in sharing their code with the world – probably because they know it would suck if they did, anyway. It is just not a native part of the Microsoft .NET culture to make things open source, especially not the things that suck. If you are afraid the things you share will suck, that fear will render you incapable of truly and deeply giving back. The most, uh, delightful… bit of open source communities is how they aren't afraid to let it "all hang out", so to speak.
So as a result, for any given task in .NET you might have – if you're lucky – a choice of maybe two decent-ish libraries. Whereas in any popular open source language, you'll easily have a dozen choices for the same task. Yeah, maybe six of them will be broken, obsolete, useless, or downright crazy. But hey, even factoring in some natural open source spoilage, you're still ahead by a factor of three! A winner is you!
As I wrote five years ago:
I'm a pragmatist. For now, I choose to live in the Microsoft universe. But that doesn't mean I'm ignorant of how the other half lives. There's always more than one way to do it, and just because I chose one particular way doesn't make it the right way – or even a particularly good way. Choosing to be provincial and insular is a sure-fire path to ignorance. Learn how the other half lives. Get to know some developers who don't live in the exact same world you do. Find out what tools they're using, and why. If, after getting your feet wet on both sides of the fence, you decide the other half is living better and you want to join them, then I bid you a fond farewell.
I no longer live in the Microsoft universe. Right, wrong, good, evil, that's just how it turned out for the project we wanted to build.
However, I'd also be lying if I didn't mention that I truly believe the sort of project we are building in Discourse does represent most future software. If you squint your eyes a little, I think you can see a future not too far in the distance where .NET is a specialized niche outside the mainstream.
But why Ruby? Well, the short and not very glamorous answer is that I had narrowed it down to either Python or Ruby, and my original co-founder Robin Ward has been building major Rails apps since 2006. So that clinched it.
I've always been a little intrigued by Ruby, mostly because of the absolutely gushing praise Steve Yegge had for the language way back in 2006. I've never forgotten this.
For the most part, Ruby took Perl's string processing and Unix integration as-is, meaning the syntax is identical, and so right there, before anything else happens, you already have the Best of Perl. And that's a great start, especially if you don't take the Rest of Perl.
But then Matz took the best of list processing from Lisp, and the best of OO from Smalltalk and other languages, and the best of iterators from CLU, and pretty much the best of everything from everyone.
And he somehow made it all work together so well that you don't even notice that it has all that stuff. I learned Ruby faster than any other language, out of maybe 30 or 40 total; it took me about 3 days before I was more comfortable using Ruby than I was in Perl, after eight years of Perl hacking. It's so consistent that you start being able to guess how things will work, and you're right most of the time. It's beautiful. And fun. And practical.
Steve is one of those polyglot programmers I respect so much that I basically just take whatever his opinion is, provided it's not about something wacky like gun control or feminism or T'Pau, and accept it as fact.
I apologize, Steve. I'm sorry it took me 7 years to get around to Ruby. But maybe I was better off waiting a while anyway:
Ruby is a decent performer, but you really need to throw fast hardware at it for good performance. Yeah, I know, interpreted languages are what they are, and caching, database, network, blah blah blah. Still, we obtained the absolute fastest CPUs you could buy for the Discourse servers, 4.0 GHz Ivy Bridge Xeons, and performance is just … good on today's fastest hardware. Not great. Good.
Yes, I'll admit that I am utterly spoiled by the JIT compiled performance of .NET. That's what I am used to. I do sometimes pine away for the bad old days of .NET when we could build pages that serve in well under 50 milliseconds without thinking about it too hard. Interpreted languages aren't going to be able to reach those performance levels. But I can only imagine how rough Ruby performance had to be back in the dark ages of 2006 when CPUs and servers were five times slower than they are today! I'm so very glad that I am hitting Ruby now, with the strong wind of many solid years of Moore's law at our backs.
Ruby is maturing nicely in the 2.0 language release, which happened not more than a month after Discourse was announced. So, yes, the downside is that Ruby is slow. But the upside is there is a lot of low hanging performance fruit in Ruby-land. Like... a lot a lot. On Discourse we got an across the board 20% performance improvement just upgrading to Ruby 2.0, and we nearly doubled our performance by increasing the default Ruby garbage collection limit. From a future performance perspective, Ruby is nothing but upside.
Ruby isn't cool any more. Yeah, you heard me. It's not cool to write Ruby code any more. All the cool people moved on to slinging Scala and Node.js years ago. Our project isn't cool, it's just a bunch of boring old Ruby code. Personally, I'm thrilled that Ruby is now mature enough that the community no longer needs to bother with the pretense of being the coolest kid on the block. That means the rest of us who just like to Get Shit Done can roll up our sleeves and focus on the mission of building stuff with our peers rather than frantically running around trying to suss out the next shiny thing.
And of course the Ruby community is, and always has been, amazing. We never want for great open source gems and great open source contributors. Now is a fantastic time to get into Ruby, in my opinion, whatever your background is.
Even if done in good will and for the best interests of the project, it's still a little scary to totally change your programming stripes overnight after two decades. I've always believed that great programmers learn to love more than one language and programming environment – and I hope the Discourse project is an opportunity for everyone to learn and grow, not just me. So go fork us on GitHub already!
CloudFlare's CDN is based on Anycast, a standard defined in the Border Gateway Protocol—the routing protocol that's at the center of how the Internet directs traffic. Anycast is part of how BGP supports the multi-homing of IP addresses, in which multiple routers connect a network to the Internet; through the broadcasts of IP addresses available through a router, other routers determine the shortest path for network traffic to take to reach that destination.
Using Anycast means that CloudFlare makes the servers it fronts appear to be in many places, while only using one IP address. "If you do a traceroute to Metallica.com (a CloudFlare customer), depending on where you are in the world, you would hit a different data center," Prince said. "But you're getting back the same IP address."
That means that as CloudFlare adds more data centers, and those data centers advertise the IP addresses of the websites that are fronted by the service, the Internet's core routers automatically re-map the routes to the IP addresses of the sites. There's no need to do anything special with the Domain Name System to handle load-balancing of network traffic to sites other than point the hostname for a site at CloudFlare's IP address. It also means that when a specific data center needs to be taken down for an upgrade or maintenance (or gets knocked offline for some other reason), the routes can be adjusted on the fly.
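The routing behavior described above can be sketched as a toy model. This is only an illustration, not CloudFlare's implementation: the site names, hop counts, and the `route` helper are all invented, and real BGP path selection weighs far more than hop count.

```python
# Hypothetical model of anycast routing: every data center advertises the
# same IP address, and each client is routed to the closest one that is
# still advertising. Site names and hop counts below are invented.

ANYCAST_IP = "198.51.100.7"  # documentation-range address, not a real CloudFlare IP

# Hop count from each client region to each data center (made-up numbers).
HOPS = {
    "london": {"amsterdam": 2, "san_jose": 9, "tokyo": 12},
    "seattle": {"amsterdam": 8, "san_jose": 3, "tokyo": 7},
    "osaka": {"amsterdam": 11, "san_jose": 8, "tokyo": 1},
}


def route(client, advertising):
    """Return the data center with the shortest path among those advertising."""
    candidates = {dc: hops for dc, hops in HOPS[client].items() if dc in advertising}
    if not candidates:
        raise RuntimeError(f"no route to {ANYCAST_IP}")
    return min(candidates, key=candidates.get)


all_sites = {"amsterdam", "san_jose", "tokyo"}
before = {client: route(client, all_sites) for client in HOPS}

# Take Tokyo down for maintenance: it stops advertising the address and
# traffic shifts to the next-closest site without any DNS change.
after = {client: route(client, all_sites - {"tokyo"}) for client in HOPS}

print(before)
print(after)
```

Every client asks for the same address in both runs; only the set of sites advertising it changes, which is why no hostname has to be repointed when a data center goes away.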
That makes it much harder for distributed denial of service attacks to go after servers behind CloudFlare's CDN; if the attacking machines are geographically widespread, the traffic they generate gets spread across all of CloudFlare's data centers, as long as the network connections at each site aren't overwhelmed.
Today, a large collection of Web hosting and service companies announced that they will support Railgun, a compression protocol for dynamic Web content. The list includes the content delivery network and Web security provider CloudFlare, cloud providers Amazon Web Services and Rackspace, and thirty of the world’s biggest Web hosting companies.
Railgun is said to make it possible to double the performance of websites served up through CloudFlare’s global network of data centers. The technology was largely developed in the open-source Go programming language launched by Google; it could significantly change the economics of hosting high-volume websites on Amazon Web Services and other cloud platforms because of the bandwidth savings it provides. It has already cut the bandwidth used by 4Chan and Imgur by half. “We've seen a ~50% reduction in backend transfer for our HTML pages (transfer between our servers and CloudFlare's),” said 4Chan’s Chris Poole in an e-mail exchange with Ars. “And pages definitely load a fair bit snappier when Railgun is enabled, since the roundtrip time for CloudFlare to fetch the page is dramatically reduced. We serve over half a billion pages per month (and billions of API hits), so that all adds up fairly quickly.”
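The text above does not describe Railgun's wire format, so the following is only a sketch of one general way a protocol can shrink transfers of dynamic pages: compress each new version against the version the other side already holds. It uses zlib's preset-dictionary feature as a stand-in; the `make_page` helper and the page contents are invented, and nothing here should be read as Railgun's actual algorithm.

```python
import hashlib
import zlib


def make_page(headline):
    """Build a stand-in dynamic page: a changing headline over stable markup."""
    rows = "".join(
        f"<li>{hashlib.sha256(str(i).encode()).hexdigest()}</li>\n" for i in range(100)
    )
    return f"<html><body><h1>{headline}</h1>\n<ul>\n{rows}</ul></body></html>".encode()


previous = make_page("Top stories at 10:00")  # version both ends already hold
current = make_page("Top stories at 10:01")   # version the origin must now send

# Ordinary compression: the whole page is encoded from scratch.
plain = zlib.compress(current, 9)

# Compression against the previous version: shared content becomes
# short back-references into the dictionary instead of literal bytes.
compressor = zlib.compressobj(level=9, zdict=previous)
delta = compressor.compress(current) + compressor.flush()

# The receiver rebuilds the page using its own copy of the previous version.
decompressor = zlib.decompressobj(zdict=previous)
restored = decompressor.decompress(delta) + decompressor.flush()

print(len(current), len(plain), len(delta))
```

The saving depends entirely on how much of the page is unchanged between requests, which is why this kind of approach suits pages that are dynamic but mostly stable from one fetch to the next.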
Rapid cache updates
Like most CDNs, CloudFlare uses caching of static content at its data centers to help overcome the speed of light. But prepositioning content on a forward server typically hasn’t helped performance much for dynamic webpages and Web traffic such as AJAX requests and mobile app API calls, which have relatively little in the way of what’s considered static content. That has created a problem for Internet services because of the rise in traffic from mobile devices and dynamic websites.