Skip navigation

Transaction processing

warning: Creating default object from empty value in /var/www/vhosts/ on line 33.
Original author: 
Todd Hoff

The paper MegaPipe: A New Programming Interface for Scalable Network I/O (video, slides) hits the common theme that if you want to go faster you need a better car design, not just a better driver. So that's why the authors started with a clean-slate and designed a network API from the ground up with support for concurrent I/O, a requirement for achieving high performance while scaling to large numbers of connections per thread, multiple cores, etc.  What they created is MegaPipe, "a new network programming API for message-oriented workloads to avoid the performance issues of BSD Socket API."

The result: MegaPipe outperforms baseline Linux between 29% (for long connections) and 582% (for short connections). MegaPipe improves the performance of a modified version of memcached between 15% and 320%. For a workload based on real-world HTTP traces, MegaPipe boosts the throughput of nginx by 75%.

What's this most excellent and interesting paper about?

Your rating: None
Original author: 
Todd Hoff

Now that we have the C10K concurrent connection problem licked, how do we level up and support 10 million concurrent connections? Impossible you say. Nope, systems right now are delivering 10 million concurrent connections using techniques that are as radical as they may be unfamiliar.

To learn how it’s done we turn to Robert Graham, CEO of Errata Security, and his absolutely fantastic talk at Shmoocon 2013 called C10M Defending The Internet At Scale.

Robert has a brilliant way of framing the problem that I’ve never heard of before. He starts with a little bit of history, relating how Unix wasn’t originally designed to be a general server OS, it was designed to be a control system for a telephone network. It was the telephone network that actually transported the data so there was a clean separation between the control plane and the data plane. The problem is we now use Unix servers as part of the data plane, which we shouldn’t do at all. If we were designing a kernel for handling one application per server we would design it very differently than for a multi-user kernel. 

Which is why he says the key is to understand:

  • The kernel isn’t the solution. The kernel is the problem.

Which means:

  • Don’t let the kernel do all the heavy lifting. Take packet handling, memory management, and processor scheduling out of the kernel and put it into the application, where it can be done efficiently. Let Linux handle the control plane and let the the application handle the data plane.

The result will be a system that can handle 10 million concurrent connections with 200 clock cycles for packet handling and 1400 hundred clock cycles for application logic. As a main memory access costs 300 clock cycles it’s key to design in way that minimizes code and cache misses.

With a data plane oriented system you can process 10 million packets per second. With a control plane oriented system you only get 1 million packets per second.

If this seems extreme keep in mind the old saying: scalability is specialization. To do something great you can’t outsource performance to the OS. You have to do it yourself.

Now, let’s learn how Robert creates a system capable of handling 10 million concurrent connections...

Your rating: None

eldavojohn writes "As a web developer in the 'Agile' era, I find myself making (or recognizing) more and more important decisions being made in developing for the web. Scalability Rules cemented and codified a lot of things I had suspected or picked up from blogs but failed to give much more thought to and had difficulty defending as a member of a team. A simple example is that I knew state is bad if unneeded but I couldn't quite define why. Scalability Rules Read below for the rest of eldavojohn's review.

Read more of this story at Slashdot.

Your rating: None

The Little Engine(s) That Could: Scaling Online Social Networks

Google Tech Talk (see below) June 17, 2010 Presented by Josep M. Pujol. ABSTRACT The difficulty of partitioning social graphs has introduced new system design challenges for scaling of Online Social Networks (OSNs). Vertical scaling by resorting to full replication can be a costly proposition. Scaling horizontally by partitioning and distributing data among multiple servers using, for eg, key-value stores using DHTs, can suffer from expensive inter-server communication and other performance issues. Such challenges have often led to costly re-architecting efforts for popular OSNs like Twitter and Facebook. We design, implement, and evaluate SPAR, a Social Partitioning and Replication middle-ware that mediates between the application and the database layer of an OSN. SPAR exploits the underlying social graph structure to partition user data and selectively replicate users to ensure that users have their neighbors' data co-located on their machine. The gains from this are multi-fold: application developers can assume local semantics, ie, develop as they would for a single machine; scalability is achieved by adding commodity machines with low memory and network I/O requirements; and N+K redundancy is achieved at a fraction of the cost. We provide a complete system design, extensive evaluation based on datasets from Twitter, Orkut, and Facebook, and a working implementation. We show that SPAR performs well in terms of reducing the overhead, and dealing with high dynamics <b>...</b>

More in
Science & Technology

Your rating: None