
Cloud infrastructure


This article, F1: A Distributed SQL Database That Scales by Srihari Srinivasan, is republished with permission from a blog you really should follow: Systems We Make - Curating Complex Distributed Systems.

With both the F1 and Spanner papers out, it's now possible to understand their interplay a bit more holistically. So let's start by revisiting the key goals of both systems.

Original author: 
Todd Hoff

This is a guest post by Yelp's Jim Blomo. Jim manages a growing data mining team that uses Hadoop, mrjob, and oddjob to process TBs of data. Before Yelp, he built infrastructure for startups and Amazon. Check out his upcoming talk at OSCON 2013 on Building a Cloud Culture at Yelp.

In Q1 2013, Yelp had 102 million unique visitors (source: Google Analytics) including approximately 10 million unique mobile devices using the Yelp app on a monthly average basis. Yelpers have written more than 39 million rich, local reviews, making Yelp the leading local guide on everything from boutiques and mechanics to restaurants and dentists. With respect to data, one of the most distinctive things about Yelp is the variety of its data: reviews, user profiles, business descriptions, menus, check-ins, food photos... the list goes on. We have many ways to deal with data, but today I’ll focus on how we handle offline data processing and analytics.

In late 2009, Yelp investigated using Amazon’s Elastic MapReduce (EMR) as an alternative to an in-house cluster built from spare computers.  By mid 2010, we had moved production processing completely to EMR and turned off our Hadoop cluster.  Today we run over 500 jobs a day, from integration tests to advertising metrics.  We’ve learned a few lessons along the way that can hopefully benefit you as well.

Job Flow Pooling

Original author: 
Ben Cherian


Image copyright isak55

In every emerging technology market, hype seems to wax and wane. One day a new technology is red hot, the next day it’s old hat. Sometimes the hype tends to pan out and concepts such as “e-commerce” become a normal way to shop. Other times the hype doesn’t meet expectations, and consumers don’t buy into paying for e-commerce using Beenz or Flooz. Apparently, Whoopi Goldberg and a slew of big name VCs ended up making a bad bet on the e-currency market in the late 1990s. Whoopi was paid in cash and shares of Flooz. At least, she wasn’t paid in Flooz alone! When investing, some bets are great and others are awful, but often, one only knows the awful ones in retrospect.

What Does “Software Defined” Mean?

In the infrastructure space, there is a growing trend of companies calling themselves “software defined (x).” Often, it’s a vendor that is re-positioning a decades-old product. On occasion, though, it’s smart, nimble startups and wise incumbents seeing a new way of delivering infrastructure. Either way, the term “software defined” is with us to stay, and there is real meaning and value behind it if you look past the hype.

There are three software defined terms that seem to be bandied around quite often: software defined networking, software defined storage, and the software defined data center. I suspect new terms will soon follow, like software defined security and software defined management. What all these “software-defined” concepts really boil down to is: Virtualization of the underlying component and accessibility through some documented API to provision, operate and manage the low-level component.

This trend started once Amazon Web Services came onto the scene and convinced the world that the data center could be abstracted into much smaller units and could be treated as disposable pieces of technology, which in turn could be priced as a utility. Vendors watched Amazon closely and saw how this could apply to the data center of the future.

Since compute was already virtualized by VMware and Xen, projects such as Eucalyptus were launched with the intention to be a “cloud controller” that would manage the virtualized servers and provision virtual machines (VMs). Virtualized storage (a.k.a. software defined storage) was a core part of the offering and projects like OpenStack Swift and Ceph showed the world that storage could be virtualized and accessed programmatically. Today, software defined networking is the new hotness and companies like Midokura, VMware/Nicira, Big Switch and Plexxi are changing the way networks are designed and automated.

The Software Defined Data Center

The software defined data center encompasses all the concepts of software defined networking, software defined storage, cloud computing, automation, management and security. Every low-level infrastructure component in a data center can be provisioned, operated, and managed through an API. There are not only tenant-facing APIs but also operator-facing APIs, which help the operator automate tasks that were previously manual.

An infrastructure superhero might think, “With great accessibility comes great power.” The data center of the future will be the software defined data center where every component can be accessed and manipulated through an API. The proliferation of APIs will change the way people work. Programmers who have never formatted a hard drive will now be able to provision terabytes of data. A web application developer will be able to set up complex load balancing rules without ever logging into a router. IT organizations will start automating the most mundane tasks. Eventually, beautiful applications will be created that mimic the organization’s process and workflow and will automate infrastructure management.
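To make the idea concrete, here is a minimal sketch of what "provisioning through an API" could look like from a programmer's side. Everything in it is hypothetical: `DataCenterAPI`, `provision_volume`, `add_lb_rule` and `route` are illustrative names for an in-memory stand-in, not the API of any vendor or project mentioned in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Volume:
    name: str
    size_gb: int


@dataclass
class LoadBalancerRule:
    path_prefix: str
    backends: List[str]


@dataclass
class DataCenterAPI:
    """Hypothetical in-memory stand-in for a software defined data center API."""
    capacity_gb: int
    volumes: Dict[str, Volume] = field(default_factory=dict)
    lb_rules: List[LoadBalancerRule] = field(default_factory=list)

    def used_gb(self) -> int:
        # Total storage already handed out to tenants.
        return sum(v.size_gb for v in self.volumes.values())

    def provision_volume(self, name: str, size_gb: int) -> Volume:
        # Storage is requested by name and size; no disk is formatted by hand.
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if self.used_gb() + size_gb > self.capacity_gb:
            raise ValueError("not enough capacity")
        volume = Volume(name, size_gb)
        self.volumes[name] = volume
        return volume

    def add_lb_rule(self, path_prefix: str, backends: List[str]) -> LoadBalancerRule:
        # A load balancing rule is plain data; nobody logs into a router.
        rule = LoadBalancerRule(path_prefix, list(backends))
        self.lb_rules.append(rule)
        return rule

    def route(self, path: str) -> List[str]:
        # The longest matching path prefix wins; no match means no backends.
        matches = [r for r in self.lb_rules if path.startswith(r.path_prefix)]
        if not matches:
            return []
        return max(matches, key=lambda r: len(r.path_prefix)).backends


dc = DataCenterAPI(capacity_gb=4096)
dc.provision_volume("reviews", 2048)
dc.add_lb_rule("/", ["web-1", "web-2"])
dc.add_lb_rule("/api", ["api-1"])
print(dc.route("/api/users"))
```

The point of the sketch is only that storage and load balancing become ordinary function calls that a script, a test, or a configuration management tool can issue and repeat.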

IT Organizations Will Respond and Adapt Accordingly

Of course, this means the IT organization will have to adapt. The new base level of knowledge in IT will eventually include some sort of programming knowledge. Scripting languages like Ruby and Python will soar even higher in popularity. The network administrators will become programmers. The system administrators will become programmers. During this time, DevOps (development + operations) will make serious inroads in the enterprise and silos will be refactored, restructured or flat-out broken down.

Configuration management tools like Chef and Puppet will be the glue for the software defined data center. If done properly, the costs around delivering IT services will be lowered. “Ghosts in the system” will watch all the components (compute, storage, networking, security, etc.) and adapt to changes in real-time to increase utilization, performance, security and quality of service. Monitoring and analytics will be key to realizing this software defined future.

Big Changes in Markets Happen With Very Simple Beginnings

All this amazing innovation comes from two very simple concepts: virtualizing the underlying components and making them accessible through an API.

The IT world might look at the software defined data center and say this is nothing new. We’ve been doing this since the 80s. I disagree. What’s changed is our universal thinking about accessibility. Ten years ago, we wouldn’t have blinked if a networking product came out without an API. Today, an API is part of what we consider a 1.0 release. This thinking is pervasive throughout the data center today with every component. It’s Web 2.0 thinking that shaped cloud computing and now cloud computing is bleeding into enterprise thinking. We’re no longer constrained by the need to have deep specialized knowledge in the low-level components to get basic access to this technology.

With well documented APIs, we have now turned the entire data center into many instruments that can be played by the IT staff (musicians). I imagine the software defined data center to be a Fantasia-like world where Mickey is the IT staff and the brooms are networking, storage, compute and security. The magic is in the coordination, cadence and rhythm of how all the pieces work together. Amazing symphonies of IT will occur in the near future and this is the reason the software defined data center is not a trend to overlook. Maybe Whoopi should take a look at this market instead.

Ben Cherian is a serial entrepreneur who loves playing in the intersection of business and technology. He’s currently the Chief Strategy Officer at Midokura, a network virtualization company. Prior to Midokura, he was the GM of Emerging Technologies at DreamHost, where he ran the cloud business unit. Prior to that, Ben ran a cloud-focused managed services company.

Original author: 
Jon Brodkin

The Linux Foundation has taken control of the open source Xen virtualization platform and enlisted a dozen industry giants in a quest to be the leading software for building cloud networks.

The 10-year-old Xen hypervisor was formerly a community project sponsored by Citrix, much as the Fedora operating system is a community project sponsored by Red Hat. Citrix was looking to place Xen into a vendor-neutral organization, however, and the Linux Foundation move was announced today. The list of companies that will "contribute to and guide the Xen Project" is impressive, including Amazon Web Services, AMD, Bromium, Calxeda, CA Technologies, Cisco, Citrix, Google, Intel, Oracle, Samsung, and Verizon.

Amazon is perhaps the most significant name on that list in regard to Xen. The Amazon Elastic Compute Cloud is likely the most widely used public infrastructure-as-a-service (IaaS) cloud, and it is built on Xen virtualization. Rackspace's public cloud also uses Xen. Linux Foundation Executive Director Jim Zemlin noted in his blog that Xen "is being deployed in public IaaS environments by some of the world's largest companies."


Original author: 
Soulskill

AleX122 writes "I have an idea for a web app. Things I know: I am not the first person with a brilliant idea. Many others 'inventors' failed and it may happen to me, but without trying the outcome will always be failure. That said, the project will be huge if successful. However, I currently do not have money needed to hire developers. I have pretty solid experience in Java, GWT, HTML, Hibernate/Eclipselink, SQL/PLSQL/Oracle. The downside is project nature. All applications I've developed to date were hosted on single server or in small cluster (2 tomcats with fail-over). The application, if I succeed, will have to serve thousands of users simultaneously. The userbase will come from all over the world. (Consider infrastructure requirements similar to a social network.) My questions: What technologies should I use now to ensure easy scaling for a future traffic increase? I need distributed processing and data storage. I would like to stick to open standards, so Google App Engine or a similar proprietary cloud solution isn't acceptable. Since I do not have the resources to hire a team of developers and I will be the first coder, it would be nice if technology used is Java related. However, when you have a hammer, everything looks like a nail, so I am open to technologies unrelated to Java."



Original author: 
Arik Hesseldahl

Here’s a name I haven’t heard in a while: Anso Labs.

This was the cloud computing startup that originated at NASA, where the original ideas for OpenStack, the open source cloud computing platform, were born. Anso Labs was acquired by Rackspace a little more than two years ago.

It was a small team. But now a lot of the people who ran Anso Labs are back with a new outfit, still devoted to cloud computing, and still devoted to OpenStack. It’s called Nebula. And it builds a turnkey computer that will turn an ordinary rack of servers into a cloud-ready system, running — you guessed it — OpenStack.

Based in Mountain View, Calif., Nebula claims to have an answer for any company that has ever wanted to build its own private cloud system and not rely on outside vendors like Amazon or Hewlett-Packard or Rackspace to run it for them.

It’s called the Nebula One. And the setup is pretty simple, Nebula CEO and founder Chris Kemp said: Plug the servers into the Nebula One, then you “turn it on and it boots up cloud.” All of the provisioning and management that a service provider would normally charge you for has been created on a hardware device. There are no services to buy, no consultants to pay to set it up. “Turn on the power switch, and an hour later you have a petascale cloud running on your premise,” Kemp told me.

The Nebula One sits at the top of a rack of servers; on its back are 48 Ethernet ports. It runs an operating system called Cosmos that grabs all the memory and storage and CPU capacity from every server in the rack and makes them part of the cloud. It doesn’t matter who made them — Dell, Hewlett-Packard or IBM.

Kemp named two customers: Genentech and Xerox’s research lab, PARC. There are more customer names coming, he says, and it already boasts investments from Kleiner Perkins, Highland Capital and Comcast Ventures. Nebula is also the only startup company that is a platinum member of the OpenStack Foundation. Others include IBM, HP, Rackspace, Red Hat and AT&T.

If OpenStack becomes as easy to deploy as Kemp says it can be, a lot of companies — those that can afford to have their own data centers, anyway — are going to have their own clouds. And that is sort of the point.



Leaders in Big Data

Google Tech Talk, October 22, 2012

ABSTRACT: Discussing the evolution, current opportunities and future trends in big data. Presented by Google and the Fung Institute at UC Berkeley.

SPEAKERS:
Moderator: Hal Varian, an economist specializing in microeconomics and information economics. He is the Chief Economist at Google and he holds the title of emeritus professor at the University of California, Berkeley, where he was founding dean of the School of Information.
Panelists:
Theo Vassilakis, Principal Engineer/Engineering Director at Google
Gustav Horn, Senior Global Consulting Engineer, Hadoop at NetApp
Charles Fan, Senior Vice President at VMware in strategic R&D
From:
GoogleTechTalks


The tech unit's sign, autographed by its members.

The reelection of Barack Obama was won by people, not by software. But in a contest as close as last week's election, software may have given the Obama for America organization's people a tiny edge—making them by some measures more efficient, better connected, and more engaged than the competition.

That edge was provided by the work of a group of people unique in the history of presidential politics: Team Tech, a dedicated internal team of technology professionals who operated like an Internet startup, leveraging a combination of open source software, Web services, and cloud computing power. The result was the sort of numbers any startup would consider a success. As Scott VanDenPlas, the head of the Obama technology team's DevOps group, put it in a tweet:

4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests. Design, deploy, dismantle in 583 days to elect the President. #madops
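A quick back-of-the-envelope calculation puts those figures in proportion. The sketch below assumes, without confirmation from the tweet, that the 8.5 billion requests were spread over the full 583 days and that 10k requests per second was a peak rather than a sustained rate; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def average_requests_per_second(total_requests: float, days: int) -> float:
    # Average rate if the total were spread evenly over the whole period.
    return total_requests / (days * SECONDS_PER_DAY)


def peak_to_average_ratio(peak_rps: float, total_requests: float, days: int) -> float:
    # How many times larger the assumed peak is than the long-run average.
    return peak_rps / average_requests_per_second(total_requests, days)


# Figures taken from the tweet; their interpretation is an assumption.
avg = average_requests_per_second(8.5e9, 583)
ratio = peak_to_average_ratio(10_000, 8.5e9, 583)
print(f"average: {avg:.0f} requests/s, peak is about {ratio:.0f}x the average")
```

Under those assumptions the long-run average comes to well under 200 requests per second, so the quoted peak is dozens of times higher. That kind of gap between average and peak load is where elastic cloud capacity pays off.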

