
Web services


Moving from physical servers to the "cloud" involves a paradigm shift in thinking. Generally in a physical environment you care about each individual host: they each have their own static IP, you probably monitor them individually, and if one goes down you have to get it back up ASAP. You might think you can just move this infrastructure to AWS and start getting the benefits of the "cloud" straight away. Unfortunately, it's not quite that easy (believe me, I tried). You need to think differently when it comes to AWS, and it's not always obvious what needs to be done.

So, inspired by Sehrope Sarkuni's recent post, here's a collection of AWS tips I wish someone had told me when I was starting out. These are based on things I've learned deploying various applications on AWS, both personally and for my day job. Some are just gotchas to watch out for (and that I fell victim to), some are things I've heard from other people that I ended up implementing and finding useful, but mostly they're things I've learned the hard way.


The tech unit's sign, autographed by its members.

The reelection of Barack Obama was won by people, not by software. But in a contest as close as last week's election, software may have given the Obama for America organization's people a tiny edge—making them by some measures more efficient, better connected, and more engaged than the competition.

That edge was provided by the work of a group of people unique in the history of presidential politics: Team Tech, a dedicated internal team of technology professionals who operated like an Internet startup, leveraging a combination of open source software, Web services, and cloud computing power. The result was the sort of numbers any startup would consider a success. As Scott VanDenPlas, the head of the Obama technology team's DevOps group, put it in a tweet:

4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests. Design, deploy, dismantle in 583 days to elect the President. #madops



Not everyone wants to run their applications on the public cloud. Their reasons can vary widely. Some companies don’t want the crown jewels of their intellectual property leaving the confines of their own premises. Some just like having things run on a server they can see and touch.

But there’s no denying the attraction of services like Amazon Web Services or Joyent or Rackspace, where you can spin up and configure a new virtual machine within minutes of figuring out that you need it. So, many companies seek to approximate the experience they would get from a public cloud provider on their own internal infrastructure.

It turns out that a start-up I had never heard of before this week is the most widely deployed platform for running these “private clouds,” and it’s not a bad business. Eucalyptus Systems essentially enables the same functionality on your own servers that you would expect from a cloud provider.

Eucalyptus said today that it has raised a $30 million Series C round of venture capital funding led by Institutional Venture Partners. Steve Harrick, general partner at IVP, will join the Eucalyptus board. Existing investors, including Benchmark Capital, BV Capital and New Enterprise Associates, are also in on the round. The funding brings Eucalyptus’ total capital raised to north of $50 million.

The company has an impressive roster of customers: Sony, Intercontinental Hotels, Raytheon, and the athletic-apparel group Puma. There are also several government customers, including the U.S. Food and Drug Administration, NASA, the U.S. Department of Agriculture and the Department of Defense.

In March, Eucalyptus signed a deal with Amazon to allow customers of both to migrate their workloads between the private and public environments. The point here is to give companies the flexibility they need to run their computing workloads in a mixed environment, or move them back and forth as needed. They could also operate them in tandem.

Key to this is a provision of the deal with Amazon that gives Eucalyptus access to Amazon’s APIs. What that means is that you can run processes on your own servers that are fully compatible with Amazon’s Simple Storage Service (S3), or its Elastic Compute cloud, known as EC2. “We’ve removed all the hurdles that might have been in the way of moving workloads,” Eucalyptus CEO Marten Mickos told me. The company has similar deals in place with Wipro Infotech in India and CETC32 in China.
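The practical upshot of that API compatibility is that an S3-style client doesn't need to change its request shape to move between clouds; only the endpoint it talks to changes. A minimal sketch of the idea (the internal hostname below is an illustrative assumption, not a real Eucalyptus address):

```python
# Sketch: with an S3-API-compatible private cloud, the same path-style
# object request works against AWS or an internal endpoint -- only the
# base URL differs. The internal hostname here is a made-up example.

def s3_object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style S3 object URL for any S3-compatible endpoint."""
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"

# Public cloud vs. (hypothetical) private cloud -- same bucket, same key.
aws_url = s3_object_url("https://s3.amazonaws.com", "backups", "db.dump")
private_url = s3_object_url("https://objectstore.example.internal:8773", "backups", "db.dump")
```

In practice an S3 client library would also need credentials and request signing, but the point stands: a workload written against the S3 API can be repointed rather than rewritten.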



Amazon has released some fairly impressive numbers showcasing the growth of Amazon Simple Storage Service (S3) over the years. By the end of the first quarter of 2012, there were 905 billion objects stored, and the service routinely handles 650,000 requests per second for those objects, with peaks that go even higher. To put that in perspective, that's up from 262 billion objects two years earlier, and from 762 billion at the end of Q4 2011.

Or maybe it's more impressive when you look further back: 2.9 billion in 2006, for example. And how fast is it growing? Well, says Amazon, over a billion objects are added every day. That's how fast.
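The quarterly figures bear that daily claim out; a quick back-of-the-envelope check, assuming a roughly 90-day quarter:

```python
# Back-of-the-envelope check: object growth from Q4 2011 to Q1 2012,
# using the quarterly totals Amazon reported.
q4_2011 = 762e9        # objects stored at end of Q4 2011
q1_2012 = 905e9        # objects stored at end of Q1 2012
days_in_quarter = 90   # rough approximation

added_per_day = (q1_2012 - q4_2011) / days_in_quarter
print(f"{added_per_day / 1e9:.2f} billion objects added per day")
```

That works out to roughly 1.6 billion objects a day over the quarter, comfortably above the "over a billion" figure.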

The S3 object count keeps growing even though Amazon recently added ways to make it easier for objects to leave, including object expiration and multi-object deletion. The objects are added via the S3 APIs, AWS Import/Export, the AWS Storage Gateway, various backup tools, and through Direct Connect pipes.

Note that the above chart shows Q4 data points up until this year, as Amazon only has data through Q1 2012. So that's not any sort of slowdown you're seeing there: by Q4 2012, that number is going to be much, much higher.


Christopher Brown, Opscode

You never know how big something will be while you're working on it, says Christopher Brown, one of the people who helped build Amazon's cloud.

Brown is now the CTO of a hot startup he co-founded, Opscode. But along the way he worked at Microsoft three times and at Amazon once -- just long enough to help build EC2.

The time was 2004. Amazon already had cutting edge tech. "Amazon is a high tech company that just looks like it sells books," he laughs.

The powers that be wanted to somehow make money on their IT.

Rick Dalzell (CIO at the time) and Chris Pinkham (the vice president of IT Infrastructure) had been pondering a paper written by Amazon website engineer Ben Black that summarized the idea, says Brown.

CEO Jeff Bezos needed almost no convincing. He "was on board from the beginning," says Brown. The team "had a plan to build and sell it as a service from Day 1," he says.

There was a catch. Pinkham was moving back to his home country, South Africa. But Amazon convinced him to keep his job and build EC2 from there. 

So Pinkham invited Brown to come with him. Brown packed up his family and left the U.S.

They assembled a team in South Africa and worked in Cape Town for two years. "From our corner, we had no idea EC2 was going to be this big," Brown says.

Now that EC2 has become the 800-pound cloud gorilla, Brown says he's still impressed with it. "It's part of Amazon's culture, the way they stay ahead of the competition."



Maybe you're a Dropbox devotee. Or perhaps you really like streaming Sherlock on Netflix. For that, you can thank the cloud.

In fact, it's safe to say that Amazon Web Services (AWS) has become synonymous with cloud computing; it's the platform on which some of the Internet's most popular sites and services are built. But just as cloud computing is used as a simplistic catchall term for a variety of online services, the same can be said for AWS—there's a lot more going on behind the scenes than you might think.

If you've ever wanted to drop terms like EC2 and S3 into casual conversation (and really, who doesn't?), we're going to demystify the most important parts of AWS and show you how Amazon's cloud really works.



The Web of Data is built upon two simple ideas: first, employ the RDF data model to publish structured data on the Web; second, set explicit RDF links between data items within different data sources. Background information about the Web of Data can be found on the wiki pages of the W3C Linking Open Data community effort, in the overview article Linked Data - The Story So Far, and in the tutorial How to publish Linked Data on the Web.
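An explicit RDF link is simply a triple whose subject and object identify data items in different sources. As a minimal sketch, here is how one such link could be emitted in N-Triples form; the owl:sameAs predicate URI is standard, while the two resource URIs are illustrative examples of descriptions of the same city in two datasets:

```python
# The standard OWL predicate stating that two URIs denote the same thing.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def rdf_link(subject_uri: str, predicate_uri: str, object_uri: str) -> str:
    """Serialize one RDF link as a single N-Triples statement."""
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."

# Illustrative example: link two descriptions of the same city.
link = rdf_link(
    "http://dbpedia.org/resource/Berlin",
    OWL_SAME_AS,
    "http://sws.geonames.org/2950159/",
)
print(link)
```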

The Silk Link Discovery Framework supports data publishers in accomplishing the second task. Using the declarative Silk Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources, as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language. Silk accesses the data sources that should be interlinked via the SPARQL protocol and can thus be used against local as well as remote SPARQL endpoints.

Silk is provided in three different variants which address different use cases:

  • Silk Single Machine is used to generate RDF links on a single machine. The datasets that should be interlinked can either reside on the same machine or on remote machines which are accessed via the SPARQL protocol. Silk Single Machine provides multithreading and caching. In addition, the performance is further enhanced using the MultiBlock blocking algorithm.
  • Silk MapReduce is used to generate RDF links between data sets using a cluster of multiple machines. Silk MapReduce is based on Hadoop and can for instance be run on Amazon Elastic MapReduce. Silk MapReduce enables Silk to scale out to very big datasets by distributing the link generation to multiple machines.
  • Silk Server can be used as an identity resolution component within applications that consume Linked Data from the Web. Silk Server provides an HTTP API for matching entities from an incoming stream of RDF data while keeping track of known entities. It can be used for instance together with a Linked Data crawler to populate a local duplicate-free cache with data from the Web.

All variants are based on the Silk Link Discovery Engine which offers the following features:

  • Flexible, declarative language for specifying linkage rules
  • Support of RDF link generation (owl:sameAs links as well as other types)
  • Employment in distributed environments (by accessing local and remote SPARQL endpoints)
  • Usable in situations where terms from different vocabularies are mixed and where no consistent RDFS or OWL schemata exist
  • Scalability and high performance through efficient data handling (speedup factor of 20 compared to Silk 0.2):
    • Reduction of network load by caching and reusing of SPARQL result sets
    • Multi-threaded computation of the data item comparisons (3 million comparisons per minute on a Core2 Duo)
    • Optional blocking of data items
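The "optional blocking" item above is worth a gloss: rather than comparing every pair of items across two datasets (which is quadratic), items are first grouped into blocks by a cheap key, and only pairs within the same block are compared. A minimal illustration of the general idea follows; it is not Silk's actual MultiBlock algorithm, and keying on the lowercase first letter is an arbitrary choice for demonstration.

```python
from collections import defaultdict
from itertools import combinations

def block_by_key(labels, key=lambda s: s[:1].lower()):
    """Group labels into blocks using a cheap blocking key."""
    blocks = defaultdict(list)
    for label in labels:
        blocks[key(label)].append(label)
    return blocks

def candidate_pairs(labels):
    """Yield only within-block pairs instead of all n*(n-1)/2 pairs."""
    for block in block_by_key(labels).values():
        yield from combinations(block, 2)

labels = ["Berlin", "Bern", "Boston", "Paris", "Prague"]
pairs = list(candidate_pairs(labels))
# 4 candidate pairs survive blocking, versus 10 for exhaustive comparison.
```

The trade-off is recall: a too-coarse key compares too many pairs, while a too-fine key can split true matches into different blocks, which is why production algorithms like MultiBlock use more sophisticated, overlap-tolerant schemes.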