Original author: 
Cyrus Farivar


This is the first in a two-part series exploring Butterfly Labs and its lineup of dedicated Bitcoin-mining hardware. In part one, we look at the company and the experiences customers have had with it. In part two, to be published on June 29, we share our experiences running a Bitcoin miner for a couple weeks. Spoiler alert: we made money.

The more I dig into Bitcoin, the stranger it gets. There’s gray-market online gambling and Russian-operated futures markets—to say nothing of the virtual currency’s wild ride over the last several months. It’s full of characters with names like “artforz” and “Tycho,” supposedly two of the largest Bitcoin holders out there. Of course, like most things Bitcoin, it’s nearly impossible to know for sure.

While reporting on a Bitcoin-based gambling story earlier this year, I interviewed Bryan Micon, who works with a Bitcoin-based poker site called Seals With Clubs. (In keeping with the currency's general opacity, Micon won't say who owns the site.) Micon has taken it upon himself to investigate what he believes are Bitcoin-related scams—such as the ill-fated Bitcoin Savings and Trust online bank—and he makes public pronouncements about them.


Original author: 
Todd Hoff

Erasure codes are one of those seemingly magical mathematical creations that, with the developments described in the paper XORing Elephants: Novel Erasure Codes for Big Data, are set to replace triple replication as the data-storage protection mechanism of choice.

The upshot, says Robin Harris (StorageMojo) in an excellent article, Facebook's advanced erasure codes: "WebCos will be able to store massive amounts of data more efficiently than ever before. Bad news: so will anyone else."

Robin says that with cheap disks, triple replication made sense and was economical. With ever-bigger BigData, the overhead has become costly. But erasure codes have always suffered from unacceptably long repair times. This paper describes new Locally Repairable Codes (LRCs) that are efficient to repair in terms of both disk I/O and bandwidth:

These systems are now designed to survive the loss of up to four storage elements – disks, servers, nodes or even entire data centers – without losing any data. What is even more remarkable is that, as this paper demonstrates, these codes achieve this reliability with a capacity overhead of only 60%.

They examined a large Facebook analytics Hadoop cluster of 3,000 nodes with about 45 PB of raw capacity. On average about 22 nodes failed each day, but on some days failures spiked to more than 100.

LRC testing produced several key results (a toy sketch of the local-repair idea follows the list):

  • Disk I/O and network traffic were reduced by half compared to RS codes.
  • The LRC required 14% more storage than RS, which is information-theoretically optimal for the obtained locality.
  • Repair times were much lower thanks to the local repair codes.
  • Reliability was much greater thanks to fast repairs.
  • Reduced network traffic makes them suitable for geographic distribution.
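To make the local-repair idea concrete, here is a toy Python sketch (my illustration only, not Facebook's HDFS-RAID code): a local parity chunk is the XOR of a small group of data chunks, so a single lost chunk can be rebuilt by reading only its group rather than decoding an entire stripe.

    # Toy local-repair demo: XOR parity over a small "local group."
    def xor_chunks(chunks):
        """XOR equal-length byte chunks together."""
        out = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                out[i] ^= b
        return bytes(out)

    # Four data chunks plus one local parity chunk.
    data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
    local_parity = xor_chunks(data)

    # Lose chunk 2; rebuild it from the three survivors plus the
    # local parity -- four reads instead of a whole-stripe decode.
    survivors = [c for i, c in enumerate(data) if i != 2]
    repaired = xor_chunks(survivors + [local_parity])
    assert repaired == data[2]

Real LRCs layer such local parities on top of Reed-Solomon global parities, which is where the modest 14% storage premium over plain RS comes from.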

I wonder: will we see a change in NoSQL database systems as well?


Original author: 
Jacob Kastrenakes


One of the biggest personal data collectors around is getting ready to open its vaults to the public. According to Forbes, you'll soon be able to request your personal files from Acxiom, a marketing company that holds a database on the interests and details of over 700 million people. That database reportedly holds information on consumers' occupations, phone numbers, religions, shopping habits, and health issues, to name a few. That data has traditionally been given only to marketers — for a fee, of course — but Acxiom has decided to let consumers peer into its database as well. Whether individuals will have to pay too is still up for debate, but it's been decided that a person can only view their own file.


Original author: 
Todd Hoff

This is a guest post by Yelp's Jim Blomo. Jim manages a growing data mining team that uses Hadoop, mrjob, and oddjob to process TBs of data. Before Yelp, he built infrastructure for startups and Amazon. Check out his upcoming talk at OSCON 2013 on Building a Cloud Culture at Yelp.

In Q1 2013, Yelp had 102 million unique visitors (source: Google Analytics), including approximately 10 million unique mobile devices using the Yelp app on a monthly average basis. Yelpers have written more than 39 million rich, local reviews, making Yelp the leading local guide on everything from boutiques and mechanics to restaurants and dentists. When it comes to data, one of the most distinctive things about Yelp is its variety: reviews, user profiles, business descriptions, menus, check-ins, food photos... the list goes on. We have many ways to deal with data, but today I'll focus on how we handle offline data processing and analytics.

In late 2009, Yelp investigated using Amazon’s Elastic MapReduce (EMR) as an alternative to an in-house cluster built from spare computers.  By mid 2010, we had moved production processing completely to EMR and turned off our Hadoop cluster.  Today we run over 500 jobs a day, from integration tests to advertising metrics.  We’ve learned a few lessons along the way that can hopefully benefit you as well.
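For flavor, here is roughly what a minimal mrjob job looks like (a generic sketch, not one of Yelp's actual 500 daily jobs; the input format and all names are hypothetical): it counts reviews per business from tab-separated lines of the form "business_id<TAB>review_text", and the same script runs locally or on EMR.

    from mrjob.job import MRJob

    class ReviewCounts(MRJob):
        def mapper(self, _, line):
            # Each input line: "business_id<TAB>review_text"
            business_id, _review = line.split("\t", 1)
            yield business_id, 1

        def reducer(self, business_id, counts):
            yield business_id, sum(counts)

    if __name__ == "__main__":
        # Locally:  python review_counts.py reviews.tsv
        # On EMR:   python review_counts.py -r emr s3://your-bucket/reviews/
        ReviewCounts.run()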

Job Flow Pooling

Original author: 
Cesar Torres


Tumblr Creative Director Peter Vidani (Photo: Cesar Torres)

New York City noise blares right outside Tumblr’s office in the Flat Iron District in Manhattan. Once inside, the headquarters hum with a quiet intensity. I am surrounded by four dogs that employees have brought to the workspace today. Apparently, there are even more dogs lurking somewhere behind the perpendicular rows of desks. What makes the whole thing even spookier is that these dogs don’t bark or growl. It’s like someone’s told them that there are developers and designers at work, and somehow they’ve taken the cue.

I’m here to see Tumblr’s Creative Director Peter Vidani who is going to pull the curtain back on the design process and user experience at Tumblr. And when I say design process, I don’t just mean color schemes or typefaces. I am here to see the process of interaction design: how the team at Tumblr comes up with ideas for the user interface on its website and its mobile apps. I want to find out how those ideas are shaped into a final product by their engineering team.

Back in May, Yahoo announced it was acquiring Tumblr for $1.1 billion. Yahoo indicated that Tumblr would continue to operate independently, though we will probably see a lot of content crossover between the millions of blog posts hosted by Tumblr and Yahoo's search engine technology. It's a little-known fact that Yahoo has provided some useful tools for UX professionals and developers over the years through its Design Pattern Library, which shares some of Yahoo's most successful and time-tested UI touches and interactions with Web developers. It's probably too early to tell whether Tumblr's UI elements will filter back into these libraries. In the meantime, I talked to Vidani about how Tumblr UI features come to life.


Original author: 
Stack Exchange


This Q&A is part of a weekly series of posts highlighting common questions encountered by technophiles and answered by users at Stack Exchange, a free, community-powered network of 100+ Q&A sites.

Stack Exchange user and C# developer George Powell tries hard to follow the DRY principle. But as any good dev knows, it's not always possible, or even optimal, to avoid repeating yourself. Powell writes:

Often I write small methods (maybe 10 to 15 lines of code) that need to be reused across two projects that can't reference each other. The method might be something to do with networking / strings / MVVM etc. and is a generally useful method not specific to the project it originally sits in.

So how should you keep track of shared snippets across projects, so that you know where the canonical code resides and where it is running in production when a bug needs to be fixed?
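One common approach (a sketch of the general idea, shown in Python terms although Powell's question is about C#, and with hypothetical names throughout) is to promote the snippet into a small versioned library that both projects declare as a dependency, so the canonical copy lives in exactly one place:

    # sharedutils/net.py -- the single canonical home of the helper.
    from urllib.parse import urlparse

    def host_of(url: str) -> str:
        """Return the hostname portion of a URL, or '' if absent."""
        return urlparse(url).hostname or ""

    # Each consuming project pins a released version, e.g.
    # "sharedutils==1.0.3", and imports it:
    #     from sharedutils.net import host_of
    # A bug fix ships as 1.0.4; grepping each project's pinned
    # dependency shows exactly where the old code is still deployed.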


Original author: 
Cyrus Farivar


Smári McCarthy, in his Twitter bio, describes himself as an "Information freedom activist. Executive Director of IMMI. Pirate." (Photo: SHARE Conference)

On Friday, two Icelandic activists with previous connections to WikiLeaks announced that they had received newly unsealed court orders from Google. Google sent the orders earlier in the week, revealing that data from their Gmail accounts had been searched and seized—likely as a result of a grand jury investigation into the rogue whistleblower group.

Google was forbidden under American law from disclosing these orders to the men until the court lifted this restriction in early May 2013. (A Google spokesperson referred Ars to its Transparency Report for an explanation of its policies.)

On June 21, 2013, well-known Irish-Icelandic developer Smári McCarthy published his recently unsealed court order, which dates back to July 14, 2011. Google had sent him the order, which included his Gmail account metadata, the night before. The government cited the Stored Communications Act (SCA), specifically a 2703(d) order, as grounds for the demand.


Original author: 
Johnny Chung Lee


A little less than a year ago, I transferred to a new group within Motorola called Advanced Technology and Projects (ATAP), which was set up after Google's acquisition of Motorola last year (yes, Google owns Motorola now).

The person hired to run this new group is Regina Dugan, who was previously the director of the Defense Advanced Research Projects Agency (DARPA). This is the same organization that funded projects such as ARPANET, the DARPA Grand Challenge, the Mother of All Demos, BigDog, CALO (which evolved into Apple's Siri), exoskeletons, and hypersonic vehicles that could reach any point on Earth in 60 minutes.

It's a place with big ideas powered by big science.

The philosophy behind Motorola ATAP is to create an organization with the same level of appetite for technology advancement as DARPA, but with a consumer focus. It is a pretty interesting place to be.

One of the ways DARPA built such an impressive portfolio of projects is that it works heavily with outside research organizations in both industry and academia. If you talk to a university professor or graduate student in engineering, there is a very good chance their department has a DARPA-funded project. However, when companies want to work with universities, it has always been notoriously difficult to get the paperwork for research collaborations in place, with legal discussions over IP ownership and commercialization terms lasting several months.

To address this issue head-on, ATAP created a Multi-University Research Agreement (MURA): a single document that every university partner can sign, accelerating collaboration between ATAP and research institutions and reducing the time to engage academic research partners from several months to a couple of weeks. The agreement has been signed by Motorola, the California Institute of Technology, Carnegie Mellon University, Harvard University, the University of Illinois at Urbana-Champaign, the Massachusetts Institute of Technology, Stanford University, Texas A&M University, and Virginia Tech. As we engage more research partners, their signatures will be added to the same document.

"The multi-university agreement is really the first of its kind," said Kaigham J. Gabriel, vice president and deputy director of ATAP. "Such an agreement has the potential to be a national model for how companies and universities work together to speed innovation and US competitiveness, while staying true to their individual missions and cultures."

This may seem a little dry.  But to me, what it means is that I can approach some of the smartest people in the country and ask, "do you want to build the future together?" and all they have to say is, "yes."

Let's do it.

Full press release here.

Original author: 
Todd Hoff

The paper MegaPipe: A New Programming Interface for Scalable Network I/O (video, slides) hits the common theme that if you want to go faster, you need a better car design, not just a better driver. That's why the authors started with a clean slate and designed a network API from the ground up with support for concurrent I/O, a requirement for achieving high performance while scaling to large numbers of connections per thread, multiple cores, and so on. What they created is MegaPipe, "a new network programming API for message-oriented workloads to avoid the performance issues of BSD Socket API."
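To give a feel for the batching half of that design, here is a toy single-process Python sketch (my illustration of the idea only; the real MegaPipe is a kernel-level C API): operations are queued on a per-core channel and flushed as one batch, amortizing the per-operation crossing cost that the one-call-per-operation socket API pays every time.

    class Channel:
        """Toy stand-in for a MegaPipe-style per-core channel."""
        def __init__(self):
            self.pending = []      # queued I/O commands
            self.completions = []  # results, reaped in batches

        def submit(self, op, *args):
            self.pending.append((op, args))

        def flush(self):
            # MegaPipe crosses into the kernel once for the whole
            # batch; here we just simulate completing each command.
            for op, args in self.pending:
                self.completions.append((op, "ok"))
            self.pending.clear()

        def reap(self):
            done, self.completions = self.completions, []
            return done

    ch = Channel()
    ch.submit("write", 1, b"hello")
    ch.submit("read", 2, 4096)
    ch.flush()   # one boundary crossing covers both operations
    print(ch.reap())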

The result: MegaPipe outperforms baseline Linux between 29% (for long connections) and 582% (for short connections). MegaPipe improves the performance of a modified version of memcached between 15% and 320%. For a workload based on real-world HTTP traces, MegaPipe boosts the throughput of nginx by 75%.

What's this most excellent and interesting paper about?
