Data

For this rainy Labor Day, here's an uplifting talk by DataKind founder Jake Porway. He talks data and how it can make a worthwhile difference in areas that could use a change.

Abstract
This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data stored in a Hadoop cluster using the standard SQL query language. Unlike other database systems that provide only a relational view over HDFS-resident data through the use of an external table mechanism, Polybase employs a split query processing paradigm in which SQL operators on HDFS-resident data are translated into MapReduce jobs by the PDW query optimizer and then executed on the Hadoop cluster. The paper describes the design and implementation of Polybase along with a thorough performance evaluation that explores the benefits of employing a split query processing paradigm for executing queries that involve both structured data in a relational DBMS and unstructured data in Hadoop. Our results demonstrate that while the use of a split-based query execution paradigm can improve the performance of some queries by as much as 10X, one must employ a cost-based query optimizer that considers a broad set of factors when deciding whether or not it is advantageous to push a SQL operator to Hadoop. These factors include the selectivity factor of the predicate, the relative sizes of the two clusters, and whether or not their nodes are co-located. In addition, differences in the semantics of the Java and SQL languages must be carefully considered in order to avoid altering the expected results of a query.
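
The pushdown decision described in the abstract can be illustrated with a toy cost comparison. This is only a sketch, not Polybase's optimizer: the cost formulas, every constant, and all names (`PushdownInputs`, `should_push_to_hadoop`, and so on) are hypothetical and chosen for illustration. It uses the three factors the abstract lists: predicate selectivity, the relative sizes of the two clusters, and whether their nodes are co-located.

```python
from dataclasses import dataclass


@dataclass
class PushdownInputs:
    rows_in_hdfs: int    # rows the predicate has to scan in Hadoop
    selectivity: float   # fraction of rows that satisfy the predicate (0..1)
    hadoop_nodes: int    # size of the Hadoop cluster
    pdw_nodes: int       # size of the PDW cluster
    co_located: bool     # True if both clusters share the same machines


# Every constant below is a made-up illustrative value, not a measurement.
MR_STARTUP_COST = 30.0         # fixed cost of launching a MapReduce job
HADOOP_COST_PER_ROW = 4e-6     # filtering one row inside a MapReduce task
PDW_COST_PER_ROW = 1e-6        # filtering one row inside the relational engine
TRANSFER_COST_PER_ROW = 2e-6   # moving one row from HDFS into PDW
CO_LOCATION_DISCOUNT = 0.5     # assumed cheaper transfer when co-located


def transfer_cost(rows: float, co_located: bool) -> float:
    cost = rows * TRANSFER_COST_PER_ROW
    return cost * CO_LOCATION_DISCOUNT if co_located else cost


def cost_push_to_hadoop(q: PushdownInputs) -> float:
    # Filter in Hadoop, then ship only the surviving rows to PDW.
    scan = q.rows_in_hdfs * HADOOP_COST_PER_ROW / q.hadoop_nodes
    surviving = q.rows_in_hdfs * q.selectivity
    return MR_STARTUP_COST + scan + transfer_cost(surviving, q.co_located)


def cost_import_then_filter(q: PushdownInputs) -> float:
    # Ship every row to PDW, then filter there.
    move = transfer_cost(q.rows_in_hdfs, q.co_located)
    scan = q.rows_in_hdfs * PDW_COST_PER_ROW / q.pdw_nodes
    return move + scan


def should_push_to_hadoop(q: PushdownInputs) -> bool:
    return cost_push_to_hadoop(q) < cost_import_then_filter(q)
```

Under these assumed constants, a highly selective predicate over a large table favors pushing the operator to Hadoop, while a small table or a non-selective predicate does not, because the job startup cost or the transfer of surviving rows dominates.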

Link to the paper

Original author: Jon Brodkin


Can Google's QUIC be faster than Mega Man's nemesis, Quick Man?

Josh Miller

Google, as is its wont, is always trying to make the World Wide Web go faster. To that end, Google in 2009 unveiled SPDY, a networking protocol that reduces latency and is now being built into HTTP 2.0. SPDY is now supported by Chrome, Firefox, Opera, and the upcoming Internet Explorer 11.

But SPDY isn't enough. Yesterday, Google released a boatload of information about its next protocol, one that could reshape how the Web routes traffic. QUIC—standing for Quick UDP Internet Connections—was created to reduce the number of round trips data makes as it traverses the Internet in order to load stuff into your browser.

Although it is still in its early stages, Google is going to start testing the protocol on a "small percentage" of Chrome users who use the development or canary versions of the browser—the experimental versions that often contain features not stable enough for everyone. QUIC has been built into these test versions of Chrome and into Google's servers. The client and server implementations are open source, just as Chromium is.

Original author: Lee Aylward


One of Ars Technica's many memcached server graphs. Look at all those misses!

This week, memcached, a piece of software that prevents much of the Internet from melting down, turns 10 years old. Despite its age, memcached is still the go-to solution for many programmers and sysadmins managing heavy workloads. Without memcached, Ars Technica would likely be unable to serve this article to you at all.

Brad Fitzpatrick wrote memcached for LiveJournal way back in 2003 (check out the initial CVS commit here). While waiting for new hardware to help save the site from being overloaded, Fitzpatrick realized that he had plenty of unused RAM spread across LiveJournal's existing servers. He wrote memcached to take advantage of this spare memory and lighten the load on the site.

memcached is a distributed in-memory key-value store that uses a very simple protocol for storing and retrieving arbitrary data from memory instead of from a filesystem. To store a value, a program connects to the memcached server on the default port of 11211 and issues a series of basic commands. (Note: a binary protocol is also supported.)
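
As a rough illustration of those basic commands, here is a minimal Python sketch that builds a `set` and a `get` request in memcached's text protocol and parses a `get` reply. The helper names are hypothetical, and the sketch only constructs and parses bytes; actually sending them would mean opening a TCP connection to a memcached server (port 11211 by default), which is left out here.

```python
from typing import Optional


def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # Text protocol: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get_command(key: str) -> bytes:
    # Text protocol: "get <key>\r\n"
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(response: bytes) -> Optional[bytes]:
    # A miss is just "END\r\n"; a hit looks like
    # "VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n".
    if response == b"END\r\n":
        return None
    header, rest = response.split(b"\r\n", 1)
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply from server")
    length = int(parts[3])
    # Slice by the declared length so values containing "\r\n" survive.
    return rest[:length]
```

A server that accepts the `set` request replies with `STORED\r\n`. The parser above handles a single-key `get` only, which keeps the sketch short.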

Original author: Stack Exchange

Stack Exchange

This Q&A is part of a weekly series of posts highlighting common questions encountered by technophiles and answered by users at Stack Exchange, a free, community-powered network of 100+ Q&A sites.

Dokkat appears to think that databases are overused. "Instead of a database, I just serialize my data to JSON, saving and loading it to disk when necessary," he writes. "All the data management is made on the program itself, which is faster AND easier than using SQL queries." What is missing here? Why should a developer use a database when saving data to a disk might work just as well?
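
To make the comparison concrete, here is a small Python sketch of the two approaches side by side. It is not taken from the question or its answers: the sample records, file handling, and function names are all hypothetical, and SQLite stands in for "a database" only because it ships with Python.

```python
import json
import sqlite3

# Hypothetical sample data used by both approaches.
USERS = [
    {"name": "ada", "age": 36},
    {"name": "bob", "age": 17},
    {"name": "cy", "age": 52},
]


def adults_via_json(path, records):
    # Dokkat's approach: serialize everything to disk, load it all back,
    # and do the filtering in the program itself.
    with open(path, "w") as f:
        json.dump(records, f)
    with open(path) as f:
        loaded = json.load(f)
    return sorted(r["name"] for r in loaded if r["age"] >= 18)


def adults_via_sqlite(records):
    # The database approach: store rows in a table and let a SQL query
    # do the filtering and ordering.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (:name, :age)", records)
    rows = conn.execute(
        "SELECT name FROM users WHERE age >= 18 ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

For a data set this small both functions return the same names; the sketch only shows where the data management lives in each case, which is the trade-off the question asks about.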

See the original question here.

Original author: samzenpus

redletterdave writes "Thanks to a newly developed audio extraction technology called optical scanning, the Smithsonian was able to recover the voice of Alexander Graham Bell from one of his hundreds of discs he donated to the museum, which were once considered 'mute artifacts.' Since many of the collected recordings are very fragile due to their age and experimental nature, optical scanning is a non-invasive procedure that creates a high-resolution digital map of the disc or cylinder, which is then reconstructed and used to simulate the motion of a stylus moving through its grooves to reproduce the original audio content. Bell, who created this recording on a wax and cardboard disc on April 15, 1885, can be heard clearly saying, 'In witness whereof — hear my voice, Alexander Graham Bell.'"

Read more of this story at Slashdot.

Original author: Nathan Yau

This video clearly describes the distribution of wealth in America using a set of transitioning charts. The graphics are good. The explanation is better.
