Original author: 
Cory Doctorow

Journeyman Pictures' short documentary "Naked Citizens" is an absolutely terrifying and amazing must-see glimpse of the modern security state, and the ways in which it automatically ascribes guilt to people based on algorithmic inferences and, having done so, conducts such far-reaching surveillance into its victims' lives that the lack of anything incriminating is treated as proof of being a criminal mastermind:

"I woke up to pounding on my door", says Andrej Holm, a sociologist from the Humboldt University. In what felt like a scene from a movie, he was taken from his Berlin home by armed men after a systematic monitoring of his academic research deemed him the probable leader of a militant group. After 30 days in solitary confinement, he was released without charges. Across Western Europe and the USA, surveillance of civilians has become a major business. With one camera for every 14 people in London and drones being used by police to track individuals, the threat of living in a Big Brother state is becoming a reality. At an annual conference of hackers, keynote speaker Jacob Appelbaum asserts, "to be free of suspicion is the most important right to be truly free". But with most people having a limited understanding of this world of cyber surveillance and how to protect ourselves, are our basic freedoms already being lost?

World - Naked Citizens (Thanks, Dan!)     


Faced with the need to generate ever-greater insight and end-user value, some of the world’s most innovative companies — Google, Facebook, Twitter, Adobe and American Express among them — have turned to graph technologies to tackle the complexity at the heart of their data.

To understand how graphs address data complexity, we need first to understand the nature of the complexity itself. In practical terms, data gets more complex as it gets bigger, more semi-structured, and more densely connected.

We all know about big data. The volume of net new data being created each year is growing exponentially — a trend that is set to continue for the foreseeable future. But increased volume isn’t the only force we have to contend with today: On top of this staggering growth in the volume of data, we are also seeing an increase in both the amount of semi-structure and the degree of connectedness present in that data.

Semi-Structure

Semi-structured data is messy data: data that doesn’t fit into a uniform, one-size-fits-all, rigid relational schema. It is characterized by the presence of sparse tables and lots of null checking logic — all of it necessary to produce a solution that is fast enough and flexible enough to deal with the vagaries of real world data.

Increased semi-structure, then, is another force with which we have to contend, besides increased data volume. As data volumes grow, forcing everything into a uniform schema means trading insight for uniformity: the more data we gather about a group of entities, the more likely that data is to be semi-structured.
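By way of illustration, here is a minimal Python sketch (the user records are invented for the example) contrasting a uniform, null-heavy schema with a semi-structured representation in which each entity carries only the attributes it actually has:

    # Hypothetical records: three users forced into one uniform schema
    # (every column present, mostly empty) versus the same users stored
    # semi-structured, each carrying only the attributes it has.
    uniform_rows = [
        # (name,    twitter,   employer, dept,    shoe_size)
        ("Alice",  "@alice",   None,     None,    None),
        ("Bob",    None,       "Acme",   "Sales", None),
        ("Carol",  None,       None,     None,    38),
    ]

    semi_structured = [
        {"name": "Alice", "twitter": "@alice"},
        {"name": "Bob", "employer": "Acme", "dept": "Sales"},
        {"name": "Carol", "shoe_size": 38},
    ]

    # The uniform schema forces null-checking logic into every query...
    with_twitter = [row for row in uniform_rows if row[1] is not None]

    # ...whereas in the semi-structured form, absence is simply absence.
    with_twitter_2 = [entity for entity in semi_structured if "twitter" in entity]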

Connectedness

But insight and end-user value do not simply result from ramping up volume and variation in our data. Many of the more important questions we want to ask of our data require us to understand how things are connected. Insight depends on us understanding the relationships between entities — and often, the quality of those relationships.

Here are some examples, taken from different domains, of the kinds of important questions we ask of our data:

  • Which friends and colleagues do we have in common?
  • What’s the quickest route between two stations on the metro?
  • What do you recommend I buy based on my previous purchases?
  • Which products, services and subscriptions do I have permission to access and modify? Conversely, given this particular subscription, who can modify or cancel it?
  • What’s the most efficient means of delivering a parcel from A to B?
  • Who has been fraudulently claiming benefits?
  • Who owns all the debt? Who is most at risk of poisoning the financial markets?

To answer each of these questions, we need to understand how the entities in our domain are connected. In other words, these are graph problems.

Why are these graph problems? Because graphs are the best abstraction we have for modeling and querying connectedness. Moreover, the malleability of the graph structure makes it ideal for creating high-fidelity representations of a semi-structured domain. Traditionally relegated to the more obscure applications of computer science, graph data models are today proving to be a powerful way of modeling and interrogating a wide range of common use cases. Put simply, graphs are everywhere.
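To make the first two questions above concrete, here is a small, self-contained Python sketch using toy adjacency lists (the names and stations are invented): friends in common reduce to a set intersection over relationships, and the quickest route is a breadth-first traversal of them.

    from collections import deque

    # Toy graphs, stored as adjacency lists keyed by node.
    friends = {
        "Alice": {"Bob", "Carol", "Dan"},
        "Bob":   {"Alice", "Carol"},
        "Carol": {"Alice", "Bob", "Eve"},
        "Dan":   {"Alice"},
        "Eve":   {"Carol"},
    }

    metro = {
        "Central": ["North", "East"],
        "North":   ["Central", "West"],
        "East":    ["Central", "West"],
        "West":    ["North", "East", "Airport"],
        "Airport": ["West"],
    }

    def common_friends(graph, a, b):
        """Which friends do a and b have in common?"""
        return graph[a] & graph[b]

    def shortest_route(graph, start, goal):
        """Quickest route between two stations (breadth-first search)."""
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in graph[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    print(common_friends(friends, "Alice", "Bob"))      # {'Carol'}
    print(shortest_route(metro, "Central", "Airport"))  # ['Central', 'North', 'West', 'Airport']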

Graph Databases

Today, if you’ve got a graph data problem, you can tackle it using a graph database — an online transactional system that allows you to store, manage and query your data in the form of a graph. A graph database enables you to represent any kind of data in a highly accessible, elegant way using nodes and relationships, both of which may host properties:

  • Nodes are containers for properties, which are key-value pairs that capture an entity’s attributes. In a graph model of a domain, nodes tend to be used to represent the things in the domain. The connections between these things are expressed using relationships.
  • A relationship has a name and a direction, which together lend semantic clarity and context to the nodes connected by the relationship. Like nodes, relationships can also contain properties: Attaching one or more properties to a relationship allows us to weight that relationship, or describe its quality, or otherwise qualify its applicability for a particular query.

The key thing about such a model is that it makes relationships first-class citizens of the data, rather than treating them as metadata. As real data points, they can be queried and understood in their variety, weight and quality: Important capabilities in a world of increasing connectedness.
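As a rough sketch of the model (not any particular product's API), nodes and relationships can be represented like this, with each relationship carrying its own name, direction and properties so it can be queried directly:

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        properties: dict = field(default_factory=dict)    # e.g. {"name": "Alice"}

    @dataclass
    class Relationship:
        start: Node                                        # direction: start -> end
        end: Node
        name: str                                          # e.g. "KNOWS", "PURCHASED"
        properties: dict = field(default_factory=dict)     # e.g. weight, since

    alice = Node({"name": "Alice"})
    bob = Node({"name": "Bob"})
    knows = Relationship(alice, bob, "KNOWS", {"since": 2009, "strength": 0.9})

    # Because relationships are real data points, we can query them directly,
    # for example keeping only strong KNOWS relationships:
    relationships = [knows]
    strong = [r for r in relationships
              if r.name == "KNOWS" and r.properties.get("strength", 0) > 0.5]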

Graph Databases in Practice

Today, the most innovative organizations are leveraging graph databases as a way to solve the challenges around their connected data. These include major names such as Google, Facebook, Twitter, Adobe and American Express. Graph databases are also being used by organizations in a range of fields, including finance, education, the web, independent software vendors (ISVs), and telecommunications and data communications.

The following examples offer use case scenarios of graph databases in practice.

  • Adobe Systems currently leverages a graph database to provide social capabilities to its Creative Cloud — a new array of services for media enthusiasts and professionals. A graph offers clear advantages in capturing Adobe’s rich data model fully, while still allowing for high-performance queries that range from simple reads to advanced analytics. It also enables Adobe to store large amounts of connected data across three continents, all while maintaining high query performance.
  • Europe’s No. 1 professional network, Viadeo, has integrated a graph database to store all of its users and relationships. Viadeo currently has 40 million professionals in its network and requires a solution that is easy to use and capable of handling major expansion. Since integrating a graph model, Viadeo has accelerated its system performance by more than 200 percent.
  • Telenor Group is one of the top ten wireless telecom companies in the world, and uses a graph database to manage its customer organizational structures. The ability to model and query complex data such as customer and account structures with high performance has proven critical to Telenor’s ongoing success.

An access control graph. Telenor uses a similar data model to manage products and subscriptions.

  • Deutsche Telekom leverages a graph database for its highly scalable social soccer fan website, which attracts tens of thousands of visitors during each match; the graph database provides painless data modeling, seamless extensibility of the data model, and high performance and reliability.
  • Squidoo is the popular social publishing platform where users share their passions. It recently created a product called Postcards: single-page, beautifully designed recommendations of books, movies, music albums, quotes and other products and media types. A graph database provides the primary data store for the Postcards taxonomy and powers the recommendation engine that suggests what people might explore next, helping ensure a smooth user experience.

Such examples demonstrate the pervasiveness of connections within data and the power of a graph model to map relationships optimally. A graph database allows you to query and analyze such connections further, providing greater insight and end-user value. In short, graphs are poised to deliver true competitive advantage by offering deeper perspective into data as well as a new framework to power today’s revolutionary applications.

A New Way of Thinking

Graphs offer a new way of thinking, one that explicitly models the two factors that make today’s big data so complex: Semi-structure and connectedness. As more and more organizations recognize the value of modeling data with a graph, they are turning to graph databases to extend this powerful modeling capability to the storage and querying of complex, densely connected structures. The result is the opening up of new opportunities for generating critical insight and end-user value, which can make all the difference in keeping pace with today’s competitive business environment.

Emil is the founder of the Neo4j open source graph database project, which is the most widely deployed graph database in the world. As a life-long compulsive programmer who started his first free software project in 1994, Emil has with horror witnessed his recent degradation into a VC-backed powerpoint engineer. As the CEO of Neo4j’s commercial sponsor Neo Technology, Emil is now mainly focused on spreading the word about the powers of graphs and preaching the demise of tabular solutions everywhere. Emil presents regularly at conferences such as JAOO, JavaOne, QCon and OSCON.



Jeremy Kun, a mathematics PhD student at the University of Illinois at Chicago, has posted a wonderful primer on probability theory for programmers on his blog. It's a subject vital to machine learning and data mining, and it's at the heart of much of the stuff going on with Big Data. His primer is lucid and easy to follow, even for math ignoramuses like me.

For instance, suppose our probability space is Ω = {1, 2, 3, 4, 5, 6} and f is defined by setting f(x) = 1/6 for all x ∈ Ω (here the “experiment” is rolling a single die). Then we are likely interested in more exquisite kinds of outcomes; instead of asking the probability that the outcome is 4, we might ask what is the probability that the outcome is even? This event would be the subset {2, 4, 6}, and if any of these are the outcome of the experiment, the event is said to occur. In this case we would expect the probability of the die roll being even to be 1/2 (but we have not yet formalized why this is the case).

As a quick exercise, the reader should formulate a two-dice experiment in terms of sets. What would the probability space consist of as a set? What would the probability mass function look like? What are some interesting events one might consider (if playing a game of craps)?
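For readers who'd rather see it in code, here's a quick Python sketch of the die example and the two-dice exercise (my own illustration, not from Kun's primer):

    from fractions import Fraction
    from itertools import product

    # Single die: the probability space is {1,...,6}, the mass function f
    # assigns 1/6 to every outcome, and an event's probability is the sum
    # of f over the outcomes it contains.
    omega = range(1, 7)
    f = {x: Fraction(1, 6) for x in omega}

    def prob(event):
        return sum(f[x] for x in event)

    print(prob({2, 4, 6}))    # 1/2: the die roll is even

    # Two dice: the space is the set of ordered pairs, each with mass 1/36.
    # A craps-style event such as "the sum is 7" is just a subset of pairs.
    omega2 = list(product(range(1, 7), repeat=2))
    f2 = {pair: Fraction(1, 36) for pair in omega2}
    sum_is_seven = {pair for pair in omega2 if sum(pair) == 7}
    print(sum(f2[pair] for pair in sum_is_seven))    # 1/6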

Probability Theory — A Primer

(Image: Dice, a Creative Commons Attribution (2.0) image from artbystevejohnson's photostream)


Rick Smolan, creator of the epic “Day in the Life” photography books, is taking on a new challenge: Big data.

“Big data” has become a buzzphrase many people like to hate for its vagueness, but Smolan’s book format brings out all sorts of specificity and examples.

© Joe McNally 2012 / from The Human Face of Big Data

His new 7.5-pound book, “The Human Face of Big Data,” includes vignettes about wirelessly sensing disproportionate electricity and water consumption by individual home appliances, restoring human sight with a pair of computer eyeglasses that analyze light and other input in real-time, predicting repeat heart attacks by screening large samples of patients’ EKG data, and taking personal health tracking to the extreme. It will be released Nov. 20.

Smolan has been creating these massive photography projects for the last 30 years, but they’re usually about more naturally visual subjects, most recently President Obama and global water problems.

“This is the most difficult set of assignments I’ve ever worked on,” he told me. “How do you photograph data?”

Smolan also said he is well aware that the next step beyond “big data” is often thought to be “big brother.” He said the aim of the project is to get people to talk about the potential for big data, without ignoring the privacy implications.

While the book may be a static piece of work, Smolan is also trying to create a participatory experience that generates its own data, hopefully a big amount of it. Before the book comes out, he is releasing a Human Face of Big Data app for iOS and Android that asks people to measure themselves from Sept. 25 to Oct. 2.

The app will collect data about each user implicitly from smartphone sensors as well as explicitly through quizzes, with everything promised to be anonymized (though I’m not clear on how exactly that will happen, given the depth of access a smartphone has to its owner’s activities).

For example, the app might count the number of contacts in people’s phone address books or track how far they travel in a single day. Then it will inform users about their “data doppelgangers” with similar attributes somewhere else in the world.

At the end of the week, all the data will be made available to scientists at Webcast “Big Data Lab” events in New York City, London and Singapore. And there’s a whole bunch of more ambitious (dare I say big) ideas beyond that, including a kids’ education day and a documentary film.


Ben sez, "I want to share a short documentary that I recently produced about the hidden infrastructure of the Internet called Bundled, Buried and Behind Closed Doors. The video is meant to remind viewers that the Internet is a physical, geographically anchored thing. It features a tour inside Telx's 9th floor Internet exchange at 60 Hudson Street in New York City, and explores how this building became one of the world's most concentrated hubs of Internet connectivity."

Lower Manhattan’s 60 Hudson Street is one of the world’s most concentrated hubs of Internet connectivity. This short documentary peeks inside, offering a glimpse of the massive material infrastructure that makes the Internet possible.

Featuring interviews with Stephen Graham, Saskia Sassen, Dave Timmes of Telx, Rich Miller of datacenterknowledge.com, Stephen Klenert of Atlantic Metro Communications, and Josh Wallace of the City of Palo Alto Utilities.

Bundled, Buried & Behind Closed Doors

(Thanks, Ben!)
