Data collection

Panda: A System for Provenance and Data

Google Tech Talk, October 26, 2012. Presented by Jennifer Widom, Stanford University.

ABSTRACT: The goal of the Panda (Provenance and Data) project has been to develop a general-purpose system for modeling, capturing, storing, exploiting, and querying data provenance in a wide range of applications. Abstractly, provenance (also referred to as lineage) describes where data came from and how it has been processed over time. In Panda we consider "data-oriented workflows" whose nodes are arbitrary queries and transformations, challenging us to integrate data-based and process-based provenance, to handle a spectrum from well-understood to opaque transformations, and to develop compositional formalisms and algorithms suitable for arbitrary workflows. On the system side, we strive to enable efficient provenance operations while keeping the capture overhead low. In this talk, we lay the foundations for data-oriented workflows, then discuss how provenance is defined and captured in this environment. We describe the basic provenance-enabled operations of backward tracing, forward tracing, forward propagation, and refresh, and explain how we support these operations in three settings: provenance as general predicates, provenance as attribute mappings, and provenance in workflows composed exclusively of Map and Reduce functions. We briefly describe the prototype Panda system, and we discuss possible follow-on work: extensions to the provenance model and operations ...
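As a rough illustration of backward and forward tracing in the Map/Reduce setting the abstract mentions, the Python sketch below runs a hypothetical word-count Map and Reduce and records, for each output key, which input records contributed to it. This is not Panda's implementation or API: `map_fn`, `reduce_fn`, `run_with_provenance`, `backward_trace` and `forward_trace` are made-up names, and provenance is captured eagerly during execution purely to keep the example short.

```python
from collections import defaultdict


def map_fn(line):
    # Hypothetical Map: emit (word, 1) for each word in an input line.
    return [(word, 1) for word in line.split()]


def reduce_fn(key, values):
    # Hypothetical Reduce: sum the counts emitted for one key.
    return (key, sum(values))


def run_with_provenance(lines):
    """Run Map then Reduce, recording which input lines fed each output key."""
    groups = defaultdict(list)   # key -> values emitted by Map
    sources = defaultdict(set)   # key -> indices of contributing input lines
    for i, line in enumerate(lines):
        for key, value in map_fn(line):
            groups[key].append(value)
            sources[key].add(i)
    outputs = {key: reduce_fn(key, vals) for key, vals in groups.items()}
    return outputs, sources


def backward_trace(key, lines, sources):
    # Backward tracing: which input records produced this output?
    return [lines[i] for i in sorted(sources[key])]


def forward_trace(index, sources):
    # Forward tracing: which outputs did this input record affect?
    return sorted(key for key, idx in sources.items() if index in idx)


lines = ["data provenance", "provenance of data", "workflow"]
outputs, sources = run_with_provenance(lines)
print(outputs["data"])                         # ('data', 2)
print(backward_trace("data", lines, sources))  # ['data provenance', 'provenance of data']
print(forward_trace(1, sources))               # ['data', 'of', 'provenance']
```

Keeping the `sources` map around is an instance of the capture overhead the abstract says Panda tries to keep low; a real system would have to decide what to record and when.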

The Stack Exchange

This Q&A is part of a biweekly series of posts highlighting common questions encountered by technophiles and answered by users at Stack Exchange, a free, community-powered network of 80+ Q&A sites.

Robert Harvey asks:

Occasionally I see questions about edge cases on Stack Overflow that are easily answered by the likes of Jon Skeet or Eric Lippert—experts who demonstrate a deep knowledge of a particular language and its many intricacies. Here's an example of this from Lippert's MSDN blog:

You might think that in order to use a foreach loop, the collection you are iterating over must implement IEnumerable or IEnumerable<T>. But as it turns out, that is not actually a requirement. What is required is that the type of the collection must have a public method called GetEnumerator, and that must return some type that has a public property getter called Current and a public method MoveNext that returns a bool. If the compiler can determine that all of those requirements are met then the code is generated to use those methods. Only if those requirements are not met do we check to see if the object implements IEnumerable or IEnumerable<T>.
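To make the pattern concrete, here is a rough rendering of what the generated loop does with such a type. It is written in Python rather than C#, with the member names GetEnumerator, MoveNext and Current borrowed to mirror the passage; `Countdown`, `CountdownEnumerator` and `foreach_items` are hypothetical. Neither class declares any interface, and the sketch omits the disposal step the C# compiler adds when the enumerator is disposable.

```python
class CountdownEnumerator:
    """Has a Current property and a MoveNext method; implements no interface."""

    def __init__(self, start):
        self._next = start
        self._current = None

    @property
    def Current(self):
        return self._current

    def MoveNext(self):
        # Advance to the next item; return False once the sequence is exhausted.
        if self._next <= 0:
            return False
        self._current = self._next
        self._next -= 1
        return True


class Countdown:
    """A 'collection' whose only iteration support is a GetEnumerator method."""

    def __init__(self, start):
        self.start = start

    def GetEnumerator(self):
        return CountdownEnumerator(self.start)


def foreach_items(collection):
    """Read items the way the expanded foreach loop does (disposal omitted)."""
    enumerator = collection.GetEnumerator()
    while enumerator.MoveNext():
        yield enumerator.Current


print(list(foreach_items(Countdown(3))))  # [3, 2, 1]
```

Python's own for statement follows a similar structural rule: it looks for `__iter__` and `__next__` rather than requiring a base class.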

Jer Thorp, a data artist in residence at The New York Times, shows off some of his work (like this and this) and speaks about the connection between the real world and the mechanical bits we know as data. Worth your 17 minutes.

People often miss this point about data — that it's a representation of the physical world — and because of that, things like uncertainty and complexity come attached to the numbers. There are also actual human beings associated with a lot of data. So while optimization, maximization, and efficiency are well and good, stories, ethics, and lessons are pretty good takeaways, too.

Update: Don't miss the unexpected discussion around data and capitalism.

[Jer Thorp]

You knew this was coming, right? The New York Times describes the point guard fundamentals — dribble penetration, ball screen, and isolation — of Jeremy Lin in this animated Linfographic. For each play, the players of interest are outlined, and the frame shifts so that you can see where the players have been, relative to where they currently are. It's a simple concept executed well.

I'm already familiar with this stuff, but I imagine it's pretty useful for people just tuning in to the game because of a sudden case of Linsanity. Today's game against Dallas is gonna be a hot ticket.

[New York Times]

For those interested, I've compiled a follow-up report on the Indie Developer Motivations Survey. It talks about some of the more significant findings, as well as recommendations for studios wishing to improve talent retention.
