Help

# computer science

warning: Creating default object from empty value in /var/www/vhosts/sayforward.com/subdomains/recorder/httpdocs/modules/taxonomy/taxonomy.pages.inc on line 33.

## Impossible Programs: a great lecture on some of computer science's most important subjects

Original author:
Cory Doctorow

Here's a 40-minute video in which Tom Stuart gives a talk summarizing one of the chapters from him new book Understanding Computation, describing the halting state problem and how it relates to bugs, Turing machines, Turing completeness, computability, malware checking for various mobile app stores, and related subjects. The Halting State problem -- which relates to the impossibility of knowing what a program will do with all possible inputs -- is one of the most important and hardest-to-understand ideas in computer science, and Stuart does a fantastic job with it here. You don't need to be a master programmer or a computer science buff to get it, and even if you only absorb 50 percent of it, it's so engagingly presented, and so blazingly relevant to life in the 21st century, that you won't regret it.

At Scottish Ruby Conference 2013 I gave a talk called Impossible Programs, adapted from chapter 8 of Understanding Computation. It’s a talk about programs that are impossible to write in Ruby — it covers undecidability, the halting problem and Rice’s theorem, explained in plain English and illustrated with Ruby code. The slides are available

Original author:
Cory Doctorow

Thearn released a free/open program for detecting and monitoring your pulse using your webcam. The code is on github for you to download, play with and modify. If this stuff takes your fancy, be sure and read Eulerian Video Magnification for Revealing Subtle Changes in the World, an inspiring paper describing the techniques Thearn uses in his code:

This application uses openCV (http://opencv.org/) to find the location of the user's face, then isolate the forehead region. Data is collected from this location over time to estimate the user's heartbeat frequency. This is done by measuring average optical intensity in the forehead location, in the subimage's green channel alone. Physiological data can be estimated this way thanks to the optical absorbtion characteristics of oxygenated hemoglobin.

With good lighting and minimal noise due to motion, a stable heartbeat should be isolated in about 15 seconds. Other physiological waveforms, such as Mayer waves (http://en.wikipedia.org/wiki/Mayer_waves), should also be visible in the raw data stream.

Once the user's pulse signal has been isolated, temporal phase variation associated with the detected hearbeat frequency is also computed. This allows for the heartbeat frequency to be exaggerated in the post-process frame rendering; causing the highlighted forhead location to pulse in sync with the user's own heartbeat (in real time).

Support for pulse-detection on multiple simultaneous people in an camera's image stream is definitely possible, but at the moment only the information from one face is extracted for cardiac analysis

## Solving classic NES games computationally

Original author:
Cory Doctorow

Dr. Tom Murphy VII gave a research paper called "The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel . . . after that it gets a little tricky," (PDF) (source code) at SIGBOVIK 2013, in which he sets out a computational method for solving classic NES games. He devised two libraries for this: learnfun (learning fuction) and playfun (playing function). In this accompanying video, he chronicles the steps and missteps he took getting to a pretty clever destination.

## Algorithmically constructed news

Original author:
Cory Doctorow

In Wired, Steven Levy has a long profile of the fascinating field of algorithmic news-story generation. Levy focuses on Narrative Science, and its competitor Automated Insights, and discusses how the companies can turn "data rich" streams into credible news-stories whose style can be presented as anything from sarcastic blogger to dry market analyst. Narrative Science's cofounder, Kristian Hammond, claims that 90 percent of all news will soon be algorithmically generated, but that this won't be due to computers stealing journalists' jobs -- rather, it will be because automation will enable the creation of whole classes of news stories that don't exist today, such as detailed, breezy accounts of every little league game in the country.

Narrative Science’s writing engine requires several steps. First, it must amass high-quality data. That’s why finance and sports are such natural subjects: Both involve the fluctuations of numbers—earnings per share, stock swings, ERAs, RBI. And stats geeks are always creating new data that can enrich a story. Baseball fans, for instance, have created models that calculate the odds of a team’s victory in every situation as the game progresses. So if something happens during one at-bat that suddenly changes the odds of victory from say, 40 percent to 60 percent, the algorithm can be programmed to highlight that pivotal play as the most dramatic moment of the game thus far. Then the algorithms must fit that data into some broader understanding of the subject matter. (For instance, they must know that the team with the highest number of “runs” is declared the winner of a baseball game.) So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?

Then comes the structure. Most news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles. To construct sentences, the algorithms use vocabulary compiled by the meta-writers. (For baseball, the meta-writers seem to have relied heavily on famed early-20th-century sports columnist Ring Lardner. People are always whacking home runs, swiping bags, tallying runs, and stepping up to the dish.) The company calls its finished product “the narrative.”

Both companies claim that they'll be able to make sense of less-quantifiable subjects in the future, and will be able to generate stories about them, too.

## Aaron Swartz's unfinished monograph on the "programmable Web"

Michael B. Morgan, CEO of Morgan & Claypool Publishers, writes:

In 2009, we invited Aaron Swartz to contribute a short work to our series on Web Engineering (now The Semantic Web: Theory and Technology). He produced a draft of about 40 pages -- a "first version" to be extended later -- which unfortunately never happened.

After his death in January, we decided (with his family's blessing) that it would be a good idea to publish this work so people could read his ideas about programming the Web, his ambivalence about different aspects of Semantic Web technology, his thoughts on Openness, and more.

(Thanks, Michael!)

## Regular expressions crossword

On Coinheist.com, a crossword puzzle you solve by interpreting regular expressions.

## Probability theory for programmers

Jeremy Kun, a mathematics PhD student at the University of Illinois in Chicago, has posted a wonderful primer on probability theory for programmers on his blog. It's a subject vital to machine learning and data-mining, and it's at the heart of much of the stuff going on with Big Data. His primer is lucid and easy to follow, even for math ignoramuses like me.

For instance, suppose our probability space is $\Omega = \left \{ 1, 2, 3, 4, 5, 6 \right \}$ and $f$ is defined by setting $f(x) = 1/6$ for all $x \in \Omega$ (here the “experiment” is rolling a single die). Then we are likely interested in more exquisite kinds of outcomes; instead of asking the probability that the outcome is 4, we might ask what is the probability that the outcome is even? This event would be the subset $\left \{ 2, 4, 6 \right \}$, and if any of these are the outcome of the experiment, the event is said to occur. In this case we would expect the probability of the die roll being even to be 1/2 (but we have not yet formalized why this is the case).

As a quick exercise, the reader should formulate a two-dice experiment in terms of sets. What would the probability space consist of as a set? What would the probability mass function look like? What are some interesting events one might consider (if playing a game of craps)?

(Image: Dice, a Creative Commons Attribution (2.0) image from artbystevejohnson's photostream)

## Machine-learning algorithm develops heuristics for trustworthy tweets in time of emergency

In "Credibility ranking of tweets during high impact events," a paper published in the ACM's Proceedings of the 1st Workshop on Privacy and Security in Online Social Media , two Indraprastha Institute of Information Technology researchers describe the outcome of a machine-learning experiment that was asked to discover factors correlated with reliability in tweets during disasters and emergencies:

The number of unique characters present in tweet was positively correlated to credibility, this may be due to the fact
that tweets with hashtags, @mentions and URLs contain
more unique characters. Such tweets are also more informative and linked, and hence credible. Presence of swear words
in tweets indicates that it contains the opinion / reaction of
the user and would have less chances of providing informa-
tion about the event. Tweets that contain information or
are reporting facts about the event, are impersonal in nature, as a result we get a negative correlation of presence of
pronouns in credible tweets. Low number of happy emoticons [:-), :)] and high number of sad emoticons [:-(, :(] act
as strong predictors of credibility. Some of the other important features (p-value < 0.01) were inclusion of a URL in
the tweet, number of followers of the user who tweeted and
presence of negative emotion words. Inclusion of URL in a
tweet showed a strong positive correlation with credibility,
as most URLs refer to pictures, videos, resources related to
the event or news articles about the event.

Of course, this is all non-adversarial: no one is trying to trick a filter into mis-assessing a false account as a true one. It's easy to imagine an adversarial tweet-generator that suggests rewrites to deliberately misleading tweets to make them more credible to a filter designed on these lines. This is actually the substance of one of the cleverest science fiction subplots I've read: in Peter Watt's Behemoth, in which a self-modifying computer virus randomly hits on the strategy of impersonating communications from patient zero in a world-killing pandemic, because all the filters allow these through. It's a premise that's never stopped haunting me: the co-evolution of a human virus and a computer virus.

(via /.)

## The many stages of writing a research paper

Timothy Weninger recently submitted a research paper to a computer science conference called World Wide Web. On his way to that, he went through 463 drafts. Bear in mind, this paper has only been submitted, not yet accepted, so there's probably even more edits that are still yet to happen. Welcome to the life of a scientist.

In this video, Weninger created a timelapse showing all the different stages of his writing process, as he added graphs and went through cycles of expanding, contracting, and expanding the text. But mostly expanding. The paper grows from two pages to 10 by the end of the video.

Via Bill Bell

## Chinook: the story of the computer that beat checkers

Last month, I blogged about Relatively Prime, a beautifully produced, crowdfunded free series of math podcasts. I just listened to the episode on Chinook (MP3), the program that became the world champion of checkers.

Chinook's story is a bittersweet and moving tale, a modern account of John Henry and the steam-drill, though this version is told from the point of view of the machine and its maker, Jonathan Schaeffer, a University of Alberta scientist who led the Chinook team. Schaeffer's quest begins with an obsessive drive to beat reigning checkers champ Marion Tinsley, but as the tale unfolds, Tinsley becomes more and more sympathetic, so that by the end, I was rooting for the human.

This is one of the best technical documentaries I've heard, and I heartily recommend it to you.