Skip navigation
Help

Internet scale

warning: Creating default object from empty value in /var/www/vhosts/sayforward.com/subdomains/recorder/httpdocs/modules/taxonomy/taxonomy.pages.inc on line 33.

Ever wonder what powers Google's world spirit sensing Zeitgeist service? No, it's not a homunculus of Georg Wilhelm Friedrich Hegel sitting in each browser. It's actually a stream processing (think streaming MapReduce on steroids) system called MillWheel, described in this very well written paper: MillWheel: Fault-Tolerant Stream Processing at Internet Scale. MillWheel isn't just used for Zeitgeist at Google, it's also used for streaming joins for a variety of Ads customers, generalized anomaly-detection service, and network switch and cluster health monitoring.

Abstract:

MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees.

 

This paper describes MillWheel’s programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel’s features are used. MillWheel’s programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel’s unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.

0
Your rating: None

Internet map

Upon discovering hundreds of thousands open embedded devices on the Internet, an anonymous researcher conducted a Census of the Internet, mapping 460 million IP addresses around the world.

While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address usage.

It's a pretty thorough analysis, but the conclusion interested me most:

The why is also simple: I did not want to ask myself for the rest of my life how much fun it could have been or if the infrastructure I imagined in my head would have worked as expected. I saw the chance to really work on an Internet scale, command hundred thousands of devices with a click of my mouse, portscan and map the whole Internet in a way nobody had done before, basically have fun with computers and the Internet in a way very few people ever will. I decided it would be worth my time.

It makes me feel...uneasy. [Thanks, Roger]

0
Your rating: None

Aurich Lawson (after Aliens)

In one of the more audacious and ethically questionable research projects in recent memory, an anonymous hacker built a botnet of more than 420,000 Internet-connected devices and used it to perform one of the most comprehensive surveys ever to measure the insecurity of the global network.

In all, the nine-month scanning project found 420 million IPv4 addresses that responded to probes and 36 million more addresses that had one or more ports open. A large percentage of the unsecured devices bore the hallmarks of broadband modems, network routers, and other devices with embedded operating systems that typically aren't intended to be exposed to the outside world. The researcher found a total of 1.3 billion addresses in use, including 141 million that were behind a firewall and 729 million that returned reverse domain name system records. There were no signs of life from the remaining 2.3 billion IPv4 addresses.

Continually scanning almost 4 billion addresses for nine months is a big job. In true guerilla research fashion, the unknown hacker developed a small scanning program that scoured the Internet for devices that could be logged into using no account credentials at all or the usernames and passwords of either "root" or "admin." When the program encountered unsecured devices, it installed itself on them and used them to conduct additional scans. The viral growth of the botnet allowed it to infect about 100,000 devices within a day of the program's release. The critical mass allowed the hacker to scan the Internet quickly and cheaply. With about 4,000 clients, it could scan one port on all 3.6 billion addresses in a single day. Because the project ran 1,000 unique probes on 742 separate ports, and possibly because the binary was uninstalled each time an infected device was restarted, the hacker commandeered a total of 420,000 devices to perform the survey.

Read 16 remaining paragraphs | Comments

0
Your rating: None