
Recently I tried to do a Google search for a wine to pair with swordfish, and it was pretty much a disaster (first world problems, I know, but still). The problem is, web search results for certain topics are just overloaded with dummy websites with little to no valuable content, many of which have utilized “search engine optimization” (SEO) tactics. Of course, search engines work overtime to stay one step ahead of the SEO spammers, but sometimes the bad guys just win out.

There’s also the issue of discovering new content. Say you’re looking for a new recipe for a dish you’ve made lots of times before. The top 20 search results are going to be from very popular food sites, for recipes you’ve probably already seen. What if you want something fresh?

That’s what a neat hack called MillionShort aims to help with. The website is a search engine that lets you remove the top million (or 100,000, or 10,000, or whatever) hits from the results list. It’s a lot like pruning a plant, or skimming the film off the top of a stew: MillionShort lets you remove the old or non-useful stuff from traditional web search to find new or interesting content.
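Mechanically, the idea is just a filter: take an ordinary ranked result list and drop every hit whose domain falls inside the top N most popular sites. MillionShort hasn’t published how it actually does this, so the sketch below is purely illustrative, assuming you already have a result list and some domain-popularity ranking to filter against:

```typescript
interface SearchResult {
  url: string;
  title: string;
}

// Hypothetical popularity ranking: domain -> rank, where 1 is the most
// popular site on the web. The data source is an assumption, not
// something MillionShort has disclosed.
type DomainRanks = Map<string, number>;

function removeTopSites(
  results: SearchResult[],
  ranks: DomainRanks,
  cutoff: number // e.g. 10_000, 100_000, or 1_000_000
): SearchResult[] {
  return results.filter((result) => {
    const domain = new URL(result.url).hostname.replace(/^www\./, "");
    const rank = ranks.get(domain);
    // Keep the hit unless its domain lands inside the excluded top slice.
    return rank === undefined || rank > cutoff;
  });
}

// Usage: the same query, minus the top million domains.
// const fresh = removeTopSites(ordinaryResults, domainRanks, 1_000_000);
```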

Results for "ratatouille recipe" search

The website, which is apparently built on top of Google search (we’ve reached out for an interview and more details and will update this post when we hear back), describes itself like this:

“We thought it might be somewhat interesting to see what we’d find if we just removed an entire slice of the web.

The thinking was the same popular sites (we’re not saying popular equals irrelevant) show up again and again. Million Short makes it easy to discover sites that just don’t make it to the top of the search engine results for whatever reason (poor SEO, new site, small marketing budget, competitive keyword(s), etc.). Most people don’t look beyond page 1 when doing a search, and now they don’t have to.”

Technically it seems pretty basic, but the idea is powerful. The community at developer-centric news aggregator and discussion site Hacker News has had a big response to MillionShort: the post about the site has garnered nearly 200 comments in less than 24 hours. As one commenter, jaems33, noted: “It reminds me of why I first moved to Google from Yahoo/Webcrawler/Altavista/etc in the first place.”

Social search and dedicated apps may be great and all, but it seems there is still an appetite for discovering fresh new things from the world wide web at large. If the search powers-that-be stop focusing on that, it’s good to see that there are still enterprising developers keen to hack out their own solutions to the problem.


Editor’s note: James Altucher is an investor, programmer, author, and entrepreneur. He is Managing Director of Formula Capital and has written 6 books on investing. His latest books are I Was Blind But Now I See and FAQ ME. You can follow him on Twitter @jaltucher.

Ken Lang could perform miracles. In 1990 we would head off to a bar near where we were going to graduate school for computer science, and we would bring a Go board. Then we would drink and play Go for five hours. At the end of the five hours, after a grueling battle over the board, I remember one time when, magically, Ken showed up with two girls who were actually willing to sit down and hang out with two guys who had a GO BOARD in front of them. How did Ken do that?

Fast forward: 1991, CMU asks me to leave graduate school, citing lack of maturity. The professor who threw me out still occasionally calls me up asking me when I’m going to be mature enough.

Fast forward: 1994, one of our classmates, Michael Mauldin, is working on a database that automatically sorts the pages his spider retrieves from the Internet into categories. The name of his computer: lycos.cs.cmu.edu. Lycos eventually spins out of CMU, becomes the biggest search engine, and goes public with a multi-billion dollar valuation.

Fast forward: Ken Lang starts a company called WiseWire. I was incredibly skeptical. I read through what the company is about. “No way,” I think to myself, “that this is going to make any money”.

1998: Ken files a patent covering how search results and ad results are sorted based on the number of click-thrus an ad gets. He sells the company to Lycos for $40 million. Ken Lang becomes CTO of Lycos and they take over his patents.

$40 million! What? And then Lycos stock skyrockets. I can’t believe it. I’m happy for my friend but also incredibly jealous, although later in 1998 I sell my first company as well. Still, I wanted to be the only one I knew who made money. I didn’t think it was fun when other people I knew made money. And, anyway, weren’t search engines dead? I mean, what was even the business model?

Fast forward: the 2000s. Almost every search engine dies. Excite, Lycos, AltaVista. Before that, “the world wide web worm.” Lycos got bought by a Spanish company, then a Korean company, then an Indian company. To be honest, I don’t even know who owns it now. It has a breathing tube and a feeding tube. Somehow, in a complete coma, it is being kept alive.

One search engine, a little company called Google, figured out how to make money.

One quick story: I was a venture capitalist in 2001. A company, Oingo, which later became Applied Semantics, had a technique for how search engines could make money by having people bid for ads. My partner at the firm said, “we can probably pick up half this company for cheap. They are running out of money.” It was during the Internet bust.

“Are you kidding me,” I said. “They are in the search engine business. That’s totally dead.” And I went back to playing the Defender machine that was in my office. The one I would play all day long even while companies waited in the conference room. (See: “10 Unusual Things I Didn’t Know About Google, Plus How I Made the Worst VC Decision Ever“)

A year later they were bought by Google for 1% of Google. Our half would’ve now been worth hundreds of millions if we had invested. I was the worst venture capitalist ever. They had changed their name from Oingo to Applied Semantics to what became, within Google, AdWords and AdSense, which has been 97% of Google’s revenues since 2001. 97%. $67 billion.

Don’t worry.  I’m getting to it.

(Yahoo won hundreds of millions from Google on the Overture patent even before Google amassed the bulk of their $67 billion in overall revenues from AdWords.)

Fast forward. Overture, another search engine company that no longer exists (Yahoo bought it) files a patent for a bidding system for ads on a search engine. The patent office says (I’m paraphrasing), “you can file patents on A, B, and C. But not D, E, and F. Because Ken Lang from Lycos filed those patents already.”

Overture/Yahoo goes on to successfully sue Google based on the patents they did win. Google settled right before they went public but long before they achieved the bulk of their revenues.

Lycos goes on as a barely breathing, comatose patient. Fast forward to 2011. Ken Lang buys his patents back from Lycos for almost nothing. He starts a company: I/P Engine. Two weeks ago he announced he was merging his company with a public company, Vringo (Nasdaq: VRNG). Because it’s Ken, I buy the stock, although I will buy more after this article is out and readers have read it.

The company sues Google for a big percentage of those $67 billion in revenues plus future revenues. The claim: Google has willfully infringed on Vringo – I/P’s patents for sorting ads based on click-throughs. I remember almost 20 years ago when Ken was working on the software. “Useless!” I thought then. Their claim: $67 billion of Google’s revenues come from this patent. All of Google’s revenues going forward come from this patent. And every search engine which uses Google is allegedly infringing on the Vringo patent and is being sued.

Think: Interactive Corp (Nasdaq: IACI) with Ask.com. Think AOL. Think Target which internally uses Google’s technologies. Think Gannett, which uses Google’s technology and is also being sued. Think, eventually, thousands of Google’s customers who use AdSense.

Think: “willfully”. Why should you think that? Two reasons. Overture already sued Google. Google is aware of Yahoo/Overture’s patent history. The patent history officially stated that Ken Lang/Lycos already has patented some of this technology.  What does “willfully” mean in legal terms? Triple damages.

Why didn’t Lycos ever sue? After Lycos had its massive stroke and was left to die in a dirty hospital room with some uncaring nurse changing its bedpans twice a day, Google was STILL Lycos’s biggest customer. Why sue your biggest customer? Operating companies rarely sue other operating companies. Then there are countersuits, loss of revenues, and all sorts of ugly things. The breathing tube would’ve been pulled out of Lycos and it would’ve been left to die.

Think: NTP suing RIMM on patents. NTP had nothing going on other than the patents. Like Vringo/Innovate. NTP won over $600 million from RIMM once Research in Motion realized this was a serious issue and not one they could just chalk up to a bad nightmare.

(the beginning of the end for RIMM)

Guess who NTP’s lawyer was? Donald Stout. Guess who Vringo’s patent lawyer is? Donald Stout. Why is Donald Stout so good? He was an examiner at the US Patent Office. He knows patents. They announced all of this, but nobody reads announcements from a small public company like Vringo. It’s hard enough figuring out how many pixels are on the screen of Apple’s amazing iPad 3.

Well, Google must have a defense, right? Even though their AdWords results are sorted by click-throughs in the way described by the patent, maybe they sorted them in a different way (a “work-around” of the patent) and didn’t infringe on the patent.

Maybe: But look at Google economist Hal Varian describing their algorithm right here in this video. And compare it with the patent claim filed in court by Vringo. You decide. But it looks exactly the same to me.
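To make the comparison concrete: the core idea in dispute is ranking ads by expected value rather than by bid alone, with the estimated click-through rate as the multiplier. The toy sketch below shows only that general ordering; the actual formulas in the patent claim and in Google’s auction are more elaborate, and whether they match is exactly what the trial is about.

```typescript
interface Ad {
  advertiser: string;
  bidPerClick: number;  // what the advertiser pays if the ad is clicked
  estimatedCtr: number; // click-through rate estimated from past clicks
}

// Sort ads by expected revenue per impression: bid x estimated CTR.
// This is the simplified version of "sorting ad results based on the
// number of click-thrus an ad gets."
function rankAds(ads: Ad[]): Ad[] {
  return [...ads].sort(
    (a, b) => b.bidPerClick * b.estimatedCtr - a.bidPerClick * a.estimatedCtr
  );
}

const example: Ad[] = [
  { advertiser: "HighBidder", bidPerClick: 5.0, estimatedCtr: 0.01 }, // expected 0.05
  { advertiser: "RelevantAd", bidPerClick: 2.0, estimatedCtr: 0.05 }, // expected 0.10
];

// RelevantAd wins the top slot despite the lower bid, because people
// actually click on it.
console.log(rankAds(example).map((ad) => ad.advertiser));
```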

Maybe: But does Google want to risk losing ten billion dollars, plus having all of their customers sued? The district where the case is being tried rules 70% in favor of the plaintiff in patent cases. Most patent trials get settled on the courthouse steps.

Maybe: But then there’s still Microsoft /Yahoo search which, by the way, sorts based on click-throughs and has not been sued yet.

Guess what? Google’s patent lawyer is Quinn Emanuel. They are defending Google. Oh, and here’s something funny. Guess who Yahoo’s lawyer is? Yahoo is suing Facebook for patent infringement in the search domain. Quinn Emanuel. So the same lawyer is both defending and accusing in the same domain. Someone’s going to settle. Everyone will settle. If anyone loses this case then the entire industry is going down in the same lawsuit, and the exact same lawyer will be stuck on both sides of the fence. I’m not a lawyer, but that smells. The trial is October 16 in the Eastern District of Virginia and will last two weeks. An appeal process can take, at most, a year.

I’ve known Ken for 23 years. I’ve been in the trenches with him when he was writing what I thought was his useless software. I watched his company get bought and we’ve talked about these technologies through the decades.

I’ve read the patent case. I watched Hal Varian’s video. Also look at this link on Google’s site where they describe their algorithm. Compare with the patent claim.  I have a screenshot if they decide to take it down. $67 billion in revenues from this patent. Imagine: double that in the next ten years. Imagine: triple damages.

Vringo will have an $80 million market capitalization after their merger with I/P. NTP won $600 million from RIMM using the same lawyer. RIMM’s revenues are a drop in the bucket compared to Google’s. And that’s before you count the thousands of Google customers who will be embarrassed when the lawyer shows up at their door, too. That’s why I made my investment accordingly. Is Google going to take the risk that this happens?

I doubt it.

You can think to yourself: “ugh, patent trolls are disgusting.” But the protection of intellectual property is what America is built on. Smart people invent things. Then they get to protect the intellectual property on what they invent. Other companies can’t steal that technology. That’s why we have such a problem outsourcing to China and other countries where we are worried they might steal our intellectual property. Patents are the defense mechanism for capitalism.

Ken can perform miracles. But no miracle would save me. At the end of one evening of Go playing and beer drinking in 1990 we gave two girls our phone numbers. I don’t know if Ken ever got the call. I didn’t. But I guess I’m happy where it all ended up.


We've always put a heavy emphasis on performance at Stack Overflow and Stack Exchange. Not just because we're performance wonks (guilty!), but because we think speed is a competitive advantage. There's plenty of experimental data proving that the slower your website loads and displays, the less people will use it.

[Google found that] the page with 10 results took 0.4 seconds to generate. The page with 30 results took 0.9 seconds. Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.

In A/B tests, [Amazon] tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue.

I believe the converse of this is also true. That is, the faster your website is, the more people will use it. This follows logically if you think like an information omnivore: the faster you can load the page, the faster you can tell whether that page contains what you want. Therefore, you should always favor fast websites. The opportunity cost for switching on the public internet is effectively nil, and whatever it is that you're looking for, there are multiple websites that offer a similar experience. So how do you distinguish yourself? You start by being, above all else, fast.

Do you, too, feel the need – the need for speed? If so, I have three pieces of advice that I'd like to share with you.

1. Follow the Yahoo Guidelines. Religiously.

The golden reference standard for building a fast website remains Yahoo's 13 Simple Rules for Speeding Up Your Web Site from 2007. There is one caveat, however:

There's some good advice here, but there's also a lot of advice that only makes sense if you run a website that gets millions of unique users per day. Do you run a website like that? If so, what are you doing reading this instead of flying your private jet to a Bermuda vacation with your trophy wife?

So … a funny thing happened to me since I wrote that four years ago. I now run a network of public, community-driven Q&A websites that do get millions of daily unique users. (I'm still waiting on the jet and trophy wife.) It does depend a little on the size of your site, but if you run a public website, you really should pore over Yahoo's checklist and take every line of it to heart. Or use one of the tools that will check the list for you.

We've long since implemented most of the 13 items on Yahoo's list, except for one. But it's a big one: Using a Content Delivery Network.

The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start?

As a first step to implementing geographically dispersed content, don't attempt to redesign your web application to work in a distributed architecture. Depending on the application, changing the architecture could include daunting tasks such as synchronizing session state and replicating database transactions across server locations. Attempts to reduce the distance between users and your content could be delayed by, or never pass, this application architecture step.

Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. This is the Performance Golden Rule. Rather than starting with the difficult task of redesigning your application architecture, it's better to first disperse your static content. This not only achieves a bigger reduction in response times, but it's easier thanks to content delivery networks.

As a final optimization step, we just rolled out a CDN for all our static content. The results are promising; the baseline here is our datacenter in NYC, so the below should be read as "how much faster did our website get for users in this area of the world?"

(Image: CDN performance test results, world map)

In the interests of technical accuracy, static content isn't the complete performance picture; you still have to talk to our servers in NYC to get the dynamic content, which is the meat of the page. But 90% of our visitors are anonymous, only 36% of our traffic is from the USA, and Yahoo's research shows that 40 to 60 percent of daily visitors come in with an empty browser cache. Optimizing this cold-cache performance worldwide is a huge win.

Now, I would not recommend going directly for a CDN. I'd leave that until later, as there are a bunch of performance tweaks on Yahoo's list which are free and trivial to implement. But using a CDN has gotten a heck of a lot less expensive and much simpler since 2007, with lots more competition in the space from companies like Amazon, NetDNA, and CacheFly. So when the time comes, and you've worked through the Yahoo list as religiously as I recommend, you'll be ready.
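In practice, "dispersing your static content" often amounts to nothing more exotic than serving the same files from a CDN hostname instead of your origin servers. A minimal sketch of that URL rewrite, assuming a hypothetical cdn.example.com configured to pull from your origin (the hostname and extension list are illustrative, not any particular vendor's setup):

```typescript
// Hypothetical CDN hostname that mirrors the origin's static files
// (for example via an origin-pull configuration).
const CDN_HOST = "https://cdn.example.com";

const STATIC_EXTENSIONS = [".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff"];

// Point static assets at the CDN; leave dynamic pages (the "meat" of the
// page) on the origin, since they still have to be rendered there.
function assetUrl(path: string): string {
  const isStatic = STATIC_EXTENSIONS.some((ext) => path.endsWith(ext));
  return isStatic ? `${CDN_HOST}${path}` : path;
}

// In a template:
//   <link rel="stylesheet" href="${assetUrl("/css/site.css")}">
// renders as https://cdn.example.com/css/site.css, while /questions/123
// keeps hitting the origin.
```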

2. Love (and Optimize for) Your Anonymous and Registered Users

Our Q&A sites are all about making the internet better. That's why all the contributed content is licensed back to the community under Creative Commons and always visible regardless of whether you are logged in or not. I despise walled gardens. In fact, you don't actually have to log in at all to participate in Q&A with us. Not even a little!

The primary source of our traffic is anonymous users arriving from search engines and elsewhere. It's classic "write once, read – and hopefully edit – millions of times." But we are also making the site richer and more dynamic for our avid community members, who definitely are logged in. We add features all the time, which means we're serving up more JavaScript and HTML. There's an unavoidable tension here between the download footprint for users who are on the site every day, and users who may visit once a month or once a year.

Both classes are important, but have fundamentally different needs. Anonymous users are voracious consumers optimizing for rapid browsing, while our avid community members are the source of all the great content that drives the network. These guys (and gals) need each other, and they both deserve special treatment. We design and optimize for two classes of users: anonymous, and logged in. Consider the following Google Chrome network panel trace on a random Super User question I picked:

 
                      requests   data transferred   DOMContentLoaded   onload
Logged in (as me)     29         233.31 KB          1.17 s             1.31 s
Anonymous             22         111.40 KB          768 ms             1.28 s
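The numbers in that trace are the browser's own milestones, so if you want to collect them from real visitors rather than from a DevTools session, the standard Navigation Timing API reports the same values. A small sketch (the /perf-beacon endpoint is hypothetical; substitute your own collector):

```typescript
// Report the browser's load milestones once the page has fully loaded.
// The /perf-beacon endpoint is an assumption, not a real service.
window.addEventListener("load", () => {
  // Wait one tick so loadEventEnd has been filled in.
  setTimeout(() => {
    const [nav] = performance.getEntriesByType(
      "navigation"
    ) as PerformanceNavigationTiming[];
    if (!nav) return;

    navigator.sendBeacon(
      "/perf-beacon",
      JSON.stringify({
        url: location.pathname,
        transferredBytes: nav.transferSize,                           // ~ "data transferred"
        domContentLoadedMs: Math.round(nav.domContentLoadedEventEnd), // "DOMContentLoaded"
        onloadMs: Math.round(nav.loadEventEnd),                       // "onload"
      })
    );
  }, 0);
});
```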

We minimize the footprint of HTML, CSS and Javascript for anonymous users so they get their pages even faster. We load a stub of very basic functionality and dynamically "rez in" things like editing when the user focuses the answer input area. For logged in users, the footprint is necessarily larger, but we can also add features for our most avid community members at will without fear of harming the experience of the vast, silent majority of anonymous users.
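The stub-plus-"rez in" pattern is straightforward to approximate: ship a tiny bootstrap to everyone and fetch the heavier editing bundle only when someone actually focuses the answer box. A rough sketch of that deferred load follows; the element id and script path are made up for illustration, not Stack Overflow's actual markup:

```typescript
// Load the full editor bundle only when the user shows intent to write,
// so anonymous readers never pay for it. Id and path are hypothetical.
let editorLoaded = false;

function loadEditorOnce(): void {
  if (editorLoaded) return;
  editorLoaded = true;

  const script = document.createElement("script");
  script.src = "/js/full-editor.js"; // the heavyweight editing functionality
  script.async = true;
  document.head.appendChild(script);
}

document
  .getElementById("answer-input")
  ?.addEventListener("focus", loadEditorOnce, { once: true });
```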

3. Make Performance a Point of (Public) Pride

Now that we've exhausted the Yahoo performance guidance, and made sure we're serving the absolute minimum necessary to our anonymous users – where else can we go for performance? Back to our code, of course.

When it comes to website performance, there is no getting around one fundamental law of the universe: you can never serve a webpage faster than you can render it on the server. I know, duh. But I'm telling you, it's very easy to fall into the trap of not noticing a few hundred milliseconds here and there over the course of a year or so of development, and then one day you turn around and your pages are taking almost a full freaking second to render on the server. It's a heck of a liability to start 1 full second in the hole before you've even transmitted your first byte over the wire!

That's why, as a developer, you need to put performance right in front of your face on every single page, all the time. That's exactly what we did with our MVC Mini Profiler, which we are contributing back to the world as open source. The simple act of putting a render time in the upper right hand corner of every page we serve forced us to fix all our performance regressions and omissions.

(Image: MVC Mini Profiler timings on a question page)

(Note that you can click on the SQL linked above to see what's actually being run and how long each step took. And you can use the share link to share the profiler data for this run with your fellow developers to help them diagnose a particular problem. And it works for multiple AJAX requests. Have I mentioned that our open source MVC Mini Profiler is totally freaking awesome? If you're on a .NET stack, you should really check it out.)
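The underlying trick transfers to any stack: time every server render and put the number somewhere developers cannot avoid seeing it. Here is a minimal Express-style sketch of that idea; it is a stand-in illustration, not how the MVC Mini Profiler itself is implemented:

```typescript
import express from "express";

const app = express();

// Time every request on the server and make slow renders loudly visible.
// A real profiler (like MVC Mini Profiler) renders the number into the
// page itself; logging is the crude stand-in here.
app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  res.on("finish", () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    const marker = ms > 100 ? "SLOW" : "ok";
    console.log(`[render ${marker}] ${req.method} ${req.originalUrl} ${ms.toFixed(1)} ms`);
  });
  next();
});

app.get("/", (_req, res) => {
  res.send("hello");
});

app.listen(3000);
```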

In fact, with the render time appearing on every page for everyone on the dev team, performance became a point of pride. We had so many places where we had just gotten a little sloppy or missed some tiny thing that slowed a page down inordinately. Most of the performance fixes were trivial, and even the ones that were not turned into fantastic opportunities to rearchitect and make things simpler and faster for all of our users.

Did it work? You bet your sweet ILAsm it worked:

(Image: Google Webmaster Tools crawl stats, page download time)

That's the Google crawler page download time; the experimental Google Site Performance page, which ostensibly reflects complete full-page browser load time, confirms the improvements:

(Image: Google Webmaster Tools Site Performance overview)

While server page render time is only part of the performance story, it is the baseline from which you start. I cannot emphasize enough how much the simple act of putting the page render time on the page helped us, as a development team, build a dramatically faster site. Our site was always relatively fast, but even for a historically "fast" site like ours, we realized huge gains in performance from this one simple change.

I won't lie to you. Performance isn't easy. It's been a long, hard road getting to where we are now – and we've thrown a lot of unicorn dollars toward really nice hardware to run everything on, though I wouldn't call any of our hardware choices particularly extravagant. And I did follow my own advice, for the record.

I distinctly remember switching from AltaVista to Google back in 2000 in no small part because it was blazing fast. To me, performance is a feature, and I simply like using fast websites more than slow websites, so naturally I'm going to build a site that I would want to use. But I think there's also a lesson to be learned here about the competitive landscape of the public internet, where there are two kinds of websites: the quick and the dead.

Which one will you be?
