Is every search engine focused on building the same system? You have a bunch of crawlers, and then you build up some way to store all of this stuff and index it, and then make some way to search it and serve an interface to it. Am I right so far?
This is how web search has worked for a long time: make a copy of as much of the web as you can, and then search that. This means a lot of missed content, inconsistent results, and so much duplication it's not funny. How many disk farms are out there solely to try to hold copies of the entire web? How many RAM farms for the "hot" n%?
I came up with an idea for inverting web search. Instead of searching the copies, search the actual sites with the content. But... instead of you having to find all of them to send them your searches, have them find you. It's like a stock exchange for searching. I register a query, the sites pull it from the firehose of queries, and each one can provide its best match for it. Then that match finds its way back to me. The exchange would probably cache old results to make response times reasonable, and so that the sources wouldn't have to consume the full firehose.
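To make the flow concrete, here is a rough sketch of the exchange idea in Python. Everything in it is made up for illustration: `QueryExchange`, `Match`, `make_site_source`, and the word-count scoring are assumptions, not anything from a real system. It also cheats by calling each source directly rather than having sources pull from a firehose, which a real version would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Match:
    source: str   # hypothetical name of the site that answered
    url: str      # page the site considers its best match
    score: float  # site-reported relevance (higher is better)


# A source is anything that takes a query and returns its best match, or None.
Source = Callable[[str], Optional[Match]]


class QueryExchange:
    """Toy exchange: sources subscribe, queries fan out, results are cached."""

    def __init__(self) -> None:
        self.sources: List[Source] = []
        self.cache: Dict[str, List[Match]] = {}

    def subscribe(self, source: Source) -> None:
        # The site finds the exchange, not the other way around.
        self.sources.append(source)

    def search(self, query: str) -> List[Match]:
        # Serve cached results so sources do not see every repeated query.
        if query in self.cache:
            return self.cache[query]
        answers = (source(query) for source in self.sources)
        matches = [m for m in answers if m is not None]
        matches.sort(key=lambda m: m.score, reverse=True)
        self.cache[query] = matches
        return matches


def make_site_source(name: str, pages: Dict[str, str]) -> Source:
    """Build a source that searches its own pages (assumed scoring: word counts)."""

    def answer(query: str) -> Optional[Match]:
        words = query.lower().split()
        best: Optional[Match] = None
        for url, text in pages.items():
            tokens = text.lower().split()
            score = float(sum(tokens.count(w) for w in words))
            if score > 0 and (best is None or score > best.score):
                best = Match(name, url, score)
        return best

    return answer
```

Usage would look something like `exchange.subscribe(make_site_source("a", pages))` followed by `exchange.search("some query")`. The point is just that each site searches its own live content and the exchange only holds queries and cached answers, not a copy of the web.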
That's the basic idea, and it goes from there.
I wrote about this in April: http://rachelbythebay.com/w/2012/04/30/search/