The sorry state of ebook search results

Why is Google so popular and how does it quickly help you find what you’re looking for? It’s all about their algorithm. Google uses a variety of metrics, including how many inbound links a site has, to determine what’s in their search results and how those results are presented.

Imagine Google without their algorithm. Rather than using all these metrics to figure out which site is most relevant, they just give you a list of sites that happen to contain your search phrase. Pretty worthless, right? So why do we accept that same, lame functionality in ebooks today?

Let’s look at an example. I remember reading Jean Edward Smith’s terrific FDR biography a while back and wanting to go back and re-read the details about Hyde Park. The author had described the location early in the book and I wanted to find that specific passage. Here’s what I got in the Kindle in-book search results:

[Screenshot: Kindle in-book search results for "Hyde Park"]

What an awful user experience. I get every instance of the phrase, listed in the order of appearance in the book. There’s absolutely no indication of how in-depth the coverage of Hyde Park is in any section; I’m left to figure that out on my own.

Now take a look at these search results:

[Screenshot: Olive search results for "BeagleBone"]

There I searched for the phrase “BeagleBone” in a technology book, and each of the results has a score associated with it; the results with higher scores offer more in-depth coverage of the topic.

How did I produce those results? They came from an ebook reading platform that does much more than simply reproduce “print under glass.” The content comes into the system as a simple, text-embedded PDF. It’s then analyzed and converted to render in a browser-based reading engine. No third-party apps or plug-ins are required.

The magic is in the content ingestion process. This platform knows when a phrase appears in a first-level heading vs. a second-level heading as well as how many times it appears on that page or in that section. In short, it applies technology to produce a far superior set of search results. 
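The platform’s actual scoring formula isn’t spelled out here, so what follows is only a minimal sketch of the idea. It assumes made-up weights (the hypothetical `WEIGHTS` table) and a simple hypothetical `Section` record holding heading and body text. A hit in a first-level heading counts more than one in a second-level heading, which counts more than a hit in body text, and the weighted counts are summed into a per-section score.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical weights (not Olive's): a hit in a first-level heading counts
# more than one in a second-level heading, which counts more than body text.
WEIGHTS = {"h1": 10.0, "h2": 5.0, "body": 1.0}


@dataclass
class Section:
    title: str  # label shown in the results list
    h1: str     # first-level heading text
    h2: str     # second-level heading text ("" if none)
    body: str   # body text of the section


def count_phrase(text: str, phrase: str) -> int:
    """Case-insensitive count of non-overlapping occurrences of phrase."""
    return len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))


def score_section(section: Section, phrase: str) -> float:
    """Weighted sum of phrase occurrences, based on where they appear."""
    return (WEIGHTS["h1"] * count_phrase(section.h1, phrase)
            + WEIGHTS["h2"] * count_phrase(section.h2, phrase)
            + WEIGHTS["body"] * count_phrase(section.body, phrase))


def ranked_search(sections: List[Section], phrase: str) -> List[Tuple[str, float]]:
    """Return (title, score) for matching sections, highest score first."""
    scored = [(s.title, score_section(s, phrase)) for s in sections]
    return sorted((hit for hit in scored if hit[1] > 0),
                  key=lambda hit: hit[1], reverse=True)


sections = [
    Section("Ch. 1", "Early Years", "", "A brief mention of Hyde Park."),
    Section("Ch. 2", "Washington", "", "No match in this section."),
    Section("Ch. 3", "Hyde Park", "Life at Hyde Park", "Hyde Park is described at length here."),
]
print(ranked_search(sections, "Hyde Park"))  # [('Ch. 3', 16.0), ('Ch. 1', 1.0)]
```

Unlike the page-order list in the Kindle screenshot, the chapter devoted to Hyde Park floats to the top and the passing mention drops below it. Sections with no match are left out entirely.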

When will we see this type of functionality in any of the popular ebook reading apps? I’m not holding my breath. The leading vendors apparently don’t see a need to bring their search capabilities out of the dark ages.

If you’d like to learn more about this platform you’ll find summary information here. You’ll also notice it’s the ebook platform solution offered by my employer, Olive Software, Inc. I may not be a book publisher anymore but I’m thrilled to be part of an organization that’s helping lead the industry forward. Relevance-ranked search is just one of the cool innovations that sets us apart. Let me know if you’d like to learn more.

In search of a better search

Imagine Google's search results with no sophisticated algorithm behind them. Rather, when you type in your search phrase and press Enter, Google simply shows you a list of websites where that phrase can be found. No indication of relevance. No ranking mechanism. It's just a list of the sites that contain the phrase. Maybe the list is arranged in chronological order, where the older sites containing your phrase appear first.

Pretty worthless, right? So why do we accept that as our search solution in every major ebook reader app? Open an ebook, search for a phrase and the results merely list each occurrence of it, arranged from page 1 through the end of the book.

You might think a sophisticated search would only be useful for specialty products like textbooks and other reference materials. I disagree. I could see something like this being quite useful in novels, for example. Let's say you forgot who a minor character is and you'd like to quickly learn more about them. Sure, you could do a simple text search and see where the character first appears, but wouldn't you prefer results that provide some context? Maybe it's really the fourth occurrence of that character's name where you'll get the real details on their role in the story. That wouldn't be easy to figure out with today's ebook search capabilities.

And yes, I'm aware of at least two specialty ebook platforms that offer better search results. That's because they have editors who spend hours and hours parsing the content to build this feature manually. I have one word for them: scale. Their solution simply doesn't scale...more on that in a moment.

Here's another use case for a smarter ebook search feature: catalog-wide search. I challenge you to go to Amazon or a publisher's website and use their site search feature to tell you which book has the most in-depth coverage of a particular topic. Let's say you're looking for a book about creating websites. You want one that provides thorough coverage of HTML but also offers a solid introduction to JavaScript. You can't use site search on Amazon or a publisher's site to figure this out. You'll have to look at each book's table of contents and determine the answer yourself, one book at a time. A better solution is one where the results show you exactly how deep the JavaScript coverage is in each book, arranged so that the books with the most in-depth coverage appear first.
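To make the HTML-plus-JavaScript example concrete, here's a rough sketch of what a catalog-wide query could do. It assumes a hypothetical catalog layout (book title mapped to a list of page texts) and uses a deliberately crude depth measure: the share of a book's pages that mention the term. Books must clear a minimum HTML coverage threshold (the `min_primary` parameter is an assumption) and are then ranked by their JavaScript coverage.

```python
import re
from typing import Dict, List, Tuple

Catalog = Dict[str, List[str]]  # hypothetical layout: title -> page texts


def coverage(pages: List[str], term: str) -> float:
    """Share of a book's pages that mention the term (0.0 to 1.0)."""
    if not pages:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = sum(1 for page in pages if pattern.search(page))
    return hits / len(pages)


def catalog_search(catalog: Catalog, primary: str, secondary: str,
                   min_primary: float = 0.5) -> List[Tuple[str, float, float]]:
    """Keep books with thorough primary coverage; rank them by secondary depth."""
    results = []
    for title, pages in catalog.items():
        primary_depth = coverage(pages, primary)
        if primary_depth >= min_primary:
            results.append((title, primary_depth, coverage(pages, secondary)))
    results.sort(key=lambda row: row[2], reverse=True)
    return results


catalog = {
    "Book A": ["HTML basics", "HTML forms", "HTML tables", "A JavaScript teaser"],
    "Book B": ["HTML intro", "HTML layout", "JavaScript events", "JavaScript DOM"],
    "Book C": ["JavaScript only", "More JavaScript", "Node", "npm"],
}
print(catalog_search(catalog, "HTML", "JavaScript"))
# [('Book B', 0.5, 0.5), ('Book A', 0.75, 0.25)]
```

Book C is dropped because it never covers HTML. Book B outranks Book A because more of its pages deal with JavaScript. A real service would want a smarter depth measure than page counts, but the shape of the answer is the point.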

Now back to the scale problem... The key here is to enable these richer search results without requiring a bunch of manual labor. Book publishers are trying to reduce staff and cut costs, not add more of either. So the only way to deliver this service is through a software solution where the content is analyzed and a rich, context-sensitive index is created.
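The reason software can scale where editors can't is that the index is built once, automatically, at ingestion time. Every query after that is just a lookup. Here's a minimal sketch of that idea as a weighted inverted index. The location weights, the `(section_id, heading, body)` input shape, and the single-word tokenizer are all assumptions made for illustration. Phrase search and synonyms are out of scope.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical location weights, applied once when the book is ingested.
LOCATION_WEIGHTS = {"heading": 5.0, "body": 1.0}

Index = Dict[str, Dict[str, float]]  # term -> {section_id: weight}


def tokenize(text: str) -> List[str]:
    """Lowercase single-word tokens; phrases are out of scope for this sketch."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sections: List[Tuple[str, str, str]]) -> Index:
    """Build term -> section weights from (section_id, heading, body) triples."""
    index: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for section_id, heading, body in sections:
        for location, text in (("heading", heading), ("body", body)):
            for token in tokenize(text):
                index[token][section_id] += LOCATION_WEIGHTS[location]
    # Convert to plain dicts so later lookups never create empty entries.
    return {term: dict(postings) for term, postings in index.items()}


def lookup(index: Index, term: str) -> List[Tuple[str, float]]:
    """Return the sections containing term, strongest coverage first."""
    postings = index.get(term.lower(), {})
    return sorted(postings.items(), key=lambda kv: kv[1], reverse=True)


book = [
    ("ch1", "Getting started", "The BeagleBone is a small board."),
    ("ch4", "BeagleBone projects", "Wire the BeagleBone to a sensor. The BeagleBone reads it."),
]
index = build_index(book)
print(lookup(index, "BeagleBone"))  # [('ch4', 7.0), ('ch1', 1.0)]
```

Nobody had to hand-tag chapter 4 as the BeagleBone chapter. The heading match and the repeated body mentions put it first on their own, and the same pipeline runs unchanged on the next thousand titles.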

Does that sound far-fetched to you? I don't think so. In fact, I believe we'll see a service like this very soon. I know I'll get a lot of use out of it and I bet you will too.

Towards a better book recommendation service

The ideal content discovery service has yet to be invented. Plenty have tried but none have truly succeeded. The latest venture is BookScout from Random House. It’s a nifty Facebook app that uses your social graph to help you discover relevant content. As Laura Hazard Owen recently discovered, though, it’s far from perfect.

Reading Laura’s post reminded me of something a wise person told me last year: Just because I’m Facebook friends with you doesn’t mean we have the same reading interests. In fact, I’d be willing to bet my reading interests don’t map very well to any of my friends, real or virtual.


Goodreads New Recommendation Engine & Discoverability

Discoverability is one of the key issues that plagues the book and econtent world. The bad news is the situation is only going to get worse, particularly when you consider all the new publishing and self-publishing platforms that are vying for our attention. The good news is we're starting to see platforms like Goodreads that help you discover new titles that match your interests. Goodreads community manager Patrick Brown tells us all about their new recommendation engine and some of the complexities of the algorithm behind it. Watch the interview via this link or the embedded version below. Key points include:

  • Recommendation engines are complex -- The Goodreads engine has been in development for 6 years! In fact, the Goodreads algorithm benefited from the competition Netflix held to improve its own algorithm. [Discussed at 2:50]
  • The more you use it, the better the advice -- Goodreads obviously wants us all to engage with their service as much as possible. One benefit to doing so is that the recommendations served up will be more fine-tuned to your interests. [Discussed at 5:24]
  • Serendipity can be found further down the long tail -- Part of what makes the Goodreads recommendation engine so valuable is that they're not just recommending the latest bestseller on the topic. [Discussed at 6:40]
  • Categories are broad today, but... -- This initial release of the Goodreads recommendation engine uses large buckets (e.g., History, but not narrowed down to, say, WWII). Over time the granularity, and therefore, the value of this aspect of the service will improve. [Discussed at 13:45]

"Google-level Relevance, Facebook-level Social, & Apple-level Design"

Of all the items I read over the holidays, a blog post from Bradford Cross called Why the iPad is Destroying the Future of Journalism was by far the best. It's a must-read for everyone involved in any form of publishing. Here are some of my favorite excerpts (in italics), with a few comments of my own mixed in:

The iPad has been dubbed a revolutionary device and the journalism industry has raced to embrace it.  But their embrace is more of a desperate final grasp at the past.

Yes!  That's exactly why I've tried out at least 4 or 5 different magazines on it but have yet to subscribe to a single one.  Wired is my favorite example.  I spent $4.99 on the first iPad issue and never went back for more.  Even though they lowered the price, why should I buy the iPad version when I get the print one for $10 a year?  There's nothing new and exciting enough to get me to switch.

Publishers want to have their own branded channel - whether in their own app, or in some meta-app.  They are fighting back against syndicating their content on the web and they want you to come to their sites and pay.  Nobody gets their content from only one source; this is the Internet.  Nobody is going to pick their favorite newspaper or magazine and just stick to their app.

This is where we in the publishing industry need to think more about getting our content to where readers already are and not expect them to always come to us (or grab our latest app).

Nobody wants an app for each content source.  The parallels to RSS are striking.

This reminds me of a conversation I was part of at a recent conference.  One person in the session mentioned that he couldn't recall the last time he opened his RSS reader.  Another agreed, saying she felt too guilty seeing the new tally of unread items in it.  How true.  I also can't tell you the last time I looked at my RSS reader, but I certainly don't like the idea of individual apps for every type of content I'm interested in.

Since most non-direct traffic for news now is coming from search, Facebook turns out to be the largest subscription source of news content on the Internet.

Even though Bradford's post is all about the news industry, I believe there are parallels to book publishing here as well.  See this earlier post on publishing in the social world for more info.

The success of search, social, and design seem to indicate that the future of news products need Google-level relevance, Facebook-level social, and Apple-level design.

What a terrific way to state it. And unlike the famous words of Meat Loaf, two out of those three simply isn't good enough.

If journalism is going to rediscover a model that works, it has to figure out how to integrate with the social web.  What should I be able to do with that Economist article?  Should I be able to share it à la carte so I can discuss it with the people I want?  Should I be able to share it within my network?  Should I be able to share it publicly?

I think the answers should be yes, yes and yes. Btw, I read this article using the Instapaper app on my iPad. Doing so helped me realize there's social networking functionality that needs to be added to Instapaper as well -- see my latest iPadHound post for more info.

The Uber-Index

The Rich Content post I wrote back on March 29th keeps popping into my head. I think our industry has spent way too much time trying to force-fit video and other types of content in with the written word. Meanwhile, the real solution to rich content has probably been right here under our noses the whole time: the index. Actually, what I'm talking about should be called an "index on steroids" or an uber-index.

For years publishers have generated that backmatter element we've grown to know, love and rely on: the index. Index specialists are charged with finding all the critical terms, synonyms and other entries, then compiling them into one of the most important elements of the book. Up to now those indexes have been static and have focused almost exclusively on providing pointers within the book where the index appears. In tomorrow's ebook, the uber-index should grow as more related content becomes available on websites, blogs, other books, apps, etc.

Liza Daly expressed a similar vision in this excerpt from an iPad-related interview she did with The New York Times about a week after my "Rich Content" blog post:

I see the consummate iPad reading experience to be one that is, on the surface, traditional: heavily textual, quiet, hand-held. But lurking beneath the words is the whole Internet, ready to be questioned — “Find other works that quoted this,” “Where was the Marshalsea prison?”, “Which of my friends is also reading this?”, “What is that attractive person across from me reading?”

None of that requires a publisher to “enhance” the e-book prior to publication. A truly modern e-reader is one that is intimately connected to the Web and allows a user to make queries as a series of asides, while reading or after immersive reading has ended.

So what this all means is that authors and publishers could continue to build books the way they've done for hundreds of years, but a new effort needs to be dedicated to the index itself. Not the print index, of course, but the uber one that works within the e-reader.

Imagine an e-reader/app that lets you read a book in the traditional way but below the surface it offers smart links to all the related content and resources you could hope for.  As I mentioned in the 3/29 post, some of this could be automated but then it's little more than a set of algorithm-based search results.  I want something more and I'll bet you do too.

How about applying the wisdom of the masses to the problem?  Just as the Wikipedia provides encyclopedia-length entries on subjects far and wide, what if there were a community-based service that created nothing but the most relevant pointers to all the best content?

You're an expert in '70s music and you spend all your waking hours looking for the best sites, videos, interviews, etc., on the subject. Why not share your discoveries about Thin Lizzy and Mott The Hoople by adding to and helping curate the uber-index on these topics? The uber-index would then be made available to e-reader apps so that when someone clicks on Glenn Frey's name in Don Felder's (terrific!) book about The Eagles, Heaven & Hell, they'll immediately have access to a growing list of outside resources that confirm Felder's point that Frey was a complete jerk!

All of this functionality would be included, btw, with little to no work required by the publisher.  A utility would run the book's contents against the uber-index and generate all the relevant links.  You could do this when you buy the book or periodically as you're reading it, to make sure it's always up-to-date.
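Here's a rough sketch of what that utility might look like, under some loud assumptions. The hypothetical `uber_index` is a plain mapping from a term to a list of community-curated links, the example.org URLs are placeholders, and matching is simple whole-word, case-insensitive text search. A real service would also need to handle name variants and disambiguation.

```python
import re
from typing import Dict, List

# Hypothetical community-curated uber-index: term -> outside resources.
UberIndex = Dict[str, List[str]]


def link_book(text: str, uber_index: UberIndex) -> Dict[str, List[str]]:
    """Attach uber-index resources to every term that appears in the book."""
    links: Dict[str, List[str]] = {}
    for term, resources in uber_index.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            links[term] = list(resources)
    return links


uber_index = {
    "Glenn Frey": ["https://example.org/frey-interview"],
    "Thin Lizzy": ["https://example.org/thin-lizzy-live"],
}
book_text = "Glenn Frey walked into the studio."
print(link_book(book_text, uber_index))
# {'Glenn Frey': ['https://example.org/frey-interview']}
```

Because the book text never changes, keeping the links up to date is just a matter of running `link_book` again against the latest copy of the uber-index, whether at purchase time or periodically while you read.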

How about that?  An infinitely deep index, the uber-index, that dramatically enhances and extends the reading experience while preserving it at the same time.  Isn't that what we're all after?

P.S. -- Now take it a step further. Are you familiar with the "Sponsored Links" area of the Google search results? These are the links someone has paid to have included in your search results. Why not introduce a sponsored link section to this as well, where monetization can occur? So when you pull up the menu for Glenn Frey mentioned earlier, it also includes a paid link from Amazon where you can buy his latest CD, if you're so inclined. Click that link and the publisher and author get a cut of the sponsored link payment. If a substantial enough AdSense-like ecosystem builds up around this, it creates an additional revenue stream that could be shared by all parties.

Extending Google's Book Search Program

Google's Book Search program isn't exactly new, but how many times have you used it? When I'm searching for something I usually just start with Google's default web search. If I'm looking for a book (or the contents of a book), well, I go to a bookseller's site.

Although it's hard to beat an in-person "flip test" with a book, Amazon's Search Inside the Book feature offers perhaps the best online alternative. But that's just one vendor and I don't believe Amazon has opened it up as a service to other websites.  As of yesterday, Google is doing just that with their Book Search program -- here's the official announcement.

Whether or not this is significant depends on a couple of things including who adopts it and how flexible Google will be with the feature set.  The announcement already talks about a number of websites that have either already implemented this service or plan to shortly.  That's great news as it should enable each of those retailers to offer a Search Inside service like the one Amazon has enjoyed for many years now.

I'm more interested to see what non-retailers will do with this opportunity, including publishers. If this becomes a truly open system it could lend itself to all sorts of interesting implementations, beyond simple limited search access to 20% of the book. For example, what if publishers could create a subscription service that provides access to 100% of the book? That's where flexibility comes into play. As more websites implement this service, Google will receive more requests to enhance it.

In short, I love the idea and I'm anxious to see if it evolves into something much larger.

Cuil: Apparently "Bigger" Isn't What We Crave

I was anxious to try out this new Cuil search engine everyone's buzzing about. The management team is loaded with former Google-ites and they've promised to deliver "the world's biggest search engine," meaning all those sites Google ignores will now be included in Cuil search results. Further, content and relevance are king, which should provide a much more satisfying search experience.

To be honest, I don't have any beefs with Google.  I use it throughout the day and I generally find what I'm looking for in the top half of the first page of results.  Then again, I was happy with Lycos many years ago before shifting to Yahoo.  Then I abandoned Yahoo to jump on the Google bandwagon.  Although I've pretty much stuck with Google for the past several years you can see I have no search engine loyalty.  I'll use whatever suits my needs.

Btw, I've seen lots of people ask the question, "do we need another search engine?" My answer is, "it depends," but I'm not convinced the solution involves focus groups or building a business/tool around user feedback. That's how New Cokes are born. After all, was anyone really screaming for a better search engine in 1997-1998 when Google hit the scene? I'm pretty sure we were all happy with Yahoo, AltaVista, Excite and the others back then. It reminds me of that great quote from Henry Ford, who said, "If I had asked my customers what they wanted, they would have said a faster horse."

Well, Cuil may indeed be a faster (or at least bigger) horse than Google, but I'm not all that impressed with it.  The searches I experimented with produced results that were different from Google's but I still found Google's to be more useful and relevant.  Although it doesn't take much to change search engines I'd need a compelling reason to switch from Google; I'm not finding that with Cuil.

P.S. -- Searchme is probably the only search engine I've seen recently that's worthy of abandoning Google over.  No, it's not just the nifty user interface...I like the whole stacks metaphor they use and how stacks can be saved and sent to others.  Now that's something I never would have suggested as a search engine improvement but it really lends itself to some very interesting applications.

100 Sites for Answers

I use the Wikipedia. You use the Wikipedia. We all use the Wikipedia. It's great...there's no doubt about it. But what about all those other sites that offer loads of answers to your endless list of questions? Sure, Google is an excellent starting point and it's introduced me to quite a few answer-related sites, but which ones are the best? (See? I even have questions about questions!)

The nice folks at DistanceDegrees have given this some thought and offered up a handy post called Lose Your Wikipedia Crutch: 100 Places to Go for Good Answers Online. It's easy to get lost in all the great sites on the list. Fortunately for us, DistanceDegrees has broken them down by category, so maybe you should just focus on Encyclopedias today, the Science and Math section tomorrow, etc. :-)

What a great resource.  I've already bookmarked it for future reference.

Nicholas Carr Says Google is Making Us Stupid


Nicholas Carr is one of the most outspoken and opinionated authors you'll ever come across.  Several years ago he was the enemy of IT-types everywhere when he asked the question Does IT Matter?  More recently he wrote an excellent book I reviewed here called The Big Switch.  I enjoy his work and was delighted when a Wiley colleague left a Carr article on my desk entitled Is Google Making Us Stupid?  How could I resist reading it last night?!

Carr's premise is that Google is making us lazy by encouraging more online surfing at the shallowest of levels. Read a headline and move on. Scan an article but don't read it thoroughly. It's the sort of thing I'm guilty of 99% of the time I'm online. Actually, what Carr describes as a problem is exactly what Jeff Bezos calls "information snacking," an issue he hopes the Kindle will counterbalance.

I love some of the metaphors Carr uses in this article:

Once I was a scuba diver in the sea of words.  Now I zip along the surface like a guy on a JetSki.

...we risk turning into 'pancake people'--spread wide and thin as we connect with that vast network of information accessed by the mere touch of a button.

Maybe that's why I've had such a strong desire to curl up with a good book lately! On a related note, this all raises a couple of billion-dollar questions: Can we construct a new "book model" to address this? How do we evolve with these changing reading habits?