Your personal index of everything

Google is terrific but it doesn’t help me answer the question, “Where did I read about that?” I’m running into that question more frequently these days, partly because I’m reading so many short bursts of content from so many sources.

It’s not just website content. I’m talking about emails, e-magazines, e-newspapers, ebooks, etc. In short, I need help indexing all the digital content streams I’m consuming every day.

Over the past few years I used a service called Findings that did some of this, but it was an approach that required me to actively curate the articles and excerpts I wanted to preserve and share. I need something that automatically ingests and indexes everything I see, not just what I’ve highlighted. And it has to happen with no clicks, copying or curation required by me. Just index everything I see.

I want a tool that watches all the emails I read every day, keeps track of the content of the webpages I visit, has access to the magazines I read in Next Issue, sees all the content I consume in the Kindle app, the Byliner app, etc. When privacy advocates read that sentence their heads will explode. That’s OK. They can choose to live without this service but I’ll be more efficient because of it.

It’s obviously a concept that requires opt-in from the user. It also requires the ability to tap into content streams from proprietary apps, not just browsers. And it needs to follow me across all my devices. Whoever develops it will help solve a problem that’s only going to get worse in the years ahead. I hope they do so soon because I’d love to start using it today to build my index of everything for tomorrow.


The sorry state of ebook search results

Why is Google so popular and how does it quickly help you find what you’re looking for? It’s all about their algorithm. Google uses a variety of metrics, including how many inbound links a site has, to determine what’s in their search results and how those results are presented.

Imagine Google without their algorithm. Rather than using all these metrics to figure out which site is most relevant, they just give you a list of sites that happen to contain your search phrase. Pretty worthless, right? So why do we accept that same, lame functionality in ebooks today?

Let’s look at an example. I remember reading Jean Edward Smith’s terrific FDR biography a while back and wanting to re-read the details about Hyde Park. The author had provided information about the location earlier in the book and I wanted to find that specific part. Here’s what I got in the Kindle in-book search results:

[Image: Kindle in-book search results for “Hyde Park”]

What an awful user experience. I get every instance of the phrase, listed in the order of appearance in the book. There’s absolutely no indication of how in-depth the coverage of Hyde Park is in any section; I’m left to figure that out on my own.

Now take a look at these search results:

[Image: Olive search results]

There I searched for the phrase “BeagleBone” in a technology book and each result has a score associated with it; the results with higher scores offer more in-depth coverage of the topic.

How did I produce those results? They came from an ebook reading platform that does much more than simply reproduce “print under glass.” The content comes into the system as a simple, text-embedded PDF. It’s then analyzed and converted to render in a browser-based reading engine. No third-party apps or plug-ins are required.

The magic is in the content ingestion process. This platform knows when a phrase appears in a first-level heading vs. a second-level heading as well as how many times it appears on that page or in that section. In short, it applies technology to produce a far superior set of search results. 
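To make the idea concrete, here is a minimal sketch of heading-aware scoring. It is not the platform's actual algorithm; the `Section` type, the weights and the sample text are all assumptions made up for illustration. It only shows how a match in a first-level heading can count for more than a match in a second-level heading or in body text, and how the number of occurrences in a section feeds the score.

```python
from dataclasses import dataclass

# Hypothetical weights (assumptions, not taken from any real product):
# a match in a heading counts for more than a match in body text.
HEADING_WEIGHTS = {1: 10.0, 2: 5.0}
DEFAULT_HEADING_WEIGHT = 2.0
BODY_WEIGHT = 1.0


@dataclass
class Section:
    heading: str
    heading_level: int  # 1 = first-level heading, 2 = second-level, ...
    body: str


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of phrase."""
    return text.lower().count(phrase.lower())


def score_section(section: Section, phrase: str) -> float:
    """Weight heading matches by heading level, then add body matches."""
    heading_hits = count_phrase(section.heading, phrase)
    body_hits = count_phrase(section.body, phrase)
    weight = HEADING_WEIGHTS.get(section.heading_level, DEFAULT_HEADING_WEIGHT)
    return heading_hits * weight + body_hits * BODY_WEIGHT


def ranked_search(sections, phrase):
    """Return (score, section) pairs with a match, highest score first."""
    scored = [(score_section(s, phrase), s) for s in sections]
    hits = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so sections with equal scores stay in book order.
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


# Invented sample content, in the order it would appear in the book.
sections = [
    Section("Introduction", 1, "Boards such as the BeagleBone are popular."),
    Section(
        "Getting Started with BeagleBone",
        1,
        "The BeagleBone boots from a microSD card. "
        "Connect the BeagleBone over USB.",
    ),
    Section("Pin layout", 2, "Each BeagleBone header has 46 pins."),
]

results = ranked_search(sections, "beaglebone")
for score, section in results:
    print(f"{score:5.1f}  {section.heading}")
```

The section that is actually about the topic rises to the top, while passing mentions fall to the bottom instead of being listed first simply because they appear earlier in the book.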

When will we see this type of functionality in any of the popular ebook reading apps? I’m not holding my breath. The leading vendors apparently don’t see a need to bring their search capabilities out of the dark ages.

If you’d like to learn more about this platform you’ll find summary information here. You’ll also notice it’s the ebook platform solution offered by my employer, Olive Software, Inc. I may not be a book publisher anymore but I’m thrilled to be part of an organization that’s helping lead the industry forward. Relevance-ranked search is just one of the cool innovations that sets us apart. Let me know if you’d like to learn more.


In search of a better search

Imagine Google's search results with no sophisticated algorithm behind them. Rather, when you type in your search phrase and press Enter, Google simply shows you a list of websites where that phrase can be found. No indication of relevance. No ranking mechanism. It's just a list of the sites that contain the phrase. Maybe the list is arranged in chronological order, where the older sites containing your phrase appear first.

Pretty worthless, right? So why do we accept that as our search solution in every major ebook reader app? Open an ebook, search for a phrase and the results merely list each occurrence of it, arranged from page 1 through the end of the book.

You might think a sophisticated search would only be useful for specialty products like textbooks and other reference materials. I disagree. I could see something like this being quite useful in novels, for example. Let's say you forgot who a minor character is and you'd like to quickly learn more about them. Sure, you could do a simple text search and see where the character first appears, but wouldn't you prefer results that provide some context? Maybe it's really the fourth occurrence of that character's name where you'll get the real details on their role in the story. That wouldn't be easy to figure out with today's ebook search capabilities.

And yes, I'm aware of at least two specialty ebook platforms that offer better search results. That's because they have editors who spend hours and hours parsing the content to build this feature manually. I have one word for them: scale. Their solution simply doesn't scale...more on that in a moment.

Here's another use case for a smarter ebook search feature: catalog-wide search. I challenge you to go to Amazon or a publisher's website and use their site search feature to tell you which book has the most in-depth coverage of a particular topic. Let's say you're looking for a book about creating websites. You want one that provides thorough coverage of HTML but also offers a solid introduction to JavaScript. You can't use site search on Amazon or a publisher's site to figure this out. You'll have to look at each book's table of contents and determine the answer yourself, one book at a time. A better solution is one where the results show you exactly how deep the JavaScript coverage is in each book, arranged so that the books with the most in-depth coverage appear first.
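Here is a rough sketch of what that catalog-wide ranking could look like. Everything in it is an assumption for illustration: the catalog is a made-up dictionary of book titles and section texts, and "depth" is simply the number of sections that mention a topic, which is far cruder than a real service would need.

```python
def sections_covering(book_sections, topic):
    """Count the sections of one book that mention the topic at least once."""
    topic = topic.lower()
    return sum(1 for text in book_sections if topic in text.lower())


def rank_catalog(catalog, primary, secondary, min_secondary=1):
    """Rank books by depth of primary-topic coverage.

    Books must also reach a minimum depth on the secondary topic.
    Returns (title, primary_depth, secondary_depth), deepest coverage first.
    """
    rows = []
    for title, book_sections in catalog.items():
        primary_depth = sections_covering(book_sections, primary)
        secondary_depth = sections_covering(book_sections, secondary)
        if primary_depth > 0 and secondary_depth >= min_secondary:
            rows.append((title, primary_depth, secondary_depth))
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Invented catalog: each book is reduced to a list of section texts.
catalog = {
    "Web Basics": [
        "HTML elements",
        "HTML forms",
        "HTML tables",
        "A first look at JavaScript",
    ],
    "Scripting the Browser": [
        "JavaScript syntax",
        "JavaScript events",
        "Embedding scripts in HTML",
    ],
    "Pure Markup": ["HTML elements", "HTML attributes"],
}

ranking = rank_catalog(catalog, "HTML", "JavaScript")
for title, html_depth, js_depth in ranking:
    print(f"{title}: HTML sections={html_depth}, JavaScript sections={js_depth}")
```

The book with plenty of HTML and at least some JavaScript comes first, and the book with no JavaScript coverage drops out entirely. No editor had to read a single table of contents.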

Now back to the scale problem... The key here is to enable these richer search results without requiring a bunch of manual labor. Book publishers are trying to reduce staff and cut costs, not add more of either. So the only way to deliver this service is through a software solution where the content is analyzed and a rich, context-sensitive index is created.

Does that sound far-fetched to you? I don't think so. In fact, I believe we'll see a service like this very soon. I know I'll get a lot of use out of it and I bet you will too.


Towards a better book recommendation service

The ideal content discovery service has yet to be invented. Plenty have tried but none have truly succeeded. The latest venture is BookScout from Random House. It’s a nifty Facebook app that uses your social graph to help you discover relevant content. As Laura Hazard Owen recently discovered, though, it’s far from perfect.

Reading Laura’s post reminded me of something a wise person told me last year: Just because I’m Facebook friends with you doesn’t mean we have the same reading interests. In fact, I’d be willing to bet my reading interests don’t map very well to any of my friends, real or virtual.


Goodreads New Recommendation Engine & Discoverability

Discoverability is one of the key issues that plague the book and econtent world. The bad news is the situation is only going to get worse, particularly when you consider all the new publishing and self-publishing platforms that are vying for our attention. The good news is we're starting to see platforms like Goodreads help you discover new titles that match your interests. Goodreads community manager Patrick Brown tells us all about their new recommendation engine and some of the complexities of the algorithm behind it. Watch the interview via this link or the embedded version below. Key points include:

  • Recommendation engines are complex -- The Goodreads engine has been in development for 6 years! In fact, the Goodreads algorithm benefited from the competition Netflix ran to improve its own algorithm. [Discussed at 2:50]
  • The more you use it, the better the advice -- Goodreads obviously wants us all to engage with their service as much as possible. One benefit of doing so is that the recommendations served up will be more fine-tuned to your interests. [Discussed at 5:24]
  • Serendipity can be found further down the long tail -- Part of what makes the Goodreads recommendation engine so valuable is that they're not just recommending the latest bestseller on the topic. [Discussed at 6:40]
  • Categories are broad today, but... -- This initial release of the Goodreads recommendation engine uses large buckets (e.g., History, but not narrowed down to, say, WWII). Over time the granularity, and therefore the value, of this aspect of the service will improve. [Discussed at 13:45]