Here’s how indexing could evolve with ebooks

Last month I shared some thoughts about how indexes seem to be a thing of the past, at least when it comes to ebooks. I’ve given more consideration to the topic and would like to offer a possible vision for the future.

Long ago I learned the value an exceptional indexer can bring to a project. For example, there’s a huge difference between simply capturing all the keywords in a book and producing an index that’s richly filled with synonyms, cross-references and related topics. And while we may never be able to completely duplicate the human element in a computer-generated index I’d like to think value can be added via automated text analysis, algorithms and all the resulting tags.

Perhaps it’s time to think differently about indexes in ebooks. As I mentioned in that earlier article, I’m focused exclusively on non-fiction here. Rather than a static compilation of entries in the book I’m currently reading, I want something that’s more akin to a dynamic Google search.

Let me tap a phrase on my screen and, yes, show me the other occurrences of that phrase in this book, but let’s also make sure those results can be sorted by relevance, not just by their order of appearance in the book. Why do the results have to be limited to the book I’m reading though? Maybe that author or publisher has a few other titles on that topic or closely related topics. Those references and excerpts should be accessible via this pop-up e-index as well. If I own those books I’m able to jump directly to the pages within them; if not, these entries serve as a discovery and marketing vehicle, encouraging me to purchase the other titles.

This approach lends itself to an automated process. Once the logic is established, a high-speed parsing tool would analyze the content and create the initial entries across all books. The tool would be built into the ebook reader application, tracking the phrases that are most commonly searched for and perhaps refining the results over time based on which entries get the most click-throughs. Sounds a lot like one of the basic attributes of web search results, right?
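To make the click-through idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class name `ClickFeedbackRanker`, the 50/50 blend of base score and click rate, and the smoothing are all hypothetical, not a description of any existing reader app.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Blend an entry's base relevance with how often readers tap it (hypothetical design)."""

    def __init__(self, click_weight=0.5):
        self.click_weight = click_weight      # assumed blend factor, 0..1
        self.impressions = defaultdict(int)   # (phrase, entry) -> times shown
        self.clicks = defaultdict(int)        # (phrase, entry) -> times tapped

    def record(self, phrase, entry_id, clicked):
        key = (phrase, entry_id)
        self.impressions[key] += 1
        if clicked:
            self.clicks[key] += 1

    def click_rate(self, phrase, entry_id):
        key = (phrase, entry_id)
        # Add-one smoothing so entries nobody has seen yet start at 0.5, not 0
        return (self.clicks[key] + 1) / (self.impressions[key] + 2)

    def rank(self, phrase, base_scores):
        """base_scores maps entry id -> relevance in 0..1 from the parsing tool."""
        def blended(item):
            entry_id, base = item
            rate = self.click_rate(phrase, entry_id)
            return (1 - self.click_weight) * base + self.click_weight * rate

        ordered = sorted(base_scores.items(), key=blended, reverse=True)
        return [entry_id for entry_id, _ in ordered]
```

An entry that starts out ranked second can move to the top once readers consistently tap it, which is the refinement loop described above.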

Note that this could all be done without a traditional index. However, I also see where a human-generated index could serve as an additional input, providing an even richer experience.

How about leveraging the collective wisdom of the community as well? Provide a basic e-index as a foundation but let anyone contribute their own thoughts and additions to it. Don’t force the crowdsourced results on all readers. Rather, let each consumer decide which other members of the community add the most value and filter out all the others.

This gets back to a point I’ve made a number of times before. We’re stuck consuming dumb content on smart devices. As long as we keep looking at ebooks through a print book lens we’ll never fully experience all the potential a digital book has to offer.


The lost art of indexes in ebooks

When was the last time you used an index in an ebook? Maybe the better question is this: Have you ever used an index in an ebook? One of the challenges here is that most ebooks don’t have indexes, the result of the misguided notion that text search is a better solution.

Every so often I come across an ebook with an index. More often than not it’s just the print index at the end of the book, sometimes with nothing more than the physical page references that offer almost no value in a reflowable e-format.

Fiction represents a large chunk of ebook sales and those books generally don’t benefit from an index. The same is true for some types of non-fiction books. But for pure reference guides, in-depth how-to’s and other works, an index can be pretty useful.

If you’re relying exclusively on text search in an ebook you have to know exactly what you’re looking for. More importantly, why do we settle for such a lame text search solution when we’re spoiled every day with powerful, relevance-ranked search tools like Google?

When you search for a phrase in an ebook the results are shown in chronological order. You see all the occurrences from the beginning of the book to the end. Imagine if Google worked that way. So when you type in a phrase Google tells you the first (oldest) site to use that phrase, then the next oldest site that used it, etc. Users would laugh and reject it, yet that’s exactly what we’re forced to accept in ebook search.

What I really want is relevance-based results. Show me the location in the book with the highest density of that phrase and prioritize occurrences of it in a heading over occurrences in body text. I’m sure there are other attributes that could be rolled into an effective ebook search algorithm but I’ll take just those two features for starters.
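Those two features are simple enough to sketch in a few lines of Python. This is a rough illustration rather than anyone's actual algorithm: the `Section` structure, the "hits per 100 words" density measure and the `heading_boost` value of 5.0 are all assumptions I picked for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    heading: str
    body: str


def score_section(section, phrase, heading_boost=5.0):
    """Score = phrase density in the body plus a bonus for each heading hit."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    word_count = len(section.body.split()) or 1
    body_hits = len(pattern.findall(section.body))
    heading_hits = len(pattern.findall(section.heading))
    density = body_hits / word_count * 100  # hits per 100 words
    return density + heading_boost * heading_hits


def search(sections, phrase):
    """Return (section number, score) pairs, most relevant first."""
    scored = [(score_section(s, phrase), i) for i, s in enumerate(sections)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, round(score, 2)) for score, i in scored if score > 0]
```

Sections that never mention the phrase drop out entirely, and a section whose heading contains the phrase outranks one that only mentions it in passing.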

The other problem with relying on search instead of an index is that you lose the benefit of synonyms and related terms. An indexer takes all that into consideration so you’re much more likely to find everything you’re looking for with a good index than a simple text search.
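Here is a small Python sketch of the difference. The sample passages and the synonym table are made up for illustration, and the function names `build_index` and `lookup` are hypothetical; in practice the synonym list is exactly the kind of thing a human indexer would supply.

```python
from collections import defaultdict


def build_index(passages, synonyms):
    """passages: {location: text}; synonyms: {index term: [alternate wordings]}."""
    index = defaultdict(set)
    for location, text in passages.items():
        lowered = text.lower()
        for term, alternates in synonyms.items():
            # A passage is filed under the term if it uses the term or any alternate
            if any(word.lower() in lowered for word in [term, *alternates]):
                index[term].add(location)
    return index


def lookup(index, synonyms, query):
    """Resolve the query to its index term, then return every filed location."""
    q = query.lower()
    for term, alternates in synonyms.items():
        if q == term.lower() or q in (a.lower() for a in alternates):
            return sorted(index.get(term, ()))
    return []


# Illustrative sample data only
passages = {
    "ch2": "The president contracted polio in 1921.",
    "ch5": "His infantile paralysis shaped his outlook.",
    "ch9": "The election of 1932.",
}
synonyms = {"polio": ["infantile paralysis", "poliomyelitis"]}
index = build_index(passages, synonyms)
```

A plain text search for "polio" finds only the first passage, while `lookup(index, synonyms, "polio")` also returns the one that uses the older wording.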

I’m not lobbying for back-of-book indexes in ebooks like they appear in print books. That’s another aspect that needs to change when you go digital. I want to see index functionality right there on the page I’m reading. The trick here is to offer it in a manner that’s not disruptive for the reader.

Remember that article I wrote a few weeks ago with the video showing a vision for auto-enriched ebooks? The same UI approach described there could be used here. The content is initially presented in as clean a manner as ebooks are today. But when you tap the screen on your tablet all the phrases that are indexed magically change color or are denoted with some other UI effect (e.g., underline). Just tap the phrase you’re interested in and a pop-up appears with relevance-ranked index results. These would be presented in a scrollable list with each entry having a preview of the text from that location in the ebook. Make it easy for me to bookmark those entries right in the pop-up. The net result is a way to quickly and easily access a smarter index without having to leave your current location.

This feature doesn’t exist today because we’re still stuck in the print-under-glass era of ebooks. I’m optimistic that one or two of the popular reading applications will eventually add such a capability though and help us get beyond today’s model where we’re consuming so much dumb content on all these smart devices.


Why I’m not on the Amazon Echo bandwagon…yet

I almost bought an Amazon Echo last November. It was on sale for $129 and I figured it was too good a deal to pass up. Amazon promised two-day Prime delivery but they got overwhelmed by all the orders and, like many others, they botched mine and said I might receive it by end of year. At that point I decided it wasn’t meant to be so I cancelled and I’m glad I did.

I already have a couple of other terrific Bluetooth speakers and while the Alexa voice control feature is nice, I’m not convinced it’s worth $100+. It reminds me of dedicated GPS devices and fitness bracelets, both of which have been replaced by sensors in my phone.

Echo is more of a nice-to-have, not need-to-have, item for me, especially with its ability to turn news and other types of written content into streamable audio content. But I’m much more interested in a mobile solution, not one that sits on a countertop.

Like GPS and fitness devices, Echo’s main functionality will also eventually find its way into the phone itself. The reason I prefer a mobile solution is that I spend a lot of time in my car where I use the Bluetooth feature of my radio and phone to listen to podcasts, music, etc.

The Echo platform becomes very attractive to me when it’s nothing more than an app on my phone that plays through my car radio. The app handles all the speech command conversion via the cellular connection, the same way the streaming content arrives.

This app doesn’t have to be free, btw. Charge me $5/month or something close to that and I’ll gladly pay for the option to “play news” and other commands in my car.

Where this really gets fascinating is with longer-form content and the ability to use voice commands to annotate and highlight audio books, for example. Whether it’s in my car or at home, it would be nice to finally have the ability to do more than just listen to an audio book. For example, when I hear a noteworthy passage, I’d like to be able to say “pause”, “highlight last two sentences”, “add private note to highlight saying ‘this is something I should pass along to the marketing team’”, etc.

Take it a step further and integrate my email app so that rather than just making that verbal note to pass along to marketing, let me say, “create email to Joe Smith at company.com, subject ‘key discovery’, body is highlight, send.”

Let’s say you’re listening to that book and you hear a phrase, person or location you’re not familiar with. The app should let you say, “pause”, “tell me about phrase/person/location” and then respond with the appropriate audio stream (e.g., top Google search result, Wikipedia entry, etc.).

All my audio highlights and annotations must be searchable, by voice as well as text. In fact, let’s add the capability to integrate all these highlights and notes into Evernote so I can keep everything in one place.

Amazon might be happy selling $100+ voice-controlled Bluetooth speakers today but the real opportunity is with a fully mobile, app-driven solution that integrates with a broader number of content sources and streams. We’re not there yet but by combining voice control and streaming audio the Amazon Echo platform is starting to show us what’s possible down the road.


Maximizing mobile micro-moments

Google recently published a document entitled Micro-Moments: Your Guide to Winning the Shift to Mobile. You can download the PDF here. It’s a quick read and worth a close look.

I’ve long felt the publishing industry is too focused on simply delivering the print experience on digital devices, something often referred to as “print under glass.” That strategy has created new revenue streams over the past 10 years but it’s not the end game. Mobile represents opportunities for new methods of engagement and discovery; that’s precisely what Google’s document outlines with plenty of interesting stats.

For example, the document notes that “we check our phones 150 times a day” and then reminds us that each session is barely a minute long. That might be the average length but I’ll bet the median is even shorter. How often do you pull your phone out for only a quick, 10-20 second peek at your email inbox or news? That’s probably my typical session length and based on what I see around me I’m confident it’s the case for plenty of others as well.

So what about that oft-used scenario of pulling the phone out to read an ebook while standing in line at the grocery store? That’s clearly something publishers fantasize about but consumers rarely, if ever, do. It’s more info snacking and short, bite-sized pieces of content that are consumed in most of these mobile sessions.

That trend isn’t changing anytime soon. As the Google doc states, in the past year mobile sessions have increased 20% while session time has decreased 18%. We’re shifting from longer desktop sessions to shorter mobile sessions.

Google asks this very important question: How does your brand perform on keyword searches that are vital to your business? Don’t just focus on search results ranking, btw. You may appear at the top but does the resulting link take a visitor to a terrific mobile experience? Responsive design is part of that but the more important point is that the destination page is constructed with content or a call-to-action perfectly designed for those 10-20 second mobile session bursts.

What does a great, mobile-optimized destination page look like? For one thing, it’s probably a single screen requiring no scrolling on even the smallest of phones. If you can’t deliver on that promise you need to focus on giving the visitor a reason to provide their email address for more details. Again, everything should be designed for an extremely short user session.

On page 8 Google says that video how-to searches are still on an extremely steep growth trajectory. They’re up 70% year-over-year and far from plateauing. Your business is probably built around written content, but if you’re in the how-to space you’ve got to think about how to remain relevant as more solutions are discovered via mobile searches and delivered in video, not written, format.

Take a few minutes to read and highlight elements of Google’s report. There’s a lot of terrific information here and I guarantee it will both inspire you as well as force you to think about the importance of reframing your brand around mobile. There’s so much here, in fact, that I want to revisit the document in next week’s article. So stay tuned for part two where I’ll highlight several other important points as well as share a use-case for how mobile can complement, not replace, print.


Why publishers should embrace the evolution of “fair use”

When I meet with publishers I always ask them about the biggest problems they face in today’s market. One of the most popular answers is “discoverability.” Most publishers fret about getting lost in a sea of other books and promotional campaigns.

Life seemed much easier in the brick-and-mortar days. A publisher simply paid a retailer for premium placement, resulting in endcap promotions and books stacked in high-traffic areas of the store. Those options still exist, of course, but they’re less important now that one retailer dominates distribution and discovery.

That’s why I’m scratching my head about all the negative publisher and author reaction to the recent federal appeals court ruling on Google Books. If you’re not familiar with Google Books, it’s an extension of the search engine enabling discovery and sampling of digitized books. Many of those books are still protected by copyright, hence the delicate nature of the case.

If you’ve never explored Google Books you need to take a closer look before forming an opinion on the ruling. Here’s a quick search for “FDR” on Google Books. The first book link points to my favorite FDR biography, by Jean Edward Smith. Click that link and the first thing you’ll see is a frame with scanned pages from the book. Scroll down a bit and the following note is displayed:

This is a preview. The total pages displayed will be limited.

Every so often you’ll see more notes like this one:

Pages 2 to 9 are not shown in this preview.

In other words, what you’re seeing are merely snippets of the book. There’s no way you can read the entire work inside Google Books. What you can do, however, is search and discover more book content than you’ve ever been able to before.

For example, on page 10 of the FDR book I noticed the phrase “Richard Crowninshield of Boston”. Let’s say you’re a researcher working on a project about Mr. Crowninshield. A colleague said they read something about him in a book but they don’t recall which one. You need that source because you want to buy the book to understand the context and your research requires more than just a page or two where the reference was made.

I challenge you to find the book by searching that phrase on Amazon. I just did that and here are the search results. Smith’s book is nowhere to be found.

Now search the same phrase in Google Books and here’s what you get. One click takes me directly to that page and the left side of the screen tells me Smith’s book is what I need. Notice that Google also includes links to buy the book as well, in print or ebook format.

Publishers, wake up and realize that the largest search engine on the planet offers a powerful way for your content to be discovered and purchased. Rather than getting all litigious about this, why not embrace it and find a way to fully leverage it?

The simple truth is that as technology evolves, the notion of “fair use” is also evolving. I think this is a very good thing, and not just for Google. History is littered with marketplace incumbents who crashed and burned as they tried to protect yesterday’s model. Tomorrow’s publishing leaders will be the ones who take advantage of services like Google Books, not those trying to make it go away in a courtroom. 


Here’s how search will evolve and become more powerful

You’re probably pretty happy with Google search today, right? It’s incredibly fast, extremely reliable and almost always delivers the desired results. What more could you ask for?

I think the problem with today’s search solutions is that we’ve limited them to what’s online. If the content has a web address and it’s been crawled by the major engines it’s properly analyzed and presented in search results.

But what about everything else? Once again, Evernote is a terrific example of what could be.

I’m a huge Evernote fan and I’ve configured it so that all my notes are exposed and retrievable in a Google search. Alongside the standard web, news, maps, images, etc., search results categories, Google also shows a frame with Evernote’s Web Clipper results. Simply put, a single Google search produces results from the web as well as my Evernote archive. Simple, yet powerful.

Why does it have to stop with the web and Evernote? Why can’t one search be configured to retrieve results from all my content streams?

Let’s start with the documents on my computer and in the cloud. They’re mostly Office files, so a search needs to understand the structure of Word, Excel and PowerPoint documents. I’m not talking about simply searching file names; this search functionality needs to know whether the phrase is buried in the document itself.

Don’t forget about Outlook and all the other email applications. Search needs to sift through everything in my inbox, folders and attachments.

How about all the digital books, newspapers and magazines I read or scan every week? My search tool needs to capture, index and report back on all that activity as well. I sometimes rate articles and books I read, so the search algorithm needs to understand those ratings and factor them in, pushing higher-rated results towards the top.

Let’s also not forget about websites I’ve visited. This search tool should understand which sites I frequently visit and which pages I’ve spent more time on, reflecting the fact that I’m reading rather than scanning. This too is critical information for the search algorithm.

Next, it needs to understand my social graph and factor that into the search results. I’m much more active on Twitter than Facebook, for example, so what are the most recent relevant tweets that belong in my search results?

I realize this starts to clutter the results page. That’s why it all has to be configurable by the user. Clicking on/off checkboxes in a list should allow me to show or hide the various sources in search results. 
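As a rough Python sketch of how the pieces above could fit together, the snippet below searches across several content streams, boosts items I've rated highly and honors the on/off checkboxes. Everything here is an assumption for illustration: the `Item` fields, the source names and the `rating_weight` of 0.5 are hypothetical, and real connectors for Office files, email and reading apps would be far more involved.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str      # e.g. "email", "ebook", "web", "document"
    title: str
    text: str
    rating: int = 0  # my own 0-5 rating; 0 means unrated


def unified_search(items, query, enabled_sources, rating_weight=0.5):
    """Search every enabled source with one query; higher-rated items rise."""
    q = query.lower()
    results = []
    for item in items:
        if item.source not in enabled_sources:
            continue  # this source's checkbox is switched off
        hits = item.text.lower().count(q) + item.title.lower().count(q)
        if hits == 0:
            continue
        score = hits + rating_weight * item.rating
        results.append((score, item))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [item.title for _, item in results]
```

Unchecking a source simply removes its name from `enabled_sources`, so the same query returns a shorter, less cluttered list.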

I’m able to search each of these sources individually today, of course, but there’s no uber-search tool allowing me to consolidate and search across all sources with one query.

Finally, and here’s where it gets even more interesting, I want the ability to curate and share my search results. Today you can do this by sharing the URL from the results page; for example, here’s a Google search for my employer, Olive Software. That’s a start, but now I want to insert links to other sources, including all the ones noted above (e.g., documents, emails, ebooks, etc.).

Yes, there are countless sharing, opt-in, privacy and copyright issues to navigate before this vision becomes a reality. But imagine how powerful the results will be when these capabilities become standard features in every search engine.


Why Amazon Firefly is important

At any given point in time it’s easy to assume that search engines have evolved as much as they’re ever going to. Sometimes it’s hard to avoid falling into the logic that was allegedly uttered long ago by Charles Duell: “Everything that can be invented has been invented.”

Putting the gimmicky eye candy called “Dynamic Perspective” aside for a moment, there’s another element to Amazon’s recently-announced Fire phone that everyone in the content industry needs to focus on: Firefly.

On the surface, Firefly also feels like a Fire phone gimmick. In reality, it’s a next generation search platform and likely to be the first significant Google challenger. I’m not suggesting Google will disappear or feel the pain anytime soon, but Firefly will force them to evolve.

Firefly lets you snap pictures of objects so you can buy them from Amazon. It’s the next step in showrooming, the process brick-and-mortar retailers loathe. Publishers need to look beyond Firefly’s ability to enable one-click purchase of a physical book sitting on a table. Rather, publishers need to consider how Firefly will eventually enable the discovery and consumption of all types of digital content as well.

Let’s say you’re at the ballpark watching the Pittsburgh Pirates play. You snap a picture of the beautiful city skyline, looking out from behind home plate in PNC Park. You’re curious to learn more about the park, the team or maybe even the city itself.

Instead of clicking the camera button, click the Firefly button on your Fire phone. Rather than just getting a photo you might not ever look at again, your screen is filled with search results. These aren’t just the website links you get from Google though. You’re looking at all sorts of free and paid content you can consume now or later.

All the usual suspects are included here. You’ll see links to books about the team, park and city. But you’ll also have an opportunity to buy the program, print or digital, from today’s game. And maybe there’s a link to purchase a digital edition of today’s local paper or just portions of it (e.g., the sports section, just those articles covering today’s game, etc.) The results could also include articles about the team/park/city, accessible via either a trial subscription or maybe they’ll ultimately be free thanks to the ever-expanding reach of Amazon Prime. 

Don’t forget that all these results won’t just appear in random order. Amazon will develop a search algorithm as sophisticated as Google’s, but with the benefit of all Amazon’s “customers who viewed x also viewed y” data and capabilities.

Most importantly, don’t forget the power of paid placement in these results. Amazon has generated plenty of revenue from publishers for placement and promotional campaigns. Firefly will open the door to an enormous number of new ways Amazon can charge publishers for premium placement in those Firefly search results.

I haven’t forgotten that you’re sitting at a baseball game and the last thing you want to do is flip through search results and spend time reading content on your phone. That leads me to another model I suspect we’ll see from the Firefly search platform: save for later.

Web searches today focus exclusively on the here and now. You search, find what you need and you move on. Firefly opens the door to a lengthier relationship between user and search results.  You can’t be bothered with all the Firefly details when you’re trying to watch the baseball game. That’s why you’ve configured Firefly to save those results for later retrieval. They could sit in a holding area in your Amazon account, similar to your Amazon Wish List, or maybe they’ll be delivered to you via email. The more likely scenario is that Amazon will do both, of course. Amazon knows the value of data and reminding customers of what they like, so expect to see plenty of notifications about these potential one-click purchase opportunities.

None of this functionality exists today, of course. And most of it won’t be available when the Fire phone ships in July. But rest assured that these and plenty of other innovations will eventually be available through the Firefly feature. Amazon’s #1 goal is to get consumers to buy things and Firefly is a huge step forward in making those transactions happen more frequently and conveniently.  


Your personal index of everything

Google is terrific but it doesn’t help me answer the question, “where did I read about that?”. I’m running into that question more frequently these days, partly because I’m reading so many short bursts of content from so many sources.

It’s not just website content. I’m talking about emails, e-magazines, e-newspapers, ebooks, etc. In short, I need help indexing all the digital content streams I’m consuming every day.

Over the past few years I used a service called Findings that did some of this, but it was an approach that required me to actively curate the articles and excerpts I wanted to preserve and share. I need something that automatically ingests and indexes everything I see, not just what I’ve highlighted. And it has to happen with no clicks, copying or curation required by me. Just index everything I see.

I want a tool that watches all the emails I read every day, keeps track of the content of the webpages I visit, has access to the magazines I read in Next Issue, sees all the content I consume in the Kindle app, the Byliner app, etc. When privacy advocates read that sentence their heads will explode. That’s OK. They can choose to live without this service but I’ll be more efficient because of it.

It’s obviously a concept that requires opt-in from the user. It also requires the ability to tap into content streams from proprietary apps, not just browsers. And it needs to follow me across all my devices. Whoever develops it will help solve a problem that’s only going to get worse in the years ahead. I hope they do so soon because I’d love to start using it today to build my index of everything for tomorrow.
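At its core, the "index of everything" is an inverted index fed by every stream I read. Here's a bare-bones Python sketch of that idea. The `PersonalIndex` class and its method names are hypothetical, and the hard part, getting text out of proprietary apps with the user's opt-in, is assumed away entirely.

```python
import re
from collections import defaultdict


class PersonalIndex:
    """Toy inverted index over everything I've read (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of items containing it
        self.docs = {}                    # item id -> (source, title)

    def ingest(self, doc_id, source, title, text):
        """Called automatically for each item read; no curation required."""
        self.docs[doc_id] = (source, title)
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            self.postings[word].add(doc_id)

    def where_did_i_read(self, query):
        """Return (source, title) for every item containing all query words."""
        words = re.findall(r"[a-z0-9']+", query.lower())
        if not words:
            return []
        matches = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(self.docs[d] for d in matches)
```

Because each entry records its source, the answer to "where did I read about that?" comes back as the app or stream plus the title, not just a snippet of text.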


The sorry state of ebook search results

Why is Google so popular and how does it quickly help you find what you’re looking for? It’s all about their algorithm. Google uses a variety of metrics, including how many inbound links a site has, to determine what’s in their search results and how those results are presented.

Imagine Google without their algorithm. Rather than using all these metrics to figure out which site is most relevant, they just give you a list of sites that happen to contain your search phrase. Pretty worthless, right? So why do we accept that same, lame functionality in ebooks today?

Let’s look at an example. I remember reading Jean Edward Smith’s terrific FDR biography a while back and wanting to go back to re-read the details about Hyde Park. The author provided information about the location earlier and I wanted to find that specific part of the book. Here’s what I got in the Kindle in-book search results:

[Screenshot: Kindle in-book search results for “Hyde Park”]

What an awful user experience. I get every instance of the phrase, listed in the order of appearance in the book. There’s absolutely no indication of how in-depth the coverage of Hyde Park is in any section; I’m left to figure that out on my own.

Now take a look at these search results:

[Screenshot: relevance-scored search results for “BeagleBone”]

There I searched for the phrase “BeagleBone” in a technology book and each of the results has a score associated with it; the results with higher scores offer more in-depth coverage of the topic.

How did I produce those results? They came from an ebook reading platform that does much more than simply reproduce “print under glass.” The content comes into the system as a simple, text-embedded PDF. It’s then analyzed and converted to render in a browser-based reading engine. No third-party apps or plug-ins are required.

The magic is in the content ingestion process. This platform knows when a phrase appears in a first-level heading vs. a second-level heading as well as how many times it appears on that page or in that section. In short, it applies technology to produce a far superior set of search results. 

When will we see this type of functionality in any of the popular ebook reading apps? I’m not holding my breath. The leading vendors apparently don’t see a need to bring their search capabilities out of the dark ages.

If you’d like to learn more about this platform you’ll find summary information here. You’ll also notice it’s the ebook platform solution offered by my employer, Olive Software, Inc. I may not be a book publisher anymore but I’m thrilled to be part of an organization that’s helping lead the industry forward. Relevance-ranked search is just one of the cool innovations that sets us apart. Let me know if you’d like to learn more.


In search of a better search

Imagine Google's search results with no sophisticated algorithm behind them. Rather, when you type in your search phrase and press Enter, Google simply shows you a list of websites where that phrase can be found. No indication of relevance. No ranking mechanism. It's just a list of the sites that contain the phrase. Maybe the list is arranged in chronological order, where the older sites containing your phrase appear first.

Pretty worthless, right? So why do we accept that as our search solution in every major ebook reader app? Open an ebook, search for a phrase and the results merely list each occurrence of it, arranged from page 1 through the end of the book.

You might think a sophisticated search would only be useful for specialty products like textbooks and other reference materials. I disagree. I could see something like this being quite useful in novels, for example. Let's say you forgot who a minor character is and you'd like to quickly learn more about them. Sure, you could do a simple text search and see where the character first appears, but wouldn't you prefer results that provide some context? Maybe it's really the fourth occurrence of that character's name where you'll get the real details on their role in the story. That wouldn't be easy to figure out with today's ebook search capabilities.

And yes, I'm aware of at least two specialty ebook platforms that offer better search results. That's because they have editors who spend hours and hours parsing the content to build this feature manually. I have one word for them: scale. Their solution simply doesn't scale...more on that in a moment.

Here's another use case for a smarter ebook search feature: catalog-wide search. I challenge you to go to Amazon or a publisher's website and use their site search feature to tell you which book has the most in-depth coverage of a particular topic. Let's say you're looking for a book about creating websites. You want one that provides thorough coverage of HTML but also offers a solid introduction to JavaScript. You can't use site search on Amazon or a publisher's site to figure this out. You'll have to look at each book's table of contents and determine the answer yourself, one book at a time. A better solution is one where the results show you exactly how deep the JavaScript coverage is in each book, arranged so the books with the most in-depth coverage appear first.
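A catalog-wide ranking like that could be sketched in Python as follows. This is just one simple way to define "depth of coverage" (the share of a book's words that sit in sections mentioning the topic), and the catalog structure and function names are assumptions made up for the example.

```python
def coverage_depth(book, topic):
    """Share of the book's words found in sections that mention the topic."""
    t = topic.lower()
    total = sum(len(text.split()) for _, text in book["sections"])
    on_topic = sum(
        len(text.split())
        for heading, text in book["sections"]
        if t in heading.lower() or t in text.lower()
    )
    return on_topic / total if total else 0.0


def rank_catalog(catalog, topic):
    """Return (title, depth) pairs for books covering the topic, deepest first."""
    scored = [(coverage_depth(book, topic), book["title"]) for book in catalog]
    scored = [(depth, title) for depth, title in scored if depth > 0]
    scored.sort(key=lambda pair: -pair[0])
    return [(title, round(depth, 2)) for depth, title in scored]
```

Running `rank_catalog` once for "HTML" and once for "JavaScript" would let a shopper compare both requirements across the whole catalog without opening a single table of contents.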

Now back to the scale problem... The key here is to enable these richer search results without requiring a bunch of manual labor. Book publishers are trying to reduce staff and cut costs, not add more of either. So the only way to deliver this service is through a software solution where the content is analyzed and a rich, context-sensitive index is created.

Does that sound far-fetched to you? I don't think so. In fact, I believe we'll see a service like this very soon. I know I'll get a lot of use out of it and I bet you will too.