Why is text-to-speech only an afterthought?
I spend a lot of time commuting to and from work in my car, and I try to use the time wisely. I cycle through a playlist of podcasts every week, but I feel like I’m missing out on other types of content. Whatever your daily commute looks like, I’ll bet you’d feel the same way if you stopped to consider the possibilities.
I’m thinking mostly about short-form content such as website articles, whitepapers and other documents. If someone sends me a link or I discover an interesting article online it’s highly likely I won’t have time to read it immediately. That’s why I typically save it in Instapaper or Evernote.
This approach has turned me into an article hoarder as I have countless unread articles in both Instapaper and Evernote. So while I thought my problem was a lack of time at that moment, the truth is I rarely have time to read many of these things later either.
To its credit, the Instapaper app for Android has a text-to-speech feature built in. But the way it’s implemented tells me it was added as an afterthought. Sure, I can tap the “Speak” button and sit back and listen, but how useful is that when you’ve got a bunch of 2-4 minute articles stacked up and you’re trying to go hands-free while driving along the highway (or taking a walk, or running on a treadmill, etc.)?
Publishers sometimes talk of engaging with the consumer who’s reading their content while standing in the proverbial grocery store check-out line. Next time you’re in line at the grocery store, look around. Nobody reads like that. Some people have their phones out, but they’re probably scanning Facebook or sending a text message. Rather than heads-down reading, you’re more likely to see people with earbuds in, listening to music while they shop or wait in line. And let’s face it: nobody reads while they’re running or doing other strenuous activities.
So along with all those “send to” buttons for various social and “read later” services, why isn’t there one built exclusively for text-to-speech conversion, opening up all sorts of new use cases for content consumption?
The service has to do much more than just transform text to audio though. There’s an important UI component that needs to be considered. The entire platform has to be audio-based, including voice commands. Picture an app on your phone that has all the voice command capabilities of Siri or Alexa, for example. Whether you’re driving or running, all you’d have to do is say things like “skip”, “next article”, “archive”, “annotate”, etc. The user should be able to manually create playlists and the service should offer the option of automatically detecting topics and placing each article in a relevant folder (e.g., sports, business, DIY, etc.).
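To make the idea a bit more concrete, here is a rough Python sketch of the two pieces described above: a playlist that responds to spoken commands, and automatic sorting of articles into topic folders. Everything in it is my own illustration rather than any existing product's API. The `Article` and `Player` classes, the `detect_topic` function and the keyword lists are all hypothetical, and I'm assuming a speech recognizer has already turned the user's voice into a text transcript. A real service would almost certainly use a trained classifier instead of simple keyword counts.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    text: str
    archived: bool = False


# Hypothetical keyword lists, standing in for a real topic classifier.
TOPIC_KEYWORDS = {
    "sports": {"game", "team", "season", "coach"},
    "business": {"market", "revenue", "startup", "earnings"},
    "diy": {"drill", "paint", "workbench", "glue"},
}


def detect_topic(article: Article) -> str:
    """Return the folder whose keywords appear most often in the article."""
    words = [word.strip(".,!?") for word in article.text.lower().split()]
    scores = {
        topic: sum(word in keywords for word in words)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Articles that match no keyword at all go to a catch-all folder.
    return best if scores[best] > 0 else "unsorted"


@dataclass
class Player:
    queue: list
    position: int = 0

    def current(self):
        """Return the article being played, or None at the end of the queue."""
        if self.position < len(self.queue):
            return self.queue[self.position]
        return None

    def handle(self, transcript: str) -> str:
        """Apply a spoken command and return the text to announce next."""
        command = transcript.strip().lower()
        if command in ("skip", "next article"):
            self.position += 1
        elif command == "archive":
            article = self.current()
            if article is not None:
                article.archived = True
            self.position += 1
        else:
            return "Sorry, I did not catch that"
        upcoming = self.current()
        return upcoming.title if upcoming is not None else "End of playlist"
```

Saying “archive” marks the current article as done and moves on to the next one, while “skip” and “next article” simply advance the queue. Commands like “annotate” or “share” would slot into the same `handle` method.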
Don’t forget the social aspect and opportunities here. Using voice commands I should be able to quickly and easily share an interesting article via email, Twitter, etc. Let me also keep track of the most popular articles other users are listening to so I don’t miss anything that might be gaining momentum.
One business model option is probably quite obvious: insert short audio ads at the start of each article, similar to the plugs I’m hearing more frequently in podcasts. And since the article topic and keywords can be identified before streaming it’s easy to serve highly relevant ads that are closely aligned with the articles themselves; think Google AdSense for audio. Give publishers an incentive to feature new “send to audio” buttons on their articles by sharing that well-targeted ad income with them.
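Here is a small Python sketch of how that ad matching and revenue sharing might work, assuming the article's keywords have already been extracted. The `AudioAd` class, the `pick_ad` and `split_revenue` functions, and the 50 percent default share are all made up for illustration. A real ad platform would involve auctions, budgets and far more sophisticated relevance scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioAd:
    sponsor: str
    keywords: frozenset
    payout_cents: int


def pick_ad(article_keywords: set, ads: list):
    """Return the ad sharing the most keywords with the article, or None."""
    best_ad, best_overlap = None, 0
    for ad in ads:
        overlap = len(ad.keywords & article_keywords)
        if overlap > best_overlap:
            best_ad, best_overlap = ad, overlap
    return best_ad


def split_revenue(payout_cents: int, publisher_percent: int = 50):
    """Split an ad payout into (publisher, service) amounts in cents."""
    publisher = payout_cents * publisher_percent // 100
    return publisher, payout_cents - publisher
```

An article with no matching ad simply plays without one, and integer cents keep the publisher and service shares adding up to exactly the original payout.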
Doesn’t this seem like it’s right in Google’s wheelhouse? I suppose they’ve got bigger fish to fry but this looks like an existing marketplace gap that’s just waiting to be filled.