June 11, 2013

NEXT POST
Structured documents for science: JATS XML as canonical content format

It’s only my 7th day on the job here at PLOS as a product manager for content management. So it’s early days, but I’m starting to think about the role of JATS XML in the journal publishing process. I come from the book-publishing world, so my immediate challenge is to get up to speed on journal publishing. And that includes learning the NISO standard JATS (Journal Article Tag Suite). You may know JATS by the name of its predecessor, the NLM DTD.

As journal publishing folks know, JATS is used for delivering metadata, and sometimes full text, to the various journal archives. But here’s where journal and book publishing share the same dilemma: just because XML is a critically important exchange format, is it the best authoring format these days? Should it be the canonical storage format for full text content? And how far upstream should XML be incorporated into the workflow?

Let’s look at books for a minute. The book-publishing world has standardized on an electronic delivery format of EPUB (and its cousin, MOBI). This standardization has helped publishers narrow down to a shorter list of viable options for canonical source format. Even if most publishers haven’t yet jumped to adopt end-to-end HTML workflows, it’s clear to me that HTML makes a lot of sense for book publishing. Forward-thinking book publishers like O’Reilly are starting to replace their XML workflow with an HTML5/CSS3 workflow. HTML/CSS can provide a great authoring and editing experience, and it also gets you to print and electronic delivery with a minimum of processing, handling, or conversion. (O’Reilly’s Nellie McKesson gave a presentation about this at TOC 2013.) And which technology will get the most traction and advance the most in the next few years, XML or HTML? I know which one I’m betting on.
In terms of canonical file format, journal publishing may have one less worry than book publishing, because many journals are moving away from print to focus exclusively on electronic delivery, whereas most books still have a print component. Electronic journal reading, or at least article discovery, happens in a browser; therefore, HTML is the de facto principal delivery format. And as much as I’d like to think HTML is the only format that matters, I know that many readers still like to download and read articles in PDF format. But as I mentioned, spinning off attractive, readable PDF from HTML is pretty easy to automate these days.

So I ask: If XML is being used as an interchange format only, what do we gain from moving the XML piece of the workflow any further upstream from final delivery? Well, why does anyone adopt an XML workflow? The key benefits are: platform/software independence (which HTML also provides), managing and remixing content to the node level (which is not terribly useful for journal articles), and transforming the content to a number of different output formats such as PDF, HTML, and XML (HTML5/CSS3 can be used for this transformation as well, with a bit of toolchain development work).

But XML workflows come with a hefty price tag. The obvious cost is conversion, which is expensive in both money and time. Another downside is the learning curve for the people actually interacting with the XML. How many people should that be? In the real world, will you ever get authors, editors, and reviewers to agree to interact with their content as XML? So more likely than not, you’re either going to need to hide the fact that the underlying format is XML behind a WYSIWYG-ish editor that you either buy or build (both are expensive), or you’re doing your XML conversion towards the end of the process. On a similar note, how easy is it to hire experienced XSL-FO toolchain developers? Developers who work in the world of HTML5, CSS3, and JavaScript, by contrast, are plentiful.

So building an entire content management system and workflow for journal publishing around XML (specifically JATS XML, which is just one delivery format and isn’t needed until basically the end of the process) doesn’t seem like a slam-dunk to me. I should clarify that using JATS XML for defining metadata does seem like the obvious way to go. But I’m not so sure it’s a good fit to serve as the canonical storage format for the full text. One idea is to separate article metadata from the article body text, to leverage the ease-of-editing of HTML for the text itself. What about moving HTML upstream, and focusing efforts on delivering better, more readable HTML in the browser? What about shifting focus away from old print models and toward leveraging modern browser functionality, maybe by adding inline video or interactive models, or by making math, figures, and tables easier to read and work with?

Just to throw a curve ball into the discussion, I attended Markdown for Science last weekend, where Martin Fenner and Stian Håklev led the conversation about whether it makes sense to use Markdown plus Git for academic authoring and collaboration. I want to hear from as many sides of the content format conversation as possible. So, what do YOU think?

This article was written by contributor Molly Sharp. It appeared earlier on the PLOS site and is presented here with the permission of the author. Molly has worked in various content management-related roles since the late ’90s, when she led the implementation of an XML editing and production system for Sybex, a tech book publisher. Most recently, Molly was the Director of Content Management at Safari Books Online, an electronic reference library of 30,000 tech & business titles, where she created and managed a Content Team to ensure the quality of incoming content; designed and maintained content-related processes and workflows; and managed a publishing partner community of more than 100 organizations.
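
To make the post's idea of separating article metadata from body text a little more concrete, here is a minimal Python sketch. It assumes a heavily simplified JATS-style `<front>` fragment for the metadata and a plain HTML fragment for the body. The sample values, the function names `read_metadata` and `render_page`, and the page template are all hypothetical illustrations, not anything taken from a PLOS system.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified JATS-style metadata.
# A real JATS <front> element carries many more fields.
JATS_FRONT = """
<front>
  <journal-meta>
    <journal-title-group>
      <journal-title>Example Journal</journal-title>
    </journal-title-group>
  </journal-meta>
  <article-meta>
    <title-group>
      <article-title>Sample Article</article-title>
    </title-group>
    <contrib-group>
      <contrib contrib-type="author">
        <name><surname>Doe</surname><given-names>Jane</given-names></name>
      </contrib>
    </contrib-group>
  </article-meta>
</front>
"""

# The body is authored and stored as HTML, separately from the metadata.
BODY_HTML = "<section><h2>Introduction</h2><p>Body text edited as HTML.</p></section>"


def read_metadata(front_xml):
    """Pull a few fields out of the JATS-style metadata fragment."""
    root = ET.fromstring(front_xml.strip())
    authors = [
        f"{name.findtext('given-names')} {name.findtext('surname')}"
        for name in root.iterfind("article-meta/contrib-group/contrib/name")
    ]
    return {
        "journal": root.findtext("journal-meta/journal-title-group/journal-title"),
        "title": root.findtext("article-meta/title-group/article-title"),
        "authors": authors,
    }


def render_page(meta, body_html):
    """Wrap the stored HTML body in a page built from the XML metadata."""
    byline = ", ".join(escape(author) for author in meta["authors"])
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(meta['title'])}</title></head>\n"
        f"<body><h1>{escape(meta['title'])}</h1>\n"
        f"<p class=\"byline\">{byline}, {escape(meta['journal'])}</p>\n"
        f"{body_html}\n"
        "</body></html>"
    )


if __name__ == "__main__":
    print(render_page(read_metadata(JATS_FRONT), BODY_HTML))
```

In this arrangement the XML stays the source of truth for metadata that archives need, while the body never leaves HTML. Producing full JATS for delivery would still require an HTML-to-JATS conversion step at the end, which this sketch does not attempt.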
PREVIOUS POST
My new job

I spent the last six weeks taking on some consulting projects and exploring full-time job opportunities. I've had the luxury of being very selective on both, but especially on the latter. My primary goal has been to find a role where I can have a significant impact on the organization's future. Culture is critical as well, of course; I've been looking for a team that's passionate about the business and where everyone is rallying around a common goal. Lastly, I made it clear I need to stay in Indiana, even if that means I'm on the road a lot.

I'm delighted to let you know that I've found the ideal solution that addresses all these objectives. On June 20th I'm officially joining the team at Olive Software. If you're not familiar with Olive, it's probably because they only recently started expanding into the book publishing and eLearning space. Olive specializes in creating a digital presence for publishers and content owners. And although the organization is currently thriving, that's not where the Olive opportunity ends.

I spent time in Olive's Aurora office last week and came away thoroughly energized. I wanted to dive in and become a contributor to this team right then and there. I met several Olive employees throughout the day and each one of them left me with the same impression: they all love what they're doing, they deeply respect what their colleagues bring to the table and they're all enthusiastic about the future of the organization. What more could I ask for? :-)

I've got a few days to decompress a bit before I fully immerse myself in my new role at Olive next week. I can't tell you the last time I was this excited about a new job. Part of it has to do with the fact I'm stepping outside of the book publishing industry for the first time in many years. But as Kat Meyer and I realized long ago, the challenges that exist in book publishing are similar to the ones faced in other content creation and distribution industries; the length of the work and the frequency of publication don't really distinguish content as much in the digital model as they did in the print model.

Thanks to everyone who brought me on board for consulting work and to discuss full-time opportunities. These past six weeks have been a new experience for me and I've learned at least one very important lesson: Even though the traditional publishing industry is rapidly shrinking and the job openings are limited, there are plenty of interesting opportunities in adjacent businesses, particularly with organizations that truly understand digital and aren't paralyzed by The Innovator's Dilemma.

Joe Wikert

I'm Chief Operating Officer at OSV (www.osv.com)
