03 Nov 10: Cross-platform, web-based “WebHelp” from DocBook

This is a guest blog post from David Cramer, who is a Staff Technical Writer at Motive, an Alcatel-Lucent company, and is also a self-described XML-for-documentation tools geek.

Suppose you're writing context-sensitive help for a cross-platform application that's translated into six languages, including Japanese. If you were writing help for an Eclipse-based application, it would be easy. The Eclipse user assistance infrastructure is powerful and flexible. The format is simple, has internationalization built-in, and can be produced from a variety of formats including DocBook, DITA, and some traditional HATs (Help Authoring Tools). Even if you’re writing help for a native Windows application, you would just produce a .chm file and leave it to the localization vendor to figure out how to produce the Japanese version. But often you find yourself writing help for a Web application and in that case, the choice isn’t as clear.

Sure, you can now create a .war version of an Eclipse infocenter, but that’s probably overkill for a help system that’s going to contain a single document. Moreover, many tech writers won’t have the technical skills to get it to work, and even if they do, they’ll probably meet resistance from the developers. Your project manager might worry about taxing the app server’s performance, or that supporting another app server will mean trouble getting your little .war file to work in the new environment. Often developers just want a bundle of HTML, CSS, and JS files from their writers and nothing more complicated.

Now, there are few cross-platform formats that offer all the stuff you expect to find in a help system: a table of contents pane on the left with a search tab and maybe an index tab. Ideally it will highlight the search results in the page. The search needs to support stemming and needs to support languages other than English, including Chinese, Japanese, and Korean (“CJK”). CJK is hard, by the way, because these languages, Chinese and Japanese in particular, don’t put spaces between their words, so the strategy used by most client-side search engines fails. The indexer can’t easily create a list of terms in the set and indicate on which pages those terms occur, because there’s nothing to use as a delimiter in tokenizing the words. In addition to all that, you might also want a button to hide the table of contents to maximize the content area, and you certainly want a way to sync the table of contents with the content pane so the user can see where a topic occurs in the table of contents when jumping from topic to topic via cross references or other links. Oh, and you must have a way to deep-link into the help set so you can do contextual help. It’s not such a hard thing, but for too long our options have been limited to a few commercial tools of questionable overall quality. In fact, on this very blog, Janet has called this the “holy grail”.
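To make the tokenizing problem concrete, here is a minimal Python sketch. It builds the kind of term-to-pages index described above twice: once splitting on whitespace, and once using overlapping two-character tokens (bigrams), a technique commonly used for CJK text. The page names, page text, and function names are all invented for illustration, and this is not taken from the WebHelp indexer itself.

```python
from collections import defaultdict


def whitespace_tokens(text):
    """Split on whitespace, the way a simple client-side indexer might."""
    return text.lower().split()


def bigram_tokens(text):
    """Emit overlapping two-character tokens, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_index(pages, tokenize):
    """Map each token to the set of page names it occurs on."""
    index = defaultdict(set)
    for name, text in pages.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index, query, tokenize):
    """Return the pages that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*[index.get(t, set()) for t in tokens])


# Hypothetical help pages, invented for this example.
pages = {
    "install.html": "インストール手順の説明",
    "search.html": "全文検索の使い方",
}

by_space = build_index(pages, whitespace_tokens)
by_bigram = build_index(pages, bigram_tokens)

# With no spaces, each page collapses into a single giant "word".
print(search(by_space, "検索", whitespace_tokens))
# Bigrams give the indexer something to look up.
print(search(by_bigram, "検索", bigram_tokens))
```

With whitespace splitting the query matches nothing, because each Japanese page is indexed as one long token. With bigrams the query finds search.html. The trade-off is a larger index and the occasional false positive for longer queries, since this sketch only checks that every bigram is on the page, not that the bigrams are adjacent.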

Enter the Google Summer of Code 2010 and Kasun Gajasinghe, a student at the University of Moratuwa in Sri Lanka. This past spring, nudged along by Dick Hamilton and Stefan Seefeld, the DocBook Open Repository project applied to be a mentoring organization for the first time. Why hadn’t we done this before? No idea. I guess we all had our heads down in our own problems and it didn’t occur to anyone that Summer of Code would be a great way to advance DocBook development while introducing some bright students to the ins and outs of open source development. In any case, I’m very happy that Dick and Stefan got the ball rolling and were willing to administer our participation in GSoC as well as mentor students and especially happy that Kasun submitted his proposal to provide webhelp for DocBook.

The features that he implemented include:
  • Full text search with:
    • Stemming support for English, French, and German. Stemming support can be added for other languages by implementing a stemmer.
    • Support for Chinese, Japanese, and Korean using code from the Lucene search engine.
    • Search highlighting that shows where the searched-for term appears in the results.
    • Search results can include brief descriptions of the target.
  • Table of contents pane with collapsible TOC tree.
  • Auto-synchronization of content pane and TOC.
  • TOC and search pane implemented without the use of a frameset.
  • An Ant build.xml file to generate output.
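For readers who haven't met stemming before, here is a rough Python sketch of how stemming and search highlighting fit together. The suffix list, the function names, and the `<b>` markup are assumptions made up for this example. A real stemmer (for instance a Porter-style one) is far more careful, and this is not the code that ships with WebHelp.

```python
import re

# Toy suffix list, assumed for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix; not a real stemming algorithm."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlight(text, query, open_tag="<b>", close_tag="</b>"):
    """Wrap every word whose stem matches the stem of the query."""
    target = naive_stem(query)

    def mark(match):
        word = match.group(0)
        if naive_stem(word) == target:
            return open_tag + word + close_tag
        return word

    return re.sub(r"[A-Za-z]+", mark, text)


sentence = "Searching the index returns every page that was searched."
print(highlight(sentence, "search"))
# <b>Searching</b> the index returns every page that was <b>searched</b>.
```

Because both the query and the page text are reduced to the same stem before comparing, a search for "search" also lights up "Searching" and "searched". Supporting another language comes down to swapping in a stemmer that knows that language's suffixes.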

Kasun's work from his Summer of Code project is now part of version 1.76.1 of the DocBook XSLs. Please take a look and share your thoughts below.

Thanks, Janet, for letting me hijack your blog!

Category: Tools | Posted by: jmswisher


03 Nov 10, 13:07:37 Stefan Kleineikenscheidt wrote:

Hi Janet, hi David,

Very interesting! As Scroll Publisher, our single-source publishing plugin for Confluence, evolves further, this looks like something we should support.

Is there a particular license for the "runtime"?

This looks very promising!


04 Nov 10, 12:40:14 David Cramer wrote:

Hi Stefan,
That would be cool, and it's licensed in such a way that you can include it in a commercial application. The overall license for the DocBook XSLs is an MIT/BSD-style license. The indexer contains code from the Apache Lucene project, which is licensed under Apache 2.0. Details here:

I look forward to seeing it in ScrollWiki someday.


08 Nov 10, 00:59:28 Stefan Kleineikenscheidt wrote:

Hi David,

awesome! We'll keep you posted.


09 Feb 11, 18:07:17 Jim Campbell wrote:

Hey Janet, Anne Gentle has just pushed out a set of docs based on this for the OpenStack project. They look pretty great!


Hope all is well,


09 Feb 11, 18:22:10 Janet Swisher wrote:

Hi Jim,

Thanks for sharing that link. I agree, those OpenStack docs look sweet.

DITA-heads, take note that you can use a DITA-to-DocBook transformation to get this output from DITA as well. (Contact David Cramer if you need help.)

17 Apr 11, 10:53:04 Denis Bradford wrote:

WebHelp is terrific -- I've demoed it for WinMerge, am hoping to get them to adopt it. The OpenStack customization is amazing.

One question: All WH pages are output to the content root. I can generate pages to subdirectories by using DocBook's <?dbhtml dir?> PI, but the TOC still looks for all pages in the root. Any thoughts about supporting output subdirectories, or should I learn to live with it?

17 Apr 11, 22:10:47 David Cramer wrote:

Hi Denis,
Regarding the <?dbhtml dir?> issue, please log a bug in the SourceForge tracker. You're not the first to hit that limitation.

Btw., I _think_ I'll be able to roll the OpenStack UI improvements into a future version of the DocBook webhelp. The OpenStack stuff is open source after all (Apache license I think).


21 Jul 11, 10:29:18 Tasha wrote:

Context sensitive help?


http://bit.ly/paMczQ // cmswire


http://bit.ly/pEV4TH // silicon angle ?

01 Aug 11, 12:53:06 Janet Swisher wrote:

@Tasha: Do you have something to add to the conversation, or are you just link-spamming for MindTouch? I expect better from MindTouch.
