October Meeting: Faceted Search with DITA

At our October meeting last week, Roger Hadley and the technical communications team at Fiserv, a global financial services company with an office in Hillsboro, presented to a full house of PDX DITA users from companies such as Cisco, NetApp, InfoParse, and Harmonic.

Roger demos an interesting feature for the crowd

Roger walked us through Fiserv’s implementation of faceted search, which relies on DITA metadata and a SQL search server to search their 5,000 help topics. A single search box accepts search terms, and the results appear in a center pane as links to help topics. You can further refine the results with filters in the left sidebar, which are based on their metadata labels of Feature, Function, and Role.


The Stats

This awesome project took:

  • A team of 8 (5 writers, 1 lead, 1 architect, and 1 manager)
  • One year of planning
  • One year to implement

And some quick stats about their documentation ecosystem:

  • 5,000 XML topics
  • 6 product categories (each built from its own map)
  • Hundreds of conrefs and keyrefs embedded in the XML files
  • Filters based on the following:
    • Feature — 34 keywords
    • Function — 90 keywords
    • Role — 20 keywords

Planning and Implementation

To complete this project, the team:

  • Researched what their users needed from a better search experience (for years they had heard that search was getting worse as their content grew)
  • Determined that a DITA metadata/SQL Search Server approach could yield a simpler and more accurate search experience
  • Decided that faceted search based on their existing tags for Feature, Function, and Role would best optimize search results for their particular content set
  • Created a homegrown utility that showed their Feature, Function, and Role tags so writers could easily add them to the top of each XML file
  • Manually edited all 5,000 help topics and tagged them with the appropriate Feature, Function, and Role metadata attributes
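As an illustration only, here is a minimal Python sketch of the kind of check a homegrown tagging utility might run. The post doesn’t say how Fiserv stored the tags, so this assumes, hypothetically, that each value sits in an <othermeta name="Feature" content="..."/> element inside the topic’s <prolog><metadata>; the function names are made up as well.

```python
import xml.etree.ElementTree as ET

# Facet names come from the post; the <othermeta> storage convention is an assumption.
FACETS = ("Feature", "Function", "Role")


def read_facets(topic_xml: str) -> dict:
    """Collect facet values from a topic's prolog (hypothetical convention)."""
    root = ET.fromstring(topic_xml)
    facets = {name: [] for name in FACETS}
    for meta in root.iterfind("./prolog/metadata/othermeta"):
        name, value = meta.get("name"), meta.get("content")
        if name in facets and value:
            facets[name].append(value)
    return facets


def missing_facets(topic_xml: str) -> list:
    """Return the facets a writer still needs to tag in this topic."""
    return [name for name, values in read_facets(topic_xml).items() if not values]


sample = """<task id="add_user">
  <title>Adding a user</title>
  <prolog><metadata>
    <othermeta name="Feature" content="User management"/>
    <othermeta name="Role" content="Administrator"/>
  </metadata></prolog>
  <taskbody/>
</task>"""

print(read_facets(sample))
# {'Feature': ['User management'], 'Function': [], 'Role': ['Administrator']}
print(missing_facets(sample))  # ['Function']
```

Run over all topics, a report like `missing_facets` makes it easy to see which of the 5,000 files still need attention.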

SQL/MS Search Server Work

The team used Microsoft Search Server Express to index pages based on their metadata. During the XHTML transformation of each map (one per product), the build produces a SQL index file that populates a database of pages for that product. PHP then adds breadcrumbs, navigation, and metadata-based search links to each content page, based on the information in the SQL database and the metadata in each page.
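To make the breadcrumb step concrete, here is a hedged Python sketch that uses SQLite (from the standard library) as a stand-in for the team’s SQL Server database and Python in place of their PHP. The table layout, column names, and rows are hypothetical; the point is only that one parent link per page is enough to derive breadcrumbs with a recursive query.

```python
import sqlite3

# In-memory SQLite stands in for the per-product page database; the schema
# and rows are hypothetical, not Fiserv's actual design.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pages (
           id        INTEGER PRIMARY KEY,
           product   TEXT NOT NULL,
           title     TEXT NOT NULL,
           href      TEXT NOT NULL,
           parent_id INTEGER REFERENCES pages(id)
       )"""
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ProductA", "ProductA Help", "index.html", None),
        (2, "ProductA", "Managing Users", "users.html", 1),
        (3, "ProductA", "Adding a User", "add-user.html", 2),
    ],
)


def breadcrumbs(page_id: int) -> list:
    """Return titles from the root page down to page_id by following parent links."""
    rows = conn.execute(
        """WITH RECURSIVE trail(id, title, parent_id, depth) AS (
               SELECT id, title, parent_id, 0 FROM pages WHERE id = ?
               UNION ALL
               SELECT p.id, p.title, p.parent_id, t.depth + 1
               FROM pages AS p JOIN trail AS t ON p.id = t.parent_id
           )
           SELECT title FROM trail ORDER BY depth DESC""",
        (page_id,),
    ).fetchall()
    return [title for (title,) in rows]


print(" > ".join(breadcrumbs(3)))  # ProductA Help > Managing Users > Adding a User
```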

Microsoft Search Server Express offered some built-in features that the team used out of the box or with relatively straightforward customization, such as:

  • A single box on the search home page for entering search terms (which then searches all 5,000 topics)
  • A list of search results, shown as links to help topics below the search box
  • A TOC-like hierarchy in the left sidebar, based on Feature, Function, and Role, that you can use to refine results
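Search Server Express handled refinement for Fiserv, so the following is not its implementation. It’s a small Python sketch of the underlying logic, assuming each topic carries a set of values per facet: selecting a value keeps only matching topics, and the sidebar counts are recomputed over what remains. The Topic class and sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    # Facet name -> set of values, e.g. {"Role": {"Administrator"}} (hypothetical data).
    facets: dict = field(default_factory=dict)


def refine(topics, selected):
    """Keep topics that match every selected (facet, value) pair."""
    return [
        t for t in topics
        if all(value in t.facets.get(facet, set()) for facet, value in selected)
    ]


def facet_counts(topics, facet):
    """Count how many of the current results carry each value of one facet."""
    return Counter(v for t in topics for v in t.facets.get(facet, set()))


results = [
    Topic("Adding a user", {"Feature": {"Users"}, "Role": {"Administrator"}}),
    Topic("Resetting a password", {"Feature": {"Users"}, "Role": {"Administrator", "End user"}}),
    Topic("Running a report", {"Feature": {"Reports"}, "Role": {"Analyst"}}),
]

narrowed = refine(results, [("Feature", "Users")])
print([t.title for t in narrowed])  # ['Adding a user', 'Resetting a password']
print(facet_counts(narrowed, "Role").most_common())
# [('Administrator', 2), ('End user', 1)]
```

This sketch ANDs every selection together. Many faceted UIs instead OR values within one facet and AND across facets, which is a design choice worth making deliberately.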


Join Us!

We hope you’ll join us December 2nd when we host a joint meeting with our local chapter of Write the Docs.

October Meetup: Building Faceted Search

On October 28th from 6:30-8:00, please join us and members of the Fiserv technical communications team for dinner, conversation, and an overview of how Fiserv is using DITA metadata to build a site with faceted information search. We’ve struggled with search in our own Help implementations, so this one should be really interesting.

Here is their description:

Join the Fiserv Technical Communications team as they explain how they use metadata embedded in DITA topics for the faceted search features of their online documentation system. They will also present the supporting technology and the process they went through to plan and develop the features.

RSVP to marya.devoto@jivesoftware.com if you can make it.

shortdesc: an exploration

At the June meetup of PDX DITA I presented a brief talk about the <shortdesc> element, which had been puzzling me for a long time. Due to a tragic audio failure, several remote attendees couldn’t tune in, so I promised a blog rendition of the talk. The post below is not a strict transcription, but it attempts to capture most of the main points, including some that emerged during the Q&A.


[meets] deep but possibly unacknowledged needs

–Don Day, http://ditaperday.com/blog/the-shortdesc-element/

perhaps the most versatile and yet most challenging element to write for

–Laura Bellamy, Michelle Carey, and Jenifer Schlotfeldt, DITA Best Practices

As the quotations above suggest, the humble <shortdesc> element has inspired a surprising amount of passion among DITA writers. That’s pretty grandiose for a piece of text that’s often ignored and that, when it isn’t, takes up at most a couple of sentences per topic. Why is the shortdesc such an object of mystery?

I’m not sure I have an answer, so let’s start with a few non-mysterious facts.

What is <shortdesc>?

The shortdesc:

  • Is an optional element that precedes the topic body in a topic
  • Provides clues about topic content, enabling the “progressive disclosure model” (readers can scan it to see whether they want to read on)
  • Shows up in search results and link previews as well as in the body of the topic

Here are a few things the shortdesc isn’t (or shouldn’t be):

  • A lead-in or introduction
  • A promise about the contents of the topic
  • A sentence fragment

Why use <shortdesc>?

None of these facts really gets at what the shortdesc is for. Here is the OASIS reference’s summary of its purpose, which goes a little further in explaining where the shortdesc shows up but doesn’t quite get to the heart of things.

The short description, which represents the purpose or theme of the topic, is also intended to be used as a link preview and for searching. When used within a DITA map, the short description of the <topicref> can be used to override the short description in the topic. http://docs.oasis-open.org/dita/v1.1/OS/langspec/langref/shortdesc.html

What is implied here, but not quite said, is that the shortdesc should answer the “So what?” question: what is the value in this topic, and why should I care about it? The shortdesc should either deflect the need to read the topic at all, by extracting the key and most actionable piece of information (especially effective with tasks), or help readers decide whether the topic will be useful enough to be worth reading.

The best case is that you can mouse over a shortdesc or find it in a mini-TOC and get the key detail you need without reading further, and some topics might well consist entirely of the shortdesc. (Imagine a topic whose shortdesc is “You should install Service Pack 3 before attempting to install the latest version; otherwise your installation will fail.”) The second-best case is that you can tell whether this is the topic YOU need to read. For example, the shortdesc for a longer topic with several paragraphs might specify use-case information about where the content applies, such as “Users who need to use this technology in a distributed environment should understand the flow of information between servers.” You can achieve some of these goals by writing clear titles, but shortdescs give you a lot more space.

Because shortdesc content is promoted in search and shows up in a number of places (described below), it pays to make each shortdesc either valuable in itself or a clear pointer to where the value lies.
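The OASIS note that a <topicref>’s short description can override the topic’s own is easy to picture in code. This Python sketch always prefers the map’s <topicmeta><shortdesc> when one is present and otherwise falls back to the topic’s <shortdesc>. That precedence rule is a simplifying assumption, since real processors such as the DITA Open Toolkit apply their own rules; the map, file names, and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def text_of(elem: Optional[ET.Element]) -> Optional[str]:
    """Flatten an element's text, including inline children, and normalize spaces."""
    if elem is None:
        return None
    text = " ".join("".join(elem.itertext()).split())
    return text or None


def effective_shortdesc(topicref: ET.Element, topic_xml: str) -> Optional[str]:
    """Prefer the map's <topicmeta><shortdesc>; otherwise use the topic's own."""
    override = text_of(topicref.find("./topicmeta/shortdesc"))
    if override:
        return override
    return text_of(ET.fromstring(topic_xml).find("./shortdesc"))


map_xml = """<map>
  <topicref href="install.dita">
    <topicmeta><shortdesc>Install Service Pack 3 before upgrading.</shortdesc></topicmeta>
  </topicref>
  <topicref href="servers.dita"/>
</map>"""

topics = {
    "install.dita": "<task id='install'><title>Installing</title>"
                    "<shortdesc>Original topic text.</shortdesc></task>",
    "servers.dita": "<concept id='servers'><title>Server flow</title>"
                    "<shortdesc>Servers exchange data in a distributed setup.</shortdesc></concept>",
}

for ref in ET.fromstring(map_xml).iter("topicref"):
    print(ref.get("href"), "->", effective_shortdesc(ref, topics[ref.get("href")]))
# install.dita -> Install Service Pack 3 before upgrading.
# servers.dita -> Servers exchange data in a distributed setup.
```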

Where Does <shortdesc> Show Up?

<shortdesc> shows up helpfully in many places:

  • At the top of the topic in output
  • When you mouse over a cross-reference to the topic in HTML
  • In a mini-TOC inside a top-level topic with several topics nested underneath it
  • In search results (internal to a Help system, or in a search engine)

The Need for Consistency

This multi-usefulness, however, creates consistency problems. If you don’t use a shortdesc in EVERY topic, you’ll see mini-TOCs in your output with gaps in them. If some of your short descriptions are sentence fragments, or if some are very long and some are very short, or if they use very different sentence structure, you can find yourself inadvertently creating TOCs with faulty parallelism. This isn’t the end of the world in terms of usability, but it looks sloppy and will annoy people who should be paying attention to your fantastic content.
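Here’s a small Python sketch of how those gaps arise, assuming (hypothetically) that a parent topic’s mini-TOC is rendered as one line per child, using the child’s title plus its shortdesc. Children without a shortdesc fall back to a bare title, and the function also reports them so a writer can fill them in.

```python
from typing import List, Optional, Tuple


def mini_toc(children: List[Tuple[str, Optional[str]]]) -> Tuple[List[str], List[str]]:
    """Render mini-TOC lines and list the child titles that lack a shortdesc."""
    lines, gaps = [], []
    for title, shortdesc in children:
        if shortdesc:
            lines.append(f"{title}: {shortdesc}")
        else:
            lines.append(title)  # shows up as a visible gap in the output
            gaps.append(title)
    return lines, gaps


children = [
    ("Installing", "Install Service Pack 3 before upgrading."),
    ("Configuring servers", None),
    ("Verifying the install", "Check the log for errors"),  # fragment: no period
]
lines, gaps = mini_toc(children)
print("\n".join(lines))
print("Missing shortdesc:", gaps)  # Missing shortdesc: ['Configuring servers']
```

The third entry renders fine but still breaks parallelism with the first, which is the kind of inconsistency a gap report alone won’t catch.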


Some Technical Limitations

<shortdesc> can’t contain any of the following, so it’s best to think of it as a text-only element:

  • Conditional formatting (use <abstract> if you need to do this)
  • X-refs
  • Code blocks, lists, tables, or other fancy formatting

Challenges of <shortdesc>, Summarized

<shortdesc> is going to work best if your DITA implementation is already working well. If your content isn’t well structured and concise, with a clear purpose for each topic (concept, task, and reference) and a modular structure, it will be hard to write a clear shortdesc; in fact, difficulty in writing a shortdesc may be an early warning of content problems. If your team isn’t working from a clear idea of how to write a shortdesc, you’ll end up with consistency problems, so you need to communicate about what you’re doing. And if you have a lot of legacy content, you may find yourself writing this element in bulk, which is probably not most people’s idea of fun.

Most of all, the <shortdesc> element needs to function in multiple contexts and be used consistently. Otherwise you’ll end up with confusing search results or hover text, and mini-TOCs that are gappy or not especially helpful. As we learned in our meetup this June, some people just decide not to deal with this tricky element.

Tips for Success

Nevertheless, <shortdesc> can have a lot of utility in highlighting valuable content. Here are some tips to make yours effective.

  • Use complete sentences.
  • Make sure your shortdesc either stands alone (in which case consider formatting it distinctively in the output) or works as the opening of the topic.
  • Don’t be long-winded.
  • Use a consistent sentence structure. Statements work best.
  • Try to offer a takeaway that saves readers from having to read the topic.
  • Don’t promise (for example, “The following methods work:”).
  • Be systematic in getting them done.
  • If a topic contains only one sentence, just make it the shortdesc.
  • If you have them, make shortdescs for “container topics” cover the nested topics succinctly.
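Several of these tips can be checked mechanically. The following Python sketch is one hypothetical way to do it: it flags a missing shortdesc, a trailing colon (a promise or lead-in), missing end punctuation (a likely fragment), an arbitrary 50-word limit, and the child elements listed under the technical limitations. The threshold and heuristics are assumptions chosen for illustration, and a schema-aware validator would catch disallowed markup more reliably.

```python
import xml.etree.ElementTree as ET

MAX_WORDS = 50  # hypothetical limit; tune it for your team
DISALLOWED = {"xref", "codeblock", "ul", "ol", "table"}  # per the limitations above


def lint_shortdesc(topic_xml: str) -> list:
    """Return a list of style problems found in a topic's <shortdesc>."""
    shortdesc = ET.fromstring(topic_xml).find("./shortdesc")
    if shortdesc is None:
        return ["missing shortdesc"]
    text = " ".join("".join(shortdesc.itertext()).split())
    if not text:
        return ["empty shortdesc"]
    problems = []
    if text.endswith(":"):
        problems.append("reads like a promise or lead-in (ends with a colon)")
    elif not text.endswith((".", "?", "!")):
        problems.append("may be a sentence fragment (no closing punctuation)")
    if len(text.split()) > MAX_WORDS:
        problems.append(f"longer than {MAX_WORDS} words")
    for child in shortdesc.iter():
        if child is not shortdesc and child.tag in DISALLOWED:
            problems.append(f"contains <{child.tag}>")
    return problems


for topic in (
    "<task id='a'><title>A</title></task>",
    "<task id='b'><title>B</title><shortdesc>The following methods work:</shortdesc></task>",
    "<concept id='c'><title>C</title><shortdesc>Server flow</shortdesc></concept>",
    "<concept id='d'><title>D</title><shortdesc>Install Service Pack 3 first.</shortdesc></concept>",
):
    print(lint_shortdesc(topic) or "ok")
```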

How About You?

In our meetup, we learned that most people in our group are using <shortdesc> fairly traditionally, as described above, or else not using it at all. However, we keep hearing rumors about creative uses including special formatting and tool tips. If you have some ingenious ideas about how to use them, we’d love to hear from you in comments or at our next meetup.

March Meeting: Reporting on the State of DITA

The highlight of last night’s meeting of the PDX DITA User’s Group was Keith Schengili-Roberts’ presentation on The State of DITA 2015. Using the crude metric of “more people to feed at our meetings” over the last few years, we had observed that interest in DITA is growing, but Keith provided a more expansive view of the subject based on his research from the last decade. Here is a sampling of data-based observations drawn from his analysis of job postings, case studies, presentations, LinkedIn references and individual reports:

  • Many hundreds of companies worldwide use DITA, with a concentration in the United States and specifically in California.
  • Computer software might still be the largest individual industry using DITA, but a large array of industries outside software make up the lion’s share of users.
  • DITA is the dominant flavor of XML cited in technical writer job postings (DocBook appears rarely these days).
  • In United States job postings, demand for XML experience is trending up and demand for traditional tech writer tools that don’t require structured authoring is gradually trending down.
  • DITA experience is increasingly required or preferred in job descriptions.
  • Job postings asking for DITA expertise are offering higher starting salaries on average than job postings requiring FrameMaker expertise.

The Q&A covered DITA and aerospace standards, demand in the DITA-based CMS market, how review processes work in a DITA-based documentation organization, and how to get better PDFs out of DITA. And Mark Giffin, who called in from California (he is on the OASIS Lightweight DITA committee), alerted us to the existence of an open-source Markdown-to-DITA plugin. This should be of interest to DITA users who work with programmers who see DITA as an obstacle to collaboration.

Thanks to Keith for providing such a great and engaging talk! You can read more about Keith’s work on his blog, Ditawriter.com. Keith’s presentation was sponsored by IXIASOFT, which happens to be his employer as well as a maker of DITA component content management systems. (Thanks to Leah d’Emilio for setting up the tech side and making sure everything went smoothly.)

We were especially pleased to see a handful of new DITA users turning up to explore and network: if you’ve been dithering about coming to a meetup because you’re not yet using DITA, please consider this an invitation to show up and find out more. Also, we discovered proudly that one of our regular attendees found a new job by networking at one of our meetings! We like to be socially useful as well as charming, so this was very gratifying news. Maybe you will be next.



March Presentation: Keith Schengili-Roberts on the State of DITA

We’re very excited to be able to feature a remote presentation on “The State of DITA 2015” at our March meeting (see the sidebar on our main page for time, date, and location details). In-person attendees will gather for dinner as usual at 6:30, with the presentation starting at 6:45. If you want to attend remotely, please drop a line to pdxdita@gmail.com.

Keith Schengili-Roberts, DITA Information Architect for IXIASOFT and the writer behind the popular “DITAWriter.com” blog, has been doing extensive research on who is using DITA, where they are using it, what tools they are using, and why. He has surveyed the technical writing marketplace in the United States and the role that DITA skills and experience have come to play in it. If you want to get a better sense of who is using DITA, what software tools are popular, and the many ways in which DITA is being used worldwide, come to this presentation!

Want to Present in March?

We’re accepting submissions for a short presentation at our March 18th meeting, so if you’ve had the stirrings of a DITA topic in your back pocket, please pull it out, brush off the lint, and turn it into a 20-minute talk. We would love to hear from you on any topic related to DITA XML.

To accept the mantle of grandiloquence, just drop a line to docs@jivesoftware.com before March 1st.

PDX DITA + WritetheDocs Meetup = Magic

A quick and enthusiastic report on last night’s PDX DITA holiday potluck with special guests from the WritetheDocs PDX Meetup Group. But first, a picture of happy documentarians.


We packed 30 people into our largest conference room for a delicious potluck (thanks, Puppet team, for the buffalo wings!) and a short presentation introducing DITA to prospective users. Leona Campbell and Melanie Jennings enthusiastically described the benefits and challenges of DITA and shared their experiences of getting up and running with DITA after working with different toolsets. Because we had a range of experience at the table, from DITA consultants to working DITA writers to total newbies, a great discussion followed about why you’d want to use DITA rather than another tool. We also covered the need for different kinds of tooling depending on scale, the challenges of converting existing content versus writing topic-based content in DITA from scratch, and the always popular question of just how hard it is to teach yourself DITA.

Another great outcome was the audience’s robust list of recommended resources, both print and online. We’ll be adding them to the Resources section of this site soon, so stay tuned.

A big thank you to Mike Jang of WritetheDocs Meetup PDX for the opportunity to join forces, and to Melanie and Leona for a wonderful presentation. We’re looking forward to hearing more from our attendees, especially those who are starting up their DITA pilots soon.



Upcoming Holiday Potluck: WritetheDocs PDX Joins Us!

Our annual Holiday Potluck and final meeting of the year is coming up on December 3rd. This year we’re featuring a special presentation aimed at new and prospective DITA users, which will also be a joint gathering with members of the WritetheDocs PDX Meetup Group. (We Jivers had a great time attending the WritetheDocs conference this past year, but DITA was not talked about in the presentations. That made us wonder why, and this event is one of the results.)

If you’ve been waiting to show up because you feel like you don’t know where to start learning about DITA, this would be a great time to dip your toes in the water. If you’re a grizzled veteran of DITA and/or PDX DITA, we want to see you too! We’d love to hear about your year in DITA and talk about what you’re doing next.

After last year’s potluck we feel confident in promising a delicious holiday feast. (Please bring a dish! We’re two blocks from Whole Foods and the carts, if you need inspiration.) Then Jive’s Melanie Jennings and Leona Campbell will present a brief introduction to writing in DITA XML, followed by Q&A and general conversation. As usual, we’ll provide beer, questionable witticisms, and stunning views of the Portland skyline.

Looking forward to seeing you!


Integrating the Oxygen WebHelp Plugin into the DITA Open Toolkit

I led a project earlier this year to convert my organization’s legacy Eclipse Help Center builds to HTML5 builds using Oxygen’s new WebHelp plugin for the DITA Open Toolkit. By changing the build output, we achieved the following:

  • Completely eliminated the costly Eclipse server maintenance
  • Improved our analytics results and potentially our SEO
  • Brought our output into HTML5 compliance

Unfortunately, actually implementing the plugin wasn’t just a matter of plopping the plugin into the Plugins directory. I had to upgrade the toolkit, rework scripts, and restructure directories to create a less-confusing, easily upgradeable build. Read on to understand my choices and learn from my experience.

Don’t look here to find a step-by-step procedure for implementing the Oxygen WebHelp plugin. Instead, read this blog post if you want help preparing for this process.

Upgraded the Toolkit

First, I upgraded our toolkit, which I was happy to do because it’s good practice, as well as nice to take advantage of the latest that DITA has to offer. Unfortunately, I couldn’t upgrade it too far, because the plugin only supports DITA-OT 1.7.5, which is still better than the 1.5.4 version we’d been using.

This time, I customized using the Customization directory only. My first experience with DITA was a highly customized DITA OT 4.2.1 that still makes me twitch when I look at it. It worked fine and it had some decent customizations, but upgrading a hacked-up toolkit is about as simple as string theory.

Reworked the Scripts

I reworked the .sh script and our Ant scripts. The WebHelp plugin includes a dita.sh file that you need to edit to set environment variables for the build. This replaces the toolkit’s out-of-the-box startcmd.sh, and it was a little tricky to figure out because I was retrofitting it to an existing build layout.

The instructions from Oxygen tell you how to use the dita.sh file but don’t mention Ant builds, which is how we build here (via Jenkins). Oxygen’s support team was also less familiar with this approach. I didn’t want to rely on dita.sh alone because I didn’t want every writer to have to set TRANSTYPE, DITAVAL_FILE, or DITA_DIR as environment variables every time she ran it from the command line.

For example, here’s what it would look like if the writer had to set all of the variables from the command line each time she ran a build:

"/usr/bin/java" -Xmx512m \
  -classpath "docs/DITA-OT1.7.5LBC/tools/ant/lib/ant-launcher.jar" \
  "-Dant.home=docs/DITA-OT1.7.5/tools/ant" \
  org.apache.tools.ant.launch.Launcher \
  -lib "docs/DITA-OT1.7.5LBC/" \
  -lib "docs/DITA-OT1.7.5LBC/lib" \
  -lib "docs/DITA-OT1.7.5/lib/saxonb9-1-0-8j/saxon9.jar" \
  -lib "docs/DITA-OT1.7.5LBC/lib/saxonb9-1-0-8j/saxon9-dom.jar" \
  -lib "docs/DITA-OT1.7.5/plugins/com.oxygenxml.webhelp/lib/license.jar" \
  -lib "docs/DITA-OT1.7.5LBC/plugins/com.oxygenxml.webhelp/lib/log4j.jar" \
  -lib "docs/DITA-OT1.7.5/plugins/com.oxygenxml.webhelp/lib/resolver.jar" \
  -lib "docs/DITA-OT1.7.5/plugins/com.oxygenxml.webhelp/lib/ant-contrib-1.0b3.jar" \
  -lib "docs/DITA-OT1.7.5LBC/plugins/com.oxygenxml.webhelp/lib/lucene-analyzers-common-4.0.0.jar" \
  -lib "docs/DITA-OT1.7.5/plugins/com.oxygenxml.webhelp/lib/lucene-core-4.0.0.jar" \
  -lib "docs/DITA-OT1.7.5/plugins/com.oxygenxml.webhelp/lib/xhtml-indexer.jar" \
  -f "docs/DITA-OT1.7.5/build.xml" \
  "-Dtranstype=webhelp" \
  "-Dbasedir=docs/sbs/7_0/trunk/src/dita/" \
  "-Doutput.dir=docs/sbs/7_0/trunk/src/dita/out/webhelp" \
  "-Ddita.temp.dir=docs/sbs/7_0/trunk/src/dita/temp/webhelp" \
  "-Dargs.filter=docs/filters/on_prem_sys_admin_7_0.ditaval" \
  "-Ddita.input.valfile=docs/filters/on_prem_sys_admin_7_0.ditaval" \
  "-Dargs.hide.parent.link=no" \
  "-Ddita.dir=docs/DITA-OT1.7.5" \
  "-Dargs.xhtml.classattr=yes" \
  "-Dargs.input=docs

Instead, I was able to revise the script so the build command now looks like this:

ant -Dargs.filter=docs/filters/on_prem_sys_admin_7_0.ditaval

My Goals for Reworking the Scripts

  • Simplify the build script so that each user can build from their own environment
  • Automate the build
  • Produce PDFs along with HTML5 output
  • Make it possible to build using numerous maps, filters, and help sets

My Solutions for the Scripts

To reach these goals, I set the variables at build time using a series of build files that are triggered by one simple command-line call, which passes only the DITAVAL filter. Any writer, or build automation tool, can now use this command because the build scripts no longer depend on absolute paths to the DITA-OT or to an SVN repo.

On the command line or in Jenkins, any writer can run the new command in any help set directory, and the local build files reach out to a shared build file for all help sets. I put all of the targets in that one shared file, single-sourcing it like a good writer would.
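The author did this with Ant build files; the Python sketch below only illustrates the same idea in a form that’s easy to test. Every property is derived from the help-set directory, and the caller supplies just the DITAVAL filter. The property names (transtype, args.input, args.filter, output.dir, dita.temp.dir, dita.dir) come from the long command above, while the directory layout, the main.ditamap name, and the function names are hypothetical.

```python
from pathlib import Path


def build_properties(helpset: Path, ditaval: Path, dita_dir: Path,
                     transtype: str = "webhelp") -> dict:
    """Derive DITA-OT Ant properties from a help-set directory.

    Hypothetical layout: <helpset>/dita/main.ditamap for source, and
    <helpset>/output/<transtype> and <helpset>/temp/<transtype> for results.
    """
    return {
        "transtype": transtype,
        "args.input": str(helpset / "dita" / "main.ditamap"),
        "args.filter": str(ditaval),
        "output.dir": str(helpset / "output" / transtype),
        "dita.temp.dir": str(helpset / "temp" / transtype),
        "dita.dir": str(dita_dir),
    }


def ant_command(build_file: Path, props: dict) -> list:
    """Turn the properties into an `ant -f <file> -Dname=value ...` argument list."""
    return ["ant", "-f", str(build_file)] + [f"-D{k}={v}" for k, v in props.items()]


props = build_properties(Path("productA/version1"),
                         Path("docs/filters/on_prem_sys_admin_7_0.ditaval"),
                         Path("DITA-OT1.7.5"))
cmd = ant_command(Path("DITA-OT1.7.5/build.xml"), props)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Because nothing in the function is absolute, the same call works from any writer’s checkout or from a Jenkins workspace.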

I also wanted to preserve a previously used script in our new build. It starts with a simple Ant target that calls some .jars for creating PDFs. This target and its supporting .jars comb the main ditamap for submaps and create a PDF for each submap. This chapter PDF output has been popular with our users, so I made sure we kept it.
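The post doesn’t show the author’s PDF target, which relies on its own .jars. As a rough, hypothetical equivalent, this Python sketch scans a main map for submaps (any <mapref>, or any element with format="ditamap" and an href) and plans one PDF output per submap. The output naming and function names are assumptions.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def find_submaps(map_xml: str) -> list:
    """Return submap hrefs in document order."""
    hrefs = []
    for elem in ET.fromstring(map_xml).iter():
        href = elem.get("href")
        # Without DTD processing, mapref's default format="ditamap" isn't applied,
        # so any <mapref> is treated as a submap too.
        if href and (elem.tag == "mapref" or elem.get("format") == "ditamap"):
            hrefs.append(href)
    return hrefs


def pdf_jobs(map_xml: str, out_dir: str = "output/pdf") -> list:
    """Pair each submap with a hypothetical per-chapter PDF output path."""
    return [
        (href, str(PurePosixPath(out_dir) / (PurePosixPath(href).stem + ".pdf")))
        for href in find_submaps(map_xml)
    ]


main_map = """<map>
  <title>Admin Guide</title>
  <mapref href="install/install.ditamap"/>
  <topicref href="overview.dita"/>
  <topicref href="users/users.ditamap" format="ditamap"/>
</map>"""

for submap, pdf in pdf_jobs(main_map):
    print(f"{submap} -> {pdf}")
# install/install.ditamap -> output/pdf/install.pdf
# users/users.ditamap -> output/pdf/users.pdf
```

Each pair could then drive a separate DITA-OT run with transtype=pdf.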

Restructured the Directories

Because I was rewriting build files and creating new Jenkins scripts anyway, I figured it was a good time to flatten our directory structure a bit. I removed some unused directories, including old build cruft and unnecessary branches.

Our former structure with extra layers:

  • productA/feature/ (used for Eclipse output)
  • productA/version1/trunk/src/dita/
  • productA/version1/trunk/src/build/
  • productA/version1/trunk/output/
  • productA/version1/trunk/src/temp/

Here’s our new simpler way:

  • productA/version1/dita/
  • productA/version1/build/
  • productA/version1/output/
  • productA/version1/temp/

A next step is to research best practices for directory structures and DITA to further simplify our structure, but I’m happy to work in this slightly less cluttered place.
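If you’re planning a similar cleanup, a small mapping function makes the move predictable. This Python sketch, which is hypothetical rather than the author’s script, translates paths from the old layout to the new one and returns None for locations that have no counterpart, such as the Eclipse productA/feature/ directory. You could feed the resulting pairs to svn move, but review them before moving anything.

```python
from pathlib import PurePosixPath
from typing import Optional

# Old sub-path (after product/version) -> new sub-path, per the two layouts above.
RENAMES = {
    ("trunk", "src", "dita"): ("dita",),
    ("trunk", "src", "build"): ("build",),
    ("trunk", "output"): ("output",),
    ("trunk", "src", "temp"): ("temp",),
}


def new_location(old: str) -> Optional[str]:
    """Map an old-layout path to the flattened layout, or None if it has no home."""
    parts = PurePosixPath(old).parts
    if len(parts) < 2:
        return None
    product, version, rest = parts[0], parts[1], parts[2:]
    for old_prefix, new_prefix in RENAMES.items():
        if rest[:len(old_prefix)] == old_prefix:
            return str(PurePosixPath(product, version, *new_prefix, *rest[len(old_prefix):]))
    return None


print(new_location("productA/version1/trunk/src/dita/install.dita"))
# productA/version1/dita/install.dita
print(new_location("productA/feature/plugin.xml"))  # None
```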

Customized WebHelp CSS and JavaScript

Knowing the hell of a customized toolkit made me extra cautious about customizing the WebHelp plugin. Although the plugin offers a lot, it does not provide a Customization directory. To change the look and feel of our HTML5 output, we had to edit the CSS and some JavaScript directly. This means that when (or if) we upgrade, it will take some work to transfer the changes to the new plugin. I’ve already seen a major reorganization of files and directories between the pre-release and January versions of the same release.

Out of the box, Oxygen WebHelp plugin output looks pretty good:

Oxygen OOB Output

But I wanted to change our help to match our corporate style:
Jive Help Screen

Things to Ponder

  • Customizing the plugin might cause future headaches, but who wants OOTB?
  • Speaking of customizations: If you want to change the banner/header, brace yourself!
  • You will become an expert if you customize this plugin.
  • Upgrading even a non-customized toolkit is not always simple.
  • Oxygen Support is in Romania — an entire workday of waiting for responses is frustrating. But they are helpful and thorough, so ask them lots of questions.
  • Read the Oxygen Support forums (http://www.oxygenxml.com/forum/). There are some nuggets out there.
  • Hold out for the most stable version of the plugin, no matter how excited you are about it.
  • The Search and CSS are way better than in any other product we evaluated, but still clunky.
  • My last gripe about customizing this plugin is the utter lack of documentation. The paragraph on customizing the CSS cracked me up because it makes it sound like you can use any ol’ CSS file, but you still need to target the HTML elements that the plugin uses, such as five nested UL elements!

Was It Worth It?

Hell, yeah! I’m known for jumping head first into complicated projects, and this was no exception. (Someday, I’ll blog about improving PDF output using XSLT.) But, should YOU use this plugin? The answer is yes if you:

  • Want HTML5 output for an okay price.
  • Like the out-of-the-box look, or you want to learn (or already know) JavaScript and CSS.
  • Are comfortable using Support forums or Technical Support for help.

Have any of you had to implement the Oxygen WebHelp plugin? If yes, I’d love to hear your challenges and how you overcame them.