Understanding Filtering

It can be difficult to understand the power of filtering if you’ve never done it before. In this post, I’ll explain filtering for beginners; in a later post, I plan to cover how to actually get started with it.


Many times when you’re writing and publishing large amounts of content as a technical, sales, or marketing writer, you need the same snippet or large section of content in several different publications. For example, you use the same Terms & Conditions or Executive Statement in all of your publications (your company website, your brochures, your online help center, and so on).


Rather than writing the same content over and over, or keeping a bunch of slightly different versions somewhere on your network, filtering allows you to easily reuse content in many different contexts. Write it once, use it often. How we do this is called “filtering.” You might also hear it called “single-sourcing” or “conditional text” or some other variation on those themes. Same thing. It means that you, the author, manually “tag” the content for the different versions that you publish. You maintain one version but publish many versions.

To my way of thinking, filtering is really a methodology or way of approaching content and authoring content. Once you grasp the concept of filtering, you can accomplish the actual tasks in many different tools: any number of XML editors using DITA, FrameMaker, or even Microsoft Word.

Generally speaking, filtering is easy to implement. But, we’ll explore more about that in another post. For now, an example!



Let’s say you make a robot product with the following features:

  • Pushes you out of bed
  • Makes your coffee
  • Turns on your computer
  • Checks your email
  • Asks Google what you should do today
  • Makes you a breakfast burrito
  • Drives the kids to school
  • Makes lunch
  • Makes dinner
  • Tells you a bedtime story

And you sell the robot in the following models:

  • The 24/7 Robot
    • Includes all features except Tells you a bedtime story
  • The Morning Robot
    • Pushes you out of bed
    • Makes your coffee
    • Turns on your computer
    • Checks your email
    • Asks Google what you should do today
    • Makes you a breakfast burrito
    • Drives the kids to school
    • Makes lunch
  • The Mid-day Robot
    • Makes your coffee
    • Checks your email
    • Makes lunch
  • The Night Owl Plus Robot
    • Makes lunch
    • Makes dinner
    • Tells you a bedtime story
    • Pushes you out of bed

You need to create long, complex sales proposals, web content, and user guides for all of these versions. Wowsa. You’ll make that happen with content tagging. Read on.


Tagging is the magic behind filtering. Using our robot example, let’s say you have an introductory section that lists all the features of your robot. Rather than maintain four different versions of the introductory section (24/7, Morning, Mid-day, and Night Owl Plus), you would have ONE version in which the content is tagged for the different models. Like I said, magic.

Once again using our robot example, you would tag your introductory section as follows:

Our fantastic robot will revolutionize your entire life. Here's a list 
of its features: [this content not tagged because you want this text to show 
up in all versions]
  • Pushes you out of bed [tagged 24/7, Morning, and Night Owl Plus]
  • Makes your coffee [tagged 24/7, Morning, and Mid-day]
  • Turns on your computer [tagged 24/7 and Morning]
  • Checks your email [tagged 24/7, Morning, and Mid-day]
  • Asks Google what you should do today [tagged 24/7 and Morning]
  • Makes you a breakfast burrito [tagged 24/7 and Morning]
  • Drives the kids to school [tagged 24/7 and Morning]
  • Makes lunch [not tagged because you want this
    text to show up in all versions]
  • Makes dinner [tagged 24/7 and Night Owl Plus]
  • Tells you a bedtime story [tagged Night Owl Plus]
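
In a DITA-based toolchain, this kind of tagging is done with profiling attributes such as product or audience. As a sketch only (the short model names model247, morning, midday, and nightowl are invented for this example), the tagged feature list might look like this:

```xml
<!-- Hypothetical DITA topic fragment. The product values are invented
     short names for the four robot models. -->
<ul>
  <li product="model247 morning nightowl">Pushes you out of bed</li>
  <li product="model247 morning midday">Makes your coffee</li>
  <li product="model247 morning">Turns on your computer</li>
  <li product="model247 morning midday">Checks your email</li>
  <li product="model247 morning">Asks Google what you should do today</li>
  <li product="model247 morning">Makes you a breakfast burrito</li>
  <li product="model247 morning">Drives the kids to school</li>
  <!-- untagged: appears in every published version -->
  <li>Makes lunch</li>
  <li product="model247 nightowl">Makes dinner</li>
  <li product="nightowl">Tells you a bedtime story</li>
</ul>
```

An element with no profiling attribute, like Makes lunch here, survives every filter, which is exactly the “show up in all versions” behavior described above.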

After you’ve tagged all the content, you push a button in whatever tool you use to publish (again, this could be an XML editor, FrameMaker, Word, etc.), and that tool spits out the version you want to see. More on that in another post.
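
In DITA specifically, that “button push” applies a filter file (a .ditaval) telling the publishing tool which attribute values to include or exclude. Here is a minimal sketch for publishing the Mid-day model, again assuming the invented product values model247, morning, midday, and nightowl:

```xml
<!-- midday.ditaval: hypothetical filter file for the Mid-day Robot -->
<val>
  <!-- Exclude content tagged only for the other models -->
  <prop att="product" val="model247" action="exclude"/>
  <prop att="product" val="morning" action="exclude"/>
  <prop att="product" val="nightowl" action="exclude"/>
  <!-- Keep Mid-day content; untagged content is always kept -->
  <prop att="product" val="midday" action="include"/>
</val>
```

With the DITA Open Toolkit, you would then run something along the lines of dita -i robot.ditamap -f html5 --filter=midday.ditaval (the map name is hypothetical). Note that an element tagged with several values, such as product="model247 morning midday", is kept as long as at least one of its values is included.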

Version Matrix

I’ve found it helpful to maintain a version matrix for my sanity. The matrix can also be helpful as a reference for other team members. Continuing with our example, the robot’s version matrix would look like this:

Feature                                24/7   Morning   Mid-day   Night Owl Plus
Pushes you out of bed                   ✓        ✓                      ✓
Makes your coffee                       ✓        ✓          ✓
Turns on your computer                  ✓        ✓
Asks Google what you should do today    ✓        ✓
Checks your email                       ✓        ✓          ✓
Makes you a breakfast burrito           ✓        ✓
Drives the kids to school               ✓        ✓
Makes lunch                             ✓        ✓          ✓           ✓
Makes dinner                            ✓                               ✓
Tells you a bedtime story                                               ✓


There are many benefits of filtering, but off the top of my head, here are some:

  • Produces reliable content every time. No more “Is this the version I put that improvement in a few weeks ago?”
  • Optimizes your content library. No more maintenance of ten slightly different versions.
  • You can show or hide whole topics, paragraphs, sentences, or even a single letter (by tagging it).
  • You can produce many individual publications from one set of content files.
  • Bottom line is actually the bottom line: it saves time/money.

To be clear, filtering is only warranted in an environment in which you are already keeping multiple versions of the same content and using them in a variety of publications.

Start Slowly

As I mentioned earlier, filtering requires a mindset change. It takes some time to fully grasp and implement filtering. Start with one small project. In my next post, I’ll explain how to do that.


Are there posts out there that do a better job of explaining filtering? Please let me know in the comments. Also, any other comments and/or corrections are very welcome.

Report on March 12th Meeting

Last night’s meeting was almost entirely devoted to discussing conditional filtering, with a few digressions into conrefs, content management systems, and the local taco carts. Melanie Jennings shared her detailed planning process for tagging and filtering a set of DITA files that currently supports six different filtered outputs for different audiences and platforms, and may need to support many more in the future. We also heard how other PDX DITA members are using conditional filtering in different ways to get the most out of their files, and discussed some of the challenges: ditamap inheritance, enforcing consistent tagging, and, especially, educating all the writers in a group, who need to understand not only the tagging model but also how and to whom the final output will be delivered.

We were excited to have Toni Mantych join us for the first time. Toni teaches DITA over at Portland State and is planning to teach a DITA workshop in May through the local STC chapter. Stay tuned for more news about that workshop in a future post.

If you’d like to attend the next meeting in June, please check back here for the date or contact us to be added to the email list for reminders.

Filtering Extravaganza!

For our next PDX DITA meeting, we are inviting attendees to bring in and show off their filtering madness. Fun, right? If you don’t know what filtering is, this would be a perfect opportunity to pick the brains of local practitioners and see what we’re doing with filters, and how.

We’d like to know:

  • How many topics do you have?
  • How many help sets do you deliver from those topics?
  • How many filter attributes do you use for those help sets?
  • What kinds of validation do you use to prevent logic failures?
  • What would you like to do with filters that you can’t do today? Examples encouraged.

As usual, we’ll be meeting at the lovely downtown offices of Jive Software in Portland. We are looking forward to meeting you and talking shop about DITA!

Opportunity at Jive Software for a Writer with Java/JS Skills

Hi DITA enthusiasts! We have an opportunity working on developer onboarding docs here at Jive Software (the current meeting place of the PDX DITA group): you could be based out of Portland or Palo Alto, reporting to our documentation team but collaborating heavily with our platform API team. If you’re interested, you’re welcome to apply directly via JobVite.

Senior Documentation Engineer at Jive

Anyone who has been to a meeting has probably already raved about our beer, snacks, and view of Powell’s Books! We’re also very modest and charming.

The work for this position will not primarily use DITA, but the rest of the Docs team authors in DITA and you’d almost certainly get the opportunity to use those skills.

A Very DITA Holiday Celebration

Reminding all DITA enthusiasts that PDX DITA’s annual potluck and Year in DITA roundup discussion will take place at Jive Software next Wednesday, December 11th, from 6:30 to 8:00. Make something to share, run by the carts on your way in, or just have a beer and some pretzels on Jive.

We’re hoping to discuss breaking events in DITA world, talk about what we learned at LavaCon, and brag about our conversion to Oxygen Webhelp. What’s up in your world?

Meet the PDX DITA Folks at LavaCon!

The founders of PDX DITA will be presenting at LavaCon! Come meet us in meatspace!

In our presentation, “Socializing Content Creation,” we’ll talk about the ways we collaborate on technical content using all the latest social tools in our roles as technical authors at Jive Software. But more importantly, we’ll talk about how you can use social tools to get your work done. Here are the details:

“Socializing Content Creation”

Tuesday, October 22nd, 1:45-2:45

Search and Social Track

Portland Hilton and Executive Tower
921 Southwest 6th Avenue
Portland, OR 97204



Topic for June Meeting: Video Plugin for the DITA OT

Sean Healy is going to talk to us about a plugin he created:

It’s a small Open Toolkit plugin that lets users review and insert video segments into DITA source. Video can be made granular, just like “chunked” DITA, which is valuable for visual learners. The video reference leverages the xref tag, so the DITA source remains valid wherever xref is valid. The output uses the HTML5 <video>, <audio>, and <canvas> elements, so it requires an HTML5-capable browser. More info is here: http://seanjhealy.com/lab/ovid-user-guide-for-html5-video/
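
As an illustration only (the markup below is a guess based on that description, not the plugin’s documented syntax; see Sean’s user guide for the real thing), reusing xref for a video reference might look something like this:

```xml
<!-- Hypothetical example: an xref pointing at a video segment -->
<p>To see the arm assembly in motion,
  <xref href="arm-assembly.mp4" format="mp4" scope="external">watch
  the video segment</xref>.</p>
```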

Thanks Sean, and we look forward to this discussion!

Call for speakers!

Our PDXDITA group has been active quarterly for over a year now and we’re ready to hear from YOU. Is there something about DITA that gets you excited? Then let’s hear it! We want to continue to learn from each other. Even if you just want to listen, we hope to see you at our June 12th meeting!
If you haven’t been coming to our meetings, but have wanted to, then leave a comment about what’s stopping you.

CIDM CMS/DITA 2013 Conference Report, Part 4

Last day of the conference! Half day only today.

Antenna House makes an XSL-FO processor called XSL Formatter (in DITA we mainly create PDFs with it). One of their guys gave a talk on a new tool they have created that compares PDFs. That doesn’t sound like much, but this tool compares the PDFs pixel by pixel and can find and flag the slightest changes. It turns out you can’t just diff the two FO files used to create the PDFs and expect to catch every difference in the resulting PDFs; there can be many other factors involved. The Adobe Acrobat diff tool won’t do it all either, and it also limits the number of pages you can compare. Apparently the only way to do this before was to visually compare each page of each PDF.

Frank Miller of ComTech gave a talk on using RelaxNG with DITA. RelaxNG is an XML schema language that is equivalent in power to DTDs and W3C Schemas but simpler and easier to use than either, especially in the area of DITA specializations. RelaxNG is so superior that the DITA Technical Committee has decided to use it as the standard schema language for DITA, hopefully starting with the upcoming DITA 1.3. But they needed someone to write a script that converts a RelaxNG schema into equivalent DTDs and W3C Schemas, so I volunteered to do this.
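
To give a flavor of the difference, here is a minimal grammar in RelaxNG’s XML syntax for a made-up two-element vocabulary, with the equivalent DTD shown in a comment. This is purely illustrative and not part of the DITA grammars:

```xml
<!-- DTD equivalent:
     <!ELEMENT note (p+)>
     <!ELEMENT p (#PCDATA)>
-->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <!-- a note holds one or more paragraphs -->
    <element name="note">
      <oneOrMore>
        <element name="p">
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
```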

The final session was a long one, a panel discussion on the DITA OT with several people including Robert Anderson, the main architect of the OT. It was really good and I learned that they need help in many areas, not just programming and bug fixing. Documentation is one thing that is needed. If you’re interested in helping, contact Kristen Eberlein.

And then it was over. I learned a ton and made many excellent contacts. Highly recommended if you’re interested in DITA.

By the way, Rhode Island seems to have its own style of graffiti. It has a lot of nice fast lines densely packed together, and it looks like it requires some skill. I saw a good specimen on the wallpaper in the men’s room in Murphy’s Pub in downtown Providence.

Your guest blogger signing out now, it was lots of fun and thanks for having me!

CIDM CMS/DITA 2013 Conference Report, Part 3

It was the second day of the conference and I was not fully awake yet, but the first talk of my day had the demanding name of “Managing Cross-Publication Links Using Shared Key Definitions.” It was by Eliot Kimber, and he asked us if we were ready for the hardest part of DITA. So I tried to rev up some cognitive horsepower as best I could. Kimber is deeply into this subject, and he showed many slides of all text and no graphics while precisely describing the area, its background, and its problems. It was really quite good. I almost feel like a link expert now.

Casey Jordan, founder of the easyDITA CMS, gave a talk about applying the programming practice called “continuous integration” to documentation. It was very technical and went at a fast pace (these talks last only about 35 minutes with a few minutes of questions, so presenters usually have to hustle). Jordan called it documentation-driven development. It’s an interesting idea and would probably work well if you could implement it properly. Interesting technical details: he uses the open source eXist-db XML database and queries it with XQuery.

(Sorry I am attending so many of these technical geek-out talks, but those are my interests. The conference does have a much wider range of subjects for most levels of interest and technical expertise. Find more info on the conference site. I have no commercial connection to them, by the way.)

Kristof Van Tomme from Belgium gave a nice overview of late-model web development techniques that make it easier to support the profusion of mobile screen sizes some of us deal with today. These techniques are called responsive design and adaptive content, and he talked about how to implement them with the Drupal content management framework. There was some DITA in there too, because Kristof continues to do work on integrating DITA with Drupal.

I attended a talk called “Global DITA” by people who work for a very large agricultural equipment company in the Midwest called AGRO. They have a huge DITA installation running on SDL Trisoft and they went over how they implemented it, including how they are cleaning up their considerable translation/localization requirements. They ran into friction from writers who were not happy with the changes required of them. They managed to handle the friction nicely by enlisting one of the friction-makers to help lead the effort to change to DITA. I liked hearing the information architect from this farming equipment company talk, with a straight face, about how her company’s information was in “silos.”

I listened to a group from a very big payroll processing company go into the details of how they saved tons of money and increased their translation quality by taking the chaos out of how their company did translations. Jargon Watch: Apparently their term for “paychecks paid to people” is “pays”, as in “We do 4 million pays annually.” They don’t seem to have employees, but they do have 57,000 “associates.” They are also hip to this late-model use of the word “spend”: “We decreased the spend for translations by 40%.”

This translation talk was interrupted by someone who came in and said we had to move to another part of the building because of a suspicious package found downstairs. We stopped everything and quickly moved out. It turned out to be a false alarm and we got back to our talk about 10 minutes later. People are understandably spooked about recent sad events in Boston, and Providence is about 50 miles from there.

Later I saw a talk on efforts to standardize change markup in DITA (and XML in general). And then a talk on ways to sell a marketing department on using DITA. Separating designers from their InDesign is sure to be an adventure.

End of conference day, nothing else on the agenda. I was lucky enough to be invited to a very nice dinner with drinks at a restaurant by a generous CCMS vendor.

Last day of the conference tomorrow!

Technical communication to the rescue in my hotel room.