Understanding Filtering

It can be difficult to understand the power of filtering if you’ve never done it before. Here I’ll try to explain filtering for beginners; I plan to write another post about how to actually get started with it.

Problem

Many times when you’re writing and publishing large amounts of content as a technical, sales, or marketing writer, you need the same snippet or large section of content in several different publications. For example, you use the same Terms & Conditions or Executive Statement in all of your publications (your company website, your brochures, your online help center, and so on).

Solution

Rather than making you write content over and over again, or keep it somewhere on your network in a bunch of slightly different versions, filtering lets you easily reuse content in many different contexts (your company website, your brochures, your online help center). Write it once, use it often. This approach is called “filtering”. You might also hear it called “single-sourcing” or “conditional text” or some other variation on those themes. Same thing. It means that you, the author, manually “tag” the content for the different versions that you publish. You maintain one source but publish many versions.

To my way of thinking, filtering is really a methodology or way of approaching content and authoring content. Once you grasp the concept of filtering, you can accomplish the actual tasks in many different tools: any number of XML editors using DITA, FrameMaker, or even Microsoft Word.

Generally speaking, filtering is easy to implement. But we’ll explore more about that in another post. For now, an example!

Example

Let’s say you make a robot product with the following features:

Pushes you out of bed
Makes your coffee
Turns on your computer
Checks your email
Asks Google what you should do today
Makes you a breakfast burrito
Drives the kids to school
Makes lunch
Makes dinner
Tells you a bedtime story

And you sell the robot in the following models. Notice how many features are shared between models:

The 24/7 Robot
- Includes all features except “Tells you a bedtime story”
The Morning Robot
- Pushes you out of bed
- Makes your coffee
- Turns on your computer
- Checks your email
- Asks Google what you should do today
- Makes you a breakfast burrito
- Drives the kids to school
- Makes lunch
The Mid-day Robot
- Makes your coffee
- Checks your email
- Makes lunch
The Night Owl Plus Robot
- Makes lunch
- Makes dinner
- Tells you a bedtime story
- Pushes you out of bed

You need to create long, complex sales proposals, web content, and user guides for all of these versions. Wowsa. You’ll make that happen with content tagging. Read on.

Tagging

Tagging is the magic behind filtering. Using our robot example, let’s say you have an introductory section that lists all the features of your robot. Rather than maintain four different versions of the introductory section (24/7, Morning Person, Mid-day Person, and Night Owl Plus), you would have ONE version in which the content is tagged for the different models. Like I said, magic.

Once again using our robot example, you would tag your introductory section as follows:

Our fantastic robot will revolutionize your entire life. Here's a list 
of its features: [this content not tagged because you want this text to show 
up in all versions]

Pushes you out of bed [tagged 24/7, Morning Person, and 
Night Owl Plus]

Makes your coffee [tagged 24/7, Morning Person, Mid-day Person]

Turns on your computer [tagged 24/7 and Morning Person]

Checks your email [tagged 24/7, Morning Person, Mid-day Person]

Asks Google what you should do today [tagged 24/7 and 
Morning Person]

Makes you a breakfast burrito [tagged 24/7 and Morning Person]

Drives the kids to school [tagged 24/7 and Morning Person]

Makes lunch [not tagged because you want this 
text to show up in all versions]

Makes dinner [tagged 24/7 and Night Owl Plus]

Tells you a bedtime story [tagged Night Owl Plus]

After you have tagged all the content, you push a button in whatever tool you use to publish things (again, could be an XML editor, FrameMaker, Word, etc.), and that tool spits out the version you want to see. More on that in another post.
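
To make the idea concrete, here is a minimal Python sketch of what that button does behind the scenes. It is only an illustration under my own assumptions: the `CONTENT` list and the `publish` function are hypothetical names, not the syntax of any particular tool, and real tools have their own tagging mechanisms. Content with no tags is treated as belonging to every version.

```python
# Each entry is (text, tags). An empty tag set means "not tagged",
# so the text shows up in every version.
CONTENT = [
    ("Our fantastic robot will revolutionize your entire life. "
     "Here's a list of its features:", set()),
    ("Pushes you out of bed", {"24/7", "Morning Person", "Night Owl Plus"}),
    ("Makes your coffee", {"24/7", "Morning Person", "Mid-day Person"}),
    ("Turns on your computer", {"24/7", "Morning Person"}),
    ("Checks your email", {"24/7", "Morning Person", "Mid-day Person"}),
    ("Asks Google what you should do today", {"24/7", "Morning Person"}),
    ("Makes you a breakfast burrito", {"24/7", "Morning Person"}),
    ("Drives the kids to school", {"24/7", "Morning Person"}),
    ("Makes lunch", set()),
    ("Makes dinner", {"24/7", "Night Owl Plus"}),
    ("Tells you a bedtime story", {"Night Owl Plus"}),
]


def publish(content, model):
    """Return the lines that belong in the given model's version."""
    return [text for text, tags in content if not tags or model in tags]


# Print the version for one model.
for line in publish(CONTENT, "Mid-day Person"):
    print(line)
```

Calling `publish` with a different tag name produces a different version from the same single source, which is the whole point: you edit `CONTENT` once and every version picks up the change.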

Version Matrix

I’ve found it helpful to maintain a version matrix for my sanity. The matrix can also be helpful as a reference for other team members. Continuing with our example, the robot’s version matrix would look like this:
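
A matrix like this can be derived mechanically from the tags. Here is a minimal Python sketch that prints one as plain text. The names `MODELS`, `TAGS`, `build_matrix`, and `render_matrix` are my own illustrative choices, and the layout is just one possibility; a spreadsheet works equally well.

```python
MODELS = ["24/7", "Morning Person", "Mid-day Person", "Night Owl Plus"]

# Feature -> tags, copied from the tagged list in the example.
# None means "not tagged", i.e. the feature appears in every version.
TAGS = {
    "Pushes you out of bed": {"24/7", "Morning Person", "Night Owl Plus"},
    "Makes your coffee": {"24/7", "Morning Person", "Mid-day Person"},
    "Turns on your computer": {"24/7", "Morning Person"},
    "Checks your email": {"24/7", "Morning Person", "Mid-day Person"},
    "Asks Google what you should do today": {"24/7", "Morning Person"},
    "Makes you a breakfast burrito": {"24/7", "Morning Person"},
    "Drives the kids to school": {"24/7", "Morning Person"},
    "Makes lunch": None,
    "Makes dinner": {"24/7", "Night Owl Plus"},
    "Tells you a bedtime story": {"Night Owl Plus"},
}


def build_matrix(tags, models):
    """Return {feature: [True/False per model]}, in the order of `models`."""
    return {
        feature: [tagged is None or model in tagged for model in models]
        for feature, tagged in tags.items()
    }


def render_matrix(matrix, models):
    """Return the matrix as plain-text lines: a header, then one row per feature."""
    width = max(len(feature) for feature in matrix)
    lines = [" | ".join(["Feature".ljust(width)] + models)]
    for feature, flags in matrix.items():
        cells = [("x" if flag else "-").center(len(model))
                 for flag, model in zip(flags, models)]
        lines.append(" | ".join([feature.ljust(width)] + cells))
    return lines


for line in render_matrix(build_matrix(TAGS, MODELS), MODELS):
    print(line)
```

Generating the matrix from the same tags you publish from means the matrix cannot drift out of date, which is useful when other team members rely on it.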

Benefits

There are many benefits of filtering, but off the top of my head, here are some:

Produces reliable content every time. No more “Is this the version I put that improvement in a few weeks ago?”
Optimizes your content library. No more maintenance of ten slightly different versions.
Lets you show or hide whole topics, paragraphs, sentences, or even a single letter, just by tagging it.
Produces many individual publications from one set of content files.
The bottom line really is the bottom line: it saves time and money.
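
The point about granularity can be sketched in Python as well. The example below is a simplification under my own assumptions: it borrows the `product` attribute and the `ph` (phrase) element names from DITA, but the sample content, the hyphenated tag values, and the `filter_tree` and `render` functions are hypothetical, and a real publishing tool does this work for you. It tags one whole paragraph and one phrase inside a sentence.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. "product" holds space-separated tag values,
# so the values themselves contain no spaces.
SOURCE = """<topic>
<p>Your robot makes lunch<ph product="24-7 night-owl-plus"> and dinner</ph>.</p>
<p product="night-owl-plus">It also tells you a bedtime story.</p>
</topic>"""


def filter_tree(elem, product):
    """Remove descendants tagged only for other products, keeping nearby text."""
    prev = None  # the last child that was kept
    for child in list(elem):
        values = child.get("product")
        if values is not None and product not in values.split():
            # ElementTree stores the text that follows an element on that
            # element (its "tail"), so move it before removing the element.
            tail = child.tail or ""
            if prev is None:
                elem.text = (elem.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            elem.remove(child)
        else:
            filter_tree(child, product)
            prev = child
    return elem


def render(product):
    """Return the non-empty text lines of the version for one product."""
    root = filter_tree(ET.fromstring(SOURCE), product)
    text = "".join(root.itertext())
    return [line for line in text.splitlines() if line.strip()]


for product in ("mid-day", "24-7", "night-owl-plus"):
    print(product, render(product))
```

Here the phrase “ and dinner” disappears from the middle of a sentence for the mid-day version while the final period stays put, and the bedtime-story paragraph appears only in the Night Owl Plus version.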

To be clear, filtering is only warranted in an environment in which you are already keeping multiple versions of the same content and using them in a variety of publications.

Start Slowly

As I mentioned earlier, filtering requires a mindset change. It takes some time to fully grasp and implement filtering. Start with one small project. In my next post, I’ll explain how to do that.

Feedback?

Are there posts out there that do a better job of explaining filtering? Please let me know in the comments. Also, any other comments and/or corrections are very welcome.

5 thoughts on “Understanding Filtering”

Julio Vazquez on March 26, 2014 at 4:50 am said:

While what you’ve written here is very true, it’s not necessarily the most efficient method of doing things. One of the reasons is that as you add more models, both the filters and the filtering profile become more difficult to manage. (By the way, why wouldn’t the 24/7 robot tell you a bedtime story?) With just the number of variants you have, you run into errors caused by human typing because you can’t use an enumeration list to manage the values.

Could content inclusion be a better solution? In this case probably not, because the variants are very different. If there were more commonality, there might even be cases for reorganizing the content and making use of conref ranges, content push, or content replace to make things a little easier, so that you manage the inclusions through keydefs instead of managing complex profiles.

All that said, this is a great architectural challenge and illustrates the need for planning reuse so that it makes sense and is easy to implement. For more thoughts on filtering and content inclusion, see http://www.writespiritservices.com/blog

andrew clarke on March 26, 2014 at 6:37 am said:

Nice post… I came across this post through a like notification from Julio Vazquez’s LinkedIn account.

I would be interested in seeing in a later post how marketing teams and techcomm teams collaborate in creating content. I have yet to see this being tried in any of the companies that I have worked with.

Keep up the interesting stuff
Andy

Peter Fournier on March 26, 2014 at 8:21 am said:

A couple of improvement suggestions.

1) Avoid spaces and UNIX and Windows reserved characters at all times in your naming conventions: in this case “space” and “/” (well, space isn’t actually reserved, but whatever …).
Best practice is to use “-” (hyphen) instead of space. This gets to be VERY important when maintaining large doc suites with many many topics and you are using GREP search and replace to do batch changes. BTW, your tool may be using GREP without telling you :). As for “24/7” I suppose you could use a hyphen as well.

2) Offer downloadable samples of the files discussed for people to play with?

3) It would be interesting to include a “Daytime Robot” that combines morning and mid-day.

melanie.jennings on March 26, 2014 at 10:29 am said:

Hey all, thanks so much for these awesome comments! Really great to get this feedback. Great ideas here.

And yes, we do have comment moderation set up on this blog, so apologies for the delay in approving those this morning–too busy at work!

Roger Hdley on May 10, 2014 at 5:12 pm said:

To reduce the management difficulty that Julio mentions, you can use a combination of content inclusion and filtering. For example, let’s say robot 2.0 releases with the ability to configure a weekday/weekend variable on each existing feature: on weekdays, check work email; on weekends, check personal email; and so on. So now you would have 4 x 2 variations.

In this scenario I would use content inclusion for the four basic robot variations based on features, and filtering for the weekday/weekend time-based configurability. Now you have only two filtering profiles instead of four, and you’ve distributed the complexity. For new features, use content inclusion. For new time-based configurability (vacation days, seasonal, etc) use filtering.


PDX DITA

A DITA users group in the Portland, Oregon metro area.