Web Index Preparation with HTML/Prep™
by David K. Ream
©2001, Leverage Technologies, Inc.
Indexers are often called upon to create indexes that will be displayed on the World Wide Web. These requests might include:
- mounting a print index, without links to the indexed material, for marketing or reference purposes;
- duplicating or cumulating print indexes, with links to the indexed material on the web site;
- indexing a single web site or portions thereof; or,
- indexing multiple web sites in specific domains of knowledge.
Producing such web-mounted indexes requires some knowledge of HTML, the markup language used to build web pages. Various HTML editors and web site building applications are available to ease the tedium of inserting the detailed tagging required for HTML pages. However, these programs offer no more help with building an index than Microsoft Word does.
Dedicated indexing software, such as CINDEX, Macrex, and Sky Index, provides all the necessary tools for index creation. It also provides features for tagging the index data for typesetting or for basic HTML usage. However, there are other aspects of creating a good web index that this software does not address.
The HTML/Prep software takes a minimally tagged index file, and, operating with specified options, creates HTML pages which fulfill more completely the requirements for a web index. The index file can be created in dedicated index software and then output as a tagged file from that software, or the tagged index file can be created in a word processor or via a database system.
While this article will not discuss HTML/Prep's specific yet simple tagging scheme, it will focus on some issues relating to heading style, cross-references, and locators as well as the capabilities HTML/Prep provides beyond the tagging necessary for displaying indexes on the web. This article also will not address how to create approach screens (the on-line equivalent to head or introductory notes).
Many indexes, particularly back of the book or journal indexes, are presented in a run-in rather than indented style. The run-in style saves space, which matters in print because of printing and distribution costs, but space is not a factor in web indexes. Indexes on the web should be presented in an indented style, since run-ins are much more difficult for users to follow on the screen. Indentation, though, is problematic in HTML, which lacks tags specifically designed to handle multiple levels of indentation. Nor does HTML have a way of creating the hanging outdents that normally appear in the indented style. HTML/Prep offers several indentation structures for displaying the index that cannot be output as simple tags from other software. It also allows specification of custom tagging schemes.
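As an illustration of the indentation problem, here is a minimal sketch of one way to get multiple indent levels with a hanging outdent. It is not HTML/Prep's own output: the function name render_entry, the use of inline style attributes, and the 1.5 em step are all assumptions made for the example.

```python
from html import escape


def render_entry(text: str, level: int, indent_em: float = 1.5) -> str:
    """Render one index line (level 0 = main heading) with a hanging indent.

    Hypothetical helper: padding-left shifts the whole entry to the right,
    and a negative text-indent pulls only the first line back, so wrapped
    lines hang under the start of the entry.
    """
    left = (level + 1) * indent_em
    style = f"padding-left:{left:g}em;text-indent:-{indent_em:g}em"
    return f'<div style="{style}">{escape(text)}</div>'


if __name__ == "__main__":
    print(render_entry("Horses", 0))
    print(render_entry("training", 1))
    print(render_entry("dressage", 2))
```

Each heading level simply adds one more step of left padding, so the same function serves main headings, subheadings, and sub-subheadings.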
Similarly, cross-references, or strings of them, are often attached to headings and wrapped onto one or more lines. In some styles, they appear at the end of the subheading display. For a web index, it is recommended that each cross-reference start on its own line and that cross-references appear ahead of the first subheading. Since users see only a small portion of the index on screen compared to a two-page spread in print, it is more helpful to make the lines of the index easy to read and to display alternative terminology or subject headings earlier rather than later.
In a web index, of course, it is preferable for the cross-references to be links. HTML/Prep attempts to link main heading cross-references by matching the text of the cross-reference target to the text of a main heading in the index. The ability to jump around in the index via these links is readily understood by users.
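The matching idea can be sketched roughly as follows. This is only an illustration of text matching in general, not HTML/Prep's actual algorithm; the function names, the case-insensitive comparison, and the anchor naming rule are assumptions.

```python
import re
from html import escape


def anchor_id(heading: str) -> str:
    """Derive an anchor name from heading text (hypothetical naming rule)."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def link_cross_reference(target: str, main_headings: set[str]) -> str:
    """Return the cross-reference target as a link if a main heading matches.

    Matching is case-insensitive on the full heading text. When nothing
    matches, the target is returned as plain text rather than a dead link.
    """
    wanted = target.strip().lower()
    for heading in main_headings:
        if heading.strip().lower() == wanted:
            return f'<a href="#{anchor_id(heading)}">{escape(target)}</a>'
    return escape(target)


if __name__ == "__main__":
    headings = {"Felines", "Dogs", "Horses, wild"}
    print("See " + link_cross_reference("felines", headings))
    print("See " + link_cross_reference("Birds", headings))
```

For the links to work, each main heading would also have to be written out with the matching anchor name, which is not shown here.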
There are various types of locators possible in indexes, but they mostly fall into two major categories:
- page numbers which are related to the physical page rather than the specific text that was indexed (these may also be prefixed by another number such as chapter or volume); or,
- a citation that denotes the specifically indexed text (for example, a document number, section number, or paragraph number).
HTML/Prep can produce web index pages with unlinked page numbers if the index is to be mounted but not actually linked to the indexed material. However, linking to the material from an index with indeterminate locators (page numbers) presents problems for the indexer. Usually some (arbitrary) numbering scheme is developed for the text, which the indexer has to use as the link values in place of page numbers in the index data. In some instances (for example, journals), index entries may need to link to the beginning of the articles if the web site design doesn't provide a means for links to sections or paragraphs within the articles.
For citations with a logical relation to the material, the indexer needs to know how to represent these citations as a link value. Usually the print/display form of the citation becomes the text for the displayed link in the web index. For example, a section number might appear in the print index as 1301.01. Its web link value might be c1301s01. What should display in the web index is 1301.01 but it must link using c1301s01 to display the text of the section.
While HTML/Prep cannot assist directly with the transformation of the locators into link values since that is dependent on the type of locator and the design of each web site, HTML/Prep does simplify the indexer's work by supplying the appropriate link tagging. The indexer need only use the link value as the locator. This drastically reduces the amount of typing of (and typos in) HTML tagging that has to be done in the index file.
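To make the section number example concrete, here is a small sketch of such a transformation. The c1301s01 pattern comes from the example above; everything else (the function name, the file name sections.html, and linking by fragment) is assumed for illustration, since the real rule depends on each web site's design.

```python
import re
from html import escape


def section_link(citation: str, base_url: str = "sections.html") -> str:
    """Turn a citation such as '1301.01' into a link displaying that citation.

    Assumed scheme: chapter and section numbers are joined as c<chapter>s<section>
    and used as a fragment on a hypothetical page, base_url.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)", citation)
    if match is None:
        raise ValueError(f"unrecognized citation: {citation!r}")
    chapter, section = match.groups()
    link_value = f"c{chapter}s{section}"
    return f'<a href="{base_url}#{link_value}">{escape(citation)}</a>'


if __name__ == "__main__":
    print(section_link("1301.01"))
```

The displayed text stays 1301.01 while the link itself carries c1301s01, which is the separation between display form and link value described above.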
The other important issue relating to linked locators is that of displaying multiple locators attached to the same heading. The question is how to differentiate the locators, especially when there is a long list. Differentiation helps users in two ways: it aids them in remembering which links they have already taken, and it gives them context, possibly letting them predetermine whether a link might lead to the desired material.
For page numbers, which are meaningless as links, some arbitrary link tokens could be presented to users. These link tokens could be a sequential list of numbers, a graphic image, or a special character. It is much more helpful if text can be used for these displayed links to give users some sense of what is being linked to. Of course, this means more work for the indexer, who must research what material was on the page being referred to and what short word or phrase might denote it. If possible, the best solution is to re-index each of the multiple locators to a separate heading level so that each entry has only one locator attached. For instance, in an index to a journal, the titles of the articles could be added as that extra level.
For other types of locators, additional information may be helpful to show as the link text: chapter numbers, section numbers, or volume/issue numbers. When the locators relate to the structure or nature of the material being linked to, users may be better able to determine the most appropriate link to take.
Some examples of multiple locator display styles:
Cats, ¶, ¶, ¶, ¶
Dogs, 1, 2, 3, 4
Fish, 4:3, 6:15, 11:8
Horses, jumping, showing, training
While HTML/Prep cannot solve the differentiation problem, it does have several options for displaying styles of links. The link text can be: the lowest heading level, extra text supplied with each locator, HTML coding displaying text or graphics, or a list of sequential numbers.
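Two of these display styles, sequential numbers and text supplied with each locator, might be produced along the following lines. This is a sketch only; the function name, the (url, label) pair format, and the file names are invented for the example and do not reflect HTML/Prep's input format.

```python
from html import escape


def render_locators(heading: str, locators: list[tuple[str, str]],
                    style: str = "numbers") -> str:
    """Render a heading followed by its linked locators.

    locators is a list of (url, label) pairs. With style="numbers", or when
    a label is empty, the link text is the locator's sequence number;
    with style="text" the supplied label is shown instead.
    """
    links = []
    for n, (url, label) in enumerate(locators, start=1):
        text = str(n) if style == "numbers" or not label else label
        links.append(f'<a href="{escape(url)}">{escape(text)}</a>')
    return f"{escape(heading)}, " + ", ".join(links)


if __name__ == "__main__":
    print(render_locators("Dogs", [("a.html", ""), ("b.html", "")]))
    print(render_locators(
        "Horses",
        [("j.html", "jumping"), ("s.html", "showing")],
        style="text",
    ))
```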
Lastly, for a pure web index which links to multiple web sites, it behooves the indexer to completely understand the structure of URLs since these are what will be entered as the locators. These are the addresses of web sites, pages on sites, and even specific locations within those pages.
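As a quick illustration of those three levels of address, the sketch below splits a URL into its parts with Python's standard urllib.parse module. The URL itself is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical locator: a site, a page on that site, and a location in the page.
parts = urlsplit("http://www.example.com/journal/vol12/article3.html#sec2")

print(parts.scheme)    # http
print(parts.netloc)    # www.example.com (the web site)
print(parts.path)      # /journal/vol12/article3.html (the page on the site)
print(parts.fragment)  # sec2 (the specific location within the page)
```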
Index Pages and Navigation
Having dealt with some of the details, I’ll now discuss some broader concerns. How big is the index? While space is not typically a storage concern on the web, the time to download a page is. If the index is small enough, it can be placed in a single HTML page. This allows users to start browsing the beginning of the index while the rest downloads. A simple search tool that works well for a single page index is the browser's Find capability (usually invoked with the keyboard shortcut Control-F).
When the index is too large to be easily downloaded, it should be broken apart into separate pages for each letter grouping. HTML/Prep provides options to do this automatically. It also prepares letter lists (i.e., A to Z letter links) for use on the web site. Additionally, it creates a single page, which is a list of the main headings and cross-references from them. If used on the web site, this allows users to browse the main headings only. Each main heading displayed on this page is a link to the main heading in the actual index page.
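The letter grouping and the letter list can be pictured with a short sketch like the one below. It is not how HTML/Prep itself works internally; the function names and the index_<letter>.html file naming are assumptions, and headings are assumed to be non-empty.

```python
from collections import defaultdict


def group_by_letter(main_headings: list[str]) -> dict[str, list[str]]:
    """Group main headings by first letter; non-letters go under 'other'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for heading in sorted(main_headings, key=str.lower):
        first = heading[0].upper()
        groups[first if first.isalpha() else "other"].append(heading)
    return dict(groups)


def letter_list(groups: dict[str, list[str]],
                page_pattern: str = "index_{}.html") -> str:
    """Build the row of letter links, one per page that actually exists."""
    return " ".join(
        f'<a href="{page_pattern.format(key.lower())}">{key}</a>'
        for key in sorted(groups)
    )


if __name__ == "__main__":
    groups = group_by_letter(["Dogs", "cats", "Deer", "Fish"])
    print(groups)
    print(letter_list(groups))
```

Only letters that have entries get a link, so users are never sent to an empty page.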
Another issue is one of context. There are no page headings, so there are no guidewords or continued lines. If users are viewing a long list of sub-subheadings, there is no way to see what the current heading structure is without scrolling backwards, and then scrolling forward again to the starting point. While various programmatic solutions to this problem can be made available through behind-the-scenes scripting, HTML/Prep provides a straightforward method for context display. It conditions each subheading so that when the mouse hovers over a subheading, a pop-up box displays the current heading structure. Thus users don't have to change position within the index page at all.
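One simple way to get a hover display, shown here only as a sketch, is the standard HTML title attribute, which browsers show as a small pop-up when the mouse rests on an element. Whether HTML/Prep uses this mechanism is not stated here, so the approach, the function name, and the " / " separator are all assumptions.

```python
from html import escape


def subheading_with_context(path: list[str]) -> str:
    """Wrap the lowest heading in a span whose title holds the full heading path.

    path lists the headings from the main heading down to the current one,
    for example ["Horses", "training", "dressage"].
    """
    context = " / ".join(path)
    return f'<span title="{escape(context)}">{escape(path[-1])}</span>'


if __name__ == "__main__":
    print(subheading_with_context(["Horses", "training", "dressage"]))
```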
Web Site Structure
Many web sites have a distinct overall style, for instance, a background color or wallpaper, fonts (sizes, colors, etc.), navigation sidebars, heading areas, and for indexes, letter lists. Some of these web site properties relate to navigation, such as the last three, which involve frames. If the metaphor of a browser window is extended, then frames are the panes inside the window. The site's webmaster will have to assist in specifying how the index will fit into the web site's structure, style, and navigation methods.
HTML/Prep provides methods for defining all these style issues as boilerplate tag sets that are automatically integrated with the index data. If the index itself will be in its own frame, then other options insert the appropriate frame's name into the links that are built.
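In HTML, a link is directed to a particular frame with the target attribute. The sketch below shows the idea; the function name, the frame name "content", and the file name are hypothetical, and the code is an illustration rather than HTML/Prep's own behavior.

```python
from html import escape
from typing import Optional


def framed_link(url: str, text: str, frame_name: Optional[str] = None) -> str:
    """Build a link, adding target="<frame>" when a frame name is given."""
    target = f' target="{escape(frame_name)}"' if frame_name else ""
    return f'<a href="{escape(url)}"{target}>{escape(text)}</a>'


if __name__ == "__main__":
    # The index sits in one frame; the linked text opens in the "content" frame.
    print(framed_link("c1301.html", "1301.01", frame_name="content"))
    print(framed_link("c1301.html", "1301.01"))
```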
This article has only highlighted some of the many design issues related to web indexes and how HTML/Prep can solve them. The advantages to using HTML/Prep include:
- creating the index in software already familiar to the indexer, such as CINDEX
- automating the generation of multiple HTML index pages
- providing options to produce the proper appearance of the index pages
- simplifying the keyboarding required to input links
- linking cross-references automatically
- creating letter lists and a main heading list
LevTech continues to enhance HTML/Prep based on customer feedback and requirements. However, special situations may require unique automation steps. LevTech has also created custom programs, for instance, to assist in validating web URLs or transforming locators into links, so that creating web indexes is as simple as possible.
For complete information on HTML, see the World Wide Web Consortium, the group that creates the standard for HTML tagging.
There are also many self-help books on learning HTML.
The American Society of Indexers publication Beyond Book Indexing (2000), edited by Diane Brenner and Marilyn Rowland, contains several articles that go deeper into web indexing issues.
David K. Ream is LevTech's chief consultant. He has an M.S. degree in Computer Science from Case Western Reserve University. Mr. Ream has spent over 25 years working with publishers in the areas of typesetting design and production, database creation, editorial systems, and electronic publication design and production. LevTech is the corporate/government sales partner for Indexing Research's CINDEX products. LevTech also performs computer consulting and programming for editorial & web applications and batch composition services.