Navigating and Searching in Hierarchical Digital Library Catalogs

Robert B. Allen Bellcore, MRE 2A367, 445 South Street, Morristown, NJ, rba@bellcore.com

Abstract

Two interfaces are described for navigating large collections of document and book records. An Online Public Access Catalog interface uses a classification hierarchy to facilitate browsing and searching. The system has been implemented and currently runs with over 50,000 book records. Interface widgets allow the hierarchy to be displayed and traversed easily. For example, the Book Shelf dynamically updates itself to reflect searches and attribute selections. A second interface, not yet fully implemented, allows access to the ACM Computing Reviews classification.

Keywords: Classification, hierarchies, hypertext, interface, OPAC, retrieval, search.

1. Navigation and Searching

Hypertext systems allow a user to browse a highly structured network of links and nodes. Information Retrieval (IR) systems usually return ranked lists of document records according to how well they match a query as determined by a retrieval algorithm. Both approaches have proven effective, but many of the issues in combining them remain to be explored. One domain in which this integration is needed is managing collections of document and book records.

Hierarchies are the primary organizing principle for many book classification systems. A hierarchical Online Public Access Catalog (OPAC) structure provides an a priori similarity space for locating related books. For instance, a user may search to find a shelf area relevant to a query and then check the surrounding books for other relevant items. Organizing books by an a priori similarity may be seen as a weak alternative to the variety of ad hoc organizations made possible by electronic searching. However, a consistent structure reflecting a commonly agreed upon organization of knowledge should help orient the user: it could guide a casual user browsing the book collection, and it could be used to organize search results. Thus, the essential question of this research is whether an a priori structure has advantages, in conjunction with derived similarity spaces, for navigation and retrieval.

While OPACs are widely used, many of these interfaces are designed for ASCII terminals and do not have advantages such as direct manipulation associated with GUIs. Some prototype systems introduce creative interfaces, but they may not scale well [4, 12]. Other OPACs provide extensive term searching but do not take advantage of the hierarchical organization [7]. Cataloging systems [e.g., 11] also provide access to the hierarchical classifications. However, these generally have only simple graphical interfaces and are not documented in the literature.

Interfaces for electronic books have now been widely studied, but relatively little attention has been paid to managing collections of books in these systems. The SuperBook™ browser [5, 6] takes advantage of the hierarchical structure of documents. However, the SuperBook browser itself is not effective for navigating the hierarchical structure of an OPAC: it does not allow fielded search, and it is not designed for presenting short records.

Section 2 describes an OPAC interface that exploits hierarchical organization. Section 3 describes an interface for the ACM Computing Reviews categories.

2. HOPAC Interface

Figure 1 shows an interface that allows interaction with the Dewey hierarchy through fielded search on the records and presentation of records on a dynamic shelf. The interface is composed of three main groups of widgets, which are described in Section 2.2. It has been implemented in the X Window System with Motif widgets.

2.1. Book Records and Classification Hierarchy

The Dewey Classification System is probably the most widely used international classification system. It is also the purest hierarchy of the major library classification systems. While part of the Library of Congress MARC system is hierarchical, the "Cutter" number extensions are orthogonal to the main classifications. The Dewey Classification System was designed for cataloging books [11], but it has also been suggested as the basis for an interface for access by the casual user [10]. With the introduction of high-powered personal workstations and flexible graphical user interfaces, this goal is now much more practical to achieve for the casual user.

Figure 1: Hierarchical OPAC Interface

The headings for a large part of the Dewey Decimal System were obtained and merged with the book records. While the Dewey hierarchy, like any classification system, is not suitable for all tasks, it is useful for a wide range of tasks and is familiar to many users. In preparing the corpus, long call numbers were truncated to 4 decimal places. In a few cases, the hierarchy was incomplete and filler headings were inserted. For instance, in the Classification the node immediately below the first-level node 000.0 Generalities is the third-level node 001.0 Knowledge, with no second-level node between them. A second-level heading, 000.0 General, was therefore created to parallel the other second-level headings under 000.0 Generalities, such as 010.0 Bibliography.
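The two corpus-preparation steps just described, truncating call numbers and inserting filler headings where a level of the hierarchy is missing, can be sketched in Python. This is a minimal illustration, not the procedure actually used: it assumes call numbers arrive as strings such as "001.53425", that the first three levels are read as main class (X00), division (XY0), and section (XYZ), and that headings are stored in a dictionary keyed by (level, class number). All function names are hypothetical.

```python
from __future__ import annotations


def truncate_call_number(call_number: str, places: int = 4) -> str:
    """Truncate (not round) a Dewey call number to at most `places` decimal digits."""
    whole, _, frac = call_number.partition(".")
    return f"{whole}.{frac[:places]}" if frac else whole


def ancestors(call_number: str) -> list[tuple[int, str]]:
    """(level, class) pairs from the main class down to the three-digit section."""
    base = call_number.split(".")[0].zfill(3)
    return [
        (1, base[0] + "00"),  # main class, e.g. 000 Generalities
        (2, base[:2] + "0"),  # division, e.g. 000 (absent from the schedule)
        (3, base),            # section, e.g. 001 Knowledge
    ]


def fill_missing_headings(headings: dict, call_numbers, filler: str = "General") -> dict:
    """Copy `headings`, adding a filler label for any ancestor level that has none."""
    filled = dict(headings)
    for cn in call_numbers:
        for key in ancestors(cn):
            filled.setdefault(key, filler)
    return filled


headings = {(1, "000"): "Generalities", (3, "001"): "Knowledge", (2, "010"): "Bibliography"}
books = [truncate_call_number("001.53425")]
print(books)                                               # ['001.5342']
print(fill_missing_headings(headings, books)[(2, "000")])  # General
```

Keying headings by level as well as number matters because Dewey reuses the same number (000) for the main class and its first division.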

Book and document records numbered by the Dewey Decimal Classification System were obtained from the Bellcore Technical Libraries. They covered approximately 50,000 books and technical reports. Each record included the shelf number, author, title, publisher, location, a subject field, and a list of the library locations where the book was held.

2.2. Interface Widgets

2.2.1. Subject Hierarchy and Current Node Lists

The upper left corner of Figures 1 and 2 shows the Subject Hierarchy and Current Node lists. These allow a user to navigate through the hierarchy and serve a function similar to the expandable Table of Contents (TOC) of the SuperBook browser. In a deep and wide hierarchy, such as the Dewey Classification System, the contents of an expanding TOC would frequently scroll out of view. Although separate Subject Hierarchy and Current Node lists present less information than an expanding TOC, they yield a more predictable display and are especially suitable for the Dewey Classification records, where the shelf number provides an additional pointer into the hierarchy. Moreover, nodes at the same level of a classification hierarchy are more loosely connected semantically than the sections in the table of contents of a document or book.

Figure 2: Interface after Search for "Shannon"

The Current Node list displays items that allow the user to navigate deeper into the hierarchy. Initially the current nodes are the top-level classification terms (as shown in Figure 1). Nodes that have lower-level nodes beneath them are marked with an "=". The Subject Hierarchy list displays the hierarchy nodes above the books currently displayed on the Book Shelf. Clicking on one of the higher-level nodes causes the immediate descendants of the selected node to be displayed in the Current Node list; in addition, the Shelf displays books at the selected node. Figure 2 shows the Current Node list with three choices. A search on the author name "Shannon" has just been completed and eight books were returned. In the default Hits Only display mode, the Shelf displays only those eight books and their immediate parent nodes. The Subject Hierarchy has opened to the node that contains the first matched book.
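The behavior of the two lists can be modeled with a simple tree. The sketch below is an illustration under stated assumptions, not the HOPAC code: the `Node` class and both function names are hypothetical, and the "=" marker is appended after the label, although its actual placement in the interface is not specified.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    # Excluded from repr/eq so that parent <-> child links do not recurse.
    parent: Optional[Node] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add(self, label: str) -> Node:
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def current_node_list(selected: Node) -> list:
    """Immediate descendants of the selected node; '=' marks nodes with children."""
    return [c.label + (" =" if c.children else "") for c in selected.children]


def subject_hierarchy(selected: Node) -> list:
    """Labels on the path from the root down to the selected node."""
    path = []
    node = selected
    while node is not None:
        path.append(node.label)
        node = node.parent
    return path[::-1]


root = Node("Dewey")
generalities = root.add("000.0 Generalities")
general = generalities.add("000.0 General")
general.add("001.0 Knowledge")
generalities.add("010.0 Bibliography")
print(current_node_list(generalities))  # ['000.0 General =', '010.0 Bibliography']
print(subject_hierarchy(general))       # ['Dewey', '000.0 Generalities', '000.0 General']
```

Keeping a parent pointer makes the Subject Hierarchy list a simple walk to the root, while the Current Node list needs only the children of the selected node.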

Counts of search matches are posted beside the node labels, and they can help the user locate relevant items. For instance, in Figure 2 the user can see that only 2 of the 8 books matching this search are under the heading 000.0 Generalities. This suggests that it might be worthwhile to look for the remaining six books in other parts of the hierarchy.
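Posting counts beside node labels amounts to crediting each hit to every ancestor of its call number. A minimal sketch, assuming the same three-level reading of Dewey numbers (main class, division, section) and counts keyed by (level, class); the eight call numbers in the example are invented for illustration and are not the actual "Shannon" results.

```python
from collections import Counter


def hit_counts(hit_call_numbers):
    """Count hits under each (level, class) ancestor of every matched call number."""
    counts = Counter()
    for cn in hit_call_numbers:
        base = cn.split(".")[0].zfill(3)
        for key in ((1, base[0] + "00"), (2, base[:2] + "0"), (3, base)):
            counts[key] += 1
    return counts


hits = ["001.5", "003.54", "510", "519.2", "519.72", "621.38", "621.3822", "621.39"]
counts = hit_counts(hits)
print(counts[(1, "000")], "of", len(hits))  # 2 of 8
```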

2.2.2. Book Shelf and Book Display Widgets

The Book Shelf (right side in Figures 1 and 2) does not attempt to mimic a physical book shelf. Rather, it is a very long list of records. The user typically has only a partial view of the list. The view of the Shelf is limited by the number of items that can be displayed on the screen at any time and by options that determine which records and which attributes of those records are to be displayed. The selection of displayed attributes is determined in response to iterative queries that control a filter mask. Thus, the Book Shelf is "dynamic" in the same sense as the dynamic graphical query interface described in [15] and used in data viewers [e.g., 13]. Nodes in the classification system immediately above the selected books are also presented on the Shelf. The default display for records on the Shelf shows titles. The user can select other record attributes to be presented on the Shelf such as the author name, the length (number of paper pages), and the publisher. In the current implementation, the Book Shelf widget list contains a very large number of records and it is slow to reinitialize.
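The dynamic Shelf can be thought of as a filtered, attribute-projected view over the full record list in call-number order, with the parent node printed above each group of records. The sketch below is hypothetical: records are plain dictionaries, the filter mask is modeled as a predicate `keep`, the displayed attributes as a tuple of field names, and "parent node" is taken to mean the three-digit Dewey section.

```python
def render_shelf(records, headings, keep, attributes=("title",)):
    """Lines for the visible Shelf: a parent heading before each group of kept records.

    records    -- dicts with a 'call_number' plus displayable fields
    headings   -- maps a three-digit section (e.g. '001') to its label
    keep       -- predicate acting as the record filter mask (e.g. hits only)
    attributes -- record fields to display, in order
    """
    lines, last_section = [], None
    # Assumes three-digit whole parts, so string order matches shelf order.
    for rec in sorted(records, key=lambda r: r["call_number"]):
        if not keep(rec):
            continue
        section = rec["call_number"].split(".")[0].zfill(3)
        if section != last_section:
            lines.append(f"[{section}] {headings.get(section, '?')}")
            last_section = section
        lines.append("  " + " | ".join(str(rec.get(a, "")) for a in attributes))
    return lines


records = [
    {"call_number": "621.38", "title": "Title A", "author": "Author X"},
    {"call_number": "001.5", "title": "Title B", "author": "Author Y"},
    {"call_number": "001.6", "title": "Title C", "author": "Author Z"},
]
headings = {"001": "Knowledge", "621": "Applied physics"}
for line in render_shelf(records, headings, keep=lambda r: r["title"] != "Title C",
                         attributes=("title", "author")):
    print(line)
```

Re-rendering the whole list on every change is the simplest approach; it also mirrors the cost noted above, since a very long Shelf is slow to rebuild.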

Figure 3: Interface for Browsing Computing Literature by Computing Reviews Categories

When the user clicks on a book title, a Book Display widget opens showing the full record for that book. One Book Display option allows the user to request Similar Books. This searches for books similar to the displayed book, where similarity is determined by one of the retrieval algorithms rather than by shelf proximity. The option spawns a new search; when that search follows an initial search, it is a type of relevance feedback. Because the book records are short, Similar Books requests yield some spurious matches. As with the initial searches, posting similar-book hits against the Subject Hierarchy allows the user to follow the classification semantics to identify relevant items. The Book Display contains further options, including one for presenting other books by the same author. This links books across leaf nodes of the hierarchy. It has not been fully implemented because many of the connections would have to be made by hand.

2.2.3. Fielded Search and Attribute Selection Widget

The Fielded Search widget (lower left in Figure 1) generates searches on book record fields such as title, author, and subject descriptors. Two search algorithms are available. One uses a Boolean OR of matched terms. The second is based on term matches between the query and the document terms weighted by term frequencies.
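The two search algorithms can be contrasted in a few lines. This sketch assumes a simple lowercase alphanumeric tokenizer and reads "weighted by term frequencies" as summing, over shared terms, the product of query and record term frequencies; the original weighting may differ, and the function names and sample records are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_or(query, records):
    """Ids of records sharing at least one term with the query (unranked)."""
    terms = set(tokenize(query))
    return [rid for rid, text in records.items() if terms & set(tokenize(text))]


def tf_score(query, records):
    """(id, score) pairs ranked by the sum over query terms of query tf x record tf."""
    q = Counter(tokenize(query))
    scores = {}
    for rid, text in records.items():
        d = Counter(tokenize(text))
        score = sum(q[t] * d[t] for t in q)
        if score > 0:
            scores[rid] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


records = {
    "b1": "Information theory and coding theory",
    "b2": "A mathematical theory of communication",
    "b3": "Switching circuits",
}
print(boolean_or("coding theory", records))  # ['b1', 'b2']
print(tf_score("coding theory", records))    # [('b1', 3), ('b2', 1)]
```

Only the second function yields graded scores, so only it can feed a score threshold; the Boolean OR returns an unranked set.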

Menus control the attributes that may be used to select subsets of books, such as the library location, whether the document has been checked out, and the type of document. By selecting various library locations, it is possible to examine the virtual Shelf for any one location or any combination of locations of the Bellcore Technical Libraries.
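Menu selections of this kind can be combined into a single record predicate. A hypothetical sketch, assuming each record carries a collection of holding `locations`, a `type`, and a boolean `checked_out` flag; the location codes in the example are invented.

```python
def attribute_filter(locations=None, doc_types=None, include_checked_out=True):
    """Build a record predicate from menu selections.

    locations / doc_types -- collections of allowed values; None means no restriction.
    include_checked_out   -- when False, records flagged as checked out are hidden.
    """
    def keep(rec):
        # A record held at several libraries passes if any holding location is selected.
        if locations is not None and not set(rec["locations"]) & set(locations):
            return False
        if doc_types is not None and rec["type"] not in doc_types:
            return False
        if not include_checked_out and rec["checked_out"]:
            return False
        return True
    return keep


books = [
    {"id": 1, "locations": ["LIB-A"], "type": "book", "checked_out": False},
    {"id": 2, "locations": ["LIB-A", "LIB-B"], "type": "report", "checked_out": True},
    {"id": 3, "locations": ["LIB-C"], "type": "book", "checked_out": False},
]
only_a = attribute_filter(locations={"LIB-A"})
print([b["id"] for b in books if only_a(b)])  # [1, 2]
```

Selecting several locations keeps a record held at any of them, which gives the virtual Shelf for a combination of libraries.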

2.3. Examining the Book Shelf after a Search

Following a search, the user can step forward and backward through the matched books with the Up_Book and Down_Book buttons. These buttons provide a convenient way to move quickly through the hierarchy while keeping a sense of location within it. The Up_Node and Down_Node buttons allow the user to move even more quickly by jumping from one node that contains hits to the next.
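Stepping between hits reduces to searching an ordered list of shelf positions. The sketch assumes the Shelf is a list in call-number order, `hit_positions` holds the indices of matched books, and `shelf_sections[i]` gives the node of the book at position i; all names are hypothetical, and direction +1 stands for Down while -1 stands for Up.

```python
from bisect import bisect_left, bisect_right


def step_book(hit_positions, current, direction):
    """Shelf position of the next (+1) or previous (-1) matched book, or None."""
    hits = sorted(hit_positions)
    if direction > 0:
        i = bisect_right(hits, current)
        return hits[i] if i < len(hits) else None
    i = bisect_left(hits, current)
    return hits[i - 1] if i > 0 else None


def step_node(shelf_sections, hit_positions, current, direction):
    """First matched book of the next (+1) or previous (-1) other node with hits, or None.

    shelf_sections[i] is the node of the book at shelf position i; because the Shelf
    is in call-number order, the books of one node are contiguous.
    """
    first_hit = {}
    for p in sorted(hit_positions):
        first_hit.setdefault(shelf_sections[p], p)
    here = shelf_sections[current]
    starts = sorted(p for p in first_hit.values() if shelf_sections[p] != here)
    if direction > 0:
        later = [p for p in starts if p > current]
        return later[0] if later else None
    earlier = [p for p in starts if p < current]
    return earlier[-1] if earlier else None


shelf_sections = ["001", "001", "001", "003", "510", "510", "621", "621"]
hits = {1, 2, 5, 6, 7}
print(step_book(hits, 2, +1))                  # 5
print(step_node(shelf_sections, hits, 1, +1))  # 5
print(step_node(shelf_sections, hits, 6, -1))  # 5
```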

The hierarchical interface is most effective for comparing documents with relatively similar retrieval scores, because it does not display information about the quality of the matches. That is, unlike typical IR systems that present ranked lists, the interface based on hierarchical structure does not readily show graded retrieval scores. Thus, a titration procedure was developed to select a reasonable number of titles for display. In the current implementation, the system attempts to find a threshold that displays more than 5 but fewer than 100 books.
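The titration step can be read as lowering a score threshold until the number of passing books falls inside the target window. The sketch below is one possible reading, not the published procedure: it lowers the threshold one distinct score at a time, stops once more than `low` books pass, and refuses a step that would let `high` or more books through unless nothing has been accepted yet. The defaults 5 and 100 come from the text; the function name and stepping rule are assumptions.

```python
def titrate(scores, low=5, high=100):
    """Choose a score threshold so that more than `low` and fewer than `high` items pass.

    The threshold is lowered one distinct score at a time. It stops as soon as more
    than `low` items pass, and it refuses a step that would let `high` or more items
    through unless nothing has been accepted yet. Returns (threshold, count).
    """
    ranked = sorted(scores, reverse=True)
    best = (None, 0)
    for t in sorted(set(ranked), reverse=True):
        n = sum(1 for s in ranked if s >= t)  # O(n) per step; fine for a sketch
        if n >= high and best[0] is not None:
            break  # this step overshoots the window; keep the previous threshold
        best = (t, n)
        if n > low:
            break
    return best


scores = [0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4] + [0.1] * 200
print(titrate(scores))  # (0.5, 7)
```

Ties are why the window cannot always be hit exactly: a single score step may jump from too few books to too many.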

The Previous_Book_In_Order and Next_Book_In_Order buttons let the user examine books in the ranked order in which they matched the query. It is easy for the user to lose orientation here, because ranked order does not follow shelf order, so the user jumps around the hierarchy. Furthermore, if the user requests Next_Book_In_Order after all of the books in the initial (titration) set have been viewed, the set expands by relaxing the threshold. The user is notified of this change in the Feedback Window (lower left in Figure 1), but the hit counts are also updated, which may confuse the user.
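Ranked stepping with threshold relaxation can be sketched as a small class. It assumes matches arrive as (book, score) pairs, relaxes the threshold to the next lower distinct score when the visible set is exhausted, and records a message in place of the Feedback Window; the class and method names are hypothetical.

```python
class RankedBrowser:
    """Previous/next stepping through matches in ranked order, relaxing the threshold."""

    def __init__(self, scored, threshold):
        # scored: (book_id, score) pairs; ties are broken by book_id for a stable order.
        self.ranked = sorted(scored, key=lambda p: (-p[1], p[0]))
        self.threshold = threshold
        self.pos = -1
        self.messages = []  # stands in for the Feedback Window

    def visible(self):
        """Books at or above the threshold; always a prefix of the ranked list."""
        return [book for book, score in self.ranked if score >= self.threshold]

    def next_book(self):
        if self.pos + 1 >= len(self.visible()):
            lower = [score for _, score in self.ranked if score < self.threshold]
            if not lower:
                return None  # every match has already been shown
            self.threshold = max(lower)  # relax to the next distinct score
            self.messages.append(
                f"threshold relaxed to {self.threshold}; {len(self.visible())} books shown")
        self.pos += 1
        return self.visible()[self.pos]

    def previous_book(self):
        if self.pos <= 0:
            return None
        self.pos -= 1
        return self.visible()[self.pos]


browser = RankedBrowser([("a", 0.9), ("b", 0.7), ("c", 0.7), ("d", 0.3)], threshold=0.8)
print([browser.next_book() for _ in range(4)])  # ['a', 'b', 'c', 'd']
print(len(browser.messages))                    # 2
```

Because the visible set is always a prefix of the ranked list, the current position remains valid after each relaxation.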

2.4. Additional Features

Several additional widgets are under development. Graphics can often help orient users in large amounts of data [9]. For the hierarchical OPAC, an active dendrogram similar to the one in [3] is being developed. The graphical view can be used in many ways, such as displaying search hits. Another feature under development is a personalized shelf on which the user can assemble collections of relevant items.

3. Computing Reviews Classifications

The computer science literature is organized by the ACM Computing Reviews (CR) classification system [1]. Unlike in the Dewey Classification, documents in the CR system may appear in several different parts of the hierarchy. The CR classification system has several relatively orthogonal dimensions; in that respect, it is like a faceted classification system [14].

Figure 3 shows a partially operational interface for browsing the computer science literature by means of the Computing Reviews classification. Major categories are chosen from the Facets widget at the upper left. These selections open cascaded menus that display lower-level categories. When the "+" to the right of a facet label is selected, the facet is added to the Current Constraint List (lower left). To give context to the selected constraints, their parents are displayed in parentheses on the Current Constraint List. The constraints are ANDed together to determine which documents are displayed on the Shelf; this is analogous to the Hits Only mode of the OPAC interface. Of course, each constraint propagates to all of its descendants. Constraints can be dropped from the Current Constraint List by clicking on the "-".
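The constraint logic can be sketched as follows, assuming CR codes written as dotted strings (H, H.3, H.3.3) and documents that each carry a set of such codes. A constraint is satisfied by the constrained category itself or by any descendant, and all constraints are ANDed; the function names and sample documents are hypothetical.

```python
def descends_from(code, ancestor):
    """True if a dotted CR code equals `ancestor` or lies beneath it (H.3.3 under H.3)."""
    return code == ancestor or code.startswith(ancestor + ".")


def satisfies(doc_categories, constraints):
    """AND the constraints; each is met by the constrained category or any descendant."""
    return all(any(descends_from(c, k) for c in doc_categories) for k in constraints)


docs = {
    "d1": {"H.3.3", "H.3.5"},
    "d2": {"H.3.3", "I.2.7"},
    "d3": {"I.2.7"},
}
constraints = ["H.3", "I.2"]
print([d for d, cats in docs.items() if satisfies(cats, constraints)])  # ['d2']
```

Matching on the dotted prefix plus a separator keeps a code such as H.3.30 from being treated as a descendant of H.3.3.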

A second way to employ the CR Classification would be to search for an article of interest and then find other articles that have similar classifications. This is a type of lateral link across the hierarchy. For instance, among the Doctoral Dissertations that were cited in the Computing Archive [2] as having been published in 1992, the most frequent associate of category H.3.3 (Information Storage and Retrieval) was H.3.5 (On-line Information Systems). Thus, users who access articles under H.3.3 might be informed that articles related to their interest are likely to be found under H.3.5.
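Lateral links of this kind can be derived by counting how often pairs of categories are assigned to the same document. A minimal sketch with invented sample data (the 1992 dissertation figures are not reproduced here); the function names and the top-k cutoff are assumptions.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(doc_categories):
    """Count, for each ordered pair of categories, the documents assigned both."""
    counts = Counter()
    for cats in doc_categories:
        for a, b in combinations(sorted(set(cats)), 2):
            counts[a, b] += 1
            counts[b, a] += 1
    return counts


def lateral_links(counts, category, k=3):
    """The k categories most often assigned together with `category`."""
    related = [(other, n) for (c, other), n in counts.items() if c == category]
    return sorted(related, key=lambda p: (-p[1], p[0]))[:k]


docs = [
    {"H.3.3", "H.3.5"},
    {"H.3.3", "H.3.5", "I.2.7"},
    {"H.3.3", "I.2.7"},
    {"H.3.3", "H.3.5"},
]
counts = co_occurrence(docs)
print(lateral_links(counts, "H.3.3"))  # [('H.3.5', 3), ('I.2.7', 2)]
```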

4. Discussion

Interfaces have been developed for accessing collections of book and document records. Although no formal user testing has been undertaken, informal tests suggest that the interfaces are intuitive. The greatest problem appears to be complex interactions among features. For instance, in Hits Only mode there are often too few selections to fill the Shelf display, so the Up_Book and Down_Book buttons have no effect. In addition, some test users have suggested that the elision in Hits Only mode should apply to the TOC as well as to the Book Shelf.

These interfaces could provide the basis for access to additional electronic information sources. Clearly, it would be possible to have the short document records point to the full text of the books and documents. Moreover, encyclopedia articles describing authors could easily be presented. Likewise, book reviews, citation statistics, circulation data, and user annotations could be included as part of the Book Display. Conversely, an electronic encyclopedia could access the OPAC for bibliographies.

Overall, these interfaces attempt to demonstrate that the structure of a classification system can be a useful aid for searching and navigating a digital library catalog. Techniques such as titration and lateral linking show how IR and hypertext approaches can be combined. Similar approaches could also be applied to a search-based OPAC [e.g., 7], for example to display similar books for items that match a query. In any event, while the Dewey Classification System provides links to other, presumably related, documents, there are many other dimensions of similarity among collections of books and documents (e.g., author, citations, publisher) that could be used for linking as well. It remains to be seen whether all of these dimensions can be coordinated into usable interfaces.

Acknowledgments

The Dewey Decimal Classification was used with the permission of the Online Computer Library Center (OCLC). The collection of book records used here was developed for test purposes and is not a Bellcore product.

References

[1] ACM, ACM Computing Reviews Classification System. ACM Computing Reviews 35 (1994) 4-44.

[2] ACM, ACM Computing Archive, 1994, New York.

[3] Allen, R.B., Obry, P., and Littman, M., An Interface for Navigating Clustered Document Sets Returned by Queries. Proceedings of SIGOIS (Milpitas, CA, June) ACM, New York, 1993, 203-208.

[4] Borgman, C.L., Walter, V.A., Rosenberg, J.B., and Gallagher, A.L., Children's Use of a Direct Manipulation Library Catalog. ACM SIGCHI Bulletin 23 (1991) 69-70.

[5] Egan, D., Lesk, M.E., Ketchum, D., Lochbaum, C.C., Remde, J.R., and Landauer, T.K., Hypertext for the Electronic Library? CORE Sample Results. Hypertext '89 (Pittsburgh, Nov.) ACM, New York, 1989, 299-312.

[6] Egan, D., Remde, J.R., Gomez, L.M., Landauer, T.K., Eberhardt, J., and Lochbaum, C.C., Formative Design and Evaluation of SuperBook. ACM Transactions on Information Systems 7 (1989) 30-57.

[7] Fox, E.A., France, R.K., Sahle, E., Daoud, A., and Cline, B.E., Development of a Modern OPAC: From REVTOLC to MARIAN. Proceedings of SIGIR (Pittsburgh, June) ACM, New York, 1993, 248-259.

[8] Frisse, M.E., Cousins, S.B., and Hassan, S., WALT: A Research Environment for Medical Hypertext. Hypertext '92 (San Antonio, Nov.) ACM, New York, 1992, 389-394.

[9] Lesk, M.E., What To Do When There's Too Much Information? Hypertext '89 (Pittsburgh, Nov.) ACM, New York, 1989, 305-318.

[10] Markey, K. and Demeyer, A.N., Dewey Decimal Classification Online Project: Evaluation of Library Schedule and Index Integrated into the Subject Searching Capabilities of an Online Catalog. OCLC, Dublin OH, 1986, OPR/RR-86-1.

[11] OCLC (Forest Press), Electronic Dewey. Dublin OH, 1993.

[12] Pejtersen, A.M., A Library System for Information Retrieval Based on a Cognitive Task Analysis and Supported by an Icon-Based Interface. Proceedings of SIGIR (Cambridge, MA, June) ACM, New York, 1989, 40-47.

[13] Swayne, D.F., Cook, D., and Buja, A., Interactive Dynamic Graphics in the Xwindow System with a Link to S. Proceedings of the Section on Statistical Graphics of the American Statistical Association (Atlanta) ASA, 1991, 1-8.

[14] Vickery, B.C., Facetted Classification. New Brunswick, Rutgers University Press, 1965.

[15] Williamson, C. and Shneiderman, B., The Dynamic HomeFinder: Evaluating Dynamic Queries in a Real-Estate Information Exploration System. Proceedings of SIGIR (Copenhagen, June) ACM, New York, 1992, 338-346.