Given identical computer systems for searching the catalog records, is there an additional and substantial advantage in being able to search the full texts themselves in subject-browseable groups? I submit that anyone who actually has to do research, especially in unfamiliar subject areas or in languages in which he [sic] has little proficiency, would have a decided and fully justified preference for working in Library A [with subject-browseable groups]. (page 131).
Indeed, an interface which reflects the structure of the classification system essentially provides suggestions to a user about further options to pursue following a search. That is, after a search a user can select from the node labels of the classification system near the search hits to identify the subdivisions that may help further refine the search. The classification system can also be used to restrict searches so as to reduce the computational cost and avoid overwhelming users with spurious information.
This paper considers two types of interfaces for accessing books and documents organized in classification systems. The interfaces have been implemented in the X Window System using Motif widgets. The first interface (Section 2) is for the Dewey Decimal Classification (DDC). It uses the hierarchical organization to facilitate browsing and the presentation of book records. The second interface (Section 3) manages documents organized by a type of faceted classification system.
Interfaces for electronic books have now been widely studied, but relatively little attention has been paid to the management of collections of books in these systems. The SuperBook(TM) browser [6,7] takes advantage of the hierarchical structure of individual documents. For instance, it presents chapter and section headings in a dynamic Table of Contents (TOC). However, the SuperBook browser itself is not effective for navigating a hierarchical book classification system; it does not easily support fielded search, and it is not designed for presenting and manipulating short records.
Section 2 describes an interface that incorporates interface features from other systems and adds many new ones. Among these features are fisheye browsing of the classification hierarchy, a full Book Shelf, interlocking operation of the classification hierarchy and Book Shelf display, posting search hits against the classification hierarchy, control of search hit displays on the Shelf, control of the granularity of the search hit displays, and lateral links across the classification hierarchy. Moreover, it supports a realistically large collection of book records.
Some classification systems are partially faceted. For instance, books in the DDC under Art History are organized by geographic areas and historical periods. Books organized by the Library of Congress system include Cutter number extensions which are orthogonal to the main classifications. Many other classification systems, such as the INSPEC Classification for engineering and the ACM Computing Reviews (CR) classification system [2], are faceted.
Figure 1: GUI for Book Records Organized by Dewey Decimal Classification.
Book and document records numbered by the DDC were obtained from the Bellcore Technical Libraries. They covered approximately 50,000 books and technical reports. Each record included the shelf number, author, title, publisher, location, a subject field, and a list of the library locations where the book was held.
Figure 2: Interface after Search for "Human Computer Interaction".
The selection of displayed attributes is determined in response to iterative queries that control a filter mask. Thus, the Book Shelf is "dynamic" in the same sense as the dynamic graphical query interface described in [22] and as used in general purpose data viewers (e.g., [20]). Nodes in the classification system immediately above the selected books are also presented on the Shelf. The Shelf shows nodes at different levels abutted one after the other. The default display for records on the Shelf shows titles. The user can select other attributes to be presented on the Shelf such as the author name, the length (number of paper pages), and the publisher.
When the user clicks on a book title on the Shelf, a Book Display widget opens showing the full record for that book. Indeed, it is possible to browse the Shelf by selecting successive book titles to be displayed.
For LSI [5] searches, the LSI-value for a node is derived from the position of all the terms in the book titles and subject descriptions of all the books under that node. This is conceptually similar to the approaches of [9,13] for other search algorithms. However, it meant that individual books could not be located with LSI. Moreover, because the LSI searches required considerable computational resources for matching vectors, the LSI space had to be precomputed.
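One way to picture a node-level LSI value is sketched below. This is a minimal illustration under stated assumptions, not the procedure used in the system: it assumes a node is positioned at the centroid of the vectors of the terms found under it and that the query is compared with nodes by cosine similarity. The two-dimensional `term_vectors` and all function names are made up for the example; a real LSI space would be precomputed from the collection.

```python
import math

# Hypothetical, made-up term positions in a 2-D "LSI space".
term_vectors = {
    "human": [1.0, 0.0],
    "computer": [0.8, 0.6],
    "cooking": [0.0, 1.0],
}

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def node_vector(terms):
    """Position a node (or a query) at the centroid of its known terms."""
    known = [term_vectors[t] for t in terms if t in term_vectors]
    if not known:
        raise ValueError("no known terms under this node")
    return centroid(known)

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

query = node_vector(["human", "computer"])
node_a = node_vector(["human", "computer", "computer"])  # terms pooled from titles
node_b = node_vector(["cooking"])
```

Because the terms of all books under a node are pooled, the score belongs to the node as a whole, which is consistent with the observation that individual books cannot be located this way.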
Counts of search matches are posted beside the node labels on the TOC widgets. These counts can help the user locate relevant items. For instance, in Figure 2, 1184 books match the query and 641 of these are under the heading 000.0 Generalities. This suggests that this heading is the most promising part of the hierarchy in which to look for relevant books.
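Posting counts against the hierarchy can be sketched as follows. This is an illustration only, not the system's code: it assumes each hit is identified by a plain three-digit Dewey class number and that the ancestors of a class are its hundreds and tens divisions. The function names are hypothetical.

```python
from collections import Counter

def ancestors(ddc):
    """Node labels from the main class down to the given class.

    Assumes a three-digit Dewey class such as "025"; decimal
    extensions are ignored in this sketch.
    """
    digits = ddc[:3]
    return [digits[0] + "00", digits[:2] + "0", digits]

def post_hits(hit_classes):
    """Count the search hits that fall under every node of the hierarchy."""
    counts = Counter()
    for ddc in hit_classes:
        # A set avoids double counting when two levels share a label (e.g. "000").
        for node in set(ancestors(ddc)):
            counts[node] += 1
    return counts

hits = ["004", "005", "006", "025", "621"]
counts = post_hits(hits)
```

Here four of the five hits fall under 000, so that node would receive the largest posted count.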
The hierarchical interface is most effective for comparing documents of relatively similar retrieval values because it does not easily display quantitative information about the matches. That is, unlike typical information retrieval (IR) systems that present items ranked by a similarity metric, the interface based on hierarchical structure does not readily show graded retrieval scores. The approach taken here is to set a threshold in the rank-ordered list and to treat all items above that threshold as hits. Initially, a titration procedure was developed to select the threshold so that no fewer than 5 and no more than 100 titles would be presented. However, informal user testing suggested that users often wanted to override the titration setting. Thus, a slider for controlling the number of hits displayed was developed. This is similar to the use of a slider for "aggregation manipulation" [12]. In Figure 2, the slider (upper right) has been positioned to show the maximum number of hits (1184 in this example).
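The details of the titration procedure are not given, but its effect can be approximated with a simple clamp. In the sketch below, the initial score `cutoff` and the function name are assumptions made for illustration; only the bounds of 5 and 100 titles come from the text.

```python
def titrate(scores, cutoff=0.5, min_hits=5, max_hits=100):
    """Return how many top-ranked items to treat as hits.

    scores: similarity scores for the retrieved items (any order).
    cutoff: assumed initial score threshold.
    """
    ranked = sorted(scores, reverse=True)
    n = sum(1 for s in ranked if s >= cutoff)
    # Keep the count within the bounds, but never above the items available.
    n = max(min_hits, min(n, max_hits))
    return min(n, len(ranked))

few = titrate([0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1])   # only 2 pass the cutoff
many = titrate([0.9] * 200)                          # 200 pass the cutoff
```

A slider override then amounts to replacing the returned count with a value chosen by the user, up to the total number of matches.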
It has been believed that book titles are too short to yield effective searches. However, the assumption behind this work is that there are often enough records in a node that relevant words will appear in at least some of them. Getting search matches on some of the titles in a node allows the user to reach that node, use the Shelf browsing capability of the interface, and then find the most relevant documents. In addition, following a search, the user can easily step forward and backward on the Shelf with the NextMatchNode and PreviousMatchNode buttons.
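Stepping between matching nodes can be sketched with a binary search over the nodes that received hits. This is an assumption about how such buttons might behave, not a description of the implementation; it relies on zero-padded three-digit class numbers so that string order agrees with shelf order, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def next_match_node(match_nodes, current):
    """First node with hits that lies after the current shelf position."""
    i = bisect_right(match_nodes, current)
    return match_nodes[i] if i < len(match_nodes) else None

def previous_match_node(match_nodes, current):
    """Last node with hits that lies before the current shelf position."""
    i = bisect_left(match_nodes, current)
    return match_nodes[i - 1] if i > 0 else None

# Nodes with at least one hit, kept in shelf order.
match_nodes = ["004", "025", "621"]
```

Returning `None` at either end of the list corresponds to a button press that has nowhere further to go.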
Figure 3: Graphic Display of Dewey Hierarchy after LSI Search.
Figure 4: Interface for Computing Reviews Classification with Two Constraints Selected.
To show the context of the selected constraint labels, the parents of the constraints are displayed in parentheses on the Constraint List. The Shelf is updated with articles that match the constraints. Of course, the constraints propagate to all their descendants. Constraints can be dropped from the Constraint List by clicking on the "--" on the right side of the widget.
The interface allows the user to take either the documents that match the union of the constraints (OR) or those that match the intersection of the constraints (AND). For large collections, there are often far too many matches for the union. By switching to the AND display, the most relevant documents can be easily found. For the ACM CR collection, there is substantial variability in the number of categories assigned and the criteria for determining relevance of those categories.
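Combining constraints can be sketched as follows. This is a minimal illustration, assuming CR category codes written as dotted strings so that a constraint matches its own node and every descendant by prefix. The `mode` values "all" (a document must satisfy every constraint) and "any" (at least one constraint) and the sample documents are hypothetical.

```python
def matches(category, constraint):
    """True if category is the constraint node or one of its descendants."""
    return category == constraint or category.startswith(constraint + ".")

def select(docs, constraints, mode="all"):
    """Return the ids of documents that satisfy the constraints.

    docs maps a document id to the set of categories assigned to it.
    mode "all" requires every constraint; mode "any" requires at least one.
    """
    combine = all if mode == "all" else any
    return sorted(
        doc_id
        for doc_id, cats in docs.items()
        if combine(any(matches(c, k) for c in cats) for k in constraints)
    )

docs = {
    "d1": {"H.3.3", "H.5.2"},
    "d2": {"H.3.3"},
    "d3": {"I.2.6"},
}
narrow = select(docs, ["H.3", "H.5.2"], mode="all")
broad = select(docs, ["H.3", "H.5.2"], mode="any")
```

In this toy collection, requiring every constraint leaves a single document, while accepting any constraint returns two; the gap grows quickly as the collection grows.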
Among doctoral dissertations that were cited in ACM Computing Archive [1] as published in 1992, the categories that had two or more overlaps with H.3.3 Information Storage and Retrieval were H.2.4 Systems, H.2.0 General, D.3.2 Design Styles, H.5.2 User Interfaces, and I.2.6 Learning. Thus, a user who accessed articles under H.3.3 could examine those other categories for relevant material. This is a type of lateral link across the hierarchy (see "Extended Features" section above).
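Lateral links of this kind can be derived by counting how often other categories are assigned together with a target category. The sketch below uses a small made-up collection rather than the dissertation data, and the function name and the `min_overlap` parameter are hypothetical.

```python
from collections import Counter

def lateral_links(docs, target, min_overlap=2):
    """Categories that co-occur with target in at least min_overlap documents.

    docs maps a document id to the set of categories assigned to it.
    """
    counts = Counter()
    for cats in docs.values():
        if target in cats:
            counts.update(c for c in cats if c != target)
    return sorted(c for c, n in counts.items() if n >= min_overlap)

# Made-up assignments for illustration only.
docs = {
    "t1": {"H.3.3", "H.2.4"},
    "t2": {"H.3.3", "H.2.4", "I.2.6"},
    "t3": {"H.3.3", "H.5.2"},
    "t4": {"H.2.4", "I.2.6"},
}
links = lateral_links(docs, "H.3.3")
```

Only H.2.4 is assigned together with H.3.3 in two or more of these toy documents, so it is the only lateral link suggested.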
Posting search hits against the hierarchy is more complicated in this case than for the simple hierarchical display because a single document can belong to several categories. The current system uses fractional category memberships when the hits are spread across categories. As noted above, the Book Shelf for the facet interface has no a priori order. Thus, there is no natural order to display search hits. On the other hand, a variety of other ad hoc organizations are possible. For instance, the categories might be ordered by the density of hits. A related problem is which facet hierarchy to pop-open after a search (perhaps to help guide the user to further refine the search).
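Fractional memberships can be illustrated with an equal split. The text does not say how the fractions are assigned, so the sketch below simply assumes that a hit assigned to k categories contributes 1/k to each; the function name and sample data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def fractional_counts(hit_docs):
    """Spread each hit evenly over the categories assigned to it.

    hit_docs maps the id of a matching document to its set of categories.
    """
    totals = defaultdict(Fraction)
    for cats in hit_docs.values():
        if not cats:
            continue  # an uncategorized hit cannot be posted anywhere
        share = Fraction(1, len(cats))
        for c in cats:
            totals[c] += share
    return dict(totals)

hit_docs = {
    "d1": {"H.3.3", "H.5.2"},
    "d2": {"H.3.3"},
}
posted = fractional_counts(hit_docs)
```

With an equal split, the posted values sum to the number of hits, so the totals shown beside the categories remain comparable with those of the simple hierarchical display.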
The greatest problem with these interfaces appears to be complex interactions among features. For instance, in the Hits Only mode there are often too few selections to fill the Shelf Display; thus, the UpBook and DownBook buttons have no effect. In addition, some test users have suggested that the elision in the Hits Only mode should apply to the TOC as well as the Book Shelf. Completely shifting context from one set of screens to another (e.g., with the similar books option) is also difficult.
Beyond the problems of the interface design, there are limitations inherent in this type of interface for hierarchical classification systems. A substantial concern is that the user does not know how many books are included under each node. For some parts of the hierarchy, a user may know or may be able to make a good guess; however, the user may not be at all familiar with other parts of the hierarchy.
The facet interface is probably harder to use than the simple hierarchical interface. This is because of the complexity of managing multiple facet hierarchies and the lack of a natural shelf order for the documents. Moreover, the facet interface described here has not been as well developed as the simple hierarchical interface. For instance, graphical displays might be especially useful for navigation of the facet hierarchies.
Overall, these interfaces suggest that the structure of a classification system can be a useful aid for searching and navigating a digital library. Indeed, it may be worth exploring how digital library classifications can be extended to finding information in less structured domains such as the WWW.
The DDC was used with the permission of the Online Computer Library Center (OCLC). The collection of book records used here was developed for test purposes and is not a Bellcore product. A much earlier version of this paper appeared in Digital Libraries'94, College Station, TX, June, 1994.
[2] ACM, ACM Computing Reviews Classification System. ACM Computing Reviews 35 (1994) 4-44.
[3] Allen, R.B., Obry, P., and Littman, M., An Interface for Navigating Clustered Document Sets Returned by Queries. Proceedings of SIGOIS (Milpitas, CA, June) ACM, New York, 1993, 203-208.
[4] Borgman, C.L., Walter, V.A., Rosenberg, J.B., and Gallagher, A.L., Children's Use of a Direct Manipulation Library Catalog. ACM SIGCHI Bulletin 23, 4 (Oct. 1991) 69-70.
[5] Deerwester, S., Dumais, S., Furnas, G., Landauer, T.K., and Harshman, R., Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41 (1990), 391-407.
[6] Egan, D., Lesk, M.E., Ketchum, D., Lochbaum, C.C., Remde, J.R., and Landauer, T.K., Hypertext for the Electronic Library? CORE Sample Results. Hypertext '89 (Pittsburgh, Nov.) ACM, New York, 1989, 299-312.
[7] Egan, D., Remde, J.R., Gomez, L.M., Landauer, T.K., Eberhardt, J., and Lochbaum, C.C., Formative Design and Evaluation of SuperBook. ACM Transactions on Information Systems 7 (1989) 30-57.
[8] Fox, E.A., France, R.K., Sahle, E., Daoud, A., and Cline, B.E., Development of a Modern OPAC: From REVTOLC to MARIAN. Proceedings of SIGIR'93 (Pittsburgh, June) ACM, New York, 1993, 248-259.
[9] Frisse, M.E., Cousins, S.B., and Hassan, S., WALT: A Research Environment for Medical Hypertext. Hypertext'92 (San Antonio, Nov.) ACM, New York, 1992, 389-394.
[10] Furnas, G.W. and Zacks, J., Multitrees: Enriching and Reusing Hierarchical Structure. ACM SIGCHI'93 (Boston, Apr.), ACM, New York, 1993, 330-336.
[11] Godert, W., Facet Classification in Online Retrieval. International Classification 18 (1991) 98-109.
[12] Goldstein, J. and Roth, S.F., Using Aggregation and Dynamic Queries for Exploring Large Data Sets. ACM SIGCHI'93 (Boston, Apr.), ACM, New York, 1993, 23-29.
[13] Hearst, M. and Plaunt, C., Subtopic Structuring for Full-length Document Access. Proceedings SIGIR'93 (Pittsburgh, June), ACM, New York, 1993, 59-68.
[14] Lesk, M.E., What To Do When There's Too Much Information? Hypertext '89 (Pittsburgh, Nov.) ACM, New York, 1989, 305-318.
[15] Mann, T., Library Research Models, New York, Oxford University Press, 1993.
[16] Markey, K. and Demeyer, A.N., Dewey Decimal Classification Online Project: Evaluation of Library Schedule and Index Integrated into the Subject Searching Capabilities of an Online Catalog, OCLC, Dublin OH, 1986, OPR/RR-86-1.
[17] Micco, M. and Basista, T., Beyond Subject Access: The Next Generation of OPAC Software. Proceedings Integrated Online Library Systems (1991), 103-112.
[18] OCLC (Forest Press), Electronic Dewey. Dublin OH, 1993.
[19] Pejtersen, A.M., A Library System for Information Retrieval Based on a Cognitive Task Analysis and Supported by an Icon-Based Interface. Proceedings of SIGIR'89 (Cambridge, MA, June) ACM, New York, 1989, 40-47.
[20] Swayne, D.F., Cook, D., and Buja, A., Interactive Dynamic Graphics in the X Window System with a Link to S. Proceedings of the Section on Statistical Graphics of the American Statistical Association (Atlanta) ASA, 1991, 1-8.
[21] Vickery, B.C., Faceted Classification. New Brunswick, NJ, Rutgers University Press, 1965.
[22] Williamson, C. and Shneiderman, B., The Dynamic HomeFinder: Evaluating Dynamic Queries in a Real-Estate Information Exploration System. Proceedings of SIGIR'92 (Copenhagen, June) ACM, New York, 1992, 338-346.