The Project Gutenberg Etext of LOC WORKSHOP ON ELECTRONIC TEXTS
WORKSHOP ON ELECTRONIC TEXTS
PROCEEDINGS
Edited by James Daly
9-10 June 1992
Library of Congress
Washington, D.C.
Supported by a Grant from the David and Lucile Packard Foundation
*** *** *** ****** *** *** ***
TABLE OF CONTENTS
Acknowledgements
Introduction
Proceedings
Welcome
Prosser Gifford and Carl Fleischhauer
Session I. Content in a New Form: Who Will Use It and What Will They Do?
James Daly (Moderator)
Avra Michelson, Overview
Susan H. Veccia, User Evaluation
Joanne Freeman, Beyond the Scholar
Discussion
Session II. Show and Tell
Jacqueline Hess (Moderator)
Elli Mylonas, Perseus Project
Discussion
Eric M. Calaluca, Patrologia Latina Database
Carl Fleischhauer and Ricky Erway, American Memory
Discussion
Dorothy Twohig, The Papers of George Washington
Discussion
Maria L. Lebron, The Online Journal of Current Clinical Trials
Discussion
Lynne K. Personius, Cornell mathematics books
Discussion
Session III. Distribution, Networks, and Networking:
Options for Dissemination
Robert G. Zich (Moderator)
Clifford A. Lynch
Discussion
Howard Besser
Discussion
Ronald L. Larsen
Edwin B. Brownrigg
Discussion
Session IV. Image Capture, Text Capture, Overview of Text and
Image Storage Formats
William L. Hooton (Moderator)
A) Principal Methods for Image Capture of Text:
direct scanning, use of microform
Anne R. Kenney
Pamela Q.J. Andre
Judith A. Zidar
Donald J. Waters
Discussion
B) Special Problems: bound volumes, conservation,
reproducing printed halftones
George Thoma
Carl Fleischhauer
Discussion
C) Image Standards and Implications for Preservation
Jean Baronas
Patricia Battin
Discussion
D) Text Conversion: OCR vs. rekeying, standards of accuracy
and use of imperfect texts, service bureaus
Michael Lesk
Ricky Erway
Judith A. Zidar
Discussion
Session V. Approaches to Preparing Electronic Texts
Susan Hockey (Moderator)
Stuart Weibel
Discussion
C.M. Sperberg-McQueen
Discussion
Eric M. Calaluca
Discussion
Session VI. Copyright Issues
Marybeth Peters
Session VII. Conclusion
Prosser Gifford (Moderator)
General discussion
Appendix I: Program
Appendix II: Abstracts
Appendix III: Directory of Participants
*** *** *** ****** *** *** ***
Acknowledgements
I would like to thank Carl Fleischhauer and Prosser Gifford for the
opportunity to learn about areas of human activity unknown to me a scant
ten months ago, and the David and Lucile Packard Foundation for
supporting that opportunity. The help given by others is acknowledged on
a separate page.
19 October 1992
*** *** *** ****** *** *** ***
INTRODUCTION
The Workshop on Electronic Texts (1) drew together representatives of
various projects and interest groups to compare ideas, beliefs,
experiences, and, in particular, methods of placing and presenting
historical textual materials in computerized form. Most attendees gained
much in insight and outlook from the event. But the assembly did not
form a new nation, or, to put it another way, the diversity of projects
and interests was too great to draw the representatives into a cohesive,
action-oriented body.(2)
Everyone attending the Workshop shared an interest in preserving and
providing access to historical texts. But within this broad field the
attendees represented a variety of formal, informal, figurative, and
literal groups, with many individuals belonging to more than one. These
groups may be defined roughly according to the following topics or
activities:
* Imaging
* Searchable coded texts
* National and international computer networks
* CD-ROM production and dissemination
* Methods and technology for converting older paper materials into
electronic form
* Study of the use of digital materials by scholars and others
This summary is arranged thematically and does not follow the actual
sequence of presentations.
NOTES:
(1) In this document, the phrase electronic text is used to mean
any computerized reproduction or version of a document, book,
article, or manuscript (including images), and not merely a machine-
readable or machine-searchable text.
(2) The Workshop was held at the Library of Congress on 9-10 June
1992, with funding from the David and Lucile Packard Foundation.
The document that follows represents a summary of the presentations
made at the Workshop and was compiled by James DALY. This
introduction was written by DALY and Carl FLEISCHHAUER.
PRESERVATION AND IMAGING
Preservation, as that term is used by archivists,(3) was most explicitly
discussed in the context of imaging. Anne KENNEY and Lynne PERSONIUS
explained how the concept of a faithful copy and the user-friendliness of
the traditional book have guided their project at Cornell University.(4)
Although interested in computerized dissemination, participants in the
Cornell project are creating digital image sets of older books in the
public domain as a source for a fresh paper facsimile or, in a future
phase, microfilm. The books returned to the library shelves are
high-quality and useful replacements on acid-free paper that should last
a long time. To date, the Cornell project has placed little or no
emphasis on creating searchable texts; one would not be surprised to find
that the project participants view such texts as new editions, and thus
not as faithful reproductions.
In her talk on preservation, Patricia BATTIN struck an ecumenical and
flexible note as she endorsed the creation and dissemination of a variety
of types of digital copies. Do not be too narrow in defining what counts
as a preservation element, BATTIN counseled; for the present, at least,
digital copies made with preservation in mind cannot be as narrowly
standardized as, say, microfilm copies with the same objective. Setting
standards precipitously can inhibit creativity, but delay can result in
chaos, she advised.
In part, BATTIN's position reflected the unsettled nature of image-format
standards, and attendees could hear echoes of this unsettledness in the
comments of various speakers. For example, Jean BARONAS reviewed the
status of several formal standards moving through committees of experts;
and Clifford LYNCH encouraged the use of a new guideline for transmitting
document images on Internet. Testimony from participants in the National
Agricultural Library's (NAL) Text Digitization Program and LC's American
Memory project highlighted some of the challenges to the actual creation
or interchange of images, including difficulties in converting
preservation microfilm to digital form. Donald WATERS reported on the
progress of a master plan for a project at Yale University to convert
books on microfilm to digital image sets, Project Open Book (POB).
The Workshop offered rather less of an imaging practicum than planned,
but "how-to" hints emerge at various points, for example, throughout
KENNEY's presentation and in the discussion of arcana such as
thresholding and dithering offered by George THOMA and FLEISCHHAUER.
NOTES:
(3) Although there is a sense in which any reproductions of
historical materials preserve the human record, specialists in the
field have developed particular guidelines for the creation of
acceptable preservation copies.
(4) Titles and affiliations of presenters are given at the
beginning of their respective talks and in the Directory of
Participants (Appendix III).
THE MACHINE-READABLE TEXT: MARKUP AND USE
The sections of the Workshop that dealt with machine-readable text tended
to be more concerned with access and use than with preservation, at least
in the narrow technical sense. Michael SPERBERG-McQUEEN made a forceful
presentation on the Text Encoding Initiative's (TEI) implementation of
the Standard Generalized Markup Language (SGML). His ideas were echoed
by Susan HOCKEY, Elli MYLONAS, and Stuart WEIBEL. While the
presentations made by the TEI advocates contained no practicum, their
discussion focused on the value of the finished product, what the
European Community calls reusability, but what may also be termed
durability. They argued that marking up--that is, coding--a text in a
well-conceived way will permit it to be moved from one computer
environment to another, as well as to be used by various users. Two
kinds of markup were distinguished: 1) procedural markup, which
describes the features of a text (e.g., dots on a page), and 2)
descriptive markup, which describes the structure or elements of a
document (e.g., chapters, paragraphs, and front matter).
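To make the distinction concrete, the following minimal sketch contrasts
the two kinds of markup. The sample tags are invented for illustration
and are not drawn from the TEI guidelines; the fragment is expressed in
Python only so that the contrast can be annotated.

    # Procedural markup records features of the rendered page: boldface,
    # vertical space, indentation. It says how the text should LOOK.
    procedural = r"\bold{CHAPTER I.}\skip{2}\indent It was a dark night."

    # Descriptive markup records the structure of the document: a chapter
    # containing a heading and a paragraph. It says what the text IS,
    # leaving rendering decisions to each receiving system.
    descriptive = ("<chapter><head>CHAPTER I.</head>"
                   "<p>It was a dark night.</p></chapter>")

    # Because structure rather than appearance is encoded, a descriptively
    # marked-up text survives a move between computer environments: a new
    # system may re-render <head> however it likes, whereas the procedural
    # version has frozen one particular page design.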
The TEI proponents emphasized the importance of texts to scholarship.
They explained how heavily coded (and thus analyzed and annotated) texts
can underlie research, play a role in scholarly communication, and
facilitate classroom teaching. SPERBERG-McQUEEN reminded listeners that
a written or printed item (e.g., a particular edition of a book) is
merely a representation of the abstraction we call a text. To concern
ourselves with faithfully reproducing a printed instance of the text,
SPERBERG-McQUEEN argued, is to concern ourselves with the representation
of a representation ("images as simulacra for the text"). The TEI proponents'
interest in images tends to focus on corollary materials for use in teaching,
for example, photographs of the Acropolis to accompany a Greek text.
By the end of the Workshop, SPERBERG-McQUEEN confessed to having been
converted to a limited extent to the view that electronic images
constitute a promising alternative to microfilming; indeed, an
alternative probably superior to microfilming. But he was not convinced
that electronic images constitute a serious attempt to represent text in
electronic form. HOCKEY and MYLONAS also conceded that their experience
at the Pierce Symposium the previous week at Georgetown University and
the present conference at the Library of Congress had compelled them to
reevaluate their perspective on the usefulness of text as images.
Attendees could see that the text and image advocates were in
constructive tension, so to say.
Three non-TEI presentations described approaches to preparing
machine-readable text that are less rigorous and thus less expensive. In
the case of the Papers of George Washington, Dorothy TWOHIG explained
that the digital version will provide a not-quite-perfect rendering of
the transcribed text--some 135,000 documents, available for research
during the decades while the perfect or print version is completed.
Members of the American Memory team and the staff of NAL's Text
Digitization Program (see below) also outlined a middle ground concerning
searchable texts. In the case of American Memory, contractors produce
texts with about 99-percent accuracy that serve as "browse" or
"reference" versions of written or printed originals. End users who need
faithful copies or perfect renditions must refer to accompanying sets of
digital facsimile images or consult copies of the originals in a nearby
library or archive. American Memory staff argued that the high cost of
producing 100-percent accurate copies would prevent LC from offering
access to large parts of its collections.
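Some rough arithmetic suggests what such accuracy figures mean in
practice. The page length below is an assumption made only for the sake
of illustration, not a figure given at the Workshop.

    # Rough arithmetic on text-accuracy levels. The 2,000-character page
    # is an assumed length, used only to make the rates tangible.
    chars_per_page = 2000
    for accuracy in (0.99, 0.998):
        errors = chars_per_page * (1 - accuracy)
        print(f"{accuracy:.1%} accuracy -> about {errors:.0f} bad characters/page")
    # At the roughly 99-percent level of American Memory's "browse" texts,
    # a page still carries on the order of 20 erroneous characters--hence
    # the accompanying facsimile images for users who need faithful copies.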
THE MACHINE-READABLE TEXT: METHODS OF CONVERSION
Although the Workshop did not include a systematic examination of the
methods for converting texts from paper (or from facsimile images) into
machine-readable form, various speakers touched upon this
matter. For example, WEIBEL reported that OCLC has experimented with a
merging of multiple optical character recognition systems that will
reduce errors from an unacceptable rate of 5 characters out of every
1,000 to an acceptable rate of 2 characters out of every 1,000.
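The proceedings do not describe the mechanics of OCLC's merger, but one
common approach is per-character majority voting across the aligned
outputs of several OCR systems. A minimal sketch follows, with the
alignment problem--the hard part in practice--assumed away.

    # Per-character majority voting across three aligned OCR readings.
    # Illustrative only: OCLC's actual method is not described in the
    # proceedings, and real systems must first align the outputs.
    from collections import Counter

    def merge_ocr(readings):
        merged = []
        for chars in zip(*readings):           # one column of candidates
            merged.append(Counter(chars).most_common(1)[0][0])
        return "".join(merged)

    print(merge_ocr(["eleotronic", "electronic", "electr0nic"]))
    # -> "electronic": an error survives only where the systems err
    # together, which is why combining engines cuts the residual rate.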
Pamela ANDRE presented an overview of NAL's Text Digitization Program and
Judith ZIDAR discussed the technical details. ZIDAR explained how NAL
purchased hardware and software capable of performing optical character
recognition (OCR) and text conversion and used its own staff to convert
texts. The process, ZIDAR said, required extensive editing and project
staff found themselves considering alternatives, including rekeying
and/or creating abstracts or summaries of texts. NAL reckoned costs at
$7 per page. By way of contrast, Ricky ERWAY explained that American
Memory had decided from the start to contract out conversion to external
service bureaus. The criteria used to select these contractors were cost
and quality of results, as opposed to methods of conversion. ERWAY noted
that historical documents or books often do not lend themselves to OCR.
Bound materials represent a special problem. In her experience, quality
control--inspecting incoming materials, counting errors in samples--posed
the most time-consuming aspect of contracting out conversion. ERWAY
reckoned American Memory's costs at $4 per page, but cautioned that fewer
cost-elements had been included than in NAL's figure.
OPTIONS FOR DISSEMINATION
The topic of dissemination proper emerged at various points during the
Workshop. At the session devoted to national and international computer
networks, LYNCH, Howard BESSER, Ronald LARSEN, and Edwin BROWNRIGG
highlighted the virtues of Internet today and of the network that will
evolve from Internet. Listeners could discern in these narratives a
vision of an information democracy in which millions of citizens freely
find and use what they need. LYNCH noted that a lack of standards
inhibits disseminating multimedia on the network, a topic also discussed
by BESSER. LARSEN addressed the issues of network scalability and
modularity and commented upon the difficulty of anticipating the effects
of growth in orders of magnitude. BROWNRIGG talked about the ability of
packet radio to provide certain links in a network without the need for
wiring. However, the presenters also called attention to the
shortcomings and incongruities of present-day computer networks. For
example: 1) Network use is growing dramatically, but much network
traffic consists of personal communication (E-mail). 2) Large bodies of
information are available, but a user's ability to search across their
entirety is limited. 3) There are significant resources for science and
technology, but few network sources provide content in the humanities.
4) Machine-readable texts are commonplace, but the capability of the
system to deal with images (let alone other media formats) lags behind.
A glimpse of a multimedia future for networks, however, was provided by
Maria LEBRON in her overview of the Online Journal of Current Clinical
Trials (OJCCT) and of the process of scholarly publishing on-line.
The contrasting form of the CD-ROM disk was never systematically
analyzed, but attendees could glean an impression from several of the
show-and-tell presentations. The Perseus and American Memory examples
demonstrated recently published disks, while the descriptions of the
IBYCUS version of the Papers of George Washington and Chadwyck-Healey's
Patrologia Latina Database (PLD) told of disks to come. According to
Eric CALALUCA, PLD's principal focus has been on converting Jacques-Paul
Migne's definitive collection of Latin texts to machine-readable form.
Although everyone could share the network advocates' enthusiasm for an
on-line future, the possibility of rolling up one's sleeves for a session
with a CD-ROM containing both textual materials and a powerful retrieval
engine made the disk seem an appealing vessel indeed. The overall
discussion suggested that the transition from CD-ROM to on-line networked
access may prove far slower and more difficult than has been anticipated.
WHO ARE THE USERS AND WHAT DO THEY DO?
Although concerned with the technicalities of production, the Workshop
never lost sight of the purposes and uses of electronic versions of
textual materials. As noted above, those interested in imaging discussed
the problematical matter of digital preservation, while the TEI proponents
described how machine-readable texts can be used in research. This latter
topic received thorough treatment in the paper read by Avra MICHELSON.
She placed the phenomenon of electronic texts within the context of
broader trends in information technology and scholarly communication.
Among other things, MICHELSON described on-line conferences that
represent a vigorous and important intellectual forum for certain
disciplines. Internet now carries more than 700 conferences, with about
80 percent of these devoted to topics in the social sciences and the
humanities. Other scholars use on-line networks for "distance learning."
Meanwhile, there has been a tremendous growth in end-user computing;
professors today are less likely than their predecessors to ask the
campus computer center to process their data. Electronic texts are one
key to these sophisticated applications, MICHELSON reported, and more and
more scholars in the humanities now work in an on-line environment.
Toward the end of the Workshop, Michael LESK presented a corollary to
MICHELSON's talk, reporting the results of an experiment that compared
the work of one group of chemistry students using traditional printed
texts and two groups using electronic sources. The experiment
demonstrated that in the event one does not know what to read, one needs
the electronic systems; the electronic systems hold no advantage at the
moment if one knows what to read, but neither do they impose a penalty.
DALY provided an anecdotal account of the revolutionizing impact of the
new technology on his previous methods of research in the field of classics.
His account, by extrapolation, served to illustrate in part the arguments
made by MICHELSON concerning the positive effects of the sudden and radical
transformation being wrought in the ways scholars work.
Susan VECCIA and Joanne FREEMAN delineated the use of electronic
materials outside the university. The most interesting aspect of their
use, FREEMAN said, could be seen as a paradox: teachers in elementary
and secondary schools requested access to primary source materials but,
at the same time, found that "primariness" itself made these materials
difficult for their students to use.
OTHER TOPICS
Marybeth PETERS reviewed copyright law in the United States and offered
advice during a lively discussion of this subject. But uncertainty
remains concerning the price of copyright in a digital medium: no
solution has yet been worked out for managing and synthesizing
copyrighted and out-of-copyright pieces of a database.
As moderator of the final session of the Workshop, Prosser GIFFORD directed
discussion to future courses of action and the potential role of LC in
advancing them. Among the recommendations that emerged were the following:
* Workshop participants should 1) begin to think about working
with image material, but structure and digitize it in such a
way that at a later stage it can be interpreted into text, and
2) find a common way to build text and images together so that
they can be used jointly at some stage in the future, with
appropriate network support, because that is how users will want
to access these materials. The Library might encourage attempts
to bring together people who are working on texts and images.
* A network version of American Memory should be developed or
consideration should be given to making the data in it
available to people interested in doing network multimedia.
Given the current dearth of digital data that is appealing and
unencumbered by extremely complex rights problems, developing a
network version of American Memory could do much to help make
network multimedia a reality.
* Concerning the thorny issue of electronic deposit, LC should
initiate a catalytic process in terms of distributed
responsibility, that is, bring together the distributed
organizations and set up a study group to look at all the
issues related to electronic deposit and see where we as a
nation should move. For example, LC might attempt to persuade
one major library in each state to deal with its state
equivalent publisher, which might produce a cooperative project
that would be equitably distributed around the country, and one
in which LC would be dealing with a minimal number of publishers
and minimal copyright problems. LC must also deal with the
concept of on-line publishing, determining, among other things,
how serials such as OJCCT might be deposited for copyright.
* Since a number of projects are planning to carry out
preservation by creating digital images that will end up in
on-line or near-line storage at some institution, LC might play
a helpful role, at least in the near term, by accelerating the
cataloging of that information into the Research Libraries Information
Network (RLIN) and then into OCLC, so that it would be accessible.
This would reduce the possibility of multiple institutions digitizing
the same work.
CONCLUSION
The Workshop was valuable because it brought together partisans from
various groups and provided an occasion to compare goals and methods.
The more committed partisans frequently communicate with others in their
groups, but less often across group boundaries. The Workshop was also
valuable to attendees--including those involved with American Memory--who
came less committed to particular approaches or concepts. These
attendees learned a great deal, and plan to select and employ elements of
imaging, text-coding, and networked distribution that suit their
respective projects and purposes.
Still, reality rears its ugly head: no breakthrough has been achieved.
On the imaging side, one confronts a proliferation of competing
data-interchange standards and a lack of consensus on the role of digital
facsimiles in preservation. In the realm of machine-readable texts, one
encounters a reasonably mature standard but methodological difficulties
and high costs. These latter problems, of course, represent a special
impediment to the desire, as it is sometimes expressed in the popular
press, "to put the [contents of the] Library of Congress on line." In
the words of one participant, there was "no solution to the economic
problems--the projects that are out there are surviving, but it is going
to be a lot of work to transform the information industry, and so far the
investment to do that is not forthcoming" (LESK, per litteras).
*** *** *** ****** *** *** ***
PROCEEDINGS
WELCOME
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GIFFORD * Origin of Workshop in current Librarian's desire to make LC's
collections more widely available * Desiderata arising from the prospect
of greater interconnectedness *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
After welcoming participants on behalf of the Library of Congress,
American Memory (AM), and the National Demonstration Lab, Prosser
GIFFORD, director for scholarly programs, Library of Congress, located
the origin of the Workshop on Electronic Texts in a conversation he had
had considerably more than a year ago with Carl FLEISCHHAUER concerning
some of the issues faced by AM. On the assumption that numerous other
people were asking the same questions, the decision was made to bring
together as many of these people as possible to ask the same questions
together. In a deeper sense, GIFFORD said, the origin of the Workshop
lay in the desire of the current Librarian of Congress, James H.
Billington, to make the collections of the Library, especially those
offering unique or unusual testimony on aspects of the American
experience, available to a much wider circle of users than those few
people who can come to Washington to use them. This meant that the
emphasis of AM, from the outset, has been on archival collections of the
basic material, and on making these collections themselves available,
rather than selected or heavily edited products.
From AM's emphasis followed the questions with which the Workshop began:
who will use these materials, and in what form will they wish to use
them. But an even larger issue deserving mention, in GIFFORD's view, was
the phenomenal growth in Internet connectivity. He expressed the hope
that the prospect of greater interconnectedness than ever before would
lead to: 1) much more cooperative and mutually supportive endeavors; 2)
development of systems of shared and distributed responsibilities to
avoid duplication and to ensure accuracy and preservation of unique
materials; and 3) agreement on the necessary standards and development of
the appropriate directories and indices to make navigation
straightforward among the varied resources that are, and increasingly
will be, available. In this connection, GIFFORD requested that
participants reflect from the outset upon the sorts of outcomes they
thought the Workshop might have. Did those present constitute a group
with sufficient common interests to propose a next step or next steps,
and if so, what might those be? They would return to these questions the
following afternoon.
******
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FLEISCHHAUER * Core of Workshop concerns preparation and production of
materials * Special challenge in conversion of textual materials *
Quality versus quantity * Do the several groups represented share common
interests? *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress,
emphasized that he would attempt to represent the people who perform some
of the work of converting or preparing materials and that the core of
the Workshop had to do with preparation and production. FLEISCHHAUER
then drew a distinction between the long term, when many things would be
available and connected in the ways that GIFFORD described, and the short
term, in which AM not only has wrestled with the issue of what is the
best course to pursue but also has faced a variety of technical
challenges.
FLEISCHHAUER remarked AM's endeavors to deal with a wide range of library
formats, such as motion picture collections, sound-recording collections,
and pictorial collections of various sorts, especially collections of
photographs. In the course of these efforts, AM kept coming back to
textual materials--manuscripts or rare printed matter, bound materials,
etc. Text posed the greatest conversion challenge of all. Thus, the
genesis of the Workshop, which reflects the problems faced by AM. These
problems include physical problems. For example, those in the library
and archive business deal with collections made up of fragile and rare
manuscript items, bound materials, especially the notoriously brittle
bound materials of the late nineteenth century. These are precious
cultural artifacts, however, as well as interesting sources of
information, and LC desires to retain and conserve them. AM needs to
handle things without damaging them. Guillotining a book to run it
through a sheet feeder must be avoided at all costs.
Beyond physical problems, issues pertaining to quality arose. For
example, the desire to provide users with a searchable text is affected
by the question of acceptable level of accuracy. One hundred percent
accuracy is tremendously expensive. On the other hand, the output of
optical character recognition (OCR) can be tremendously inaccurate.
Although AM has attempted to find a middle ground, uncertainty persists
as to whether or not it has discovered the right solution.
Questions of quality arose concerning images as well. FLEISCHHAUER
contrasted the extremely high level of quality of the digital images in
the Cornell Xerox Project with AM's efforts to provide a browse-quality
or access-quality image, as opposed to an archival or preservation image.
FLEISCHHAUER therefore welcomed the opportunity to compare notes.
FLEISCHHAUER observed in passing that his conversations about networks
have begun to signal that, for various forms of media, a browse-quality
or distribution-and-access-quality item may coexist in some systems with
a higher-quality archival item that would be inconvenient to send
through the network because of its size. FLEISCHHAUER referred, of
course, to images more than to searchable text.
As AM considered those questions, several conceptual issues arose: ought
AM occasionally to reproduce materials entirely through an image set, at
other times, entirely through a text set, and in some cases, a mix?
There probably would be times when the historical authenticity of an
artifact would require that its image be used. An image might be
desirable as a recourse for users if one could not provide 100-percent
accurate text. Again, AM wondered, as a practical matter, whether a
distinction could be drawn between rare printed matter that exists in
multiple collections--that is, in ten or fifteen libraries--and unique
items; for the former, the need for perfect reproduction would be less
than for the latter. Implicit in his remarks, FLEISCHHAUER conceded, was
the admission
that AM has been tilting strongly towards quantity and drawing back a
little from perfect quality. That is, it seemed to AM that society would
be better served if more things were distributed by LC--even if they were
not quite perfect--than if fewer things, perfectly represented, were
distributed. This was stated as a proposition to be tested, with
responses to be gathered from users.
In thinking about issues related to reproduction of materials and seeing
other people engaged in parallel activities, AM deemed it useful to
convene a conference. Hence, the Workshop. FLEISCHHAUER thereupon
surveyed the several groups represented: 1) the world of images (image
users and image makers); 2) the world of text and scholarship and, within
this group, those concerned with language--FLEISCHHAUER confessed to finding
delightful irony in the fact that some of the most advanced thinkers on
computerized texts are those dealing with ancient Greek and Roman materials;
3) the network world; and 4) the general world of library science, which
includes people interested in preservation and cataloging.
FLEISCHHAUER concluded his remarks with special thanks to the David and
Lucile Packard Foundation for its support of the meeting, the American
Memory group, the Office for Scholarly Programs, the National
Demonstration Lab, and the Office of Special Events. He expressed the
hope that David Woodley Packard might be able to attend, noting that
Packard's work and the work of the foundation had sponsored a number of
projects in the text area.
******
SESSION I. CONTENT IN A NEW FORM: WHO WILL USE IT AND WHAT WILL THEY DO?
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DALY * Acknowledgements * A new Latin authors disk * Effects of the new
technology on previous methods of research *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Serving as moderator, James DALY acknowledged the generosity of all the
presenters for giving of their time, counsel, and patience in planning
the Workshop, as well as of members of the American Memory project and
other Library of Congress staff, and the David and Lucile Packard
Foundation and its executive director, Colburn S. Wilbur.
DALY then recounted his visit in March to the Center for Electronic Texts
in the Humanities (CETH) and the Department of Classics at Rutgers
University, where an old friend, Lowell Edmunds, introduced him to the
department's IBYCUS scholarly personal computer, and, in particular, the
new Latin CD-ROM, containing, among other things, almost all classical
Latin literary texts through A.D. 200. Packard Humanities Institute
(PHI), Los Altos, California, released this disk late in 1991, with a
nominal triennial licensing fee.
Playing with the disk for an hour or so at Rutgers brought home to DALY
at once the revolutionizing impact of the new technology on his previous
methods of research. Had this disk been available two or three years
earlier, DALY contended, when he was engaged in preparing a commentary on
Book 10 of Virgil's Aeneid for Cambridge University Press, he would not
have required a forty-eight-square-foot table on which to spread the
numerous, most frequently consulted items, including some ten or twelve
concordances to key Latin authors, an almost equal number of lexica to
authors who lacked concordances, and where either lexica or concordances
were lacking, numerous editions of authors antedating and postdating Virgil.
Nor, when checking each of the average six to seven words contained in
the Virgilian hexameter for its usage elsewhere in Virgil's works or
other Latin authors, would DALY have had to maintain the laborious
mechanical process of flipping through these concordances, lexica, and
editions each time. Nor would he have had to frequent as often the
Milton S. Eisenhower Library at the Johns Hopkins University to consult
the Thesaurus Linguae Latinae. Instead of devoting countless hours, or
the bulk of his research time, to gathering data concerning Virgil's use
of words, DALY--now freed by PHI's Latin authors disk from the
tyrannical, yet in some ways paradoxically happy scholarly drudgery--
would have been able to devote that same bulk of time to analyzing and
interpreting Virgilian verbal usage.
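The drudgery DALY describes is precisely what a keyword-in-context
(KWIC) concordance automates. The sketch below is illustrative only and
bears no relation to PHI's actual retrieval software.

    # Minimal keyword-in-context (KWIC) lookup: the kind of query that
    # once required shelves of printed concordances and lexica.
    def kwic(lines, word, width=3):
        for n, line in enumerate(lines, 1):
            words = line.lower().split()
            for i, w in enumerate(words):
                if w.strip(".,;:") == word:
                    left = " ".join(words[max(0, i - width):i])
                    right = " ".join(words[i + 1:i + 1 + width])
                    yield n, left, w, right

    corpus = ["Arma virumque cano, Troiae qui primus ab oris"]  # Aeneid 1.1
    for n, left, w, right in kwic(corpus, "cano"):
        print(f"line {n}: {left} [{w}] {right}")
    # -> line 1: arma virumque [cano,] troiae qui primus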
Citing Theodore Brunner, Gregory Crane, Elli MYLONAS, and Avra MICHELSON,
DALY argued that this reversal in his style of work, made possible by the
new technology, would perhaps have resulted in better, more productive
research. Indeed, even in the course of his browsing the Latin authors
disk at Rutgers, its powerful search, retrieval, and highlighting
capabilities suggested to him several new avenues of research into
Virgil's use of sound effects. This anecdotal account, DALY maintained,
may serve to illustrate in part the sudden and radical transformation
being wrought in the ways scholars work.
******
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
MICHELSON * Elements related to scholarship and technology * Electronic
texts within the context of broader trends within information technology
and scholarly communication * Evaluation of the prospects for the use of
electronic texts * Relationship of electronic texts to processes of
scholarly communication in humanities research * New exchange formats
created by scholars * Projects initiated to increase scholarly access to
converted text * Trend toward making electronic resources available
through research and education networks * Changes taking place in
scholarly communication among humanities scholars * Network-mediated
scholarship transforming traditional scholarly practices * Key
information technology trends affecting the conduct of scholarly
communication over the next decade * The trend toward end-user computing
* The trend toward greater connectivity * Effects of these trends * Key
transformations taking place * Summary of principal arguments *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Avra MICHELSON, Archival Research and Evaluation Staff, National Archives
and Records Administration (NARA), argued that establishing who will use
electronic texts and what they will use them for involves a consideration
of both information technology and scholarship trends. This
consideration includes several elements related to scholarship and
technology: 1) the key trends in information technology that are most
relevant to scholarship; 2) the key trends in the use of currently
available technology by scholars in the nonscientific community; and 3)
the relationship between these two very distinct but interrelated trends.
The investment in understanding this relationship being made by
information providers, technologists, and public policy developers, as
well as by scholars themselves, seems to be pervasive and growing,
MICHELSON contended. She drew on collaborative work with Jeff Rothenberg
on the scholarly use of technology.
MICHELSON sought to place the phenomenon of electronic texts within the
context of broader trends within information technology and scholarly
communication. She argued that electronic texts are of most use to
researchers to the extent that the researchers' working context (i.e.,
their relevant bibliographic sources, collegial feedback, analytic tools,
notes, drafts, etc.), along with their field's primary and secondary
sources, also is accessible in electronic form and can be integrated in
ways that are unique to the on-line environment.
Evaluation of the prospects for the use of electronic texts includes two
elements: 1) an examination of the ways in which researchers currently
are using electronic texts along with other electronic resources, and 2)
an analysis of key information technology trends that are affecting the
long-term conduct of scholarly communication. MICHELSON limited her
discussion of the use of electronic texts to the practices of humanists
and noted that the scientific community was outside the panel's overview.
MICHELSON examined the nature of the current relationship of electronic
texts in particular, and electronic resources in general, to what she
maintained were, essentially, five processes of scholarly communication
in humanities research. Researchers 1) identify sources, 2) communicate
with their colleagues, 3) interpret and analyze data, 4) disseminate
their research findings, and 5) prepare curricula to instruct the next
generation of scholars and students. This examination would produce a
clearer understanding of the synergy among these five processes: the use
of electronic resources for one of them tends to stimulate their use for
the others.
For the first process of scholarly communication, the identification of
sources, MICHELSON remarked the opportunity scholars now enjoy to
supplement traditional word-of-mouth searches for sources among their
colleagues with new forms of electronic searching. So, for example,
instead of having to visit the library, researchers are able to explore
descriptions of holdings in their offices. Furthermore, if their own
institutions' holdings prove insufficient, scholars can access more than
200 major American library catalogues over Internet, including the
universities of California, Michigan, Pennsylvania, and Wisconsin.
Direct access to the bibliographic databases offers intellectual
empowerment to scholars by presenting a comprehensive means of browsing
through libraries from their homes and offices at their convenience.
The second process is communication among colleagues. Beyond the most
common methods of communication, scholars are
using E-mail and a variety of new electronic communications formats
derived from it for further academic interchange. E-mail exchanges are
growing at an astonishing rate, reportedly 15 percent a month. They
currently constitute approximately half the traffic on research and
education networks. Moreover, the global spread of E-mail has been so
rapid that it is now possible for American scholars to use it to
communicate with colleagues in close to 140 other countries.
Other new exchange formats created by scholars and operating on Internet
include more than 700 conferences, with about 80 percent of these devoted
to topics in the social sciences and humanities. The rate of growth of
these scholarly electronic conferences also is astonishing. From 1990 to
1991, 200 new conferences were identified on Internet. From October 1991
to June 1992, an additional 150 conferences in the social sciences and
humanities were added to this directory of listings. Scholars have
established conferences in virtually every field, within every different
discipline. For example, there are currently close to 600 active social
science and humanities conferences on topics such as art and
architecture, ethnomusicology, folklore, Japanese culture, medical
education, and gifted and talented education. The appeal to scholars of
communicating through these conferences is that, unlike any other medium,
electronic conferences today provide a forum for global communication
with peers at the front end of the research process.
Interpretation and analysis of sources constitutes the third process of
scholarly communication that MICHELSON discussed in terms of texts and
textual resources. The methods used to analyze sources fall somewhere on
a continuum from quantitative analysis to qualitative analysis.
Typically, evidence is culled and evaluated using methods drawn from both
ends of this continuum. At one end, quantitative analysis involves the
use of mathematical processes such as a count of frequencies and
distributions of occurrences or, on a higher level, regression analysis.
At the other end of the continuum, qualitative analysis typically
involves nonmathematical processes oriented toward language
interpretation or the building of theory. Aspects of this work involve
the processing--either manual or computational--of large and sometimes
massive amounts of textual sources, although the use of nontextual
sources as evidence, such as photographs, sound recordings, film footage,
and artifacts, is significant as well.
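At the quantitative end of that continuum, the most elementary operation
is the count of frequencies MICHELSON mentions. A small sketch of such a
tally over a machine-readable text follows; it is illustrative only.

    # Word-frequency tally: the most elementary quantitative operation
    # on a machine-readable text. Illustrative only.
    import re
    from collections import Counter

    def word_frequencies(text):
        return Counter(re.findall(r"[a-z]+", text.lower()))

    sample = "The text, the whole text, and nothing but the text."
    for word, count in word_frequencies(sample).most_common(2):
        print(word, count)
    # -> the 3 / text 3: from such counts one can build distributions of
    # occurrences and, at a higher level, feed a regression analysis.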
Scholars have discovered that many of the methods of interpretation and
analysis that are related to both quantitative and qualitative methods
are processes that can be performed by computers. For example, computers
can count. They can count brush strokes used in a Rembrandt painting or
perform regression analysis for understanding cause and effect. By means
of advanced technologies, computers can recognize patterns, analyze text,
and model concepts. Furthermore, computers can complete these processes
faster with more sources and with greater precision than scholars who
must rely on manual interpretation of data. But if scholars are to use
computers for these processes, source materials must be in a form
amenable to computer-assisted analysis. For this reason many scholars,
once they have identified the sources that are key to their research, are
converting them to machine-readable form. Thus, a representative example
of the numerous textual conversion projects organized by scholars around
the world in recent years to support computational text analysis is the
TLG, the Thesaurus Linguae Graecae. This project is devoted to
converting the extant ancient texts of classical Greece. (Editor's note:
according to the TLG Newsletter of May 1992, TLG was in use in thirty-two
different countries. This figure updates MICHELSON's previous count by one.)
The scholars performing these conversions have been asked to recognize
that the electronic sources they are converting for one use possess value
for other research purposes as well. As a result, during the past few
years, humanities scholars have initiated a number of projects to
increase scholarly access to converted text. So, for example, the Text
Encoding Initiative (TEI), about which more is said later in the program,
was established as an effort by scholars to determine standard elements
and methods for encoding machine-readable text for electronic exchange.
In a second effort to facilitate the sharing of converted text, scholars
have created a new institution, the Center for Electronic Texts in the
Humanities (CETH). The center estimates that there are 8,000 series of
source texts in the humanities that have been converted to
machine-readable form worldwide. CETH is undertaking an international
search for converted text in the humanities, compiling it into an
electronic library, and preparing bibliographic descriptions of the
sources for the Research Libraries Information Network's (RLIN)
machine-readable data file. The library profession has begun to initiate
large conversion projects as well, such as American Memory.
While scholars have been making converted text available to one another,
typically on disk or on CD-ROM, the clear trend is toward making these
resources available through research and education networks. Thus, the
American and French Research on the Treasury of the French Language
(ARTFL) and the Dante Project are already available on Internet.
MICHELSON summarized this section on interpretation and analysis by
noting that: 1) increasing numbers of humanities scholars in the library
community are recognizing the importance to the advancement of
scholarship of retrospective conversion of source materials in the arts
and humanities; and 2) there is a growing realization that making the
sources available on research and education networks maximizes their
usefulness for the analysis performed by humanities scholars.
The fourth process of scholarly communication is dissemination of
research findings, that is, publication. Scholars are using existing
research and education networks to engineer a new type of publication:
scholarly-controlled journals that are electronically produced and
disseminated. Although such journals are still emerging as a
communication format, their number has grown, from approximately twelve
to thirty-six during the past year (July 1991 to June 1992). Most of
these electronic scholarly journals are devoted to topics in the
humanities. As with network conferences, scholarly enthusiasm for these
electronic journals stems from the medium's unique ability to advance
scholarship in a way that no other medium can do by supporting global
feedback and interchange, practically in real time, early in the research
process. Beyond scholarly journals, MICHELSON remarked the delivery of
commercial full-text products, such as articles in professional journals,
newsletters, magazines, wire services, and reference sources. These are
being delivered via on-line local library catalogues, especially through
CD-ROMs. Furthermore, according to MICHELSON, there is general optimism
that the copyright and fees issues impeding the delivery of full text on
existing research and education networks soon will be resolved.
The final process of scholarly communication is curriculum development
and instruction, and this involves the use of computer information
technologies in two areas. The first is the development of
computer-oriented instructional tools, which includes simulations,
multimedia applications, and computer tools that are used to assist in
the analysis of sources in the classroom, etc. The Perseus Project, a
database that provides a multimedia curriculum on classical Greek
civilization, is a good example of the way in which entire curricula are
being recast using information technologies. It is anticipated that the
current difficulty in exchanging computer-based instructional software
electronically, which in turn makes it difficult for one scholar to build
upon the work of others, will be resolved before too long.
Stand-alone curricular applications that involve electronic text will be
sharable through networks, reinforcing their significance as intellectual
products as well as instructional tools.
The second aspect of electronic learning involves the use of research and
education networks for distance education programs. Such programs
interactively link teachers with students in geographically scattered
locations and rely on the availability of electronic instructional
resources. Distance education programs are gaining wide appeal among
state departments of education because of their demonstrated capacity to
bring advanced specialized course work and an array of experts to many
classrooms. A recent report found that at least 32 states operated at
lea