AAS 195th Meeting, January 2000
Session 82. Data Handling
Display, Friday, January 14, 2000, 9:20am-6:30pm, Grand Hall


[82.09] Looking at 3,000,000 References Without Growing Grey Hair

M. Demleitner, A. Accomazzi, G. Eichhorn, C.S. Grant, M.J. Kurtz, S.S. Murray (Harvard-Smithsonian CfA)

The article service of the Astrophysics Data System (ADS, http://adswww.harvard.edu) currently holds about 500,000 pages scanned from astronomical journals and conference proceedings. This data set not only provides easy and convenient access to the majority of the astronomical literature from anywhere on the Internet but also allows highly automated extraction of the information contained in the articles.

As first steps towards processing and indexing the full texts of the articles, the ADS has been extracting abstracts and references from the bitmap images of the articles since May 1999. In this poster we describe the procedures and strategies used to (a) automatically identify the regions within a paper that contain the abstract or the references, (b) spot and correct errors in the database or in the identification of the regions, (c) resolve references obtained by optical character recognition (OCR), with its inherent uncertainties, to parsed references (i.e., bibcodes) and (d) incorporate the data collected in this way into the ADS abstract service. We also give an overview of the extent of the additional bibliographical material obtained from this source. We estimate that by January 2000, these procedures will have yielded about 14,000 abstracts and 1,000,000 citation pairs (out of a total of 3,000,000 references) not previously present in the ADS.
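To illustrate step (c), the following is a minimal sketch of how an OCR'd reference string might be mapped to a bibcode. It is not the procedure used by the ADS, which this abstract does not spell out. It assumes a reference of the form "Author, ..., year, journal, volume, page", repairs a few letter-for-digit confusions in the numeric fields, and matches the journal name approximately against a small table. The function name resolve_reference, the journal table, and the confusion list are all hypothetical. The sketch relies only on the 19-character bibcode layout: year, journal (5 characters), volume (4), a qualifier (1), page (4), and the first letter of the first author's surname, padded with dots.

```python
import difflib

# Hypothetical journal table: abbreviation as printed -> bibcode journal stem.
# A real resolver would need a far larger table; these entries are illustrative.
JOURNALS = {"ApJ": "ApJ", "AJ": "AJ", "A&A": "A&A", "MNRAS": "MNRAS", "PASP": "PASP"}

# Letters that OCR may confuse with digits (an assumed, incomplete list).
OCR_DIGITS = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})


def resolve_reference(ref):
    """Return a bibcode for an OCR'd reference string, or None if unresolved."""
    fields = [f.strip() for f in ref.rstrip(". ").split(",")]
    if len(fields) < 4:
        return None
    journal_raw, volume_raw, page_raw = fields[-3:]
    year_tokens = fields[-4].split()
    if not year_tokens or not fields[0]:
        return None

    # Repair digit confusions in the numeric fields only.
    year = year_tokens[-1].translate(OCR_DIGITS)
    volume = volume_raw.translate(OCR_DIGITS)
    page = page_raw.translate(OCR_DIGITS)
    if not (year.isdigit() and len(year) == 4):
        return None
    if not (volume.isdigit() and page.isdigit()):
        return None
    if len(volume) > 4 or len(page) > 4:
        return None

    # Approximate match of the journal name against the known abbreviations.
    match = difflib.get_close_matches(journal_raw, list(JOURNALS), n=1, cutoff=0.6)
    if not match:
        return None
    stem = JOURNALS[match[0]]

    # Journal is left-justified, volume and page right-justified, dot-padded.
    # The qualifier column is always "." here, which is a simplification.
    initial = fields[0][0].upper()
    return f"{year}{stem:.<5}{volume:.>4}.{page:.>4}{initial}"


clean = resolve_reference("Perlmutter, S., et al. 1999, ApJ, 517, 565")
noisy = resolve_reference("Perlmutter, S., et al. l999, ApI, 5l7, 565")
print(clean, noisy)
```

Both calls should resolve to the same bibcode, 1999ApJ...517..565P. References that cannot be parsed, such as letter pages like "L5", return None in this sketch and would need separate handling or manual review.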

If you would like more information about this abstract, please follow the link to http://adswww.harvard.edu. This link was provided by the author. When you follow it, you will leave the Web site for this meeting; to return, you should use the Back command on your browser.

The author(s) of this abstract have provided an email address for comments about the abstract: mdemleitner@head-cfa.harvard.edu
