On-line WAIS Search Capability Brings Astronomy To The Internet

By A. Warnock, J. Gass, L. Brotzman (Hughes STX), M. E. Van Steenberg (NASA/GSFC), D. Kovalsky (Hughes STX), and F. Giovane (NASA/HQ)

As part of the STELAR pilot project, NASA's Astrophysics Data Facility (ADF) is making astronomical abstracts and job listings available to the astronomical community. To make these databases available for easy search and retrieval, the ADF is using a highly portable and fully open, multi-disciplinary document query and delivery system known as WAIS (Wide Area Information Server).

WAIS is a client/server system originally developed by Thinking Machines Corp. and distributed by them free of charge. It is based on an ANSI/NISO standard communications protocol (Z39.50-1988). WAIS servers have been ported to UNIX, VMS and, recently, MS-DOS. WAIS clients run on a wide variety of machines, from UNIX-based X-windows systems and character terminals, to MS-DOS and Macintosh microcomputers. The WAIS system includes full-text indexing and searching of documents, a network interface, and easy access to a variety of document viewers. The WAIS software, for both clients and servers, is available via anonymous FTP from the Internet site think.com.

How WAIS Works

WAIS uses a client/server model to communicate both locally and over wide area networks like the Internet. The WAIS system, as distributed by Thinking Machines Corp., consists of three software packages: the text indexer, the database server and the client program.

The text indexer builds a master index of all words occurring in a database of documents. This index is then used by the retrieval software to find which documents contain the words in a query.

The WAIS server software runs on the computer hosting the database of documents and handles the job of responding to queries. A query can either be a search of the document index or a request to retrieve a document to pass back to the client. The client is the user interface.
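
The indexing step described above can be illustrated with a small sketch. It is not the WAIS indexer itself, only a minimal word index under stated assumptions: the function names (tokenize, build_index, lookup), the tokenizing rule and the three sample documents are all hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents containing at least one query word."""
    found = set()
    for word in tokenize(query):
        found |= index.get(word, set())
    return found


# Hypothetical sample documents, not taken from the ADF databases.
documents = {
    "doc1": "Spectra of hot stars in globular clusters",
    "doc2": "Radio observations of distant galaxies",
    "doc3": "Hot gas in clusters of galaxies",
}

index = build_index(documents)
print(sorted(lookup(index, "hot clusters")))  # ['doc1', 'doc3']
```

Because the index is built once, ahead of time, a query only has to look up its own words rather than scan every document.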
The user formulates a free-format text query, which the client translates into the appropriate protocol and then sends to the server. The server processes the query and sends the results back to the client for display or local storage.

The search engine supplied with the free distribution version is quite simple, but surprisingly fast and effective. It matches occurrences of words in the query with the individual words in the documents, and tallies a score for each document based on the number of "hits". The underlying assumption is that, if a document has many words in common with the query, the document is probably relevant. Documents are then returned in ranked order of relevance to the query.

A simple extension of this search technique is the notion of "relevancy feedback". The user can select part (or all) of a retrieved document and use it as a query to get, in effect, "all documents like this one." This allows detailed searches without requiring the user to formulate a detailed query.

The source code in the distribution system is quite modular, and allows for replacement of individual components. It is possible, therefore, to replace the current engine in the server with a more sophisticated one which might, for example, be capable of handling word stems and/or synonyms, or which might use advanced techniques such as factor spaces.

The client uses a "source file" to identify and locate each WAIS database on the network. The source file is a simple ASCII text file which contains the name and description of the database and the network location of the server. More than one source file may be selected by the user, which allows searches to be posed to multiple sources at one time (though they are searched sequentially, not simultaneously). Source files may be obtained by any number of means. The ADF distributes its source files by anonymous FTP, as described below.
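
To make the word-match ranking and "relevancy feedback" described earlier more concrete, here is a minimal sketch. It is an illustration only, not the scoring code shipped with WAIS: the rank function, the tokenizing rule and the sample documents are hypothetical, and the score is simply a count of query-word occurrences in each document.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(documents, query):
    """Score documents by how often the query's words occur in them.

    Returns (doc_id, score) pairs, best score first; documents with
    no hits are left out. Ties are broken by document id.
    """
    query_words = set(tokenize(query))
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in query_words)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents, not taken from the ADF databases.
documents = {
    "doc1": "Spectra of hot stars in globular clusters",
    "doc2": "Radio observations of distant galaxies",
    "doc3": "Hot gas in clusters of galaxies",
}

# An ordinary query.
first_pass = rank(documents, "hot stars")
print(first_pass)

# Feedback: reuse the text of the best match as the next query,
# asking in effect for "all documents like this one".
best_id = first_pass[0][0]
print(rank(documents, documents[best_id]))
```

In this toy version common words such as "of" and "in" count as hits, which is why the feedback query also matches the unrelated second document; a practical engine would presumably discount such words.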
Other source files, such as those for specialized databases, may be distributed individually via electronic mail or by postings to networks like Usenet. There is also a "white pages" facility by which new public sources may be located by querying the master registry of sources (a WAIS server called directory-of-servers), maintained by Thinking Machines Corp. The source file for this server comes with the distribution software, or may be retrieved by anonymous FTP from the Internet site think.com.

Access To The ADF WAIS Databases

To get access to WAIS you must first obtain the WAIS client software and get it running on your local machine. Clients for UNIX and the Apple Macintosh (called WAIS-station) are available by anonymous FTP at think.com (IP address 131.239.2.1). The VMS, MS-DOS and Microsoft Windows clients are available by anonymous FTP from wais.oit.unc.edu (IP address 128.109.157.30).

The ADF currently offers three text databases to the astronomical community. The corresponding source files are available by anonymous FTP from hypatia.gsfc.nasa.gov (IP address 128.183.115.29), in the directory wais-sources. The source file for the STELAR journal abstracts is called "abstracts.src". The AAS Job Register source file is in "AAS_jobs.src", and the AAS electronic meeting abstracts source file is in "AAS_meeting.src".

It is also possible to obtain these source files by automatic mail request. Send an E-mail message to listserv@hypatia.gsfc.nasa.gov. In the body of the mail message put the commands:

    get stelar abstracts.src
    get stelar AAS_jobs.src
    get stelar AAS_meeting.src

The source files (not the contents of the databases) will be returned to you by electronic mail. Save these files (without any accompanying message headers) in plain ASCII files in your "wais-sources" directory where they will be accessible to your WAIS client software.
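
For readers who prefer to script the mail request, the message can be composed as in the sketch below. This is only an illustration: the build_request function and the sender address are hypothetical, and the code builds the message without sending it, since how mail is delivered depends on the local system.

```python
from email.message import EmailMessage

# Address and file names as given in the article.
LISTSERV_ADDRESS = "listserv@hypatia.gsfc.nasa.gov"
SOURCE_FILES = ("abstracts.src", "AAS_jobs.src", "AAS_meeting.src")


def build_request(sender, filenames=SOURCE_FILES):
    """Compose a mail message with one 'get stelar' command per file."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = LISTSERV_ADDRESS
    # The commands go in the body of the message, one per line.
    body = "\n".join(f"get stelar {name}" for name in filenames)
    message.set_content(body)
    return message


# The sender address is a placeholder; substitute your own.
request = build_request("astronomer@example.edu")
print(request.get_content())
```

If a mail server is reachable from the local machine (an assumption), the message could then be handed to it with the standard smtplib module, for example smtplib.SMTP("localhost").send_message(request).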
For additional information on, or assistance with, the astronomical WAIS server, please contact the authors at stelar-info@hypatia.gsfc.nasa.gov.