Data Publishing: The AAS Journals' Perspective
The 23rd Astronomical Data Analysis Software and Systems (ADASS) meeting took place in Waikoloa, Hawaii, from 30 September through 3 October 2013. At ADASS conferences, astronomers and engineers discuss "the acquisition, reduction, analysis, and dissemination of astronomical data." As the AAS journals' data editor, I can speak to the dissemination part of that mandate. Here I'll summarize the talk I gave at the ADASS meeting on the AAS journals' current and future data-publication plans; see also my PowerPoint presentation (3.5-megabyte PDF).
Before discussing the future of data publishing in the AAS journals, namely the Astronomical Journal (AJ) and the Astrophysical Journal (ApJ), its Letters (ApJL), and its Supplement Series (ApJS), I want to set the scene by highlighting how the journals have evolved over the last 15 years. Our goal is to provide the best journal services to authors, readers, and libraries. For authors, the time to publish a journal article has fallen steadily, by about 2 months over the last decade. The costs to publish and to subscribe have declined as well: Figure 1 shows that, adjusted for inflation, the cost of publishing in the AAS journals has fallen by a factor of 2 over the last 15 years.
Figure 1 is normalized to account for the fact that since 2011 we have charged not by the page but by digital quanta, e.g., figures, tables, 350-"word" groups, and online-only items. Likewise, the subscription cost has dropped by a factor of 3 (see Figure 2) when scaled for both inflation and the substantial increase in the number of papers published.
Since the mid-1990s we have delivered our content by several methods, but the electronic edition, which is the version of record, contains many types of online-only content that cannot be delivered in print or PDF. The types and amount of online-only content have grown over the last 15 years. We started with online-only tables in 2000: we convert the author's original table into formatted ASCII with an extensive metadata header that follows the same rules and standards as CDS's VizieR tables. This is our machine-readable (MR) format, and there are now more than 8,000 such tables in our archive. In addition to online-only tables, we began accepting FITS table and image files, MPEG animations, and source code.

In 2004 we developed a method to handle extensive online-only figures. These figure sets can scale to very large numbers of online-only figures; one ApJ paper's figure set has more than 1,500 components! Our newest project, data behind figures (DbF), began in 2010. From the author's original data set, we create an MR table or FITS file of the underlying data and post it along with the figure. With DbF files, readers no longer have to resort to crude methods to guesstimate values from figures. Papers with DbF components are still rare, but, as in the early days of MR tables, DbFs are growing in popularity as more authors and readers become aware of them.
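To show why the metadata header matters, here is a minimal Python sketch that reads a simplified table in the style of a CDS "Byte-by-byte Description": the header gives each column's byte range, format, units, and label, so a program can slice the fixed-width rows without guessing. The sample table, the `parse_mr` and `is_separator` functions, and the regular expression are hypothetical illustrations under simplifying assumptions (only A, I, F, and E formats, a single header block, data starting after the last dashed line). Real AAS MR and VizieR headers carry much more metadata, and this is not the journals' actual tooling.

```python
import re

# Hypothetical, simplified MR-style table: a byte-by-byte header followed by
# fixed-width data rows. Real AAS/VizieR headers carry much more metadata.
SAMPLE = """\
Title: Hypothetical example table
Byte-by-byte Description of file: example.txt
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1-  8 A8     ---     Name      Source identifier
  10- 15 F6.2   mag     Vmag      V-band magnitude
  17- 21 F5.3   ---     z         Redshift
--------------------------------------------------------------------------------
SRC-0001  12.34 0.123
SRC-0002  15.07 0.450
"""

# One column description: optional "start-", end byte, format, units, label, text.
COLUMN_RE = re.compile(
    r"^\s*(?:(\d+)\s*-\s*)?(\d+)\s+([AIFE]\d+(?:\.\d+)?)\s+(\S+)\s+(\S+)\s*(.*)$"
)


def is_separator(line):
    """True for a line made only of dashes."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {"-"}


def parse_mr(text):
    """Return (columns, rows) parsed from a simplified byte-by-byte table."""
    lines = text.splitlines()
    # Assumption: data rows start right after the last dashed separator.
    last_sep = max(i for i, line in enumerate(lines) if is_separator(line))
    columns = []
    for line in lines[:last_sep]:
        m = COLUMN_RE.match(line)
        if m:
            start, end, fmt, units, label, expl = m.groups()
            end = int(end)
            columns.append({
                "start": int(start) if start else end,  # single-byte column
                "end": end, "format": fmt, "units": units,
                "label": label, "explanation": expl.strip(),
            })
    rows = []
    for line in lines[last_sep + 1:]:
        if not line.strip():
            continue
        row = {}
        for col in columns:
            # Byte ranges are 1-based and inclusive; Python slices are 0-based.
            raw = line[col["start"] - 1:col["end"]].strip()
            kind = col["format"][0]
            if not raw:
                row[col["label"]] = None      # blank field = missing value
            elif kind == "A":
                row[col["label"]] = raw
            elif kind == "I":
                row[col["label"]] = int(raw)
            else:                             # F or E formats
                row[col["label"]] = float(raw)
        rows.append(row)
    return columns, rows


columns, rows = parse_mr(SAMPLE)
for row in rows:
    print(row)
```

For real files, astropy's `ascii.cds` reader is a more complete option than a hand-rolled parser like this one.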
These capabilities meet most of our current readers' and authors' data needs, but we felt that we were only scratching the surface of the available data and that much more could be done. To gauge the author and reader community's interest in data sharing, the AAS conducted a survey in early 2013, asking corresponding authors of AAS journal papers from the previous 2 years for their views on providing and using data. The results were very encouraging. A majority of respondents (62%) had made at least some of their data available by various means, and about 45% of that data was provided with the journal article. Sixty percent of the authors had also used data from other published papers. These results are backed up by our publisher's web logs, which show that online-only data are downloaded in significant numbers.
Our next step was to determine how much data could easily be captured from the figures we already publish. We presented the results of this study at the 221st AAS meeting in Long Beach, California, in January 2013. We found significant "low-hanging fruit" in the form of simple X vs. Y plots whose data could easily be represented as DbFs. To increase awareness of DbFs, the AJ immediately began a 2013 project to review the figures in all submitted papers. Papers with good candidate figures were flagged, and the corresponding author was asked to supply the data before acceptance. The compliance rate has been high, 56%, which further supports the results of our data-sharing survey.
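As a rough sketch of how the data behind a simple X vs. Y plot could be packaged, the Python function below writes two numeric columns as fixed-width ASCII with a byte-by-byte header in the spirit of the MR format. The function name `format_dbf`, the file name `dbf.txt` in the header, the fixed F10.4 column format, the header wording, and the sample values are all assumptions for illustration; the actual template the journals use is not described here.

```python
def format_dbf(x, y, labels=("X", "Y"), units=("---", "---"),
               title="Hypothetical data behind a figure"):
    """Return a fixed-width ASCII table with a byte-by-byte header (sketch)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    width, decimals = 10, 4            # assumed F10.4 for every column
    fmt = f"F{width}.{decimals}"
    sep = "-" * 80
    header = [
        f"Title: {title}",
        "Byte-by-byte Description of file: dbf.txt",  # hypothetical file name
        sep,
        "   Bytes Format Units   Label     Explanations",
        sep,
    ]
    start = 1
    for label, unit in zip(labels, units):
        end = start + width - 1
        # Byte range, format, units, and label, padded into aligned columns.
        header.append(f"{start:4d}-{end:3d} {fmt:<6} {unit:<7} {label:<9} "
                      f"{label} value plotted in the figure")
        start = end + 2                 # leave one blank byte between columns
    header.append(sep)
    rows = [f"{xi:{width}.{decimals}f} {yi:{width}.{decimals}f}"
            for xi, yi in zip(x, y)]
    return "\n".join(header + rows) + "\n"


# Usage: data behind a toy plot of y = x**2 (made-up values).
table = format_dbf([1.0, 2.0, 3.0], [1.0, 4.0, 9.0],
                   labels=("Time", "Flux"), units=("d", "mJy"))
print(table)
```

Fixing every column to the same width keeps the byte ranges trivial to compute; a fuller writer would choose widths and precisions per column from the data.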
Clearly these results show that authors want to make data available and that readers want to use it in their own research, so where do we go from here? The next step is a workshop after the 2014 winter AAS meeting, with select members of the astronomical community, to discuss metadata semantics, digital structures and formats, and sensible practices for data peer review. Once these semantic and format issues are resolved, we will begin upgrading the different types of future datasets and the way they are delivered. Implementing Virtual Observatory standards will make the data easily discoverable and allow them to be integrated into existing astronomical databases. These enhanced datasets will be given digital object identifiers (DOIs) and moved outside our 1-year subscription paywall, which will give individual data products their own independent and citable "identity" and make them immediately available. Once these new data formats are established, our goal is to apply the same enhancements to the data in our archive.
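Which Virtual Observatory standards will be adopted is still to be settled at the workshop. As one plausible illustration, the sketch below serializes a small dataset as a VOTable, the IVOA's XML table format, using only Python's standard library. The `to_votable` helper, the table name, the field names, and the sample values are hypothetical; a production VOTable would also carry descriptions, UCDs, and other metadata that this minimal example omits.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"  # VOTable 1.3 namespace


def to_votable(name, fields, rows):
    """Serialize rows as a minimal VOTable 1.3 document (illustrative sketch).

    fields: sequence of (name, datatype, unit) tuples; unit may be None.
    rows:   sequence of row tuples matching the fields.
    """
    # Declaring xmlns as a plain attribute puts every element in the VOTable namespace.
    root = ET.Element("VOTABLE", version="1.3", xmlns=VOTABLE_NS)
    resource = ET.SubElement(root, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name=name)
    for field_name, datatype, unit in fields:
        attrs = {"name": field_name, "datatype": datatype}
        if unit:
            attrs["unit"] = unit
        ET.SubElement(table, "FIELD", attrs)
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Usage with made-up values for a two-column "data behind figure" table.
xml_text = to_votable(
    "fig1_data",
    [("time", "double", "d"), ("flux", "double", "mJy")],
    [(1.0, 1.0), (2.0, 4.0)],
)
print(xml_text)
```

The TABLEDATA serialization used here is the simplest VOTable encoding to read and write by hand; larger datasets would more likely use a binary encoding or a FITS file alongside it.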
We are excited to be developing these new data formats and encourage authors to continue to supply the data behind their papers. Email me if you have questions about including different forms of data in future submissions or if you have other ideas or comments about data publication in the AAS journals.
Note: This article originally appeared on AstroBetter.com.
AAS Journals Data Editor