29 January 2018

Astronomy Software Publishing Special Session at AAS 231

Alice Allen Astrophysics Source Code Library

This post is adapted from one originally appearing on ASCL.net:

On Thursday, 11 January, the Astrophysics Source Code Library (ASCL) and the Astronomical Data Group at the Flatiron Institute organized a Special Session at the 231st AAS meeting in National Harbor, MD, entitled "Astronomy Software Publishing: Community Roles and Services." This was the sixth in a series of software-focused sessions that the ASCL, sometimes with others, has organized at AAS meetings.

"Really glad to see software article publication and citation getting attention at the #AAS231 meeting. Great articles like Daniel Foreman-Mackey's 'emcee: The MCMC Hammer' in PASP is a perfect example of a highly-referenced software article."
Peter Teuben from the University of Maryland opened the session with a few words about the use of software in research articles and outlined the layout of the session. A talk by Matteo Cantiello then set the scene by describing how we reached the point where we are now. Representatives of four journals followed with presentations on their policies on software publication, and after them came presentations from others with roles in publishing software: a software author, a data editor, the ADS, and the ASCL. The floor was then opened for discussion and Q&A. Teuben moderated the discussion, and at the end of it turned the podium over to Robert Nemiroff from Michigan Technological University for a summary and closing remarks.

Presentations
Some of the main points from each presentation are summarized below; the titles of each are links to the slides used by the presenters.

    • The Evolution of Software Publication in Astronomy, Matteo Cantiello (Flatiron Institute)
      Cantiello stated that the complexity of astrophysics requires computationally intensive models, making astronomy a digital science, and that astronomers have a rich computational environment available, allowing them to easily version, share, and deploy astronomy software.
      Despite this, software is often not shared, resulting in a reproducibility paradox: astronomers use computation to provide precise, accurate results, but research has become less transparent with the increase in the use of computational methods. He stated that astronomy has an opportunity to rethink scientific papers as research repositories, with executable objects containing narrative, figures, data, and code.
    • Software Papers and Citation in the AAS Journals, Chris Lintott (AAS Journals)
      Until recently, the AAS journals' policy on software dated from 1964 and stated that the “need for communication between astronomers interested in computation is already supplied by associations of users of automated computing machines.” The AAS journals changed their policies at the beginning of 2016, recognizing that if novel code is important to published research, then it is likely appropriate to describe it in a paper. Such papers can be short and descriptive, and need not include research results.
      In addition, the journals ask authors to use the \software{} tag to create a software section in a paper; this is similar to the \facilities{} tag already in use. AAS Publishing is introducing the concept of ‘living’ papers, which can be updated with new sections and expanded author lists, so software authors don't need to publish a new paper to credit those who have contributed to a new version of the software. Lintott encouraged those interested in living papers to contact him.
    • Software Policies and Guidelines at Nature, Leslie J. Sage (Nature)
      First, Sage explained the context in which Nature's policy is created: Nature is driven by biologists, who live in a very different world from astronomers. Unlike astronomers, biologists live in a Windows world. Right now, two journals, Nature Methods and Nature Biotech, require code to be made available, and there are ongoing discussions about whether Nature should do this for other journals. There are formidable problems because of the issue of very specialized code. Nature will be putting out a call for public comment, and Sage hopes astronomers will provide input that is useful for astronomers within that context. Sage raised a number of points that warrant public discussion, such as a preference voiced by some to see detailed descriptions of the algorithms used rather than having the scripts published. Another point to consider for input is that some users may not be aware of the constraints and conditions that may drive software beyond its limits, which raises the question as to whether the results are physically meaningful.
    • SpringerNature Data and Software Policies for Astrophysics Journals, Ramon Khanna (Springer)
      Springer encourages authors to attend to the transparency and reproducibility of the results presented in their articles. Authors may include relevant information about source code, or the full code itself, in an appendix to the paper, or make it available by other means, such as external repositories (e.g., CDS, ASCL, Figshare). Springer would like the full data and code to be available. Khanna acknowledged some challenges, including that authors are often not willing to share their software and/or data, editors are often not willing or at least not determined enough to execute policy, and citation standards are unclear.
      Khanna pointed out that in a field as advanced as astronomy is, which already has some standards and domain resources such as archives, it’s not so much the publisher that should drive new standards, but the community itself.
    • Journal of Open Source Software (JOSS): Design and First-Year Review, Arfon M. Smith (STScI/JOSS)
      Smith stated that he created JOSS out of frustration about the overhead of publishing papers about software, and acknowledged that software papers are a hack of the current system to provide a citable, creditable research object for software. JOSS (http://joss.theoj.org) seeks to improve the quality of software; its peer review process is almost entirely about the software that’s submitted, and includes making sure that the documentation is sufficiently fleshed out, that the package includes automated tests, and that the software has an open source license so that it can be reused. Smith said it should take about an hour to write a one-page paper for JOSS for those with a well-set-up repository for their code. The reviews are public on GitHub and accepted submissions appear on the JOSS site, which has published 200 papers online.
    • Lessons Learned Through the Development and Publication of AstroImageJ, Karen Collins (Center for Astrophysics)
      Collins discussed her experience with publishing her software AstroImageJ, a data reduction and image display interface with analysis capabilities specialized for time series differential photometry. She developed the code over several years to support her research with no intention of releasing the code to the public, but her collaborators saw her plots and graphs and asked to use the software, which was posted to the university's website to give team members access to it.
      When results using AstroImageJ started appearing in journals, she registered the software with the ASCL to give it a citable reference, and as usage (and support tasks) grew, she and others working on the code decided to submit a paper to the Astronomical Journal (AJ) to provide good exposure to the potential userbase for the software. Among the lessons learned in publishing AstroImageJ are to specify how your code is licensed and how it should be cited, make the source code easily accessible, and provide an easy way to install and update the software.
    • The Roles of the AAS Journals’ Data Editors, August Muench (AAS Journals)
      Muench covered the data editors' workflow for all submitted manuscripts. A quick review of 60-90% of all submitted manuscripts is performed, with scripts run on the manuscripts to identify references to code, for example by looking for GitHub repositories, to see whether the software citations need to be reviewed. The editors make notes on the software, data, and figures for review by a scientific editor or the author, with recommendations for improving citations for these research artifacts. A subset of accepted articles, 15-20%, undergoes a more rigorous post-acceptance data review; this includes a review of tabular data, figures, and interactive elements in addition to software. If necessary, the data editors request that authors acquire DOIs or get preferred citations for the software used in the research.
      "People recognize software via plots (and other fingerprints). Make sure you cite the code. I still recognize plots made with PAW."
      He stated that part of a data editor's role in improving software and data citation is educating authors.

    • The Role of the ADS in Software Discovery and Citation, Alberto Accomazzi (NASA Astrophysics Data System)
      Accomazzi shared ADS's traditional core responsibilities: to discover content, typically science papers, related to astronomy. Some years ago, the capability to track citations was introduced. As the expectations of the community have evolved, so have ADS's policies, moving from ingesting records about scientific papers to records about scholarly works, including data catalogs, observing proposals, and other artifacts such as software. They have also evolved from tracking citations to articles to tracking citations to scholarly content.
      Accomazzi covered how ADS ingestion works and also discussed how citations are tracked and what ADS needs to count a citation, going through several examples of what does and does not work for citation. The bottom line for software is to cite it by using a formal citation and a unique identifier; a URL to a website or a DOI in a footnote is not captured as a citation. ASCL, JOSS, and Zenodo are ways software can get a persistent identifier to use in a formal citation, and these citations can be tracked by ADS. Accomazzi also discussed how software may have several records in ADS, and said that in the future these records will be crosslinked, as will different versions of a software package.

    • The Astrophysics Source Code Library: Supporting Software Publication and Citation, Alice Allen (ASCL/UMD)
      Allen gave a brief overview of what the ASCL is, and stated that though entries in this citable online registry usually point to a software package's download, the ASCL can and does serve as a repository and assigns a DOI to software that it stores. She covered the three main reasons the ASCL exists: to make research more transparent, to improve communication about research computations, and to disseminate software of utility to others. Allen said the ASCL focuses on software that has been used in refereed research or submitted for refereeing, in order to support the research record. The ASCL supports software publication and citation in a number of ways, including providing a citation avenue for software and listing preferred citation information in ASCL entries. The ASCL supports the Force11 software citation principles and took part in developing them; it was also party to a Dagstuhl Manifesto, an effort that focused on steps members of a research community can take on their own. Among these steps are citing software properly, in a trackable way, and, when reviewing a paper, ensuring that it cites the software used in the research.
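
As a rough illustration of the kind of scripted check Muench described, the sketch below scans manuscript text for links to GitHub repositories and flags manuscripts that link to code without a formal identifier. It is not the data editors' actual tooling, which the talk summary does not detail; the function names, the regular expressions, and the rule used to flag a manuscript are all assumptions made for this example.

```python
import re

# Hypothetical patterns chosen for this sketch; not the data editors' real scripts.
GITHUB_RE = re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+", re.IGNORECASE)
ASCL_RE = re.compile(r"ascl:\d{4}\.\d{3}", re.IGNORECASE)  # ASCL IDs look like ascl:NNNN.NNN
SOFTWARE_TAG_RE = re.compile(r"\\software\{")  # the \software{} tag mentioned by Lintott


def scan_manuscript(text):
    """Collect software references found in a manuscript's (LaTeX) text."""
    # Strip a trailing period picked up when a URL ends a sentence.
    repos = sorted({url.rstrip(".") for url in GITHUB_RE.findall(text)})
    return {
        "github_repos": repos,
        "ascl_ids": sorted(set(ASCL_RE.findall(text))),
        "has_software_tag": bool(SOFTWARE_TAG_RE.search(text)),
    }


def needs_citation_review(report):
    """Assumed rule: a repository link with no ASCL ID suggests a citation to review."""
    return bool(report["github_repos"]) and not report["ascl_ids"]


if __name__ == "__main__":
    # Made-up manuscript fragment, for illustration only.
    sample = r"We used the code at https://github.com/example/mycode. \software{mycode}"
    report = scan_manuscript(sample)
    print(report)
    print("needs review:", needs_citation_review(report))
```

A real check would presumably also look for DOIs and other identifiers; the point here is only that a bare repository URL is easy to detect automatically, which gives an editor a starting list of citations to examine.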

Discussion
After the presentations, Teuben commented that he thought journals could do a better job in instructing referees about software, to identify when code is involved in research and insist on citations to it. He hoped the discussion would touch on this, and then opened the floor to all. Discussion was lively; some of the major points were:

  • There's still fear about releasing software, still resistance to doing so.
  • Science is all about reproducibility; it’s not science if it’s not reproducible.
  • Who should push for greater openness is an open question, with some wanting journals to do this, and others feeling it's up to the astronomy community — us! — to enforce the standards we want.
  • Astronomers are often not trained in software engineering techniques; greater education in this area would be helpful.

"If software developers were well funded, it would be easier to get people to share their code."

Teuben brought the discussion to an end and turned the floor over to Robert Nemiroff (Michigan Technological University), who briefly summarized the presentations and discussion and closed the session.