Creating the Galaxy
During this time at Penn State, James worked closely with me (Anton Nekrutenko) and others to start the Galaxy Project, a comprehensive web platform for open and reproducible computational data analysis. James made the first software commit to the project repository on June 1, 2005. Today, Galaxy needs no introduction to anyone working in genomics, but at the time it was a major advance over the ad hoc command-line bioinformatics analyses that dominated the field. The initial version introduced support for accessing remote data resources and visualizing the results [4]. The project grew to incorporate thousands of analysis tools into one unified graphical user interface, accessible to anyone via a web browser. The Galaxy Project remains a landmark achievement and has forever changed the way scientists analyze and share data.
After completing his Ph.D. in 2006, James worked for 2 years as a visiting member of the Courant Institute of Mathematical Sciences at New York University. During these 2 years, Galaxy became one of his principal projects, and James coded several of its iconic features, including the three-pane interface, the “noodly” workflow editor, and the dynamic genome browser Trackster. In 2008, James started his laboratory at Emory University in the Department of Biology and the Department of Mathematics & Computer Science. He was promoted to Associate Professor with tenure in 2013, shortly before his move to JHU, where he was promoted to Professor in 2018.
It was at Emory University, and later at JHU, that the Galaxy platform exploded in popularity, driven by the growth of high-throughput sequencing data and large-scale cloud computing. This combination of technologies has proven transformative to the field, and the Galaxy Project has reached a wide audience of scientists, as evidenced by thousands of citations. Today, thousands of scientists around the world use Galaxy daily.
The Galaxy Project is not only a software platform but also a scientific community. James’ dedication to accessible, reproducible, and transparent research fostered a community of researchers that extended far beyond the original development team. Scientists, often working completely independently of the founders, have taken Galaxy into entirely new research domains. The strength of the Galaxy community is also evident every year at the community-run conference that brings hundreds of participants together to share their latest contributions and applications.
James was an ardent and principled advocate for open science, especially open access to scientific data and open-source software. James said that software may come and go, even Galaxy, but that the metadata Galaxy collects would ultimately be his most valuable contribution to science. This metadata enables anyone to inspect every analysis step and reproduce entire analyses, providing the bedrock for future discoveries. Without such transparency and rigor, he explained, the entire field would suffer.