PEGR: a flexible management platform for reproducible epigenomic and genomic research

Reproducibility is a significant challenge in (epi)genomic research due to the complexity of experiments composed of traditional biochemistry and informatics. Recent advances have exacerbated this as high-throughput sequencing data is generated at an unprecedented pace. Here, we report the development of a Platform for Epi-Genomic Research (PEGR), a web-based project management platform that tracks and quality controls experiments from conception to publication-ready figures, compatible with multiple assays and bioinformatic pipelines. It supports rigor and reproducibility for biochemists working at the bench, while fully supporting reproducibility and reliability for bioinformaticians through integration with the Galaxy platform.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02671-5.

There doesn't seem to be any concern for e.g. a standard Galaxy User coming along and running the custom tools and potentially inadvertently adding information to the pegrDB. A more generalized approach would seem to be to use/design standard tools that can produce the statistics needed, which could be available from the ToolShed and could be useful to standard Galaxy users; then have PEGR interact directly with the Galaxy API. The two potential negatives that this would introduce would be that it would require PEGR to poll Galaxy for job completion (instead of just waiting for a POST) and some minimal parsing of e.g. text files with tables containing values, instead of JSON. But this approach would allow e.g. any number of PEGR servers to work with a single Galaxy server. And in fact, it could be configured to allow a single PEGR server to work with multiple Galaxy servers for e.g. different workflows or organisms, etc. Could the authors please comment on these points?
The instructions on configuring PEGR with analysis workflow/pipeline support are beyond minimal, but are available by inspecting these GitHub repos: "The PEGR-Galaxy communication scripts are available at https://github.com/CEGRcode/pegr-galaxy_tools and the PEGR NGS pipeline scripts are available at https://github.com/CEGRcode/pegr-ngs_pipeline under the MIT license." There must be clear instructions for installing and enabling the advertised interconnectivity of PEGR and Galaxy. More reliance on the Galaxy API could simplify a lot of the configuration currently needed within PEGR.
The PEGR-Galaxy communication scripts listed in this manuscript are available at https://github.com/CEGRcode/pegr-galaxy_tools (created 3 months ago):
$ git clone https://github.com/CEGRcode/pegr-galaxy_tools.git
Cloning into 'pegr-galaxy_tools'...
There are no differences between the tools in these separate repositories:
$ diff -r cegr-galaxy/tools/cegr_statistics/ pegr-galaxy_tools/tools/
It is not clear why a new repo was created 3 months ago with the tools manually copied over from a 5 year old repository without including attribution of the primary committer.
It would be helpful if the authors could highlight the specific differences between "PEGR" and "CEGR" [my apologies if this is not the correct naming for this previously existing system], in particular the interconnectivity of the LIMS and analysis workflow systems (e.g. Galaxy).

Reviewer 2
Were you able to assess all statistics in the manuscript, including the appropriateness of statistical tests used? There are no statistics in the manuscript.
Were you able to directly test the methods? No.

Comments to author:
The paper introduces a web-based platform for sample tracking and project management of genomic experiments that integrates with bioinformatics analysis performed on Galaxy. The paper is well written and provides sound justification for the approach and good technical details, with several screenshots and diagrams that clarify how the system works. An initial version of the platform was presented in 2020 in the ACM Intl. Conference Proceeding Series, and although it is still probably early days, it would be nice if the authors could include some numbers on user uptake, or at least a description of its impact where it was deployed at their labs.

I have tried without success to install PEGR following the instructions on GitHub (see suggestion below about providing a Docker image or demo server), so I'll have to base my comments on the description given in the paper. Integrating platforms like LIMS and workflow management systems is a complex undertaking (especially for an open source project), where there is a risk of creating a platform that is too complex or not good enough (in terms of features or usability) for the single tasks. In this respect, I support the authors' approach of not reinventing the wheel for the analysis subsystem, although the way in which Galaxy tools are integrated seems half-baked (see detailed comment below).

In conclusion, the paper and the software address the important issue of extending reproducibility from the wet lab up to the final report of a bioinformatic analysis, and I hope the authors succeed in popularising PEGR and continue its development.
Comments: -Background, 1st paragraph: references 4-11 seem very general and not particularly relevant to the reproducibility issues described.
-Page 5, 1st paragraph: "To prevent disorganization ..., ItemType's are organized into an ItemTypeCategory" - maybe replace "organized" with "grouped". Also, is this a proper hierarchical structure (with ItemCategory's grouped in a higher ItemCategory)?
-Page 5: Does PEGR come with some predefined ItemType's? Would an AppStore-like website be helpful for sharing definitions among PEGR admins?
-Figure 3 caption: "(E) 'ProtocolGroup' is accessible through the Admin console" - Why can ProtocolGroup's be created only by admins? This seems something that can become a bottleneck.
-Page 9, 3rd paragraph: "A traced sample can be added to an experiment using either the web-interface or the QR barcode system" - Can samples be added in advance (before creating a new Experiment)? There is a "Samples" button in the top menu.
-Page 24, Availability of data and materials: I'd suggest to also make PEGR available as a Docker image and/or a demo server. I've tried to run it following the instructions at https://github.com/seqcode/pegr but ended with a long page full of Java exceptions.
-Page 20: "When an analysis step finishes, its output data will be posted to PEGR immediately." - It's not clear how this happens: was Galaxy modified to initiate an HTTP POST request towards the PEGR RESTful API after a tool is run, or is it the tool's responsibility? By looking at https://github.com/CEGRcode/pegr-galaxy_tools it seems it is the latter; in fact, each Galaxy tool published there sends its output to PEGR autonomously. If that is the case, this is a serious limitation to the usefulness of PEGR, because users wouldn't be able to directly make use of the thousands of Galaxy tools available on the Galaxy ToolShed.
Minor comments:
-Background, 1st paragraph: "What is needed is systematic metadata capture..." - This sentence could be made a bit more absolute, something like "One way to tackle these issues is to apply systematic metadata capture..."
-Background, 3rd paragraph: https://doi.org/10.1093/bioinformatics/btt115 is an (abandoned) attempt to extend Galaxy with LIMS functionalities that may be worth mentioning.
-Background, 4th paragraph: "Galaxy.org" -> "Galaxy"
-Page 7, last paragraph: "organize the variety of protocols that often come sequentially in a pipeline": do protocols have to come in a sequence? I think it should be possible to execute some steps in parallel, so maybe "organize the variety of protocols that often compose an experiment"
-Page 9, 1st paragraph: "The PEGR 'Experiment' interface is designed to track and maintain the relational links between reagents, protocols, and the resulting end products" - I think that "samples" should be listed here even if they are formally introduced some lines below.
-Figure 4 caption: "...the effect of electing a Protocol Group" - "electing" -> "selecting"
-Page 11, 2nd paragraph: "...XML-wrapper python scripts which send a JSON file to PEGR in a standard POST request." -> "...XML-wrappers for Python scripts which send a JSON file to PEGR RESTful API in a standard HTTP POST request."
-Figure 6 caption: "...the users affiliated projects..." -> "...the user's affiliated projects..."
-Page 16, 1st paragraph: "It supports reproducibility... management [22,26]" - A bit unreadable, I would rewrite as "It supports scientific data management by tracking samples from the very first step of sample preparation to the end of bioinformatics analysis and data reporting thus supporting the FAIR principles [22] and the reproducibility goals of the Galaxy platform [26]."
-Supplemental Figure 1: It would be useful to include or link the source file used to generate this figure; the diagrams are too small to be useful/readable.

Diversity and inclusion are a part of Cornell University's heritage. We are a recognized employer and educator valuing AA/EEO, Protected Veterans, and Individuals with Disabilities.
We kindly thank the reviewers for their critical assessments. Our responses to the issues raised are written in bold. Pertinent modified text in the manuscript is italic small text.
Reviewer #1: Software Testing: Starting with a fresh Ubuntu 20.04.3 virtual machine running under VirtualBox that was updated and had required packages installed, I was able to successfully compile PEGR from source, resulting in a "pegr-0.1.war" that I was able to successfully run and interact with. However, for the rest of this review I used the latest officially released pre-compiled pegr.war file provided (https://github.com/seqcode/pegr/releases/tag/v0.2.6).
1. Barcode Scanner to add samples to experiment does not work. Clicking the QR/Barcode link will launch the android app, and then scanning barcodes will work, and then automatically load back into the PEGR website, but the textbox is not filled in with the barcode value.
We are extremely grateful to the reviewer for the extensive list of detected issues and have attempted to address as many of them as possible, as detailed below. In attempting to reproduce this bug, we used the 'Barcode Scanner' app by 'ZXing' on an Android device as described in the manuscript and were unable to reproduce this issue. While we were unable to reproduce this particular bug, we note that we have now upgraded PEGR to run on Grails 4 and Java 11. We are hopeful that between this large-scale version upgrade and the several dozen bug fixes we have implemented since our initial submission, this particular issue has been resolved.

2. "Samples" will not be populated until a BioSample that has been added to inventory is added to an experiment.
We agree with the reviewer that this is a potentially unclear aspect of PEGR's design. This design consideration was made in strong collaboration with Dr. Frank Pugh, leveraging his extensive experience in biochemical experiment design and implementation. The question of when a sample becomes a 'Sample' was determined to be the point at which it becomes involved within an experiment. Initializing a Sample before an experiment was performed seemed problematic, as this could potentially encourage users to initialize any number of 'theoretical' Samples within PEGR that are not linked to any actually performed experiment. We have further expanded the text within the Experiment section to better explain the rationale for why this is an important distinction in biochemical experimental design.
A typical lab process is to generate common laboratory reagent stocks (e.g., wash buffers) that are used multiple times across many different downstream experiments. However, more complicated experimental setups, like ChIP-seq, involve a 'traced' sample which moves through multiple sequential experiments and combines with different reagents as it transitions through product states (e.g., sonicated chromatin converts to DNA library). A 'traced' sample typically begins as a 'BioSample' in PEGR. The 'BioSample' is assigned a unique 'Sample' ID within the PEGR database the moment it is added to an Experiment. This provides a clear delineation in the creation of new Samples in PEGR and helps to prevent users from initializing any number of theoretical Samples that are unlinked to any Experiment. This functionality mirrors the best practices of a standard laboratory notebook. As lab notebooks are not designed to record proposed experiments, but only to provide the record of a performed Experiment, this logic is consistent with standard biochemical wet-bench practices. Importantly, in the case of traced samples, PEGR can display all the states that a sample has transitioned through, allowing for full experimental history tracking. A traced sample can be added to an experiment using either the web-interface or the QR barcode system (Figure 4E). Importantly, PEGR allows multiple samples to be attached to a single protocol. This enables the operator to process multiple samples in a batch while only needing to enter the related information once (e.g., when performing ChIP-seq on 8 samples in parallel).
3. When creating BioSample inventory (sonicated chromatin) and selecting Genus: "Homo", Species: "Sapien", and clicking save, it incorrectly saves the values as Saccharomyces cerevisiae. Going into edit, and changing to Homo sapien, it then requires setting "strain" to a value in order to save. I did create several S. cerevisiae samples first, and they can be created without strain, but if you go into edit and then try to save, you are forced to fill out strain to save these as well. It is unclear why strain is always required.
We were unable to reproduce the error as described by the reviewer. However, as noted above, we hope that the significant upgrades performed since submission have ameliorated this particular issue. We also agree with the reviewer that 'Strain' should not be a required field for initializing a BioSample and we have removed that requirement from PEGR: (https://github.com/seqcode/pegr/commit/96fecc3ff6b94f14c5174d04bd6f5b1198368c57)

4. I did not have an Illumina sequencer available to fully test instrument support, but I created a sequence run in PEGR UI. Clicking the "Add Master Pool" results in an empty screen in web browser, and a traceback in shell log. Errors (and ideally suggestions to fix) should be displayed to the user, or at least an indication of an error instead of an empty page.

We were also unable to reproduce the error as described by the reviewer; however, many of the bug fixes made in the repo since submission directly address outputting StdErr to the user on the web front end instead of solely through logging files a standard user may not have access to. Example updates:

5. When adding inventory to protocol instance, if you are creating new, and you do not set a barcode under "search", it will allow you to save and go to the next stage and let you name the new item, but you cannot set barcode value, and trying to save results in an error printed only to shell logs.
We thank the reviewer for identifying this issue. It should now be resolved in the latest version of PEGR: (https://github.com/seqcode/pegr/commit/bad84a21393a9c2f45debd998766417fe02cd901)

6. When trying to add inventory item to protocol instance, if you enter a barcode for the wrong itemtype, you get a blank screen in the browser, but no error message -- there is an error in the shell log.
We thank the reviewer for identifying the lack of error messaging to the web front end user. We have updated PEGR to appropriately inform the user of the error: (https://github.com/seqcode/pegr/commit/4fca2e4175a75299a0cc3e9105bfb7d2c2039f68)

7. Two example workflows, listed under "Pipelines" in PEGR cannot be accessed; they are behind a PSU.edu forced login: https://chipexo-gw.aci.ics.psu.edu/workflow/display_by_id?id=f2db41e1fa331b3e https://chipexo-gw.aci.ics.psu.edu/workflow/display_by_id?id=a799d38679e985db Also, it seems that the above PSU.edu Galaxy instance has been configured without a proper secret key, as the encoded workflow ids "f2db41e1fa331b3e" and "a799d38679e985db" match integers 1 and 6, using the default non-secure "id_secret". Recommend following instructions from the Galaxy project on setting up a secure production server.
We apologize for the error in workflow availability. As the reviewer noted, the default PEGR workflows were incorrectly pointing to a private Galaxy development instance. We have now updated the baseline SQL database to appropriately point at publicly available workflows: (https://github.com/seqcode/pegr/commit/edb7bba3b1a28cb47b7701793150ef07298a91a3) (https://raw.githubusercontent.com/CEGRcode/pegr-galaxy_tools/main/workflows/paired_002.ga) (https://raw.githubusercontent.com/CEGRcode/pegr-galaxy_tools/main/workflows/single_002.ga)

8. While LIMS systems are of interest, it is the integration of PEGR with Galaxy and other potential Workflow/Analysis systems that is the novel (and dare I say "interesting") contribution.
Briefly, the interconnection between PEGR and Galaxy utilizes a set of scripts run on the computer where the "NGS repo" is located (e.g. where sequencing datasets are deposited by illumina sequencer) that will upload data to Galaxy and then execute workflows. For Galaxy to work with PEGR, it must have a custom set of tools installed that are able to report back status and statistics about their parent jobs. These custom tools then POST JSON values to the PEGR server. PEGR provides its own API to receive these posts from the custom Galaxy Tools. Additionally, the workflow being called in Galaxy must have parallel Steps defined inside of PEGR, which will be used to relay status.
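In outline, the custom-tool reporting the reviewer describes above might look like the following sketch. The endpoint path, JSON field names, and API-key header are illustrative assumptions for this example, not PEGR's documented API schema.

```python
import json
import urllib.request

def build_stats_payload(run_id, sample_id, tool_id, stats):
    """Bundle job identity and tool statistics into one JSON document."""
    return {
        "runId": run_id,        # sequencing run tracked in PEGR
        "sampleId": sample_id,  # PEGR Sample the dataset belongs to
        "toolId": tool_id,      # Galaxy tool that produced the stats
        "statistics": stats,    # e.g. mapped-read counts from samtools
    }

def post_to_pegr(base_url, api_key, payload):
    """Send the payload to PEGR in a standard HTTP POST request."""
    req = urllib.request.Request(
        base_url + "/api/jobStats",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "apiKey": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Build (but do not send) an example report of the kind a BWA-MEM
# statistics wrapper might compute via samtools.
payload = build_stats_payload("run_0217", "S1234", "bwa_mem_stats",
                              {"totalReads": 1000000, "mappedReads": 987654})
print(json.dumps(payload, indent=2))
```

Each custom tool wrapper would call something like `post_to_pegr` after computing its statistics, which is why PEGR must expose its own API to receive these POSTs.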
This design seems less than ideal. First, it requires the manual installation of nonstandard tools into a Galaxy instance -- the tools are not installable from the ToolShed. There is one new configuration file per Galaxy instance that provides the details for these custom tools to interact with the PEGR API. This essentially creates a situation where you need a custom Galaxy instance that is specific to a particular PEGR server, and vice versa. A single Galaxy server cannot work with more than one PEGR server, and a PEGR server cannot work with more than one Galaxy server. A PEGR instance must be specifically configured for a specific Galaxy server, and that Galaxy server must be specifically configured to work with that PEGR server. There is no way to e.g. take advantage of a general-use institutional Galaxy server. Having to define workflow steps twice, in both PEGR and Galaxy, seems unnecessary; PEGR could parse the Galaxy API description of the Workflow.
An example of one of these custom tools is "BWA-MEM single read output statistics", which takes a BAM file as input, provides a python file that e.g. calls "samtools view" and may perform some computations in e.g. numpy, and then builds up a custom PEGR-only JSON file containing statistics and POSTs back to PEGR. There doesn't seem to be any concern for e.g. a standard Galaxy User coming along and running the custom tools and potentially inadvertently adding information to the pegrDB. A more generalized approach would seem to be to use/design standard tools that can produce the statistics needed, which could be available from the ToolShed and could be useful to standard Galaxy users; then have PEGR interact directly with the Galaxy API. The two potential negatives that this would introduce would be that it would require PEGR to poll Galaxy for job completion (instead of just waiting for a POST) and some minimal parsing of e.g. text files with tables containing values, instead of JSON. But this approach would allow e.g. any number of PEGR servers to work with a single Galaxy server. And in fact, it could be configured to allow a single PEGR server to work with multiple Galaxy servers for e.g. different workflows or organisms, etc. Could the authors please comment on these points?
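For context, the polling alternative the reviewer suggests could be sketched as below. Galaxy does expose a job-status endpoint (`GET /api/jobs/{id}`); the PEGR-side function names here and the exact response handling are illustrative assumptions, not existing PEGR code.

```python
import json
import time
import urllib.request

# Galaxy job states that mean the job will not change further.
TERMINAL_STATES = {"ok", "error", "deleted"}

def fetch_job_state(galaxy_url, api_key, job_id):
    """Query Galaxy's job API once and return the job's state string."""
    req = urllib.request.Request(
        f"{galaxy_url}/api/jobs/{job_id}",
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["state"]

def wait_for_job(galaxy_url, api_key, job_id, interval=30,
                 fetch=fetch_job_state):
    """Poll until the job reaches a terminal state; return that state."""
    while True:
        state = fetch(galaxy_url, api_key, job_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
```

The `fetch` parameter is injected so the loop can be exercised without a live Galaxy server; in production it would default to the real API call.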
We agree with the reviewer that the design for PEGR-Galaxy communication is not currently optimized, a point that was also raised by Reviewer 2. As this reviewer noted, the dependency on Galaxy sending communication to PEGR provides the significant upside of not requiring PEGR to continually poll Galaxy for job status. Polling would have essentially, and unnecessarily, reproduced the workflow functionality of Galaxy. The downside of this design is that while PEGR can communicate with multiple Galaxy instances, a true many-to-many relationship between multiple Galaxy and PEGR instances is not currently supported. We have now added text to the Discussion section describing the future of PEGR-Galaxy tool development and how we plan to allow for a true many-to-many relationship between PEGR and Galaxy instances.
Future PEGR development will focus on supporting additional bioinformatic workflows and genomic assays. The currently supplied bioinformatic analysis processing workflow is hard-coded to the Illumina sequencing platform. Future upgrades that can be made to PEGR include providing compatibility with non-Illumina sequencing pipelines (e.g., PacBio, Oxford Nanopore) and enhancing the sample submission process using the native web interface. Our long-term goals include enhancing role security to provide compliance with the EU GDPR, NY SHIELD, and California CCPA privacy laws for storing de-identified patient meta-information. We also believe that given the prominence of many internationally funded Galaxy instances (e.g., https://usegalaxy.org/, https://usegalaxy.eu/), a key future upgrade will be to enable multiple PEGR instances to communicate with multiple Galaxy instances in a full many-to-many relationship. This will enable researchers to directly benefit from well-funded bioinformatic rigor and reproducibility initiatives by reducing the overhead required for smaller groups to run their own Galaxy instances. These upgrades and more provide a clear path forward for providing rigorous and reproducible research across the biochemical and biomedical fields.
There are no differences between the tools in these separate repositories:
$ diff -r cegr-galaxy/tools/cegr_statistics/ pegr-galaxy_tools/tools/
It is not clear why a new repo was created 3 months ago with the tools manually copied over from a 5 year old repository without including attribution of the primary committer.
We apologize for any confusion. The GitHub repo referenced by the reviewer was the original development repo. However, over the course of the 6 years of the project, and as a separate matter from the project itself, the NIH-funded Principal Investigators changed institutions, which changed the distribution and control of the NIH funding. The PIs involved decided a new repo would be the easiest mechanism for the move to the new institution. While the original attribution was maintained for this repo: (https://github.com/CEGRcode/pegr-ngs_pipeline) we neglected to include the attribution under the second repo generated:

(https://github.com/CEGRcode/pegr-galaxy_tools)
The lack of attribution during the repo change was an oversight and has now been rectified. We also note that the file path changes remarked upon by the reviewer were chosen for the purpose of emphasizing generalized file paths new users would need to customize for their own PEGR deployments. While we regret any confusion, the nature of this explanation goes far beyond what we feel should be reported in the manuscript.
In addition to the individual custom tools previously provided at (https://github.com/CEGRcode/pegr-galaxy_tools), we now provide a generalized tool which communicates from nearly any given tool in Galaxy to PEGR, while still sending critical required information about the status of the Galaxy tool and analyzed dataset: (https://github.com/CEGRcode/pegr-galaxy_tools/commit/06f836cf2fb46e452119ad8f6c45650cb6a2e370) We have also expanded the Discussion to note that future development will focus on better optimization and generalization of PEGR-Galaxy tool communication.
We also believe that given the prominence of many internationally funded Galaxy instances (e.g., https://usegalaxy.org/, https://usegalaxy.eu/), a key future upgrade will be to enable multiple PEGR instances to communicate with multiple Galaxy instances in a full many-to-many relationship. This will enable researchers to directly benefit from well-funded bioinformatic rigor and reproducibility initiatives by reducing the overhead required for smaller groups to run their own Galaxy instances.
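The generalized reporting idea mentioned in the responses above, where most workflow steps reduce to a simple pass/fail status, could be sketched as follows. The field names are illustrative assumptions for this example, not the actual schema of the generalized PEGR-Galaxy tool.

```python
import json

def build_status_report(tool_id, dataset_id, exit_code, stats=None):
    """Collapse a tool run into a Boolean-like pass/fail status,
    optionally carrying extra statistics for specialized steps."""
    return {
        "toolId": tool_id,
        "datasetId": dataset_id,
        "status": "pass" if exit_code == 0 else "fail",
        "statistics": stats or {},
    }

# A generic step (e.g. adapter trimming) needs nothing beyond pass/fail.
print(json.dumps(build_status_report("adapter_trimming", "d42", 0)))
```

Because only the status and identifiers are required for most steps, one generic wrapper can serve many Galaxy tools, leaving custom JSON payloads to the minority of specialized analysis steps.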
In conclusion, the paper and the software address the important issue of extending reproducibility from the wet lab up to the final report of a bioinformatic analysis, and I hope the authors succeed in popularising PEGR and continue its development.
Comments:
-Background, 1st paragraph: references 4-11 seem very general and not particularly relevant to the reproducibility issues described.
-Page 5, 1st paragraph: "To prevent disorganization ..., ItemType's are organized into an ItemTypeCategory" - maybe replace "organized" with "grouped". Also, is this a proper hierarchical structure (with ItemCategory's grouped in a higher ItemCategory)?

We appreciate the reviewer's comments and have updated the references to include additional publications directly related to issues in experimental reproducibility.
Fixed. ItemTypeCategory's are not currently designed to be hierarchical, however we can add that functionality if users find a greater need for better hierarchical organization.
-Page 5: Does PEGR come with some predefined ItemType's? Would an AppStore-like website be helpful for sharing definitions among PEGR admins?
PEGR does come with a variety of pre-defined ItemType's that are designed to support ChIP-seq and RNA-seq genomic experiments. We like the idea of an AppStore-style interface and would be interested in supporting that as an extended feature system in the future. In the short term, we have provided a mechanism by which a user can upload a CSV file containing a large number of ItemType's. This provides a mechanism for a user to initialize a PEGR instance customized for their particular use without requiring them to manually input each item through the webform.
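As a rough illustration of the CSV-based bulk initialization described above, each row could define one ItemType. The column names below are assumptions for this example, not PEGR's documented upload format.

```python
import csv
import io

# Hypothetical CSV of ItemType definitions, one per row.
SAMPLE_CSV = """name,category,units
Sonicated Chromatin,Traced Sample,uL
TruSeq Adapter,Reagent,uL
Wash Buffer,Reagent,mL
"""

def parse_item_types(text):
    """Read ItemType definitions from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

for item in parse_item_types(SAMPLE_CSV):
    print(item["name"], "->", item["category"])
```

A user could prepare such a file once and upload it to seed a fresh PEGR instance, rather than entering each ItemType through the webform.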
- Figure 3 caption: "(E) 'ProtocolGroup' is accessible through the Admin console" -Why can ProtocolGroup's can be created only by admins? This seems something that can become a bottleneck.
This design consideration was strongly suggested by Dr. Frank Pugh, who provided critical UX insight based on his extensive biochemistry training experience (>30 years). A common issue in laboratories that generate novel genomic assays is the production of a wide range of variant assays that never see the light of day. While these approaches should be thoroughly documented, they do not represent a best-practice protocol for the general lab. This was the rationale for why a PEGR user is able to initialize and execute any novel protocol and experiment but is unable to formalize it into a ProtocolGroup visualized by all lab members. A ProtocolGroup represents a thoroughly vetted workflow that has passed some form of review to be adopted as a general laboratory protocol. We have now expanded the Protocol section to better explain this.

-Page 9, 3rd paragraph: "A traced sample can be added to an experiment using either the web-interface or the QR barcode system" - Can samples be added in advance (before creating a new Experiment)? There is a "Samples" button in the top menu.

This point was also raised by Reviewer 1 and has been addressed above. In short, we believe that a Sample should come into existence at the point when the experiment is performed. This functionality reflects real-life wet-bench best practices, and we believe it is relevant to the PEGR user experience.
A typical lab process is to generate common laboratory reagent stocks (e.g., wash buffers) that are used multiple times across many different downstream experiments. However, more complicated experimental setups, like ChIP-seq, involve a 'traced' sample which moves through multiple sequential experiments and combines with different reagents as it transitions through product states (e.g., sonicated chromatin converts to DNA library). A 'traced' sample typically begins as a 'BioSample' in PEGR. The 'BioSample' is assigned a unique 'Sample' ID within the PEGR database the moment it is added to an Experiment. This provides a clear delineation in the creation of new Samples in PEGR and helps to prevent users from initializing any number of theoretical Samples that are unlinked to any Experiment. This functionality mirrors the best practices of a standard laboratory notebook. As lab notebooks are not designed to record proposed experiments, but only the record of a performed Experiment, this logic is consistent with standard biochemical wet-bench practices. Importantly, in the case of traced samples, PEGR can display all the states that a sample has transitioned through, allowing for full experimental history tracking. A traced sample can be added to an experiment using either the web-interface or the QR barcode system (Figure 4E). Importantly, PEGR allows multiple samples to be attached to a single protocol. This enables the operator to process multiple samples in a batch while only needing to enter the related information once (e.g., when performing ChIP-seq on 8 samples in parallel).
-Page 24, Availability of data and materials: I'd suggest to also make PEGR available as a Docker image and/or a demo server. I've tried to run it following the instructions at https://github.com/seqcode/pegr but ended with a long page full of Java exceptions.
We agree wholeheartedly and note this issue was raised by Reviewer 1 as well. As mentioned above, we now provide a Docker instance of PEGR for deployment without requiring a user to install the full suite of software dependencies.
(https://hub.docker.com/repository/docker/dshao/pegr)

-Page 20: "When an analysis step finishes, its output data will be posted to PEGR immediately." - It's not clear how this happens: was Galaxy modified to initiate an HTTP POST request towards the PEGR RESTful API after a tool is run, or is it the tool's responsibility? By looking at https://github.com/CEGRcode/pegr-galaxy_tools it seems it is the latter; in fact, each Galaxy tool published there sends its output to PEGR autonomously. If that is the case, this is a serious limitation to the usefulness of PEGR, because users wouldn't be able to directly make use of the thousands of Galaxy tools available on the Galaxy ToolShed.
This issue was raised by Reviewer 1 as well. We agree that relying on individual autonomous tools dramatically reduces the plug-and-play nature of Galaxy-PEGR communications. However, we note that the majority of tools required to communicate the progress of a workflow to PEGR can likely function as a simple Boolean call of pass/fail. Only a far smaller pool of tools would then require custom autonomous POST requests to PEGR with results tailored to the specialized analysis. We have now made a generalized PEGR-Galaxy communication tool available on the Git repo: (https://github.com/CEGRcode/pegr-galaxy_tools/commit/06f836cf2fb46e452119ad8f6c45650cb6a2e370)

Minor comments:
-Background, 1st paragraph: "What is needed is systematic metadata capture..." - This sentence could be made a bit more absolute, something like "One way to tackle these issues is to apply systematic metadata capture..."

Edited as follows:
One method to address these issues is to apply systematic metadata capture and management software that is tailored to (epi)genomic data collection.

Edited as follows:
To our knowledge, there are no free open-source platforms in active development that manage entire experimental pipelines, from wet-bench experiments to bioinformatic analyses [23].
-Background, 4th paragraph: "Galaxy.org" -> "Galaxy"

Fixed.

-Page 7, last paragraph: "organize the variety of protocols that often come sequentially in a pipeline": do protocols have to come in a sequence? I think it should be possible to execute some steps in parallel, so maybe "organize the variety of protocols that often compose an experiment"

Edited as suggested:
Similar to how ItemTypeCategory is used to organize the wide variety of ItemTypes in the 'Inventory', Protocol Groups are used to consolidate and organize the variety of protocols that often compose an experiment (Figure 3E).
-Page 9, 1st paragraph: "The PEGR 'Experiment' interface is designed to track and maintain the relational links between reagents, protocols, and the resulting end products" -I think that "samples" should be listed here even if they are formally introduced some lines below.