Mineotaur: a tool for high-content microscopy screen sharing and visual analytics

High-throughput/high-content microscopy-based screens are powerful tools for functional genomics, yielding intracellular information down to the level of single cells for thousands of genotypic conditions. However, accessing their data requires specialized knowledge, and most often that data is no longer analyzed after initial publication. We describe Mineotaur (http://www.mineotaur.org), an open-source, downloadable web application that allows easy online sharing and interactive visualisation of large screen datasets, facilitating their dissemination and further analysis, and enhancing their impact.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0836-5) contains supplementary material, which is available to authorized users.

Mineotaur, developed at the Carazo Salas lab of the University of Cambridge, Department of Genetics, is a web application to share and visually analyse high-throughput/high-content microscopy screens.
The project website can be found at http://www.mineotaur.org. Please cite the following paper when using Mineotaur: B. Antal, A. Chessel, R. E. Carazo Salas: Mineotaur: interactive visual analytics for high-content microscopy screens, under revision.

CHAPTER 1
Getting started

Motivation
Despite ground-breaking discoveries in genomics, the genomes of most organisms remain black boxes, with the function of most genes and gene products still unknown. Moreover, many genes and proteins play roles in multiple biological processes. High-throughput/high-content microscopy-based screening (HT/HCS) is an increasingly powerful tool to discover and functionally annotate genes and biological pathways, and it has already led to several important discoveries, such as the systematic identification of genes important for mitosis, endocytosis and other fundamental processes. However, producing phenotypic data requires specialised large-scale image and data analysis methods, which limits such functional genomic annotation techniques to groups that possess that expertise. As a result, the community at large has limited access to the data and limited ability to mine it further after publication, which reduces the impact of expensive HT/HC screens. Overall, while technical advances have led to an explosion in the amount of data being acquired, suitable data handling, visualization and analysis techniques are still lagging behind.

What is Mineotaur?
Here we propose a novel data visualization tool called Mineotaur (http://www.mineotaur.org), which allows the community to further mine the raw multidimensional feature data and knowledge from published HT/HC screens, leading to better exploitation of experimental results. The user interface allows community members without any computational knowledge to extract meaningful information from the data. The web interface can be used to query the data, and the results are visualized in real time as plots (e.g. scatter plots, histograms). The tool is based on a novel data model that allows the visualization and analysis of extremely large amounts of data.

Generating a Mineotaur instance from text files
To generate a Mineotaur instance, you have to provide three input files: a data file containing all the measurements you want to include in Mineotaur, a label file containing the annotations assigned to the objects in Mineotaur, and an options file setting several options of the instance. A sample of each input file can be downloaded here.

Data file
The input data file can be a ?SV (? Separated Values) file, where ? is the separator set in the options file (e.g. TSV, Tab Separated Values). Each line describes a set of measurements for a descriptive object, which is a unique object of interest in the experiment. Each descriptive object should be connected to a group object; for example, the descriptive object can be a cell and the group object a gene. The file consists of a header, an object descriptor, a type descriptor and the data lines.

Header
The first line of the data file. The header lists the names of the properties to be stored in Mineotaur. Each name must be unique for a given object type and should not contain non-alphanumeric characters.

Object descriptor
The second line of the data file. The object descriptor specifies which kind of real-world object the respective column belongs to. Object descriptors can be any string; however, semantically meaningful names are advised for future use. Examples: Gene, Cell, Experiment.

Type descriptor
The third line of the data file. The type descriptor specifies the data type of each column. The following types are accepted:
* ID: identifier for a given object. An object type can have multiple ID columns.
* NUMBER: numerical data. Each numerical column of the descriptive object can be queried.
* TEXT: non-numerical data.

Data lines
Each line starting from the fourth contains the actual measurements for a descriptive object and other metadata connecting it to experimental conditions.
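To make the four-part layout concrete, the following is a minimal sketch of a tab-separated data file together with a reader that splits it into header, object descriptor, type descriptor and data lines. It is not Mineotaur's own import code: the column names (geneID, cellID, area, cellCycleStage) and values are hypothetical, and the object descriptors GENE and CELL are chosen to match the option defaults listed below. For simplicity, the sketch requires property names to be unique across the whole header so that each can serve as a dictionary key, which is stricter than Mineotaur's per-object-type rule.

```python
import csv
import io

# Hypothetical sample data file (TSV). Line 1: header, line 2: object
# descriptor, line 3: type descriptor, then one data line per cell.
SAMPLE_DATA = (
    "geneID\tcellID\tarea\tcellCycleStage\n"
    "GENE\tCELL\tCELL\tCELL\n"
    "ID\tID\tNUMBER\tTEXT\n"
    "gene1\tc1\t12.5\tG1\n"
    "gene1\tc2\t14.0\tG2\n"
    "gene2\tc3\t9.75\tG1\n"
)

ALLOWED_TYPES = {"ID", "NUMBER", "TEXT"}


def read_data_file(text, separator="\t"):
    """Split a data file into header, descriptors and one dict per data line."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    header, objects, types = rows[0], rows[1], rows[2]
    # Stricter than Mineotaur: names must be unique across the whole header
    # here, because they are used as dict keys below.
    if len(set(header)) != len(header):
        raise ValueError("duplicate property name in header")
    bad_names = [name for name in header if not name.isalnum()]
    if bad_names:
        raise ValueError(f"non-alphanumeric property names: {bad_names}")
    unknown = set(types) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"unknown type descriptors: {sorted(unknown)}")
    records = []
    for line in rows[3:]:
        # NUMBER columns become floats; ID and TEXT values stay strings.
        records.append({
            name: float(value) if kind == "NUMBER" else value
            for name, kind, value in zip(header, types, line)
        })
    return header, objects, types, records


header, objects, types, records = read_data_file(SAMPLE_DATA)
print(records[0])
```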

Label file
The label file contains the annotations for the group-level objects, for example which genes were identified as hits in a study. The label file consists of a header line and multiple label lines.
The header line is the first line of the label file. Its first column contains the name of the group object ID property from the data file, while the remaining columns contain the annotations.
Each label line, starting from the second line, contains a group object ID followed, for each annotation, by 1 if the annotation is assigned to the group object and 0 otherwise.
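A label file in this format maps naturally onto a dictionary from group object ID to the set of annotations it carries. The sketch below assumes tab separation and uses hypothetical annotation names (mitosisHit, shapeHit); it illustrates the file layout rather than how Mineotaur stores labels internally.

```python
import csv
import io

# Hypothetical label file: the first column is the group object ID property
# (geneID); every further column is one annotation, marked 1 or 0.
SAMPLE_LABELS = (
    "geneID\tmitosisHit\tshapeHit\n"
    "gene1\t1\t0\n"
    "gene2\t1\t1\n"
    "gene3\t0\t0\n"
)


def read_label_file(text, separator="\t"):
    """Map each group object ID to the set of annotations assigned to it."""
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    id_property, annotations = rows[0][0], rows[0][1:]
    labels = {}
    for line in rows[1:]:
        group_id, flags = line[0], line[1:]
        if any(flag not in ("0", "1") for flag in flags):
            raise ValueError(f"annotation flags must be 0 or 1 for {group_id!r}")
        labels[group_id] = {a for a, f in zip(annotations, flags) if f == "1"}
    return id_property, labels


id_property, labels = read_label_file(SAMPLE_LABELS)
```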

Options file
The options file describes metadata for the instance generation. All options use the format option_name = option_value. The following options can be set:
• (REQUIRED) name: name of the instance
• group: name of the group object (as described in the data file). Default: GENE
• groupName: name of the group object ID property (as described in the data file). Default: geneID
• descriptive: name of the descriptive object (as described in the data file). Default: CELL
• total_memory: the amount of memory that can be used by Neo4j. Default: 4G
• separator: character used to separate columns in the data and the label files. Default: \t
• overwrite: whether to overwrite an existing instance with the same name. Default: true

Generation from command line
1. Download the latest jar file from http://www.mineotaur.org.
2. Create a property (options) file, a data file and a label file (see documentation and example input data).
3. Start the data import with the following command:
java -jar <path_to_jar file> -import mineotaur.input chia_sample.tsv chia_labels.tsv
4. After the database creation is completed, you can start your Mineotaur instance with the following command:
java -jar <path_to_jar file> -start <instance_name>
5. You can start querying at http://127.0.0.1:8080 in your browser.

Generation using the wizard
1. Download the latest jar file from http://www.mineotaur.org.
2. Create a property file, a data file and a label file (see documentation and example input data).
3. Start the data import with the following command:
java -jar <path_to_jar file> -wizard
4. After the database creation is completed, you can start your Mineotaur instance with the following command:
java -jar <path_to_jar file> -start <instance_name>
5. You can start querying at http://127.0.0.1:8080 in your browser.
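The options file and the -import step of the command-line route can be prepared from a script, for example when regenerating an instance repeatedly. The sketch below parses option_name = option_value lines on top of the documented defaults and builds the documented import command as an argument list. The instance name chia_sample and the jar file name mineotaur.jar are hypothetical; mineotaur.input, chia_sample.tsv and chia_labels.tsv are the file names used in the command above. Option values are kept as raw strings, so how Mineotaur itself interprets values such as \t is not modelled here.

```python
# Defaults as listed in the options file description; "name" is required
# and has no default. Values are kept as raw strings from the file.
DEFAULTS = {
    "group": "GENE",
    "groupName": "geneID",
    "descriptive": "CELL",
    "total_memory": "4G",
    "separator": "\\t",  # the two characters backslash and t, as written
    "overwrite": "true",
}


def read_options(text):
    """Parse 'option_name = option_value' lines on top of the defaults."""
    options = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'option_name = option_value': {raw!r}")
        options[key.strip()] = value.strip()
    if not options.get("name"):
        raise ValueError("the 'name' option is required")
    return options


def import_command(jar_path, options_file, data_file, label_file):
    """Argument list for the documented -import step."""
    return ["java", "-jar", jar_path, "-import", options_file, data_file, label_file]


opts = read_options("name = chia_sample\ntotal_memory = 8G\n")
cmd = import_command("mineotaur.jar", "mineotaur.input",
                     "chia_sample.tsv", "chia_labels.tsv")
```

Passing cmd to Python's subprocess.run(cmd, check=True) would then launch the import; using an argument list rather than a single shell string avoids quoting problems with file paths.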

CHAPTER 3
Using the web interface

Layout
Each Mineotaur instance uses the same web interface layout.

Menu
The top element is the menu, showing the different querying tools which can be selected.
By clicking on any of the elements, the respective query panel will be activated and shown.

Query panel
In each query panel, a variety of options can be set to customize the query.
For more details, please go to the Query tools, Scatterplots and the Distribution plots pages.

Plot area
The plot of the requested data is shown below the query panel.
After the query, the Tools menu is activated, which allows different actions regarding the plot and the queried data.
For more details, please go to The Tools menu, Scatterplots and the Distribution plots pages.

Help
(Optional) If provided with the Mineotaur instance, clicking on the Help link shows information on the elements shown on the current page.

CHAPTER 4
Query tools

Variable selection
First, the two variables (properties) to be shown need to be selected.
Clicking on the selection panel shows the available variables.
(Group-level scatterplot only) The group-level variables are aggregated from the descriptive-level data. The aggregation mode selection panel determines how the group-level values are calculated.
Finally, the descriptive data can be filtered via a filter property: for example, when filtering by cell cycle stage, only the cells in the selected stage are used in the query.
The entered gene names are validated, and those included in the screen are selected. Once every option is set, click the submit button. To start over, click the Reset button, which returns every option to its default setting.
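The group-level values can be thought of as a filter step followed by a per-group aggregation. The sketch below first keeps only the cells matching the filter property, then collapses each gene's cells into a single (x, y) point. The aggregation modes offered by Mineotaur are not listed in this documentation, so mean and median are assumptions, as are the property names area, length and cellCycleStage.

```python
from collections import defaultdict
from statistics import mean, median

# Assumed aggregation modes, for illustration only.
AGGREGATORS = {"mean": mean, "median": median}


def group_level_points(cells, x, y, mode="mean",
                       filter_property=None, filter_value=None):
    """Aggregate cell-level values of x and y into one (x, y) point per gene."""
    per_gene = defaultdict(list)
    for cell in cells:
        # Keep only cells matching the filter, e.g. a selected cell cycle stage.
        if filter_property is not None and cell[filter_property] != filter_value:
            continue
        per_gene[cell["geneID"]].append(cell)
    aggregate = AGGREGATORS[mode]
    return {
        gene: (aggregate([c[x] for c in group]), aggregate([c[y] for c in group]))
        for gene, group in per_gene.items()
    }


cells = [
    {"geneID": "gene1", "area": 10.0, "length": 2.0, "cellCycleStage": "G1"},
    {"geneID": "gene1", "area": 14.0, "length": 4.0, "cellCycleStage": "G1"},
    {"geneID": "gene1", "area": 30.0, "length": 9.0, "cellCycleStage": "G2"},
    {"geneID": "gene2", "area": 8.0, "length": 1.0, "cellCycleStage": "G1"},
]
g1_points = group_level_points(cells, "area", "length",
                               filter_property="cellCycleStage", filter_value="G1")
```

With the filter, the G2 cell of gene1 is excluded, so gene1's point is the mean over its two G1 cells only.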

Coloring
Data points are colored according to the color associated with each annotation (hit) type; the colors are shown in the legend in the top right corner of the plot. The data points are also transparent, which makes it possible to represent multiple annotations visually (the color of such a point is the sum of the annotation colors) and to show the distribution of the data points.
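Reading "the coloring is the addition of the colors" literally as additive RGB mixing gives the small sketch below: the channels of all annotation colors are summed and clipped at 255, and a fixed alpha value stands in for the transparency. The annotation colors, the grey used for unannotated points and the alpha of 0.5 are hypothetical; in the browser the final appearance also depends on how overlapping transparent points are composited.

```python
# Hypothetical annotation colors (RGB, 0-255).
ANNOTATION_COLORS = {"mitosisHit": (255, 0, 0), "shapeHit": (0, 0, 255)}


def blend(colors):
    """Additively mix RGB colors, clipping each channel at 255."""
    return tuple(min(255, sum(channel)) for channel in zip(*colors))


def point_color(annotations, alpha=0.5):
    """RGBA color for a point: sum of its annotation colors plus transparency."""
    if not annotations:
        return (128, 128, 128, alpha)  # hypothetical color for unannotated points
    rgb = blend([ANNOTATION_COLORS[a] for a in sorted(annotations)])
    return (*rgb, alpha)


both = point_color({"mitosisHit", "shapeHit"})  # red + blue mixes to magenta
```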

Exploring individual data points
Name and values
To see the name of the underlying data point and the respective values for the queried variables, hover the mouse over the data point.

Logarithm
Clicking on the Logarithm checkbox transforms the axes of the plot to logarithmic scale.
To go back to the original scale, untick the checkbox.

Transpose
Clicking on the Transpose checkbox swaps the X-axis and the Y-axis.
To swap the axes back, untick the checkbox.

Regression
Clicking on the Regression checkbox fits a regression line to the data shown in the current plot. The type of the regression line can be selected from the selection box next to the checkbox.
To see the correlation coefficient of the regression line, hover the mouse over the line. To remove the regression line, untick the checkbox.
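For the simplest regression type, a straight line, the fitted line and its correlation coefficient can be computed from the same sums. The sketch below shows ordinary least squares and Pearson's r; which regression types Mineotaur offers, and which coefficient it reports for non-linear fits, is not stated here, so this covers the linear case only.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept and Pearson's r.

    Assumes at least two points and non-constant xs and ys.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = linear_fit([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```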

Select area
To analyze a specific area of the plot, use the Select area tool. Checking the box turns the cursor into an area selection tool, which you can use to draw a rectangle around the area to be selected.
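Selecting an area amounts to keeping the points whose coordinates fall inside the drawn rectangle. The sketch below assumes plotted points are held as a name-to-(x, y) mapping, which is a hypothetical representation, and accepts the two corners in either drag direction; what Mineotaur does with the selected points afterwards is not described here.

```python
def select_area(points, x0, x1, y0, y1):
    """Return the names of points inside the rectangle (boundaries included)."""
    # The rectangle may be dragged in any direction, so order the corners.
    x_lo, x_hi = sorted((x0, x1))
    y_lo, y_hi = sorted((y0, y1))
    return [name for name, (x, y) in points.items()
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]


points = {"gene1": (1.0, 1.0), "gene2": (5.0, 5.0), "gene3": (2.0, 3.0)}
selected = select_area(points, 0, 3, 0, 4)
```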

Kernel Density Estimation
Kernel Density Estimation plots show a continuous approximation of the distribution, built from Gaussian kernels placed on the data points. In group-level plots, the different colors refer to the data point annotations. The legend is provided in the top right corner.
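A Gaussian kernel density estimate places a normal curve of width h (the bandwidth) on every sample and averages them, so the result is a smooth curve that integrates to one. The bandwidth Mineotaur uses is not documented here; the sketch falls back to Silverman's rule of thumb, h = 1.06 σ n^(-1/5), as an assumption, which needs at least two samples with non-zero spread.

```python
from math import exp, pi, sqrt
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a density function: the average of Gaussian kernels on the samples."""
    n = len(samples)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption, not Mineotaur's setting).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


f = gaussian_kde([0.0, 1.0, 2.0], bandwidth=0.5)
# Evaluate on a grid, e.g. to draw the curve.
curve = [(x / 10, f(x / 10)) for x in range(-20, 41)]
```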

Download
Clicking the CSV link downloads the raw data used to create the plot, and clicking the SVG link downloads the plot itself in a vector graphics format.
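As an illustration of what a CSV export of plot data involves, the sketch below writes a header row followed by one row per plotted point. The exact column layout of Mineotaur's CSV download is not documented here, so the name/x/y columns are an assumption.

```python
import csv
import io


def plot_data_csv(points, x_name, y_name):
    """Serialise plotted points (name -> (x, y)) as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", x_name, y_name])
    for name, (x, y) in points.items():
        writer.writerow([name, x, y])
    return buffer.getvalue()


text = plot_data_csv({"gene1": (12.0, 3.0), "gene2": (8.0, 1.0)}, "area", "length")
```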

Statistics
The Statistics tool provides a summary of the data underlying the plot.
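Which statistics the Statistics tool reports is not listed in this documentation. The sketch below computes a typical summary (count, range, mean, median and sample standard deviation) for the values of one plotted variable, as an assumption about what such a summary contains.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Basic summary of the plotted values for one variable."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        # The sample standard deviation needs at least two values.
        "stdev": stdev(values) if len(values) > 1 else 0.0,
    }


summary = summarize([1.0, 2.0, 3.0, 4.0])
```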

Share link
A permanent link can be generated for each plot with Mineotaur. The permanent link can be pasted into the address bar of any browser and will load the corresponding plot. The link has no expiry date: opening it triggers a mechanism that recreates the plot.
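One way a link can recreate a plot without expiring is to carry the query settings in the URL itself, so nothing needs to be stored on the server. The sketch below encodes and decodes such settings with Python's urllib.parse. The path /plot and the parameter names x, y and aggregation are hypothetical and not taken from Mineotaur's actual links.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def make_share_link(base_url, settings):
    """Encode the plot's query settings into the link (hypothetical keys)."""
    return base_url + "?" + urlencode(sorted(settings.items()))


def read_share_link(link):
    """Recover the query settings so the server can recreate the plot."""
    params = parse_qs(urlparse(link).query)
    return {key: values[0] for key, values in params.items()}


settings = {"x": "area", "y": "length", "aggregation": "mean"}
link = make_share_link("http://127.0.0.1:8080/plot", settings)
```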

Architecture of Mineotaur
The Mineotaur web server can be accessed both through a web interface and programmatically using REST. The web server handles the interaction with the graph database containing the HT/HCS data.

Server side architecture
The web server is based on the Spring Model-View-Controller (MVC) framework, using Thymeleaf as a template engine. The data is stored in the Neo4j graph database. A web client accesses the content by sending an HTTP request to the server, which queries the appropriate data from the database and renders a web page from a Thymeleaf template.
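The request flow described above (controller receives the request, the model queries the database, the view renders a template) can be illustrated with a plain-Python analogy. This is not Spring or Thymeleaf code: an in-memory dictionary stands in for the Neo4j graph (gene nodes linked to their cell nodes), string.Template stands in for the Thymeleaf template, and all names are hypothetical.

```python
from string import Template

# Stand-in for the graph database: each GENE node linked to its CELL nodes.
GRAPH = {
    "gene1": [{"area": 10.0}, {"area": 14.0}],
    "gene2": [{"area": 8.0}],
}

# Stand-in for a view template (Thymeleaf in Mineotaur).
PAGE = Template("<p>$gene: $count cells, mean $prop = $value</p>")


def query_model(gene, prop):
    """Model: fetch the requested values from the data store and aggregate them."""
    values = [cell[prop] for cell in GRAPH[gene]]
    return {"gene": gene, "prop": prop, "count": len(values),
            "value": sum(values) / len(values)}


def handle_request(params):
    """Controller: read request parameters, query the model, render the view."""
    model = query_model(params["gene"], params["prop"])
    return PAGE.substitute(model)


html = handle_request({"gene": "gene1", "prop": "area"})
```

Keeping the query logic out of the template mirrors the MVC split: the same model data could equally be returned as JSON for programmatic REST clients.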