Analysis of gene length dependency of BLASTp scores. a BLAST log10(bit score) for all hits between Homo sapiens (Homo_sapiens.GRCh37.60.pep.all, 21,841 sequences) and Mus musculus (Mus_musculus.NCBIM37.60.pep.all, 23,111 sequences).
b –log10(e-value) for all hits between Homo sapiens and Mus musculus. To avoid infinite values, e-values of zero have been replaced with the lowest obtainable value, 10^-180. The heat map in both cases goes from blue (lowest density of hits) to red (highest). c The F-score (red), recall (blue) and precision (green) of orthogroup inference using OrthoMCL plotted as a function of sequence length. The sequences were sorted according to length and divided into four bins with the same number of sequences in each. The F-score, recall and precision were calculated for each bin and the scores plotted against the geometric mean of the length of the sequences in each bin. The error bars span the lengths of the shortest and longest sequences in each bin and the dot shows the geometric mean of these lengths. d Histogram of all protein-coding gene lengths in Homo sapiens, provided for reference.
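The transformations described for panels b and c can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the function names are hypothetical, the F-score is assumed to be the harmonic mean of precision and recall, and the geometric mean is assumed to be taken over all sequence lengths in a bin.

```python
import math

# Lowest obtainable e-value stated in the caption; zeros are replaced with it.
E_VALUE_FLOOR = 1e-180


def neg_log10_evalue(e_value):
    """Return -log10(e-value), flooring zero e-values to avoid infinity."""
    return -math.log10(max(e_value, E_VALUE_FLOOR))


def equal_count_bins(lengths, n_bins=4):
    """Sort lengths and split them into n_bins bins of (near-)equal size."""
    ordered = sorted(lengths)
    n = len(ordered)
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [ordered[edges[i]:edges[i + 1]] for i in range(n_bins)]


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-score (assumed harmonic mean) from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def summarise_bin(bin_lengths):
    """Plot position (geometric mean) and error-bar limits (min, max)."""
    return geometric_mean(bin_lengths), min(bin_lengths), max(bin_lengths)
```

For each bin, `summarise_bin` gives the x-position of the dot and the extent of the error bar, while `precision_recall_f` would be applied to the true-positive, false-positive and false-negative counts for the sequences in that bin.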