From: OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy

The gene length and phylogenetic distance normalisation procedure for a single species pair. (a) BLAST bit scores for all hits between Homo sapiens and Mus musculus. (b) BLAST bit scores for the top 5 % of BLAST hits, with a least-squares fit of the equation $\log_{10} B_{qh} = a \log_{10} L_{qh} + b$, where $B_{qh}$ is the bit score for the hit between sequence q and sequence h, and $L_{qh}$ is the product of the gene lengths (measured in amino acids). (c) Gene length and phylogenetic distance normalised BLAST bit scores. Note that there are a large number of poor-scoring hits for long sequences due to these hits exceeding the BLAST search e-value cutoff. (d) The same top 5 % of BLAST hits as shown in (b) after normalisation, for reference.
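
The fit described in panel (b) can be illustrated with a minimal Python sketch. It is not the OrthoFinder implementation: the function names `fit_log_log` and `normalise` are hypothetical, the input is assumed to be the already-selected top-scoring hits, the data are synthetic, and the normalisation step (dividing each bit score by the fitted expectation $10^{b} L_{qh}^{a}$) is an assumption about how the fitted line is applied, since the caption does not spell it out.

```python
import math


def fit_log_log(length_products, bit_scores):
    """Least-squares fit of log10(B) = a * log10(L) + b; returns (a, b)."""
    xs = [math.log10(l) for l in length_products]
    ys = [math.log10(s) for s in bit_scores]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form simple linear regression in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def normalise(bit_score, length_product, a, b):
    """Assumed normalisation: divide by the fitted score 10**b * L**a."""
    return bit_score / (10 ** b * length_product ** a)


# Synthetic hits that follow B = 2 * L**0.5 exactly (illustrative only).
length_products = [1e4, 1e5, 1e6, 1e7]
bit_scores = [2 * l ** 0.5 for l in length_products]

a, b = fit_log_log(length_products, bit_scores)
normalised = [
    normalise(s, l, a, b) for s, l in zip(bit_scores, length_products)
]
```

Under this assumed normalisation, hits that lie on the fitted line map to a score close to 1 regardless of sequence length, which is the flattening that panel (d) shows for the top-scoring hits. Because the fit is performed separately for each species pair, the coefficients a and b would also absorb the pair's phylogenetic distance.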

