HGT simulations. (a) Eleven core gene datasets for each analyzed genome were determined and, for each genome, 11 Markov models were built based on these different gene datasets. (b) The efficiency of our MM approach, the BM approach, and a GC% approach to detect foreign ORFs was tested by performing in silico HGT simulations using a variety of core gene datasets. (c) For the HTG simulations, 100 genes were chosen from the other 118 genomes and 100 random core ORFs were in silico introduced in the genome under analysis. (d) The average number of these ORFs that were detected as atypical (false positives, expected to be low) was determined. (e) After 100 simulations we searched for the core genes dataset and the cut-off where the average detection of simulated HGT was the highest but the average detection of native core genes was the lowest. (f-h) Average result after 100 HGT simulations for the 119 analyzed genomes using the MM, BM and GC% methods with species-specific core gene datasets and cut-offs. Blue dots represent the average number of true positives detected. Green dots represent the average number of false positives detected. The MM method had a significantly higher rate of detection of true positives than the BM method (Wilcoxon test W = 11,849, P-value < 2.2 e-16, means = 86.8 and 74.8 for the MM and BM methods, respectively; and Wilcoxon-test W = 13,824, P-value < 2.2 e-16, means = 86.8 and 52.6 for MM and GC%, methods, respectively). No significant differences were found between the MM and BM methods in the detection of false positives (Wilcoxon-test W = 8,359, P-value = 0.0311, means = 12.4 and 11.0 for the MM and BM methods, respectively).