Reference Genomes

This COMPARE Reference Genomes page offers a curated selection of published reference sequences covering viral (Norovirus, Hepatitis A virus), bacterial (Salmonella enterica subsp. enterica, Listeria monocytogenes, Escherichia coli), and protozoan (Cryptosporidium) genomes.

The set of reference genomes has been selected to cover some of the most important foodborne pathogens, which are of great public health relevance. The reference genome set is provided as a first step in enabling standardized, comparable genomic analysis within each of the organisms.

For each organism, the reference set has been selected to cover the most important clusters/types, to the extent possible with the publicly available genomes.

These sets of reference sequences can be used to reliably and exhaustively BLAST, map, annotate, or genotype NGS reads or contigs originating from the corresponding microorganisms, from any type of NGS experiment. The sets are updated periodically, and sets for additional microorganisms are added as soon as new data become available through the COMPARE project. This site also provides up-to-date background on each of the organisms and their nomenclature.

These reference sets are used by the following COMPARE NGS analysis tools/pipelines.

More information by microorganism:

These sequences can be searched and retrieved via the following URLs as tagged records in the European Nucleotide Archive (ENA).

To retrieve the complete COMPARE Reference Genomes dataset in the browser, please go to the following URL:


and select 'COMPARE-RefGenome' as the XREF source. Please use the 'expanded' option to see the complete taxonomic information.

Programmatic retrieval of the complete COMPARE Reference Genomes dataset can be done via the following URL:


ENA sequence or sample accessions for a single sample/isolate in the dataset can be returned using the following URL:


where source_accession is the isolate/sample name as shown in the table below, for example:


The ENA record, shown in the 'Target primary accession' column of the result from the above URL, can be retrieved with the following URL:


where Target_primary_accession has been inserted from the response to the previous URL (e.g. http://www.ebi.ac.uk/ena/data/view/GQ856465).
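As a rough illustration of this last step, the sketch below builds the record URL from a 'Target primary accession' value. The base address is copied from the example above and may have changed since this page was written; the helper names ena_view_url and fetch_record are hypothetical and not part of any ENA client library.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address copied from the example URL in the text above (assumed still valid).
ENA_VIEW_BASE = "http://www.ebi.ac.uk/ena/data/view/"


def ena_view_url(target_primary_accession: str) -> str:
    """Build the record URL for one accession (hypothetical helper)."""
    accession = target_primary_accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    # Percent-encode any character that is not URL-safe.
    return ENA_VIEW_BASE + quote(accession, safe="")


def fetch_record(target_primary_accession: str, timeout: float = 30.0) -> bytes:
    """Download the record body; requires network access (hypothetical helper)."""
    url = ena_view_url(target_primary_accession)
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_record() to download the record.
    print(ena_view_url("GQ856465"))
```

Building the URL is kept separate from the download so that the address can be checked or logged before any request is sent.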

More extensive functions are described for the REST services relating to the COMPARE Reference Genomes. Users should note that records in the dataset are served from ENA and are marked as belonging to the dataset through ENA cross-reference annotations (http://www.ebi.ac.uk/ena/about/cross-references).



Eva Møller Nielsen
Senior Scientist
Statens Serum Institut


Tine Hald
Senior Researcher
Technical University of Denmark


Michel-Yves Mistou
French Agency for Food, Environmental and Occupational Health and Safety