Table 4 Experimental feature definitions

From: Functional constraint and small insertions and deletions in the ENCODE regions of the human genome

Feature

Term

Definition

RNA transcription (coding and noncoding)

CDS

Coding sequence: well-characterized transcribed regions with an annotated protein-coding open reading frame (ORF)

 

RACEfrags

5' and 3' rapid amplification of cDNA ends (RACE), using polyA or total RNA to construct full-length cDNA. This technique has revealed previously unrecognized UTRs

 

TARs/transfrags

Transcriptionally active regions/transcribed fragments, as determined by hybridization of cellular RNA (polyA or total) to multiple microarray platforms. For the analyses reported here, portions of TARs/transfrags overlapping any CDS, 5' UTR, or 3' UTR annotation were removed from the dataset

 

Pseudo-exons

A pre-mRNA sequence that resembles an exon but is not recognized as such by the splicing machinery

 

TSS

Transcription start site

 

5' UTR

Untranslated region: portions of CDS-containing transcripts before the start codon. For the analyses reported here, 5' UTRs overlapping alternatively transcribed CDS annotations were removed from the dataset

 

TUF

Transcripts of unknown function: noncoding transcripts

 

3' UTR

Untranslated region: portions of CDS-containing transcripts after the stop codon

Transcript regulation: open chromatin/DNA-protein interaction

DHS

DNase I hypersensitive sites: short regions of DNA that are relatively easily cleaved by deoxyribonuclease I. These regions of open chromatin were detected by quantitative chromatin profiling and novel microarray-based methods. For the analyses reported here, regions that overlap repetitive sequence were removed. Measures of DHS are reported from two sources: the ENCODE Regulome group and the NHGRI

 

FAIRE-sites

Formaldehyde-assisted isolation of regulatory elements: a procedure used to isolate chromatin that is resistant to the formation of protein-DNA crosslinks. Data suggest that depletion of nucleosomes (the most basic organizational unit of chromatin) at active regulatory regions, such as promoters, is the primary underlying basis for FAIRE [38]

 

HisPolTAF

Histone modifications, RNA polymerase II (PolII), and transcription regulator TAF250

 

Sequence specific factors

Regions of DNA determined to be bound by sequence-specific transcription factors through chromatin immunoprecipitation followed by microarray hybridization (so-called 'ChIP-chip') analyses

 

Sequence specific (all motifs)

Computationally identified short sequence motifs found to be over-represented in the sequence specific factors dataset

Ancestral repeats

 

Mobile elements with well-defined consensus sequences that inserted into the ancestral genome prior to the mammalian radiation. These sequences are considered to be predominantly non-functional and are often used as models of neutrally evolving DNA

Cell cycle

EarlyRepSeg

Early replicating segments

 

MidRepSeg

Mid replicating segments

 

LateRepSeg

Late replicating segments

Evolutionary constraint

MCS strict

Multi-species conserved sequences: strict criteria

 

MCS moderate

Multi-species conserved sequences: moderate criteria

 

MCS loose

Multi-species conserved sequences: loose criteria
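
Several definitions above note that overlapping portions were removed from a dataset (for example, portions of TARs/transfrags overlapping CDS or UTR annotations, and DHS regions overlapping repetitive sequence). The table does not say how this filtering was implemented; the following is only a minimal sketch of one common way to do it, interval subtraction on a single chromosome, assuming half-open integer coordinates. The function names `merge_intervals` and `subtract_intervals` and the example coordinates are hypothetical and are not taken from the source.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on one chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract_intervals(features: List[Interval],
                       masks: List[Interval]) -> List[Interval]:
    """Return the parts of each feature not covered by any mask."""
    merged_masks = merge_intervals(masks)
    result: List[Interval] = []
    for start, end in sorted(features):
        cursor = start  # leftmost position not yet emitted or masked
        for m_start, m_end in merged_masks:
            if m_end <= cursor:
                continue  # mask lies entirely to the left
            if m_start >= end:
                break  # mask lies entirely to the right
            if m_start > cursor:
                result.append((cursor, m_start))  # unmasked gap
            cursor = max(cursor, m_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))  # unmasked tail
    return result


# Hypothetical coordinates, for illustration only.
transfrags = [(100, 200), (300, 400)]
annotated = [(150, 180), (390, 450)]  # e.g. CDS/UTR annotations to mask
filtered = subtract_intervals(transfrags, annotated)
```

With these illustrative inputs, the first fragment is split around the masked segment and the second is trimmed at its right edge. A real analysis would also need to handle multiple chromosomes and the coordinate convention of the annotation files.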