README
|
Terms and License |
The DAVID Knowledgebase Site: |
|
DAVID_knowledgebase
|
Applications:
|
- For given genes, to access the corresponding heterogeneous
functional annotations, which cover over
50 categories from dozens of public databases, in a
high-throughput manner.
- For given gene identifiers, to translate to other types of
gene identifiers representing the same gene entries in a
high-throughput manner.
- For given annotation terms, to access the corresponding
genes in a high-throughput manner.
|
Some Important Points of the DAVID Knowledgebase: |
- The DAVID Knowledgebase does not create or own any of the annotation
contents, so the annotation contents in the DAVID Knowledgebase are free
to all users. The DAVID Team is not responsible for the accuracy
of the annotation contents, which come from the original
resources.
- The DAVID Knowledgebase is an integrated database: it collects the
heterogeneous annotations from those public data sources and
integrates them into one centralized space. The DAVID
Knowledgebase is responsible only for integration problems, such as
an annotation-gene assignment that is not consistent with the original data
sources.
- DAVID Gene IDs are created with a unique single-linkage
procedure. A DAVID Gene ID is a non-redundant gene cluster ID that
holds many different types of gene identifiers for one single gene entry.
- DAVID Gene IDs are used as the unique index IDs that link ALL
types of gene identifiers and the corresponding annotations throughout the
DAVID Knowledgebase. Thus the DAVID Gene ID, which is owned by the DAVID Team and
subject to a license requirement (pending, not available yet) for
for-profit uses, plays the central role in the integration.
- All data, including gene identifiers and annotation contents,
are stored as simple pair-wise flat files. All the files
are cross-linked through the DAVID Gene IDs. The file names are
based on the original data sources, such as david2entrez_gene.txt or
david2goterm_mf_level1.txt.
- Each file contains all available contents for all available
species for its particular annotation category.
- All text files are compressed into zip files. Users need a
compression program, such as WinZip, to unzip the files before using
them. The files are operating system independent, i.e. the
unzipped files can be read in DOS, Windows or Unix/Linux environments
with any text editor or viewer, such as MS Word, Notepad, EditPlus, more, vi,
etc. Some files may be very large.
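
The points above about pair-wise, zipped flat files can be illustrated with a minimal Python sketch. It rests on assumptions this README does not confirm: that each line holds a DAVID Gene ID followed by one value, and that the two columns are separated by a tab. The function names (read_pairs, read_pairs_from_zip, build_indexes) are hypothetical helpers, not part of any DAVID tool. Check a few lines of a real file first and adjust the delimiter or column order if needed.

```python
import io
import zipfile
from collections import defaultdict


def read_pairs(text_stream, delimiter="\t"):
    """Yield (david_id, value) pairs from a pair-wise flat file.

    Assumption: each line is "<DAVID Gene ID><delimiter><value>".
    """
    for line in text_stream:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        david_id, _, value = line.partition(delimiter)
        yield david_id, value


def read_pairs_from_zip(zip_path, member_name):
    """Stream pairs from one text file inside a zip archive.

    The file is read line by line, so it is never unpacked to disk or
    loaded into memory as a whole.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            yield from read_pairs(io.TextIOWrapper(raw, encoding="utf-8"))


def build_indexes(pairs):
    """Return two lookup tables: DAVID ID -> values and value -> DAVID IDs."""
    david_to_values = defaultdict(set)
    value_to_davids = defaultdict(set)
    for david_id, value in pairs:
        david_to_values[david_id].add(value)
        value_to_davids[value].add(david_id)
    return david_to_values, value_to_davids


if __name__ == "__main__":
    # In-memory stand-in for a file, using the pair quoted in Example 1.
    sample = io.StringIO("2875235\t35439_at\n")
    forward, reverse = build_indexes(read_pairs(sample))
    print(forward["2875235"], reverse["35439_at"])
```

The second table (value -> DAVID IDs) is what the third application needs: given an annotation term, look up the genes paired with it. For very large files, scanning the stream once for the IDs of interest avoids holding the whole file in memory.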
|
File Organization and Structures for Downloads*
|
Main Category Folder | Database Files | Special Comments
|
Disease
|
DAVID2GENETIC_ASSOCIATION_DB.txt
DAVID2OMIM_PHENOTYPE.txt
|
|
Functional_Categories |
DAVID2COG_KOG_ONTOLOGY.txt
DAVID2PIR_SEQ_FEATURE.txt
DAVID2SP_COMMENT_TYPE.txt
DAVID2SP_PIR_KEYWORDS.txt
DAVID2UP_SEQ_FEATURE.txt
|
|
Gene_Tissue_Expression |
DAVID2CGAP_EST.txt
DAVID2CGAP_SAGE.txt
DAVID2GNP_MICROARRAY_GCRMA.txt
DAVID2GNP_MICROARRAY_MAS5.txt
DAVID2UNIGENE_EST_PROFILE.txt
|
A gene-tissue pair means
that the gene is highly expressed in that tissue.
|
General_Annotations |
DAVID2ALIAS_GENE_SYMBOL.txt
DAVID2CHROMOSOME.txt
DAVID2CYTOBAND.txt
DAVID2GENE_NAME.txt
DAVID2GENE_SYMBOL.txt
DAVID2HOMOLOGOUS_GENE.txt
......
|
|
Literature |
DAVID2GENERIF_SUMMARY.txt
DAVID2HIV_INTERACTION_PUBMED_ID.txt
DAVID2PUBMED_ID.txt |
|
Main_Accessions |
DAVID2AFFY_ID.txt
DAVID2ENTREZ_GENE_ID.txt
DAVID2GENPEPT_ACCESSION.txt
DAVID2PIR_ACCESSION.txt
DAVID2PIR_ID.txt
DAVID2PIR_NREF_ID.txt
DAVID2REFSEQ_GENOMIC.txt
DAVID2REFSEQ_MRNA.txt
DAVID2REFSEQ_PROTEIN.txt
DAVID2REFSEQ_RNA.txt
DAVID2UNIGENE.txt
DAVID2UNIPROT_ACCESSION.txt
DAVID2UNIPROT_ID.txt
DAVID2UNIREF100_ID.txt
|
These are the key files
for mapping users' IDs to DAVID IDs, or to other types of
public gene IDs.
|
Ontologies |
DAVID2GOTERM_BP_1.txt
DAVID2GOTERM_BP_2.txt
DAVID2GOTERM_BP_3.txt
DAVID2GOTERM_BP_4.txt
DAVID2GOTERM_BP_5.txt
DAVID2GOTERM_BP_ALL.txt
DAVID2GOTERM_CC_1.txt
DAVID2GOTERM_CC_2.txt
DAVID2GOTERM_CC_3.txt
DAVID2GOTERM_CC_4.txt
DAVID2GOTERM_CC_5.txt
DAVID2GOTERM_CC_ALL.txt
DAVID2GOTERM_MF_1.txt
DAVID2GOTERM_MF_2.txt
DAVID2GOTERM_MF_3.txt
DAVID2GOTERM_MF_4.txt
DAVID2GOTERM_MF_5.txt
DAVID2GOTERM_MF_ALL.txt
DAVID2PANTHER_TERM_BP.txt
DAVID2PANTHER_TERM_MF.txt
|
The "xxx_ALL" files contain all
the levels of GO terms. Therefore, the "xxx_1" to "xxx_5" files are
subsets of the corresponding "xxx_ALL" files.
|
Other_Accessions |
DAVID2DICTYBASE_ID.txt
DAVID2ECOGENE_ID.txt
DAVID2FLYBASE_ID.txt
DAVID2GENEDB_SPOMBE_ID.txt
DAVID2GLYCOSUITEDB_ID.txt
DAVID2HAMAP_ID.txt
..........
|
|
Pathways |
DAVID2BBID.txt
DAVID2BIOCARTA.txt
DAVID2EC_NUMBER.txt
DAVID2KEGG_COMPOUND.txt
DAVID2KEGG_PATHWAY.txt
DAVID2KEGG_REACTION.txt
DAVID2PANTHER_PATHWAY.txt
|
|
Protein_Domains |
DAVID2BLOCKS_ID.txt
DAVID2COG_KOG_NAME.txt
DAVID2INTERPRO_NAME.txt
DAVID2PANTHER_FAMILY.txt
DAVID2PANTHER_SUBFAMILY.txt
DAVID2PDB_ID.txt
DAVID2PFAM_NAME.txt
DAVID2PIR_ALN.txt
DAVID2PIR_HOMOLOGY_DOMAIN.txt
DAVID2PIR_SUPERFAMILY_NAME.txt
DAVID2PRINTS_NAME.txt
DAVID2PRODOM_NAME.txt
DAVID2PROSITE_NAME.txt
DAVID2SCOP_ID.txt
DAVID2SMART_NAME.txt
DAVID2TIGRFAMS_NAME.txt
|
|
Protein_Interactions |
DAVID2BIND.txt
DAVID2DIP.txt
DAVID2HIV_INTERACTION.txt
DAVID2HIV_INTERACTION_CATEGORY.txt
DAVID2HPRD_INTERACTION.txt
DAVID2MINT.txt
DAVID2NCICB_CAPATHWAY.txt
DAVID2REACTOME_INTERACTION.txt
DAVID2TRANSFAC_ID.txt
|
|
Species
|
DAVID2TAX.txt
|
Gene species information.
|
Gene_Names_Symbols
|
DAVID2GENE_NAME.txt
DAVID2GENE_SYMBOL.txt |
Map DAVID IDs to gene
names or symbols.
|
*Note:
- Each database file represents a particular annotation
source. The naming convention tells users what the original source is.
For example, DAVID2BIND.txt holds the BIND interaction database annotations in DAVID.
- The database files are organized into 11 main categories
(consistent with the interface organization of the DAVID Functional
Annotation Tool) to give quick access to the areas of users'
interest.
- A gene-annotation pair in a file means that the particular
gene is associated with the corresponding annotation term.
- You probably do not need to download all the files. For
example, suppose you have 1000 Affy IDs of interest and want to study their
KEGG pathways. For this purpose, you only need to download
three files: david2affy_id.txt, david2KEGG_Pathway.txt and
david2gene_name.txt.
- DAVID data files are species independent. Thus, each data
file in the DAVID
Knowledgebase contains all available contents for all available species.
If you are only interested in certain species, you can filter the files
that you need for your studies according to david2taxid.txt, which contains the
species information. Or you can use the files as they are and
ignore the extra information for other species in the files.
- The DAVID Web site provides a query interface. If users only need
a small set of data, e.g. some annotations for 10 genes, all of the above
information can be queried through the DAVID Functional Annotation Table, which is part of the DAVID
Functional Annotation Tool.
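
The species filtering mentioned in the notes above could look roughly like the following Python sketch. It assumes, without confirmation from this README, that the species file pairs each DAVID ID with a species identifier in a tab-separated second column, and that the identifier is an NCBI taxonomy ID (9606 for human, 10090 for mouse). The function names and the sample rows are made up for illustration.

```python
import io


def david_ids_for_species(tax_lines, wanted_tax_id, delimiter="\t"):
    """Collect the DAVID IDs whose species column equals wanted_tax_id."""
    keep = set()
    for line in tax_lines:
        david_id, _, tax_id = line.rstrip("\r\n").partition(delimiter)
        if tax_id == wanted_tax_id:
            keep.add(david_id)
    return keep


def filter_by_species(annotation_lines, keep_ids, delimiter="\t"):
    """Yield only the annotation lines whose DAVID ID is in keep_ids."""
    for line in annotation_lines:
        david_id = line.partition(delimiter)[0]
        if david_id in keep_ids:
            yield line


if __name__ == "__main__":
    # Made-up rows for illustration only; real rows come from the downloads.
    tax = io.StringIO("1001\t9606\n1002\t10090\n")
    pathways = io.StringIO("1001\tpathway A\n1002\tpathway B\n")
    human_ids = david_ids_for_species(tax, "9606")
    for row in filter_by_species(pathways, human_ids):
        print(row, end="")
```

Both functions read their input one line at a time, so the same code works on open file handles for large files.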
|
Example 1: Cross Mapping Gene IDs
|
|
Task:
I have 1000 Affy IDs (35439_at, 679_at, ....).
I would like to know the corresponding NCBI
Entrez Gene IDs, UniProt Accessions, Gene Names and Gene Symbols.
Solution:
Step 1:
- Map Affy ID 35439_at to the corresponding
DAVID ID with the file Main_Accessions/DAVID2AFFY_ID.txt. This gives
the pair DAVID ID <- Affy ID as 2875235 <- 35439_at.
Step 2:
- Map DAVID ID 2875235 to the corresponding Entrez Gene ID with the file
Main_Accessions/DAVID2ENTREZ_GENE_ID.txt. This gives the
pair DAVID ID -> Entrez Gene ID as 2875235 -> 7536.
- Map DAVID ID 2875235 to the corresponding UniProt Accession with
the file Main_Accessions/DAVID2UNIPROT_ACCESSION.txt. This gives
the pair DAVID ID -> UniProt Accession as 2875235 -> Q9UEI0.
- Map DAVID ID 2875235 to the corresponding Gene Name with the file
Gene_Names_Symbols/DAVID2GENE_NAME.txt. This gives the pair
DAVID ID -> Gene Name as 2875235 -> transcription factor ZFM1.
- Map DAVID ID 2875235 to the corresponding Gene Symbol with the file
Gene_Names_Symbols/DAVID2GENE_SYMBOL.txt. This gives the pair
DAVID ID -> Gene Symbol as 2875235 -> SF1.
- With the DAVID Knowledgebase, 35439_at is now cross-referenced
to Entrez Gene 7536, UniProt Accession Q9UEI0, Gene
Name "transcription factor ZFM1", and Gene Symbol "SF1".
Step 3:
- Repeat Step 1 and Step 2 for the rest of the Affy IDs.
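
The three steps of Example 1 can be sketched in Python as below. This is an illustration under assumptions, not the DAVID Team's own tooling: it assumes tab-separated lines with the DAVID ID in the first column, and the helper names (load_david_to_values, invert, cross_map) are hypothetical. The demo replaces the four files with in-memory text that holds only the pairs quoted in Example 1.

```python
import io
from collections import defaultdict


def load_david_to_values(lines, delimiter="\t"):
    """Read one pair-wise file into {david_id: set of values}."""
    index = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            david_id, _, value = line.partition(delimiter)
            index[david_id].add(value)
    return index


def invert(index):
    """Turn {david_id: values} into {value: david_ids}, as needed for Step 1."""
    inverted = defaultdict(set)
    for david_id, values in index.items():
        for value in values:
            inverted[value].add(david_id)
    return inverted


def cross_map(affy_ids, affy_to_david, targets):
    """Run Step 1 and Step 2 for every Affy ID (this loop is Step 3).

    targets maps a label to a {david_id: values} table.
    Returns {affy_id: {label: sorted list of values}}.
    """
    result = {}
    for affy_id in affy_ids:
        row = {label: set() for label in targets}
        for david_id in affy_to_david.get(affy_id, ()):      # Step 1
            for label, table in targets.items():             # Step 2
                row[label] |= table.get(david_id, set())
        result[affy_id] = {label: sorted(vals) for label, vals in row.items()}
    return result


if __name__ == "__main__":
    # Stand-ins for DAVID2AFFY_ID.txt, DAVID2ENTREZ_GENE_ID.txt, etc.
    affy = load_david_to_values(io.StringIO("2875235\t35439_at\n"))
    targets = {
        "entrez_gene_id": load_david_to_values(io.StringIO("2875235\t7536\n")),
        "uniprot_accession": load_david_to_values(io.StringIO("2875235\tQ9UEI0\n")),
        "gene_name": load_david_to_values(
            io.StringIO("2875235\ttranscription factor ZFM1\n")),
        "gene_symbol": load_david_to_values(io.StringIO("2875235\tSF1\n")),
    }
    print(cross_map(["35439_at"], invert(affy), targets))
```

Sets are used because one identifier may pair with several DAVID IDs or values. An Affy ID that is absent from the mapping simply gets empty lists.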
|
Example 2: Query annotation contents for a given gene
|
|
Task:
I have Affy ID 35439_at.
What are the associated Gene Ontology (GO) Biological
Process (BP) terms at all levels?
Solution:
Step 1: Map Affy ID 35439_at to
the corresponding DAVID ID with the file
Main_Accessions/DAVID2AFFY_ID.txt. This gives the pair
DAVID ID <- Affy ID as 2875235 <- 35439_at.
Step 2: Map DAVID ID 2875235 to the corresponding Gene Ontology terms with
the file Ontologies/DAVID2GOTERM_BP_ALL.txt. This gives the pair
DAVID ID -> GOTERM_BP_ALL as 2875235 ->
"TRANSCRIPTION, DNA-DEPENDENT".
|
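
The two steps of Example 2 can be sketched as two sequential scans, which suits a single query against large files because nothing is loaded into memory. As with any code here, this is a sketch under assumptions: tab-separated lines with the DAVID ID in the first column, and hypothetical function names. The exact text stored for a GO term in the real file may differ from the stand-in row used in the demo.

```python
import io


def scan_for_value(lines, wanted_value, delimiter="\t"):
    """Step 1: return the DAVID IDs paired with wanted_value (second column)."""
    david_ids = set()
    for line in lines:
        david_id, _, value = line.rstrip("\r\n").partition(delimiter)
        if value == wanted_value:
            david_ids.add(david_id)
    return david_ids


def scan_for_david_ids(lines, david_ids, delimiter="\t"):
    """Step 2: return the values paired with any of david_ids.

    Values are returned in file order with duplicates dropped.
    """
    terms = []
    for line in lines:
        david_id, _, value = line.rstrip("\r\n").partition(delimiter)
        if david_id in david_ids and value not in terms:
            terms.append(value)
    return terms


if __name__ == "__main__":
    # Stand-ins for DAVID2AFFY_ID.txt and DAVID2GOTERM_BP_ALL.txt, holding
    # only the pairs quoted in Example 2.
    affy_file = io.StringIO("2875235\t35439_at\n")
    go_file = io.StringIO("2875235\tTRANSCRIPTION, DNA-DEPENDENT\n")
    ids = scan_for_value(affy_file, "35439_at")
    print(scan_for_david_ids(go_file, ids))
```

With real downloads, the io.StringIO objects would be replaced by open file handles for the unzipped text files.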
Edited by DAVID Team on Feb. 2022
|