The Argonaute (Ago) protein family was first discovered in Arabidopsis thaliana, and orthologous proteins have since been found in many archaea and bacteria. In eukaryotes, Argonaute proteins (eAgos) play a vital role in RNA interference (RNAi) pathways. eAgos are loaded with short 5′-phosphorylated, 3′-hydroxyl guide RNAs, each a single strand of an siRNA or miRNA duplex processed by Dicer, and subsequently cleave the complementary target mRNA or mediate translational repression, respectively[1-3]. In general, all eAgos contain six domains: the N (N-terminal), PAZ (PIWI-Argonaute-Zwille), MID (middle) and PIWI (P element–induced wimpy testis) domains, interconnected by two linker domains, L1 and L2. Notably, the PIWI domain, which is the most conserved domain in Ago proteins, adopts an RNase H fold and carries the cleavage activity[5, 6]. This domain first processes the loaded double-stranded siRNA or miRNA by cutting the passenger strand, and then processes the guide-target RNA duplex by cleaving the target strand[7-9].
Many prokaryotes also encode Ago proteins (pAgos)[3, 10-12]. pAgos are divided into two classes, long pAgos and short pAgos. Long pAgos possess all the domains found in eAgos, whereas short pAgos contain only the MID and PIWI domains. Like eAgos, pAgos load 5′-phosphorylated oligonucleotide guides to cleave target nucleic acids; these are often DNA guides, because some pAgos have a higher affinity for DNA guides than for RNA guides[11, 13-15]. In prokaryotes, pAgos have been proposed to function in defense against invading foreign genetic elements. Indeed, recent reports indicate that pAgos bound to guide RNAs or DNAs act on foreign DNA in vivo[12, 16].
A recent, widely noticed report that Natronobacterium gregoryi Argonaute (NgAgo) loaded with guide DNAs (gDNAs) can edit the genome of human cells at 37 °C more efficiently than Cas9 has aroused great interest among researchers in finding genome editing tools that are more efficient than Cas9. However, with the rapid growth in the number of newly sequenced archaeal and bacterial genomes, the fast, effective and accurate recognition of new Ago proteins faces great obstacles. To discover new gene editing tools, it is also necessary to analyze the function of candidate Ago proteins that may possess cleavage activity. Site-directed mutagenesis is normally the most precise experimental method for studying proteins, but for proteins of unknown structure a major challenge is selecting the mutation sites, because testing all possibilities consumes large amounts of material and time. An accurate identification of protein domains can therefore guide mutation design.
At present, domain models, profiles and patterns of proteins are collected in a variety of databases such as PROSITE, Pfam, BLOCKS, SMART, PRINTS, CDD and ProDom. Some tools provided by these databases can computationally annotate the domains of query proteins. Pfam, SMART and CD-Search are representative tools for protein identification and domain annotation. Pfam describes each protein domain family with two multiple sequence alignments and a profile hidden Markov model (HMM), and uses these to annotate the domains of submitted sequences. One alignment is the seed alignment, a set of representative, high-quality homologous sequences from which the HMM is built with the HMMER2 package; the other is the full alignment, which encompasses all detectable related members. SMART is a web service that integrates several tools. Its domain prediction component first finds candidate homologues of the query protein with the help of BLAST, Search, and MACAW, and then builds profiles, alignments and HMMs to match domains. In addition, it provides an interface to the domain predictions of Pfam. Conserved Domain Search (CD-Search) is a web service based on the Conserved Domain Database (CDD), a collection of conserved domain models. It stores the domain-model alignments as position-specific scoring matrices (PSSMs) and uses the RPS-BLAST algorithm to search query sequences against these PSSMs. However, these web services are general-purpose tools for the domain analysis of all proteins, and must therefore balance domain coverage, specificity, sensitivity and annotation quality across many different types of proteins. They are not efficient for annotating the domains of one specific type of protein.
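To make the idea of matching a query against a PSSM concrete, the following is a minimal sketch in Python. It is a toy illustration only, not the RPS-BLAST algorithm: the three-column matrix, its score values, the default penalty and the function names are all invented for this example, and the scan is ungapped.

```python
# Toy PSSM: one dict of residue scores per motif column.
# All values are invented for illustration, not taken from CDD.
PSSM = [
    {"D": 3, "E": 1},
    {"G": 2, "A": 1},
    {"D": 3, "E": 1},
]
DEFAULT = -1  # assumed penalty for residues not listed in a column


def score_window(window, pssm=PSSM, default=DEFAULT):
    """Sum the column scores for a window as long as the PSSM."""
    return sum(col.get(res, default) for col, res in zip(pssm, window))


def best_hit(seq, pssm=PSSM):
    """Return (score, start) of the best ungapped window; start is 0-based.

    Returns None when the sequence is shorter than the matrix.
    """
    width = len(pssm)
    if len(seq) < width:
        return None
    scores = [(score_window(seq[i:i + width], pssm), i)
              for i in range(len(seq) - width + 1)]
    # max() keeps the first window when several share the top score
    return max(scores, key=lambda t: t[0])
```

Calling `best_hit("MKDGDLL")` slides the matrix along the sequence and reports the highest-scoring window. Real tools add gapped extension and score statistics (E-values), which this sketch leaves out.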
MUSCLE is a multiple sequence alignment program based on log-expectation scoring. In a test aligning 5000 sequences with an average length of 350, its speed and accuracy were reported to be better than those of ClustalW2, T-Coffee and MAFFT. Although MUSCLE has many features, it reports its results as an alignment in ALN format, from which the domain boundaries of a specific protein cannot be obtained directly. Research on a protein is more convenient and efficient when its domains are thoroughly known. To increase the efficiency of Ago research, web tools that can delimit the domain boundaries of Agos rapidly and visually are urgently needed. To our knowledge, however, no specialized web tool for the automatic identification and domain annotation of Ago proteins is available.
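An alignment by itself only pairs up residues; domain boundaries become available once the boundaries of an annotated reference are projected through the alignment onto the query. The following Python sketch shows one simple way to do this for two aligned rows. The function name, the input layout and the example coordinates are assumptions made for illustration, and this is not necessarily the procedure used by any of the tools mentioned here.

```python
def map_domains(ref_aln, qry_aln, ref_domains):
    """Project reference domain ranges onto the query via an alignment.

    ref_aln, qry_aln: two rows of the same alignment, gaps written as '-'.
    ref_domains: {name: (start, end)}, 1-based inclusive positions in the
    ungapped reference sequence (hypothetical annotation).
    Returns {name: (start, end)} in ungapped query coordinates; domains
    with no aligned query residue are omitted.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    ref_pos = qry_pos = 0
    ref_to_qry = {}  # reference residue number -> query residue number
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
        if q != "-":
            qry_pos += 1
        if r != "-" and q != "-":
            ref_to_qry[ref_pos] = qry_pos
    mapped = {}
    for name, (start, end) in ref_domains.items():
        hits = [ref_to_qry[p] for p in range(start, end + 1)
                if p in ref_to_qry]
        if hits:
            mapped[name] = (min(hits), max(hits))
    return mapped
```

For example, `map_domains("MKT-AYLG", "M-TQAY-G", {"N": (1, 3), "PIWI": (4, 7)})` returns `{"N": (1, 2), "PIWI": (4, 6)}`: reference residues that face a gap in the query are skipped, so the projected ranges shrink accordingly.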
Here, we developed a web service, AgoNotes, which first sorts submitted protein sequences into three groups (pAgos, eAgos and non-Agos) using the local protein similarity search program BLASTP. The pAgo and eAgo groups are then aligned against prepared pAgo and eAgo sequences, respectively. Finally, after a series of processing steps, the detailed results are displayed visually on the result pages. AgoNotes can efficiently identify Ago proteins and annotate their domains.
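The grouping step can be pictured with a short Python sketch that reads BLASTP tabular output (the standard 12-column `-outfmt 6` layout) and assigns each query to a group from its best hit. This is an illustration under stated assumptions, not the AgoNotes implementation: the subject naming scheme (`pAgo_...` / `eAgo_...`), the E-value cutoff of 1e-5 and the function name are hypothetical.

```python
def classify(blast_tab, query_ids, max_evalue=1e-5):
    """Assign each query to 'pAgo', 'eAgo' or 'not Ago' from its best hit.

    blast_tab: text in BLAST tabular format (12 tab-separated columns,
    E-value in column 11 and bit score in column 12).
    Subject ids are assumed to start with 'pAgo_' or 'eAgo_'
    (a hypothetical naming scheme for the reference sequences).
    """
    best = {}  # query id -> (bit score, subject id) of best accepted hit
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        qid, sid = fields[0], fields[1]
        evalue, bits = float(fields[10]), float(fields[11])
        if evalue > max_evalue:  # assumed significance cutoff
            continue
        if qid not in best or bits > best[qid][0]:
            best[qid] = (bits, sid)
    groups = {}
    for qid in query_ids:
        label = best[qid][1].split("_", 1)[0] if qid in best else ""
        groups[qid] = label if label in ("pAgo", "eAgo") else "not Ago"
    return groups
```

Queries with no hit below the cutoff fall into the "not Ago" group, which mirrors the three-way split described above; a production service would likely also consider alignment coverage and identity before accepting a hit.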
1. Bohmert, K.; Camus, I.; Bellini, C.; Bouchez, D.; Caboche, M.; Benning, C., AGO1 defines a novel locus of Arabidopsis controlling leaf development. The EMBO journal 1998, 17, (1), 170-80.
2. Makarova, K. S.; Wolf, Y. I.; van der Oost, J.; Koonin, E. V., Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biology direct 2009, 4, 29.
3. Sheng, G.; Zhao, H.; Wang, J.; Rao, Y.; Tian, W.; Swarts, D. C.; van der Oost, J.; Patel, D. J.; Wang, Y., Structure-based cleavage mechanism of Thermus thermophilus Argonaute DNA guide strand-mediated DNA target cleavage. Proceedings of the National Academy of Sciences of the United States of America 2014, 111, (2), 652-7.
4. Kaya, E.; Doxzen, K. W.; Knoll, K. R.; Wilson, R. C.; Strutt, S. C.; Kranzusch, P. J.; Doudna, J. A., A bacterial Argonaute with noncanonical guide RNA specificity. Proceedings of the National Academy of Sciences of the United States of America 2016, 113, (15), 4057-62.
5. Song, J. J.; Smith, S. K.; Hannon, G. J.; Joshua-Tor, L., Crystal structure of Argonaute and its implications for RISC slicer activity. Science 2004, 305, (5689), 1434-7.
6. Parker, J. S.; Roe, S. M.; Barford, D., Crystal structure of a PIWI protein suggests mechanisms for siRNA recognition and slicer activity. The EMBO journal 2004, 23, (24), 4727-37.
7. Lingel, A.; Sattler, M., Novel modes of protein-RNA recognition in the RNAi pathway. Current opinion in structural biology 2005, 15, (1), 107-15.
8. Jinek, M.; Doudna, J. A., A three-dimensional view of the molecular machinery of RNA interference. Nature 2009, 457, (7228), 405-12.
9. Parker, J. S., How to slice: snapshots of Argonaute in action. Silence 2010, 1, (1), 3.
10. Vogel, J., Biochemistry. A bacterial seek-and-destroy system for foreign DNA. Science 2014, 344, (6187), 972-3.
11. Yuan, Y. R.; Pei, Y.; Ma, J. B.; Kuryavyi, V.; Zhadina, M.; Meister, G.; Chen, H. Y.; Dauter, Z.; Tuschl, T.; Patel, D. J., Crystal structure of A. aeolicus argonaute, a site-specific DNA-guided endoribonuclease, provides insights into RISC-mediated mRNA cleavage. Molecular cell 2005, 19, (3), 405-19.
12. Olovnikov, I.; Chan, K.; Sachidanandam, R.; Newman, D. K.; Aravin, A. A., Bacterial argonaute samples the transcriptome to identify foreign DNA. Molecular cell 2013, 51, (5), 594-605.
13. Swarts, D. C.; Makarova, K.; Wang, Y.; Nakanishi, K.; Ketting, R. F.; Koonin, E. V.; Patel, D. J.; van der Oost, J., The evolutionary journey of Argonaute proteins. Nature structural & molecular biology 2014, 21, (9), 743-53.
14. Ma, J. B.; Yuan, Y. R.; Meister, G.; Pei, Y.; Tuschl, T.; Patel, D. J., Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature 2005, 434, (7033), 666-70.
15. Wang, Y.; Juranek, S.; Li, H.; Sheng, G.; Tuschl, T.; Patel, D. J., Structure of an argonaute silencing complex with a seed-containing guide DNA and target RNA duplex. Nature 2008, 456, (7224), 921-6.
16. Swarts, D. C.; Jore, M. M.; Westra, E. R.; Zhu, Y.; Janssen, J. H.; Snijders, A. P.; Wang, Y.; Patel, D. J.; Berenguer, J.; Brouns, S. J.; van der Oost, J., DNA-guided DNA interference by a prokaryotic Argonaute. Nature 2014, 507, (7491), 258-61.
17. Gao, F.; Shen, X. Z.; Jiang, F.; Wu, Y.; Han, C., DNA-guided genome editing using the Natronobacterium gregoryi Argonaute. Nature biotechnology 2016, 34, (7), 768-73.
18. Chai, G.; Yu, M.; Jiang, L.; Duan, Y.; Huang, J., HMMCAS: a web tool for the identification and domain annotations of Cas proteins. IEEE/ACM transactions on computational biology and bioinformatics 2017.
19. Finn, R. D.; Bateman, A.; Clements, J.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Heger, A.; Hetherington, K.; Holm, L.; Mistry, J.; Sonnhammer, E. L.; Tate, J.; Punta, M., Pfam: the protein families database. Nucleic acids research 2014, 42, (Database issue), D222-30.
20. Bairoch, A.; Bucher, P.; Hofmann, K., The PROSITE database, its status in 1997. Nucleic acids research 1997, 25, (1), 217-21.
21. Henikoff, J. G.; Pietrokovski, S.; Henikoff, S., Recent enhancements to the Blocks Database servers. Nucleic acids research 1997, 25, (1), 222-5.
22. Schultz, J.; Milpetz, F.; Bork, P.; Ponting, C. P., SMART, a simple modular architecture research tool: identification of signaling domains. Proceedings of the National Academy of Sciences of the United States of America 1998, 95, (11), 5857-64.
23. Attwood, T. K.; Beck, M. E.; Bleasby, A. J.; Degtyarenko, K.; Michie, A. D.; Parry-Smith, D. J., Novel developments with the PRINTS protein fingerprint database. Nucleic acids research 1997, 25, (1), 212-7.
24. Marchler-Bauer, A.; Bo, Y.; Han, L.; He, J.; Lanczycki, C. J.; Lu, S.; Chitsaz, F.; Derbyshire, M. K.; Geer, R. C.; Gonzales, N. R.; Gwadz, M.; Hurwitz, D. I.; Lu, F.; Marchler, G. H.; Song, J. S.; Thanki, N.; Wang, Z.; Yamashita, R. A.; Zhang, D.; Zheng, C.; Geer, L. Y.; Bryant, S. H., CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic acids research 2017, 45, (D1), D200-D203.
25. Servant, F.; Bru, C.; Carrere, S.; Courcelle, E.; Gouzy, J.; Peyruc, D.; Kahn, D., ProDom: automated clustering of homologous domains. Briefings in bioinformatics 2002, 3, (3), 246-51.
26. Mount, D. W., Using the Basic Local Alignment Search Tool (BLAST). CSH protocols 2007, 2007, pdb.top17.
27. Pearson, W. R., Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 1991, 11, (3), 635-50.
28. Schuler, G. D.; Altschul, S. F.; Lipman, D. J., A workbench for multiple alignment construction and analysis. Proteins 1991, 9, (3), 180-90.
29. Marchler-Bauer, A.; Bryant, S. H., CD-Search: protein domain annotations on the fly. Nucleic acids research 2004, 32, (Web Server issue), W327-31.
30. Edgar, R. C., MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 2004, 32, (5), 1792-7.