Title: | Species Identification using DNA Barcodes |
---|---|
Description: | To perform species identification using DNA barcodes. |
Authors: | Ai-bing ZHANG [aut, cre], Meng-di HAO [aut], Cai-qing YANG [aut], Zhi-yong SHI [aut] |
Maintainer: | Ai-bing ZHANG <zhangab2008 (at) mail.cnu.edu.cn> |
License: | GPL-2 |
Version: | 1.0-3 |
Built: | 2025-02-26 03:14:56 UTC |
Source: | https://github.com/cran/BarcodingR |
Evaluate two barcodes using species identification success rate criteria.
barcodes.eval(barcode1, barcode2, kmer1 = kmer1, kmer2 = kmer2)
barcode1 | object of class "DNAbin" based on barcode1, which contains taxon information. |
barcode2 | object of class "DNAbin" based on barcode2, which contains taxon information. |
kmer1 | a numeric giving the k-mer length for barcode1; the optimal length can be found with optimize.kmer() before running this function. |
kmer2 | a numeric giving the k-mer length for barcode2; see kmer1. |
a list containing the p-value returned by prop.test(), among other elements.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA.
zhangab2008 (at) mail.cnu.edu.cn
prop.test()
data(TibetanMoth)
barcode1 <- as.DNAbin(as.character(TibetanMoth[1:30, ]))
barcode2 <- barcode1
b.eval <- barcodes.eval(barcode1, barcode2, kmer1 = 1, kmer2 = 3)
b.eval
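The comparison of two barcodes by success rate relies on prop.test() from base R. The sketch below only illustrates that test with made-up counts; the numbers of correctly identified sequences are hypothetical, and the exact call made inside barcodes.eval() is an assumption.

```r
# Hypothetical counts: sequences correctly identified out of 30 per barcode.
n.correct <- c(27, 21)
n.total <- c(30, 30)

# Two-sample test of equal proportions (package stats, attached by default).
res <- prop.test(x = n.correct, n = n.total)

res$estimate  # observed success rates of the two barcodes
res$p.value   # a small p-value suggests the success rates differ
```

A p-value above the chosen significance level would mean the data do not show a difference in identification success between the two barcodes.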
Calculates the DNA barcoding gap. Besides the K2P distance, raw and Euclidean distances can also be used.
barcoding.gap(ref, dist = dist)
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
dist | a character string, one of "raw", "K80", or "euclidean". |
a list of summary statistics for interspecific and intraspecific genetic distances, such as the K2P distance.
The current version of the function can only be used for protein-coding barcodes such as COI. A future version may add support for non-coding barcodes, for instance ITS1 and ITS2.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA, contact at zhangab2008(at)mail.cnu.edu.cn
Meyer, Christopher P., and Gustav Paulay. (2005). "DNA barcoding: error rates based on comprehensive sampling." PLoS Biology 3(12): e422.
F. Jiang, Q. Jin, L. Liang, A.B. Zhang, and Z.H. Li. (2014). Existence of Species Complex Largely Reduced Barcoding Success for Invasive Species of Tephritidae: A Case Study in Bactrocera spp. Mol Ecol Resour. 14(6):1114-1128. DOI: 10.1111/1755-0998.12259.
data(TibetanMoth)
TibetanMoth <- as.DNAbin(as.character(TibetanMoth[1:20, ]))
b.gap <- barcoding.gap(ref = TibetanMoth, dist = "K80")
b.gap
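The idea of a barcoding gap can be sketched in base R without the package. The distance matrix and species labels below are made up for illustration; in practice such a matrix could come from ape::dist.dna(). How barcoding.gap() summarizes the two distance classes internally is not shown here and should be treated as an assumption.

```r
# Toy pairwise distance matrix for five sequences (values are made up).
species <- c("spA", "spA", "spB", "spB", "spB")
d <- matrix(c(0.00, 0.01, 0.08, 0.09, 0.07,
              0.01, 0.00, 0.09, 0.10, 0.08,
              0.08, 0.09, 0.00, 0.02, 0.01,
              0.09, 0.10, 0.02, 0.00, 0.02,
              0.07, 0.08, 0.01, 0.02, 0.00),
            nrow = 5, byrow = TRUE)

# Use each pair once (upper triangle) and flag same-species pairs.
pairs <- upper.tri(d)
same.species <- outer(species, species, "==")

intra <- d[pairs & same.species]    # within-species distances
inter <- d[pairs & !same.species]   # between-species distances

# A positive gap means the smallest interspecific distance exceeds
# the largest intraspecific distance.
gap <- min(inter) - max(intra)
gap
```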
Species identification using protein-coding barcodes with different methods, including a BP-based method (Zhang et al. 2008), a fuzzy-set based method (Zhang et al. 2012), and a Bayesian-based method (Jin et al. 2013).
barcoding.spe.identify(ref, que, method = "bpNewTraining")
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
que | object of class "DNAbin", whose identities (species names) need to be inferred. |
method | a character string indicating which method is used to train the model and/or infer species membership. One of "fuzzyId", "bpNewTraining", "bpNewTrainingOnly", "bpUseTrained", or "Bayesian" should be specified. |
a list containing model parameters used, species identification success rates using references, query sequences, species inferred, and corresponding confidence levels (bp probability for BP-based method / FMF values for fuzzy set theory based method / posterior probability for Bayesian method) when available.
The functions fasta2DNAbin() from package adegenet and read.dna() from package ape are used to obtain DNAbin objects in this package. The former reads large aligned coding DNA barcodes, the latter unaligned ones. ref and que should be aligned and have identical sequence length. A pipeline for fast alignment of reference and query sequences is provided. Windows users can contact zhangab2008(at)mail.cnu.edu.cn for an exec version of the package. For very large DNA datasets, read.fas() from package phyloch is strongly suggested instead of fasta2DNAbin(), since the latter is very slow.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA. zhangab2008(at)mail.cnu.edu.cn
Zhang, A. B., M. D. Hao, C. Q. Yang, and Z. Y. Shi. (2017). BarcodingR: an integrated R package for species identification using DNA barcodes. Methods Ecol Evol. 8(5):627-634. https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12682.
Jin, Q., H.L. Han, X.M. Hu, X.H. Li, C.D. Zhu, S. Y. W. Ho, R. D. Ward, A.B. Zhang. (2013). Quantifying Species Diversity with a DNA Barcoding-Based Method: Tibetan Moth Species (Noctuidae) on the Qinghai-Tibetan Plateau. PLoS ONE 8: e64428. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0064428.
Zhang, A. B., C. Muster, H.B. Liang, C.D. Zhu, R. Crozier, P. Wan, J. Feng, R. D. Ward. (2012). A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding. Molecular Ecology, 21(8):1848-63. https://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2011.05235.x
Zhang, A. B., D. S. Sikes, C. Muster, S. Q. Li. (2008). Inferring Species Membership using DNA sequences with Back-propagation Neural Networks. Systematic Biology, 57(2):202-215. https://academic.oup.com/sysbio/article/57/2/202/1622290
data(TibetanMoth)
ref <- as.DNAbin(as.character(TibetanMoth[1:5, ]))
que <- as.DNAbin(as.character(TibetanMoth[50:55, ]))
bsi <- barcoding.spe.identify(ref, que, method = "fuzzyId")
bsi
bsi <- barcoding.spe.identify(ref, que, method = "bpNewTraining")
bsi
bsi <- barcoding.spe.identify(ref, que, method = "Bayesian")
bsi
Species identification based on the fuzzy-set method (Zhang et al. 2012) and k-mers.
barcoding.spe.identify2(ref, que, kmer = kmer, optimization = TRUE)
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
que | object of class "DNAbin", whose identities (species names) need to be inferred. |
kmer | a numeric. With optimization = TRUE it is the maximum k-mer length, and lengths from 1 to kmer are tried; otherwise only this k-mer length is used. |
optimization | a logical value indicating whether k-mer lengths from 1 up to kmer are tried (TRUE) or only the specified length is used (FALSE). |
a list indicating the identified species.
read.dna() from package ape is used to obtain the DNAbin object for unaligned non-coding barcodes.
Ai-bing ZHANG, Cai-qing YANG, Meng-di HAO, CNU, Beijing, CHINA, contact at zhangab2008 (at) mail.cnu.edu.cn.
Zhang, A. B., M. D. Hao, C. Q. Yang, and Z. Y. Shi. (2017). BarcodingR: an integrated R package for species identification using DNA barcodes. Methods Ecol Evol. 8(5):627-634. https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12682.
Jin, Q., H.L. Han, X.M. Hu, X.H. Li, C.D. Zhu, S. Y. W. Ho, R. D. Ward, A.B. Zhang. (2013). Quantifying Species Diversity with a DNA Barcoding-Based Method: Tibetan Moth Species (Noctuidae) on the Qinghai-Tibetan Plateau. PLoS ONE 8: e64428. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0064428.
Zhang, A. B., C. Muster, H.B. Liang, C.D. Zhu, R. Crozier, P. Wan, J. Feng, R. D. Ward. (2012). A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding. Molecular Ecology, 21(8):1848-63. https://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2011.05235.x
Zhang, A. B., D. S. Sikes, C. Muster, S. Q. Li. (2008). Inferring Species Membership using DNA sequences with Back-propagation Neural Networks. Systematic Biology, 57(2):202-215. https://academic.oup.com/sysbio/article/57/2/202/1622290
data(pineMothITS2)
ref <- pineMothITS2
que <- ref
spe.id <- barcoding.spe.identify2(ref, que, kmer = 1, optimization = FALSE)
spe.id
Species identification using the BP-based method with k-mer statistics, for both protein-coding barcodes (for instance COI) and non-coding barcodes (such as ITS).
bbsik(ref, que, kmer = kmer, UseBuiltModel = FALSE, lr = 5e-05, maxit = 1e+06)
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
que | object of class "DNAbin", which needs to be inferred. |
kmer | a numeric indicating the length of kmer used. |
UseBuiltModel | logical value indicating whether a built model is used or not. |
lr | parameter for weight decay. Default 5e-5. |
maxit | maximum number of iterations. Default 1e+6. |
a list containing model parameters used, species identification success rates using references, query sequences, species inferred, and corresponding confidence levels (bp probability for BP-based method).
Ai-bing ZHANG, Meng-di HAO, Cai-qing YANG, CNU, Beijing, CHINA. zhangab2008 (at) mail.cnu.edu.cn
Zhang, A. B., D. S. Sikes, C. Muster, S. Q. Li. (2008). Inferring Species Membership using DNA sequences with Back-propagation Neural Networks. Systematic Biology, 57(2):202-215. https://academic.oup.com/sysbio/article/57/2/202/1622290
data(TibetanMoth)
ref <- as.DNAbin(as.character(TibetanMoth[1:50, ]))
que <- as.DNAbin(as.character(TibetanMoth[51:60, ]))
out <- bbsik(ref, que, kmer = 1, UseBuiltModel = FALSE)
out
out$convergence
out$success.rates.ref

data(pineMothITS2)
ref <- pineMothITS2
que <- pineMothITS2
out <- bbsik(ref, que, kmer = 1, UseBuiltModel = FALSE)
out
out$convergence
out$success.rates.ref
Conversion from a character vector to an integer vector.
char2NumVector(c)
c | character vector. |
an integer vector.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA. zhangab2008 (at) mail.cnu.edu.cn.
c <- c("a", "a", "b")
num <- char2NumVector(c)
num
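A conversion of this kind can be written in base R by going through a factor. This is a sketch of one plausible equivalent, not the package's implementation; whether char2NumVector() assigns codes in the same (sorted) order is an assumption.

```r
# Map each distinct string to an integer code via factor levels
# (levels are sorted, so "a" becomes 1 and "b" becomes 2).
chars <- c("a", "a", "b")
codes <- as.integer(factor(chars))
codes
# [1] 1 1 2
```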
Comparison between two delimitations of a group of samples, for instance a traditional morphological delimitation and a molecular delimitation (MOTU).
compare2delimitations(deli1, deli2)
deli1 | a character array (vector) containing one set of identifications to compare, for example morphological identifications (species names). |
deli2 | a character array (vector) containing the other set, for example a molecular delimitation (MOTU). |
a list containing the adjusted Rand index comparing the two partitions (a scalar), together with the numbers of matches, splits, and merges and their corresponding percentages. The index has an expected value of zero for random partitions and is bounded above by 1, which is reached when the two partitions agree perfectly.
This is for the same set of samples with two partitions/delimitations.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA.
L. Hubert and P. Arabie (1985) Comparing Partitions, Journal of Classification 2:193-218.
deli1 <- c(1, 1, 1, 1, 1, 1)
deli2 <- c(1, 1, 2, 1, 1, 3)
out <- compare2delimitations(deli1, deli2)
out
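The adjusted Rand index of Hubert and Arabie (1985) can be computed from the contingency table of the two partitions. The helper below, adjusted.rand(), is a hypothetical name introduced for illustration; it is a base R sketch of the standard formula and is not taken from the package source.

```r
# Adjusted Rand index for two labelings of the same samples.
adjusted.rand <- function(x, y) {
  tab <- table(x, y)                      # contingency table of the two partitions
  sum.ij <- sum(choose(tab, 2))           # pairs grouped together in both partitions
  sum.i <- sum(choose(rowSums(tab), 2))   # pairs grouped together in x
  sum.j <- sum(choose(colSums(tab), 2))   # pairs grouped together in y
  expected <- sum.i * sum.j / choose(length(x), 2)
  maximum <- (sum.i + sum.j) / 2
  if (maximum == expected) return(1)      # degenerate case: identical trivial partitions
  (sum.ij - expected) / (maximum - expected)
}

deli1 <- c(1, 1, 1, 1, 1, 1)
deli2 <- c(1, 1, 2, 1, 1, 3)
ari.split <- adjusted.rand(deli1, deli2)  # one group split into three
ari.same <- adjusted.rand(deli2, deli2)   # identical partitions
c(split = ari.split, same = ari.same)
```

With these inputs the first comparison gives 0 (no agreement beyond chance) and the second gives 1 (perfect agreement).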
Make consensus for identifications from two or more methods, usually for a set of query sequences.
consensus.identify(identifiedBy2orMore)
identifiedBy2orMore | an object of class "data.frame", containing (queIDs, as rownames), identifiedByMethod1, identifiedByMethod2, and so on. |
a data frame with consensus.identification, and corresponding votes.
Suitable for cases where a set of queries was identified by more than two methods.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA, contact at zhangab2008(at)mail.cnu.edu.cn
queIDs <- c("q1", "q2", "q3")
bp <- c("sp1", "sp1", "sp1")
bpk <- c("sp1", "sp1", "sp2")
bayes <- c("sp2", "sp1", "sp3")
fuzzyID <- c("sp1", "sp1", "sp2")
identifiedBy2orMore <- data.frame(bp = bp, bpk = bpk, bayes = bayes, fuzzyID = fuzzyID)
rownames(identifiedBy2orMore) <- queIDs
ccs <- consensus.identify(identifiedBy2orMore)
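A consensus of this kind can be pictured as a per-query majority vote. The base R sketch below uses the same toy identifications; the tie-breaking rule (first name in sorted order) and the output column names are assumptions and may differ from what consensus.identify() does.

```r
ids <- data.frame(
  bp      = c("sp1", "sp1", "sp1"),
  bpk     = c("sp1", "sp1", "sp2"),
  bayes   = c("sp2", "sp1", "sp3"),
  fuzzyID = c("sp1", "sp1", "sp2"),
  row.names = c("q1", "q2", "q3"),
  stringsAsFactors = FALSE
)
m <- as.matrix(ids)

# For each query (row), keep the most frequent name and its vote count.
# which.max() returns the first maximum, so ties go to the first name
# in sorted order.
consensus <- data.frame(
  consensus = apply(m, 1, function(r) names(which.max(table(r)))),
  votes     = apply(m, 1, function(r) max(table(r))),
  stringsAsFactors = FALSE
)
consensus
```

Here q1 is assigned sp1 with 3 votes, q2 sp1 with 4 votes, and q3 sp2 with 2 votes.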
Digitize an object of DNAbin.
digitize.DNA(seqs)
seqs | an object of DNAbin. |
a numeric matrix of DNA sequences digitized.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA.
zhangab2008(at)mail.cnu.edu.cn
data(TibetanMoth)
digitized.DNA <- digitize.DNA(seqs = TibetanMoth)
digitized.DNA
Calculation of kmer frequency matrices from DNAbin for both reference and query sequences.
DNAbin2kmerFreqMatrix(ref, que, kmer = kmer)
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
que | object of class "DNAbin", which needs to be inferred. |
kmer | a numeric indicating the length of kmer used. |
kmer frequency matrices for both ref and que sequences, based only on kmers found in ref; kmers that occur only in que are ignored.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA. zhangab2008(at)mail.cnu.edu.cn
data(TibetanMoth)
ref <- as.DNAbin(as.character(TibetanMoth[1:50, ]))
que <- as.DNAbin(as.character(TibetanMoth[51:60, ]))
out <- DNAbin2kmerFreqMatrix(ref, que, kmer = 3)
out
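The rule that only k-mers found in ref are counted can be illustrated on plain character strings. This base R sketch uses toy sequences and hypothetical helper names (kmers.of, freq); it does not operate on DNAbin objects and is not the package's implementation.

```r
# Split a sequence string into its overlapping k-mers.
kmers.of <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)
}

ref <- c(r1 = "acgtac", r2 = "acgaac")   # toy reference sequences
que <- c(q1 = "acgttt")                  # toy query; "tt" never occurs in ref
k <- 2

# Vocabulary: every k-mer seen in the reference sequences.
vocab <- sort(unique(unlist(lapply(ref, kmers.of, k = k))))

# Counting against fixed factor levels gives zeros for absent k-mers and
# drops k-mers that are not in the vocabulary.
freq <- function(seqs) {
  m <- t(sapply(seqs, function(s) {
    as.vector(table(factor(kmers.of(s, k), levels = vocab)))
  }))
  colnames(m) <- vocab
  m
}

ref.freq <- freq(ref)
que.freq <- freq(que)
que.freq
```

The query contains the 2-mer "tt" twice, but because "tt" is absent from the reference vocabulary it contributes no column and no counts.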
Calculates the fuzzy membership function (FMF) value given a distance from the query to a potential species, the maximal intraspecific variation of the potential species (theta1), and the minimal interspecific distance (theta2; here, the distance between the potential species and its nearest neighbor) (fuzzy-set based method, Zhang et al. 2012). Different definitions of distance could also be used.
FMF(xtheta12)
xtheta12 | a numeric vector containing three elements: the distance from the query to a potential species, the maximal (or sd of) intraspecific variation of the potential species (theta1), and the minimal (or mean) interspecific distance (theta2). |
a numeric between 0 and 1.
different definitions of distances could also be used.
Ai-bing ZHANG, Zhi-yong SHI. CNU, Beijing, CHINA, contact at zhangab2008(at)mail.cnu.edu.cn
Zhang, A. B., C. Muster, H.B. Liang, C.D. Zhu, R. Crozier, P. Wan, J. Feng, R. D. Ward.(2012). A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding. Molecular Ecology, 21(8):1848-63.
xtheta12 <- c(0.6289163, 0.1465522, 0.6379375)
FMF.out <- FMF(xtheta12)
FMF.out
Calculates the intraspecific variation (sd) of the potential species (theta1) and the mean interspecific distance (theta2; here, the mean distance between the potential species and its nearest neighbor) (fuzzy-set based method, slightly modified from Zhang et al. 2012). The calculation is done for all species in the reference dataset.
FMFtheta12(ref)
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
a data frame containing the intraspecific variation (sd, theta1) and interspecific variation (mean) of all species, and their corresponding nearest neighbors (NN).
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA, contact at zhangab2008 (at) mail.cnu.edu.cn.
Zhang, A. B., C. Muster, H.B. Liang, C.D. Zhu, R. Crozier, P. Wan, J. Feng, R. D. Ward.(2012). A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding. Molecular Ecology, 21(8):1848-63.
data(TibetanMoth)
ref <- as.DNAbin(as.character(TibetanMoth[1:50, ]))
FMF.theta12 <- FMFtheta12(ref)
FMF.theta12
Extract sequence names from different DNAbin objects, including those generated by fasta2DNAbin() (package adegenet) and read.dna() (package ape).
NAMES(seqs)
seqs | object of class "DNAbin", generated by fasta2DNAbin() (package adegenet) or read.dna() (package ape). |
a character string array/vector.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA.
zhangab2008(at)mail.cnu.edu.cn
data(TibetanMoth)
seqNames <- NAMES(TibetanMoth)
seqNames
Optimize the kmer length by trying kmers whose length ranges from 1 to max.kmer. The optimal kmer is the one with the maximal species identification success rate.
optimize.kmer(ref, max.kmer = max.kmer)
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
max.kmer | a numeric to indicate the length of maximal kmer. |
a numeric indicating the optimal kmer in the range examined.
Ai-bing ZHANG, Cai-qing YANG, Meng-di HAO, CNU, Beijing, CHINA.
zhangab2008 (at) mail.cnu.edu.cn / zhangab2008 (at) gmail.com
data(TibetanMoth)
ref <- TibetanMoth[1:10, ]
optimial.kmer <- optimize.kmer(ref, max.kmer = 5)
COI DNA barcodes of Pine Moth in China.
data("pineMothCOI")
The format is: 'DNAbin' raw [1:140, 1:652] a t a a ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:140] "A111,Lasiocampidae_Dendrolimus_punctatus" "A22,Lasiocampidae_Dendrolimus_punctatus" "A23,Lasiocampidae_Dendrolimus_punctatus" "A27,Lasiocampidae_Dendrolimus_punctatus" ... ..$ : NULL
COI DNA barcodes of Pine Moth (Lasiocampidae) in China.
http://dx.plos.org/10.1371/journal.pone.0064428.
Dai Q-Y, Gao Q, Wu C-S, Chesters D, Zhu C-D, and A.B. Zhang*. (2012) Phylogenetic Reconstruction and DNA Barcoding for Closely Related Pine Moth Species (Dendrolimus) in China with Multiple Gene Markers. PLoS ONE 7(4): e32544.
data(pineMothCOI)
pineMothCOI
ITS1 sequences of seven closely related pine moth species sampled throughout China.
data("pineMothITS1")
The format is: List of 69 $ A43,Lasiocampidae_ Dendrolimus_ punctatus: raw [1:698] 48 48 48 28 ... $ A54,Lasiocampidae_ Dendrolimus_ punctatus: raw [1:696] 48 18 28 18 ... - attr(*, "class")= chr "DNAbin"
ITS1 used as DNA barcodes for closely related pine moth species (Dendrolimus) in China.
http://dx.plos.org/10.1371/journal.pone.0064428.
Dai Q-Y, Gao Q, Wu C-S, Chesters D, Zhu C-D, and A.B. Zhang*. (2012) Phylogenetic Reconstruction and DNA Barcoding for Closely Related Pine Moth Species (Dendrolimus) in China with Multiple Gene Markers. PLoS ONE 7(4): e32544.
data(pineMothITS1)
pineMothITS1
ITS2 sequences of seven closely related pine moth species sampled in China.
data("pineMothITS2")
The format is: List of 97 $ A22,Lasiocampidae_ Dendrolimus_ punctatus: raw [1:568] 48 48 48 28 ... $ A23,Lasiocampidae_ Dendrolimus_ punctatus: raw [1:574] 48 18 28 18 ... $ A29,Lasiocampidae_ Dendrolimus_ punctatus: raw [1:569] 88 48 48 48 ... $ A52,Lasiocampidae_ Dendrolimus_ punctatus: raw [1:570] 48 18 28 18 ... - attr(*, "class")= chr "DNAbin"
ITS2 used as DNA barcodes for closely related pine moth species (Dendrolimus) in China.
http://dx.plos.org/10.1371/journal.pone.0064428.
Dai Q-Y, Gao Q, Wu C-S, Chesters D, Zhu C-D, and A.B. Zhang*. (2012) Phylogenetic Reconstruction and DNA Barcoding for Closely Related Pine Moth Species (Dendrolimus) in China with Multiple Gene Markers. PLoS ONE 7(4): e32544.
data(pineMothITS2)
pineMothITS2
Randomly sample reference data at different taxonomic levels.
sample.ref(ref, sample.porp = 0.5, sample.level = "full")
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information. |
sample.porp | a numeric value between 0 and 1, indicating the proportion of samples to draw at each taxonomic level. |
sample.level | a character string, one of "full", "family", "genus", or "species". |
a list containing the selected samples and the samples left, in DNAbin format stored in a matrix or a list.
ref must contain taxonomic information in a format like ">LS0909030M,Noctuidae_Himalaea_unica", i.e., "seqID,family_genus_species", or ">LS0909030M,Himalaea_unica". If there is only one sample/individual for a taxon level, this sample is retained in ref.selected.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA.
zhangab2008(at)mail.cnu.edu.cn
data(TibetanMoth)
data(pineMothITS2)
ref <- TibetanMoth
ref2 <- pineMothITS2
out <- sample.ref(ref, sample.porp = 0.5, sample.level = "full")
out
out2 <- sample.ref(ref2, sample.porp = 0.5, sample.level = "full")
out2
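The "seqID,family_genus_species" naming convention can be parsed with base R string functions, and a per-species draw then follows from split(). The sketch below works on a plain character vector of names; the sampling rule shown (round the proportion, keep singletons) is an assumption for illustration, not the package's exact procedure.

```r
# Sequence names in the "seqID,family_genus_species" format.
seq.names <- c("LS0909030M,Noctuidae_Himalaea_unica",
               "ML0829010M,Noctuidae_Auchmis_saga",
               "BM0830055M,Noctuidae_Auchmis_saga",
               "LZ0827026M,Noctuidae_Amphipyra_pyramidea")

taxon <- sub("^[^,]*,", "", seq.names)          # drop the sequence ID
parts <- strsplit(taxon, "_", fixed = TRUE)
# The last two fields are genus and species, with or without a family prefix.
genus <- vapply(parts, function(p) p[length(p) - 1], character(1))
species <- vapply(parts, function(p) {
  paste(p[(length(p) - 1):length(p)], collapse = "_")
}, character(1))

# Draw a proportion of the samples within each species; a species with a
# single sample keeps that sample.
set.seed(1)
prop <- 0.5
selected <- unlist(lapply(split(seq_along(species), species), function(idx) {
  if (length(idx) == 1) return(idx)
  idx[sample.int(length(idx), max(1, round(prop * length(idx))))]
}))
seq.names[sort(selected)]
```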
Output identification results to an outfile in the temporary directory (found by the tempdir() function).
save.ids(outfile = "identified.txt", ids)
outfile | character string to indicate outfile name. |
ids | object of class "BarcodingR", which contains identified taxon information. |
no value is returned; an output file is written.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA.
zhangab2008(at)mail.cnu.edu.cn
barcoding.spe.identify()
data(TibetanMoth)
ref <- as.DNAbin(as.character(TibetanMoth[1:50, ]))
que <- as.DNAbin(as.character(TibetanMoth[50:60, ]))
bsi <- barcoding.spe.identify(ref, que, method = "fuzzyId")
bsi
save.ids(outfile = "identified.txt", bsi)
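Since the outfile is written under the session's temporary directory, it can be located with file.path(tempdir(), ...). The sketch below uses a made-up table in place of real identification results and writes it with write.table(); the tab-separated layout is an assumption and may not match what save.ids() produces.

```r
# Toy identification table standing in for real results (values are made up).
ids <- data.frame(query = c("q1", "q2"),
                  species = c("Auchmis_saga", "Himalaea_unica"),
                  stringsAsFactors = FALSE)

# Assumption: the outfile name is resolved relative to tempdir().
outpath <- file.path(tempdir(), "identified.txt")
write.table(ids, file = outpath, sep = "\t", quote = FALSE, row.names = FALSE)

file.exists(outpath)   # check that the file was written
back <- read.delim(outpath, stringsAsFactors = FALSE)
back
```

Files under tempdir() are removed when the R session ends, so results that need to be kept should be copied elsewhere, for example with file.copy().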
Summarize taxon information, sequence statistics, and barcode numbers per species for a reference dataset.
summarize.ref(ref, taxonStat = TRUE, seqStat = TRUE, barcodeStat = TRUE)
ref | object of class "DNAbin" used as a reference dataset, which contains taxon information, or just an array containing taxon information only. |
taxonStat | logical value indicating whether the item is calculated; it is calculated only when ref is an object of class "DNAbin". |
seqStat | logical value indicating whether the item is calculated. |
barcodeStat | logical value indicating whether the item is calculated. |
a list containing taxon statistics, sequence statistics, population parameters, and barcoding statistics.
Ai-bing ZHANG, Meng-di HAO, CNU, Beijing, CHINA.
zhangab2008(at)mail.cnu.edu.cn / zhangab2008(at)gmail.com
data(TibetanMoth)
s.r <- summarize.ref(TibetanMoth, taxonStat = TRUE, seqStat = TRUE, barcodeStat = TRUE)
s.r
Calculates the TDR value for a set of queries and one potential species. The value lies in the range [0,1]: 0 indicates extremely weak species membership, and values close to 1 indicate strong species membership.
TDR2(oneSpe, que, boot, boot2)
oneSpe | object of class "DNAbin" which contains DNA sequences from one species. |
que | object of class "DNAbin" which contains DNA sequences from different samples. |
boot | a numeric value indicating the number of resamplings along sequence columns. |
boot2 | a numeric value indicating the number of resamplings along sequence rows (different samples). |
a numeric vector of TDR values, one for each query against the species.
oneSpe and que should have the same sequence length, i.e., they should be aligned beforehand. It is strongly recommended that oneSpe have a large enough sample size, e.g., 20.
Ai-bing ZHANG, PhD. CNU, Beijing, CHINA, contact at zhangab2008(at)mail.cnu.edu.cn
Jin Q, L.J. He, A.B. Zhang* (2012). A Simple 2D Non-Parametric Resampling Statistical Approach to Assess Confidence in Species Identification in DNA Barcoding-An Alternative to Likelihood and Bayesian Approaches. PLoS ONE 7(12): e50831. doi:10.1371/journal.pone.0050831. http://dx.plos.org/10.1371/journal.pone.0050831.
data(TibetanMoth)
sampleSpeNames <- NAMES(TibetanMoth)
Spp <- gsub(".+,", "", sampleSpeNames)
oneSpe <- TibetanMoth[grep("Macdunnoughia_crassisigna", Spp, value = FALSE, fixed = TRUE), ]
oneSpe <- as.DNAbin(as.character(oneSpe[1:5, ]))
que <- TibetanMoth[grep("Agrotis_justa", Spp, value = FALSE, fixed = TRUE), ]
que2 <- oneSpe[1:2, ]
out <- TDR2(oneSpe, que, boot = 10, boot2 = 10)
### true false identification
COI DNA barcodes of Tibetan Moth in China.
data("TibetanMoth")
The format is: 'DNAbin' raw [1:319, 1:630] a t a a ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:319] "LS0909030M,Noctuidae_Himalaea_unica" "LZ0827026M,Noctuidae_Amphipyra_pyramidea" "ML0829010M,Noctuidae_Auchmis_saga" "BM0830055M,Noctuidae_Auchmis_saga" ... ..$ : NULL
COI DNA barcodes of Tibetan Moth (Noctuidae) in China.
http://dx.plos.org/10.1371/journal.pone.0064428.
Q. Jin, H.L. Han, X.M. Hu, X.H. Li, C.D. Zhu, S. Y. W. Ho, R. D. Ward, A.B. Zhang*. (2013). Quantifying Species Diversity with a DNA Barcoding-Based Method: Tibetan Moth Species (Noctuidae) on the Qinghai-Tibetan Plateau. PLoS ONE 8: e64428.
data(TibetanMoth)
TibetanMoth