The official introduction on the old TCMSP homepage reads as follows; it applies equally to the new TCMSP:
TCMSP : Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform
TCMSP is a unique systems pharmacology platform of Chinese herbal medicines that captures the relationships between drugs, targets and diseases. The database includes chemicals, targets and drug-target networks, and associated drug-target-disease networks, as well as pharmacokinetic properties for natural compounds involving oral bioavailability, drug-likeness,intestinal epithelial permeability, blood-brain-barrier, aqueous solubility and etc. This breakthrough has sparked a new interest in the search of candidate drugs in various types of traditional Chinese herbs.
Please Cite: Jinlong Ru; Peng Li; Jinan Wang; Wei Zhou; Bohui Li; Chao Huang; Pidong Li; Zihu Guo; Weiyang Tao; Yinfeng Yang; Xue Xu; Yan Li; Yonghua Wang; Ling Yang. TCMSP: a database of systems pharmacology for drug discovery from herbal medicines. J Cheminformatics. 2014 Apr 16;6(1):13.
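From the information above, we learn that:
TCMSP stands for Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (中药系统药理学数据库与分析平台 in Chinese).
TCMSP captures the relationships between drugs, targets and diseases. The database covers chemicals, targets and drug-target networks, the associated drug-target-disease networks, and pharmacokinetic properties of natural compounds such as oral bioavailability, drug-likeness, intestinal epithelial permeability, blood-brain barrier penetration and aqueous solubility.
The paper to cite is Ru et al., "TCMSP: a database of systems pharmacology for drug discovery from herbal medicines", Journal of Cheminformatics, 2014;6(1):13. In short, everything TCMSP collects is data related to Chinese herbal medicines, and its main, commonly used functions are as follows: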
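As shown above, the "Please Cite" entry on the old homepage links to the published paper, so let's follow it and dig deeper into the database.
The paper was published in 2014 in the Journal of Cheminformatics; I have excerpted the data-related parts.
The data volume reported in the 2014 paper is roughly the same as what I observed on the old TCMSP site as of 24 October 2024.
No new herbs have been added. About five to six hundred compounds (ingredients), about thirty targets and about thirty diseases have been added.
The data in the new TCMSP is no different from the data in the old TCMSP.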
3.1. Data limitations (animal- and mineral-derived medicines are incomplete)
Description
It consists of all the 499 Chinese herbs registered in the Chinese pharmacopoeia with 29,384 ingredients, 3,311 targets and 837 associated diseases. Twelve important ADME-related properties like human oral bioavailability, half-life, drug-likeness, Caco-2 permeability, blood-brain barrier and Lipinski’s rule of five are provided for drug screening and evaluation.
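In other words, TCMSP covers all 499 Chinese herbs registered in the Chinese Pharmacopoeia, with 29,384 ingredients, 3,311 targets and 837 associated diseases, and provides 12 important ADME-related properties for drug screening and evaluation. What does this tell us? The paper counts only herbs, so the animal-derived medicines it includes are evidently not very complete; had they been complete, the paper would surely have listed those numbers too, haha.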
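As a rough illustration of how one of these ADME filters works, here is a minimal Python sketch of Lipinski's rule of five (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The `Compound` class, its field names and the sample values are hypothetical; they are not TCMSP's actual column names or data. Tolerating one violation is the common convention, not something the paper specifies.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record; field names are illustrative, not TCMSP's schema."""
    name: str
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five limits the compound exceeds."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(c: Compound, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return lipinski_violations(c) <= max_violations


# Made-up example values, for illustration only.
small = Compound("example compound A", 302.2, 1.5, 5, 7)
large = Compound("example compound B", 822.9, 6.2, 8, 16)

for c in (small, large):
    print(c.name, lipinski_violations(c), passes_rule_of_five(c))
```

The same pattern extends to the other properties (oral bioavailability, drug-likeness and so on) by adding thresholds of your own choosing.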
3.2. Comparison with other TCM-related databases (strengths and weaknesses)
Background
Presently, several databases have provided useful tools in different aspects for TCM investigations. For example, TCM-ID [12] and TCM Database@Taiwan [13] provide the largest number of herbal ingredients with 3D structures and functional properties. Chem-TCM [14] and HIT [15] focus on herbal compounds and their corresponding targets. TCMID [16] comprises TCM formulae, herbs, ingredients and the targets and diseases. CVDHD [17] collects those natural products related to cardiovascular diseases and targets. Comparisons among these databases are listed on the TCMSP website.
In order to gather all available information about ingredients of herbal medicines, we performed an extensive literature search for each herbal medicine. Structure files of molecules were downloaded from PubChem [18] Compound database, ChEMBL [19] and ChemSpider [20], or produced by ISIS Draw 2.5 (MDL Information Systems, Inc.) and further optimized by Sybyl 6.9 (Tripos, Inc.) with Sybyl force field and default parameters [2, 21]. Different format types of the chemical files were converted to SDF format by Open Babel [22]. The duplicates were removed according to InChIKey.
In other words, the compound structure files come from PubChem, ChEMBL and ChemSpider, or were drawn by hand in a structure editor (ISIS Draw). They were then optimized with Sybyl, converted to SDF format with Open Babel, and deduplicated by InChIKey.
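The deduplication step can be sketched in a few lines of Python. This is only a minimal illustration of "remove duplicates according to InChIKey", not the authors' actual pipeline: the record layout, the `inchikey` field name and the placeholder keys below are all assumptions made for the example.

```python
def dedupe_by_inchikey(records):
    """Keep the first record seen for each InChIKey, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        # Normalize so that stray whitespace or case does not hide a duplicate.
        key = rec["inchikey"].strip().upper()
        if key in seen:
            continue
        seen.add(key)
        unique.append(rec)
    return unique


# Placeholder keys in the InChIKey layout (14-10-1 characters), not real compounds.
records = [
    {"source": "PubChem",    "inchikey": "AAAAAAAAAAAAAA-BBBBBBBBBB-N"},
    {"source": "ChEMBL",     "inchikey": "aaaaaaaaaaaaaa-bbbbbbbbbb-n "},
    {"source": "ChemSpider", "inchikey": "CCCCCCCCCCCCCC-DDDDDDDDDD-N"},
]

unique = dedupe_by_inchikey(records)
print([r["source"] for r in unique])
```

Because the same molecule downloaded from two sources yields the same InChIKey, keying on it merges the entries regardless of which database they came from.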
Drug targeting and disease association
Target information was obtained from DrugBank database [26]. Drug-Target mappings were obtained from two sources. Experimental validated drug-target pairs were retrieved from HIT database [15]. For those compounds without validated targets, the SysDT model constructed in our previous work [27] was used to predict the potential targets of a compound. SysDT shows impressive performance of prediction for drug-target interactions, with a concordance of 82.83%, a sensitivity of 81.33%, and a specificity of 93.62%, respectively. The disease information was obtained from TTD database [28] and PharmGKB (https://www.pharmgkb.org/).
In other words, target information comes from DrugBank. The targets associated with a given compound come either from the HIT database (experimentally validated pairs) or from SysDT model predictions. Disease information comes from TTD and PharmGKB.
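To make the three reported figures concrete, here is a small Python sketch of how such metrics are computed from a confusion matrix. I am assuming that "concordance" means overall accuracy, (TP + TN) / total; the paper excerpt does not define it. The counts in the example are made up and are not SysDT's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute concordance (assumed: overall accuracy), sensitivity, specificity."""
    positives = tp + fn   # pairs that truly interact
    negatives = tn + fp   # pairs that truly do not interact
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "concordance": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # fraction of true interactions recovered
        "specificity": tn / negatives,   # fraction of non-interactions rejected
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A high specificity paired with a lower sensitivity, as reported for SysDT, means the model rarely predicts an interaction that is not there but misses some real ones.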
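4. Downloading the whole database from the official site
Because TCMSP does not offer a direct export or download, we can obtain the whole database with a web scraper, by paying for a TCMSP membership, or by some other method.
Here I introduce the Chrome extension Instant Data Scraper, which can extract the tables contained in a page as Excel or CSV files. It was recommended by a friend of mine (CSDN account: BlastOrange) and is well suited to people with no coding background or no programming environment on their computer. For details on using the extension, you can message him privately after December 2024.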
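For readers who do have a Python environment, the scraper route can be sketched with the standard library alone. The example below only shows the core idea of turning an HTML table into CSV. It assumes the page contains a plain `<table>` in its HTML, which may not hold for TCMSP if the table is rendered by JavaScript. The sample HTML, column names and IDs are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(parser.rows)
    return buf.getvalue()


# Invented sample page; not real TCMSP markup or data.
SAMPLE_HTML = """
<table>
  <tr><th>Mol ID</th><th>Molecule name</th><th>OB (%)</th></tr>
  <tr><td>MOL_X1</td><td>example compound A</td><td>45.2</td></tr>
  <tr><td>MOL_X2</td><td>example compound B</td><td>12.7</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE_HTML))
```

In practice you would fetch each herb or compound page, pass its HTML to `html_table_to_csv`, and append the output to a file, which is roughly what the browser extension does for you with a click.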
Alternatively, you can visit https://old.tcmsp-e.com/load_intro.php?id=31 directly, which shows the page below, although none of the files there can actually be downloaded...