Deep sequencing of HBV pre-S region reveals high heterogeneity of HBV genotypes and word ... - Academic Discussion & HBV English


StephenW

Senior member

Cash: 62111 yuan
Featured posts: 26
Posts: 30437
Registered: 2009-10-5
Last login: 2022-12-28

Post #1

Posted 2018-2-25 14:48

PLoS Genet. 2018 Feb 23;14(2):e1007206. doi: 10.1371/journal.pgen.1007206. [Epub ahead of print]
Deep sequencing of HBV pre-S region reveals high heterogeneity of HBV genotypes and associations of word pattern frequencies with HCC.
Bai X1,2,3, Jia JA4,5, Fang M4, Chen S4, Liang X2,6, Zhu S6, Zhang S1,2,7, Feng J1,2,8, Sun F1,2,3, Gao C4.
Author information
1
Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China.
2
Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
3
Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America.
4
Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, China.
5
Department of Laboratory Medicine, the 105th Hospital of PLA, Hefei, China.
6
School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.
7
Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai, China.
8
Department of Computer Science, University of Warwick, Coventry, United Kingdom.

Abstract

Hepatitis B virus (HBV) infection is a common problem in the world, especially in China. More than 60-80% of hepatocellular carcinoma (HCC) cases can be attributed to HBV infection in high HBV prevalent regions. Although traditional Sanger sequencing has been extensively used to investigate HBV sequences, NGS is becoming more commonly used. Further, it is unknown whether word pattern frequencies of HBV reads by Next Generation Sequencing (NGS) can be used to investigate HBV genotypes and predict HCC status. In this study, we used NGS to sequence the pre-S region of the HBV sequence of 94 HCC patients and 45 chronic HBV (CHB) infected individuals. Word pattern frequencies among the sequence data of all individuals were calculated and compared using the Manhattan distance. The individuals were grouped using principal coordinate analysis (PCoA) and hierarchical clustering. Word pattern frequencies were also used to build prediction models for HCC status using both K-nearest neighbors (KNN) and support vector machine (SVM). We showed the extremely high power of analyzing HBV sequences using word patterns. Our key findings include that the first principal coordinate of the PCoA analysis was highly associated with the fraction of genotype B (or C) sequences and the second principal coordinate was significantly associated with the probability of having HCC. Hierarchical clustering first groups the individuals according to their major genotypes followed by their HCC status. Using cross-validation, high area under the receiver operational characteristic curve (AUC) of around 0.88 for KNN and 0.92 for SVM were obtained. In the independent data set of 46 HCC patients and 31 CHB individuals, a good AUC score of 0.77 was obtained using SVM. It was further shown that 3000 reads for each individual can yield stable prediction results for SVM. Thus, another key finding is that word patterns can be used to predict HCC status with high accuracy. Therefore, our study shows clearly that word pattern frequencies of HBV sequences contain much information about the composition of different HBV genotypes and the HCC status of an individual.
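To make the "word pattern frequencies compared using the Manhattan distance" step concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the word length `k`, the function names, and the toy reads are all assumptions chosen for illustration, and the abstract does not say which word length was used.

```python
from collections import Counter
from itertools import product


def word_frequencies(reads, k=3):
    """Relative frequency of every DNA word of length k, pooled over all reads.

    The returned list follows the fixed order AA..A, AA..C, ..., TT..T so that
    vectors from different individuals are directly comparable.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set("ACGT"):  # skip words containing N or other symbols
                counts[word] += 1
    total = sum(counts.values())
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[w] / total if total else 0.0 for w in words]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical toy reads standing in for two individuals' NGS data.
sample_a = ["ACGTACGT", "ACGTTTGA"]
sample_b = ["TTTTGGGG", "TTGGTTGG"]

freq_a = word_frequencies(sample_a, k=2)
freq_b = word_frequencies(sample_b, k=2)
print(manhattan(freq_a, freq_b))
```

Because each vector sums to 1, the distance between two individuals always lies between 0 (identical word usage) and 2 (no shared words).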

PMID: 29474353
DOI: 10.1371/journal.pgen.1007206
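The abstract reports cross-validated AUC values for KNN and SVM classifiers. As a rough illustration of how such a number can be produced, the sketch below runs a K-nearest-neighbors vote on Manhattan distance with leave-one-out cross-validation and then computes the AUC from the resulting scores. The cross-validation scheme, the value of `k`, the function names, and the toy frequency vectors are assumptions; the abstract does not specify them.

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def knn_score(train_x, train_y, query, k=3):
    """Fraction of the k nearest training vectors that carry label 1 (HCC)."""
    order = sorted(range(len(train_x)), key=lambda i: manhattan(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def leave_one_out_scores(x, y, k=3):
    """Score each individual using a model built from all the other individuals."""
    scores = []
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        scores.append(knn_score(rest_x, rest_y, x[i], k))
    return scores


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical two-word frequency vectors: label 1 = HCC, label 0 = CHB.
toy_x = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
         [0.2, 0.8], [0.1, 0.9], [0.15, 0.85]]
toy_y = [1, 1, 1, 0, 0, 0]

print(auc(toy_y, leave_one_out_scores(toy_x, toy_y, k=3)))
```

The toy classes are cleanly separated, so this example says nothing about the accuracy reported in the paper; it only shows the mechanics of scoring held-out individuals and ranking them.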


Post #2

Posted 2018-2-25 14:48

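PLoS Genet. 2018 Feb 23;14(2):e1007206. doi: 10.1371/journal.pgen.1007206. [Epub ahead of print]
Deep sequencing of HBV pre-S region reveals high heterogeneity of HBV genotypes and associations of word pattern frequencies with HCC.
Bai X1,2,3, Jia JA4,5, Fang M4, Chen S4, Liang X2,6, Zhu S6, Zhang S1,2,7, Feng J1,2,8, Sun F1,2,3, Gao C4.
Author information

1
Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China.
2
Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
3
Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America.
4
Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, China.
5
Department of Laboratory Medicine, the 105th Hospital of the Chinese PLA.
6
School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University.
7
Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai, China.
8
Department of Computer Science, University of Warwick, Coventry, United Kingdom.

Abstract

Hepatitis B virus (HBV) infection is a common problem worldwide, especially in China. In regions of high HBV prevalence, more than 60-80% of hepatocellular carcinoma (HCC) cases can be attributed to HBV infection. Although traditional Sanger sequencing has been widely used to study HBV sequences, NGS is becoming increasingly common. Moreover, it is not known whether the word pattern frequencies of HBV reads from Next Generation Sequencing (NGS) can be used to study HBV genotypes and predict HCC status. In this study, we used NGS to sequence the pre-S region of the HBV sequence in 94 HCC patients and 45 individuals with chronic HBV (CHB) infection. Word pattern frequencies in the sequence data of all individuals were calculated and compared using the Manhattan distance. Individuals were grouped using principal coordinate analysis (PCoA) and hierarchical clustering. Word pattern frequencies were also used to build prediction models of HCC status with K-nearest neighbors (KNN) and support vector machine (SVM). We showed that word patterns are an extremely powerful way to analyze HBV sequences. Our key findings include that the first principal coordinate of the PCoA was highly associated with the fraction of genotype B (or C) sequences, and the second principal coordinate was significantly associated with the probability of having HCC. Hierarchical clustering groups individuals first by their major genotype and then by their HCC status. Using cross-validation, a high area under the receiver operating characteristic curve (AUC) was obtained: about 0.88 for KNN and 0.92 for SVM. In an independent data set of 46 HCC patients and 31 CHB individuals, SVM achieved a good AUC of 0.77. It was further shown that 3000 reads per individual are enough to give stable SVM predictions. Thus, another key finding is that word patterns can be used to predict HCC status with high accuracy. Our study therefore shows clearly that the word pattern frequencies of HBV sequences contain a great deal of information about the composition of different HBV genotypes and about an individual's HCC status.

PMID: 29474353
DOI: 10.1371/journal.pgen.1007206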
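For readers unfamiliar with principal coordinate analysis (PCoA), the grouping step in the abstract starts from a matrix of pairwise distances between individuals and recovers coordinates that reproduce those distances as well as possible. The sketch below computes only the first principal coordinate, using Gower double-centering followed by power iteration. It is a simplified illustration under stated assumptions, not the authors' code: the function name, the iteration count, and the toy distance matrix are hypothetical, and a real analysis would use a full eigendecomposition from a numerical library.

```python
import math


def pcoa_first_axis(dist, iterations=500):
    """First principal coordinate from a symmetric distance matrix.

    Assumes the dominant eigenvalue of the centered matrix is positive,
    which is the usual case but is not guaranteed for non-Euclidean distances.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Gower double-centering: B = -1/2 * (D^2 - row means - column means + grand mean)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    # Power iteration for the dominant eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    # Scale the unit eigenvector by sqrt(eigenvalue) to obtain coordinates.
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]


# Hypothetical distances between three individuals lying on a line at 0, 1 and 3.
toy_dist = [[0, 1, 3],
            [1, 0, 2],
            [3, 2, 0]]
coords = pcoa_first_axis(toy_dist)
print(coords)
```

In the toy case the three points lie on a single line, so one coordinate is enough to reproduce every pairwise distance; with real word-frequency distances, further coordinates (such as the second one discussed in the abstract) carry additional structure.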


newchinabok