Construction and verification of trimethylation modification of lysine 4 on histone H3 related long non-coding RNA prognostic model for gastric cancer
Authors: Hu Zhen1, Qi Yuzhong1, Zhao Shaoji2, Wang Guangxi1, Sun Kaiyu2, Wu Wenhui1
Affiliations: 1. Gastroenterology Center, the Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, Guangdong, China; 2. Department of Gastrointestinal Surgery, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, Guangdong, China
Abstract:
Objective To characterize long non-coding RNAs (lncRNAs) associated with trimethylation of lysine 4 on histone H3 (H3K4me3) in gastric cancer (GC), construct a related prognostic model, and predict the efficacy of immunotherapy for GC.
Methods Transcriptome sequencing data and the corresponding clinical information of GC patients were downloaded from The Cancer Genome Atlas (TCGA) database. H3K4me3-related lncRNAs were identified by constructing a co-expression network of H3K4me3-related regulator genes and lncRNAs. The 370 GC patient samples from TCGA that met the screening criteria (entire set) were randomly divided 1:1 into a training set (n=185) and a validation set (n=185). An H3K4me3-related lncRNA prognostic risk score model was then constructed using univariate Cox regression and LASSO regression analysis and validated internally. Kaplan-Meier survival analysis and receiver operating characteristic (ROC) curves were used to verify the predictive performance of the model. Univariate and multivariate Cox regression analyses were used to evaluate the prognostic value of the risk score and other clinical indicators. A nomogram combining risk score, age, and tumor TNM stage was constructed to predict the overall survival rate of GC patients, and its predictive accuracy was assessed with ROC curves and calibration plots. Heterogeneous cluster subgroups were identified by consensus clustering and used to predict the efficacy of immunotherapy.
Results Based on the co-expression network, 14 H3K4me3-related lncRNAs with prognostic value were identified, and a related risk model and evaluation system were constructed. Using the median risk score obtained from the training-set model as the cutoff, GC patients in the training set, validation set, and entire set were divided into high-risk and low-risk groups. Kaplan-Meier survival curves showed that low-risk patients had better overall survival than high-risk patients in all three sets (P<0.05). The area under the curve (AUC) of the model for predicting 1-year, 3-year, and 5-year overall survival was 0.708, 0.730, and 0.770 in the training set; 0.690, 0.648, and 0.713 in the validation set; and 0.697, 0.670, and 0.724 in the entire set. Multivariate Cox regression analysis showed that the risk score based on H3K4me3-related lncRNAs was an independent prognostic factor for GC patients (P<0.001). The AUC of the nomogram for predicting 1-year, 2-year, and 3-year overall survival was 0.727, 0.780, and 0.717, respectively, and its calibration curves were close to the ideal curves. Consensus clustering further identified two H3K4me3-lncRNA subgroups with heterogeneous immune characteristics: subgroup Ⅰ had higher levels of immune cell infiltration and stronger immune response potential and showed higher sensitivity to drugs such as 5-fluorouracil and oxaliplatin, whereas subgroup Ⅱ may be more sensitive to phosphatidylinositol 3-kinase specific inhibitors.
Conclusion This study constructed an H3K4me3-lncRNA risk score model to predict the prognosis of patients with GC and revealed its heterogeneous microscopic characteristics and its potential value in predicting the efficacy of immunotherapy.
Key Words: Gastric cancer; Trimethylation of lysine 4 on histone H3; Histone modification; Long non-coding RNA; Immunotherapy
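The 1:1 random split and the median-based risk grouping described in the abstract can be illustrated with a minimal Python sketch. It assumes the risk score takes the linear form commonly used with LASSO-Cox models (the sum of coefficient × expression over the model lncRNAs); the abstract does not state the formula. The lncRNA names, the coefficients, the synthetic expression values, and the helper names `risk_score`, `split_half`, and `stratify` are all hypothetical, and treating scores strictly above the median as high risk is likewise an assumption.

```python
import random
from statistics import median

# Hypothetical coefficients for illustration only; the actual model uses 14
# lncRNAs whose names and weights are not given in the abstract.
COEFFICIENTS = {"lncRNA_A": 0.5, "lncRNA_B": -0.25, "lncRNA_C": 0.125}


def risk_score(expression, coefficients):
    """Assumed linear risk score: sum of coefficient * expression per lncRNA."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def split_half(sample_ids, seed=0):
    """Shuffle the sample IDs and split them 1:1 into two lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def stratify(scores, cutoff):
    """Label a sample 'high' if its score is strictly above the cutoff, else 'low'."""
    return {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}


# Synthetic cohort of 370 samples with random expression values (not real data).
rng = random.Random(42)
cohort = {
    f"sample_{i:03d}": {gene: rng.gauss(0.0, 1.0) for gene in COEFFICIENTS}
    for i in range(370)
}
scores = {sid: risk_score(expr, COEFFICIENTS) for sid, expr in cohort.items()}

# The cutoff comes from the training set only and is reused for every set.
train_ids, valid_ids = split_half(cohort, seed=1)
cutoff = median(scores[sid] for sid in train_ids)
groups = stratify(scores, cutoff)

n_high_train = sum(groups[sid] == "high" for sid in train_ids)
print(len(train_ids), len(valid_ids), n_high_train)
```

Deriving the cutoff from the training set alone and reusing it unchanged for the validation and entire sets keeps the validation data from influencing the grouping threshold, which matches the procedure the abstract describes.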
Note: Published online first