Improving patient clustering by incorporating structured label relationships in similarity measures

medRxiv (Cold Spring Harbor Laboratory) (2023)

Abstract
Context: Patient stratification is a cornerstone of numerous health studies, serving to improve the estimation of medicine efficacy and to facilitate patient matching. To stratify patients, similarity measures between patients can be computed from medical health record databases, such as medico-administrative databases. Importantly, the variables included in medico-administrative databases can be associated with labels, which can be organized in ontologies or other classification systems. However, to the best of our knowledge, the relevance of considering such label classifications when computing patient similarity measures has been poorly studied.

Objective: We propose and evaluate several weighted versions of the Cosine similarity that consider structured label relationships to compute patient similarities from a medico-administrative database.

Material and Methods: As a use case, we analyze medicine reimbursements contained in the Échantillon Généraliste des Bénéficiaires, a French medico-administrative database. We compute the standard Cosine similarity between patients based on their medicine reimbursements. In addition, we compute a weighted Cosine similarity measure that includes variable frequencies and two weighted Cosine similarity measures that consider label relationships. We construct patient networks from each similarity measure and identify clusters of patients. We evaluate the performance of the different similarity measures with enrichment tests using information on chronic diseases.

Results: The similarity measures that include label relationships perform better at identifying similar patients. Indeed, using these weighted measures, we identify distinct patient clusters with a higher number of chronic disease enrichments than with the other measures. Importantly, the enrichment tests provide clinically interpretable insights into these patient clusters.
Conclusion: Considering label relationships when computing patient similarities improves the stratification of patients with respect to their health status.

### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

This work was supported by the Inserm cross-cutting program Genomic variability 2018 GOLD.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

We confirm that this study has been declared to INSERM (Institut National de la Santé et de la Recherche Médicale, https://www.inserm.fr/). The data from this study are extracted from the EGB (Echantillon Généraliste de Bénéficiaires), a permanent 1/97 representative sample of the National Health Data System (Système National de Données de Santé, SNDS). The information provided to individuals in the EGB on the possible reuse of their data and the procedures for exercising their rights comply with the legislative and regulatory provisions applicable to the processing of personal data in the SNDS. According to French regulation, individuals in the SNDS database are informed of the reuse of their data for research and can oppose this reuse, as defined by Articles 92 to 95 of Decree No. 2005-1309 of 20 October 2005 (https://www.legifrance.gouv.fr/loda/article_lc/LEGIARTI000037300884/). As required by French regulation, EGB data can be reused for research projects by authorized persons once the research project is declared to their institution (INSERM).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data used for analysis is confidential and cannot be shared.
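The abstract does not give the formulas behind the weighted Cosine similarity measures, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical hierarchical coding scheme in which a prefix of a medicine code denotes its parent class, and a smoothed inverse-frequency weight for the frequency-based variant. The function names (`weighted_cosine`, `frequency_weights`, `expand_with_ancestors`), the toy codes, and the prefix levels are all illustrative assumptions.

```python
import math
from collections import Counter


def weighted_cosine(x, y, weights):
    """Weighted Cosine similarity between two sparse vectors (dicts).

    sim(x, y) = sum_i w_i x_i y_i / (sqrt(sum_i w_i x_i^2) * sqrt(sum_i w_i y_i^2))
    Features missing from `weights` get weight 1, so an empty dict
    gives the standard Cosine similarity.
    """
    dot = sum(weights.get(k, 1.0) * v * y[k] for k, v in x.items() if k in y)
    norm_x = math.sqrt(sum(weights.get(k, 1.0) * v * v for k, v in x.items()))
    norm_y = math.sqrt(sum(weights.get(k, 1.0) * v * v for k, v in y.items()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def frequency_weights(patients):
    """Assumed frequency weighting: smoothed inverse patient frequency.

    Features seen in fewer patients receive larger weights.
    """
    n = len(patients)
    patient_freq = Counter(k for vec in patients.values() for k in vec)
    return {k: math.log((1 + n) / (1 + c)) + 1.0 for k, c in patient_freq.items()}


def expand_with_ancestors(codes, levels=(1, 2, 3)):
    """Add parent-class features to a patient vector.

    Assumption: the code hierarchy is encoded by prefixes, so the first
    `level` characters of a code name one of its ancestor classes.
    """
    expanded = Counter()
    for code, count in codes.items():
        # A set avoids counting the same prefix twice for short codes.
        for prefix in {code[:level] for level in levels}:
            expanded[prefix] += count
    return dict(expanded)


# Toy data: reimbursement counts per hypothetical medicine code.
patients = {
    "p1": {"AAX": 1},
    "p2": {"AAY": 1},  # different medicine, same parent class "AA" as p1
    "p3": {"BCZ": 1},  # unrelated class
}
expanded = {pid: expand_with_ancestors(vec) for pid, vec in patients.items()}
weights = frequency_weights(expanded)

print(weighted_cosine(patients["p1"], patients["p2"], {}))
print(weighted_cosine(expanded["p1"], expanded["p2"], {}))
print(weighted_cosine(expanded["p1"], expanded["p2"], weights))
print(weighted_cosine(expanded["p1"], expanded["p3"], weights))
```

In this toy setting, the standard Cosine similarity treats `p1` and `p2` as unrelated because they share no exact code, whereas the expanded vectors let them overlap on their common parent classes. This illustrates how label relationships can change which patients count as similar.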
Keywords
structured label relationships