Chemical Industry and Engineering Progress, 2025, Vol. 44, Issue (S1): 29-37. DOI: 10.16085/j.issn.1000-6613.2025-1064

• Chemical processes and equipment •

Multi-flavor molecule prediction model based on pre-training and fine-tuning strategies

SONG Yingjie, ZHANG Lei, DU Jian

  1. College of Chemical Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2025-07-24 Revised: 2025-08-29 Online: 2025-11-24 Published: 2025-10-25
  • Contact: ZHANG Lei

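  • About the first author: SONG Yingjie (born 1999), male, master's student; research interest: process systems engineering. E-mail: songyj@mail.dlut.edu.com
  • Funding:
    National Natural Science Foundation of China (22278053); National Natural Science Foundation of China (22078041); Excellent Young Scientists Fund of the National Natural Science Foundation of China (22422801)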

Abstract:

Taste perception analysis plays a vital role in food science because taste directly affects food consumption, nutrition, and health. Traditional sensory evaluation of flavor is highly subjective and time-consuming, which makes it inadequate for the rapid screening and optimization of flavor molecules. To overcome this limitation, we integrated multiple open-source databases and constructed a standardized dataset of 17,633 taste-related molecules covering five categories: sweet, bitter, umami, sour, and other less common tastes (such as salty, spicy, astringent, and numbing). Based on this dataset, we developed a high-accuracy multi-flavor prediction model by combining the Uni-Mol2 pre-trained molecular model with fine-tuning strategies. We further applied integrated gradients and atomic contribution analysis to interpret how the model arrives at its predictions. The model predicted all five taste categories accurately, with an overall accuracy of 95.2%. These results validate the effectiveness of multi-database integration and fine-tuning strategies for taste molecule prediction and provide a new technical pathway for the rapid screening and functional analysis of flavor molecules.

Key words: flavor molecule prediction, pre-training, multi-classification, machine learning, fine-tuning strategies
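
The abstract names integrated gradients as the attribution method but gives no detail. As a rough illustration of the general technique only, the sketch below computes integrated gradients for a toy sigmoid scorer. The scorer, the weights, the three "atom feature" values, and the all-zero baseline are hypothetical stand-ins chosen for illustration; none of this is the authors' Uni-Mol2 pipeline.

```python
import math

def model(x, w, b):
    """Toy differentiable scorer: sigmoid of a weighted sum (a stand-in, not Uni-Mol2)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x, w, b):
    """Analytic gradient of the toy scorer with respect to each input feature."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients.

    Gradients are averaged along the straight path from baseline to x,
    then scaled by (x - baseline) feature by feature.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Hypothetical feature values and weights, chosen only for illustration.
weights = [1.5, -2.0, 0.5]
bias = 0.1
atom_features = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(atom_features, baseline, weights, bias)
# Completeness check: attributions should sum to model(x) - model(baseline).
gap = model(atom_features, weights, bias) - model(baseline, weights, bias)
print(attributions, sum(attributions), gap)
```

A useful sanity check on any integrated-gradients implementation is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline. Per-atom contributions of the kind the abstract mentions would presumably come from aggregating such attributions over each atom's input features, though the abstract does not specify how.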

