Chemical Industry and Engineering Progress ›› 2025, Vol. 44 ›› Issue (10): 5563-5569.DOI: 10.16085/j.issn.1000-6613.2024-1289

• Chemical processes and equipment •

An enhanced deep reinforcement learning algorithm for industrial process control

ZHANG Jiaxin, DONG Lichun

  1. School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
  • Received: 2024-08-07  Revised: 2024-09-29  Online: 2025-11-10  Published: 2025-10-25
  • Contact: DONG Lichun

  • About the first author: ZHANG Jiaxin (1995—), male, Ph.D. candidate; research interest: process systems engineering. E-mail: zhangjx@cqu.edu.cn
  • Supported by: General Program of the National Natural Science Foundation of China (22108019)

Abstract:

Deep reinforcement learning (DRL) algorithms have recently attracted considerable attention in the field of industrial process control because they can learn optimal control policies purely through agent-environment interactions, without relying on historical data or prior knowledge. Among the various DRL models, the twin delayed deep deterministic policy gradient (TD3) model can effectively address the Q-value overestimation problem of the deep deterministic policy gradient (DDPG) model, which tends to produce suboptimal policies and poor robustness, establishing itself as a leading DRL model for industrial process control. However, the original TD3-based controller still shows limitations in industrial process control, exhibiting considerable policy fluctuations; in particular, its Q-value underestimation may result in suboptimal control policies. Accordingly, this study introduced an enhanced TD3 (ETD3) model to improve the performance of TD3 in practical industrial process control. In the ETD3 model, an evaluation criterion was first presented to assess whether the actor network parameters were overestimated or underestimated, and the loss function fed to the critic network was then adjusted according to the assessment results. Subsequently, the fixed learning rate of the original TD3 model was replaced with a triangular decay cycle learning rate, which enhances the model's training convergence and control performance. Finally, the effectiveness of the ETD3 model was verified by the performance of the ETD3 controller in a natural gas dehydration process under different disturbances.
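
To make the mechanisms mentioned above concrete, the sketch below (Python) illustrates, under standard assumptions, the clipped double-Q target with which TD3 counters DDPG's Q-value overestimation (the same minimum operator is also the source of the underestimation bias that ETD3 addresses) and a generic triangular decay cycle learning-rate schedule whose triangular amplitude is halved every cycle. Function names, hyperparameter values, and the exact decay rule are illustrative assumptions rather than the paper's implementation; ETD3's criterion for detecting over- or underestimation and its adjusted critic loss are described only in the full text and are not reproduced here.

    import math

    def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
        # Clipped double-Q target of standard TD3: taking the minimum of the two
        # critic estimates counters DDPG's Q-value overestimation, but can
        # introduce the underestimation bias that motivates ETD3.
        q_min = min(q1_next, q2_next)
        return reward + gamma * (1.0 - done) * q_min

    def triangular_decay_lr(step, base_lr=1e-4, max_lr=1e-3, half_cycle=2000):
        # Assumed form of a triangular decay cycle schedule: the learning rate
        # oscillates linearly between base_lr and max_lr, and the oscillation
        # amplitude is halved after every completed cycle.
        cycle = math.floor(1 + step / (2 * half_cycle))
        x = abs(step / half_cycle - 2 * cycle + 1)
        amplitude = (max_lr - base_lr) / (2 ** (cycle - 1))
        return base_lr + amplitude * max(0.0, 1.0 - x)

    # Example: the rate starts at base_lr, peaks at max_lr in the first cycle,
    # and every later peak is half the height of the previous one.
    for step in (0, 2000, 4000, 6000, 8000):
        print(step, round(triangular_decay_lr(step), 6))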

Key words: process control, deep reinforcement learning, twin delayed deep deterministic policy gradient (TD3) model, triangular decay cycle


