A Novel Two-Stage Dynamic Pricing Model for Logistics Planning Using an Exploration-Exploitation Framework: A Multi-Armed Bandit Problem

Mahmoud Tajik, Babak Mohamadpour Tosarkani, Ahmad Makui, Rouzbeh Ghousi

Expert Systems with Applications: An International Journal (2024)

Abstract

Dynamic pricing is used to set pricing strategies in sectors such as the food industry, airlines, and transportation services. It aims to maximize revenue by learning the price-demand relationship. Price-change testing must be completed within a short time frame, since rapid price changes can negatively impact the market. This creates two conflicting objectives, maximizing revenue and learning the demand function, a tension known as the exploration-exploitation trade-off. In this study, we propose an exploration-exploitation framework, focused on the domestic delivery service of Tipax Express Pars Company (TEPC), to determine the optimal selling price. In the exploration stage, an initial demand function is estimated from previous sales data using weighted linear regression. The selling price and delivery lead-time parameters are then incorporated into the demand function hypotheses, and the optimal demand function parameters are determined using the Z-Price Changes Algorithm (ZPCA). In the exploitation stage, price optimization is formulated as a Multi-armed Bandit (MAB) problem and solved using (i) the Upper Confidence Bound Method (UCBM), a deterministic approach, and (ii) the Thompson Sampling Method (TSM), a stochastic approach. The framework is evaluated over 361 days, of which 31 days are exploration and 330 days are exploitation. The study investigates 10,000 demand functions, which increases the complexity of the model; a cross-validation approach based on Mean Squared Error (MSE) is developed to reduce this complexity. A simulation study of 500 runs demonstrates that TSM is more useful than UCBM because its solution is more robust. Furthermore, the results demonstrate that, by using a market-share-gaining strategy, TEPC can achieve the maximum total expected revenue, the maximum total number of selections during the exploitation stage, and the maximum actual demand.
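
To make the exploitation stage concrete, the following is a minimal sketch of how candidate prices can be treated as bandit arms and selected with either an upper-confidence-bound rule or Thompson sampling. It is not the paper's implementation: the Bernoulli purchase model, the textbook UCB1 index, the Beta(1, 1) prior, and all names and numbers (`simulate`, `prices`, `buy_prob`) are assumptions chosen for illustration.

```python
import math
import random


def simulate(policy, prices, buy_prob, horizon, seed=0):
    """Run one simulated pricing bandit; return (total_revenue, pulls per arm).

    Assumption (not from the paper): each day one price is offered and a
    sale occurs with the hypothetical probability buy_prob[arm].
    """
    rng = random.Random(seed)
    k = len(prices)
    max_price = max(prices)
    pulls = [0] * k          # how often each price was offered
    sales = [0] * k          # how often each price led to a sale
    reward_sum = [0.0] * k   # revenue per arm, scaled to [0, 1] for UCB1
    total = 0.0

    for t in range(1, horizon + 1):
        if policy == "ucb":
            # Deterministic rule: try every arm once, then pick the arm with
            # the highest mean reward plus exploration bonus (UCB1 index).
            untried = [i for i in range(k) if pulls[i] == 0]
            if untried:
                arm = untried[0]
            else:
                arm = max(
                    range(k),
                    key=lambda i: reward_sum[i] / pulls[i]
                    + math.sqrt(2.0 * math.log(t) / pulls[i]),
                )
        elif policy == "thompson":
            # Stochastic rule: sample a purchase probability from each arm's
            # Beta posterior and pick the arm with the highest sampled revenue.
            arm = max(
                range(k),
                key=lambda i: prices[i]
                * rng.betavariate(1 + sales[i], 1 + pulls[i] - sales[i]),
            )
        else:
            raise ValueError(f"unknown policy: {policy}")

        sold = rng.random() < buy_prob[arm]
        revenue = prices[arm] if sold else 0.0
        pulls[arm] += 1
        sales[arm] += int(sold)
        reward_sum[arm] += revenue / max_price
        total += revenue

    return total, pulls


if __name__ == "__main__":
    # Hypothetical prices and purchase probabilities, for illustration only.
    prices = [10.0, 20.0]
    buy_prob = [0.9, 0.05]
    for policy in ("ucb", "thompson"):
        revenue, pulls = simulate(policy, prices, buy_prob, horizon=2000)
        print(policy, round(revenue, 1), pulls)
```

In this sketch the UCB rule is deterministic given the observed history, while Thompson sampling randomizes its choice through posterior draws, which mirrors the deterministic/stochastic distinction drawn in the abstract. Repeating `simulate` over many seeds and comparing the spread of total revenue is one way to examine the kind of robustness claim the abstract makes.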

Keywords

Dynamic pricing, Multi-armed Bandit (MAB), Upper Confidence Bound Method (UCBM), Thompson Sampling Method (TSM), Logistics planning
