TY - JOUR
T1 - Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward
AU - Chen, Lixing
AU - Xu, Jie
AU - Lu, Zhuo
N1 - Funding Information:
L. Chen and J. Xu’s work is supported in part by the Army Research Office under Grant W911NF-18-1-0343. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
Publisher Copyright:
© 2018 Curran Associates Inc.All rights reserved.
PY - 2018
Y1 - 2018
N2 - In this paper, we study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework that is tailored for volatile arms and submodular reward functions. CC-MAB inherits properties from both contextual bandit and combinatorial bandit: it aims to select a set of arms in each round based on the side information (a.k.a. context) associated with the arms. By “volatile arms”, we mean that the available arms to select from in each round may change; and by “submodular rewards”, we mean that the total reward achieved by selected arms is not a simple sum of individual rewards but demonstrates a feature of diminishing returns determined by the relations between selected arms (e.g. relevance and redundancy). Volatile arms and submodular rewards are often seen in many real-world applications, e.g. recommender systems and crowdsourcing, in which multi-armed bandit (MAB) based strategies are extensively applied. Although there exist works that investigate these issues separately based on standard MAB, jointly considering all these issues in a single MAB problem requires very different algorithm design and regret analysis. Our algorithm CC-MAB provides an online decision-making policy in a contextual and combinatorial bandit setting and effectively addresses the issues raised by volatile arms and submodular reward functions. The proposed algorithm is proved to achieve O(cT 2 3 α α + + D D log(T)) regret after a span of T rounds. The performance of CC-MAB is evaluated by experiments conducted on a real-world crowdsourcing dataset, and the result shows that our algorithm outperforms the prior art.
AB - In this paper, we study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework that is tailored for volatile arms and submodular reward functions. CC-MAB inherits properties from both contextual bandit and combinatorial bandit: it aims to select a set of arms in each round based on the side information (a.k.a. context) associated with the arms. By “volatile arms”, we mean that the available arms to select from in each round may change; and by “submodular rewards”, we mean that the total reward achieved by selected arms is not a simple sum of individual rewards but demonstrates a feature of diminishing returns determined by the relations between selected arms (e.g. relevance and redundancy). Volatile arms and submodular rewards are often seen in many real-world applications, e.g. recommender systems and crowdsourcing, in which multi-armed bandit (MAB) based strategies are extensively applied. Although there exist works that investigate these issues separately based on standard MAB, jointly considering all these issues in a single MAB problem requires very different algorithm design and regret analysis. Our algorithm CC-MAB provides an online decision-making policy in a contextual and combinatorial bandit setting and effectively addresses the issues raised by volatile arms and submodular reward functions. The proposed algorithm is proved to achieve O(cT 2 3 α α + + D D log(T)) regret after a span of T rounds. The performance of CC-MAB is evaluated by experiments conducted on a real-world crowdsourcing dataset, and the result shows that our algorithm outperforms the prior art.
UR - http://www.scopus.com/inward/record.url?scp=85064809214&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064809214&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85064809214
VL - 2018-December
SP - 3247
EP - 3256
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
SN - 1049-5258
T2 - 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
Y2 - 2 December 2018 through 8 December 2018
ER -