Search results: 1-10 of 10 records found for "Statistics Bandits". Query time: 0.093 seconds.
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling Contextual Bandits Linear Payoffs
2012/11/23
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several s...
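A minimal sketch of the Thompson Sampling idea for linear payoffs, assuming Gaussian rewards and a Gaussian prior over the weight vector; the function names, the scale parameter v, and the environment interface (contexts, reward_fn) are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def linear_thompson_sampling(contexts, reward_fn, T, d, v=1.0):
    """Thompson Sampling with a Gaussian posterior over a linear payoff model."""
    B = np.eye(d)          # posterior precision matrix (prior: identity)
    f = np.zeros(d)        # running sum of reward-weighted features
    for t in range(T):
        X = contexts(t)                                  # (K, d) per-arm features
        mu = np.linalg.solve(B, f)                       # posterior mean of weights
        cov = v ** 2 * np.linalg.inv(B)                  # scaled posterior covariance
        theta = np.random.multivariate_normal(mu, cov)   # sample from the posterior
        arm = int(np.argmax(X @ theta))                  # act greedily on the sample
        r = reward_fn(t, arm)                            # observe the payoff
        B += np.outer(X[arm], X[arm])                    # rank-one precision update
        f += r * X[arm]
    return np.linalg.solve(B, f)                         # final posterior mean
```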
We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ step...
We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker ...
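In this setting, where only the chosen action's reward is observed, a standard baseline is an exponential-weights scheme such as EXP3. The sketch below assumes rewards in [0, 1] and an illustrative mixing rate gamma; it is a generic reference point, not the specific algorithm of the record above:

```python
import numpy as np

def exp3(reward_fn, K, T, gamma=0.1):
    """EXP3: exponential weights with importance-weighted reward estimates."""
    weights = np.ones(K)
    for t in range(T):
        # Mix the exponential-weights distribution with uniform exploration.
        probs = (1 - gamma) * weights / weights.sum() + gamma / K
        arm = np.random.choice(K, p=probs)
        r = reward_fn(t, arm)          # only the chosen action's reward is revealed
        est = r / probs[arm]           # unbiased importance-weighted estimate
        weights[arm] *= np.exp(gamma * est / K)
    return weights / weights.sum()
```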
Efficient Optimal Learning for Contextual Bandits
Efficient Optimal Learning Contextual Bandits
2011/7/6
We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken.
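As a concrete (if naive) instance of this feature/action/reward loop, here is an epsilon-greedy contextual bandit that maintains one ridge-regression reward model per action. It is a sketch under those assumptions, not the oracle-efficient method the paper develops:

```python
import numpy as np

def epsilon_greedy_contextual(get_context, reward_fn, K, T, d, eps=0.1, lam=1.0):
    """Epsilon-greedy with one ridge-regression reward model per action."""
    A = [lam * np.eye(d) for _ in range(K)]   # per-action Gram matrices
    b = [np.zeros(d) for _ in range(K)]       # per-action feature-reward sums
    for t in range(T):
        x = get_context(t)                    # observed feature vector
        if np.random.rand() < eps:
            arm = np.random.randint(K)        # explore uniformly at random
        else:
            preds = [x @ np.linalg.solve(A[k], b[k]) for k in range(K)]
            arm = int(np.argmax(preds))       # exploit the current models
        r = reward_fn(t, arm)
        A[arm] += np.outer(x, x)              # update the chosen action's model
        b[arm] += r * x
```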
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
Finite-Time Multi-armed Bandits Problems Kullback-Leibler Divergences
2011/6/20
We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic r...
Lipschitz Bandits without the Lipschitz Constant
Lipschitz Bandits Constant strategy environments
2011/6/20
We consider the setting of stochastic bandit problems with a continuum of arms. We first point out that the strategies considered so far in the literature only provided theoretical guarantees of the...
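For reference, the standard grid-based baseline that such strategies refine is: discretize the arm space [0, 1] into K points and run UCB1 on the grid. The sketch below assumes rewards in [0, 1] and a pre-chosen K, which is precisely the kind of tuning knowledge this line of work tries to remove:

```python
import numpy as np

def ucb_on_grid(reward_fn, T, K):
    """UCB1 over a uniform K-point discretization of the arm space [0, 1]."""
    grid = np.linspace(0.0, 1.0, K)
    counts = np.zeros(K)
    means = np.zeros(K)
    for t in range(T):
        if t < K:
            arm = t                            # pull each grid arm once
        else:
            ucb = means + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))          # optimistic choice on the grid
        r = reward_fn(grid[arm])
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return grid[int(np.argmax(means))]         # best grid arm found
```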
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
PAC-Bayesian Analysis Martingales Multiarmed Bandits
2011/6/21
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of...
The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond
Stochastic Bandits Beyond KL-UCB
2011/3/21
This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded rew...
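For Bernoulli rewards, the KL-UCB index of an arm with empirical mean p_hat and n pulls is the largest q with n * kl(p_hat, q) <= log(t). A bisection sketch of that index computation (using the plain log(t) exploration term, the simplest choice; the paper's analysis covers refinements):

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t, iters=30):
    """Largest q in [p_hat, 1] with n * kl(p_hat, q) <= log(t), by bisection."""
    lo, hi = p_hat, 1.0
    budget = np.log(max(t, 2)) / n
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(p_hat, mid) <= budget:
            lo = mid                          # constraint satisfied: move up
        else:
            hi = mid
    return lo
```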
Nonparametric Bandits with Covariates
Bandit regression regret inferior sampling rate minimax rate
2010/3/11
We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is...
We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect to a dissimi...