2026
-
PROMISE: Proof Automation as Structural Imitation of Human Reasoning (Preprint)
-
[C29] ACPO: Agent-Chained Policy Optimization for Multi-Agent Reinforcement Learning
-
[C28] Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
2025
-
Group-Invariant Unsupervised Skill Discovery: Symmetry-aware Skill Representations for Generalizable Behavior (Preprint)
-
Semi-gradient DICE for Offline Constrained Reinforcement Learning (Preprint)
-
[C27] FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning
-
[C26] SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
2024
-
[C23] Mitigating Covariate Shift in Behavioral Cloning via Robust Stationary Distribution Correction
-
[C22] ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making
-
[C25] Body Transformer: Leveraging Robot Embodiment for Policy Learning
-
[C24] Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies (Spotlight)
2023
-
[C21] AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
-
[C20] SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations
-
[C19] Tempo Adaptation in Non-stationary Reinforcement Learning
2022
-
[C15] LobsDICE: Offline Imitation Learning from Observation via Stationary Distribution Correction Estimation
-
[C14] Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions
-
[C18] COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
-
[C17] DemoDICE: Offline Imitation Learning with Supplementary Imperfect Demonstrations
-
[C16] GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems
2021
-
[C12,W5] OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
-
[C11] Representation Balancing Offline Model-based Reinforcement Learning
-
[C13] Monte-Carlo Planning and Learning with Language Action Value Estimates
2020
-
[C7] Reinforcement Learning for Control with Multiple Frequencies
-
[C10] Batch Reinforcement Learning with Hyperparameter Gradients
-
[C8] Monte-Carlo Tree Search in Continuous Action Spaces with Value Gradients
-
[C9,W4] Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues
2019
-
[C5] Trust Region Sequential Variational Inference
-
[C6] PyOpenDial: A Python-based Domain-Independent Toolkit for Developing Spoken Dialogue Systems with Probabilistic Rules
2018
-
[C4] Monte-Carlo Tree Search for Constrained POMDPs
-
[W3] Monte-Carlo Tree Search for Constrained MDPs
-
[J1] Layered Behavior Modeling via Combining Descriptive and Prescriptive Approaches: a Case Study of Infantry Company Engagement
2017
-
[C3,W2] Constrained Bayesian Reinforcement Learning via Approximate Linear Programming
-
[C2] Hierarchically-partitioned Gaussian Process Approximation
2016
-
[W1] Multi-View Automatic Lip-Reading using Neural Network
-
[C1] Bayesian Reinforcement Learning with Behavioral Feedback