
[KDD'23] Off-Policy Evaluation of Ranking Policies under Diverse User Behaviors

Haruka Kiyohara

July 16, 2023

Transcript

  1. Off-Policy Evaluation of Ranking Policies
    under Diverse User Behaviors
    Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita,
    Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito
    Haruka Kiyohara
    https://sites.google.com/view/harukakiyohara

  2. Real-world ranking decision making
    Examples of recommending a ranking of items
    • Search Engine
    • Music Streaming
    • E-commerce
    • News
    • and more!
    Can we evaluate the value of
    these rankings offline in advance?


  3. How does a ranking system work?
    [Diagram: a coming user provides a context; the system returns a ranking with 𝑲 items; the user's clicks yield reward(s)]

  4. How does a ranking system work?
    [Diagram: the same interaction loop, now with a ranking policy choosing the ranking; this is the policy we want to evaluate]

  5. Evaluating with the policy value
    We evaluate a ranking policy with its expected ranking-wise reward.

  6. Evaluating with the policy value
    We evaluate a ranking policy with its expected ranking-wise reward.
    position-wise policy value
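    As a sketch of the formula shown on the slide (notation assumed here, with position weights 𝛼ₖ in the style of [Kiyohara+,22]):
      V(\pi) := \mathbb{E}_{p(x)\,\pi(\mathbf{a}\mid x)\,p(\mathbf{r}\mid x,\mathbf{a})}\Bigl[\textstyle\sum_{k=1}^{K} \alpha_k\, r(k)\Bigr],
    where 𝐚 is a ranking of 𝑲 items, r(k) is the reward observed at position k, and 𝛼ₖ is a position weight; the position-wise policy value is the summand for a single k.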

  7. Off-policy evaluation (OPE)
    [Diagram: the same interaction loop, where the logged clicks and rewards are collected by a logging policy 𝜋₀]

  9. Off-policy evaluation (OPE)
    [Diagram: logged data is collected by the logging policy 𝜋₀, while we want to evaluate a different evaluation policy 𝜋]

  10. Off-policy evaluation (OPE)
    [Diagram: an OPE estimator uses logged data collected by the logging policy 𝜋₀ to estimate the value of the evaluation policy 𝜋]
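    In symbols (a sketch; this notation is assumed rather than shown on the slide): given logged data \mathcal{D} = \{(x_i, \mathbf{a}_i, \mathbf{r}_i)\}_{i=1}^{n} collected by \pi_0, an OPE estimator \hat{V} should make \hat{V}(\pi; \mathcal{D}) close to V(\pi), and its accuracy is commonly measured by the mean squared error
      \mathrm{MSE}(\hat{V}) := \mathbb{E}_{\mathcal{D}}\bigl[(V(\pi) - \hat{V}(\pi; \mathcal{D}))^2\bigr] = \mathrm{Bias}(\hat{V})^2 + \mathbb{V}(\hat{V}),
    the bias-variance decomposition that drives the rest of the talk.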

  11. De-facto approach: Inverse Propensity Scoring [Strehl+,10]
    importance weight
    • unbiased
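    A sketch of the estimator behind these annotations (the slide shows it as an image; notation as above):
      \hat{V}_{\mathrm{IPS}}(\pi; \mathcal{D}) := \frac{1}{n}\sum_{i=1}^{n} \frac{\pi(\mathbf{a}_i \mid x_i)}{\pi_0(\mathbf{a}_i \mid x_i)} \sum_{k=1}^{K} \alpha_k\, r_i(k).
    The ratio \pi / \pi_0 is the ranking-wise importance weight; it makes the estimator unbiased whenever \pi_0 assigns positive probability to every ranking \pi can present.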

  12. De-facto approach: Inverse Propensity Scoring [Strehl+,10]
    importance weight
    • unbiased
    [Figure: the evaluation and logging policies put different probabilities (more / less) on actions A and B; the importance weight re-weights the logged data to bridge this gap]

  13. De-facto approach: Inverse Propensity Scoring [Strehl+,10]
    importance weight
    • unbiased
    [Figure: re-weighting the logged action distribution to match the evaluation policy, correcting the distribution shift]

  14. De-facto approach: Inverse Propensity Scoring [Strehl+,10]
    importance weight
    • unbiased
    • variance

  15. De-facto approach: Inverse Propensity Scoring [Strehl+,10]
    importance weight
    • unbiased
    • variance
    When 𝜋₀ is the uniform random policy, the importance weight becomes 𝜋(𝒂|𝑥) multiplied by the number of possible rankings.

  16. De-facto approach: Inverse Propensity Scoring [Strehl+,10]
    importance weight
    • unbiased
    • variance!!
    When 𝜋₀ is the uniform random policy, the variance therefore scales with the number of possible rankings, which is combinatorially large in 𝑲.
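    Concretely (a sketch, assuming 𝜋₀ draws uniformly among all orderings of K distinct items from a candidate set A):
      w(x, \mathbf{a}) = \frac{\pi(\mathbf{a} \mid x)}{\pi_0(\mathbf{a} \mid x)} = \frac{|A|!}{(|A| - K)!}\, \pi(\mathbf{a} \mid x),
    so the weight scales with the number of possible rankings and the variance explodes combinatorially in K.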

  17. User behavior assumptions for variance reduction
    We assume that each user is affected only by a subset of the actions in the ranking.
    • Independent IPS [Li+,18]

  18. User behavior assumptions for variance reduction
    We assume that each user is affected only by a subset of the actions in the ranking.
    • Independent IPS [Li+,18]
    • Reward Interaction IPS [McInerney+,20]
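    A sketch of the two weights (a(k) is the action at position k and a(1:k) the top-k prefix; the slide shows these as images):
      w_k^{\mathrm{IIPS}}(x, \mathbf{a}) := \frac{\pi(a(k) \mid x)}{\pi_0(a(k) \mid x)}   % independence: position k depends only on its own action
      w_k^{\mathrm{RIPS}}(x, \mathbf{a}) := \frac{\pi(a(1{:}k) \mid x)}{\pi_0(a(1{:}k) \mid x)}   % cascade: position k depends on the actions ranked above it
    Each estimator averages \sum_k w_k \alpha_k r(k) over the log; smaller weights mean lower variance, at the risk of bias when the assumed behavior is wrong.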

  19. Introducing an assumption, but is this enough?
    [Figure: a bias-variance spectrum of estimators and their assumed click models.
    IIPS (independent) has the strongest assumption: high bias, low variance;
    RIPS (cascade) sits in between;
    IPS (standard, i.e., no independence assumption) is unbiased but high variance.
    IIPS: [Li+,18], RIPS: [McInerney+,20], IPS: [Precup+,00]]
    There is a bias-variance tradeoff depending on a single user behavior assumption.

  20. User behavior can change with the user context
    query: "clothes" (general)
    -> users only browse the top results
    query: "T-shirts" (specific)
    -> users click after browsing more items
    [Figure: search-result screenshots for the two queries]
    User behavior can change with the search query, the user's browsing history, etc.

  21. Introducing an assumption, but is this enough?
    [Figure: the same bias-variance spectrum as before (IIPS / RIPS / IPS);
    IIPS: [Li+,18], RIPS: [McInerney+,20], IPS: [Precup+,00]]
    Why not use adaptive user behavior assumptions for each user?
    The bias-variance tradeoff depends on the user behavior assumption.

  22. Adaptive IPS for diverse users

  23. Our idea: Adapting to user behavior
    [Figure: our idea, contrasted with applying a single, universal assumption to every user]

  24. Our idea: Adapting to user behavior
    [Figure: a single assumption mismatches some users' (true) behavior]

  25. Our idea: Adapting to user behavior
    [Figure: when the assumed behavior misses parts of a user's (true) behavior, the mismatch causes bias]

  26. Our idea: Adapting to user behavior
    [Figure: when the assumed behavior includes interactions the user does not have, the mismatch causes excessive variance]

  27. Our idea: Adapting to user behavior
    [Figure: our idea is adaptive: choosing the assumption per user reduces the mismatch]

  28. Our proposal: Adaptive IPS
    Statistical benefits
    • Unbiased under any given user behavior model.
    • Minimum variance among IPS-based unbiased estimators.
    (The importance weight covers only the relevant actions.)
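    A sketch of the estimator (notation adapted from the paper; \Phi_k(c) denotes the set of positions whose actions affect the reward at position k under behavior model c):
      \hat{V}_{\mathrm{AIPS}}(\pi; \mathcal{D}) := \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} \frac{\pi(\mathbf{a}_i(\Phi_k(c_i)) \mid x_i)}{\pi_0(\mathbf{a}_i(\Phi_k(c_i)) \mid x_i)}\, \alpha_k\, r_i(k).
    IPS, RIPS, and IIPS are recovered as the special cases \Phi_k(c) = \{1, \dots, K\}, \{1, \dots, k\}, and \{k\}, respectively.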

  29. How much variance is reduced by AIPS?
    AIPS removes the variance contributed by irrelevant actions.
    [Figure: only the relevant actions enter the importance weight; the irrelevant actions are excluded]
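    In symbols (a sketch that follows from the minimum-variance claim on the previous slide):
      \mathbb{V}\bigl(\hat{V}_{\mathrm{AIPS}}\bigr) \le \mathbb{V}\bigl(\hat{V}_{\mathrm{IPS}}\bigr),
    with the gap coming from the importance weights over the irrelevant actions that AIPS excludes.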

  30. What will the bias be when 𝑐 is unavailable?
    In practice, the user behavior 𝑐 is often unobservable, so we consider an estimate 𝑐̂ instead.
    (The overlap between 𝑐̂ and 𝑐 matters.)

  32. What will the bias be when 𝑐 is unavailable?
    In practice, the user behavior 𝑐 is often unobservable, so we consider an estimate 𝑐̂ instead.
    [Figure: the overlap between 𝑐̂ and the true 𝑐 matters; large overlap gives small bias, small overlap gives large bias, and this mismatch is the source of bias]

  33. Controlling the bias-variance tradeoff
    [Figure: a spectrum of candidate behavior models over the action set]

  34. Controlling the bias-variance tradeoff
    weaker assumption (e.g., no assumption)
    = unbiased (or less biased), but has large variance
    [Figure: a spectrum of candidate behavior models over the action set]

  35. Controlling the bias-variance tradeoff
    stronger assumption (e.g., independence)
    = more biased, but has lower variance
    weaker assumption (e.g., no assumption)
    = unbiased (or less biased), but has large variance
    [Figure: a spectrum of candidate behavior models over the action set]

  36. Controlling the bias-variance tradeoff
    stronger assumption (e.g., independence)
    = more biased, but has lower variance
    weaker assumption (e.g., no assumption)
    = unbiased (or less biased), but has large variance
    Why not optimize 𝒄̂, instead of using the true 𝒄,
    for a better bias-variance tradeoff?

  37. How to estimate/optimize the user behavior model?
    Based on the bias-variance analysis, we optimize the user behavior model to minimize the estimated MSE.
    MSE estimation: [Su+,20] [Udagawa+,23]

  38. How to estimate/optimize the user behavior model?
    Based on the bias-variance analysis, we optimize the user behavior model to minimize the estimated MSE.
    [Figure: the context space is partitioned, and a behavior model is chosen per region to minimize the MSE]
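    The following Python sketch illustrates only the selection principle and is not the paper's algorithm: the helper functions and the toy log-normal weights are invented for illustration, while the real method partitions the context space (in the spirit of [Athey&Imbens,16]) and uses the MSE estimation techniques of [Su+,20] and [Udagawa+,23].

      import numpy as np

      def ips_estimate(weights, rewards):
          # IPS-style point estimate: (1/n) * sum_i w_i * r_i
          return float(np.mean(weights * rewards))

      def estimated_mse(weights, rewards, unbiased_reference):
          # variance part: sample variance of the per-sample contributions over n
          contrib = weights * rewards
          variance = contrib.var(ddof=1) / len(contrib)
          # bias part (crude plug-in): squared gap to an unbiased reference estimate
          bias_sq = (ips_estimate(weights, rewards) - unbiased_reference) ** 2
          return bias_sq + variance

      def select_behavior_model(candidate_weights, rewards):
          # the weakest assumption (ranking-wise IPS) serves as the unbiased reference
          reference = ips_estimate(candidate_weights["ips"], rewards)
          scores = {name: estimated_mse(w, rewards, reference)
                    for name, w in candidate_weights.items()}
          return min(scores, key=scores.get)

      # toy usage with synthetic per-sample importance weights for three assumptions
      rng = np.random.default_rng(0)
      rewards = rng.binomial(1, 0.3, size=1000).astype(float)
      candidate_weights = {
          "ips": rng.lognormal(0.0, 1.5, size=1000),   # ranking-wise weight: high variance
          "rips": rng.lognormal(0.0, 0.8, size=1000),  # cascade weight: moderate variance
          "iips": rng.lognormal(0.0, 0.3, size=1000),  # position-wise weight: low variance
      }
      print(select_behavior_model(candidate_weights, rewards))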

  44. Experiments

  45. Experiment on diverse user behaviors
    [Figure: three synthetic settings for the user behavior distribution, from simple to diverse to complex, defined by which interactions from relevant actions are present]

  46. AIPS works well across various user behaviors
    AIPS is accurate even under diverse and complex user behaviors!
    [Figure: estimation performance (lower is better) across the simple, diverse, and complex user behavior distributions]

  47. AIPS also works well across various configurations
    AIPS effectively balances the bias-variance tradeoff.
    [Figure: results across varying slate sizes and data sizes]

  48. Real-world experiment
    We conduct an experiment with data from an e-commerce platform.
    AIPS is the best estimator in more than 75% of trials and improves the worst-case performance.

  49. Summary
    • Effectively controlling the bias-variance tradeoff is the key to OPE of ranking policies.
    • However, existing estimators apply a single user behavior assumption, incurring both
    excessive bias and excessive variance in the presence of diverse user behaviors.
    • In response, we propose Adaptive IPS (AIPS), which switches the importance weight
    depending on the user context to minimize the estimation error.
    AIPS enables accurate OPE under diverse user behaviors!

  50. Thank you for listening!
    contact: [email protected]

  51. References

  52. References (1/2)
    [Saito+,21] Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita.
    “Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy
    Evaluation.” NeurIPS Datasets and Benchmarks, 2021. https://arxiv.org/abs/2008.07146
    [Li+,18] Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa
    Vinay, and Zheng Wen. “Offline Evaluation of Ranking Policies with Click Models.”
    KDD, 2018. https://arxiv.org/abs/1804.10488
    [McInerney+,20] James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra,
    and Ben Carterette. “Counterfactual Evaluation of Slate Recommendations with
    Sequential Reward Interactions.” KDD, 2020. https://arxiv.org/abs/2007.12986
    [Strehl+,10] Alex Strehl, John Langford, Sham Kakade, and Lihong Li. “Learning from
    Logged Implicit Exploration Data.” NeurIPS, 2010. https://arxiv.org/abs/1003.0120
    [Athey&Imbens,16] Susan Athey and Guido Imbens. “Recursive Partitioning for
    Heterogeneous Causal Effects.” PNAS, 2016. https://arxiv.org/abs/1504.01132

  53. References (2/2)
    [Kiyohara+,22] Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita,
    Nobuyuki Shimizu, and Yasuo Yamamoto. “Doubly Robust Off-Policy Evaluation for
    Ranking Policies under the Cascade Behavior Model.” WSDM, 2022.
    https://arxiv.org/abs/2202.01562
    [Su+,20] Yi Su, Pavithra Srinath, and Akshay Krishnamurthy. “Adaptive Estimator
    Selection for Off-Policy Evaluation.” ICML, 2020. https://arxiv.org/abs/2002.07729
    [Udagawa+,23] Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, and
    Kei Tateno. “Policy-Adaptive Estimator Selection for Off-Policy Evaluation.” AAAI, 2023.
    https://arxiv.org/abs/2211.13904
    [Precup+,00] Doina Precup, Richard S. Sutton, and Satinder Singh. “Eligibility Traces
    for Off-Policy Policy Evaluation.” ICML, 2000.