Modes: None, Tailwind, Headwind, False Promise

Task Description

The agent moves along a 10-cell corridor. The goal is to reach the trophy 🏆 and collect the reward while avoiding the bomb 💣. Switch modes to change the feedback along the path and observe how different environments affect learning.

🎯 Reward
🧭 State
🕹️ Actions
⚙️ Suggested Parameters

Beginner

  • bins = 10, ε = 0.3
  • α = 0.5, γ = 0.9
  • Recommended: None mode
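
To see what the beginner settings do, here is a minimal tabular Q-learning sketch in Python. It is not the demo's own code. The corridor layout (bomb in cell 0, trophy in cell 9, start in cell 4), the rewards of -1 and +1, and the left/right action set are all assumptions made for illustration, since this section does not state them. Only bins, ε, α and γ come from the list above.

```python
import random

N_CELLS = 10        # bins = 10: one state per corridor cell
EPSILON = 0.3       # exploration rate (epsilon)
ALPHA = 0.5         # learning rate (alpha)
GAMMA = 0.9         # discount factor (gamma)
ACTIONS = (-1, +1)  # assumed action set: step left, step right

# Assumed layout and rewards (not stated in this section).
BOMB, TROPHY, START = 0, N_CELLS - 1, 4


def step(state, action):
    """Apply an action index and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action], 0), N_CELLS - 1)
    if next_state == TROPHY:
        return next_state, 1.0, True
    if next_state == BOMB:
        return next_state, -1.0, True
    return next_state, 0.0, False  # "None" mode: no feedback along the path


def train(episodes=500, max_steps=200, seed=0):
    """Run epsilon-greedy Q-learning and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice(
                    [a for a, v in enumerate(q[state]) if v == best]
                )
            next_state, reward, done = step(state, action)
            # One-step Q-learning target; no bootstrapping from terminal cells.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q = train()
    arrows = "".join("<" if left > right else ">" for left, right in q[1:-1])
    print("greedy policy for cells 1-8:", arrows)
```

Lowering EPSILON makes the agent exploit earlier, and lowering ALPHA makes each update smaller. Rerunning train() with different values is one way to see both effects under these assumptions.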

Advanced

  • Switch Tailwind / Headwind modes
  • Observe how path feedback affects convergence
  • Compare "False Promise" vs normal goal
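
One way to experiment with path feedback offline is a small per-step reward function like the Python sketch below. This section does not define the modes, so the values here are guesses: Tailwind is assumed to reward steps toward the trophy, and Headwind to penalize them. The function name path_feedback, the 0.1 magnitude, and the trophy position are all hypothetical. False Promise is left unimplemented because the section does not say how it changes the goal.

```python
def path_feedback(mode, state, next_state, trophy=9, bonus=0.1):
    """Hypothetical feedback for one non-terminal move along the corridor.

    All values are illustrative assumptions, not the demo's actual rewards.
    """
    before = abs(trophy - state)
    after = abs(trophy - next_state)
    if mode == "None" or after == before:
        return 0.0  # no path feedback, or the agent did not move
    toward = after < before
    if mode == "Tailwind":
        return bonus if toward else -bonus  # assumed: progress is encouraged
    if mode == "Headwind":
        return -bonus if toward else bonus  # assumed: progress is discouraged
    # "False Promise" is not modeled: its definition is not given here.
    raise ValueError(f"no feedback rule assumed for mode {mode!r}")


if __name__ == "__main__":
    for mode in ("None", "Tailwind", "Headwind"):
        print(mode, path_feedback(mode, 4, 5), path_feedback(mode, 4, 3))
```

Adding this value to the reward of each non-terminal step in a Q-learning loop, then comparing how many episodes the greedy policy needs to settle, is one way to observe how path feedback affects convergence.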