Task Description
The agent moves along a 10-cell corridor. The goal is to reach the trophy 🏆 and collect the reward while avoiding the bomb 💣. Switch modes to change the feedback along the path and observe how different environments affect learning.
🎯 Reward
- Reach 🏆: +10, episode ends
- Hit 💣: -10, episode ends
- Tailwind mode: small positive feedback along the path
- Headwind mode: small negative feedback along the path
- False Promise mode: the goal is shown with a different icon and gives only +2
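The reward rules above can be sketched as a small function. This is only an illustration: the event labels, the mode keys, and the ±0.1 path feedback are assumptions (the text says only "small"), as is the choice to end the episode at the goal in False Promise mode and to give no path feedback there.

```python
from typing import Tuple

# Hypothetical magnitudes: the description only says "small" feedback.
PATH_FEEDBACK = {"none": 0.0, "tailwind": 0.1, "headwind": -0.1, "false_promise": 0.0}

def reward(event: str, mode: str = "none") -> Tuple[float, bool]:
    """Return (reward, episode_done) for one step.

    event is "goal", "bomb", or "path" (labels invented for this sketch).
    """
    if event == "goal":
        # False Promise pays +2 at the goal instead of +10.
        # Assumption: the episode still ends there.
        return (2.0 if mode == "false_promise" else 10.0), True
    if event == "bomb":
        return -10.0, True
    # Any other cell: mode-dependent path feedback, episode continues.
    return PATH_FEEDBACK[mode], False
```

Keeping the terminal rewards separate from the path feedback makes it easy to change one mode without touching the others.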
State
playerPos: current cell index (0 to 9)
🕹️ Actions
0: none (stay)
1: right (move one cell right)
2: left (move one cell left)
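A minimal sketch of how these action codes could map onto playerPos. The function name and the clamping at both ends of the corridor are assumptions; the description does not say what happens when the agent tries to move past cell 0 or cell 9.

```python
N_CELLS = 10  # corridor cells 0..9, from the state description

# Action codes as listed above.
STAY, RIGHT, LEFT = 0, 1, 2

def next_pos(player_pos: int, action: int) -> int:
    """Apply one action and return the new cell index."""
    delta = {STAY: 0, RIGHT: 1, LEFT: -1}[action]
    # Assumption: moves past either end are clamped to the corridor.
    return max(0, min(N_CELLS - 1, player_pos + delta))
```

For example, `next_pos(4, RIGHT)` returns 5, and `next_pos(9, RIGHT)` stays at 9 under the clamping assumption.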
⚙️ Suggested Parameters
Beginner
- bins = 10, ε = 0.3
- α = 0.5, γ = 0.9
- Recommended: None mode
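The symbols ε, α, and γ match the usual tabular Q-learning setup with ε-greedy exploration, although the description does not name the algorithm. Under that assumption, and reading bins = 10 as one table row per cell, the Beginner settings could be used like this (function names are invented for the sketch):

```python
import random

BINS, N_ACTIONS = 10, 3                  # one table row per cell, three actions
EPSILON, ALPHA, GAMMA = 0.3, 0.5, 0.9    # Beginner values from the text

def choose_action(q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rng.random() < EPSILON:
        return rng.randrange(N_ACTIONS)
    row = q[state]
    return row.index(max(row))           # ties go to the lowest action index

def q_update(q, state, action, reward, next_state, done):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target."""
    target = reward if done else reward + GAMMA * max(q[next_state])
    q[state][action] += ALPHA * (target - q[state][action])

q = [[0.0] * N_ACTIONS for _ in range(BINS)]
# Illustrative transitions, assuming the trophy sits at cell 9.
q_update(q, 8, 1, 10.0, 9, True)    # terminal step into the goal
q_update(q, 7, 1, 0.0, 8, False)    # one cell earlier, no path feedback
print(q[8][1], q[7][1])
```

With α = 0.5 the first update moves Q(8, right) halfway from 0 to 10, giving 5.0; the second propagates the discounted value one cell back, giving 0.5 × 0.9 × 5.0 = 2.25.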
Advanced
- Switch between Tailwind and Headwind modes
- Observe how path feedback affects convergence
- Compare the "False Promise" goal with the normal goal
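One way to run these comparisons offline is a small training harness that repeats the same tabular Q-learning run under each mode. Everything about the layout here is an assumption made for the sketch: the start cell (4), the bomb at cell 0, the trophy at cell 9, the ±0.1 path feedback, the episode and step limits, and random tie-breaking between equal Q-values.

```python
import random

# Layout and feedback sizes are assumptions; the description does not give them.
N_CELLS, START, BOMB, GOAL = 10, 4, 0, 9
PATH = {"none": 0.0, "tailwind": 0.1, "headwind": -0.1, "false_promise": 0.0}
EPSILON, ALPHA, GAMMA = 0.3, 0.5, 0.9   # Beginner settings from the text
MOVES = (0, 1, -1)                      # action 0 stay, 1 right, 2 left

def train(mode, episodes=300, max_steps=100, seed=0):
    """Train a tabular Q-learning agent in one mode and return its Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * 3 for _ in range(N_CELLS)]
    for _episode in range(episodes):
        pos = START
        for _step in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.randrange(3)
            else:
                best = max(q[pos])
                action = rng.choice([a for a in range(3) if q[pos][a] == best])
            nxt = max(0, min(N_CELLS - 1, pos + MOVES[action]))
            if nxt == GOAL:
                r, done = (2.0 if mode == "false_promise" else 10.0), True
            elif nxt == BOMB:
                r, done = -10.0, True
            else:
                r, done = PATH[mode], False
            target = r if done else r + GAMMA * max(q[nxt])
            q[pos][action] += ALPHA * (target - q[pos][action])
            if done:
                break
            pos = nxt
    return q

tables = {mode: train(mode) for mode in PATH}
for mode, q in tables.items():
    # Estimated value of the best action at the start cell under each mode.
    print(mode, round(max(q[START]), 3))
```

Comparing the printed start-cell values, or the episode count needed before the greedy policy heads right, gives a rough picture of how path feedback and the reduced +2 goal change what the agent learns. The exact numbers depend on the assumed layout and seed.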