1D Maze

Task Description

The agent moves along a 10-cell corridor. The goal is to reach the trophy 🏆 and collect the reward while avoiding the bomb 💣. Switch modes to change the feedback the agent receives along the path, and observe how each reward setup affects learning.

🎯 Reward

Reach 🏆: +10, episode ends
Hit 💣: -10, episode ends
Tailwind mode: 🍬 small positive feedback along the path
Headwind mode: 🔥 small negative feedback along the path
False Promise mode: the goal becomes 🍕 and gives only +2
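
The reward rules above can be sketched as a single function. This is only an illustration, not the demo's actual code: the trophy and bomb cells, the size of the "small" path feedback (0.1), the mode names as strings, and the choice to end the episode on 🍕 are all assumptions, since the description does not specify them.

```python
from typing import Tuple

# Hypothetical values: the task description does not give these.
TROPHY_POS = 9       # assumed goal cell
BOMB_POS = 0         # assumed bomb cell
PATH_FEEDBACK = 0.1  # assumed size of the "small" per-step feedback


def reward(pos: int, mode: str = "none") -> Tuple[float, bool]:
    """Return (reward, done) for landing on cell `pos` under `mode`."""
    if pos == TROPHY_POS:
        # False Promise replaces the trophy with a pizza worth only +2.
        # Ending the episode here as well is an assumption.
        return (2.0 if mode == "false_promise" else 10.0), True
    if pos == BOMB_POS:
        return -10.0, True
    if mode == "tailwind":
        return PATH_FEEDBACK, False   # small positive feedback on the path
    if mode == "headwind":
        return -PATH_FEEDBACK, False  # small negative feedback on the path
    return 0.0, False                 # no path feedback
```

Keeping the mode as a plain argument makes it easy to train the same agent under each setting and compare the results.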

🧭 State

playerPos: current cell index (0 to 9)

🕹️ Actions

0: none (stay)
1: right (move one cell right)
2: left (move one cell left)
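
As a rough sketch, the three actions can be mapped to a position update like this. The function name `next_pos` is hypothetical, and clamping at the corridor ends is an assumption: the description does not say what happens when the agent tries to move past cell 0 or cell 9.

```python
N_CELLS = 10  # corridor length from the task description

# Action codes as listed above.
STAY, RIGHT, LEFT = 0, 1, 2


def next_pos(player_pos: int, action: int) -> int:
    """Apply one action to playerPos and return the new cell index."""
    delta = {STAY: 0, RIGHT: 1, LEFT: -1}[action]
    # Assumption: the agent is held in place at the walls.
    return max(0, min(N_CELLS - 1, player_pos + delta))
```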

⚙️ Suggested Parameters

Beginner

bins = 10, ε = 0.3
α = 0.5, γ = 0.9
Recommended mode: None (no modifiers active)
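
The parameter names (bins, ε, α, γ) suggest tabular Q-learning with ε-greedy exploration. Assuming that is the algorithm in use, a minimal training loop with the Beginner settings might look like the sketch below. The layout (trophy at cell 9, bomb at cell 0, start at cell 4), the wall clamping, and the episode and step limits are hypothetical choices made only for illustration.

```python
import random

N_CELLS, N_ACTIONS = 10, 3             # bins = 10 states; actions 0, 1, 2
EPSILON, ALPHA, GAMMA = 0.3, 0.5, 0.9  # Beginner settings
TROPHY_POS, BOMB_POS, START_POS = 9, 0, 4  # hypothetical layout


def step(pos, action):
    """No path feedback. Returns (next_pos, reward, done)."""
    delta = (0, 1, -1)[action]  # 0: stay, 1: right, 2: left
    nxt = max(0, min(N_CELLS - 1, pos + delta))  # assumed wall clamping
    if nxt == TROPHY_POS:
        return nxt, 10.0, True
    if nxt == BOMB_POS:
        return nxt, -10.0, True
    return nxt, 0.0, False


def greedy_action(q, pos):
    """Action with the highest Q-value in this cell (lowest index on ties)."""
    return max(range(N_ACTIONS), key=lambda a: q[pos][a])


def train(episodes=500, max_steps=100, seed=0):
    """Tabular Q-learning; returns the Q-table as a list of rows."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_CELLS)]
    for _ in range(episodes):
        pos = START_POS
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability EPSILON.
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy_action(q, pos)
            nxt, r, done = step(pos, action)
            # Q-learning target: no bootstrapping from terminal cells.
            target = r if done else r + GAMMA * max(q[nxt])
            q[pos][action] += ALPHA * (target - q[pos][action])
            pos = nxt
            if done:
                break
    return q
```

Calling `train()` and then `greedy_action(q, pos)` for each cell gives the learned policy, which can be compared across modes once path feedback is added to `step`.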

Advanced

Switch between Tailwind and Headwind modes
Observe how path feedback affects convergence
Compare the "False Promise" goal (+2) with the normal goal (+10)