Agentic systems exhibit emergent behaviors, adapt to changing contexts, and pursue multi-step objectives. These characteristics make evaluating them harder than measuring accuracy on a fixed test set, because a single score on static inputs says little about how a system behaves across a sequence of decisions. We need evaluation frameworks that capture autonomy, robustness, and goal alignment.
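To make the contrast with a single accuracy number concrete, here is a minimal sketch of a report with one score per dimension. Everything in it is an assumption for illustration: the `Episode` schema, the `summarize` function, and the three proxy formulas are hypothetical. They are not an established framework, and the goal-alignment proxy in particular is crude.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Episode:
    """One recorded run of an agent on a multi-step task (hypothetical schema)."""
    steps: int                # total actions the agent took
    human_interventions: int  # steps where a human had to step in
    perturbed: bool           # True if the environment was altered for this run
    goal_reached: bool        # True if the final objective was met


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Return one illustrative score per dimension instead of a single accuracy."""
    total_steps = sum(e.steps for e in episodes)
    interventions = sum(e.human_interventions for e in episodes)
    perturbed = [e for e in episodes if e.perturbed]
    return {
        # Assumed proxy: share of steps taken without human help.
        "autonomy": 1 - interventions / total_steps if total_steps else 0.0,
        # Assumed proxy: success rate on perturbed runs only.
        "robustness": (
            sum(e.goal_reached for e in perturbed) / len(perturbed)
            if perturbed else 0.0
        ),
        # Assumed proxy: overall rate of reaching the stated goal.
        "goal_alignment": (
            sum(e.goal_reached for e in episodes) / len(episodes)
            if episodes else 0.0
        ),
    }


if __name__ == "__main__":
    runs = [
        Episode(steps=10, human_interventions=1, perturbed=False, goal_reached=True),
        Episode(steps=10, human_interventions=3, perturbed=True, goal_reached=False),
        Episode(steps=20, human_interventions=0, perturbed=True, goal_reached=True),
    ]
    print(summarize(runs))
```

The point of the sketch is the shape of the output: the dimensions are reported separately, so a system that reaches its goals only with frequent human help, or only in unperturbed environments, is not hidden behind one aggregate number.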