MASK evals with small models

This is a link post for my recent work on the MASK honesty benchmark, which is posted on GitHub.
