The Journal
Programming, entrepreneurship, machine learning — dated, irregular.
- What I learned asking 11 AI models to grade each other's AI predictions
An experiment on model personalities, a delusion index, and the open-weight dark horse contender I didn't see coming.
- Opus 4.7 isn't dumb, it's just lazy
Some follow-up experiments with Claude Opus 4.7 based on Simon Willison's Pelican Benchmark Shocker.