▤ Blog · 1180 words · OPERATOR · 86%
Stop paying for eval platforms you don't need.
Every LLM provider is shipping first-party eval tooling now. The third-party category has 18 months — maybe less.
Three platforms confirmed first-party eval features this week. The real question is what happens to the category. For seed-to-Series-A teams, a 100-line harness catches 80% of regressions; the rest is theatrics for a board slide.
4 sources cited
𝕏 · Thread · 7 posts · READY
Jay Legaspi
@jaylegaspi · drafted 09:02
a 100-line eval harness catches 80% of regressions for teams under 10k req/day.
the rest is theater for the board deck.
🧵 on what to actually run, what to skip, and the $0.04/test math that surprised me last quarter.
est. read 38s
in · Post · 220 words · READY
Jay Legaspi · Founder
3rd · drafted 09:02
Spent the morning replacing our eval platform with a 100-line script.
It catches 80% of regressions. We were paying $1,200/mo for the rest — most of which we'd never read.
A few notes on what we kept, what we dropped, and the question I wish someone had asked us a year ago.
est. read 55s