The row looked perfect. rating: 7. Valid JSON, right type, no nulls, no missing keys. My schema check waved it through. The page had returned HTTP 200. The selectors hadn't moved. Everything green.
A rating of 7 on a 5-star site is impossible. The model invented it, formatted it correctly, and handed it to me with total confidence.
That's the failure I want to talk about. Not the scraper that breaks loudly. The one that hands you a clean-looking row that is quietly, plausibly false — and sails past every check you have, because your checks are all looking at the shape of the data, and the lie is in the value.
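To make the gap concrete, here is a minimal Python sketch of a shape check sitting next to a value check. The function names `shape_ok` and `value_ok`, the single-key row, and the 1-to-5 rating bounds are all assumptions for illustration, not the actual pipeline behind this post.

```python
import json

# Assumed bounds for a 5-star site; adjust to the real scale.
RATING_MIN, RATING_MAX = 1, 5


def shape_ok(row):
    """Shape check: the key exists and holds an integer (bools excluded)."""
    rating = row.get("rating") if isinstance(row, dict) else None
    return isinstance(rating, int) and not isinstance(rating, bool)


def value_ok(row):
    """Value check: the rating is one the site could actually display."""
    return RATING_MIN <= row["rating"] <= RATING_MAX


# The row from the opening: valid JSON, right type, impossible value.
row = json.loads('{"rating": 7}')

print("shape ok:", shape_ok(row))  # the form is fine
print("value ok:", value_ok(row))  # the value is not
```

A range check like this only catches values that are impossible. A fabricated 4 on a page that really said 3 would still pass, so it is a floor, not a fix.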
TL;DR
HTTP 200, intact selectors, and valid JSON tell you the form is fine. They say nothing about whether the value is true.
When an LLM extracts from messy free text, structured-output mode guarantees you get valid JSON. It does not guarantee the content is real. The model fills uncertain fields rather than leaving them empty, because the schema demands a com