LLMs deployed in the field are teaching hard lessons about trust. Across finance, healthcare, tax, and cancer biology, feedback on accuracy, physician oversight, and post-training updates shape how much people can rely on AI in real-world products [1][2][3][4].
BankToBudget uses GPT-5 under the hood to turn messy bank exports into a monthly budget. Its Laravel backend cleans the data before categorization. The developer is asking for feedback on how accurate the categories are and how clear the results are [1].
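The clean-then-categorize flow described above can be sketched as follows. This is a hypothetical Python illustration, not BankToBudget's code (which runs on Laravel and uses GPT-5). The column names `Date`, `Description`, and `Amount`, the accepted date formats, the US-style number format, and the keyword-based `categorize` function (a stand-in for the LLM step) are all assumptions.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Optional

# Assumed date formats; real exports vary by bank and locale.
# Order matters: an ambiguous date such as 03/04/2025 matches the first format that fits.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")

# Hypothetical keyword rules standing in for the LLM categorization step.
CATEGORY_KEYWORDS = {
    "groceries": ("grocery", "supermarket"),
    "transport": ("uber", "metro", "fuel"),
    "rent": ("rent",),
}


def parse_date(raw: str) -> Optional[datetime]:
    """Try each assumed format in order; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def parse_amount(raw: str) -> Optional[Decimal]:
    """Strip '$' and thousands separators (US style assumed); read '(12.50)' as -12.50."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def categorize(description: str) -> str:
    """Keyword lookup; a real pipeline might call a model here instead."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "uncategorized"


def monthly_budget(csv_text: str) -> dict[str, dict[str, Decimal]]:
    """Clean each row, skip rows that fail to parse, and total by month and category."""
    totals: dict[str, dict[str, Decimal]] = defaultdict(lambda: defaultdict(Decimal))
    for row in csv.DictReader(io.StringIO(csv_text)):
        date = parse_date(row.get("Date") or "")
        amount = parse_amount(row.get("Amount") or "")
        if date is None or amount is None:
            continue  # skip malformed rows rather than guess
        month = date.strftime("%Y-%m")
        totals[month][categorize(row.get("Description") or "")] += amount
    return {month: dict(cats) for month, cats in totals.items()}
```

Skipping unparseable rows keeps bad data out of the totals; a production tool would probably also report the skipped rows back to the user, since silent drops undermine exactly the accuracy feedback the post asks for.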
Counsel Health embeds LLMs into medical care under the oversight of licensed physicians, aiming to make care safer and faster. The company raised a $25M Series A led by Andreessen Horowitz and GV, and the launch prompted discussion of what could give it a defensible moat in health AI [2].
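Physician oversight of this kind is often described as a release gate: the model drafts, a licensed human approves or rejects, and only approved text reaches the patient. The sketch below shows that pattern in generic form. It is not Counsel Health's system, and `Draft`, `ReviewQueue`, and the licensed-reviewer check are hypothetical names and rules chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Draft:
    """A model-generated reply that stays unreleased until a physician reviews it."""
    patient_id: str
    text: str
    status: Status = Status.PENDING
    reviewer: Optional[str] = None


@dataclass
class ReviewQueue:
    """Hypothetical release gate: only physician-approved drafts are releasable."""
    licensed_physicians: set[str]
    drafts: list[Draft] = field(default_factory=list)

    def submit(self, draft: Draft) -> None:
        self.drafts.append(draft)

    def review(self, draft: Draft, physician: str, approve: bool,
               edited_text: Optional[str] = None) -> None:
        # Only reviewers on the licensed list may sign off.
        if physician not in self.licensed_physicians:
            raise PermissionError(f"{physician} is not a licensed reviewer")
        if edited_text is not None:
            draft.text = edited_text  # the physician may correct the draft before approving
        draft.status = Status.APPROVED if approve else Status.REJECTED
        draft.reviewer = physician

    def releasable(self) -> list[Draft]:
        """Drafts that may be sent to the patient."""
        return [d for d in self.drafts if d.status is Status.APPROVED]
```

Recording the reviewer on each draft keeps an audit trail of who signed off, which is part of what makes the oversight defensible rather than nominal.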
TaxCalcBench tests frontier models on tax calculation, and the researchers report that even state-of-the-art models correctly calculate fewer than a third of returns. Giving models a calculator tool could help. On this task, Gemini models outperformed Claude [3].
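A calculator tool moves the arithmetic out of the model: the model decides which inputs apply, and ordinary code computes the number. The sketch below illustrates that split with a progressive bracket calculation. The bracket thresholds and rates are invented for illustration and do not correspond to any real tax code, and `progressive_tax`, `TOOLS`, and `run_tool_call` are hypothetical names, not part of TaxCalcBench or any vendor's tool-calling API.

```python
from decimal import Decimal

# Hypothetical brackets as (upper bound, marginal rate). NOT real tax law.
BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.30")),  # top bracket has no upper bound
]


def progressive_tax(taxable_income: Decimal) -> Decimal:
    """Apply each marginal rate only to the slice of income inside its bracket."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if taxable_income <= lower:
            break
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return tax.quantize(Decimal("0.01"))  # round to cents


# Hypothetical tool registry: the model emits a tool name plus arguments,
# and the host program, not the model, does the arithmetic.
TOOLS = {
    "progressive_tax": lambda args: progressive_tax(Decimal(args["taxable_income"])),
}


def run_tool_call(name: str, args: dict) -> str:
    """Dispatch a model-requested tool call and return the result as text."""
    return str(TOOLS[name](args))


print(run_tool_call("progressive_tax", {"taxable_income": "50000"}))  # 10000.00
```

With these made-up brackets, 50,000 of income is taxed as 10% of 10,000 plus 20% of 30,000 plus 30% of 10,000, or 10,000.00. Using `Decimal` rather than floats avoids binary rounding errors in currency amounts.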
Google's C2S-Scale-Gemma-2-27B, a Gemma-based model built with Yale, is available on Hugging Face and GitHub alongside a bioRxiv preprint. The model generated a novel hypothesis about a potential cancer therapy pathway, but experts stress that humans must validate such outputs, and some note that explicit confidence bounds make verification easier [4].
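One generic way to attach an explicit confidence bound is a Wilson score interval on a success rate, for instance the fraction of replicate experiments in which a predicted effect actually appeared. This is a standard statistics technique chosen for illustration; the source does not say that C2S-Scale or its reviewers use it, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # Center is pulled toward 0.5, which keeps the interval sensible at small n.
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Seven confirmations out of ten gives an interval of roughly 0.40 to 0.89, while seventy out of a hundred gives a much tighter one around the same point estimate. Reporting the interval instead of "70%" tells a reviewer how much validation work is still needed.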
Real-world AI needs ongoing updates and human-in-the-loop oversight to stay trusted and defensible.
References
[1] Show HN: BankToBudget – Instantly turn your bank exports into a monthly budget
Shows practical use of GPT-5 in parsing bank data and categorizing transactions; seeks feedback on accuracy and features for improvement.
[2] Show HN: Counsel Health ($25M Series A) – LLMs for Medical QA and Chat with MDs
Launch of a health AI platform offering rapid care with physician oversight; discusses moat, records, and post-training updates.
[3] TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task
TaxCalcBench compares frontier LLMs on tax calculations; Gemini beats Claude; discussions cover tools, risks, reliability, and policy implications.
[4] Google C2S-Scale 27B (based on Gemma) built with Yale generated a novel hypothesis about cancer cellular behavior - Model + resources are now on Hugging Face and GitHub
Discusses the Gemma-2 27B LLM used for cancer hypothesis generation and drug screening; debates the novelty, validation, and usefulness of LLMs in biology.