Tags: a.i. (2), AUC (1), Base (1), Classification (1), evaluation (1), Evaluation (2), F1-score (1), fine tuning (1), Human Evaluation (1), LangSmith (1), LLM (1), LLM as a Judge (1), Neural Network (1), Precision (1), Recall (1), ROC (1), Unit tests (2)