Evaluation is how you continuously improve your LLM application. It requires a way to judge your application's outputs, which are often natural language. Using an LLM to grade natural language outputs (e.g., for correctness relative to a reference answer, tone, or conciseness) is a popular approach, but it requires prompt engineering and careful auditing of the LLM judge.
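To ground the discussion, here is a minimal sketch of a plain LLM-as-a-Judge correctness grader using the OpenAI SDK. The prompt wording, model choice, and one-word verdict format are illustrative assumptions, not LangSmith's built-in evaluator:

```python
# A minimal LLM-as-a-Judge sketch: grade an output for correctness
# against a reference answer. Prompt and verdict format are assumptions.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an answer for correctness.
Question: {question}
Reference answer: {reference}
Student answer: {answer}
Reply with a single word: CORRECT or INCORRECT."""

def judge_correctness(question: str, reference: str, answer: str) -> bool:
    """Return True if the judge model deems the answer correct."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper() == "CORRECT"
```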
Our new release of LangSmith addresses this problem by letting a user (1) correct LLM-as-a-Judge outputs and then (2) pass those corrections back to the judge as few-shot examples for future iterations. This creates LLM-as-a-Judge evaluators grounded in human feedback that better encode your preferences, without the need for challenging prompt engineering.
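To make the mechanism concrete, here is a sketch of the corrections-as-few-shot idea: human-approved verdicts are stored and rendered into the judge prompt, so future grades are grounded in those preferences. The `Correction` shape and helper names below are hypothetical and only illustrate the concept, not LangSmith's internals:

```python
# Sketch: turn human-corrected judge verdicts into few-shot examples
# that are prepended to the judge prompt. Data shapes are hypothetical.
from dataclasses import dataclass

@dataclass
class Correction:
    question: str
    answer: str
    corrected_verdict: str  # the human-approved label, e.g. "CORRECT"

def build_few_shot_prompt(corrections: list[Correction]) -> str:
    """Render stored corrections as few-shot examples for the judge."""
    shots = "\n\n".join(
        f"Question: {c.question}\nAnswer: {c.answer}\nVerdict: {c.corrected_verdict}"
        for c in corrections
    )
    # The trailing {question}/{answer} braces stay literal (not an f-string)
    # so the prompt can be .format()-ed with the new example to grade.
    return (
        "You are grading answers for correctness. "
        "Follow the style of these human-reviewed examples:\n\n"
        f"{shots}\n\n"
        "Question: {question}\nAnswer: {answer}\nVerdict:"
    )
```

Because each correction is a concrete graded example rather than an abstract instruction, the judge picks up your preferences without you having to articulate them as prompt rules.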
Here we show how to apply Corrections + Few-Shot to online evaluators that are pinned to a dataset.
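Online evaluators themselves are configured in the LangSmith UI, but the same judge can be exercised offline against a dataset with the SDK's `evaluate()` helper. Here is a sketch under stated assumptions; the dataset name, input/output keys, and `my_app` target are placeholders, and `judge_correctness` comes from the earlier sketch:

```python
# Sketch: wrap the LLM judge as a LangSmith evaluator and run it
# against a dataset. Dataset name, keys, and my_app are assumptions.
from langsmith.evaluation import evaluate
from langsmith.schemas import Run, Example

def my_app(inputs: dict) -> dict:
    """Hypothetical stand-in for your LLM application."""
    return {"output": "..."}

def correctness_evaluator(run: Run, example: Example) -> dict:
    """Score a run by asking the judge defined in the earlier sketch."""
    verdict = judge_correctness(
        question=example.inputs["question"],    # assumed input key
        reference=example.outputs["answer"],    # assumed reference key
        answer=run.outputs["output"],           # assumed output key
    )
    return {"key": "correctness", "score": int(verdict)}

results = evaluate(
    my_app,
    data="my-eval-dataset",  # hypothetical dataset name
    evaluators=[correctness_evaluator],
)
```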