Do Models Cheat on Tests? - Jacob Haimes

Jacob Haimes, host of the Into AI Safety podcast, presents research into benchmark inflation, definitively answering the question "Are LLMs cheating on their tests?" This hacktalk was given as part of the Deception Detection Apart Hackathon, hosted by Apart Research [apartresearch.com/] and Apollo Research [www.apolloresearch.ai] in June 2024.
Jacob, a researcher at Apart Research Lab and host of the Into AI Safety podcast, specializes in effective research communication. He focuses on governance of advanced machine learning systems, including meta-evaluations and scaling democratic consensus building.
View the presentation slides ⭢ docs.google.com/presentation/...
Listen to the Into AI Safety podcast ⭢ into-ai-safety.github.io
Learn more about the hackathon ⭢ apartresearch.com/event/decep...
This talk was moderated and organized by Esben Kran and Apart Research.
This video is a slightly trimmed-down version of the livestream found at ru-vid.comD3b_8SNSS_E.
━━━━━ Chapters ━━━━━
00:00 - Intro
01:22 - Different Types of Deception
02:48 - Data leakage is happening
05:21 - The idea
09:47 - Defining a retro-holdout | Difficulty
12:00 - Defining a retro-holdout | Prediction accuracy
12:29 - Defining a retro-holdout | Human indistinguishability
13:18 - Defining a retro-holdout | Semantic similarity
14:20 - Creating a retro-holdout
16:42 - Figure 1
17:24 - Results | Calibration
18:01 - Results | Contemporary Models
18:33 - Results | Contemporary Benchmark Inflation
19:35 - What's next
20:09 - Shoutouts
20:33 - The best part of the presentation!
21:05 - Questions
21:19 - Questions | Current practices for holdout datasets
25:15 - Questions | Scaling this process
27:20 - Questions | Choice of calibration models
27:58 - Questions | How about using Kaggle-style APIs
29:07 - Questions | Why are developers cheating so much
32:15 - Questions | Citation incentives and rigor in research
34:38 - Questions | How to find the paper
35:09 - Questions | Are they cheatin' though?
35:50 - Questions | The darkest timeline
41:17 - Supplementary Material
43:07 - Questions | More on scaling
46:43 - Outro
━━━━━ Apart Links ━━━━━
Learn more about Apart ⭢ www.apartresearch.com
Join future hackathons and sprints ⭢ apartresearch.com/sprints
Connect with us on Discord ⭢ / discord
Check out potential AI safety projects ⭢ aisafetyideas.com
Stay up-to-date on Google Calendar ⭢ calendar.google.com/calendar/...
Be on the ball with iCal (.ics format) ⭢ calendar.google.com/calendar/...
Follow on Twitter ⭢ / apartresearch
Explore code on GitHub ⭢ github.com/apartresearch
Get professional on LinkedIn ⭢ / apartresearch