
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Tunadorable
6K subscribers
1.4K views

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
arxiv.org/abs/2406.05183
The other video of mine that I mentioned:
• The Pitfalls of Next T...
Support my learning journey either by clicking the Join button above, becoming a Patreon member, or a one-time Venmo!
/ tunadorable
account.venmo.com/u/tunadorable
Discuss this stuff with other Tunadorks on Discord
/ discord
All my other links
linktr.ee/tunadorable

Published: 26 Jun 2024

Comments: 19
@OpenSourceAnarchist 5 days ago
12:22 I actually really appreciate the quick breakdowns. I've been learning all about the guts of neural networks and the math behind them, but only in an ad hoc way through YouTube (cog sci background). The in-line banter and commentary are wonderful :)
@marcfruchtman9473 5 days ago
I think the explanation is good when things haven't been covered before. Thanks for the video.
@netherportals 5 days ago
I use a lot of generators and it's funny, some will just use their own version of what you say, like "mushroom" will go through as "fantasy setting", then you get an image with no mushroom. Better models like this can create more accurate generations, so I'm stoked. Thanks for your very informative video, enjoy your code summer.
@mickelodiansurname9578 4 days ago
Harrison Kinsley mentioned this when GPT-3.5 was released: he first asked it "Who is Harrison Kinsley?" and it did not know, but when he asked it "Who is SentDex?" it mentioned it's a channel run by Harrison Kinsley. So it's probably safe to assume it's a reversal curse.
@jakeaustria5445 21 hours ago
Don't know yet how masking works, I still need to study that one. But great video as always. I didn't know the Reversal Curse was a thing before this vid.
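A minimal sketch in plain Python of the two training objectives discussed in the video, for anyone wondering how masking differs from next-token prediction. This is not the paper's code; the uniform masking-rate detail attributed to MLM-U here is an assumption, and the toy sentence is made up.

```python
import random

tokens = ["Paris", "is", "the", "capital", "of", "France"]

# Next-token prediction: each target is conditioned only on the
# tokens to its left, i.e. one fixed left-to-right factorization.
for i in range(1, len(tokens)):
    print(f"predict {tokens[i]!r} given {tokens[:i]}")

# MLM-U-style objective (sketch): sample a masking rate, hide that
# fraction of positions, and predict the hidden tokens from all
# visible ones at once, with no privileged order.
rate = random.uniform(0.0, 1.0)
masked = sorted(random.sample(range(len(tokens)), max(1, round(rate * len(tokens)))))
visible = [t if i not in masked else "[MASK]" for i, t in enumerate(tokens)]
print("visible:", visible)
print("targets:", {i: tokens[i] for i in masked})
```

Because the masked objective asks the model to recover arbitrary subsets of the sequence from the rest, it is less tied to the single left-to-right ordering that next-token prediction bakes in.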
@andrewsilber 5 days ago
That finding is mildly disconcerting. Doesn't it imply that even at the higher layers of abstraction it doesn't glean the concept of identity or simple IS-A relationships? If that's the case, then what else *isn't* it understanding?
@Tolken00 5 days ago
This is so cool! Makes me excited for what's possible!
@andybrice2711 5 days ago
This further convinces me that we ought to be incorporating some sort of knowledge graph into LLMs.
@alexanderbrown-dg3sy 5 days ago
Without any-order enhanced pretraining, you would still have the limitation if you consumed that KG using next-token prediction though… but I definitely agree with this sentiment in general.
@BooleanDisorder 5 days ago
Combine them with graph neural networks that take knowledge as input. AI could put relevant parts in the GNN input itself.
@alexanderbrown-dg3sy 5 days ago
@@BooleanDisorder true. I've seen research on linearizing and tokenizing KGs… with any-order optimized pretraining, you would get the same benefit as combining an LM + GNN… with the added benefit of the scaling properties of LMs.
@aboubenadhem9066 5 days ago
Last paragraph on p3 implies that “entity pre-parsing” would be one way around the issue. Does that mean training the model on parse trees instead of linear text order?
@wwkk4964 5 days ago
Thanks for sharing! This solution, along with a dynamic tokenizer that is allowed to have multiple tokens or multi-symbol representations in its vocabulary and to learn them on the fly as it sees new input, would be the way to go. I think the tokenizer could even learn things at the level of lexical units, so that the model only has to see the abstractions it must solve.
@wwkk4964 5 days ago
The WikiReversal table of results was enlightening. The fact that the MLM-U-trained model had much more similar backward and forward scores gives me confidence that its learning was probably more conceptual and relational, versus the pure memorization we would expect if the learning were strongly influenced by the direction or chain of events.
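To make the backward-vs-forward comparison concrete, here is a toy, self-contained probe in the spirit of that table. The lookup-table "model" and the single question pair are made-up stand-ins, not the paper's WikiReversal data or evaluation code.

```python
# Toy model that only memorized the forward direction of one fact.
def toy_model(prompt: str) -> str:
    memorized = {"Who wrote 'Dune'?": "Frank Herbert"}
    return memorized.get(prompt, "I don't know.")

pairs = [
    # (forward query, forward answer, backward query, backward answer)
    ("Who wrote 'Dune'?", "Frank Herbert",
     "Frank Herbert is the author of which novel?", "Dune"),
]

def accuracy(model, qa):
    # Fraction of queries whose expected answer appears in the model's reply.
    return sum(a.lower() in model(q).lower() for q, a in qa) / len(qa)

forward = accuracy(toy_model, [(fq, fa) for fq, fa, bq, ba in pairs])
backward = accuracy(toy_model, [(bq, ba) for fq, fa, bq, ba in pairs])
print(f"forward: {forward:.0%}  backward: {backward:.0%}")
# A large gap (here 100% vs 0%) is the reversal-curse signature;
# similar forward and backward scores, as reported for MLM-U,
# suggest more order-agnostic, relational recall.
```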
@Proprogrammer001 5 days ago
oi
@waveFunction25 5 days ago
Oi
@shrokompany4611 5 days ago
oi
@sikunowlol 5 days ago
oi