If you have a candidate who lies 5% of the time, but you can't tell which 5% because they lie convincingly, would hiring them help or hurt your business?
So, I think it's a bit misleading, or perhaps unintuitive, that this technique was labelled "MoE". It's more like S-LoRA, where the model actively swaps relevant LoRAs in at inference time. It's not strictly speaking anything new, but a series of existing techniques tied together into a simple package. I'm not sure how useful it really is to the broader community, particularly given that it's not open source, and that existing techniques like mechanistic interpretability should do something quite similar at the end of the day. That's to say nothing of advances in reinforcement learning, which need not eliminate an LLM's ability to express a lack of confidence: raw LLMs actually have a pretty good internal estimate, before instruction tuning, of how accurate the facts they're stating are. We currently destroy that in fine-tuning by forcing them to answer confidently.
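To make the comparison concrete, here's a minimal sketch of what S-LoRA-style adapter swapping looks like: one frozen base weight, a library of low-rank (A, B) pairs, and a per-request choice of which adapter to apply. All names here ("law", "bio", the dimensions) are my own illustration, not the vendor's actual code or API.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank = 8, 2

W_base = rng.normal(size=(d, d))  # frozen base weight, shared by all requests

# A small library of LoRA adapters keyed by topic (hypothetical names).
# Each adapter is a low-rank pair (A, B); the "law" adapter uses the usual
# zero-init for B, so it starts out as a no-op.
adapters = {
    "law": (rng.normal(size=(d, rank)), np.zeros((rank, d))),
    "bio": (rng.normal(size=(d, rank)), rng.normal(size=(rank, d))),
}

def forward(x, topic=None):
    """One linear layer, optionally with a swapped-in LoRA adapter.

    The base weight never changes; the adapter contributes a low-rank
    update x @ A @ B chosen at inference time.
    """
    y = x @ W_base
    if topic is not None:
        A, B = adapters[topic]
        y = y + x @ A @ B
    return y

x = rng.normal(size=(1, d))
base_out = forward(x)
law_out = forward(x, topic="law")   # zero-init B: identical to the base output
bio_out = forward(x, topic="bio")   # non-zero B: diverges from the base output
```

The point of the sketch is that "swapping an expert" here is just choosing which (A, B) pair to add at request time; nothing about the base model changes, which is why it reads more like adapter serving than a classic MoE router.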
Hmm. Lots of PR stunts on their blog, so I'm still skeptical. I really don't get the main trick, and 200 API calls per month is not enough for a proper test run. "Internal memorization. Tuning the weights, not RAG. You can layer them." /via X.
This is interesting and somewhat aligns with how the brain seems to work. We have general capabilities that we use all the time, but we can also retrieve memories even after years of not accessing them. That suggests we have weights that change continuously, plus more static, MoE-like memories that we can pull up at will.
Nice! That does add a lot more confidence in correct answers. The "mixture of agents" model architecture is bringing some good stuff too (not as good as this though; this is big). We're not far from some really smart agents...
It's special because it swaps those experts in within a larger architecture. Related research on polysemanticity also suggests that sparsity will improve explainability and steerability.