MAMBA (S6) Fine-Tuned + DPO-Aligned: TEST 

code_your_own_AI
38K subscribers
3.3K views

Live test of MAMBA 2.8B, fine-tuned and DPO-aligned. Real-world performance of MAMBA 2.8B ZEPHYR (SFT + DPO), tested live on several tasks, including maths and logical reasoning.
All rights remain with the authors:
huggingface.co...
huggingface.co...
.. this is a fine-tuned version of xiuyul/mamba-2.8b-ultrachat on the HuggingFaceH4/ultrafeedback_binarized dataset trained using Direct Preference Optimization (DPO).
For further details (MAMBA code implementation) see my Community tab.
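As a rough illustration of the DPO step mentioned above, here is a minimal sketch of the DPO objective for a single preference pair. It is not the authors' training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example, and the inputs are assumed to be summed log-probabilities of each full response under the policy and under the frozen reference (SFT) model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    All four inputs are summed sequence log-probabilities.
    beta=0.1 is an illustrative default, not a value from the source.
    """
    # Implicit rewards: how much more the policy favours each response
    # than the frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response relative to the rejected one.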
#ai
#aieducation
#airesearch

Published: 26 Aug 2024

Comments: 13
@albertmashy8590 8 months ago
Once we scale & optimise the Mamba architecture, it will have profound and revolutionary effects on society
@ibongamtrang7247 8 months ago
Unlimited memory. Looking forward to 2024 :)
@owenpawling3956 7 months ago
@ibongamtrang7247 I’ve been thinking about whether unlimited memory is even necessary. You can think of limited memory as analogous to our own, which only has to last until we sleep, when memories are consolidated. For a transformer architecture, this is akin to retraining on past events. In this scenario, it is only necessary to have enough memory to make it through the day.
@owenpawling3956 7 months ago
I think that given Mamba’s RNN nature, it will have extremely interesting increases in computation from hierarchical MoEs
@waynelast1685 8 months ago
Video suggestion: an update on what is happening (tools, vendors, etc.) in the field of AutoML. Your last series was 2 years ago. Maybe things have changed, or maybe not so much (then a short video?).
@BooleanDisorder 8 months ago
Seems like it seemingly likes alliterations a lot.
@chickenp7038 8 months ago
I wish I understood Mamba better.
@vladimirtchuiev2218 7 months ago
This thing is quite an impressive corporate speech generator: a lot of text saying absolutely nothing.
@grazianomanduzio6800 7 months ago
Which IDE are you using for the test?
@Polymathlete 7 months ago
(Agent finishes giant off-topic ramble about its own climate research) User: Umm, yes, very impressive... but what about scenario 2? Agent: Let me tell you a story about Muskrat Falls Hydro Developers...
@first-thoughtgiver-of-will2456 3 months ago
I've seen much larger cutting-edge ML transformer models perform much worse.
@christophemalvasio5569 8 months ago
This is not serious! Tag transformers at HG while rwkv is not missing vulkan handling and langchain-like interface try to provide a collaborative console interface for fine-tuning...
@derghiarrinde 8 months ago
Hey, this looks like you didn't even give it a test run before making a video. It's kind of difficult to test a 2.8B-parameter model against even 7B models. I also watched an earlier video of yours; there were some sound issues with the phoneme S at the end of sentences. Please make your videos higher quality than this.