MAMBA (S6) Fine-Tuned + DPO-Aligned: TEST

Views: 3.3K

Live test of MAMBA 2.8B, fine-tuned and DPO-aligned. Real-world performance of MAMBA 2.8B ZEPHYR (SFT + DPO), tested live on several tasks, including maths and logical reasoning.
All rights remain with the authors:
huggingface.co...
huggingface.co...
.. this is a fine-tuned version of xiuyul/mamba-2.8b-ultrachat on the HuggingFaceH4/ultrafeedback_binarized dataset trained using Direct Preference Optimization (DPO).
For further details (MAMBA code implementation) see my Community tab.
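As a rough illustration of what "trained using Direct Preference Optimization" means, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not the model authors' training code: the function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage lines are all assumptions chosen for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each full response under the
    policy being trained and under a frozen reference model.
    beta=0.1 is an assumed value, not taken from the model card.
    """
    # Implicit reward margin: how much more the policy favours the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) \
           - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # loss = -log(sigmoid(z)) = log(1 + exp(-z)), computed in a stable way
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Hypothetical log-probabilities, for illustration only.
neutral = dpo_loss(-11.0, -11.0, -11.0, -11.0)   # policy equals reference
improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)  # policy prefers chosen
```

When the policy matches the reference, the loss equals log 2; it drops below that as the policy shifts probability toward the preferred response.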
#ai
#aieducation
#airesearch

Published: 26 Aug 2024

Comments: 13

@albertmashy8590 8 months ago

Once we scale & optimise the Mamba architecture, it will have profound and revolutionary effects on society

@ibongamtrang7247 8 months ago

Unlimited memory. Looking forward to 2024 :)

@owenpawling3956 7 months ago

@ibongamtrang7247 I’ve been thinking about whether unlimited memory is even necessary. You could treat limited memory as analogous to our own memory, which is limited until we sleep, and sleep has the effect of consolidating memories. For a transformer architecture, this is akin to retraining on past events. In this scenario, it is only necessary to have enough memory to make it through the day.

@owenpawling3956 7 months ago

I think that given Mamba’s RNN nature, it will have extremely interesting increases in computation from hierarchical MoEs

@waynelast1685 8 months ago

Video suggestion: an update on what is happening (tools, vendors, etc.) in the field of AutoML. Your last series was 2 years ago. Maybe things have changed, or maybe not so much (then a short video?).

@BooleanDisorder 8 months ago

Seems like it seemingly likes alliterations a lot.

@chickenp7038 8 months ago

I wish I understood Mamba better.

@vladimirtchuiev2218 7 months ago

This thing is quite an impressive corporate speech generator: a lot of text saying absolutely nothing.

@grazianomanduzio6800 7 months ago

Which IDE are you using for the test?

@Polymathlete 7 months ago

(Agent finishes giant off-topic ramble about its own climate research) User: Umm, yes, very impressive... but what about scenario 2? Agent: Let me tell you a story about Muskrat Falls Hydro Developers...

@first-thoughtgiver-of-will2456 3 months ago

I've seen much larger cutting-edge ML transformer models perform much worse.

@christophemalvasio5569 8 months ago

this is not serious ! tag transformers at HG while rwkv is not missing vulkan handling and langchain like interface try to provide a collaborative console interface for fine tuning...

@derghiarrinde 8 months ago

Hey, this looks like you didn't even give it a test run before making a video. It's kind of difficult to test a 2.8B-parameter model against even 7B models. I also watched an earlier video from you; there were some sound issues with pheme S at the end of sentences. Please make your videos higher quality than this.