Thank you for this nice getting-started video; I learned a lot from it. One question: did you write the function's JSON schema by hand, or did you use a function to generate it?
That was an interesting video about steganographic techniques. The first thing I thought of was the movie Sneakers, where they used a cassette tape as the steganographic device. The movie is 20+ years old at this point.
Yeah, it's valuable. I've just added one to my reading list. I don't think anyone else goes over papers in a quick-fire way like you just did, where similar (and dissimilar) papers are compared at a high level, with reasons why you picked them and some intuition for how they work.
This is a really interesting idea. With regards to guidance, do you know if anyone has tried to train models that predict, in a forward direction, the noise deltas you would otherwise get from pushing the gradient backwards? For example, you could train a model that takes the features from the "up" layers in Stable Diffusion and predicts a secondary noise delta to nudge the regular Stable Diffusion noise prediction in the right direction, i.e. estimate what the delta from the backpropagated gradient would need to be. I'm not sure that would actually save compute at inference compared to doing a backwards pass and holding the whole graph while you generate images, because I assume that model would need to be reasonably sized. But it might also allow larger steps per prediction than you'd get from a single backwards gradient pass. And it would remove the need for an internal RGB image stage at all, because you would only need that model during training. Although it would likely break the awesome part of this method, that it requires very few samples to get good results, at the cost of shifting work to the inference stage.
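To make the amortization idea concrete, here is a toy sketch of the training step I'm imagining. Everything here is hypothetical: the random arrays stand in for real "up"-block features and for guidance deltas that would normally come from a backward pass, and the predictor is just a linear map instead of a small network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "features" from the UNet's up-blocks (random vectors)
# and "target deltas" that would normally come from backpropagating a guidance
# loss through the sampler. Shapes and names are illustrative only.
feat_dim, delta_dim, n = 16, 8, 256
W_true = rng.normal(size=(feat_dim, delta_dim))
features = rng.normal(size=(n, feat_dim))
target_deltas = features @ W_true  # pretend these are guidance gradients

# Amortized guidance: fit a forward model delta = features @ W, so inference
# needs one extra forward pass instead of a backward pass through the graph.
W = np.zeros((feat_dim, delta_dim))
lr = 0.01
for _ in range(2000):
    pred = features @ W
    grad = features.T @ (pred - target_deltas) / n  # gradient of MSE loss
    W -= lr * grad

err = np.mean((features @ W - target_deltas) ** 2)
```

In a real setup the targets would be precomputed (or computed on the fly) from the actual backward guidance pass, and the predictor would presumably be a small conv/MLP head rather than a linear map.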
Hello, I have some confusion about one topic in diffusion models. Could I get your contact info, Jonathan, like your email or any other way to reach you? I'm working on my final-year project and anything would help.
However, I have one small question about the overfitting part at the end of this video. Is the concern that the test set, translated into Japanese, might have been learned (or fine-tuned on) by the math 7B LLM?
Oh my god, man, you don't understand how happy I am about your storytelling of how things went in the timeline of developing the idea of model merging up to this point: where it started, how it went, and how people were reasoning about why it works, etc. I want to get into this so that I understand the main ideas and can start working on them as well, but it's so hard to get to the root of things; it requires a huge amount of time to read and digest everything and slowly put the pieces together. So boy, do I mean it when I say thank you!
Hi Johno, at the beginning you said you're somewhat skeptical of model merging. If I understand correctly, your criticism is only about iteratively merging toward a given goal, which leads to overfitting. Or are you skeptical of the general concept of model merging? Thanks!
That was an excellent overview of not just Sakana's evolutionary methods to identify good merge candidates, but also the popular techniques TIES, DARE and Passthrough/Frankenmerge. Appreciate it as usual, Johno!
I really appreciate both the content of the papers, as well as your work to make papers more approachable. I wonder if DSPy(-ish) syntax could be a way for papers to share their LLM algorithms in a standard/comparable way. Having a way to quickly compare approaches could help contextualize/understand new approaches. I am looking forward to a video specifically on DSPy if you make one. Based on your presentation I am going to give it a try.
Guy with the worst image quality ever explains the technique that produces the best image quality ever. Just a geek joke; thank you for such a nice presentation!
🎯 Key Takeaways for quick navigation (Note: HARPA AI mis-transcribed several paper names, e.g. "Orca" as "Ora"; corrected below where the intended name is clear):
02:05 📚 *The session is a paperathon: the goal is to collaboratively read and discuss various papers related to AI and machine learning.*
03:14 🧠 *The speaker outlines a general pipeline for training AI models, covering stages like data generation, pre-training, fine-tuning, alignment with human preferences, and model deployment.*
06:33 🤖 *The discussion shifts to the Orca paper, emphasizing teaching smaller language models to reason by using intermediate steps generated by a larger model.*
11:53 🌐 *Orca 2 builds on the original Orca paper by exploring improved training signals to enhance smaller language models' reasoning abilities, focusing on determining the most effective strategy for each task.*
15:29 🎓 *Orca 2 introduces task-specific system instructions, optimizing the model for various reasoning strategies tailored to different subtasks, aiming for a more versatile and effective chat model.*
17:36 📊 *Orca 2 demonstrates improved performance, surpassing other comparable models on various benchmarks, showcasing the effectiveness of its approach to diverse reasoning tasks.*
24:06 🧠 *Researchers generate synthetic data for diverse image-editing tasks, customizing examples per task using Llama 2 and various techniques.*
25:54 🖼️ *Emu Edit, a diffusion model, is designed to multitask, providing conditioning for different tasks while intelligently guessing the user's desired edit.*
28:57 🔄 *The approach of generating synthetic data for training models, as seen in Emu Edit, yields powerful editing capabilities, surpassing models that rely on less principled synthetic data generation.*
49:57 🧠 *The paper discusses the challenge of storing and training large language models with billions of parameters due to high memory requirements.*
51:08 🎯 *The proposed approach aims to reduce memory usage for fine-tuning large models, making it feasible even on a single 48 GB GPU.*
51:49 🛠️ *Low-rank adapters are introduced to train fewer parameters by applying a delta to the base weights, further minimizing memory overhead.*
52:30 📉 *Quantization is used to shrink the base model even more, achieving efficient memory usage while maintaining model performance.*
57:25 🚀 *The paper demonstrates that the proposed approach allows for training models with performance comparable to full fine-tuning but with significantly fewer resources and increased efficiency.*
[01:15:00](ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YNOIyvUCpAs.html) 📑 *The paper introduces a prompt for information removal, ensuring unbiased context for answering questions.*
[01:20:52](ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YNOIyvUCpAs.html) 🤖 *Zephyr is a paper discussing a recipe for training a high-performing chat model, focusing on fine-tuning and preference alignment.*
[01:22:03](ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YNOIyvUCpAs.html) 🏆 *Zephyr achieves state-of-the-art performance on chat benchmarks, outperforming other models in the 7B-parameter setting.*
[01:26:43](ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YNOIyvUCpAs.html) 🔄 *The combination of distilled supervised fine-tuning and Direct Preference Optimization yields the best-performing chat model.*
[01:34:34](ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YNOIyvUCpAs.html) 🚀 *Direct Preference Optimization (DPO) solves the reward-maximization problem in a single stage of policy training, making it computationally lightweight and efficient.*
01:38:00 🔄 *DPO allows measuring the likelihood or perplexity of sequences, providing a way to evaluate the model's performance in generating sentences.*
01:41:07 📊 *The DPO loss function focuses on increasing the likelihood of good completions, and its effectiveness lies in updating the model based on preference-ranked pairs.*
01:42:14 💻 *DPO outperforms fine-tuning on preferences or the base model, offering a straightforward and efficient way to leverage preference data for model improvement.*
02:03:20 📸 *Contrastive learning involves mapping images to an embedding space, making similar images close and dissimilar images distant.*
02:05:32 🔄 *Self-supervised techniques include generating variants of an image and ensuring embeddings of similar images are close in the embedding space.*
02:08:06 🌐 *In self-supervised learning, invariance-based methods aim for similar embeddings for different views, while generative methods involve filling in gaps or completing parts of an image.*
02:16:15 🚀 *The proposed method outperforms other techniques, requiring less pre-training and achieving high scores with minimal labeled data on ImageNet.*
02:24:49 🤔 *The conditioning variable z in the joint embedding-predictive architecture specifies the positional information for predictions during training but is not used during inference.*
02:27:37 📊 *The z variable helps produce multiple outputs during training, allowing the model to generate a distribution of possible predictions.*
02:32:45 🌐 *Lucid Dreamer paper discussed, focusing on the Gaussian splatting technique for fast and efficient novel-view synthesis in 3D scenes.*
02:38:02 📄 *Lucid Dreamer's "High Fidelity Text to 3D Generation via Interval Score Matching" introduced, generating impressive 3D content from text prompts.*
02:39:33 🔍 *Score distillation sampling explained as a method using 2D models to optimize 3D scene representations, addressing challenges like oversaturation.*
02:46:08 🔄 *Interval Score Matching introduced, a technique for consistency in score matching, providing a cleaner signal for updating base representations in 3D models.*
02:48:12 🌐 *Point initialization using existing models like Point-E for the 3D Gaussians discussed, providing a better starting point for optimization.*
02:50:12 🎨 *Lucid Dreamer's applications include avatar generation, 3D editing, and impressive results in generating 3D scenes from various input types.*
02:51:59 🌐 *Mention of the ongoing innovation in the text-to-3D space, with Lucid Dreamer being one of many papers pushing the boundaries in generating high-quality 3D content from textual inputs.*
02:58:32 📊 *Low-poly representations are beneficial for efficient rendering, and the paper explores a method to generate these representations using a graph convolutional encoder and a residual face-quantization module.*
03:02:24 🔄 *The paper introduces a mesh-generation approach using a Transformer, where embeddings are treated as a sequence and decoded to produce mesh representations, showcasing potential applications in various domains.*
03:04:34 🔍 *Ablation studies reveal that the proposed paper incorporates several crucial techniques, emphasizing their necessity for achieving sensible results in mesh generation.*
03:08:32 📈 *The discussion shifts to exploring recent diffusion-model papers, contemplating the advancements and improvements in the field beyond Stable Diffusion models.*
03:15:31 🚀 *"Virion" introduces an innovative approach to diffusion models, focusing on hierarchical stages with extreme compression for efficient training, achieving competitive results with fewer GPU hours.*
03:20:35 🎨 *The key insight from "Virion" lies in breaking down the image-generation task into different stages of difficulty, addressing compression and decompression efficiently at the various hierarchical levels.*
03:26:23 🧠 *The discussion covers aspects of competition and innovation, including the allocation of compute between low- and high-resolution parts and efficient training methods.*
03:27:35 📸 *Training without using billions of images is possible by creating a dataset from Creative Commons images, as shown by Mosaic ML, achieving competitive results with a smaller dataset.*
03:33:48 💡 *Pixart-Alpha introduces an efficient text-to-image Transformer, leveraging a pre-trained ImageNet model, cross-attention for text, and synthetic data for fast training with competitive results.*
04:01:24 🔄 *The generative-model pipeline involves steps like base training, fine-tuning with alignment/preferences, and use-case considerations like inference speed and deployment.*
04:01:51 🚀 *Use-case considerations include speeding up sampling, deployment strategies, and exploring different sampling techniques for self-improvement and enhanced usability.*
04:02:16 📑 *Recap of papers covered, highlighting their focus areas in the generative-model pipeline, including pre-training, fine-tuning, efficiency improvements, and data utilization.*
04:03:14 🌐 *Papers like I-JEPA focus on pre-training image models to learn useful representations, exploring generative approaches and joint-embedding predictions.*
04:03:39 🧠 *Lucid Dreamer explores using an existing base model for a different purpose, demonstrating the versatility of pretrained models in varied applications.*
04:04:07 ⚙️ *Matryoshka, Pixart-Alpha, and others emphasize efficiency improvements, both in data utilization and in training processes, contributing to the evolution of generative models.*
Made with HARPA AI
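The DPO loss mentioned around 01:41:07 fits in a few lines, so here is a minimal sketch I put together (not from the video; function and variable names are my own). It takes the summed log-probabilities of the chosen and rejected completions under the policy and under the frozen reference model:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w / logp_l: log-prob of the chosen (w) / rejected (l) completion
    under the policy being trained; ref_logp_* the same under the frozen
    reference model. beta scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion than the reference model does.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)): shrinks as the policy ranks the pair correctly.
    return math.log(1.0 + math.exp(-margin))
```

The loss falls when the policy raises the likelihood of the preferred completion relative to the reference, which is the "updating on preference-ranked pairs" behaviour the summary describes.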
Can splats emit light (instead of just reflecting)? If not, how difficult would that be to implement? I'd like to try modelling aurora, which would correspond to fully transparent splats emitting light.
Hi guys, some of you will probably run into an issue while trying to run the notebook. If the debugger complains about "from torch._six import string_classes" (or something related), the rest of the code will fail. To fix it: first let the cells git-clone the VQGAN source code, then comment out the lines that do the cloning so they don't re-run. Open the file that has the issue (you can see its path in the traceback) and, instead of the "from torch._six import string_classes" line, write string_classes = str. PyTorch used to ship a compatibility alias there; newer versions just use the Python built-in. Hope it helps :)
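To spell out the patch above (a minimal sketch; the exact file and line depend on which clone of the VQGAN / taming-transformers code fails for you):

```python
# In the cloned source file that raises the error, replace:
#     from torch._six import string_classes   # removed in newer PyTorch
# with this one-line shim; newer PyTorch just uses the built-in str:
string_classes = str

# Downstream code that used the old alias keeps working, e.g.:
assert isinstance("some_key", string_classes)
```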
Great! Thanks for your videos, @datasciencecastnet! I have a question: do you know if there is something like PickScore for image-editing models, specifically InstructPix2Pix? I would like to see what users prefer in terms of text CFG, image CFG, and so on.