Deep learning is transforming the world. We are making deep learning easier to use and getting more people from all backgrounds involved through our free courses for coders, software library, cutting-edge research, and community.
The world needs everyone involved with AI, no matter how unlikely your background. We want to make deep learning as accessible as possible.
I just started doing this lesson. The easiest way to decipher Greek symbols and math equations nowadays is just to cut and paste the cryptic thing into ChatGPT. IMO what ChatGPT spits out is pretty good...
Thanks again, everybody, for putting these out. I listen through these and learn a dozen nuggets about things I never spent time thinking about. It's great to hear the running commentary about design considerations and how to think about implementations. Great stuff!
Absolutely succinct definition of a scripting language. I always wanted to know what it means to write a scripting language. _A scripting language is a single file._ These talks are most valuable. Btw, what is he wearing? It looks like a bathrobe, but I am sure it's something else.
20:50 Very interesting conversation. For me at least, working with Docker has been a more stable and straightforward (albeit slower) way of running a good, isolated Python environment. After I polluted a Fedora workstation's environment I just switched everything to containers. Really recommend it.
(around 17:40) Is taking the ratio of the two `error_rate`s standard practice? I find the "30% improvement" statistic a little misleading. The original error rate is 7.2% and the new error rate is 5.6% (rounded from 5.548%, but this is a detail). In other words, the accuracy goes from 92.8% to 94.4%. This can be seen as significant or not depending on which scale you adopt: a linear or a logarithmic one.
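For what it's worth, here are the two readings side by side, computed only from the figures quoted in this comment (not from the video):

```python
err_old, err_new = 0.072, 0.05548

# "Ratio of the two error_rates" view: relative reduction in error
relative_error_reduction = (err_old - err_new) / err_old

# "Accuracy goes from 92.8% to ~94.4%" view: absolute gain in accuracy
accuracy_gain = (1 - err_new) - (1 - err_old)

print(f"{relative_error_reduction:.1%}")  # ~22.9% relative reduction, with these figures
print(f"{accuracy_gain:.2%}")             # ~1.65 percentage points of accuracy
```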
*Summary*

*Key takeaways:*
* *Fastlite* is a new Python database library built on top of Simon Willison's `sqlite-utils`. (0:08)
* It focuses on *ergonomics and ease of use* within interactive environments like Jupyter notebooks and IPython. (12:59)
* *Key features:*
  * *Dynamic tab completion:* Access tables, views, and columns with tab completion. (11:12)
  * *Simplified syntax:* Use intuitive dot notation (e.g., `db.t.artists`) for accessing database elements. (13:59)
  * *Built-in visualization:* Generate database schema diagrams directly within the notebook using Graphviz. (22:45)
  * *Leverages Python's dynamic nature:* Utilizes magic methods like `__repr__` and `__getattr__` for a more Pythonic experience. (30:30)
* *Motivation:*
  * Simplify web application development by making database interactions more intuitive. (0:48)
  * Address the complexity and verbosity of existing ORMs like SQLAlchemy and SQLModel. (1:16)
  * Provide a more dynamic and interactive alternative to statically typed approaches. (9:18)
* *Implementation:*
  * Leverages `sqlite-utils` for core database functionality. (5:02)
  * Uses Python's magic methods to enable tab completion and custom object representations. (33:35)
  * Employs Graphviz for generating database schema diagrams. (47:32)
* *Benefits:*
  * Easier to learn and use, especially for those familiar with SQL.
  * More interactive and exploratory workflow.
  * Less boilerplate code compared to traditional ORMs.

*Overall:* Fastlite aims to make working with databases in Python more intuitive and enjoyable, particularly within interactive environments.

I used Gemini 1.5 Pro to summarize the transcript.
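Not from the video itself, but a rough sketch of the dot-notation workflow described above, assuming fastlite's `database()` constructor and the `db.t` / `.c` accessors behave as shown in the summary (the database file and table name are illustrative):

```python
from fastlite import database  # fastlite builds on sqlite-utils, per the video

db = database("chinook.sqlite")   # illustrative database file

# Tables hang off `db.t`, so tab completion works in a notebook/IPython
artists = db.t.artists            # dot-notation access described in the summary
print(artists)                    # custom __repr__ shows the table

# Columns are reachable the same way, for completion and display
print(artists.c)
```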
Can someone tell me if I need to buy an Nvidia GPU to run ML tasks? I wanted to get an AMD card, but since they don't support CUDA (I may be wrong), I am a little apprehensive about going with it.
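Not a recommendation on which card to buy, but a quick way to check what your PyTorch build can actually use; as far as I know, ROCm (AMD) builds of PyTorch also report through `torch.cuda.is_available()`:

```python
import torch

print(torch.cuda.is_available())             # True if a usable GPU backend is present
print(torch.version.cuda)                    # CUDA version for NVIDIA builds, else None
print(getattr(torch.version, "hip", None))   # HIP/ROCm version for AMD builds, if any
```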
Only fifteen minutes in and I'm already missing fastai classes so much! There's a ton I learn from you, Jeremy, whenever you do one of these streams. And the Hacker's Guide to LLMs video is a true gem! 🎉
As long as educators like Jeremy are around, no closed-source company can have a lock on knowledge. Thanks for doing what you do, so consistently. One question though: even with so much chaos in the education field, what motivates you to keep doing it consistently? Doing great work is okay; doing great work consistently is really hard in this distraction-prone world. Anyway, as always, thank you and your team for your contribution.
*Summary*

*Why Claudette exists:*
* *[0:00]* Jeremy feels Claude is underrated and wants to promote its use.
* *[4:03]* He aims to provide a simpler, more transparent alternative to large, complex LLM frameworks.
* *[6:18]* Claudette bridges the gap between writing everything from scratch and using bulky frameworks, appealing to both beginners and experienced Python programmers.

*What Claudette does:*
* *Simplifies the Anthropic SDK:*
  * *[8:27]* Offers convenient functions to access Claude's models (Opus, Sonnet, Haiku).
  * *[15:57]* Handles message formatting and content extraction, making interactions cleaner.
  * *[19:15]* Tracks token usage across sessions for cost monitoring.
  * *[24:48]* Provides easy ways to use `prefill` (forcing the start of a response) and streaming.
* *Implements chat history:*
  * *[1:01:55]* The `Chat` class maintains conversation history, mimicking stateful behavior.
  * *[1:04:34]* Integrates seamlessly with prefill, streaming, and tool use within a chat session.
* *Facilitates tool use (ReAct pattern):*
  * *[36:57]* Uses Python functions as tools, automatically generating the required JSON schema from docstrings and type hints.
  * *[56:21]* Handles tool execution based on Claude's requests, including passing inputs and retrieving results.
  * *[1:28:45]* Provides a `tool_loop` function to automate multi-step tool interactions.
* *Supports images:*
  * *[1:09:01]* Includes functions to easily incorporate images into messages and prompts.
* *Example use cases:*
  * *[1:19:12]* Demonstrates building a simple customer service agent with tool use.
  * *[1:31:20]* Showcases building a code interpreter similar to ChatGPT's, combining Python execution with other tools.

*Key features:*
* *[7:36]* *Literate programming:* The source code is designed to be read like a tutorial, explaining itself as you go.
* *[5:09]* *Minimal abstractions:* Leverages existing Python knowledge and avoids introducing unnecessary complexity.
* *[1:24:51]* *Transparency:* You can easily inspect requests, responses, history, and even debug the HTTP communication.

*Future plans:*
* *[1:16:19]* Create similar libraries ("friends") for other LLMs like GPT and Gemini.
* *[1:16:19]* Maintain focus on simplicity and ease of use.

*Overall:* Claudette is a user-friendly library that simplifies working with Claude while providing powerful features for building LLM applications. Its literate programming style and minimal abstractions make it easy to learn, use, and extend.

I used Gemini 1.5 Pro to summarize the transcript.
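A minimal sketch of the `Chat` plus tool-use flow described above, assuming claudette exposes `Chat` and a `models` list as shown in the video; the example tool and the model index are illustrative:

```python
from claudette import Chat, models  # claudette wraps the Anthropic SDK, per the video

def sums(a: int, b: int) -> int:
    "Add two numbers together."      # docstring + type hints become the tool schema
    return a + b

model = models[-1]                   # some model from the list; index is illustrative
chat = Chat(model, sp="You are a concise assistant.", tools=[sums])

r = chat("What is 604542 + 6458932? Use the tool.")
print(r)                             # Chat keeps the conversation history between calls
```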
Also, are we treating the latents like weights here? We are subtracting gradients from the latents, which is what we typically do to the weights in a conventional NN.
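For context, a toy sketch of the kind of step the question refers to: the gradient is taken with respect to the latents and subtracted from them, gradient-descent style. Everything here (shapes, the loss, the scale) is a placeholder, not the lesson's actual code:

```python
import torch

# Stand-in for the current noisy latents; in the lesson this comes from the sampler
latents = torch.randn(1, 4, 64, 64, requires_grad=True)
guidance_scale = 0.1   # illustrative value

# Placeholder "guidance loss"; in practice it would be computed on the decoded/denoised
# image (e.g. a colour preference or a classifier score)
loss = latents.mean()

# Gradient w.r.t. the *latents*, not the model weights...
grad = torch.autograd.grad(loss, latents)[0]

# ...and the update nudges the latents themselves, like a gradient-descent step
latents = (latents - guidance_scale * grad).detach().requires_grad_(True)
```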
How do we decide the scaling factor in the VAE part, i.e. 0.18215? Any hint on how to choose it? I did try changing it and could see different outputs, but what's a good way to choose it?
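One commonly cited recipe, which is my assumption rather than something stated in the video, is to pick the factor so that the encoded latents have roughly unit standard deviation over a sample of training images. A rough sketch with the diffusers VAE (the model id and the placeholder batch are illustrative):

```python
import torch
from diffusers import AutoencoderKL  # assuming the diffusers VAE used in the lesson

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

# A representative batch of training images scaled to [-1, 1]; placeholder data here
imgs = torch.rand(16, 3, 256, 256) * 2 - 1

with torch.no_grad():
    latents = vae.encode(imgs).latent_dist.sample()

# Choose scale so that (scale * latents) has ~unit std; 0.18215 is often described
# as roughly 1 / std of the SD v1 VAE's latents over its training data
scale = 1.0 / latents.std()
print(scale)
```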
Q: why not just make bs=len(valid_ds), i.e. make the batch size for the validation set equal to its length? I can't see the point of splitting the validation set into batches, since we're just computing some metric on it.
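One practical answer (my note, not from the lesson): the whole validation set often won't fit in GPU memory at once, so the metric is accumulated batch by batch. A minimal sketch, assuming a PyTorch `model` and a `valid_dl` yielding (inputs, targets) pairs:

```python
import torch

@torch.no_grad()
def accuracy(model, valid_dl, device="cuda"):
    """Accumulate a metric over batches so only one batch lives on the GPU at a time."""
    correct, total = 0, 0
    for xb, yb in valid_dl:
        xb, yb = xb.to(device), yb.to(device)
        preds = model(xb).argmax(dim=1)
        correct += (preds == yb).sum().item()
        total += yb.numel()
    return correct / total
```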
A hacker's guide to using language models, including open-source and OpenAI models, with a focus on a code-first approach. Covers language model pre-training, fine-tuning, and reinforcement learning from human feedback. Demonstrates creating a custom code interpreter and fine-tuning a model for SQL generation.

Key moments:

00:01 Language models are essential in predicting the next word or filling in missing words in a sentence. They use tokens, which can be whole words or subword units, and are trained through pre-training on large datasets like Wikipedia.
- Language models predict the next word or fill in missing words. They use tokens that can be whole words or subword units, enhancing their predictive capabilities.
- Training language models involves pre-training on extensive datasets like Wikipedia. This process helps the model learn language patterns and improve its predictive accuracy.

08:04 Neural networks, specifically deep neural networks, are trained to predict the next word in a sentence by learning about the world and building abstractions. This process involves compression and fine-tuning through language model fine-tuning and classifier fine-tuning.
- The importance of neural networks learning about objects, time, movies, directors, and people to predict words effectively. This knowledge is crucial for language models to perform well in various tasks.
- The concept of compression in neural networks and the relationship between compression and intelligence. Fine-tuning through language model fine-tuning and classifier fine-tuning enhances the model's capabilities.
- Different approaches like instruction tuning and reinforcement learning from human feedback are used in classifier fine-tuning to improve the model's performance in answering questions and solving problems.

16:07 To effectively use language models, starting with being a proficient user is crucial. GPT-4 is currently recommended for language modeling, offering capabilities beyond common misconceptions about its limitations.
- GPT-4's ability to address reasoning challenges and common misconceptions about its limitations. It can effectively solve problems when primed with custom instructions.
- The training process of GPT-4 and the importance of understanding its initial purpose to provide accurate answers. Custom instructions can guide GPT-4 to offer high-quality information.
- The impact of custom instructions on GPT-4's problem-solving capabilities and the ability to generate accurate responses by priming it with specific guidelines.

24:12 Language models like GPT-4 can provide concise answers but may struggle with self-awareness and complex logic puzzles, leading to hallucinations and errors. Encouraging multiple attempts and using advanced data analysis can improve accuracy.
- Challenges with self-awareness and complex logic puzzles can lead to errors and hallucinations in language models like GPT-4, affecting the accuracy of responses.
- Encouraging multiple attempts and utilizing advanced data analysis can enhance the accuracy of language models like GPT-4 in providing solutions to complex problems.
- Utilizing advanced data analysis allows for requesting code generation and testing, improving efficiency and accuracy in tasks like document formatting.

32:20 Language models like GPT-4 excel at tasks that require familiarity with patterns and data processing, such as extracting text from images, creating tables, and providing quick responses based on predefined instructions.
- The efficiency of language models in tasks like extracting text from images and creating tables due to their ability to recognize patterns and process data quickly.
- Comparison between GPT-4 and GPT-3.5 in terms of cost-effectiveness for using the OpenAI API, showcasing the affordability and versatility of GPT models for various tasks.
- The practical applications of using the OpenAI API programmatically for data analysis, repetitive tasks, and creative programming, offering a different approach to problem-solving.

40:26 Understanding the cost and usage of OpenAI's GPT models is crucial. Monitoring usage, managing rate limits, and creating custom functions can enhance the experience and efficiency of using the API.
- Monitoring usage and cost efficiency of OpenAI's GPT models is essential to avoid overspending. Testing with lower-cost options before opting for expensive ones can help in decision-making.
- Managing rate limits is important when using OpenAI's API. Keeping track of usage, especially during initial stages, and implementing functions to handle rate limit errors can prevent disruptions in service.
- Creating custom functions and tools can enhance the functionality of OpenAI's API. Leveraging function calling and passing keyword arguments can enable the development of personalized code interpreters and utilities.

48:30 Creating a code interpreter using GPT-4 allows for executing code and returning results. By building functions, one can enhance the model's capabilities beyond standard usage.
- Exploring the concept of docstrings as the key for programming GPT-4, highlighting the importance of accurate function descriptions for proper execution.
- Utilizing custom functions to prompt GPT-4 for specific computations, showcasing the model's ability to determine when to use provided functions.
- Enhancing GPT-4's functionality by creating a Python function for executing code and returning results, ensuring security by verifying code before execution.

56:34 Using Fast AI allows accessing others' computers for cheaper and better availability. The RTX 3090 is recommended for language models due to memory speed over processor speed.
- Options include an RTX 3090 for $700 or an A6000 for $5000, with memory size considerations. Using a Mac with M2 Ultra can be an alternative.
- Utilizing the Transformers library from Hugging Face for pre-trained models. Challenges with model evaluation metrics and potential data leakage in training sets.
- Selecting models based on Meta's Llama 2 for language models. Importance of fine-tuning pre-trained models for optimal performance and memory considerations.

1:04:38 Jeremy Howard, an Australian AI researcher and entrepreneur, discusses optimizing language models for speed and efficiency by using different precision data formats, such as bfloat16 and GPTQ, resulting in significant time reductions.
- Exploring the use of bfloat16 and GPTQ for optimizing language models, leading to faster processing speeds and reduced memory usage.
- Utilizing instruction-tuned models like Stable Beluga and understanding the importance of prompt formats during the instruction-tuning process.
- Implementing retrieval-augmented generation to enhance language model responses by searching for relevant documents like Wikipedia and incorporating the retrieved information.

1:12:42 The video discusses using open-source models with context lengths of 2000-4000 tokens to answer questions by providing context from web pages. It demonstrates using a sentence transformer model to determine the most relevant document for answering a question.
- Utilizing sentence transformer models to identify the most suitable document for answering questions based on similarity calculations.
- Exploring the process of encoding documents and questions to generate embeddings for comparison and selecting the most relevant document.
- Discussing the use of vector databases for efficient document encoding and retrieval in large-scale information processing tasks.

1:20:46 Fine-tuning models allows for customizing behavior based on available documents, demonstrated by creating a tool to generate SQL queries from English questions, showcasing the power of personalized model training in just a few hours.
- Utilizing the Hugging Face Datasets library for fine-tuning models, enabling quick customization based on specific datasets for specialized tasks.
- Exploring the use of Axolotl, an open-source software, to fine-tune models efficiently, showcasing the ease of implementation and ready-to-use functionalities for model training.
- Discussing alternative options for model training on Mac systems, highlighting the MLC and llama.cpp projects that offer flexibility in running language models on various platforms.

1:28:50 Exploring language models like Llama can be exciting yet challenging for Python programmers due to rapid development, early stages, and installation complexities.
- Benefits of using an Nvidia graphics card and being a capable Python programmer for utilizing the PyTorch and Hugging Face ecosystem in language model development.
- The evolving nature of language models like Llama, the abundance of possibilities they offer, and the importance of community support through Discord channels.
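The retrieval step described around 1:12:42 can be sketched with the `sentence-transformers` library; the model checkpoint, documents, and question below are illustrative stand-ins, not the ones from the video:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint; the video uses a sentence transformer model, exact one may differ
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Jeremy Howard is an Australian data scientist and entrepreneur.",
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
]
question = "Who is Jeremy Howard?"

# Encode documents and the question into embeddings
doc_emb = model.encode(docs, convert_to_tensor=True)
q_emb = model.encode(question, convert_to_tensor=True)

# Cosine similarity between the question and each document; pick the most similar one
scores = util.cos_sim(q_emb, doc_emb)[0]
best = scores.argmax().item()
print(docs[best], scores[best].item())
```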