PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
We aim to be an accessible, community-driven conference, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
I might actually do it soon, since there'd be at least two users I know of (more than zero), but it will probably be very difficult to grow a community around it.
But if I make a diagram, how do I know it's the correct diagram without having run experiments? You have your diagram first, then do your modeling and analysis! I suppose you have to start with some hypothesis, though.
Sounds very good - and exactly what I needed at this point in developing a solution for document management. Thank you for the software and for the lecture! I will try it straight away.
*cudf.pandas: Accelerating Pandas with GPUs for Faster Data Processing*

* *0:00* Introduction: Ashwin Srinath, Senior Software Engineer at NVIDIA, introduces cudf.pandas, a tool that lets you run pandas code on GPUs without code changes, achieving significant speedups.
* *0:10* Motivation: Pandas is popular but can be slow because it is single-threaded and is not built as a query engine. Alternatives like cuDF exist, but they often require code changes and have different APIs.
* *2:08* cuDF Overview: cuDF, built on CUDA and C++, is a GPU-based DataFrame library offering a pandas-like API and substantial performance gains (10-100x faster than pandas). It currently supports 60-75% of the pandas API.
* *3:25* Reasons to Stick with Pandas: Despite the alternatives, pandas remains valuable for its flexibility, ease of collaboration, vast ecosystem of dependent libraries, and ongoing performance improvements.
* *5:15* cudf.pandas Approach: cudf.pandas aims to combine the benefits of pandas with GPU acceleration, letting users keep the familiar pandas API while leveraging the speed of GPUs.
* *5:34* How It Works: cudf.pandas acts as a proxy for pandas, intercepting pandas calls and attempting to execute them on the GPU via cuDF. If an operation isn't supported on the GPU, it falls back seamlessly to CPU execution using pandas.
* *7:54* Demo (Part 1 - Basic Operations): The demo shows how to load the cudf.pandas extension in a Jupyter notebook. Several examples demonstrate performance gains for groupby, string, and merge operations, but also highlight cases where GPU acceleration doesn't provide a speedup (e.g., `count` on axis=1).
* *10:53* Proxy Pattern Explained: cudf.pandas uses a proxy pattern in which proxy functions and types intercept pandas calls, attempting GPU execution first and falling back to the CPU if necessary.
* *11:18* Demo (Part 2 - Performance Optimization): The demo focuses on optimizing time-series operations. The cudf.pandas profiler reveals that `index.between_time` is a CPU bottleneck; rewriting the code to use GPU-supported datetime properties significantly reduces the execution time.
* *15:11* Optimization Benefits: Code optimized for GPU execution often also runs faster on the CPU, showing that writing GPU-friendly code can pay off even without a GPU.
* *15:49* Demo (Part 3 - Third-Party Library Acceleration): The demo shows how cudf.pandas can accelerate third-party libraries that rely on pandas. Using LangChain as an example, an LLM-powered agent that uses pandas for data analysis benefits from GPU acceleration, significantly reducing query execution time.
* *19:00* Recap: cudf.pandas offers GPU acceleration for pandas with no code changes; optimizing code for GPU execution is crucial for maximum performance; third-party libraries can leverage GPUs through cudf.pandas.
* *19:28* How It Works (Technical Details): cudf.pandas relies on the proxy pattern and customizes the Python import mechanism to deliver proxy modules, ensuring seamless integration with existing pandas code.
* *20:50* Comparison with Other Approaches: The talk briefly discusses the limitations of duck typing and the potential of the DataFrame Standard API for interoperability between DataFrame libraries.
* *22:42* FAQs: The presentation concludes with FAQs covering performance expectations, pandas API support, compatibility with third-party libraries, and handling data larger than GPU memory.
* *24:03* Getting Started: Instructions for installation and access to the demo materials are provided.
* *24:34* Q&A: A brief Q&A session addresses audience questions about CPU performance gains, multi-node scaling, Docker images, UDF support, and the availability of the GitHub repository.

I used gemini-1.5-pro-exp-0801 on rocketrecap dot com to summarize the transcript.
Cost (if I didn't use the free tier): $0.09 Input tokens: 23132 Output tokens: 856
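The proxy-with-fallback mechanism the summary describes at 5:34 and 10:53 can be sketched in a few lines of plain Python. This is purely illustrative, not cudf.pandas internals; every name below is invented for the example:

```python
# Minimal sketch of the "try the GPU, fall back to the CPU" proxy pattern
# described in the talk. Not cudf.pandas internals; all names are invented.

def make_proxy(gpu_func, cpu_func):
    """Return a callable that tries the GPU path first and silently
    falls back to the CPU path when the operation is unsupported."""
    def proxy(*args, **kwargs):
        try:
            return gpu_func(*args, **kwargs)
        except NotImplementedError:
            return cpu_func(*args, **kwargs)  # seamless CPU fallback
    return proxy

# Toy stand-ins: pretend the "GPU" path only supports non-negative input.
def gpu_double(x):
    if x < 0:
        raise NotImplementedError("not supported on the GPU")
    return x * 2

def cpu_double(x):
    return x * 2

double = make_proxy(gpu_double, cpu_double)
```

The caller only ever sees `double`; whether a given call ran on the "GPU" or "CPU" path is invisible, which is the property that lets cudf.pandas accelerate existing code without changes.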
Hi Dimitry, I am trying to figure out how to get a "minimum heart rate" from a bunch of samples of "sedentary heart rate", say about 10 per hour. The minimum heart rate would express the true minimum underlying the (noisy) samples (as opposed to just the actual sample minimum). I thought about using extreme value analysis, but after this explanation, that doesn't seem correct. What would you suggest?
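One possible direction for the question above, offered only as a sketch: estimate the underlying minimum with a low quantile of the pooled samples rather than the raw minimum, since a single noisy reading can drag the raw minimum down. The 5th-percentile choice and the sample values are assumptions for illustration, not advice from the video:

```python
import numpy as np

def robust_min(samples, q=5):
    """Estimate the underlying minimum as the q-th percentile of the
    samples, which is less sensitive to one-off noisy readings than min()."""
    return float(np.percentile(samples, q))

# Hypothetical hour of sedentary heart-rate readings (about 10 per hour).
hour_of_readings = [62, 65, 58, 61, 90, 59, 60, 63, 57, 64]
estimate = robust_min(hour_of_readings)
```

Compared with `min(hour_of_readings)`, the quantile estimate moves only slightly when one reading is an outlier, which is the behavior the question seems to be after.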
Thank you Vincent for sharing the link to this video of yours mentioning the contextual helper in the JupyterLab notebook. Plus, your demo showing that reflection is a good idea was an extra goodie on top.