Python and Pandas with Reuven Lerner

I'm Reuven Lerner, and I teach Python and data science to companies around the world, and via my online store at LernerPython.com. I also publish two weekly newsletters - "Better Developers" (BetterDevelopersWeekly.com) and "Bamboo Weekly" (BambooWeekly.com) and have published two books ("Python Workout" and "Pandas Workout").

On this YouTube channel, I publish videos that will help you with Python and Pandas. Many of the ideas come from participants in my courses, who ask me great questions just about every day.

Is there something you want to know? Just contact me! My students generally give me the best ideas and directions for what to teach.

Search-and-replace Pandas values with "where" and "mask"

8:21

1 month ago

Understanding and using idxmin/idxmax in Pandas

8:37

1 month ago

My new book, Pandas Workout, will make you more confident and fluent with Pandas

2:20

1 month ago

What are the "nlargest" and "nsmallest" methods in Pandas? (And should you use them?)

8:06

1 month ago

Advanced unpacking in Python "for" loops

6:55

4 months ago

Sorting a mix of letters and numbers in Python

9:49

4 months ago

The new case_when method in Pandas 2.2.0

10:07

5 months ago

Method chaining in Pandas

18:17

5 months ago

Five mistakes companies make teaching Python to their staff

8:45

7 months ago

Comparing values in Pandas with "diff" and "pct_change"

6:46

7 months ago

Selecting rows in Pandas using .loc and lambda

9:04

7 months ago

Understanding "with" and Python's context managers

14:00

7 months ago

Improve your career with Python + data: Announcing PythonDAB cohort 4

4:19

7 months ago

*args and **kwargs - what are they, and how are they different?

10:42

10 months ago

Boolean indexing in Pandas made simple

8:23

10 months ago

Using | in Pandas? Consider the "isin" method instead

6:01

10 months ago

Flipping Data with Pandas: Stack & Unstack

8:17

11 months ago

The six most important read_csv arguments in Pandas

16:50

11 months ago

Data selection in Pandas with "filter"

8:24

11 months ago

Why you'll love Jupyter Notebook 7

9:13

1 year ago

NaN vs. NA - understanding Pandas nullable types

10:16

1 year ago

One Python function, three different errors

5:05

1 year ago

Bang! Unix shell magic inside of Jupyter

11:27

1 year ago

Magic commands in Jupyter and IPython

7:33

1 year ago

ChatGPT wrote Pandas code to analyze US debt. Was it any good?

35:24

1 year ago

ChatGPT + Noteable (Jupyter) = Mind-blowing!

20:20

1 year ago

Finally! Pandas exercises that aren't boring: Bamboo Weekly

2:07

1 year ago

Stop using inplace=True in Pandas!

6:32

1 year ago

Finding text patterns in Pandas with regular expressions

7:04

1 year ago

Comments

@iaroslavd.916 3 days ago

Cool explanation. Very clear!

@ReuvenLerner 3 days ago

I'm delighted that it helped!

@kisho2679 5 days ago

How can documents be nested/included in JupyterLab, being updated when changed?

@ReuvenLerner 4 days ago

JupyterLab can handle folders, including sub-folders. So you can put documents, including notebooks, inside of those folders.

@kisho2679 4 days ago

@ReuvenLerner Yes, well, I mean "include" (= call/embed/encapsulate) an external/underlying file (e.g. .md, .tex, etc.) in a cell of a new document, which will be automatically updated when the content of the underlying file changes ...

@ReuvenLerner 4 days ago

@kisho2679 Oh, I don't think that's possible. (Maybe I'm wrong, though!) Instead, you'll probably want/need to write a bunch of code in a Python module and then import that module into your notebook.

@carcorr 7 days ago

Amazing presentation skills. I was waiting to see an explanation of why strings in Python are immutable, but I guess that was out of scope. Thanks!

@ReuvenLerner 7 days ago

Thanks so much! And yeah, there are a number of reasons why Python's strings are immutable. One is that immutable data is more efficient and less error prone. A second is that only immutable (well, hashable, but that's almost the same thing) values can be dict keys. So for us to use strings as dict keys, they basically have to be immutable.
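The link between immutability, hashability, and dict keys can be seen in a few lines. This is only a minimal sketch; the variable names and values are made up for illustration.

```python
# Strings are hashable, so they work as dict keys.
prices = {"apple": 3, "pear": 5}
key_lookup = prices["apple"]

# Lists are mutable and therefore unhashable; using one as a key fails.
try:
    bad = {["apple", "pear"]: 8}
    list_key_error = None
except TypeError as e:
    list_key_error = str(e)

# Strings cannot be changed in place; item assignment raises TypeError.
s = "hello"
try:
    s[0] = "J"
    assign_error = None
except TypeError as e:
    assign_error = str(e)

# "Modifying" a string really builds a new string object.
t = "J" + s[1:]
print(key_lookup, list_key_error, assign_error, t)
```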

@CHeRKeSSS00 9 days ago

OMG! OMG! OMG! What content! I am speechless! After buying so many online courses I found you for free? OMG! OMG! OMG!

@ReuvenLerner 8 days ago

I'm so delighted you enjoyed it!

@CHeRKeSSS00 8 days ago

@ReuvenLerner I was wondering if you teach on any other platforms? Anywhere to look at your courses? Thanks

@ReuvenLerner 8 days ago

@CHeRKeSSS00 Yup, check out LernerPython.com -- lots of courses there!

@h4ck314 9 days ago

Instructive video, thanks

@ReuvenLerner 7 days ago

I'm glad you enjoyed it!

@pietraderdetective8953 9 days ago

Great content! Do you know if NumPy 2.0 brings a performance speedup? If it does, I'm hoping it will speed up Pandas as well! I've been holding off migrating to Polars due to the 30-50k lines of Pandas code I've got. I would rather not have to refactor it, and would stay with Pandas if there's a major speedup with the new NumPy 2.0.

@ReuvenLerner 9 days ago

My impression is that it doesn't really speed things up, but rather cleans up the API in a lot of ways. That said, I wouldn't be surprised if they managed to find *some* ways to speed things up a bit more. The real Pandas speedup will come, I think, via PyArrow. It already reduces memory usage quite a lot, and for many things it's super speedy. The other things (e.g., joins) aren't quite there yet, but I have to assume that they will.

@abc_cba 9 days ago

I also tried using Polars yesterday, but maybe because I have used Pandas for so long, the syntax was not very convenient for me. I would love to see you make tutorials on Pandas alternatives, namely Dask, Ray, Modin, Vaex, RAPIDS, Ponder, Fugue, Daft, and DuckDB. (Thank you, in case you read my long comment.)

@ReuvenLerner 9 days ago

I've got a lot to do just on the Pandas front, but I do hope to eventually cover some of these other technologies, too. I tend to cover them a bit more at Bamboo Weekly (BambooWeekly.com/).

@abc_cba 9 days ago

@ReuvenLerner Alright

@abc_cba 9 days ago

Thank you for this video. I just wanted to know if we can run both by summoning one version when the other isn't needed, or maybe have both of them and invoke a specific version? My English is weak, as I originate from a rural area of India and English is my 9th spoken language in day-to-day communication. - Samuel

@ReuvenLerner 9 days ago

I wish that my 9th language were as good as your English! Python doesn't let you have more than one version in use at a time. However, you can have virtual environments (venvs) with different versions of the same package. In that way, one project can use NumPy 1.x, and another project (in another directory) can use NumPy 2.0.

@abc_cba 9 days ago

@ReuvenLerner Thank you for the appreciation. I am very fascinated that I came across your channel, and I added your playlist to my watch-later videos; I would love to watch and learn more from them. Thank you for contributing to the developer world in this age of chat prompts. It is more understandable. By the way, did you take the developer survey from Stack Overflow, which they silently pushed and didn't make a lot of noise about this year? I found the questionnaire was based only on A.I., NOTHING ELSE, which kind of signals what many in the dev space say: that this is the end of Stack Overflow since the inception of LLMs. What's your take on that? Also, I assume it must be very late in the US, and for you to reply to me this quickly, I want to thank you again for that. My name is Samuel, nice to know you, Mr. ___?!

@rejoicechidinma9491 11 days ago

Great 👍

@ReuvenLerner 10 days ago

Glad you enjoyed!

@basil9633 12 days ago

Great video! Videos like this are very informative and help young developers like me.

@ReuvenLerner 12 days ago

I'm so glad to hear it helped!

@motivational_gamer 13 days ago

Thank you so much

@ReuvenLerner 12 days ago

My pleasure! Thanks for letting me know.

@atifdai313 13 days ago

I am using yearly data. Suppose my data has 33 rows and 20 columns, with the 20 columns including the year (1999 to 2022), in my summary statistics analysis. How can I exclude the year column from my whole analysis? Or should I delete the year column? Please guide us further regarding any data shape command.

@ReuvenLerner 12 days ago

You can remove one or more columns with df.drop. If you want to remove all rows in a particular range, then you will likely want to use a boolean index to indicate what you do or don't want, and then apply it to the data frame. There isn't room here to explain that, but look for my video about "boolean indexing made simple" that explains it more.
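Both suggestions, df.drop for columns and a boolean index for rows, can be sketched roughly as follows. The data frame and its column names ("year", "gdp", "inflation") are hypothetical, invented only for this illustration.

```python
import pandas as pd

# Hypothetical yearly data; the column names are made up for illustration.
df = pd.DataFrame({
    "year": [1999, 2000, 2001, 2002],
    "gdp": [1.2, 1.4, 1.1, 1.6],
    "inflation": [2.0, 2.5, 3.1, 1.8],
})

# Option 1: drop the column. This returns a new data frame; df is unchanged.
without_year = df.drop(columns="year")
summary = without_year.describe()

# Option 2: keep only the rows that match a boolean condition.
recent = df.loc[df["year"] >= 2001]
```

Because drop returns a new data frame, the original still has its year column if you need it later.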

@MikeM-uy6qp 15 days ago

This seems helpful. Unfortunately shortcuts aren't working in my notebook. I swear I'm cursed. Every damn thing I do in Python requires troubleshooting.

@ReuvenLerner 9 days ago

Oh, no! I'm sorry to hear that you're having these problems. It definitely takes time to get your environment working in a way that makes sense and is stable.

@adithyagopal1816 21 days ago

Thanks a lot, sir <3

@ReuvenLerner 21 days ago

My pleasure!

@real.samad_ 22 days ago

Hello Reuven, can I use a small portion of this video for an Instagram reel I am working on?

@ReuvenLerner 22 days ago

Maybe -- it depends on the context and what you're using it for. Feel free to e-mail me (reuven@lerner.co.il) to discuss this further.

@John-pb7gb 24 days ago

How can we read a CSV file if we have an uneven number of columns? Let's say the header row has Name, Phone number, and Address separated by commas, but some of the data in the Address column has more commas (e.g., St.Vincent road, Dallas, TX). How should I read the file?

@ReuvenLerner 9 days ago

CSV files need to have the same number of columns in each row. You can sometimes get away with null values, if there are commas next to each other, but I don't believe that you can ever have variable-length rows.
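One common way to keep the column count constant when a field itself contains commas, assuming you control how the file is written, is to wrap that field in double quotes. The sketch below uses Python's standard csv module on made-up data. It does not help with a file whose commas are unquoted.

```python
import csv
import io

# Made-up file contents: the address contains commas, so it is quoted.
raw = '''Name,Phone number,Address
Alice,555-0100,"St. Vincent road, Dallas, TX"
Bob,555-0101,Main street
'''

rows = list(csv.reader(io.StringIO(raw)))

# Every row, including the header, has exactly three fields.
field_counts = [len(row) for row in rows]
```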

@Kavrizhka 1 month ago

Thanks for your explanation! 👍

@ReuvenLerner 1 month ago

Glad it helped!

@ancientgear7192 1 month ago

2:25 Why does the interface look like that? I reinstalled Anaconda today and the notebook looks way different than it used to. And the notebooks look different too. They are counterintuitive. Unless there is some kind of glitch.

@ReuvenLerner 1 month ago

Yes, the new version of Jupyter notebook looks a bit different. They have moved around the UI and functionality. It's all still in there, plus a lot of new stuff, but they changed it because people didn't move to JupyterLab and still wanted the functionality. You can expect Jupyter to look and act this way moving forward, though.

@ancientgear7192 1 month ago

@ReuvenLerner Yes, I found out it is version 7. I downgraded it to the previous 6-something version and now I can customize it with different themes too.

@hiddentruth4310 1 month ago

Can name be a class in the real world of computer programming?

@ReuvenLerner 1 month ago

In Python, a class is an object. And like all objects, you can have a variable that refers to it. When we define a class with the "class" keyword, we are assigning a variable to refer to the class object. That variable is just like any other variable, and can have any name that you want. Traditionally, we name classes with CamelCase, but that is a convention, not a rule. However, I'm not sure if this answers your question -- I'm not sure what you meant by "the real world," if that refers to non-Python languages, or something else. I'm happy to answer any other questions you have!
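A short sketch of the idea that a class name is just a variable referring to a class object. The class and variable names here are invented for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

# "Person" is an ordinary variable that refers to a class object,
# so another variable can refer to the same object.
alias = Person
p = alias("Ada")

# CamelCase is only a convention; this lowercase class name is legal, if unusual.
class name:
    pass

n = name()
print(type(p), type(n))
```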

@dataalanlist 1 month ago

Sir, for this version, do you use Nbextensions? I need Hinterland for autocompletion, but it seems that Nbextensions is not compatible with this version. Or maybe you have another recommendation for autocompletion in this version?

@ReuvenLerner 9 days ago

I don't use very many extensions, if any - in part because I've had trouble with extensions in the past. Autocomplete just kinda works for me, when I press tab.

@kdpr007 1 month ago

Thank you for the video. Can you please suggest a good book to read on classes and objects? Thank you

@ReuvenLerner 9 days ago

I don't know about a book, but I gave two webinars/courses, "Python objects for newbies," that are part of my subscription service at LernerPython.com. And you can check out my video here, which explains some things: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ZD3kKK1_deQ.html

@mikhails2883 1 month ago

This is a great achievement, congrats! BTW, your channel is awesome. Do you have plans on writing another workout book (SQL, Python Workout 2, or whatever subject it might be 🙂)?

@ReuvenLerner 1 month ago

First: Thanks so much for your kind words! Second: I have discussed another book possibility with Manning, and SQL Workout is what we are thinking about. But to be honest, I need a bit of a break from writing books! That said, I'm hoping that when things stabilize with my subscription service (at LernerPython.com/), both in terms of more subscribers and also getting the back-end systems more robust, I'll have time to think about new book projects, as well as re-recording a bunch of my existing courses and creating new ones.

@rafaelsantana5808 1 month ago

Hello, I would like to know how I can add a column at the end that counts all the columns that have values. Example: count = ID1 + ID2 + ID3

@inderjeetchandnani302 1 month ago

At 3:59, if we have date data, will it work?

@safkaify7875 1 month ago

Nicely explained. Keep up the good work.

@ReuvenLerner 1 month ago

My pleasure; glad you enjoyed!

@mansouralshamri1387 1 month ago

Great little exercise. I was able to do it in two lines (excluding the first line).

my_list = ['a2', 'b3', 'c4', 'a5', 'b12', 'c1', 'a11', 'b10', 'c14']
seperated = [(item[0], int(item[1:])) for item in my_list]
sorted(seperated)

@ReuvenLerner 1 month ago

Excellent!
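The two-line solution above returns a list of (letter, number) tuples. If the original strings should come back instead, one option is to pass a key function to sorted. This is a sketch that assumes every item is a single letter followed by digits; the helper name letter_then_number is made up here.

```python
my_list = ['a2', 'b3', 'c4', 'a5', 'b12', 'c1', 'a11', 'b10', 'c14']

def letter_then_number(item):
    # Split 'b12' into ('b', 12) so the numeric part compares as an int.
    return item[0], int(item[1:])

result = sorted(my_list, key=letter_then_number)
print(result)
```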

@mansouralshamri1387 1 month ago

I don't understand the trapping exceptions in the __exit__ part. Why did you return True?

@ReuvenLerner 1 month ago

You return True from `__exit__` to ensure that the exception isn't re-raised.
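A toy context manager shows the effect of the return value. The class name Suppress and its choice of ZeroDivisionError are assumptions made for this sketch, not the code from the video.

```python
class Suppress:
    """Toy context manager that swallows ZeroDivisionError only."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # A true return value tells Python the exception was handled,
        # so it is not re-raised after the "with" block.
        return exc_type is not None and issubclass(exc_type, ZeroDivisionError)

reached = False
with Suppress():
    1 / 0           # raised inside the block, suppressed by __exit__
reached = True      # execution continues here

propagated = False
try:
    with Suppress():
        raise ValueError("not suppressed")
except ValueError:
    propagated = True   # __exit__ returned False, so the error came through
```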

@tyl9680 1 month ago

Very clearly explained. Thanks!

@ReuvenLerner 1 month ago

Glad to hear it helped!

@imothar 1 month ago

Another great video 👍 Just wondering whether there was any specific reason why you did not use pd.NA? Perhaps it's the same result in the end, when it comes to floats 🤷

@ReuvenLerner 1 month ago

The future of Pandas is clearly pd.NA, and I should use it more! But in this particular case, it didn't make a difference: Using either np.nan or pd.NA will turn the dtype into floats. That's because the standard int type isn't nullable, meaning that it cannot handle pd.NA as anything other than a float. If, however, you were to set the dtype to be Int64 (note the capital), then using pd.NA would indeed do what you (and I) want.

@tyl9680 1 month ago

Why do s.replace('is', 'IS', regex=True) and s.replace('is', 'IS') give different results for s = pd.Series('this is a buch of words'.split())?

@marcinpohl3264 1 month ago

How do I use np.NaN in a way that does NOT change ints to floats?

@ReuvenLerner 1 month ago

NaN is a float. So if you want to have NaN in an int column, then the ints will need to change to floats. HOWEVER, if you create your series with a nullable type, then you can use pd.NA instead of np.nan, and you'll be all set. That's because pd.NA is compatible with a wide variety of types:

In [12]: s = Series([10, 20, 30, 40, 50])
In [13]: s.loc[3] = pd.NA
In [14]: s
Out[14]:
0    10.0
1    20.0
2    30.0
3     NaN
4    50.0
dtype: float64

In [15]: s = Series([10, 20, 30, 40, 50], dtype='Int64')
In [16]: s.loc[3] = pd.NA
In [17]: s
Out[17]:
0      10
1      20
2      30
3    <NA>
4      50
dtype: Int64

@tyl9680 1 month ago

What about diff by different categories? Say I have corn, rice, beans and wheat prices in the same df, and I want to compare the price changes within the same categories.

@ReuvenLerner 1 month ago

You can totally do this! Just use "diff" on the result of a "groupby". For example:

df = DataFrame({'category': ['wheat', 'corn', 'rice', 'wheat', 'corn', 'rice', 'wheat', 'corn', 'rice'], 'price': [10, 8, 6, 11, 7, 5, 15, 9, 4]})
df.groupby('category')[['price']].diff()

You'll get a new data frame back (thanks to the double square brackets around 'price'), showing the difference for each row from the previous occurrence of that category. However, if you want to know which category is which, you'll probably want to join it back to the original data frame:

df.groupby('category')[['price']].diff().join(df, rsuffix='_df')

@retrain35yo87 1 month ago

super...

@ReuvenLerner 1 month ago

You bet!

@retrain35yo87 1 month ago

good vid...no need to "".join() after?

@retrain35yo87 1 month ago

good lesson

@ReuvenLerner 1 month ago

Glad you enjoyed it!

@SteamTrain2639 1 month ago

What is the point of the lines that say self.x = x and self.y = y?

@ReuvenLerner 1 month ago

Self is the new instance. Attributes set on self are the Python equivalent of setting instance variables. We thus take the parameter x (whose value was set by the caller) and assign it to self.x, keeping the value around. We then do the same thing with y.
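A small sketch of why those assignments matter: without them, the values passed by the caller would disappear when __init__ returns. The class name Point and the total method are made up for this illustration.

```python
class Point:
    def __init__(self, x, y):
        # x and y are local parameters; storing them on self keeps
        # them around as attributes of this particular instance.
        self.x = x
        self.y = y

    def total(self):
        # Other methods can read the stored values later.
        return self.x + self.y

p = Point(10, 20)
q = Point(1, 2)   # a separate instance with its own x and y
print(p.total(), q.total())
```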

@arwaabougharib8698 1 month ago

Thanks for the great video! I'm curious as to why I'd want to open a new launcher when every new notebook I create in the same launcher has a different kernel...

@ReuvenLerner 1 month ago

I think the launcher is just a way to start new notebooks, consoles, etc. Each of those still has its own kernel.

@PrateekTrivedi6 1 month ago

Thanks for such amazing videos. You make learning simple by explaining the functionality and the practical use cases where the value of that functionality lies. I have a doubt - @6:02 - Shouldn't level be 0 when we are unstacking on the basis of passenger count, as this is the only row label present?

@ReuvenLerner 1 month ago

Thanks for the kind words! Notice that in many (not all) of the cases in this video, I'm not actually modifying our data frame g. Rather, I'm running a method (e.g., unstack), and getting back a new data frame. This leaves g unchanged, which means that if I first run g.unstack(level=1) and then run g.unstack(level=0), I'm unstacking on the same data frame each time. It's not that after unstacking on level=1 we're left with only one level in the multi-index. Does that make sense?
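The point that unstack returns a new object and leaves g alone can be checked with a small example. The data and the column names ("passenger_count", "payment_type", "total_amount") are assumptions made for this sketch, not necessarily the data used in the video.

```python
import pandas as pd

# Hypothetical data; column names are made up for illustration.
df = pd.DataFrame({
    "passenger_count": [1, 1, 2, 2],
    "payment_type": [1, 2, 1, 2],
    "total_amount": [10.0, 12.0, 20.0, 25.0],
})

# g has a two-level multi-index: (passenger_count, payment_type).
g = df.groupby(["passenger_count", "payment_type"])["total_amount"].mean()

by_payment = g.unstack(level=1)     # payment_type becomes the columns
by_passengers = g.unstack(level=0)  # passenger_count becomes the columns

# g itself still has both index levels after either call.
levels_after = g.index.nlevels
```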

@PrateekTrivedi6 1 month ago

@ReuvenLerner Yeah, I got it now :) That was a super quick response, thanks!!

@kmh9817 1 month ago

Thank you!!!!!!!!!!!

@ReuvenLerner 1 month ago

Glad it helped!

@kmh9817 1 month ago

Kept using # with words without space. I thought something is wrong with my pc.

@daklina 1 month ago

It's awesomely helpful! I'm wondering if there's something similar in PyCharm though... Thx

@ReuvenLerner 1 month ago

Glad you enjoyed it! The idea of "cells" that can contain either code or Markdown is special to Jupyter. So if you're using PyCharm's paid (professional) edition, then you can fire up a notebook and use Markdown there. But in a regular ol' PyCharm (or Python) file, you can't.

@VelkoKamenov 2 months ago

Great content! This concept is especially useful for people coming to Python from R and the tidyverse syntax with the pipe operator (%>%) for chaining functions. About filtering rows: how do you feel about using query() instead of loc?

@ReuvenLerner 2 months ago

Thanks so much! I'm not a big fan of query, just because it introduces a totally new and different syntax -- it's sort of like embedding SQL inside of a program. That said, I admit that query can sometimes make things run faster, and can also look a bit clearer.
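For comparison, here is the same row filter written both ways. The data frame and its column names are invented for this sketch; which style reads better is a matter of taste.

```python
import pandas as pd

# Hypothetical data; names and values are made up for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "age": [25, 40, 31, 19],
    "height": [180, 165, 175, 190],
})

# .loc with a lambda: plain Python and Pandas syntax, works well in a method chain.
with_loc = df.loc[lambda d: (d["age"] > 20) & (d["height"] > 170)]

# query: the condition is a string in a separate mini-language.
with_query = df.query("age > 20 and height > 170")
```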

@proud_indian0161 2 months ago

Where can I get this athlete dataset?

@ReuvenLerner 2 months ago

It's in the data set for my book, Pandas Workout, at files.lerner.co.il/pandas-workout-data.zip

@hugoguay4993 2 months ago

Thank you so much, it greatly helped me and my study of data science.

@ReuvenLerner 2 months ago

I'm delighted to hear it!

@hull39 2 months ago

Your videos are great!!! Thanks so much for the concise, clear explanations. You Rock!!!

@ReuvenLerner 2 months ago

Thanks so much for your kind words! Hoping to do more videos very soon...

@Since-em2vy 2 months ago

Great explanation, Lerner 😃 I am currently learning Python, and your video helped me to solve this issue.

@ReuvenLerner 2 months ago

Great to hear!

@mimiq0368 2 months ago

A very simple and direct explanation, super nice! :D

@ReuvenLerner 2 months ago

Glad it helped!

@hashimnaushahi 2 months ago

I really like the way you teach Python! Do you also have a video about the builder pattern, or would you be willing to create one?

@ReuvenLerner 2 months ago

Thanks for your kind words! I have a (paid, recorded) course on design patterns, but I don't think that it includes the builder pattern. (It often depends on whether I have time to fit it in.) If not, I'll see if I can do something here on YouTube.

@hashimnaushahi 2 months ago

@ReuvenLerner Thank you! I'll be looking forward to that video. In the meantime, do you have a link to your paid course?

@ReuvenLerner 2 months ago

@hashimnaushahi Absolutely -- the design patterns course is at store.lerner.co.il/design-patterns (which isn't explicitly mentioned yet, but is a free part of my course membership at LernerPython.com). Let me know if you have any further questions!

@regal7548 2 months ago

What if the datasets don't have anything in common? For example, one is geological data, one is survey data, and one is market analysis, and each of them has a massive number of null values. Also, the unique IDs are different: for example, one table has SLT20284903 and some others have just numbers. What do we do then?

@ReuvenLerner 2 months ago

Then you shouldn't be combining them, in this way or any other way! My assumption for this video was that you have a data set broken up across a number of CSV files, each with the same column names and dtypes. You want to take these multiple files and turn them into a single data frame. Pandas provides us with the "pd.concat" function, which is good for such things, but the problem is how you read them into Pandas quickly and easily. If you have geological data, survey data, and market analysis, then *perhaps* they have some factors in common. But you don't want them in the same data frame. Rather, read each into its own data frame, and use "join" or "merge" to combine them.
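The case the video assumes, several CSV files with the same columns combined into one data frame, might look roughly like this. To keep the sketch self-contained it reads from in-memory strings; with real files, the list would hold paths instead (for example, from glob.glob with a pattern of your own). The column names are made up.

```python
import io
import pandas as pd

# Stand-ins for several CSV files that share the same column names.
sources = [
    io.StringIO("city,sales\nParis,10\nRome,7\n"),
    io.StringIO("city,sales\nOslo,3\nLima,9\n"),
]

# Read each source into its own data frame, then stack them vertically.
# ignore_index=True gives the result a fresh 0..n-1 index.
combined = pd.concat([pd.read_csv(src) for src in sources], ignore_index=True)
```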

@regal7548 2 months ago

@ReuvenLerner OK, thank you

@BlenderAwy 2 months ago

How do I solve the error in Jupyter (SyntaxError: invalid syntax)?

@ReuvenLerner 2 months ago

That's not a Jupyter problem per se; it means that you have a syntax problem in your Python code. Double check your code; maybe there is a missing parenthesis, quote, comma, or period?

@siddharthsingh6561 2 months ago

ur incredible!

@ReuvenLerner 2 months ago

Thanks so much!