Matplotlib Boxplots | Creating Single and Multiple Boxplots in Python 

Andy McDonald
Subscribers: 9K
Views: 26K

Matplotlib boxplots can be used for a variety of tasks which include: outlier detection, understanding the data range and distribution, and understanding whether the data is skewed. In this video, we take a look at creating basic boxplots with matplotlib, without the need for seaborn or any other high-level libraries.
We also look at creating boxplots of multiple columns with different ranges using a simple Python for loop.
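As a rough illustration of the for-loop approach described above, here is a minimal sketch. The DataFrame, its column names (GR, RHOB, NPHI, DT) and the value ranges are made-up stand-ins, not the dataset from the video; the loop body follows the pattern quoted in the comments below.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for well-log data (hypothetical columns and ranges).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "GR": rng.normal(75, 20, 200),
    "RHOB": rng.normal(2.45, 0.1, 200),
    "NPHI": rng.normal(0.25, 0.05, 200),
    "DT": rng.normal(90, 15, 200),
})

# One subplot per column, so each box gets its own y-axis range.
fig, axes = plt.subplots(1, len(df.columns), figsize=(12, 5))

for i, ax in enumerate(axes.flat):
    ax.boxplot(df.iloc[:, i])
    ax.set_title(df.columns[i], fontsize=14, fontweight="bold")
    ax.tick_params(axis="y", labelsize=10)

fig.tight_layout()
```

Giving each column its own axes avoids one wide-ranging column (such as GR) squashing the boxes of narrow-ranging ones (such as RHOB) when they share a y-axis.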
If you haven't already, make sure you subscribe to the channel: / @andymcdonald42
If you have enjoyed this video and want to say thanks, feel free to buy me a coffee at the following link: buymeacoffee.c...
----
The notebook for this video can be found on my GitHub repository at: github.com/and...
Libraries used in this video:
pandas: pandas.pydata.org
matplotlib: matplotlib.org
Books I Recommend:
As an Amazon Associate I earn from qualifying purchases. By buying through any of the links below I will earn commission at no extra cost to you.
PYTHON FOR DATA ANALYSIS: Data Wrangling with Pandas, NumPy, and IPython
UK: amzn.to/3HNycJ9
US: amzn.to/3DL7qPv
FUNDAMENTALS OF PETROPHYSICS
UK: amzn.to/3l1PgSf
PETROPHYSICS: Theory and Practice of Measuring Reservoir Rock and Fluid Transport Properties
UK: amzn.to/30UNWZS
US: amzn.to/3DNqBbd
WELL LOGGING FOR EARTH SCIENTISTS
UK: amzn.to/3FHsbfn
US: amzn.to/3CILAuE
GEOLOGICAL INTERPRETATION OF WELL LOGS
UK: amzn.to/3l2v2HV
US: amzn.to/30UOTkU
-----
Thanks for watching, if you want to connect you can find me at the links below:
/ andymcdonaldgeo
/ geoandymcd
/ andymcdonaldgeo
www.andymcdona...
Sign up to my newsletter at:
fabulous-found...
#matplotlib #petrophysics #python #boxplots #welllogs #jupyternotebooks #geoscience

Published: 30 Sep 2024

Comments: 44
@annadomas2484 1 year ago
Thank you! I am learning and your videos help a lot! I tried to use your code for my dataset, but I ran into an error and do not understand where the problem is.
TypeError Traceback (most recent call last) in
      4
      5 for i, ax in enumerate(axes.flat):
----> 6     ax.boxplot(data1.iloc[:,i])
      7     ax.set_title(data1.columns[i], fontsize=20, fontweight='bold')
      8     ax.tick_params(axis='y', labelsize=14)
TypeError: unsupported operand type(s) for +: 'method' and 'float'
@alirezarahnama2096 7 months ago
Hi Andy! I have been trying to make a box plot with a simple break in the y-axis and have not been able to. Any tips?
@mohammadkeshtkar9655 3 years ago
We are very lucky to be able to see these useful videos. Thank you Andy🙏🙏
@AndyMcDonald42 3 years ago
Thanks. I am glad you like them!
@sanisalisu4929 5 months ago
I can send you the data and the type of boxplot I'm talking about.
@johnowusukonduah2305 1 year ago
I always know my answer is certain with Andy! Thank you for your great videos, I've learnt a lot from you. You're a genius
@victorjohnlaobena7099 5 months ago
Helped me out a lot, thank you!😀😀😀
@chisoo6903 3 years ago
After identifying the outliers in the boxplot, what Python command could we use to remove them from our analysis?
@AndyMcDonald42 3 years ago
Hi Chi, you can use a small piece of code, like the one below, to remove the outliers identified by the boxplot.
# Calculate the quartiles
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
# Calculate the IQR
IQR = Q3 - Q1
# Remove the outliers
df_clean = df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]
Source: stackoverflow.com/questions/50461349/how-to-remove-outlier-from-dataframe-using-iqr
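For anyone who wants to try the IQR filter end to end, here is a self-contained sketch of the same idea. The small DataFrame is invented for illustration (the column names GR and RHOB and the two planted outliers are assumptions, not data from the video).

```python
import pandas as pd

# Made-up data: 500 in GR and 9.9 in RHOB are planted outliers.
df = pd.DataFrame({
    "GR": [60, 65, 70, 72, 75, 80, 85, 500],
    "RHOB": [2.40, 2.45, 2.50, 9.9, 2.42, 2.48, 2.44, 2.46],
})

# Per-column quartiles and interquartile range.
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# Same fences a boxplot uses for its whiskers by default (1.5 * IQR).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag values outside the fences, then drop any row with at least one flag.
is_outlier = (df < lower) | (df > upper)
df_clean = df[~is_outlier.any(axis=1)]

print(f"{len(df)} rows before, {len(df_clean)} rows after")
# prints "8 rows before, 6 rows after"
```

Note that `.any(axis=1)` drops the whole row when any single column is out of range, so the more columns you include, the more rows you lose. Filtering on only the columns of interest is one way to limit that.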
@sabrinakadirova7084 2 years ago
I liked it so much! Please, keep doing such videos, you're saving my nerves..
@AndyMcDonald42 2 years ago
Thanks. I have plenty more to come 😁
@gamuchiraindawana2827 6 months ago
Lovely
@vito135c 1 year ago
Thanks.
@GreyHatGenX 1 year ago
comment
@cypherecon5989 2 years ago
I used data["income"].plot(kind="box"), but it doesn't show me the x and y axes. Does anybody know why that is?
@cypherecon5989 2 years ago
5:01 Even with the plt. command, the boxplot gets plotted but without the x and y axes...
@AndyMcDonald42 2 years ago
I'm not sure. Have you checked over your data to make sure it's OK and that you are calling the correct column? I believe anything like NaNs should be handled by the plotting. If you are still having trouble, Stack Overflow is a great place to get help, and it allows you to share your code and data, which you can't really do here.
@cypherecon5989 2 years ago
@@AndyMcDonald42 It was my dark theme. I had to do plt.figure(facecolor="white"). :D
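For reference, the fix mentioned here can be sketched as follows. The income values are invented, and passing `ax=ax` explicitly is an added precaution rather than part of the original comment; the likely cause (hedged) is a transparent figure background leaving dark tick labels unreadable on a dark editor theme.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for the commenter's "income" column.
data = pd.DataFrame({"income": [28, 31, 35, 40, 42, 47, 55, 120]})

# A white figure background keeps the default dark axis labels readable
# when the notebook or IDE uses a dark theme.
fig, ax = plt.subplots(facecolor="white")

# Draw the pandas boxplot on that figure's axes.
data["income"].plot(kind="box", ax=ax)
```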
@AndyMcDonald42 2 years ago
@@cypherecon5989 Glad you got it sorted. It's always the small things that catch us out. 😁
@balajig8522 2 years ago
Really nice video! Please share the original DataFrame you used.
@19neetish 1 year ago
Hi, could it be that using a box plot and the interquartile range is not always a good idea? For example, the formation can have any number of combinations, and fluid properties may vary too. This may result in a very wide data spread. Could a point outside the range be genuine and represent a unique rock type? Shouldn't we confirm that from the mud log?
@AndyMcDonald42 1 year ago
Yes. That is very possible. Any outliers detected by these methods should always be checked to confirm that they are real outliers. When applying boxplots to petrophysical data, I often do it by filtering for specific formations/rock types. The key is not to use one method in isolation. It is the same principle as not trying to do an analysis based on a single curve.
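One way to picture the "filter by formation first" idea is to draw one box per formation instead of one box for the whole well. This is only a sketch under assumptions: the column names FORMATION and GR, the formation names, and the values are all hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical log data: three made-up formations with different GR levels.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "FORMATION": ["Sand A"] * 100 + ["Shale B"] * 100 + ["Carbonate C"] * 100,
    "GR": np.concatenate([
        rng.normal(40, 8, 100),    # cleaner sand, lower GR
        rng.normal(110, 15, 100),  # shale, higher GR
        rng.normal(25, 5, 100),    # carbonate, low GR
    ]),
})

# Split the GR curve by formation so each box has its own quartiles and fences.
groups = {
    name: grp["GR"].dropna().to_numpy()
    for name, grp in df.groupby("FORMATION")
}

fig, ax = plt.subplots(figsize=(7, 5))
ax.boxplot(list(groups.values()))
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("GR")
```

A shale reading that looks extreme against the whole well can sit comfortably inside the box for its own formation, which is why outliers flagged on the full dataset still need checking.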
@19neetish 1 year ago
@@AndyMcDonald42 In the case of this field, would you suggest doing the outlier analysis based on the geological age of the rock? This data is present in the dataset. Also, is it possible to figure out whether the log data has been processed or not? I mean, whether all the necessary corrections have been applied by the logging company? Just looking at the PEF data, I can see the mud has barite, and the PEF readings are off the chart. It makes me think resistivity and other data might not have been corrected for the borehole environment either. That would definitely mess up the model training.
@slee3083 2 years ago
Hi Andy, looking at the last exercise using subplots, would this still work if the columns had a different number of data points from each other? I've tried something similar to this video, except reading a simple CSV file containing a few columns of data, with some columns having more data points than others, and the box plots with fewer data points (NaN) just don't show up at the end. Is there a way around this? If I plot the data separately or on the same graph (same axis), there is no problem, but some of the subplots with fewer data points just don't plot at all. Thanks
@AndyMcDonald42 2 years ago
Hi S Lee. I am not 100% certain on this and would have to try. But some plots don't handle NaN values, and you unfortunately have to remove them by dropping them. This seems to be the case with this Stack Overflow question, which sounds similar to what you are experiencing: stackoverflow.com/questions/44305873/how-to-deal-with-nan-value-when-plot-boxplot-using-python
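A minimal sketch of the drop-the-NaNs workaround, assuming the missing boxes come from NaN padding in the shorter columns. The two-column DataFrame is invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: column B has fewer readings, so it is padded with NaN.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [10.0, 12.0, 14.0, np.nan, np.nan, np.nan],
})

fig, axes = plt.subplots(1, len(df.columns), figsize=(8, 4))

for ax, col in zip(axes.flat, df.columns):
    # Drop NaNs per column (not per row), so a gap in one column
    # does not throw away valid readings in another.
    ax.boxplot(df[col].dropna())
    ax.set_title(col)

fig.tight_layout()
```

Calling `df.dropna()` on the whole DataFrame would instead remove every row with a gap in any column, which here would cut column A down to three points as well.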
@espanolaturitmoint 2 years ago
Hi, thanks a lot for the content! I need help with a boxplot... Could you tell me how to show the points inside the boxplot and annotate a number for each point? I have a dataset of only 49 points.
@AndyMcDonald42 2 years ago
No problem. One way to do that is to add a jitter plot on top of the box plot. I'm not so sure annotating each point would be a good idea, as it may become too cluttered. You can see an example here: www.python-graph-gallery.com/36-add-jitter-over-boxplot-seaborn
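The linked example uses seaborn. As a rough matplotlib-only alternative, the jitter overlay can be sketched like this; the 49 values are randomly generated stand-ins, and the numbering of points (1 to 49 in data order) is an assumption about what "a number for each point" means.

```python
import matplotlib.pyplot as plt
import numpy as np

# 49 made-up values standing in for the commenter's dataset.
rng = np.random.default_rng(0)
values = rng.normal(50, 10, 49)

fig, ax = plt.subplots(figsize=(4, 6))

# Hide the boxplot's own outlier markers; the scatter below shows every point.
ax.boxplot(values, showfliers=False)

# A single box sits at x = 1, so jitter the points slightly around that position.
x = rng.normal(1, 0.04, size=values.size)
ax.scatter(x, values, alpha=0.6, s=15)

# Label each point with its index; with many points this gets cluttered quickly.
for idx, (xi, yi) in enumerate(zip(x, values), start=1):
    ax.annotate(str(idx), (xi, yi), textcoords="offset points",
                xytext=(4, 0), fontsize=6)
```

If the labels overlap too much, annotating only the points outside the whiskers is a lighter option.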
@yippiyee1 2 years ago
Thanks for the informative video.
@iliusmondal2098 2 years ago
Hi Andy, is there any way to remove the outliers?
@AndyMcDonald42 2 years ago
Yes, there is. You can apply the boxplot equations to a dataframe and remove points that way: datascience.stackexchange.com/questions/54808/how-to-remove-outliers-using-box-plot
@josedavidbastoaguirre2099 3 years ago
Really nice video! Thanks. It would be great if you could also explain how to interpret the graphics. For instance, what does it mean to have a lot of outliers in the GR log? Again, thank you very much.
@josedavidbastoaguirre2099 3 years ago
I mean... probably some of them are just wrong data, but maybe some outliers represent a particular lithology.
@AndyMcDonald42 3 years ago
Thanks Jose. I am planning to cover that in a small series on outlier detection in the near future. These initial videos are focusing on how to create the plots with Python. I also covered this topic very briefly at this year's SPWLA conference and in more detail in my Data Quality paper, which you can find at the link below. www.researchgate.net/publication/351607547_Data_Quality_Considerations_for_Petrophysical_Machine_Learning_Models
You are correct that some of the outliers could be incorrectly measured data, which could be a result of tool/sensor issues, borehole washout, system issues, etc. But they could also reflect a particular lithology; for example, a spike in the GR data may be caused by a hot sand/hot shale. That is why we need to treat some of these outlier detection methods with caution and also use our domain expertise to make the final decision.
@coldtea9755 2 years ago
Thank you really helpful
@timut1830 2 years ago
Thank you so much for your video!
@AndyMcDonald42 2 years ago
No worries!
@mjones410 2 years ago
super helpful thank you Andy
@AndyMcDonald42 2 years ago
Happy to help
@kararshah6056 1 year ago
man u explained sooooooooooo good
@AndyMcDonald42 1 year ago
Thanks. I am glad it helped :)
@anamalbulushi5332 3 years ago
Thank you Andy 👍🏻
@AndyMcDonald42 3 years ago
Any time 👍
@nzambabignoumba445 2 years ago
Thank you!!
@AndyMcDonald42 2 years ago
You're welcome!