Isolation Forest for Outlier Detection within Python

Andy McDonald


28K views


Published: 4 Oct 2024


Comments: 25

@AndyMcDonald42 23 days ago

The code and data for this video can be found as part of my Petrophysics & Python Series on GitHub: github.com/andymcdgeo/Petrophysics-Python-Series
Direct notebook link: github.com/andymcdgeo/Petrophysics-Python-Series/blob/master/33%20-%20Auto%20Outlier%20Detection%20-%20Isolation%20Forest.ipynb
Data folder: github.com/andymcdgeo/Petrophysics-Python-Series/tree/master/Data

@dougclendening5896 1 month ago

I haven't found a single video that explains what lines 8, 9 and 10 actually do. Some videos talk about trees but are too generic and don't give real examples in the nodes. Videos like this one show the code but don't explain how any of it relates to an actual tree or set of logic. How the heck are we getting there? Also, I don't think you showed an example row of data. Are all of the data values numbers?
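A minimal sketch of the tree logic asked about above, in plain Python rather than the video's code. Each "node" picks a random column and a random split value inside that column's range, then keeps only the rows on the same side as the point being examined. A point that ends up alone after very few splits is easy to isolate, which is what marks it as an outlier. The two-column data, the helper names `isolation_depth` and `average_depth`, and the depth cap are all assumptions made for illustration. The sketch also assumes every column is numeric, which is what scikit-learn's `IsolationForest` expects as well.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=10):
    """Count the random splits needed to leave `point` alone in `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # One tree node: a random feature and a random split value in its range.
    feature = rng.randrange(len(point))
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Follow only the branch that contains `point`.
    same_side = [
        row for row in data
        if (row[feature] < split) == (point[feature] < split)
    ]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def average_depth(point, data, n_trees=200, seed=0):
    """Average the isolation depth over many random trees (the 'forest')."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees

# Hypothetical numeric rows: a tight cluster plus one far-away row.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(2.5, 0.05), data_rng.gauss(60.0, 2.0)) for _ in range(100)]
outlier = (1.2, 150.0)
data = cluster + [outlier]

outlier_depth = average_depth(outlier, data)
inlier_depth = average_depth(cluster[0], data)
print(f"outlier: {outlier_depth:.2f} splits, inlier: {inlier_depth:.2f} splits")
```

The far-away row needs far fewer splits on average than a row inside the cluster. A real implementation builds each tree on a random subsample and converts the average depth into a normalized score, which this sketch leaves out.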

@smn7074 1 year ago

Thanks for your great video. Exactly what I needed.

@vitorribeirosa 2 years ago

Thanks, Andy!!! Great video!!!

@faicornelius2601 1 year ago

Thanks so much for your great videos.

@mngreta 9 months ago

Can you please share the code? I took the time and tried to copy from the video but something is still wrong :(

@redpantherofmadrid 8 months ago

Well explained, thanks a lot, and love the accent, it's a bonus :)

@mwasimmit 1 year ago

For plotting in 2D: if I reduce the data to two dimensions using PCA and plot it together with the model result, will that be a good summary plot?
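One way this could look, as a sketch under stated assumptions rather than the video's method: fit the Isolation Forest on all features, project the same rows to two principal components, and colour the projected points by the model's labels. The synthetic five-feature data and the helper name `plot_projection` are assumptions for illustration. Because the model is fitted in the full feature space, the 2D view is only as faithful as the variance the two components retain, so flagged points can appear inside the main cloud. Scaling the features before PCA may also be worth considering when they have different units.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Hypothetical data: 300 ordinary rows and 10 shifted rows, 5 features each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(300, 5)),
    rng.normal(6.0, 1.0, size=(10, 5)),
])

# Fit on the full feature space; labels are 1 (inlier) or -1 (outlier).
model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(X)

# Project to 2D for display only; the model never sees these coordinates.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

def plot_projection(coords, labels):
    """Scatter the 2D projection, coloured by Isolation Forest label."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    fig, ax = plt.subplots()
    inliers = labels == 1
    ax.scatter(coords[inliers, 0], coords[inliers, 1], s=10, label="inlier")
    ax.scatter(coords[~inliers, 0], coords[~inliers, 1], s=20, label="outlier")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend()
    return fig

print(f"Variance retained by the 2D view: {explained:.0%}")
```

Calling `plot_projection(coords, labels)` draws the figure. Checking `explained` first gives a rough idea of how much of the structure the plot can show.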

@gourabguha3167 1 year ago

Any chance we can get the GitHub link or the source code .ipynb file along with the dataset?

@rawabih4026 1 year ago

Thank you from the bottom of my heart.

@MonuSaraswati 3 months ago

Hi Andy, can you please share this dataset? I have not been able to find it online.

@pramishprakash 1 year ago

Great explanation Sir

@pioner40 1 year ago

Very good video. Do you share the notebook?

@fastisslow6177 3 months ago

Nice explanation 👍

@العرندس-ع8ل 1 year ago

Hi, is there any Python library for creating a visual family tree from an SQLite database?

@BabiryeShakira-g4s 1 month ago

Is there a way I can get this exact dataset?

@AndyMcDonald42 23 days ago

@faicornelius2601 1 year ago

Please Andy, after identifying the outliers, how do we remove them?

@AndyMcDonald42 1 year ago

Removing outliers needs to be done with due consideration. The cause of them being outliers needs to be properly understood, and then the appropriate course of action can be taken. I discuss multiple methods of dealing with outliers in my Medium article here: towardsdatascience.com/well-log-data-outlier-detection-with-machine-learning-a19cafc5ea37
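If removal does turn out to be appropriate, the mechanics could look like the sketch below. It is an illustration under assumptions, not the notebook's code: the DataFrame, the column names `RHOB` and `NPHI`, and the injected bad readings are all hypothetical. The flagged rows are kept in a separate frame so they can be reviewed before anything is discarded.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical well-log style data with a few implausible density readings.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "RHOB": rng.normal(2.5, 0.05, 200),
    "NPHI": rng.normal(0.2, 0.03, 200),
})
df.loc[:4, "RHOB"] = 1.2  # rows 0 to 4 inclusive

features = ["RHOB", "NPHI"]
model = IsolationForest(contamination=0.05, random_state=0)

# fit_predict returns 1 for inliers and -1 for outliers.
df["anomaly"] = model.fit_predict(df[features])

# Keep the flagged rows for inspection rather than silently dropping them.
flagged = df[df["anomaly"] == -1]
cleaned = df[df["anomaly"] == 1].drop(columns="anomaly")

print(f"{len(flagged)} rows flagged, {len(cleaned)} rows kept")
```

Boolean filtering leaves the original `df` untouched, so the decision to drop, replace, or keep the flagged rows can still be made after looking at `flagged`.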

@faicornelius2601 1 year ago

@AndyMcDonald42 Thank you so much, Andy. I have just followed you on Towards Data Science. You are a great teacher.

@danymerizalde1942 1 year ago

Where is the data?

@lashlarue7924 1 year ago

🫡👏👏👏❤

@nikolanovakovic7591 8 months ago

Really struggling to understand this accent.

@FxbxxxScxlxrxxnx 1 year ago

I have a question. I created a model using Isolation Forest (IF) and fitted it to my training dataset, and now I want to apply it to my test dataset. I don't really understand how to picture the process of "fitting the IF model". When I set contamination to, say, 5%, the model calculates the anomaly scores of all points in the training dataset and assigns -1 to the 5% "most anomaly-like" points, marking them as anomalies, right? Then, when I pass my test dataset to the model, does it simply reuse the forest built from the training dataset to calculate anomaly scores for the test points, and compare each of them against the lowest anomaly score among those 5% "most anomaly-like" training points? And is any test point whose anomaly score exceeds that threshold then labelled an anomaly?

@johnbaptistbypassinglife 1 year ago

Yes, that's correct! When you fit an Isolation Forest (IF) model to your training data, the model builds a number of randomly split trees and uses them to calculate an anomaly score for each data point in the training set. The points with the highest anomaly scores are considered the "most anomaly-like" and are given a label of -1 to mark them as anomalies; the contamination setting fixes the score threshold that separates them from the rest. When you apply the model to your test data, it uses the same trees and the same calculation to score each test point. Any test point whose anomaly score is higher than that threshold, meaning the lowest anomaly score among the "most anomaly-like" training points, is also given a label of -1. The model does not need to have seen a similar anomaly during training: a test point is flagged simply because the trained trees isolate it quickly, so anomalies of a kind that never appeared in the training data can be flagged too. I hope this helps to clarify the process of fitting and applying an IF model to your data! Let me know if you have any other questions.
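The behaviour described in this exchange can be checked with a short sketch, assuming scikit-learn's `IsolationForest` and synthetic data invented for illustration. One detail to keep in mind: scikit-learn's `score_samples` uses the opposite sign convention from the comments above, so lower values are more abnormal, and "exceeding the anomaly score threshold" corresponds to a score falling below `offset_`. With a numeric `contamination`, `offset_` is set during `fit` from the training scores and reused unchanged for new data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical training and test sets; the last test row is far from the rest.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))
X_test = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[8.0, 8.0]]])

model = IsolationForest(contamination=0.05, random_state=0).fit(X_train)

# Threshold learned from the training scores during fit.
threshold = model.offset_

# Test points are scored by the same trees; no refitting happens here.
test_scores = model.score_samples(X_test)  # lower means more abnormal
test_labels = model.predict(X_test)        # 1 = inlier, -1 = outlier

# Reproduce predict() by comparing each score with the stored threshold.
manual_labels = np.where(test_scores < threshold, -1, 1)

train_flagged = (model.predict(X_train) == -1).mean()
print(f"Share of training rows flagged: {train_flagged:.1%}")
print(f"Test rows flagged: {(test_labels == -1).sum()} of {len(X_test)}")
```

The share of flagged training rows lands near the chosen contamination, while the share in the test set is not fixed: it depends on how many test scores fall below the stored threshold.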