Machine Learning for Malware Analysis (Part 2)

In Part 2 of Machine Learning for Malware Analysis, we use an example dataset for PE header analysis to build a binary classification model for malware detection. The code and dataset are available on Kaggle and in our GitHub repo. In this lesson, you will apply the previously mentioned machine learning metrics, such as balanced accuracy score, average precision score and the precision-recall curve, for model evaluation.
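A minimal sketch of the three evaluation metrics with scikit-learn. The labels and scores are made-up toy values (1 = malware, 0 = benign), not output from the video's model or the PE header dataset. Balanced accuracy works on hard predictions; average precision and the precision-recall curve work on the predicted malware probability.

```python
# A minimal sketch with made-up toy values (1 = malware, 0 = benign);
# these are not results from the video or the PE header dataset.
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    balanced_accuracy_score,
    precision_recall_curve,
)

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])
# Predicted malware probability, e.g. model.predict_proba(X_test)[:, 1]
y_score = np.array([0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.8, 0.7, 0.2, 0.4])
# Hard labels from a 0.5 decision threshold
y_pred = (y_score >= 0.5).astype(int)

# Mean of per-class recall, so the minority class counts as much as the majority
bal_acc = balanced_accuracy_score(y_true, y_pred)
# Summarises the precision-recall curve as a recall-weighted mean of precisions
ap = average_precision_score(y_true, y_score)
# Precision and recall for every distinct score threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

print(f"balanced accuracy: {bal_acc:.3f}")  # 0.762
print(f"average precision: {ap:.3f}")  # 0.917
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

With these toy values, balanced accuracy is (2/3 + 6/7) / 2 = 16/21 and average precision is 11/12. Both are more informative than plain accuracy on imbalanced data: a model that always predicts "benign" can reach high accuracy but only a balanced accuracy of 0.5.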
GITHUB:
➡ github.com/databowlr/peheader/
KAGGLE DATASETS:
➡ www.kaggle.com/ang3loliveira/...
➡ www.kaggle.com/ang3loliveira/...
CONTENT OF THIS VIDEO
00:00 Intro
01:45 Used dataset
02:08 Library download and data analysis
04:55 Cross-validation and stratification
11:34 Application of Stratified K-Fold and SMOTE-NC
14:28 Tree-based machine learning algorithms (Decision Tree, Random Forest & XGBoost)
22:50 CatBoost
24:20 K-nearest neighbors, naive Bayes, logistic regression
28:23 Summary
In real-world scenarios, highly imbalanced labelled datasets are common when building classification and anomaly detection models across many industries. In this video we present an approach for preprocessing a highly imbalanced dataset with oversampling and cross-validation, using SMOTE and scikit-learn. A consistent data sampling method helps keep your model from overfitting during training, so we introduce some commonly used stratification methods, which keep the class ratio the same in every fold.

We also need a better representation of the minority (malware) class to get an idea of which input features matter for malware-labelled data. SMOTE (Synthetic Minority Oversampling Technique) creates synthetic samples of the minority malware class, and the improved class distribution helps build a more robust binary classification model. This approach also works for other classification problems with highly imbalanced tabular datasets.

We then apply tree-based machine learning algorithms to build a robust classifier, and additionally use the CatBoost algorithm, which highlights some quite interesting features. Other classical machine learning algorithms, such as logistic regression, naive Bayes and k-nearest neighbors, are also covered. By the end of this lesson, you will have a good baseline for approaching other machine learning classification problems.
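A minimal sketch of the workflow described above, assuming scikit-learn and NumPy and using a synthetic `make_classification` dataset as a stand-in for the PE header features (no results from the video). `smote_oversample` and `cv_balanced_accuracy` are hypothetical helpers: the first is a simplified from-scratch version of plain SMOTE interpolation for numeric features only, whereas the video uses SMOTE-NC, which also handles categorical columns. Oversampling is applied only to the training part of each stratified fold so synthetic samples never leak into the test fold. XGBoost and CatBoost are left out to keep the example dependent on scikit-learn alone.

```python
# A minimal sketch (not the video's code): stratified K-fold cross-validation with
# SMOTE-style oversampling of each training fold, then a comparison of classical
# classifiers by balanced accuracy on a synthetic stand-in dataset.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def smote_oversample(X, y, minority_label=1, k=5, seed=None):
    """Hypothetical, simplified SMOTE for numeric features: create synthetic
    minority samples on the segment between a minority point and one of its
    k nearest minority neighbours until both classes have the same size."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each point's nearest neighbour is itself.
    _, nbr_idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = nbr_idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[nbr] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority_label)])


def cv_balanced_accuracy(models, X, y, n_splits=5, seed=0):
    """Mean balanced accuracy per model over stratified folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {name: [] for name in models}
    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
        # Oversample the training fold only, so no synthetic sample reaches the test fold.
        X_tr, y_tr = smote_oversample(X[train_idx], y[train_idx], seed=seed + fold)
        for name, estimator in models.items():
            model = clone(estimator).fit(X_tr, y_tr)
            y_pred = model.predict(X[test_idx])
            scores[name].append(balanced_accuracy_score(y[test_idx], y_pred))
    return {name: float(np.mean(s)) for name, s in scores.items()}


# Synthetic stand-in: roughly 5% minority class labelled 1 ("malware").
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive_bayes": GaussianNB(),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

results = cv_balanced_accuracy(models, X, y)
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} balanced accuracy = {score:.3f}")
```

In practice you would swap the hand-written sampler for imbalanced-learn's `SMOTE` or `SMOTENC` class inside an `imblearn.pipeline.Pipeline`, which likewise resamples only during fitting, so the same fold-local behavior carries over.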
About Data Bowl Recipes:
Recipes about Data Science and Data Engineering.
Don't forget to subscribe to the channel and hit the like button
Thanks for watching!
#catboost #randomforest #SMOTE #peheader
#supervisedmachinelearning #machinelearning
#malwaredetection2022 #malwaredetection
#cybersecurity #malware
Related Phrases:
Machine Learning, Malware Detection, Cybersecurity 2022, Malware Detection Techniques, CatBoost, Random Forest, Malware Analysis, PE Header Analysis
Disclaimer: We do not accept any liability for any loss or damage which is incurred from you acting or not acting as a result of watching any of our publications. You acknowledge that you use the information we provide at your own risk. Do your own research.
Copyright Notice: This video and our YouTube channel contain dialog, music and images that are the property of Data Bowl Recipes. You are authorized to share the video link and channel, and to embed this video on your website or other websites.
© Data Bowl Recipes