Malware Detection with Convolutional Neural Networks

Welcome to our introduction to malware detection with convolutional neural networks (CNNs). We will focus on building a binary classification model with a CNN. CNNs are popular in computer vision and are widely used to build image classification models. By the end of this section you will be able to build a malware classifier based on an image classification approach.
DATASET:
drive.google.com/file/d/1_KgS...
Github:
github.com/databowlr/malware_cnn
Before running the code, apply the fixes listed in this issue:
github.com/databowlr/malware_...
CONTENT OF THIS VIDEO
00:00 Intro
02:38 Tensor, Scalar, Vector key arguments in PyTorch
05:53 Main mathematical operations in PyTorch
07:53 Neural Network Layers
09:38 Neural Network Activation Functions
12:40 PyTorch methods for Neural Networks
17:25 Malware Detection with CNN
19:50 Advantages of CNN
22:00 Input CNN
22:40 Convolution Operation
24:10 Activation Feature Map
24:30 Pooling
25:30 Inputs and Outputs for CNN
26:45 Backpropagation
28:50 Loss Function
29:55 Entropy and Cross-Entropy
33:00 Building CNN model
42:08 Hyperparameters in Tensorboard
A convolutional neural network applies the same filter weights at every position of its input, which is known as parameter sharing. Because one small filter is reused across the whole image to extract meaningful features, far fewer parameters and less memory are required than in a fully connected network. A typical convolutional neural network consists of three main components: convolutional layers, pooling layers and fully connected layers. The convolutional layers extract different features from each part of an input image.
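To make the saving from parameter sharing concrete, here is a small back-of-the-envelope sketch. The image size and filter counts are assumptions chosen only for illustration; they are not taken from the video or its repository.

```python
# Hypothetical sizes, chosen only for illustration.
height, width, in_channels = 64, 64, 1   # one grayscale 64x64 image
out_channels, kernel_size = 8, 3         # eight 3x3 filters

# Convolutional layer: each filter has kernel_size * kernel_size * in_channels
# weights plus one bias, and the same filter is reused at every image position.
conv_params = out_channels * (kernel_size * kernel_size * in_channels + 1)

# Fully connected layer producing an output of the same size (8 maps of 64x64):
# every output unit has its own weight for every input pixel, plus one bias.
n_inputs = height * width * in_channels
n_outputs = height * width * out_channels
dense_params = n_outputs * (n_inputs + 1)

print(conv_params)   # 80
print(dense_params)  # 134250496
```

Under these assumed sizes the convolutional layer needs 80 parameters, while a dense layer with the same output size would need about 134 million.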
Edge detection and sharpening are basic feature extraction operations performed by a filter. A pooling layer downsamples a feature map by replacing each neighborhood with a single value computed by an aggregation function. The most popular pooling process is max pooling, which reports the maximum output from the neighborhood. A fully connected layer takes the output of the convolutional layers and combines it by learning non-linear combinations of features. The final layer is in charge of calculating the probability of belonging to a given class.
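The convolution and max pooling steps can be sketched in plain Python, without any framework. This is a minimal illustration, not the code from the video: the toy image, the Prewitt-style edge filter and the helper names conv2d and max_pool2d are all assumptions introduced here. As in most deep learning libraries, the "convolution" is implemented as a cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size neighborhood."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Toy 6x6 image: dark left half (0) and bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Prewitt-style filter that responds to vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)   # 4x4 map, large values at the edge
pooled = max_pool2d(feature_map)      # 2x2 map after 2x2 max pooling

print(feature_map)  # [[0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0]]
print(pooled)       # [[3, 3], [3, 3]]
```

The feature map is non-zero only where the filter window straddles the dark-to-bright boundary, and pooling keeps that strong response while halving each dimension.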
PyTorch models are built from modules, and nn.Module is the base class that all other modules inherit from. A flatten step is the interface between the convolutional layers and the dense layers of a neural network: the output of a Conv2d layer is an image-like, multi-channel feature map, while the input of a dense layer is a vector.
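The role of the flatten step can be seen by following tensor shapes through a small stack of layers. This is a sketch under assumed sizes (a batch of four 64x64 grayscale images, eight filters, two output classes); the model in the video may use different dimensions.

```python
import torch
from torch import nn

# Hypothetical layer sizes for illustration only.
features = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)
flatten = nn.Flatten()                  # keeps the batch dimension, flattens the rest
classifier = nn.Linear(8 * 32 * 32, 2)  # dense layer expects one vector per sample

batch = torch.randn(4, 1, 64, 64)       # (batch, channels, height, width)
maps = features(batch)                  # image-like output: (4, 8, 32, 32)
vectors = flatten(maps)                 # one vector per image: (4, 8192)
logits = classifier(vectors)            # one score per class: (4, 2)

print(maps.shape, vectors.shape, logits.shape)
```

The in_features value of the linear layer (8 * 32 * 32) has to match the flattened size of the last feature map, which is a common source of shape errors when the input image size changes.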
A simple model is built with the Sequential module: a linear layer, ReLU, another linear layer and ReLU, and finally a linear layer with a LogSoftmax activation function. TensorBoard gives us an excellent visual view of the model architecture as a graph of layers and their input/output relations. We define the necessary arguments to display the graph of our convolutional neural network in TensorBoard. First we need to define a function for hyperparameter tuning of our model, which will be run on the available training dataset. Its for loop iterates, as previously outlined, over different values for batch size, dropout and learning rate, and over whether the training dataset is shuffled.
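The loop over hyperparameter combinations can be sketched with the standard library alone. The candidate values and the helper name iter_runs below are assumptions made for illustration; the video's notebook may use other values and a different structure.

```python
from itertools import product

# Hypothetical candidate values, not taken from the video.
param_grid = {
    "batch_size": [32, 64],
    "dropout": [0.25, 0.5],
    "lr": [0.01, 0.001],
    "shuffle": [True, False],
}


def iter_runs(grid):
    """Yield one dict per hyperparameter combination."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


runs = list(iter_runs(param_grid))
for run in runs:
    # A real tuning loop would build the data loader and model from `run`,
    # train for a few epochs, and record loss and accuracy for this run.
    print(run)

print(len(runs))  # 16 combinations: 2 * 2 * 2 * 2
```

In a PyTorch setup, each run's settings and final metrics can then be logged with SummaryWriter.add_hparams from torch.utils.tensorboard so that they show up in TensorBoard's HPARAMS tab.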
TensorBoard lets us download these hyperparameter combinations in different formats. We can focus on the parameters that matter most for high accuracy and low loss. Finally, for every epoch we print the current values of the defined parameters, such as learning rate, dropout and batch size, together with the loss and accuracy.
Finally, we can determine suitable hyperparameters to build our malware classifier with accuracy above 97%.
About Data Bowl Recipes:
Recipes about Data Science and Data Engineering.
Don't forget to subscribe to the channel and hit the like button
Thanks for watching!
Disclaimer: We do not accept any liability for any loss or damage which is incurred from you acting or not acting as a result of watching any of our publications. You acknowledge that you use the information we provide at your own risk. Do your own research.
Copyright Notice: This video and our YouTube channel contain dialog, music and images that are the property of Data Bowl Recipes. You are authorized to share the video link and channel, and to embed this video in your website or others.
© Data Bowl Recipes

Published: 11 Jul 2024

Comments: 38

@carlosdanielcontrerasperez773 1 year ago

I would like to know how to add additional data to the dataset. Is there any specific procedure to convert binaries to images? Thank you very much

@thatGuySuraj 2 years ago

Thanks for this awesome video. I need help in combining tar files. How do I combine all the tar files without corrupting the header. Can you provide me link of the dataset used so that I directly download it from there and avoid combining?

@databowlrecipes318 2 years ago

Dear Suraj, please find link for dataset download. easyupload.io/5f4lpb

@thatGuySuraj 2 years ago

@databowlrecipes318 Thank you so much!

@lokeshdohare4872 1 year ago

@databowlrecipes318 It's showing "file not found".

@saumyagaur7633 1 year ago

Thank you for the video sir. Can you please provide the link for the dataset since the one you mentioned before is not working now?

@databowlrecipes318 1 year ago

www.4shared.com/s/ffVrAKm8cfa

@omaimaelalaouielfels9637 9 months ago

Please, what is the name of this dataset? Is it Malimg or MMCC or something else?

@rahmanchowdhury719 2 years ago

The dataset is not available now. How can I get the data? This link does not work.

@databowlrecipes318 2 years ago

Try www.4shared.com/s/fayNdYjvOea

@lamnhat76 4 months ago

Thank you for sharing the video, please help me answer 2 questions: 1. What is the name of the dataset you use? 2. Can I apply this code to Android?

@databowlrecipes318 4 months ago

It's an old dataset I found on GitHub. I recommend taking a look at this site for datasets: mal-net.org/ and this repo: github.com/TanayBhadula/malware-image-detection.

@makyxyz3375 4 months ago

Hi, Do you have the link for the dataset? Thanks in advance

@databowlrecipes318 4 months ago

link to download dataset drive.google.com/file/d/1_KgSD6pCdO2_YCJMkCxmoPBk2SdKRGHP/view?pli=1

@makyxyz3375 4 months ago

Thanks!

@user-jn8ht9ww4q 6 months ago

Can you please share the dataset? None of the links you gave can be found.

@databowlrecipes318 6 months ago

Please also see the updated video description with the link to the dataset: drive.google.com/file/d/1_KgSD6pCdO2_YCJMkCxmoPBk2SdKRGHP/view?usp=drive_link

@AB-dw5ul 1 year ago

I want to add to the data, after converting the file to a photo, which photo should I add to the train folder?

@databowlrecipes318 1 year ago

I can't understand your question. The provided dataset needs to be split into train/test.

@AB-dw5ul 1 year ago

@databowlrecipes318 Thanks for your response. My question is: I want to add data to the dataset that you posted, that is, new benign and malware files. When I convert files to images, each file is converted into a large number of images of my standard size. How do I know which images to move from a malware file to the malware folder and which images to move from a benign file to the benign folder, in order to finally train on them all?

@databowlrecipes318 1 year ago

@AB-dw5ul Feel free to drop me your email for further conversation (databowlrecipes@gmail.com).

@ThaiNguyenNhat 7 months ago

Why is my average training acc & average validation acc in my training only 50%? Can u help me?

@databowlrecipes318 7 months ago

Please consider following changes to the code github.com/databowlr/malware_cnn/issues/1

@lahcenjou3358 2 years ago

I got this error because of loss.backward():

    one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 1024]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

@databowlrecipes318 2 years ago

Please install a lower version of PyTorch, e.g. !pip install torch==1.7.0 torchvision==0.8.0

@linhvokhuong6207 1 year ago

@databowlrecipes318 Hello, thank you for your video, but I got this message, please help:

    ERROR: Could not find a version that satisfies the requirement torch==1.7.0 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1)
    ERROR: No matching distribution found for torch==1.7.0
    [notice] A new release of pip available: 22.2.2 -> 22.3.1
    [notice] To update, run: python.exe -m pip install --upgrade pip

@databowlrecipes318 1 year ago

@linhvokhuong6207 Please see the issue list before running any of the code: github.com/databowlr/malware_cnn/issues/1

@ThaiNguyenNhat 7 months ago

Remove inplace=True when initializing nn.Dropout, as follows: self.dropout = nn.Dropout(p=0.5). It will work!

@seonahbae4377 2 years ago

Hi, I'm sorry, I tried this script in my Colab and got a problem with TensorBoard: it doesn't show results like scalars and hparams as in your video. Can you help me, sir? Thank you.

@databowlrecipes318 2 years ago

Hello have you made changes according to github.com/databowlr/malware_cnn/issues/1

@seonahbae4377 2 years ago

@databowlrecipes318 Yes, I have. The other commands run smoothly like in the video, but when it comes to these commands:

    %reload_ext tensorboard
    %tensorboard --logdir ./runs/

the result doesn't show up. When I try again it says 'Reusing TensorBoard on port 6006 (pid 115), started 2:06:34 ago. (Use '!kill 115' to kill it.)' and TensorBoard just goes blank with "not found".

@databowlrecipes318 2 years ago

@seonahbae4377 Try:

    %load_ext tensorboard
    %tensorboard --logdir ./runs/

Then open HPARAMS. There you will see the table view, parallel coordinates and scatter plot, which you can click on.

@jayantverma6196 2 years ago

Great tutorial! But I am getting an error in this code:

    if torch.cuda.is_available():
        device = torch.device("cuda")
    else:
        device = torch.device("cpu")
    print(device)
    model.to(device)

Error: model not defined. Can you please help me out?

@databowlrecipes318 2 years ago

Are you running in Colab?

@jayantverma6196 2 years ago

@databowlrecipes318 Yes.

@databowlrecipes318 2 years ago

@jayantverma6196 Try model = Net().to(device), i.e. instantiate the class Net first (instead of calling model.to(device) on an undefined model).