Thank you for the videos man, really helpful! Some comments, correct me if I'm wrong:

1. When doing x_train.reshape you should use x_train.reshape(60000, -1). In your video you use x_train.reshape(-1, 784), stating that the -1 will keep the 60000 the same. Actually the -1 causes reshape to automatically find the 784 without you having to compute 28*28, so to take full advantage of the syntax it's easier to use x_train.reshape(60000, -1).

2. You mention that the type of the data is float64, but the type of the data is uint8. Therefore, I don't think we will be computationally more efficient by changing to float32.

3. Add this to the first lines of your script if you want a clear terminal output:

    import os
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    from os import system
    system("clear")

4. Add this to your vscode settings if you want everything to work nicely:

    {
        // to open settings in json format:
        "workbench.settings.editor": "json",
        // to open default settings when opening user settings
        "workbench.settings.openDefaultSettings": false,
        //"python.pythonPath": "C:\\Users\\31627\\pyver\\383\\Scripts\\python.exe",
        //"python.pythonPath": "C:\\Users\\31627\\.conda\\envs\\tf2.4\\python.exe",
        "python.pythonPath": "C:\\ProgramData\\Anaconda3\\python.exe", // this is the default python
        "python.disableInstallationCheck": true, // don't know why
        "editor.tabCompletion": "on", // to be able to tab out of ''
        "breadcrumbs.enabled": false, // to not show the file path at the top of the code file
        "workbench.startupEditor": "newUntitledFile",
        "workbench.editorAssociations": { "*.ipynb": "jupyter-notebook" },
        "workbench.colorTheme": "Default High Contrast",
        // we installed the extension Code Runner to have a clear code output in the terminal
        "editor.fontSize": 17,
        "editor.fontWeight": "500",
        "debug.console.fontSize": 17,
        "terminal.integrated.fontSize": 17,
        "terminal.integrated.fontWeight": "600",
        "kite.showWelcomeNotificationOnStartup": false,
        "python.formatting.provider": "autopep8",
        "editor.formatOnSave": false,
        "python.formatting.autopep8Args": [ "--ignore", "E402" ],
        "code-runner.executorMap": { "python": "$pythonPath -u $fullFileName" },
        "code-runner.clearPreviousOutput": true,
        "code-runner.showExecutionMessage": false,
        "code-runner.saveFileBeforeRun": true,
        "code-runner.runInTerminal": true,
        "python.showStartPage": false,
        "python.condaPath": "C:\\ProgramData\\Anaconda3\\_conda.exe", // this is the default conda
        "python.defaultInterpreterPath": "C:\\ProgramData\\Anaconda3\\python.exe",
        "notebook.cellToolbarLocation": { "default": "right", "jupyter-notebook": "left" }
    }
Thanks for this, always enjoy reading comments that offer alternative suggestions. Just my two cents:

1. Nice spot, certainly seems neater to do it this way! To be more general, one could also write: x_train = x_train.reshape(x_train.shape[0], -1). Assuming the first dimension corresponds to the samples, different x_train sets will still be reshaped correctly using this, rather than having to hard-code the exact number of samples each time.

2. Actually, by rescaling the pixel values to be between 0.0 and 1.0 (accomplished by the /255.0 operation), the datatype does by default become float64. Manually setting the datatype to float32 therefore cuts memory usage in half. One can check this for themselves using x_train.dtype (which will return uint8, float64 or float32 depending on which transforms have been applied). And to get the actual size of this object in memory, you can use: import sys; sys.getsizeof(x_train)

3. & 4. Didn't look into these as I'm not too bothered by the warning messages, and not a vscode user :)
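A quick sketch of the dtype/memory point from the reply above, using a random stand-in array instead of the real MNIST download (the array contents are made up; only the dtypes and sizes matter):

```python
import numpy as np

# Stand-in for MNIST images: raw pixel data loads as uint8 (values 0-255).
x_train = np.random.randint(0, 256, size=(1000, 28, 28), dtype=np.uint8)
print(x_train.dtype)  # uint8

# Dividing by 255.0 promotes the array to float64 by default.
x_f64 = x_train / 255.0
print(x_f64.dtype)  # float64

# Explicitly casting to float32 halves the in-memory size of the array.
x_f32 = (x_train / 255.0).astype(np.float32)
print(x_f32.nbytes, x_f64.nbytes)  # float32 buffer is half the float64 one
```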
Bro, I am not getting the things used inside those built-in functions. As I am new, I have not watched the theory part... what would be your suggestion? @@hamzajaved5283
@@hamzajaved5283 Bro, I am not really able to understand the function parameters. I am new and have not properly studied the theory of neural networks, and I started with this TensorFlow course... bro, what would be your suggestion?
Using the exact same code, I am getting x_train.shape = (60000, 28, 28), but when I run the model I am getting 1875/1875 in each epoch rather than 60000/60000... why is this happening??
You should watch StatQuest when it comes to theory, seriously guys. Aladdin's resources are good, but StatQuest is by far more concise and understandable.
I'm trying with TensorFlow 2.17.0 and the line inputs = keras.Input(shape=(28*28)) produces the error "Cannot convert '784' to a shape". Solution: apparently shape has to be explicitly a tuple, so shape=(28*28,) (with a trailing comma) does it.
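The root cause is plain Python syntax, not Keras itself: parentheses alone don't make a tuple, so `(28*28)` is just the integer 784. A minimal demonstration without needing TensorFlow installed:

```python
# (28*28) is just the integer 784 -- parentheses alone don't create a tuple.
print(type((28*28)))   # <class 'int'>

# The trailing comma is what makes a one-element tuple.
print(type((28*28,)))  # <class 'tuple'>

# So newer Keras versions, which require a real shape tuple, need:
#   inputs = keras.Input(shape=(28*28,))
```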
4:06 I can't run it; it gives me the error "list index out of range" from the line tf.config.experimental.set_memory_growth(physical_devices[0], True). Please help me.
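That error usually means TensorFlow found no GPU, so the device list is empty and `[0]` fails. A hedged sketch of one common fix, guarding the call so CPU-only machines skip it:

```python
import tensorflow as tf

# "list index out of range" means no GPU was detected, so this list is empty.
physical_devices = tf.config.list_physical_devices("GPU")

if physical_devices:
    # Only safe to index [0] when at least one GPU exists.
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
else:
    print("No GPU found; running on CPU, so memory growth is not needed.")
```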
The code provided doesn't work for me. If I modify it to work with the latest TensorFlow, I get 0.15 accuracy. I don't have a GPU to use (it's an older laptop); does that change the answer? Currently I'm using the Sequential method.
Hey, I used Adadelta and RMSprop as my optimizers, and in both cases the training-set accuracy was above 0.99 with a loss of around 0.0062, but when evaluating, the loss is pretty high (1.32 and 1.42 respectively) with accuracies of 0.95 for both. What might be the reason for this huge deviation? Is it due to over-fitting or some other concept I am missing?
I noticed that the number of parameters comes from the cross links, 512*28*28, and just as I was writing I realized that the remaining 512 have to be the bias weights. Interesting that there are so many parameters, since you actually need just one bias per unit for optimizing the shift.
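The arithmetic in the comment above can be checked directly: a Dense(512) layer on a flattened 28*28 input has one weight per input-output connection plus one bias per unit, which is the figure model.summary() reports for that layer.

```python
# Parameter count of a Dense(512) layer fed 28*28 = 784 flattened pixels:
inputs = 28 * 28            # 784 features per flattened image
units = 512
weights = inputs * units    # one weight per input-output connection
biases = units              # one bias (the "shift") per output unit
print(weights + biases)     # 401920, matching model.summary()
```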
Sir, in the last portion of this lecture you mentioned extracting features from the layers of the model. My doubt is: when should we extract the model features? Can we extract them both before and after training the model? While building a CNN model I found that we can extract the layer features both before and after training, and we can even run predictions through those features. So it's confusing how we are able to get predictions from the intermediate-layer features even before training the model. Please help me resolve this confusion.
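On the question above: a feature extractor can indeed be built before or after training, because the layers always have weights (randomly initialised at first) and therefore always produce outputs; before training those outputs are simply not meaningful yet. A minimal sketch with a hypothetical small model (the layer name "hidden" and the sizes are illustrative, not from the video):

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical small functional model.
inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(64, activation="relu", name="hidden")(inputs)
outputs = keras.layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Build a feature extractor for the intermediate layer. This works even
# before model.fit() -- the weights exist, they are just random.
feature_model = keras.Model(inputs=model.input,
                            outputs=model.get_layer("hidden").output)
features = feature_model(tf.random.normal((2, 784)))
print(features.shape)  # (2, 64): one 64-dim feature vector per sample
```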
I've been trying to train the exact same model, but for some reason I was only able to get a max accuracy of about 92 percent. I even tried tuning the hyperparameters but the results were the same. Can you tell me what the probable issue might be?
You can just select which parameters/tensors to get derivatives for and train (i.e. step, update) only them, while the others are kept fixed.

PS (long tangent): Generally you can use the derivative of anything with respect to any tf.Variable tensors for whatever purpose, when using tf.GradientTape for training a NN or for anything else (like fitting arbitrary parametric functions, or anything where derivatives are useful). You can save (e.g., as pickles), load, swap or manually update the trainable weights (which are all tf.Variable type) or any other Variable tensors using the .assign() method, which is generally convenient.

Any sequence of steps (even methods called across several classes) that needs to happen fast (e.g., during training) can just be wrapped in a function decorated with @tf.function. This frees up the whole pipeline to be handled/debugged eagerly (i.e. NumPy-like behaviour) outside of training, and since graph mode is on only inside @tf.function executions, you keep the option to train eagerly/interactively, just more slowly.

Moreover, hands-on control of the gradients when running tf.GradientTape in eager mode lets you check for and skip gradient updates containing tf.nan values, which would otherwise break the model. The Jacobians can also be manually fiddled with before passing them to the optimiser: you can clip them, normalise them, maybe even reweight them per datapoint.

In summary, tf.GradientTape and @tf.function, mixed with the .Layer and .Model inheritance tools, allow the user to do almost whatever they want with this library.
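A minimal sketch of the selective-training idea from the comment above: a toy loss with two variables, where only `w` is differentiated and updated while `b` stays fixed. The loss function and learning rate are made up for illustration.

```python
import tensorflow as tf

w = tf.Variable(3.0)  # will be trained
b = tf.Variable(1.0)  # will be kept fixed

with tf.GradientTape() as tape:
    loss = (w * 2.0 + b - 10.0) ** 2  # toy quadratic loss

# Ask for the gradient with respect to w only, ignoring b.
grad_w = tape.gradient(loss, w)

# Manual gradient step via .assign_sub(), as described above.
w.assign_sub(0.1 * grad_w)
print(w.numpy(), b.numpy())  # w has moved; b is untouched
```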
@@donfeto7636 This channel is awesome! Also, on Coursera there is a useful course called 'custom-models-layers-loss-functions-with-tensorflow'. [My 1st reply got deleted, probably because I put the full link in.] The main thing I was saying is that some tf decorators have nuances to consider in order to train fast and correctly. Googling forums etc. and trying things out step by step, until it trained as well as the official best-practice code, was a good exercise when pushing the library to do unusual custom things. I really like that tf now supports eager-mode functionality very nicely, meaning that things can be selectively run in graph mode only when and where that's needed for speed.
Unfortunately, in TensorFlow 2.8.0 the Functional API is broken for me. After consulting Google, it appears the Functional API has recently been producing the same error message for many people, and nobody knows how to solve it.
Hey, thanks for this awesome lesson. A query though: the same model with the Functional API gives me 9.7-9.8% accuracy, while the Sequential one gives me 97-98% accuracy.
The tutorial is very helpful, though I'm getting an error when trying to print the model summary following tutorial 3. print(model.summary()) raises: ValueError: Cannot convert '784' to a shape.
How do I print only accuracy while training a Keras Functional API model? Please help, if you're here. I am trying to compare 3 different output layers with different activation functions. The problem is, I only want the accuracy and not the loss while training. The code works fine, but the log line is too LONG, and I only want to compare each layer's accuracy:

Epoch 1/5 1875/1875 - 4s - loss: 3.7070 - Sigmoid_loss: 1.1836 - Softmax_loss: 1.2291 - Softplus_loss: 1.2943 - Sigmoid_accuracy: 0.9021 - Softmax_accuracy: 0.9020 - Softplus_accuracy: 0.5787
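One possible approach (a sketch, not the only way): silence the built-in progress bar with verbose=0 and print only the accuracy entries from a custom callback. The class name `AccuracyOnly` is made up for illustration.

```python
import tensorflow as tf

class AccuracyOnly(tf.keras.callbacks.Callback):
    """Print only the *_accuracy metrics at the end of each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        # Keep only log entries whose name contains "accuracy".
        accs = {k: v for k, v in (logs or {}).items() if "accuracy" in k}
        print(f"epoch {epoch + 1}: {accs}")

# Usage (assuming an already-compiled model and data):
# model.fit(x_train, y_train, epochs=5, verbose=0, callbacks=[AccuracyOnly()])
```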
Question: when building a model without a final activation layer and letting the loss function apply the last activation, what does the model.predict code look like? Thanks.
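On the question above: when the loss was set up with from_logits=True, model.predict returns raw logits, so one common pattern is to apply softmax yourself before taking the argmax. A sketch with a hard-coded logits tensor standing in for the real model.predict(x) output:

```python
import tensorflow as tf

# Stand-in for `model.predict(x)` on a logits-only model (no final activation).
logits = tf.constant([[2.0, 1.0, 0.1]])

# Convert logits to probabilities manually, since the model didn't.
probs = tf.nn.softmax(logits)

# The predicted class is the index of the largest probability. Note that
# for argmax alone the softmax is optional -- it doesn't change the ordering.
pred_class = tf.argmax(probs, axis=-1)
print(pred_class.numpy())  # [0]
```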
Hi, I have a question. Why is the shape of the input 28*28? I understand that the images are 28 by 28 pixels, but I thought there were 60k entries? I don't understand exactly how this part works.
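On the question above: the 60000 is the number of images (samples), while 28*28 is the size of ONE image; the model's input shape describes a single sample, never the whole dataset. A sketch with a dummy zero array standing in for the real MNIST data:

```python
import numpy as np

# Dummy stand-in for MNIST: 60000 images, each 28x28 pixels.
x_train = np.zeros((60000, 28, 28), dtype=np.uint8)

# Flattening keeps the sample axis and merges the pixel axes.
x_flat = x_train.reshape(x_train.shape[0], -1)
print(x_flat.shape)  # (60000, 784): 60000 samples of 784 features each

# keras.Input(shape=(784,)) therefore describes one flattened image;
# the batch/sample dimension is handled automatically.
```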
The division by 255 is not because of performance; it's because of the weights assigned in the neural network. The weights are generated randomly between 0 and 1, and if you keep your X values big, the weights' effect will be swamped and the whole neural network may fail to learn. Thanks.
I understand why the last Dense layer is 10, but why is the first layer 512 and the second layer 256? I'm completely new to this, so if anyone could give an explanation for dummies, I'd appreciate it :)
It's a somewhat arbitrary number that we can choose, but there is some logic in shrinking the dense layers as you go. For example, take a human image that needs to be predicted. The first dense layer produces results that detect small parts: fingers, hands, legs, hair, eyes, etc. In the second dense layer we combine those fingers into a hand or leg image, and using the eyes and hair we can identify a head. In the final output layer we combine all these values (hands, face) and predict that it's a human image. That's roughly how it works.
For model training we need a lot of data, hence the 60000 samples in y_train, but for evaluation a smaller dataset is enough, hence only 10000 samples in y_test.
Dude, how is your training time so low? Mine is 69-75 sec per epoch. Also, with batch size = 32, only 60000/32 steps are shown, not 60000, so I changed the batch size to 1 and now it takes 69-75 sec.
I think this is because he uses a GPU and you don't? It can be tricky to activate sometimes, I think. Also, a batch size of 1 doesn't make sense: then you do each training step with only one image. If your batch size is e.g. 32, all 60000 images are still used within one epoch, just 32 at a time. Be careful with these answers though, I'm not a pro myself.
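To make the batch arithmetic above concrete: the 1875 in the progress bar is the number of batches per epoch, not samples, and with Keras's default batch_size=32 it accounts for all 60000 images.

```python
import math

samples = 60000
batch_size = 32  # Keras's default when fit() is given no batch_size

# The progress bar counts batches (training steps), not individual images.
steps_per_epoch = math.ceil(samples / batch_size)
print(steps_per_epoch)  # 1875 -- every image still passes through each epoch
```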
Hello, mine also takes only about a second each, and I am running a GTX 1060. Since TensorFlow appears to run on the GPU, I would assume the stronger the GPU, the faster the computations.
You can use the sigmoid function for binary classification, like cat / no cat. For the MNIST problem you should use the softmax activation function for the output layer, as we have multiple classes.
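A minimal sketch of the two output-layer cases described above (layer sizes and the random input are illustrative only):

```python
import tensorflow as tf
from tensorflow import keras

# Binary classification (cat / no cat): one unit with sigmoid.
binary_out = keras.layers.Dense(1, activation="sigmoid")

# MNIST (10 digit classes): ten units with softmax.
multi_out = keras.layers.Dense(10, activation="softmax")

# Softmax outputs form a probability distribution over the 10 classes.
x = tf.random.normal((1, 64))       # dummy features from a previous layer
probs = multi_out(x)
print(float(tf.reduce_sum(probs)))  # sums to 1.0 across the 10 classes
```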
@@thevoid5181 "from keras.datasets import mnist" doesn't work for me; it can't find datasets. You can reach the datasets via "from keras import datasets", but "from keras.datasets import mnist" doesn't work.