Building an OCR Model to Crack Captchas: A Neural Network Tutorial with Keras and TensorFlow

Inside my school and program, I teach you my system to become an AI engineer or freelancer. Life-time access, personal help by me and I will show you exactly how I went from below average student to making $250/hr. Join the High Earner AI Career Program here 👉 www.nicolai-nielsen.com/aicareer (PRICES WILL INCREASE SOON)
You will also get access to all the technical courses inside the program, also the ones I plan to make in the future! Check out the technical courses below 👇
_____________________________________________________________
In this video 📝 we will create an OCR Model To Read Captchas With Neural Networks In Keras And TensorFlow. We will first go over what a recurrent neural network is and why we are going to use that in this video to create an OCR model. We will talk about the CTC loss and at the end of the video, we will create the model, load in the dataset, preprocess it and train our neural network.
If you enjoyed this video, be sure to press the 👍 button so that I know what content you guys like to see.
_____________________________________________________________
🛠️ Freelance Work: www.nicolai-nielsen.com/nncode
_____________________________________________________________
💻💰🛠️ High Earner AI Career Program: www.nicolai-nielsen.com/aicareer
⚙️ Real-world AI Technical Courses: (www.nicos-school.com)
📗 OpenCV GPU in Python: www.nicos-school.com/p/opencv...
📕 YOLOv7 Object Detection: www.nicos-school.com/p/yolov7...
📒 Transformer & Segmentation: www.nicos-school.com/p/transf...
📙 YOLOv8 Object Tracking: www.nicos-school.com/p/yolov8...
📘 Research Paper Implementation: www.nicos-school.com/p/resear...
📔 CustomGPT: www.nicos-school.com/p/custom...
_____________________________________________________________
📞 Connect with Me:
🌳 linktr.ee/nicolainielsen
🌍 My Website: www.nicolai-nielsen.com/
🤖 GitHub: github.com/niconielsen32
👉 LinkedIn: / nicolaiai
🐦 X/Twitter: / nielsencv_ai
🌆 Instagram: / nicolaihoeirup
_____________________________________________________________
Tags:
#OCR #NeuralNetwork #Captchas #NeuralNetworks #DeepLearning #NeuralNetworksPython #NeuralNetworksTutorial #DeepLearningTutorial #Keras #Tensorflow

Category: Science

Published: 11 Jul 2024

Comments: 62

@NicolaiAI 1 year ago

Join My AI Career Program www.nicolai-nielsen.com/aicareer Enroll in My School and Technical Courses www.nicos-school.com

@theuser810 11 months ago

The repository link is not in the description

@axelanderson2030 1 year ago

For anyone who is getting poor results:
1. The dataset is small, so a random split might not be representative of the problem. For example, the training set might contain a much higher percentage of one digit than of another.
2. You can use OpenCV for preprocessing. Morphological transformations that remove noise can improve performance immensely.
3. To avoid overfitting, I found that a Gaussian noise layer can help. It makes the data harder to learn and therefore harder to overfit.
Hope this helps!
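Point 1 above can be checked before training. The following is a minimal sketch, not code from the video: it compares how often each character appears in the training labels and in the validation labels. The function names and the sample labels are hypothetical and exist only for illustration.

```python
from collections import Counter


def char_frequencies(labels):
    """Return the relative frequency of each character across all labels."""
    counts = Counter(ch for label in labels for ch in label)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def split_imbalance(train_labels, valid_labels):
    """Absolute frequency gap per character between the two splits."""
    train = char_frequencies(train_labels)
    valid = char_frequencies(valid_labels)
    chars = sorted(set(train) | set(valid))
    return {ch: abs(train.get(ch, 0.0) - valid.get(ch, 0.0)) for ch in chars}


# Hypothetical labels, for illustration only.
train_labels = ["22d5n", "2d5nn", "d2n5d"]
valid_labels = ["55555"]

gaps = split_imbalance(train_labels, valid_labels)
worst = max(gaps, key=gaps.get)
print(f"largest gap: {worst!r} differs by {gaps[worst]:.2f}")
```

A large gap for some character suggests the split should be redone, for example with a different seed or with a split that is stratified by character.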

@kalifardiansyah5863 1 year ago

I have a question: how can I avoid misdetected characters, especially between two similar characters? For example, the letter Z is detected as 2, S as 5, I as 1, etc.

@axelanderson2030 1 year ago

@kalifardiansyah5863 You may need more training data or a larger CNN architecture.

@HarshpreetSingh-jz2lf 1 year ago

I tried it with 60,000 images and used morphological techniques, but the accuracy is still poor; val_loss just doesn't go below 14.

@axelanderson2030 1 year ago

@HarshpreetSingh-jz2lf Do you have a class imbalance in the dataset? Is the model built correctly? Is the data preprocessed correctly? I can't help you if you don't provide any context except for "it no work".

@alexmoruz1993 2 years ago

Hi Nicolai, I was wondering whether there is a way to feed this kind of network wider images with text, or to give it a dynamically sized input?

@ehsanroshan7068 2 years ago

Hi Nicolai, thanks for the great explanation. Could you please explain how to measure accuracy?
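One way to approach the accuracy question, offered as a sketch rather than as the method from the video: once predictions are decoded back to strings, compare them with the true labels. Two measures are common for captchas: the share of captchas that are entirely correct, and the share of individual characters that are correct. The function names and sample strings below are hypothetical.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of captchas whose predicted text matches the label exactly."""
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def character_accuracy(predictions, targets):
    """Fraction of label characters predicted correctly, position by position."""
    total = sum(len(t) for t in targets)
    if total == 0:
        return 0.0
    correct = sum(
        pc == tc
        for p, t in zip(predictions, targets)
        for pc, tc in zip(p, t)
    )
    return correct / total


# Hypothetical decoded predictions and ground-truth labels.
predictions = ["226md", "2SS1l"]
targets = ["226md", "25511"]

print(sequence_accuracy(predictions, targets))
print(character_accuracy(predictions, targets))
```

The position-by-position comparison is crude when a prediction has the wrong length; an edit-distance measure is more robust in that case.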

@adepusairahul7375 7 months ago

Where is the repository link? I am not able to find it in the description.

@syedmuzammilahmed6872 8 months ago

Hi Nicolai. When I add the num_oov_indices=0 parameter to the StringLookup code, the model training code works, but the visualization step (before creating and training the model) puts labels on the wrong images. So I removed num_oov_indices, and now my training code with early stopping does not work: it stops in the very first epoch. Is there any solution for this?

@omkarmestry4117 1 year ago

I'm trying to run this code, but I'm getting an error like "InvalidArgumentError: Graph execution error". Can anyone help with this?

@souhailel-ghayam4714 3 years ago

Hey, thank you very much for this beautiful explanation of the code and the philosophy behind OCR with an LSTM and a CTC layer. Can you please verify that the code still works? It was working when I first ran it, but now it doesn't. I think there is a problem in mapping characters to numbers and mapping numbers back to their original characters with layers.experimental.preprocessing.StringLookup. I ran it in Google Colab, but when I visualize the data it doesn't show the correct label text. I would be very thankful if you could verify this and suggest a fix for the character-to-number and number-to-character mapping.

@NicolaiAI 3 years ago

Thank you very much for watching! The code should not depend on anything and should be working every time, hmm 🤔

@nadyasudusinghe2213 2 years ago

Hi, I'm getting the same error. Did you find the solution?

@traderdaniel4749 1 year ago

Same here. I used only digits as labels, so I removed "char_to_num" and "num_to_char".

@benoitd94 10 months ago

Do you think I can use your code to read the digits of my water meter?

@NicolaiAI 10 months ago

Maybe you can try EasyOCR for that!

@hsnhsynglk 2 years ago

## Preprocessing
# Mapping characters to integers
char_to_num = layers.experimental.preprocessing.StringLookup(
    vocabulary=list(characters), mask_token=None
)
# Mapping integers back to original characters
num_to_char = layers.experimental.preprocessing.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)
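To see why the inverse lookup should be built from the forward lookup's own vocabulary, here is a plain-Python sketch of the idea. It does not call Keras. It assumes the forward lookup reserves index 0 for an out-of-vocabulary token, which is my understanding of the StringLookup default, and every name in it is hypothetical.

```python
def build_lookup(characters, oov_token="[UNK]"):
    """Mimic a lookup table with one reserved out-of-vocabulary slot at index 0."""
    vocabulary = [oov_token] + sorted(set(characters))
    char_to_num = {ch: i for i, ch in enumerate(vocabulary)}
    return vocabulary, char_to_num


def encode(label, char_to_num):
    """Map each character to its index; unknown characters map to 0."""
    return [char_to_num.get(ch, 0) for ch in label]


def decode(indices, vocabulary):
    """Map indices back to characters using the given vocabulary list."""
    return "".join(vocabulary[i] for i in indices)


vocabulary, char_to_num = build_lookup("2b8n7")
encoded = encode("2b827", char_to_num)

# Inverse built from the same vocabulary (including the reserved slot).
round_trip = decode(encoded, vocabulary)

# Inverse built from the raw characters only: every index is off by one.
raw_vocabulary = sorted(set("2b8n7"))
mismatched = decode(encoded, raw_vocabulary)

print(round_trip, mismatched)
```

When the two tables disagree about the reserved slot, the decoded text no longer matches the image, which resembles the "labels on the wrong images" symptom reported in this thread.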

@badihaboulhosn8178 2 years ago

Thanks, I thought I was the only one!

@syedmuzammilahmed6872 8 months ago

Thanks Man

@UZMAALFATMI 7 months ago

thanks so much!

@GuyJustCool 2 years ago

Dear Coding Lib! I'm here about the captcha project. It seems that turning shuffle on breaks the split function and produces an incorrect split. I have yet to find a solution and would really appreciate it if you looked into it. If shuffle is off, it works well. Another person pointed out the same bug: the labels end up on the wrong images.
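The cause of the shuffle problem is not established in this thread. One thing worth ruling out is images and labels being shuffled independently of each other. The sketch below is not the code from the video; it shuffles a single list of indices and applies it to both lists, so each image keeps its label. The function name, the seed and the file names are hypothetical.

```python
import random


def split_data(images, labels, train_size=0.9, shuffle=True, seed=42):
    """Split paired images and labels using one shared index permutation."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    indices = list(range(len(images)))
    if shuffle:
        # One permutation, applied to both lists below.
        random.Random(seed).shuffle(indices)
    n_train = int(len(indices) * train_size)
    train_idx, valid_idx = indices[:n_train], indices[n_train:]
    x_train = [images[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_valid = [images[i] for i in valid_idx]
    y_valid = [labels[i] for i in valid_idx]
    return x_train, x_valid, y_train, y_valid


# Hypothetical file names whose stem is the captcha text.
labels = ["226md", "22d5n", "2356g", "23mdg", "23n88",
          "243mm", "244e2", "245y5", "24f6w", "24pew"]
images = [f"{label}.png" for label in labels]

x_train, x_valid, y_train, y_valid = split_data(images, labels)
print(len(x_train), len(x_valid))
```

A quick sanity check after any split is to confirm that each file name still corresponds to its label before building the dataset pipeline.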

@HassanKhan-ei2wh 1 year ago

@syedmuzammilahmed6872 8 months ago

@HassanKhan-ei2wh Thanks, man.

@syedmuzammilahmed6872 8 months ago

@HassanKhan-ei2wh When I add the num_oov_indices=0 parameter to the StringLookup code, the model training code works, but it puts labels on the wrong images. So I removed num_oov_indices, and now my training code with early stopping does not work. Is there any solution for this?

@coconutnut21 1 year ago

Can I use this model for license plates?

@alokthakur3298 6 months ago

Can anyone provide me with the code?

@user-kw9cu 2 years ago

Can you provide the library versions you used?

@user-yr2cb6ms3r 6 months ago

By the way, can I extract text from images with this? My final project is to extract text from images, but I can't code. I need help, please.

@EnsignerTV 3 years ago

Thanks a lot!

@NicolaiAI 3 years ago

Thanks for watching!

@hendrywijaya1017 2 years ago

Excuse me, I have an issue when I run the build_model() function after the CTC loss part. It happens on line 43, at x = layers.Reshape(target_shape = new_shape, name='reshape')(x)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
     73
     74 # Call the function to build the model
---> 75 model = build_model()
     76 model.summary()

in build_model()
     41 # Floor division gives the whole-number result of a division with a remainder
     42 new_shape = ((img_width // 4), (img_height // 4) * 64)
---> 43 x = layers.Reshape(target_shape = new_shape, name='reshape')(x)
     44 x = layers.Dense(64, activation='relu', name='dense1')(x)
     45 x = layers.Dropout(0.2)(x)

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    975 if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    976 return self._functional_construction_call(inputs, args, kwargs,
--> 977 input_list)
    978
    979 # Maintains info about the `Layer.call` stack.

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1113 # Check input assumptions set after layer building, e.g. input shape.
   1114 outputs = self._keras_tensor_symbolic_call(
-> 1115 inputs, input_masks, args, kwargs)
   1116
   1117 if outputs is None:

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
    846 return tf.nest.map_structure(keras_tensor.KerasTensor, output_signature)
    847 else:
--> 848 return self._infer_output_signature(inputs, args, kwargs, input_masks)
    849
    850 def _infer_output_signature(self, inputs, args, kwargs, input_masks):

/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
    886 self._maybe_build(inputs)
    887 inputs = self._maybe_cast_inputs(inputs)
--> 888 outputs = call_fn(inputs, *args, **kwargs)
    889
    890 self._handle_activity_regularization(inputs, outputs)

/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in call(self, inputs)
    537 # Set the static shape for the result since it might lost during array_ops
    538 # reshape, eg, some `None` dim in the result could be inferred.
--> 539 result.set_shape(self.compute_output_shape(inputs.shape))
    540 return result
    541

/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in compute_output_shape(self, input_shape)
    528 output_shape = [input_shape[0]]
    529 output_shape += self._fix_unknown_dimension(input_shape[1:],
--> 530 self.target_shape)
    531 return tf.TensorShape(output_shape)
    532

/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in _fix_unknown_dimension(self, input_shape, output_shape)
    516 output_shape[unknown] = original // known
    517 elif original != known:
--> 518 raise ValueError(msg)
    519 return output_shape
    520
---------------------------------------------------------------------------

And this is the error message:
ValueError: total size of new array must be unchanged, input_shape = [50, 50, 64], output_shape = [50, 768]
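The numbers in this error message can be checked by hand. A Reshape layer needs the same number of elements before and after, and here 50 × 50 × 64 = 160,000 while 50 × 768 = 38,400. The sketch below reproduces that arithmetic in plain Python. It assumes two 2×2 pooling layers and 64 filters in the last convolution, which is what the `// 4` and `* 64` in new_shape imply; the function names are hypothetical.

```python
def conv_output_shape(img_width, img_height, num_pools=2, pool_size=2, filters=64):
    """Feature-map shape after `num_pools` pooling layers (batch axis omitted)."""
    factor = pool_size ** num_pools
    return (img_width // factor, img_height // factor, filters)


def reshape_target(img_width, img_height, num_pools=2, pool_size=2, filters=64):
    """Target shape that merges the height and channel axes."""
    w, h, c = conv_output_shape(img_width, img_height, num_pools, pool_size, filters)
    return (w, h * c)


def element_count(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# Target computed from img_width=200, img_height=50 matches the trace: (50, 768).
target = reshape_target(200, 50)

# A feature map of [50, 50, 64] is what a 200x200 input would produce.
actual = conv_output_shape(200, 200)

print(target, element_count(target))
print(actual, element_count(actual))
```

One plausible reading, stated as a guess: the Input layer (or the resized images) used a height of 200, while img_height was still 50 when new_shape was computed. Using the same width and height in both places should make the element counts agree.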

@chelvanchelvam4332 3 years ago

Is it suitable for text recognition tasks?

@NicolaiAI 3 years ago

Yes, if you just train it on what you want to recognize.

@chelvanchelvam4332 3 years ago

@NicolaiAI Thank you, I will try.

@prathamshah5521 5 months ago

Hey, I am not getting accurate results. I checked your GitHub, and for some reason the labels aren't matching the captchas during testing. What would you recommend?

@LucasDM4 5 days ago

Fix the code / fix the labels

@abhisekseal8044 2 years ago

Hi, I am a beginner in this field. I watched your video and implemented this code. It's working fine, but I need to test a single captcha image. How can I do that? I tried, but the prediction was not good. Please help me out if you can. 🥺

@warzone_gods 1 year ago

Have you found the answer to this?

@Cordic45 2 years ago

Sir, why can't we use regular object detection to detect the numbers?

@Konnits 1 year ago

Hi! I'm trying the code, but I get an error while training: "Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6]". Can anyone help me fix this?

@arpittalmale6468 1 year ago

same bro

@lanasillomaster7034 7 months ago

I was replicating this project with another dataset I made and got that error because I forgot a letter when labelling a file.
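A mislabelled file like the one described above can be found before training. This is a small sketch under one assumption: each file name (without extension) is the captcha text, as in the dataset discussed in this thread. The function name and the sample file names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def find_odd_length_labels(filenames):
    """Return the most common label length and the files that deviate from it."""
    if not filenames:
        return 0, []
    labels = {name: Path(name).stem for name in filenames}
    lengths = Counter(len(label) for label in labels.values())
    expected = lengths.most_common(1)[0][0]
    odd = sorted(name for name, label in labels.items() if len(label) != expected)
    return expected, odd


# Hypothetical file names; the last one is missing a character.
filenames = ["226md.png", "22d5n.png", "2356g.png", "23n8.png"]
expected, odd = find_odd_length_labels(filenames)
print(expected, odd)
```

Any file reported here would produce a label tensor of a different length, which matches the "[tensor]: [5], [batch]: [6]" error quoted above.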

@tricialamjingyi 2 years ago

Hi, how can I handle captchas that have 6 digits per picture? Your example uses 5 digits. I know I need to change something in the model, but I can't figure out what. :( The error I keep getting is "Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6]". What should I change, or how can I work out what needs to change?

@arslanmushtaq9774 1 year ago

Did you find the solution?

@kentoky6568 1 year ago

Hello, in my case I switched to a dataset of images with 4 characters, and the model adapted to length 4. That would mean you need a separate model for each label length.
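For datasets that mix label lengths, a common alternative to one model per length is to pad every encoded label to the same length and keep the true lengths separately. This is a sketch of that idea, not something shown in the video. The padding value is an assumption, and the CTC loss would need the true lengths instead of reading the length from the tensor shape, so the loss layer from the tutorial would have to be adapted.

```python
def pad_labels(encoded_labels, pad_value):
    """Pad integer-encoded labels to a common length; also return true lengths."""
    if not encoded_labels:
        return [], []
    max_length = max(len(label) for label in encoded_labels)
    lengths = [len(label) for label in encoded_labels]
    padded = [
        list(label) + [pad_value] * (max_length - len(label))
        for label in encoded_labels
    ]
    return padded, lengths


# Hypothetical encoded labels of length 5 and 6; 99 is an arbitrary padding value
# chosen to lie outside the character indices.
encoded = [[1, 4, 3, 1, 2], [2, 2, 5, 1, 3, 4]]
padded, lengths = pad_labels(encoded, pad_value=99)
print(padded, lengths)
```

With every label at the same length, batching no longer fails, and the stored lengths tell the loss how much of each row is real.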

@aryangupta2051 10 months ago

hey did you fix it?

@aryangupta2051 10 months ago

@arslanmushtaq9774 Hey, did you fix it?

@bbtvines 2 years ago

How do I implement it? You just read through all the docs.

@NicolaiAI 2 years ago

Hi, 80% of the video is implementation

@creatur 2 years ago

@NicolaiAI I have a single captcha and I have trained my model. How can I solve that captcha?

@NicolaiAI 2 years ago

What do you mean by a single captcha? In the video they are passed through the model one by one, too.

@creatur 2 years ago

@NicolaiAI 😔😔😔 I am a noob with TF. I wanted to make an API that receives a captcha as base64, solves it, and sends back the captcha response.

@warzone_gods 1 year ago

@NicolaiAI I want to input a single CAPTCHA and have the model predict it.
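For a single image, the usual pattern is to preprocess it exactly like the training images, add a batch dimension of size 1, run the prediction model, and decode the output. The decoding step can be sketched in plain Python as greedy CTC decoding: take the most likely class at each time step, collapse repeats, and drop the blank. The sketch assumes the blank is the last class index, which is my understanding of the Keras CTC utilities; the function name, vocabulary and probabilities are hypothetical.

```python
def greedy_ctc_decode(probabilities, vocabulary):
    """Decode one sample: `probabilities` is a list of time steps, each a list
    of class probabilities whose last entry is the CTC blank."""
    if not probabilities:
        return ""
    blank = len(probabilities[0]) - 1
    # Most likely class at every time step.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probabilities]
    decoded = []
    previous = None
    for index in best_path:
        # Keep a class only when it differs from the previous step and is not blank.
        if index != previous and index != blank:
            decoded.append(vocabulary[index])
        previous = index
    return "".join(decoded)


# Hypothetical output for one image: 6 time steps, classes "a", "b" and blank.
vocabulary = ["a", "b"]
probabilities = [
    [0.90, 0.05, 0.05],  # a
    [0.80, 0.10, 0.10],  # a (repeat, collapsed)
    [0.10, 0.10, 0.80],  # blank
    [0.70, 0.20, 0.10],  # a (new, because a blank came in between)
    [0.10, 0.80, 0.10],  # b
    [0.20, 0.70, 0.10],  # b (repeat, collapsed)
]
print(greedy_ctc_decode(probabilities, vocabulary))
```

The vocabulary list has to line up with the class indices the model was trained with, including any reserved out-of-vocabulary slot.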

@user-tk5xe1km7p 11 months ago

How can I crack a captcha with 6 digits and characters?

@aryangupta2051 10 months ago

hey did you get a method?

@traderdaniel4749 1 year ago

Does anyone else have the same error?

File "C:\Users\user\PycharmProjects\ocr_gas\ocr.py", line 135, in call *
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
ValueError: slice index 1 of dimension 0 out of bounds. for '{{node ocr_model_v1/ctc_loss/strided_slice_2}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](ocr_model_v1/ctc_loss/Shape_2, ocr_model_v1/ctc_loss/strided_slice_2/stack, ocr_model_v1/ctc_loss/strided_slice_2/stack_1, ocr_model_v1/ctc_loss/strided_slice_2/stack_2)' with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = , input[2] = , input[3] = .

Call arguments received by layer "ctc_loss" " f"(type CTCLayer):
  • y_true=tf.Tensor(shape=(None,), dtype=float32)
  • y_pred=tf.Tensor(shape=(None, 50, 12), dtype=float32)
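The trace shows y_true with shape (None,), meaning one value per sample, while the loss layer reads tf.shape(y_true)[1] and therefore expects a second axis. One possible cause, stated as a guess, is that each label reaches the model as a single number or string instead of a sequence of per-character indices. The sketch below shows a digit-only encoding that keeps one index per character; the function names are hypothetical.

```python
def encode_digits(label):
    """Turn a digit string such as "04821" into one integer per character."""
    if not (label.isascii() and label.isdigit()):
        raise ValueError(f"label must contain only digits: {label!r}")
    return [int(ch) for ch in label]


def batch_shape(batch):
    """Return (batch_size, label_length) for a rectangular batch of labels."""
    lengths = {len(row) for row in batch}
    if len(lengths) != 1:
        raise ValueError("all labels in a batch must have the same length")
    return (len(batch), lengths.pop())


# Hypothetical digit-only labels.
batch = [encode_digits("04821"), encode_digits("11930")]
print(batch, batch_shape(batch))
```

Encoding each character separately also preserves leading zeros, which would be lost if "04821" were converted to the single integer 4821.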