[ Image To Text ] Train new Font with Tesseract in Google Colab (5x Faster)

13K views

You can also train a new font with Tesseract in Google Colab.
Link to the Google Colab notebook:
colab.research.google.com/git...
#aniquemaniac #tesseract #googlecolab

Published: Nov 14, 2021

Comments: 33

@garrynil4363 2 years ago

Damn bro! It took me 1 hour to train, which was taking almost 6 hours before. Subscribed!!!!

@henriquesouza6699 2 years ago

You are the best!! Thanks!

@Ethiopic 11 months ago

Brilliant: thank you.

@faint.2396 1 year ago

Thank you very much!

@rokkhandu7530 2 years ago

Worked, thanks!

@preciouschannel 4 months ago

Thanks for sharing

@ghansim1307 2 years ago

Great!! It's easy now

@chinabugaga 2 years ago

Thanks a lot

@mohammadwasim4124 2 years ago

Doing a great job, bhai

@aniquemaniac 2 years ago

Thanks, bhai

@jeyarraa2556 2 years ago

Bro, your final output text file is formatted line by line, matching the source image file. Looks good. My output text file comes out as a single large paragraph without any line breaks. Why is that?

@nkkindcmmin4294 1 year ago

NICE

@coconutnut21 1 year ago

I'm really confused about whether it is actually working. I hope you first show the detection with the default pytesseract and compare it to the result of pytesseract with the trained data file, so that we can see whether the training is effective or not.
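
The comparison asked for above could be sketched roughly as follows. This is only an illustration, not what the video does: it scores each OCR result against a known ground-truth string using a similarity ratio from Python's difflib. The function names, the trained model name, and the tessdata directory are hypothetical placeholders; the `--tessdata-dir` option and `pytesseract.image_to_string` are assumed to be available in the installed versions.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def similarity(expected: str, actual: str) -> float:
    """Return a 0..1 similarity ratio between ground truth and OCR output."""
    # Collapse whitespace so line-wrapping differences do not dominate the score.
    norm_expected = " ".join(expected.split())
    norm_actual = " ".join(actual.split())
    return SequenceMatcher(None, norm_expected, norm_actual).ratio()


def ocr(image_path: str, lang: str, tessdata_dir: Optional[str] = None) -> str:
    """Run Tesseract via pytesseract, optionally with a custom tessdata directory."""
    # Imported lazily so the scoring helper works without OCR packages installed.
    import pytesseract
    from PIL import Image

    # Assumption: the trained .traineddata file lives in tessdata_dir.
    config = f'--tessdata-dir "{tessdata_dir}"' if tessdata_dir else ""
    return pytesseract.image_to_string(Image.open(image_path), lang=lang, config=config)


def compare(image_path: str, ground_truth: str,
            trained_lang: str, trained_dir: str) -> Tuple[float, float]:
    """Score the stock 'eng' model and the newly trained model on the same image."""
    baseline = similarity(ground_truth, ocr(image_path, "eng"))
    tuned = similarity(ground_truth, ocr(image_path, trained_lang, trained_dir))
    return baseline, tuned
```

Calling `compare("sample.png", known_text, "eng", "/content/train_output")` with your own image, text, and paths would return two ratios; a higher second value suggests the training helped on that sample. A single image is weak evidence, so several test images in the target font give a fairer picture.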

@heremgok9400 2 years ago

Oh! I was trying to train with Tesseract 5.

@nikolaydd6219 2 years ago

Thanks!

@mahmuodaboalhassan1410 2 years ago

How can I train the Arabic language? I know there is an ara.traineddata, but I need to add new characters. My question is how I can prepare my data and the font file just for creating the training data. Also, if there is a link for discussions, I would appreciate it.

@youseffarouk6189 1 year ago

Did you find an answer??

@aligadget6055 27 days ago

Hi, how can I retrain Tesseract for the Farsi language?

@huuvo272 2 years ago

Thank you very much. Can you help me configure training with only digits and commas, e.g. 12,355,136?

@ozgurkalyoncu 2 years ago

How can we contact you about a job?

@suzzmerick5815 2 years ago

How do I train on images? Can someone help me?

@manuthvann5717 2 years ago

That's what I have been looking for, dear. But since I'm a newbie with Tesseract, I'm a bit curious about the training data you used: is it a PDF image or a text-line image? And after training, can this model be used with a web application? Looking forward to hearing back from you, thanks.

@aniquemaniac 2 years ago

I used a font file (typouprighBT.ttf) to generate the trained data. If you only have an image or PDF file, you can identify the font of the image on www.myfonts.com/whatthefont, then search for the font, download it, and generate the training data. As for a web application, I have never tried it; maybe there is some way to use the trained model with tesseract.js.

@manuthvann7560 2 years ago

@aniquemaniac Sorry to get back to you this late, but thanks again, dear. I followed your steps and it's working. Since you used the .ttf font directly, is it okay to increase max_page based on our preferences? Is it going to overfit? Looking forward to hearing back from you.

@aniquemaniac 2 years ago

@manuthvann7560 Yeah, no problem, you can increase max_pages too.

@manuthvann7560 2 years ago

@aniquemaniac And is there any way we can train multiple fonts at the same time, dear?

@ganeshrajv130 1 year ago

What should I select in the GitHub trained data (for Hindi training), and how do I train for Tesseract 5?

@aniquemaniac 1 year ago

I made it with Tesseract version 4; I have not tried it with Tesseract 5.

@ganeshrajv130 1 year ago

@aniquemaniac Okay. Could you please tell me how to tune the LSTM model? For example, I need to use a different activation function and so on. Plus, I need to retrain for the Hindi language with Tesseract 4. Is that possible with your Colab code?

@agiyos7168 2 years ago

How can I train more than one font?

@faint.2396 1 year ago

Just replace the font file with the new font and re-do all the steps in the video.
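
As a possible alternative to re-running everything per font, the Tesseract 4 `tesstrain.sh` script appears to accept several font names after `--fontlist`, plus a `--maxpages` limit (presumably the max_pages setting discussed in this thread). The sketch below only builds the argument list and does not run anything. The helper name, font names, and directory paths are hypothetical, and the flags should be checked against the `tesstrain.sh` version installed in your Colab before use.

```python
import shlex
from typing import List, Optional, Sequence


def build_tesstrain_command(
    fonts: Sequence[str],
    fonts_dir: str,
    langdata_dir: str,
    tessdata_dir: str,
    output_dir: str = "train",
    lang: str = "eng",
    max_pages: Optional[int] = None,
) -> List[str]:
    """Build (but do not run) a tesstrain.sh argument list for one or more fonts."""
    if not fonts:
        raise ValueError("at least one font name is required")
    cmd = [
        "tesstrain.sh",
        "--fonts_dir", fonts_dir,
        # Assumption: font display names (not file names) are listed one per argument.
        "--fontlist", *fonts,
        "--lang", lang,
        "--linedata_only",
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--output_dir", output_dir,
    ]
    if max_pages is not None:
        # Assumption: --maxpages caps the number of synthetic pages rendered per font.
        cmd += ["--maxpages", str(max_pages)]
    return cmd


if __name__ == "__main__":
    command = build_tesstrain_command(
        fonts=["Example Font A", "Example Font B"],  # hypothetical font names
        fonts_dir="/content/fonts",                  # hypothetical path
        langdata_dir="/content/drive/MyDrive/langdata_lstm",
        tessdata_dir="/content/tessdata",            # hypothetical path
        max_pages=50,
    )
    # Print a shell-quoted command line to paste into a Colab cell after review.
    print(shlex.join(command))
```

The font names passed here should match what `text2image --list_available_fonts --fonts_dir=...` reports for your font files, which is usually not the same as the .ttf file name.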

@13astix94 2 years ago

Running "tesstrain.sh" throws an error every time. Does anybody know the reason?

== Constructing LSTM training data ===
[Wed Jun 15 19:18:35 UTC 2022] /usr/bin/combine_lang_model --input_unicharset /tmp/eng-2022-06-15.xPx/eng.unicharset --script_dir /content/drive/MyDrive/langdata_lstm --words /content/drive/MyDrive/langdata_lstm/eng/eng.wordlist --numbers /content/drive/MyDrive/langdata_lstm/eng/eng.numbers --puncs /content/drive/MyDrive/langdata_lstm/eng/eng.punc --output_dir train --lang eng
Loaded unicharset of size 69 from file /tmp/eng-2022-06-15.xPx/eng.unicharset
Setting unichar properties
Setting script properties
Config file is optional, continuing...
Failed to read data from: /content/drive/MyDrive/langdata_lstm/eng/eng.config
Null char=2
Invalid format in radical table at line 0: 19886 3 23 6 3
Creation of encoded unicharset failed!!
Error writing recoder!!
Reducing Trie to SquishedDawg
Error during conversion of wordlists to DAWGs!!
ERROR: Program combine_lang_model failed. Abort.