You can look into the following:

1. Tesseract 3 returns metadata that includes the recognized font. It is probably not very reliable, but it might work for basic fonts and distinguish bold from non-bold text.
2. In Tesseract 4 you can export hOCR output and configure it to produce boxes around each character (I am not sure about Tesseract 3). I am not sure how reliable these boxes are either, but if they are good enough, you could feed them to a second algorithm that only classifies whether a single character is bold, and then remove the non-bold text from the Tesseract output.
3. If you have precise line boxes before running Tesseract, you could also train an algorithm that segments the bold part of each line, then crop the image and run Tesseract only on the bold parts. This would probably be the most technical solution, but I think it could work as well.
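For option 2, here is a rough Python sketch of the idea, not a tested pipeline. It assumes the hOCR was produced with something like `tesseract page.png out -c hocr_char_boxes=1 hocr`, and that each character appears as a `span` with class `ocrx_cinfo` and a title of the form `x_bboxes x0 y0 x1 y1; x_conf ...`; please check this against your own Tesseract output. The names `CharBoxParser`, `ink_density`, and `bold_chars` are made up for illustration, the image is assumed to be already binarized into rows of 0/1 pixels (1 = ink), and the ink-density threshold is only a crude stand-in for a real bold/non-bold classifier.

```python
from html.parser import HTMLParser


class CharBoxParser(HTMLParser):
    """Collects (character, bbox) pairs from ocrx_cinfo spans in hOCR."""

    def __init__(self):
        super().__init__()
        self.boxes = []        # list of (char, (x0, y0, x1, y1))
        self._pending = None   # bbox of the character span we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocrx_cinfo":
            # Assumed title format: "x_bboxes x0 y0 x1 y1; x_conf 98.7"
            for field in (attrs.get("title") or "").split(";"):
                parts = field.split()
                if len(parts) == 5 and parts[0] == "x_bboxes":
                    self._pending = tuple(int(p) for p in parts[1:])

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.boxes.append((data.strip(), self._pending))
            self._pending = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._pending = None


def ink_density(image, box):
    """Fraction of ink pixels inside box; right and bottom edges exclusive."""
    x0, y0, x1, y1 = box
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(pixels) / len(pixels) if pixels else 0.0


def bold_chars(hocr, image, threshold=0.5):
    """Keep characters whose box is at least `threshold` ink (heuristic)."""
    parser = CharBoxParser()
    parser.feed(hocr)
    return [ch for ch, box in parser.boxes
            if ink_density(image, box) >= threshold]
```

You would call `bold_chars(hocr_text, binary_image)` and tune `threshold` on a few known bold and regular samples from your own scans. Raw ink density depends on the glyph shape (an "l" has less ink than an "m"), so for anything serious I would swap in a small trained classifier on the cropped character images instead.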