
HOW TO CONVERT .PDF TO .TXT USING PYTHON 

I know python
49K subscribers
34K views

In this video, I will show you how to convert a PDF file into a text file using Python. We use the PyPDF2 module for this short project.
source code : github.com/har...
questions related to this video
* PDF file: Reading and Extracting data using Python
* Building a PDF Data Extractor Using Python!!
* Convert PDF to Text: Python PDFminer example using Python
* Using PyPdf2 for analysing the PDFs
* PYTHON CODE TO CONVERT .PDF FILES IN A FOLDER TO .TXT FILES
* How To Read PDF Documents In Python
* Convert pdf to text file with Python
* How to convert Pdf to Text format using python
tags : #pdftotxt #pypdf2 #iknowpython

Published: 29 Sep 2024

Comments: 48
@joseadriano8168 4 years ago
I get this error: can't open file 'Show': [Errno 2] No such file or directory
@abhishekthakkar8897 4 years ago
I think the txt file should be opened in w (write) mode, because overwriting is fine for this.
@omarrazi4826 4 years ago
It's only converting one page.
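A minimal sketch of converting every page instead of a single one, assuming PyPDF2 3.x (`PdfReader`, `.pages`, `extract_text()`). The function names `extract_all_text` and `pdf_to_txt` are hypothetical and are not taken from the video's source code.

```python
def extract_all_text(reader):
    """Join the text of every page of a PdfReader-like object.

    `reader` only needs a `.pages` sequence whose items have `extract_text()`.
    """
    # `or ""` guards against a page that yields no text at all.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def pdf_to_txt(pdf_path, txt_path):
    """Write the text of all pages of pdf_path into txt_path (hypothetical helper)."""
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    with open(pdf_path, "rb") as pdf_file:
        text = extract_all_text(PdfReader(pdf_file))
    # "w" overwrites any previous output; utf-8 keeps non-ASCII characters.
    with open(txt_path, "w", encoding="utf-8") as txt_file:
        txt_file.write(text)


# Usage (file names are placeholders):
# pdf_to_txt("1.pdf", "1.txt")
```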
@XavierBustosC 1 year ago
Update for 2023: many instructions are deprecated.
import PyPDF2
# Raw strings keep the backslashes in the Windows paths literal
pdffileobj = open(r'D:\Python_spyder\1.pdf', 'rb')
pdfreader = PyPDF2.PdfReader(pdffileobj)
x = len(pdfreader.pages)
pageobj = pdfreader.pages[x - 1]  # last page only
text = pageobj.extract_text()
file1 = open(r'D:\Python_spyder\1.txt', 'a')  # 'a' appends to the file
file1.write(text)
file1.close()
pdffileobj.close()
print("hecho")  # Spanish for "done"
@anastasiatsoukala306 4 years ago
Hey, thank you for your contribution! Does anyone know how I could use a whole folder of PDFs as input? For example: let's say I have a folder of 50 PDFs and want an output folder of 50 converted TXTs. Can I do that with this code?
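One possible way to handle a whole folder, sketched under the assumption that PyPDF2 3.x is installed. `txt_targets` and `convert_folder` are hypothetical names, and the folder names in the usage line are placeholders.

```python
from pathlib import Path


def txt_targets(input_dir, output_dir):
    """Pair every .pdf in input_dir with a same-named .txt path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [
        (pdf, output_dir / (pdf.stem + ".txt"))
        for pdf in sorted(input_dir.glob("*.pdf"))
    ]


def convert_folder(input_dir, output_dir):
    """Convert each PDF in input_dir to a TXT file in output_dir."""
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path, txt_path in txt_targets(input_dir, output_dir):
        reader = PdfReader(str(pdf_path))
        text = "\n".join((page.extract_text() or "") for page in reader.pages)
        txt_path.write_text(text, encoding="utf-8")


# Usage (folder names are placeholders):
# convert_folder("pdfs", "txts")
```

Keeping the path pairing separate from the conversion makes it easy to check which files would be written before any PDF is opened.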
@kalh-tg9wb 8 months ago
PdfFileReader is deprecated.
@marcscherzer 4 years ago
The converted txt does not contain any text. Can anyone help me?
@manish36556 3 years ago
This method doesn't work. The text from the PDF files comes out as a blank page. I tried 4 different files. For a CamScanner file the output is "CamSanner" and nothing else, and for the rest of the PDFs it is a blank txt file. Can you help with this?
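A likely cause of blank output (an assumption, since the files are not shown) is that the pages are scanned images: PyPDF2 only reads text that is embedded in the PDF and does not perform OCR. The sketch below lists which pages yield no text, so such files can be spotted. `pages_without_text` is a hypothetical helper, and PyPDF2 3.x is assumed.

```python
def pages_without_text(reader):
    """Return the 1-based numbers of pages whose extracted text is empty.

    `reader` only needs a `.pages` sequence whose items have `extract_text()`.
    """
    return [
        number
        for number, page in enumerate(reader.pages, start=1)
        if not (page.extract_text() or "").strip()
    ]


# Usage (requires PyPDF2; the file name is a placeholder):
# from PyPDF2 import PdfReader
# print(pages_without_text(PdfReader("scan.pdf")))
```

If every page is reported, the file probably has no text layer and would need an OCR tool rather than PyPDF2.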
@hardcode1136 4 years ago
Hello @iknowpython sir! I'm using this code in Pydroid 3 on my Android phone. It creates the text file but doesn't write all the text from the PDF into the .txt file. It only copies the heading from the PDF. The PDF contains only text, headers and footers. Sir, please help 🙏. Thank you
@nicolaswirtz6952 4 years ago
Does anyone know how it is extracting and converting the data into text? It seems you are reading the file in binary and then the module does its magic to provide the text version. I tried this on a readable PDF with values that were formatted within Excel tables. I noticed that when I created the .txt file, a lot of the information was left out. I am attempting to have the program do a "copy and paste" of the PDF file into a text file, but I don't think this method does that. Does anyone know a different method? Great video though, as it worked! Just not for my specific case...
@MrVivekc 5 years ago
Previously I tried to extract text from a PDF using PyPDF2, but it didn't work for me. The output on the console was blank, although the examples on the internet show output. So do I need to convert the PDF to text to get a result? Won't normal text extraction and printing to the console work?
@heyderelesgerov9499 5 years ago
Thanks for the video, but 480p or 720p video quality would be good for us.
@Iknowpython 5 years ago
Bro, the resolution of the video is 2016x1134. Try 720p and it will become very clear.
@MrDogloverguy 4 years ago
The poor quality is on your end. Go to the video settings and you can control the quality from there.
@hardcode1136 4 years ago
Hello @iknowpython sir! I'm using this code in Pydroid 3 on my Android phone. It creates the text file but doesn't write all the text from the PDF into the .txt file. It only copies the heading from the PDF. The PDF has text, headers and footers only. Sir, please help 🙏. Thank you
@abhi2k68 3 years ago
In my converted text file all the words are run together. There are no spaces between words.
@mayuragarwal8860 4 years ago
I have one PDF that many have failed to convert to Excel. Will you accept the challenge?
@petrockspiracy3120 3 years ago
I just get a frustrating Unicode error.
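The comment gives no traceback, so this is only a guess at the cause: a `UnicodeEncodeError` often appears when the extracted text contains characters that the platform's default file encoding cannot represent. A minimal sketch that passes an explicit encoding to `open()`; `save_text` is a hypothetical helper name.

```python
def save_text(text, path):
    """Write text to path as UTF-8, replacing any existing file."""
    # Without encoding=..., open() uses a platform-dependent default,
    # which may not cover every character found in a PDF.
    with open(path, "w", encoding="utf-8") as txt_file:
        txt_file.write(text)


# Usage (the file name is a placeholder):
# save_text("naïve café ✓", "out.txt")
```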
@akashgeorge5433 3 years ago
What if there are images in the PDF?
@alenwalker7362 5 years ago
Bro, what advanced stuff should I know when I become a programmer? What keywords should I know? (I hope you know what I'm asking.) Please help me, bro.
@Iknowpython 5 years ago
Brother, there is nothing "advanced" in programming. You just go deeper into the concepts, but the basics remain the same. If you want my advice, polish your skills by working on more and more projects, and then experiment with concepts by mixing them or using parts of different concepts (e.g. combining face recognition in Python with an obstacle-avoiding Arduino robot car; I am working on this project). I guess you get my point.
@alenwalker7362 5 years ago
@Iknowpython Thanks, man
@yagavyagav763 4 years ago
Trying to fool us.
@Iknowpython 4 years ago
Nope, you're trying to be oversmart 😊😊
@yagavyagav763 4 years ago
Sorry, I know you are a great programmer, but my internet was down and I was a bit angry, so my reply was a bit rude. And remember, I have liked the video.
@siddhigolatkar8558 3 years ago
Thank you so much
@bilalsharif313 5 years ago
Thank you very much for your nice presentation. Which version of Python are you using? On the latest version, I am unable to install PyPDF2. Kindly guide me.
@Iknowpython 5 years ago
Hey man, thank you so much. I am using Python 3.6.2. Can you specify what error you are getting?
@bilalsharif313 5 years ago
@Iknowpython
>>> pip install PyPDF2
SyntaxError: invalid syntax
@Iknowpython 5 years ago
It works for me, bro. Make sure you have no space before 'pip'.
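The `>>>` prompt in the error above shows the command was typed inside the Python interpreter, where `pip install ...` is not valid Python. pip is normally run from the system shell. If it has to be started from Python, one option is to call it as a subprocess, as sketched here; `pip_install_command` and `install` are hypothetical helper names.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the command that runs pip with the current Python interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Usage (needs network access):
# install("PyPDF2")
```

Using `sys.executable -m pip` installs into the same interpreter that runs the script, which helps when several Python versions are present.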
@hamzamuhammadkhan 5 years ago
Can we use Spyder, Jupyter Notebook or PyCharm for this?
@Iknowpython 5 years ago
It doesn't matter what IDE or editor you use; the program remains the same everywhere.
@hamzamuhammadkhan 5 years ago
@Iknowpython Awesome, thanks man
@gandharvkumar4538 5 years ago
Why do you use r before the path name? Can you elaborate, please?
@Iknowpython 5 years ago
It is a raw string, brother, used for the path definition on Windows.
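To illustrate the `r` prefix: in a normal string literal the backslash starts an escape sequence, so parts of a Windows path such as `\n` or `\t` are silently turned into a newline or a tab. A raw string keeps every backslash as written. The path below is a made-up example.

```python
# Normal literal: "\n" becomes a newline and "\t" becomes a tab.
normal = "C:\new_folder\table.pdf"

# Raw literal: backslashes are kept exactly as typed.
raw = r"C:\new_folder\table.pdf"

# Equivalent without the r prefix: double every backslash.
escaped = "C:\\new_folder\\table.pdf"

print(repr(normal))
print(repr(raw))
print(raw == escaped)
```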
@bhanusrinivaskoppolu4814 5 years ago
Bro, which compiler do you use?
@Banjara_boys_and_girl 4 years ago
Thank you, brother ❤️
@chakirfri 4 years ago
Works perfectly. Thank you, sir!
@Iknowpython 4 years ago
Welcome, man. I am happy it helped 😊😊
@chakirfri 4 years ago
@Iknowpython I used it on a multi-page PDF and it didn't work like the example you showed. Any ideas, bro?
@finociasubahani6035 5 years ago
Thanks bro
@Iknowpython 5 years ago
Welcome brother
@saurabhchavan5451 4 years ago
Hey bro, what can I do if there is more than one page? Pageobj=pdfreader.getPage(?????????)
@pythonmacho9954 4 years ago
@saurabhchavan5451 You can use a for loop. This is part of the code that reads multiple pages (PyPDF2 API from before version 3.0):
from os import listdir
from os.path import join
import PyPDF2
for filename in listdir(pdf_dir):  # pdf_dir: folder that contains the PDFs
    pdf = open(join(pdf_dir, filename), 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdf)
    text = ''
    for page in range(pdfReader.numPages):  # start at 0 so the first page is included
        pageObj = pdfReader.getPage(page)
        text += pageObj.extractText()  # collect the text of every page
    pdf.close()
@MrVivekc 5 years ago
The video is informative, I will definitely try this. Waiting for more stuff. Keep it up, buddy 👍👍
@Iknowpython 5 years ago
Thank you so much
@zephird.t.3038 4 years ago
Very interesting. 1. How can I convert a text dialogue, assigning one voice to each character of the conversation?
@pythonmacho9954 4 years ago
Use the pyaudio library (an audio I/O library) and pyttsx3. pyttsx3 lets you choose between the installed voices with this code:
voices = engine.getProperty('voices')  # this is part of the code
engine.setProperty('voice', voices[0].id)  # instead of 0 you can use the index of any installed voice
That is the way to do it. We don't need to import pyaudio, but we get an error without it.
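A rough sketch of the idea in the reply above, assuming pyttsx3 is installed and the dialogue is written as "Name: speech" lines (that format is an assumption, not something the question specifies). `assign_voices` and `speak_dialogue` are hypothetical names. Each speaker gets one installed voice, cycling if there are more speakers than voices. Behaviour may differ between speech drivers.

```python
def assign_voices(dialogue_lines, voice_ids):
    """Return (voice_id, speech) pairs, giving each speaker a fixed voice."""
    assignment = {}
    script = []
    for line in dialogue_lines:
        speaker, separator, speech = line.partition(":")
        if not separator:
            continue  # skip lines that are not in "Name: speech" form
        speaker = speaker.strip()
        if speaker not in assignment:
            # Cycle through the voices when speakers outnumber them.
            assignment[speaker] = voice_ids[len(assignment) % len(voice_ids)]
        script.append((assignment[speaker], speech.strip()))
    return script


def speak_dialogue(dialogue_lines):
    """Read the dialogue aloud, switching voice per speaker."""
    import pyttsx3  # third-party: pip install pyttsx3

    engine = pyttsx3.init()
    voice_ids = [voice.id for voice in engine.getProperty("voices")]
    for voice_id, speech in assign_voices(dialogue_lines, voice_ids):
        engine.setProperty("voice", voice_id)
        engine.say(speech)
    engine.runAndWait()


# Usage (needs an audio device and at least one installed voice):
# speak_dialogue(["Ann: Hi Bob.", "Bob: Hello Ann."])
```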