#9 Basic Python Data Extraction from Text Files 

Data Skills for Everyone
10K views

Published: 15 Oct 2024

Comments: 2
@soccerfan5908 7 months ago
Great video. It would be nice if you could explain the code in more detail. The only thing I couldn't figure out was how you got the line breaks in the output. I typed all the code but always get a result like this: ['Keyboard ', '3', 'million'], ['Note ', '11', 'thousand']. What do you think it could be? I'm using Visual Studio Code.
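One possible explanation for the output quoted in this comment is that the whole list of rows is printed in a single `print` call, which puts everything on one line. The sketch below is not the video's code; it only assumes the rows are held in a nested list like the one quoted, and the names `rows` and `format_rows` are made up for illustration. Printing inside a loop, or joining the rows with newline characters, gives one row per line.

```python
# Hypothetical sample data modelled on the output quoted in the comment.
rows = [['Keyboard ', '3', 'million'], ['Note ', '11', 'thousand']]


def format_rows(rows):
    """Return one line of text per row, stripping stray spaces from each field."""
    return "\n".join(", ".join(field.strip() for field in row) for row in rows)


# Printing the nested list directly puts every row on a single line.
print(rows)

# Printing inside a loop gives one row per line.
for row in rows:
    print(row)

# Joining with "\n" also gives one row per line, with cleaner formatting.
print(format_rows(rows))
```

The editor (Visual Studio Code or any other) should not affect this; the difference comes from where `print` is called.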
@bennguyen1313 7 months ago
I'm not sure how to choose among the many Python packages for extracting data from a PDF: PyMuPDF, PyPDF2, pdfplumber, tabula-py, etc. For example, what if the PDF is a scan of a paper document, i.e. it's crooked and the quality is bad? Is there one that handles that best? Or maybe I should use AI (ChatGPT + GPT4Vision/Ai PDF) to do the OCR and then have it extract the data?

Also, any suggestions on how to get the values from specific columns in a text file? For example, I have a text file with data like this:

    #Time (HHH:MM:SS): 002:34:02
    # T(ms) BUS CMD1 CMD2 FROM SA TO SA WC TXST RXST ERROR DT00 DT01 DT02 DT03 DT04 DT05 DT06 DT07
    # ===== === ==== ==== ==== == ==== == == ==== ==== ====== ==== ==== ==== ==== ==== ==== ==== ====
    816 B0 D84E BC RT27 2 14 D800 2100 0316 0000 0000 0000 0000 CCCD 0000
    817 A0 DC50 RT27 2 BC 16 D800 2120 0000 4080 3000 0000 3000 0000 0000
    #Time (HHH:MM:SS): 002:34:03
    # T(ms) BUS CMD1 CMD2 FROM SA TO SA WC TXST RXST ERROR DT00 DT01 DT02 DT03 DT04 DT05 DT06 DT07
    # ===== === ==== ==== ==== == ==== == == ==== ==== ====== ==== ==== ==== ==== ==== ==== ==== ====
    056 B0 D84E BC RT27 2 14 D800 2100 0316 0000 0000 0000 0000 CCCD 0000
    057 A0 DC50 RT27 2 BC 16 D800 2120 0000 4080 3000 0000 3000 0000 0000

How can I get just the data from DT00 through DT07 into an array, without doing lots of preprocessing to scrub out the repeating #Time headers that appear throughout the file?
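For the text-file question, here is a minimal sketch of one way to do it, under two assumptions that go beyond what the comment states: every header line starts with `#`, and every data row ends with exactly eight whitespace-separated data words (DT00 through DT07). The names `SAMPLE` and `extract_data_words` are made up for illustration. Header lines are skipped while reading, so no separate scrubbing pass is needed.

```python
# Sample text copied from the comment above (whitespace as quoted there).
SAMPLE = """\
#Time (HHH:MM:SS): 002:34:02
# T(ms) BUS CMD1 CMD2 FROM SA TO SA WC TXST RXST ERROR DT00 DT01 DT02 DT03 DT04 DT05 DT06 DT07
# ===== === ==== ==== ==== == ==== == == ==== ==== ====== ==== ==== ==== ==== ==== ==== ==== ====
816 B0 D84E BC RT27 2 14 D800 2100 0316 0000 0000 0000 0000 CCCD 0000
817 A0 DC50 RT27 2 BC 16 D800 2120 0000 4080 3000 0000 3000 0000 0000
#Time (HHH:MM:SS): 002:34:03
# T(ms) BUS CMD1 CMD2 FROM SA TO SA WC TXST RXST ERROR DT00 DT01 DT02 DT03 DT04 DT05 DT06 DT07
# ===== === ==== ==== ==== == ==== == == ==== ==== ====== ==== ==== ==== ==== ==== ==== ==== ====
056 B0 D84E BC RT27 2 14 D800 2100 0316 0000 0000 0000 0000 CCCD 0000
057 A0 DC50 RT27 2 BC 16 D800 2120 0000 4080 3000 0000 3000 0000 0000
"""


def extract_data_words(lines, n_words=8):
    """Collect the last n_words fields of every non-header line.

    Assumption: header lines start with '#', and DT00..DT07 are always
    the final eight whitespace-separated fields of a data row.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the repeating header lines
        tokens = line.split()
        if len(tokens) < n_words:
            continue  # too short to hold a full set of data words
        rows.append(tokens[-n_words:])
    return rows


data = extract_data_words(SAMPLE.splitlines())
for row in data:
    print(row)
```

To read from a real file, pass the open file object instead, e.g. `extract_data_words(open(path))`, where `path` is a placeholder for the file's location. If the original file is fixed-width and some data words can be blank, slicing each line by character position would be more robust than splitting on whitespace.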