
Process HUGE Data Sets in Pandas 

NeuralNine
Subscribe · 358K subscribers
39K views

Today we learn how to process huge data sets in Pandas by using chunks.
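A minimal sketch of the chunking idea from the video (the file name, chunk size, and "value" column are placeholders, assuming a large CSV on disk):

```python
import pandas as pd

CSV_PATH = "huge_dataset.csv"   # placeholder path
CHUNK_SIZE = 100_000            # rows per chunk

total_rows = 0
running_sum = 0.0

# read_csv with chunksize returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(CSV_PATH, chunksize=CHUNK_SIZE):
    total_rows += len(chunk)
    running_sum += chunk["value"].sum()  # "value" is a placeholder column

print("rows:", total_rows)
print("mean of 'value':", running_sum / total_rows)
```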
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: www.neuralnine.com/books/
💻 The Algorithm Bible Book: www.neuralnine.com/books/
👕 Programming Merch: www.neuralnine.com/shop
🌐 Social Media & Contact 🌐
📱 Website: www.neuralnine.com/
📷 Instagram: / neuralnine
🐦 Twitter: / neuralnine
🤵 LinkedIn: / neuralnine
📁 GitHub: github.com/NeuralNine
🎙 Discord: / discord
🎵 Outro Music From: www.bensound.com/

Science

Published: 11 Oct 2022

Comments: 42
@Open5to6 6 months ago
I can't always follow everything he says, because he moves pretty quickly and throws a lot at you, but he's always straight to the point, no fluff, and innovative. I always glean more things to look up after hearing them from NeuralNine first.
@aniv6346 a year ago
Thanks a ton! This is very helpful!
@leythecg a year ago
As always, top content, perfectly presented!
@goku-np5bk 7 months ago
Why would you use the CSV format instead of Parquet or HDF5 for large datasets?
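For reference, Parquet can also be read in batches rather than all at once; a minimal sketch using pyarrow (file name, batch size, and reading the whole row set are assumptions):

```python
import pyarrow.parquet as pq

PARQUET_PATH = "huge_dataset.parquet"  # placeholder path

pf = pq.ParquetFile(PARQUET_PATH)
total_rows = 0

# iter_batches yields RecordBatch objects that convert cheaply to pandas.
for batch in pf.iter_batches(batch_size=100_000):
    chunk = batch.to_pandas()
    total_rows += len(chunk)

print("rows:", total_rows)
```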
@csblueboy85 a year ago
Great video, thanks!
@Ngoc-KTVHCM 7 months ago
For Excel files, the pd.read_excel method has no chunksize parameter. How can I handle big data spread across many sheets in an Excel file? Please help me!
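Indeed, pd.read_excel has no chunksize parameter. One workaround is to page through each sheet with skiprows/nrows; a rough sketch, assuming a placeholder workbook and chunk size (note each page re-parses the file, so it is slow but memory-friendly):

```python
import pandas as pd

XLSX_PATH = "big_workbook.xlsx"   # placeholder path
CHUNK_SIZE = 50_000               # rows per page

for sheet in pd.ExcelFile(XLSX_PATH).sheet_names:
    start = 0
    while True:
        chunk = pd.read_excel(
            XLSX_PATH,
            sheet_name=sheet,
            skiprows=range(1, start + 1),  # skip already-read rows, keep row 0 (the header)
            nrows=CHUNK_SIZE,
        )
        if chunk.empty:
            break
        # ...process the chunk here...
        start += CHUNK_SIZE
```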
@wildchildhep a year ago
it works! thanks!
@JuanCarlosMH a year ago
Awesome!
@lakshay1168 a year ago
Your explanation is very good. Can you do a video on a Python project that detects the position of an eye?
@franklimmaciel 4 months ago
Thanks!
@siddheshphaple342 10 months ago
How can I connect to a database in Python, and how do I optimize it if I have 60 lakh+ (6 million+) records in it?
@tcgvsocg1458 a year ago
I was literally watching a video when you posted a new one... I like that!
@ramaronin 3 months ago
brilliant!
@maloukemallouke9735 8 months ago
Thanks, but how do you deal with dependent rows, like time series data, or observations such as text where the context is correlated across rows?
@mainak222 13 days ago
I have the same question, do you have an answer?
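One common pattern (not from the video, just a sketch under assumptions about the data) is to carry a small tail of each chunk over to the next one, so window-based calculations that span a chunk boundary still see the rows they depend on:

```python
import pandas as pd

CSV_PATH = "timeseries.csv"   # placeholder path
CHUNK_SIZE = 100_000
WINDOW = 7                    # rolling window length (assumption)

carry = None  # last WINDOW-1 rows of the previous chunk

for chunk in pd.read_csv(CSV_PATH, chunksize=CHUNK_SIZE):
    if carry is not None:
        chunk = pd.concat([carry, chunk], ignore_index=True)

    rolled = chunk["value"].rolling(WINDOW).mean()  # "value" is a placeholder column

    # Drop the rows that belong to the previous chunk before using the result,
    # so each row is emitted exactly once.
    if carry is not None:
        rolled = rolled.iloc[len(carry):]

    # ...use `rolled` here (write out, aggregate, etc.)...

    carry = chunk.tail(WINDOW - 1)
```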
@thisoldproperty a year ago
I like the simplicity. I wonder if a similar thing could be done with SQL queries, given that databases usually store incredibly large datasets.
@jaysont5311 a year ago
I thought I read that you could; I could be wrong though.
@mikecripps2011 8 months ago
Yes, I do it all day long. I read 2.5 billion records this week, a new level for me, on a wimpy PC. I normally chunk it by 200K rows.
@nuhuhbruhbruh 7 months ago
@@mikecripps2011 The whole point of SQL databases, though, is that you can directly manipulate arbitrary amounts of data without having to load it all into memory, so you don't need to do any chunking; just let the database run the query and retrieve the processed output.
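For what it's worth, pandas can also stream query results in chunks rather than materializing them all at once; a minimal sketch with SQLite (database path, table, and column are placeholders):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("big.db")  # placeholder database

total = 0.0
# read_sql_query with chunksize returns an iterator of DataFrames,
# so the full result set never has to fit in memory at once.
for chunk in pd.read_sql_query("SELECT amount FROM orders", conn, chunksize=100_000):
    total += chunk["amount"].sum()

print("total amount:", total)
conn.close()
```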
@TomKnudsen a year ago
Thank you. Could you please make a tutorial on how you would strip certain elements out of a file that is not your typical "list", "csv" or "json"? I find this task to be one of the most confusing and difficult things you can do in Python. If needed, I can provide you with a text file which includes information about airports, such as runways, elevation, etc. Perhaps there is some way to clean such a file up or even convert it to JSON/Excel/CSV.
@lilDaveist a year ago
Can you explain what you mean? A list is a data structure inside Python, CSV is a file format (comma-separated values), and JSON is also a file format (JavaScript Object Notation). If you have a file which mixes many different ways of storing data, someone has either manually or with a script copied a file line by line and pasted it into another file.
@kavetisaikumar a year ago
What kind of file are you referring to here?
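Without seeing the actual file it is hard to be specific, but the usual approach for an irregular text format is to read it line by line and keep only the records you care about; a rough sketch assuming a purely hypothetical layout where each line starts with a record-type code followed by whitespace-separated fields:

```python
import pandas as pd

rows = []
with open("airports.dat", encoding="utf-8") as f:   # placeholder file
    for line in f:
        parts = line.split()
        if not parts:
            continue
        # Hypothetical: lines whose first field is "1" describe an airport as
        # "1 <elevation_ft> ... <code> <name...>" — adjust to the real format.
        if parts[0] == "1" and len(parts) >= 5:
            rows.append({
                "code": parts[4],
                "elevation_ft": float(parts[1]),
                "name": " ".join(parts[5:]),
            })

df = pd.DataFrame(rows)
df.to_csv("airports_clean.csv", index=False)
```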
@artabra1019 a year ago
OMG, thanks. I was trying to open a CSV file with a million rows and my PC collapsed, so I went looking for an i9 computer with 16 GB RAM to open it. Thanks, now I can open big files using pandas.
@uzeyirktk6732 a year ago
How can we work on it further? Suppose we want to use the groupby function on column ['A'].
@15handersson16 8 months ago
By experimenting yourself
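A groupby can often be pushed through the chunks by combining partial aggregates; a minimal sketch for a per-group sum (file, column names, and chunk size are placeholders):

```python
import pandas as pd

CSV_PATH = "huge_dataset.csv"   # placeholder path
partials = []

for chunk in pd.read_csv(CSV_PATH, chunksize=100_000):
    # Aggregate inside the chunk first; keep only the small per-group result.
    partials.append(chunk.groupby("A")["value"].sum())

# Combine the per-chunk results into the final per-group sums.
result = pd.concat(partials).groupby(level=0).sum()
print(result)
```

Sums and counts combine cleanly this way, and a mean can be rebuilt from summed totals and counts; medians and other order statistics need a different strategy.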
@FabioRBelotto a year ago
Can we use each chunk to spawn a new process and do it in parallel?
@Supercukr 22 days ago
That would defeat the purpose of saving the RAM
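Each chunk can indeed be handed to a worker process; a minimal sketch with multiprocessing (file, column, and pool size are placeholders). Note the trade-off the reply points out: the more chunks in flight at once, the more RAM is used.

```python
import pandas as pd
from multiprocessing import Pool

CSV_PATH = "huge_dataset.csv"   # placeholder path

def process(chunk: pd.DataFrame) -> float:
    # Placeholder per-chunk work; return only a small result.
    return chunk["value"].sum()

if __name__ == "__main__":
    reader = pd.read_csv(CSV_PATH, chunksize=100_000)
    with Pool(processes=4) as pool:
        # imap pulls chunks from the reader lazily, so only a few are in memory at a time.
        partial_sums = pool.imap(process, reader)
        print("total:", sum(partial_sums))
```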
@tauseefmemon2331 a year ago
Why was the RAM usage increasing? Shouldn't it stop increasing once the data is loaded?
@thisoldproperty a year ago
It takes a while to load 4 GB into memory, so the example shown was captured while the load was still in progress.
@vishkerai9229 6 months ago
Is this faster than Dask?
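Dask handles the chunking (and parallelism) itself, so which is faster depends on the workload; for comparison, a minimal Dask sketch (placeholder file and column):

```python
import dask.dataframe as dd

# Dask splits the CSV into partitions and evaluates the mean lazily, in parallel.
ddf = dd.read_csv("huge_dataset.csv")
print(ddf["value"].mean().compute())
```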
@hynguyen1794 7 months ago
i'm a simple man, i see vim, i press like
@pasqualegu a year ago
All worked!
@driouichelmahdi a year ago
Thank you
@ashraf_isb 3 months ago
1000th like 😀
@hkpeaks a year ago
Benchmark (Pandas vs Peaks vs Polars) ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-1Kn665ADSck.html
@imclowdy a year ago
Awesome! First comment :D
@wzqdhr 2 months ago
The hard part is how to append the new feature back to the original dataset without loading it all in one shot.
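One way around that is to compute the new column per chunk and stream the enriched chunks to a new file, appending as you go; a minimal sketch (paths, column names, and the derived feature are placeholders):

```python
import pandas as pd

SRC = "huge_dataset.csv"            # placeholder input
DST = "huge_dataset_enriched.csv"   # placeholder output

first = True
for chunk in pd.read_csv(SRC, chunksize=100_000):
    chunk["value_squared"] = chunk["value"] ** 2   # the "new feature" (placeholder)
    # Write the header only with the first chunk, then append.
    chunk.to_csv(DST, mode="w" if first else "a", header=first, index=False)
    first = False
```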
@MegaLukyBoy a year ago
Is pickle better?
@WilliamDean127 a year ago
It would still load all the data at one time.
@RidingWithGerdas a year ago
Or, for really huge datasets, use Koalas; the interface is pretty much the same as pandas.
@Zonno5 a year ago
Provided you have access to scalable compute clusters. Recently Spark got a pandas API, so Koalas has sort of become unnecessary for that purpose.
@RidingWithGerdas a year ago
@@Zonno5 Are you talking about PySpark?