To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/JohnWatsonRooney. The first 200 of you will get 20% off Brilliant’s annual premium subscription.
Great tips! As an addition, to help with the potential memory issues of holding all the dataframes, I would import gc and, in your processing loop, add _del df_ followed by _gc.collect()_. This drops the loop-local df reference since you've already appended it to data in each iteration. This has saved me several times with memory issues. Hope this helps!
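A minimal sketch of that pattern, assuming the video's loop reads each CSV into df and appends it to a list called data (the file paths here are hypothetical):

```python
import gc

import pandas as pd

files = ["jan.csv", "feb.csv"]  # hypothetical input paths

data = []
for file in files:
    df = pd.read_csv(file)
    data.append(df)
    # del drops the loop-local name; gc.collect() then frees anything
    # with no remaining references (the dataframe itself stays alive
    # as long as the 'data' list still points to it).
    del df
    gc.collect()

result = pd.concat(data, ignore_index=True)
```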
Thank you for yet another highly applicable video, John. Just out of curiosity, since you made this video and I know you are well-versed in data extraction, do you plan to make anything else about Excel and data manipulation? :D
I usually find your videos top-notch, but this case can also be solved permanently with Power Query in five minutes of setup at most... looking forward to your next video, cheers.
He's just giving an example of the many types of automation that can be done with Python. If you just need simple data processing, Power Query will do the job. But when you need to do something more complex, like getting information from the web in real time or sending several emails with reports, Power Query will be useless.
@@samuelmartins82 Agreed, but here's the context as described in the title: it's a simple Excel task, not a complex one. Sure, it can be solved with Python, but I don't agree with the argument about wasting time in this context. This recurring task is solved faster, and in the long run more efficiently, with Power Query. Nonetheless, it's interesting because the presented solution could be the beginning of a slightly more complex one.
@@oOBenictusOo Agreed, these techniques are ‘building blocks’ for much more involved and often fully bespoke ground-up solutions; they can be considered foundational knowledge. However, as you rightly say, simple ETL operations are best met with optimised tooling such as Power Query. That also avoids creating solutions which require expensive devs to support.
I'm sure you know this by now, but the Polars read_csv method supports glob patterns, so the loop approach is unnecessary - you can simply pass _folder + "/*.csv"_ as the source parameter and the concatenation will be done automatically.
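For reference, a minimal Polars version of that one-liner (the folder name is hypothetical):

```python
import polars as pl

folder = "sales_data"  # hypothetical folder of CSV files

# read_csv accepts a glob pattern and concatenates all matching
# files into a single dataframe, replacing the manual loop.
result = pl.read_csv(folder + "/*.csv")
```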
Great job, John, thanks for this content. I think your channel could grow even more if you made other types of content, like Django, Flask, or JavaScript.
Thank you for the video. These simple tutorials help people like me grasp some of the rudiments of learning a programming language like Python. BTW, could you share the link to your dotfiles for Neovim? I would like to use some of the stuff seen in your Neovim setup and figure out how to do it.
Great vid. Can you tell me why, after you have appended the files in the loop (which I assume merges them together within the list), you then concatenate them into a new data store (i.e. the result)? Aren’t you combining them twice? Thanks
The variable 'data' here is not a dataframe. It's a Python list which contains multiple distinct dataframes (one for each file in 'files'). pd.concat is used to append them all into a single dataframe object ('result').
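A minimal sketch of that two-step pattern, with hypothetical file names:

```python
import pandas as pd

files = ["jan.csv", "feb.csv", "mar.csv"]  # hypothetical paths

# 'data' is a plain Python list holding one dataframe per file;
# nothing is merged at this point.
data = [pd.read_csv(f) for f in files]

# pd.concat stitches the list into one dataframe; ignore_index
# rebuilds a continuous row index across the combined rows.
result = pd.concat(data, ignore_index=True)
```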
Good intro on how to do this in Python, but it is not faster than Excel in any way, neither on the first run nor on subsequent ones. A video demonstrating this in Excel would take less than 1-2 minutes.
@crystalcleargirl07 Well, I don't know what you do for a living. I am a CPA working for a large hospital in CA, and we only use Excel. What important job do you do that requires you to use Python?
@@kexec I get it, but I've been doing this for a living for the last 15 years, and I know both VBA and Python, and it's just impractical to go for Python in 95% of cases. Also, Power Query can go above and beyond the Excel row limit and accomplish the same thing that was done here.