@@LearnMicrosoftFabric I may have missed this in your videos, but do you have a section on how to show the contents of a file directly and load the most recent file (my files all have date stamps in them)? I haven't had any luck with os.listdir().
@@jampeauk Hi James, for file-system searching you probably want to use mssparkutils, which has that kind of list-the-files-in-a-directory functionality - I plan to cover this in my upcoming video on mssparkutils 👍
@@LearnMicrosoftFabric Awesome, thanks Will, looking forward to it. For a little extra context: I'd like to list the files in my S3 bucket, which I've added as a Shortcut.
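As a minimal sketch of what that might look like - the folder path and the YYYY-MM-DD filename pattern here are assumptions, and in a Fabric notebook mssparkutils.fs.ls works on shortcut folders under Files/ the same way as on native ones:

```python
import re

def latest_dated(names):
    """Pick the filename with the most recent YYYY-MM-DD date stamp.
    ISO dates sort correctly as plain strings, so max() on the match works."""
    dated = [n for n in names if re.search(r"\d{4}-\d{2}-\d{2}", n)]
    return max(dated, key=lambda n: re.search(r"\d{4}-\d{2}-\d{2}", n).group())

# In a Fabric notebook (folder path is a placeholder):
# files = mssparkutils.fs.ls("Files/my_s3_shortcut")
# newest = latest_dated([f.name for f in files])
# df = spark.read.csv(f"Files/my_s3_shortcut/{newest}", header=True)
```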
I came here with the same question, which some people have already asked: how do you call this API for multiple cities? I watched your other videos where you used a notebook to transform data, and another where you scheduled it in a pipeline. If you could show how to call this API for multiple cities, it would make a great project - you could create a playlist as an end-to-end project. I really like your channel and follow your daily Spark videos; I believe this will become one of the main Fabric YouTube channels.
Hey! Massive thanks! Do you have plans to cover any OAuth-based APIs? Also, how do you parallelise these API calls for massive data loads? Say you want to fetch data for 100 cities on a daily basis, plus trigger a load when a 101st city is added - all those scenarios.
Hi, great questions! Absolutely yes - I plan to do more videos about handling different auth scenarios, and also about loading very big datasets with parallel reads. Watch this space :)
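In the meantime, a rough sketch of one common way to parallelise per-city calls in a notebook - the fetch function is stubbed out here, since the real endpoint and response shape are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_city(city):
    # Stand-in for a real call, e.g. requests.get(f"{base_url}?q={city}").json();
    # the endpoint is hypothetical, so return a fixed shape instead.
    return {"city": city, "temp_c": 20}

cities = ["London", "Paris", "Tokyo"]

# Threads suit this job because it is I/O-bound (mostly waiting on HTTP);
# pool.map preserves the input order of the cities.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch_city, cities))
```

Handling a newly added 101st city then just means driving the `cities` list from a config table or file, so the list grows without code changes.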
Thank you so much Will for your detailed instructions!!! Could you help me with instructions for loading Excel files in OneLake (specifically, stored in a lakehouse) into tables in a Data Warehouse?
Hey, thanks for watching! To read Excel into a lakehouse table, you can either use pandas to load into a pandas DataFrame and convert it to a Spark DataFrame (and then a lakehouse table), or you can use the pyspark.pandas library (pandas within Spark) - good luck!
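A sketch of the first option (pandas into Spark into a table) - the file path and table name are placeholders, and `spark` is the session that Fabric notebooks predefine:

```python
import pandas as pd

def frame_to_table(pdf, table_name, spark):
    """Convert a pandas DataFrame to a Spark DataFrame and save it
    as a lakehouse Delta table. Returns the row count loaded."""
    sdf = spark.createDataFrame(pdf)
    sdf.write.mode("overwrite").saveAsTable(table_name)
    return len(pdf)

# In a Fabric notebook (reading .xlsx needs the openpyxl engine installed):
# pdf = pd.read_excel("/lakehouse/default/Files/sales.xlsx")
# frame_to_table(pdf, "sales", spark)
```

From there, a pipeline Copy activity is one way to move the lakehouse table into the Warehouse.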
Great content, thanks for explaining the different options available in Fabric. I need to load fact data (bookings) through a REST API call. How do I set up loading into the lakehouse to ingest weekly updates? Do I need to start with a pipeline, or is there a way to start with a notebook directly to load data into the lakehouse?
Thanks for watching! It depends on the complexity of your API call, really. If it's simple, then you can use Dataflows or Data Pipelines; more complex authentication or transformation will require a notebook.
Amazing video, thanks for this Will! I wanted to ask whether PySpark would be the best choice for this, or whether I could use SQL to achieve the same goal?
Yes, you could also use SQL! The good thing about Fabric is that you're free to use whichever language you're comfortable with (well, as long as it's T-SQL, Python, R, Scala or KQL).
@@LearnMicrosoftFabric Thanks, that's really useful to know! I guess my follow-up would be whether there are any compatibility issues or limitations I might encounter if I were to use SQL within MS Fabric?
How do I run a pipeline for data copying? I have an API that uses two authentication systems: token and basic authentication (username and password). The first call to the API (via the POST method) retrieves a token, which is then used by the second request to execute the query itself. Is it possible to create a pipeline that can do the job? Should I use notebooks, or is there another solution? The result of the second query will, of course, be stored in a lakehouse table.
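For what it's worth, this two-step flow is straightforward in a notebook with the requests library - a sketch under assumptions (the URLs, credentials, and the `access_token` field name are all placeholders for your specific API):

```python
import requests

def get_token(session, auth_url, user, password):
    """Step 1: POST with basic auth; the API responds with a token."""
    resp = session.post(auth_url, auth=(user, password))
    resp.raise_for_status()
    return resp.json()["access_token"]

def fetch_data(session, data_url, token):
    """Step 2: call the data endpoint, passing the token as a bearer header."""
    resp = session.get(data_url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.json()

# In a Fabric notebook (all names below are placeholders):
# s = requests.Session()
# token = get_token(s, "https://api.example.com/auth", "myuser", "mypassword")
# rows = fetch_data(s, "https://api.example.com/bookings", token)
# spark.createDataFrame(rows).write.mode("append").saveAsTable("bookings")
```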
Hi there, good thanks, you? In this video I go right through end-to-end, covering extraction, storage and then visualization. Hope it helps 👍 ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-hwwU8V48g-4.html
@@LearnMicrosoftFabric Will, how are you? Your videos are useful! I have a question: is it possible to obtain data from a JSON REST API and transform it into a table in a data lake? I can't manage it - I can only transform it in a Warehouse! Thanks!
Thanks. Please explain the best practice for making nested API calls and merging the results back into one JSON file. For example, the first API call, /students, gives me a list of all students; then for each student I need to make another call, /{student_id}/courses, to get their course information. I need to save all the students' courses as one JSON file. It's easy to do in a Dataflow, but a Dataflow can only save the results as a table, not as JSON. So what is the right way to do it in a Pipeline? Thanks!
Hey, it's not something I've done with Data Pipelines, to be honest, but it might be possible with the ForEach activity. If you know Python, I'd recommend doing this in Fabric Notebooks with the requests library - it's much easier to manage this kind of logic in a notebook.
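A notebook sketch of that pattern with requests - the endpoints come from the question above, but the base URL, the `id` field name, and the output path are assumptions:

```python
import json
import requests

def collect_courses(session, base_url):
    """Nested calls: GET /students, then /{student_id}/courses for each,
    merging the course list into each student record."""
    students = session.get(f"{base_url}/students").json()
    for student in students:
        student["courses"] = session.get(
            f"{base_url}/{student['id']}/courses"
        ).json()
    return students

# In a Fabric notebook, write the merged result as one JSON file:
# merged = collect_courses(requests.Session(), "https://api.example.com")
# with open("/lakehouse/default/Files/students_courses.json", "w") as f:
#     json.dump(merged, f)
```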
Thanks. One of the main advantages of the Power BI tools is the low-code/no-code experience. I know Python, but we need a simple low-code GUI experience, like Power Query / Dataflows. I hope Pipelines can provide it @@LearnMicrosoftFabric
@@mshparber If it helps, there is now a GUI which should do what you're after - do some watching/reading on "Data Wrangler". It's currently only available for pandas in Notebooks, but it should be useful.
Hello, very good, thanks very much. My ERP is 100% online but I can't connect to it. I think I have all the necessary details: URL, DB name, username, password, and the API.
Hey, if it's 100% online and an ERP system, it's likely to have an API to connect to. Google "{ERP NAME} API documentation" and find out how to connect to it. Or, if it's one of the big ERP systems, you could use a Dataflow, because they might have a pre-built connector for your ERP system available. Good luck!