Soumil Shah
I earned a Bachelor of Science in Electronic Engineering and a double master’s in Electrical and Computer Engineering. I have extensive expertise in developing scalable, high-performance software applications in Python. I have a YouTube channel where I teach people about data science, machine learning, Elasticsearch, and AWS. I work as the data collection and processing Team Lead at Jobtarget, where I spend most of my time developing an ingestion framework and creating microservices and scalable architecture on AWS. I have worked with massive amounts of data, which includes creating data lakes (1.2T) and optimizing data lake queries by partitioning and using the right file format and compression. I have also developed and worked on a streaming application for ingesting real-time stream data via Kinesis and Firehose into Elasticsearch.
Comments
@MrLeanhduclk14 5 hours ago
Hi Soumil, perfect solution. In my case, I have some streaming tables like the ones in your demo. After they land on S3, how can I join them for further real-time analytics? Can Flink do it by selecting data from the sink tables and joining them with each other?
@DamosyTheFreckle 6 hours ago
nope doesn't work, don't waste your time
@shyamgurunath5876 8 hours ago
You will reach greater heights, Soumil… will be there to watch ❤
@SoumilShah 5 hours ago
Thank you sir
@surajbhardwaj2599 8 hours ago
Sir you are amazing. Thanks for the content...
@SoumilShah 8 hours ago
So nice of you
@employedgorilla 9 hours ago
You deserve it bro
@SoumilShah 9 hours ago
Thanks❤
@emonymph6911 8 hours ago
@SoumilShah you're welcome. Please make a video on where in the stack we should build data objects, e.g. in the metadata layer or somewhere else. The idea is: if we have to replace tech X with Y when X is outdated and Y is new with improved processing speed, how can we keep our tables intact and unchanged (assuming the storage layer remains unchanged)? A full object rewrite is not fun.
@SoumilShah 8 hours ago
@emonymph6911 sure thing!!
@SachinShukla230187 11 hours ago
Amazing. I have good experience in Python, but no video gave me the right insight or interest to understand these patterns. Thank you, Soumil. Because of you I have learnt these things; otherwise I was running here and there.....
@SoumilShah 8 hours ago
Thanks a lot. Really, thank you, I mean it.
@MrHatemfaheem 1 day ago
GitHub link not working
@electricalsir 1 day ago
essentially enjoyed
@KartikGautam 2 days ago
Hi Soumil, I am unable to access the PDF. Can you help me with that? Thanks
@electricalsir 3 days ago
good
@melojuan 3 days ago
what a legend!
@harivigsp7934 4 days ago
Can you please put up a video on Iceberg DR?
@rigseoservice 4 days ago
Very annoying to watch. The frequent switching between windows is very stressful.
@martingregson7136 5 days ago
Do you bowl as fast as you talk?
@sarathju3867 6 days ago
Thanks for posting this ❤ it
@SoumilShah 6 days ago
Thank you sir
@chandini766 6 days ago
Hi Soumil, thank you for your detailed videos. Could you point to any resource that can help set up IntelliJ for PySpark?
@4BroGame 6 days ago
Hey bro, I cloned a website and am now editing its code in VS Code, but after making the necessary edits only the text changes, not the images. I replace the website's image URLs with my own image URLs, but after saving and opening the page with Live Server, the preview still shows the cloned website's images, and Inspect Element shows the cloned website's image code, not mine. Why? I have been trying for 6 hours and nothing works for me. Will you please tell me how I can change the images and edit the page?
@Vamsikuruva-d8b 6 days ago
After installing, when I try to run the elasticsearch.bat file, it shows an error like "\Java\jdk-21.0.1 was unexpected at this time." But my JDK and Java bin folder paths are set correctly in the environment variables.
@prasantkumarsrivastava5925 7 days ago
Yes, please slow down in every respect.
@krishnendudas8573 7 days ago
Thanks for the video. It's a good one. Do you have any samples related to the scenario where we have to read the Avro data from a Kafka topic and upsert into the Hudi tables?
@BabaiChakraborty-ss8pt 7 days ago
amazing work @soumil. Thanks
@debmidya411 7 days ago
Hi Soumil, thanks for the video. I am using OpenJDK 11 and Python 3.8. I can't see the table printed when I run 'Creating Dataframe from List of Tuples'. I tried Jupyter Notebook as well as the VS Code editor. Any idea?
@IleniaQuintero 9 days ago
Hello, I was looking at your video channel. We may be helping a company that uses secure images to increase supply chain security and help cloud native development. Would you be willing to help try their software, make a video, and help show devs how to use their tools? This is not an offer, but just to start a conversation about your willingness to take on sponsorship. Please provide me with your email if you are interested. You'd have a chance to look at their technology and decide if it's the type of software that you'd be interested in covering in your channel.
@MrTejasreddy 9 days ago
Hello Soumil, I assigned data and a schema to a DataFrame, but when I call df.show() I get an error..
@SoumilShah 9 days ago
What error?
@saurabhshinde7135 9 days ago
Thanks for the great content, man. Can you please re-upload the Lab link, as the repository is not accessible?
@DotCreatorOfficial 9 days ago
aviter game hack video please
@louisadibe3189 9 days ago
cool video content
@prasanthvegesna2306 10 days ago
Hi Soumil, thank you for the video. I know there are certain catalogs available now for Iceberg. In this case, are we using Glue as the catalog, or one table as a catalog? Also, to sync data into the Iceberg table automatically or incrementally, do we need an event-based trigger to run that Java command?
@sayedsamimahamed5324 10 days ago
Where is the concept for CDC?
@world52love 10 days ago
How do I handle zero values in a CSV file, and how do I fill those values?
@worldtour666 10 days ago
@Soumil, how do I run a Glue notebook in a Docker container? Could you point me to a video?
@xyz-jn4oj 10 days ago
How do I handle it in Python if the secret manager has rotation?
@JuanMa-lv7bd 11 days ago
Thanks for the video. I noticed that the error logs were not marked as errors by Datadog. Any idea how to do that? I'm trying to send an artificial error to see if I can create a notification when something fails, but Datadog always marks them as INFO logs.
@andriifadieiev9757 11 days ago
Thanks for sharing! For a future video, the same with Unity Catalog maybe?
@SergeyTarabara 11 days ago
Soumil thank you for the video! Is the same thing possible, but with the Iceberg format?
@johnnydrumgole8476 11 days ago
Hello, I'm having trouble creating the connection
@selmaiilonga6262 12 days ago
Hi, please provide the link to the smart library
@SergeyTarabara 12 days ago
We need the same thing about Iceberg. Thx Soumil!
@PikaPikaKatzy 13 days ago
Could not create pixmap from (the path)
@MrSkelver 13 days ago
thank you so much
@isharkpraveen 14 days ago
Why did you hash? You can directly remove the duplicates with .dropDuplicates, right?
@amitkhandelwal8030 14 days ago
Hi Soumil, you have not given any configuration to set in airflow.cfg. Will the solution you gave work when we want to parallelise multiple tasks inside a DAG and parallelise multiple DAGs? Other people suggest solutions like changing the database to MySQL or Postgres and changing the executor to LocalExecutor. What do you think about these solutions?
@ismail3035 14 days ago
We don't really need to attach ticketId to the user, as a user can exist without a ticket. Also, to fetch all the tickets associated with a given user, we can use a userId GSI on the ticket model.
@robstuckey 14 days ago
awesome video! thanks for sharing. +1 sub
@Levy957 15 days ago
you are a god
@electricalsir 16 days ago
thanks soumil
@ashokjangam7329 17 days ago
@soumilshah thanks for your informative video, but the link you have given in the description for the PDF files is not working. Could you please update it with the right URL?
@IndianSumaira 17 days ago
Sir, can I store and unzip these docs on other drives and not on C:?
@rommel23nb 18 days ago
Thanks, Mr. Shah. I used these commands to prepare a cheat sheet for data cleaning. Regards