I earned a Bachelor of Science in Electronic Engineering and a double master's in Electrical and Computer Engineering. I have extensive expertise in developing scalable, high-performance software applications in Python. I run a YouTube channel where I teach people about Data Science, Machine Learning, Elasticsearch, and AWS. I work as the Data Collection and Processing Team Lead at JobTarget, where I spend most of my time developing an ingestion framework and creating microservices and scalable architecture on AWS. I have worked with massive amounts of data, including building data lakes (1.2T) and optimizing data lake queries by creating partitions and using the right file formats and compression. I have also developed a streaming application that ingests real-time stream data via Kinesis and Firehose into Elasticsearch.
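The partition layout mentioned above can be illustrated in plain Python. This is just a sketch of Hive-style partitioning; the bucket, table, and column names are made up for the example, and in a real pipeline Spark's `partitionBy` would typically build these prefixes for you:

```python
# Illustrative sketch of Hive-style partitioning for a data lake.
# Bucket/table/column names here are hypothetical.
def partition_path(base: str, **partitions: str) -> str:
    """Build a Hive-style partition prefix like base/year=2024/month=03."""
    parts = "/".join(f"{k}={v}" for k, v in partitions.items())
    return f"{base}/{parts}"

path = partition_path("s3://my-lake/events", year="2024", month="03")
print(path)  # s3://my-lake/events/year=2024/month=03
```

Laying files out this way lets query engines prune entire prefixes when a filter matches a partition column, which is where most of the query-time savings come from.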
Hi Soumil, perfect solution! In my case I have some streaming tables, just like in your demo; after they land on S3, how can I join them for further real-time analytics? Can Flink do it by selecting data from the sink tables and joining them?
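For reference, Flink SQL does support joining streaming tables directly. A minimal sketch (table and column names are hypothetical, and the connector/watermark setup depends on your actual sources) might look like:

```sql
-- Hypothetical example: join two streaming tables in Flink SQL.
SELECT o.order_id,
       o.amount,
       c.customer_name
FROM orders AS o
JOIN customers AS c
  ON o.customer_id = c.customer_id;
```

Whether you join the sink tables on S3 or the upstream streams before sinking is a design choice; joining upstream avoids a second read of the landed data.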
@SoumilShah you're welcome. Please make a video on where in the stack we should build data objects, e.g. in a metadata layer or somewhere else. The idea is: if we have to replace tech X with Y when X is outdated and Y is new with improved processing speed, how can we keep our tables intact and unchanged (assuming the storage layer remains unchanged)? A full object rewrite is not fun.
Amazing. I have good experience in Python, but no video gave me the right insight or interest to understand these patterns. Thank you Soumil, because of you I have learnt these things; otherwise I was running around here and there...
Hey bro, I cloned a website and I'm opening its code in the VS Code editor, but after making the necessary edits only the text changes, not the images. I'm putting my image URL in place of the website's image URL, but after saving and opening it with Live Server, the preview still shows the cloned website's images, not mine, and Inspect Element shows the cloned website's image code. I've been trying for 6 hours and nothing works. Will you please tell me how I can change the images?
After installing, when I try to run the elasticsearch.bat file it shows an error like '\Java\jdk-21.0.1 was unexpected at this time', but my JDK and Java bin folder paths are set correctly in the environment variables.
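For what it's worth, a "... was unexpected at this time" message from a Windows batch script usually points to a malformed environment variable, e.g. quote characters stored inside the JAVA_HOME value itself or a stray semicolon. A common fix, assuming a default install location (adjust the path to your actual JDK folder), is to reset the variable without quotes in its value:

```bat
:: Windows cmd sketch: the quotes wrap the whole assignment,
:: so the stored value itself contains no quote characters.
:: (Install path is an assumption; use your actual JDK folder.)
set "JAVA_HOME=C:\Program Files\Java\jdk-21.0.1"
set "PATH=%JAVA_HOME%\bin;%PATH%"
elasticsearch.bat
```

If it runs after this, update the permanent value in System Properties → Environment Variables the same way, with no quotes or trailing semicolon in the value.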
Thanks for the video, it's a good one. Do you have any samples for the scenario where we have to read Avro data from a Kafka topic and upsert it into Hudi tables?
Hi Soumil, thanks for the video. Using OpenJDK 11 and Python 3.8, I can't see the table printed when running 'Creating DataFrame from List of Tuples'. I used a Jupyter notebook as well as the VS Code editor. Any idea?
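One common cause of the issue above: creating a DataFrame produces no output by itself, so in a plain script (and often in a notebook cell that doesn't end with the DataFrame expression) you need an explicit `df.show()`. A minimal sketch, assuming PySpark is installed locally:

```python
# Minimal PySpark sketch: createDataFrame() prints nothing by itself;
# call df.show() to render the table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()

data = [("alice", 1), ("bob", 2)]          # list of tuples
df = spark.createDataFrame(data, ["name", "id"])

df.show()   # without this call, nothing is displayed in a script
spark.stop()
```

If `df.show()` itself hangs or errors, that usually points to a Java/Spark environment problem rather than the DataFrame code.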
Hello, I was looking at your video channel. We may be helping a company that uses secure images to increase supply chain security and help cloud native development. Would you be willing to help try their software, make a video, and help show devs how to use their tools? This is not an offer, but just to start a conversation about your willingness to take on sponsorship. Please provide me with your email if you are interested. You'd have a chance to look at their technology and decide if it's the type of software that you'd be interested in covering in your channel.
Hi Soumil, thank you for the video. I know there are certain catalogs available now for Iceberg. In this case, are we utilizing Glue as the catalog, or OneTable as a catalog? Also, to automatically or incrementally sync data into the Iceberg table, do we have to set up an event-based trigger process to run that Java command?
Thanks for the video. I noticed that the error logs were not marked as errors by Datadog; any idea how to do that? I'm trying to send an artificial error to see if I can create a notification when something fails, but Datadog always marks them as INFO logs.
Hi Soumil, you haven't given any configuration to change in airflow.cfg. Will the solution you gave work when we want to parallelise multiple tasks inside a DAG and parallelise multiple DAGs? Other people suggest solutions like changing the database to MySQL or Postgres and changing the executor to LocalExecutor; what do you think about those?
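For context, the settings those suggestions refer to live in airflow.cfg. A hedged sketch (the values and connection string are examples, not recommendations; section names can differ slightly between Airflow versions):

```ini
[core]
; SequentialExecutor (the SQLite default) runs one task at a time;
; LocalExecutor requires a real database such as Postgres or MySQL.
executor = LocalExecutor
parallelism = 32

[database]
; In older Airflow versions this key lives under [core] instead.
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost/airflow
```

The executor and database changes go together: true task parallelism needs an executor that can run concurrent tasks, and that executor needs a database backend that supports concurrent connections.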
We don't really need to attach a ticketId to the user, as a user can exist without a ticket. Also, for fetching all the tickets associated with a given user, we can use a userId GSI on the ticket model.
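The access pattern described above can be illustrated in plain Python: the index keyed by userId answers "all tickets for this user" without the user item storing any ticket ids. The data below is made up, and real code would issue a DynamoDB Query against the index (e.g. boto3 with `IndexName`) rather than build it by hand:

```python
# Toy illustration of what a userId GSI provides: a lookup keyed by
# userId (DynamoDB maintains this automatically for a GSI).
from collections import defaultdict

tickets = [
    {"ticketId": "t1", "userId": "u1", "status": "open"},
    {"ticketId": "t2", "userId": "u2", "status": "closed"},
    {"ticketId": "t3", "userId": "u1", "status": "open"},
]

by_user = defaultdict(list)
for t in tickets:
    by_user[t["userId"]].append(t["ticketId"])

print(by_user["u1"])  # ['t1', 't3']
```

This keeps the user item independent of tickets, so a user with zero tickets needs no special handling.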
@soumilshah thanks for your informative video, but the link you have given in the description for the PDF files is not working. Could you please update it with the right URL?