Thank you for watching this video! I have tried to cover Data Engineering as a whole concept, in a way that is useful for beginners and experts alike. Btw, thanks for 4K subs! Our community is growing strongly ♥
This was informative and I can totally relate to it. I worked as a report programmer and now work as an analyst, and I am also involved in ad hoc model development requests. I thought algorithms could do magic 😅 I spent about 6 months learning basic ML concepts, but when I got my hands dirty with model building I realised that it’s mostly the data that does the magic, and without quality data no model can survive. Now I’m learning the DE concepts I neglected earlier, although I use Hive every day for my work. 😢 Can you please make a separate video on the different job families inside DE? Also, any tips for analysts or someone who is already in the data field and wants to become a cloud data engineer?
Thanks Suraj! Since you have already worked with Hive, you can learn tech like Spark, Snowflake, BQ etc. in your free time; your Hive experience will give you a good kickstart 😊 And yes, it is a great suggestion. I will add it to the backlog!
I have been following your channel from the beginning and I can't thank you enough for the valuable information/perspective you put into the world. I am extremely excited for our meeting on Monday through Topmate! -Chris
Thank you very much for explaining the fundamental concepts of data engineering so comprehensively. Your videos are far better than those of other YouTubers like darshil parmar, learning bridge etc., who have never explained things in layman's terms and always try to explain as fast as possible (from my point of view). Keep it up!
Thank you so much! These guys have been on YouTube for a long time, and it kind of feels like an achievement to even be compared with them! I generally try to focus on improving the quality of my own videos with every upload 😊 Thanks for watching and liking this content!
Yes, there are always new things to learn. For example, I mostly worked on aws and azure but now working in GCP. Slowly I am also doing a little bit of ML engineer work even as a DE. Learning is great in this field. New concepts like Data mesh and tools like immuta and starburst are hot topics to learn.
@@JashRadia Yes, because I graduated this year, I had no idea about it. I joined Cognizant in a big data and PySpark role. Slowly learning tools and technologies. Btw thanks for answering the question.
Are data engineering tasks repetitive? I find the data cleaning exercises mundane. Maybe DE is more than just cleaning data. What are the interesting DE tasks in your opinion?
Hi, this is a very informative video. Can you suggest any websites to practise real-time projects? How many hours a day do we need to practise to become a data engineer? And can a non-IT student become a data engineer?
Thank you for watching it! For all this, check out my data engineer roadmap with links to resources and projects. As a non-IT student, you need to spend more time on the prerequisites section mentioned in that video. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-WgCavqDntlQ.html
Very nicely explained... If possible, can you please make a video on how to approach Data Engineering career for Database developers and DBAs (Oracle, Postgres, Sybase or any other) ?
Thank you and good suggestion! For DBAs, learning DE concepts becomes a little easier since they often have experience with SQL and DBMS or DWH concepts, which can give you a boost. You can skip those sections in my roadmap.
@@JashRadia Hello sir, I am really confused about my career path. Please kindly help me. I am a Jr. software engineer at Cognizant, doing basic DevOps pipeline work. I don't want programming and coding like the development team does. Should I go into DevOps or data engineering? I feel DE is hectic, with heavier duties, and tough to learn. I know Power BI, Tableau, SQL and Python. Where should I head? Please help me.
A pretty great video about DE and its history. DE has been around for a long time. I hope you start a separate playlist for Data Engineering where you can explain a few practical things and also cover live industry use cases.
You can refer to this course for DWH. www.udemy.com/course/data-warehouse-fundamentals-for-beginners/ I have seen a lack of courses on Data Modeling, especially related to the questions that are generally asked in interviews. I am creating such a course and will launch it soon.
Jash, the good thing about u is that u explain things well and also put the links for our reference in all videos.. Ur DE roadmap is my fav video... Love your content 🙂
@@JashRadia Google is my dream company... And u r one of the reasons I'm switching to data engineering... Presently I have been working as an ETL MDM DataStage developer for 1 year... Hope I get a chance to crack the Google interview.. Keep posting such great insightful videos Jash... U r motivating many like me..
Liked and Subscribed, liked the guy who got enlightened, add such things more often at the right time and place, for the audiences to remember apart from the excellent analysis of data definitions across the board, Liked and Subscribed @Jash
How can we start side projects in data engineering? Where can we extract raw data from, other than web scraping? How can we design near-real-time data pipelines like the ones we use in company projects?
Check out websites like project pro for such projects. Try websites like data.world/ and kaggle for datasets. You can also search for standard datasets like TPC-H in the relational dataset repository: relational.fit.cvut.cz/dataset/TPCH For real-time data, there are multiple open source APIs listed here: www.programmableweb.com/category/real-time/api
Yes and it is easy to get a data engineering job at a mid scale or service based company as a fresher. After some time you can switch to MAANG level companies too.
Hi Jash, that was truly informative and clarified the skills one should have before applying for Data Engineer roles. However, I have a question. In what order should one learn the following topics to become good in the data domain: Python (Beginner to Advanced), Machine Learning, Cloud Computing, Deep Learning, AWS/GCP/Azure, AI/Deep Learning, System Design, DBMS/Distributed Systems, DSA (if needed)? Thank you.
Thank you and for hierarchy, we should follow this: SQL -> Python (DSA) -> Spark -> DWH and other data concepts -> Cloud -> System design -> Docker -> ML
Thanks a lot for this video. It was really informative. Just one question: as a data engineer, which language should I pick for Spark, Python or Scala? People say Python is good if you want to go into ML-related things, but Scala is good for hardcore data engineering work. Just wanted to know your thoughts on this. Thanks in advance.
Thank you so much! I prefer Python because of 1) libraries related to data 2) use cases outside of Spark 3) more job postings 4) easy integration with cloud connectors like Snowflake, which is not available in Scala. Yes, Scala is faster in terms of performance, but PySpark is getting better with every version. Python's use cases outside Spark will beat Scala any day. So I'd prefer Python.
Man! I love your videos. So crisp, to the point and informative. I am trying to switch into DE from a Mainframe Analyst role. While preparing, I figured out on my own that there are different kinds of DE roles, as you mentioned in this video as well. In 7 years of IT experience, I have worked on SQL, Power BI, visualizations, data analysis, databases and Python. Apart from this, in the past 4 months I learned Snowflake, Airflow, BigQuery and the basics of PySpark, and did a GCP certification and GCP hands-on labs. Now I am a little bit stuck and confused when I see the JDs of jobs. My skills match only 50% of what they ask for, or sometimes companies are looking for extra skills as well. What should I do in this situation? Where should I focus more, or what mistake am I making? It would be great if you can guide me, man!
Thank you and 50% is fine. Try figuring out what is the most common skill missing. Work on that. Even if JD matches 70%, you should apply. Don't wait for 100% match.
@@shivendrakhare1583 Brother, in IT we need to upgrade ourselves every now and then. Also, does anyone need to remember the first work they did at the start of their career? I am afraid I am not good at learning and remembering things.
Sir, I am a fresher in college and I want to pursue a career as a data engineer. Which online platforms do you suggest for courses? I tried IBM's data engineering course but it was boring, and I need a course that is interactive.
Try websites like project pro and then learn the different services in a cloud platform. Every system design data pipeline question in an interview can be different. You just have to figure out when to use which service. This comes with practice and knowing the use cases of cloud services.
@@JashRadia Thank you very much for the reply. I lost my job this month and am giving a lot of interviews. I have 3.8 years of experience in SQL, Python, Spark, Airflow, BigQuery, Snowflake, dbt etc. Still, I go blank on system design questions. I hope you will make a video on this topic as well.
Hi Jash, I have started working on Spark, and I want to learn about the internals of Spark: executors, cores, partitions, jobs, stages and tasks, and how they are created when I run a Spark job (with several joins and aggregations). I can see these in the Spark UI, but I cannot understand how the number of jobs and stages is determined each time. I would appreciate it if you could suggest any blog, video or course on this, because the only example I find on the internet is the word count problem. I would also suggest that you write a blog or make a video on this topic because it is not explored much. Great video btw.. I was able to relate to most of what you said. #YNWA.
Thanks a lot Kaushik and yes, I get your question. I am creating a course on DE for a website which will cover all these things. You can also book some time with me on topmate to get an even better idea. Apart from that, you can refer to these courses/articles: medium.com/analytics-vidhya/spark-ui-c7f2ca9ef97f www.databricks.com/session/deep-dive-into-monitoring-spark-applications-using-web-ui-and-sparklisteners
Yes, 100%. DE is and always will be a prerequisite for DS. Only tools and technologies will change. Check out this post I did on LinkedIn for this question recently. www.linkedin.com/posts/jashradia_bigdata-dataengineering-data-activity-6995981582578642944-36SG?
@@Nick-du9ss We don't need to store or process PII data. Proper anonymization with masking and tokenization needs to be done. This is already handled in today's environments, and the same can be implemented in web3. I don't see what's so different about that.
Sir, I have learned the fundamentals but have a series of projects in mind based on data science: an AI prediction application using cognitive services, such as CV. Which way should I go?
Hello sir, I am an English teacher, age 30. I have been trying to make a transition into this domain. Is it really possible if I devote 1 year to building my skills in Python, Hadoop, SQL etc.?
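To make the masking and tokenization point a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the field names, the `SECRET_KEY` value and the helper functions are all hypothetical, and a keyed hash (HMAC-SHA256) is used as a simple stand-in for a real token vault, so strictly speaking this is pseudonymization rather than vault-based tokenization.

```python
import hashlib
import hmac

# Hypothetical secret for the demo; a real pipeline would load this
# from a secrets manager instead of hard-coding it.
SECRET_KEY = b"demo-key-not-for-production"


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"  # not a usable address, hide it completely
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a value with a deterministic keyed hash (stand-in for a token vault)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize(record: dict) -> dict:
    """Return a copy of the record with its PII fields masked or tokenized."""
    out = dict(record)
    out["email"] = mask_email(record["email"])
    out["user_id"] = tokenize(record["user_id"])
    return out


# Hypothetical input record.
record = {"user_id": "u-1001", "email": "alice@example.com", "amount": 42.5}
safe = anonymize(record)
print(safe["email"])
```

Because the token is deterministic, the same `user_id` always maps to the same token, so joins and aggregations downstream still work without the raw identifier ever being stored.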
Computer science all the way. Data analytics is a topic that shouldn't have its own degree. You should learn it from websites like Coursera, Udemy or YouTube. Doing a specific degree in it doesn't make sense. CS will be useful no matter where your career takes you, including the data domain.
I believe Python has more scope outside of the big data world too. Also, the data libraries make Python simple to use. In terms of job openings, Python is way ahead of Java and Scala combined.
Check out the roadmap on my channel. It will make you good in all those areas, starting from the basics. Then you have to double down on the field you are interested in.
I have already created a video about system design. And DSA is nothing special. Only intermediate level skills or problem solving questions are required from leetcode etc.
Learning Python basics is key, because as a DE you will be using it a lot. Then gain some knowledge of basic data libraries like pandas, numpy etc. Learning ML-related Python is optional but good to have. Scikit would fall in this category.
Start with the roadmap that I have posted on the channel and start getting DE experience in any company. Once you have enough skills and experience, your non tech degree won't matter and then you can apply to google.
Most people believe that data engineering is all about tech. But I think we also have to be business centric. In the intro, I mentioned: 1) Finding out important metrics: this can help in creating the data model before creating the data pipeline. 2) Transforming and aggregating data: this is one of the core skills for DE. 3) Finding out insights: I agree this point falls more on the analyst side, but if we understand the data well enough to figure out what went drastically wrong when it did, it can reduce the debugging time in the pipeline, too. Again, these are only my views.
Hi Rahul, please check out my DE roadmap video; the link is in the description. It applies to everyone. You can skip the parts you already know. If you need more 1:1 help, feel free to book a time with me on Topmate. Link is in the description.
It depends on the cloud platform you are working on. For GCP, the Professional Data Engineer certification is recommended. For a general Spark certification, the Databricks Developer Associate certification has a lot of value.
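As a small illustration of the "important metric" and "transforming and aggregating" points, here is a rough Python sketch. Everything in it is assumed for the example: the event fields, the sample rows and the choice of daily paid revenue as the metric are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events; the field names are made up for illustration.
events = [
    {"order_id": 1, "day": "2022-11-01", "amount": "20.00", "status": "paid"},
    {"order_id": 2, "day": "2022-11-01", "amount": "5.50", "status": "refunded"},
    {"order_id": 3, "day": "2022-11-02", "amount": "12.25", "status": "paid"},
    {"order_id": 4, "day": "2022-11-02", "amount": "7.75", "status": "paid"},
]


def daily_revenue(rows):
    """Transform raw rows (filter, cast) and aggregate them into revenue per day."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] != "paid":
            continue  # business rule assumed here: only paid orders count
        totals[row["day"]] += float(row["amount"])  # cast string amount to a number
    return dict(totals)


revenue = daily_revenue(events)
print(revenue)
```

Deciding that refunded orders do not count towards revenue is a business decision, not a technical one, which is why knowing the metric first shapes the data model and the pipeline.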
@@JashRadia thank you for the quick response. Please let me know if you can mentor me in landing at Google or other product based organization. I have 3 years of experience in aforementioned work at ADP Hyderabad
@@pavankumar-ni3my yes I can. I have been doing this with about 15 more people on topmate on regular basis. Feel free to book a slot with me. Link is mentioned in the description.
Wow, that was a great eye opener. Loved your content coz, as u mentioned, for me DE was like pipelines, tools, etc... I would really like to hear more about the points u mentioned here. How can I build my approach to achieve them? Any roadmap or content that helps me understand them in depth? 1. Figuring out important metrics in data. 2. Finding out insights. 3. Making recommendations based on historical data points.
Thank you, Kunal! 😊 These skills are much harder to master compared to technical skills. For them, we also have to understand the business context and the goal behind the data pipeline: why are we building it, and what question are we trying to answer? You have to look through an analyst's or a product owner's lens to get this business value. Your experience will also help you understand this viewpoint over time. I have a technical roadmap on the channel, but that will only help you on the technical side. Business domain knowledge is also a must no matter where you work, be it healthcare, finance or something else.
Very little use of both I would say. Javascript can be used somewhere like writing stored procs or some container based apps otherwise not required. You don't need react js at all
How much time is required to learn data engineering well enough to land an internship or job (at least 5 LPA)? I know Python, SQL and DBMS. I am in the 3rd year of the Information Technology branch.
@@JashRadia Thank you! One more question: how much DSA do I have to know for data engineering? And which level of problem solving (LeetCode easy or medium)?
@@longliveindia1637 Data engineers and Data scientists generally earn significantly more than Data Analysts. But data analytics is a good way to enter data world from a non tech background.