
What tools should you know as a Data Engineer? 

Kahan Data Solutions
42K subscribers
65K views

Published: Oct 3, 2024

Comments: 70
@KahanDataSolutions 2 years ago
Get my Modern Data Essentials training (for free) & start building more reliable data architectures www.ModernDataCommunity.com
@DaddyShegz 1 year ago
Hi. I tried the link but it says "forbidden". Is there another way to access the pdf? Thanks
@mrviper3344 1 year ago
All the tools mentioned in the video:
* Cloud-based databases: Amazon Redshift, Google BigQuery, Snowflake, Azure Synapse
* Traditional row-based databases: SQL Server, MySQL, PostgreSQL
* NoSQL databases: MongoDB, Elastic, Cassandra, Cosmos DB, Amazon DynamoDB
* Extract & Load (batch): Fivetran, Stitch, Airbyte, Azure Data Factory, AWS Glue
* Streaming: Apache Kafka, Amazon Kinesis
* Transform: dbt (data build tool)
* Reverse ETL: Census, Hightouch, RudderStack
* Version control & automation: GitHub, GitLab, CI/CD
* Task orchestration & scheduling: Apache Airflow, Jenkins, Luigi
* Infrastructure management: Terraform, Ansible
* Containers: Docker
* Container orchestration: Kubernetes
* BI & analytics reporting: Power BI, Tableau, Looker
* Open source: Metabase
* Spreadsheets
@r.c.r7308 4 months ago
Or just turn on subtitles ^ ^ but thanks for the effort :D
@TNTsGOboom 2 years ago
You have a new subscriber! I love the way you explain data engineering. You and Seattle Data Guy are my faves when it comes to Data Engineering Content Creators.
@KahanDataSolutions 2 years ago
Thanks Turk! Much appreciated
@adamo1262 2 years ago
I'm really interested in this field and currently learning Python. I must say this list is great, but I'm really overwhelmed by the amount of stuff one has to learn to transition into this field! I'm gonna stick with it and hopefully come through the other end 😁
@KahanDataSolutions 2 years ago
Definitely stick with it! One thing to remember is that while there are many tools, you don't need to know ALL of them to have a successful career, and you also don't need to learn them all at once (it takes a whole career to do that). Here is a recommendation to help you get started:
1. Start with getting very comfortable w/ SQL (and/or Python if you'd like)
2. Learn more about data modeling techniques (ex. dimensional modeling, star schema) and the way data typically moves (ex. ETL vs ELT)
3. Pick a common database to study and practice on (ex. Snowflake or SQL Server)
4. Learn how to use a tool like dbt to transform data within those databases, which will also show you other important concepts like Version Control
5. Pick a data visualization tool (ex. Power BI or Tableau) and use your transformed data to make a cool dashboard
6. Pick another part of the process (ex. Extract tools, scheduling tools, etc.) and keep adding to your skillset
Good luck!
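For anyone following the roadmap above, steps 2 to 4 (star schema, ELT, transforming data inside the database) can be tried out with nothing but Python's built-in sqlite3 module. This is only a rough sketch and not something shown in the video: SQLite stands in for Snowflake or SQL Server, plain SQL stands in for what a tool like dbt would manage, and the table names (raw_orders, dim_customer, fact_orders) and sample rows are made up for illustration.

```python
import sqlite3


def build_star_schema(conn):
    """Load raw rows first, then transform them inside the database (ELT)."""
    cur = conn.cursor()
    # "Extract & Load": raw data lands in the database untouched.
    cur.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer TEXT, country TEXT, amount REAL)"
    )
    cur.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [(1, "Ann", "US", 20.0), (2, "Bob", "DE", 35.5), (3, "Ann", "US", 14.5)],
    )
    # "Transform", part 1: a dimension table with one row per customer.
    cur.execute(
        "CREATE TABLE dim_customer "
        "(customer_key INTEGER PRIMARY KEY, customer TEXT, country TEXT)"
    )
    cur.execute(
        "INSERT INTO dim_customer (customer, country) "
        "SELECT DISTINCT customer, country FROM raw_orders ORDER BY customer"
    )
    # "Transform", part 2: a fact table that points at the dimension by key.
    cur.execute(
        "CREATE TABLE fact_orders AS "
        "SELECT r.order_id, d.customer_key, r.amount "
        "FROM raw_orders r JOIN dim_customer d ON d.customer = r.customer"
    )
    conn.commit()


def revenue_by_country(conn):
    """A typical reporting query: join the fact table to a dimension."""
    return conn.execute(
        "SELECT d.country, SUM(f.amount) "
        "FROM fact_orders f JOIN dim_customer d USING (customer_key) "
        "GROUP BY d.country ORDER BY d.country"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # hypothetical throwaway database
build_star_schema(conn)
report = revenue_by_country(conn)
print(report)
```

The result of revenue_by_country is the kind of table you would then point a dashboard tool at in step 5.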
@adamo1262 2 years ago
@KahanDataSolutions I really want to thank you for this thoughtful response and the roadmap provided. I honestly didn't expect such a swift response, and it shows that you love what you do! I will defo stick with it and hopefully make a successful career out of it. Thanks again 💪🏿
@Agnostic080 2 years ago
@KahanDataSolutions This is a pretty good list! You could probably even do a video talking about this process.
@splashoui3760 2 years ago
@KahanDataSolutions Thank you for your extra-detailed explanation to Adam 1. I would like to ask: would this video be more helpful for senior people who are deciding what their companies should use, depending on their business case and requirements?
@splashoui3760 2 years ago
And about the spreadsheets part, you are def right. We are using Google Sheets, with Python automating the process of writing our outputs there.
@hamsansari2111 2 years ago
Yesterday I said on your post that it's overwhelming with so many tools, and today I got a video :D
@KahanDataSolutions 2 years ago
I got you! You're definitely not alone in that feeling so I figured it'd be a good topic for a video
@ZawmyoHtet-lg7jn 11 months ago
This is really helpful, Bro. Thanks a lot.
@robertoferro8512 2 years ago
What an absolutely powerful video. Please keep such good content coming!
@KahanDataSolutions 2 years ago
Much appreciated! Thanks for watching
@AlexKashie 1 year ago
You’ve got a new subscriber. Thank you
@KahanDataSolutions 1 year ago
Thank you!
@nickriebe245 2 years ago
Phenomenal video. What tool(s) do you recommend for documentation and/or data dictionaries?
@cloveravalon444 1 year ago
It depends on where you store your data: on-premises or in the cloud.
@DjBaxter15 3 months ago
Some other alternatives for scheduling and orchestration are Dagster, Prefect, and Oozie, or whatever your cloud offering might have; I know Google Cloud has Cloud Scheduler. If you suggest Jenkins as a job scheduling tool in this day and age, I will hunt you down...
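For readers new to the category: what orchestration tools like the ones named in this thread have in common is that they run tasks in dependency order. A toy sketch of just that idea, using Python's standard-library graphlib (Python 3.9+), is below. It is not how Airflow, Dagster, or Prefect are actually used, and the task names and the run_pipeline helper are invented for illustration; real orchestrators add scheduling, retries, logging, and much more.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after all of its upstream tasks; return the run order."""
    order = []
    # dependencies maps each task name to the set of tasks it must wait for.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()  # execute the task's callable
        order.append(name)
    return order


log = []
# Hypothetical pipeline steps; each one just records that it ran.
tasks = {
    "extract": lambda: log.append("extracted"),
    "load": lambda: log.append("loaded"),
    "transform": lambda: log.append("transformed"),
    "report": lambda: log.append("reported"),
}
dependencies = {
    "load": {"extract"},
    "transform": {"load"},
    "report": {"transform"},
}
order = run_pipeline(tasks, dependencies)
print(order)
```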
@kevon217 1 year ago
thanks for an overview of the landscape!
@ligiaimusic 3 months ago
Thank you so much for this video! Really helpful!
@KahanDataSolutions 3 months ago
Glad it was helpful!
@mohammedaminelachhabe2087 7 months ago
Very good video. I think we can also add the cloud functions to this list.
@cyclonus01 2 years ago
Good stuff bro. I'd add Prefect to orchestration/task flow.
@KahanDataSolutions 2 years ago
Good call - Thanks for watching
@tomastruchly9484 1 year ago
This video is a kick in the balls for Oracle 😀
@__shaikmalikbasha__ 10 months ago
Could you please make a complete series on Apache Airflow ❤
@Rex_793 2 years ago
This was a very informative video - very useful to "get the lay of the land" so to speak.
@KahanDataSolutions 2 years ago
Glad to hear it! That was the goal
@adityalakkad499 2 years ago
Apache Superset is one of the promising BI tools, in my opinion. Can you share your thoughts on it, if possible?
@aniltembhare2985 8 months ago
Thank you for the great information.
@StephenRayner 2 months ago
Brilliant
@FroFoLife 6 months ago
Hi, thank you for your video. I know that this is old now but I wish you would put the names of each tool you listed under the tool. If you aren't familiar with the specific tool it can be hard to know how to spell it. I know I can Google but I was taking notes as I was following along. Thank you.
@johnh7770 5 months ago
Apache Superset is another open source BI/analytics option
@nicky_rads 2 years ago
Nice, well-rounded video, thanks! One question: where do Databricks and Spark fit into the stack?
@KahanDataSolutions 2 years ago
Thanks! Databricks would fall in the same area as "cloud databases". Spark would fit in around the "ELT Components" and is used primarily to process large amounts of data.
@guruprasadashridharhegde6792 6 months ago
Apache Airflow is a great orchestration tool.
@poizentv 8 months ago
I really need this so bad. Do you have a data engineering course? Or any recommendations?
@yashikakarunan2636 1 year ago
Thank you, great explanation.
@KahanDataSolutions 1 year ago
Glad it was helpful!
@thomashass1 1 year ago
Very surprised Apache Spark is not mentioned here.
@mohammedamasah1281 7 months ago
Same..
@adityanjsg99 1 year ago
I know Databricks, dbt, Airflow, Kafka and Power BI
@Faz13able 7 months ago
What about Spark or PySpark? Where does it fit in?
@ukaszdugozima816 1 year ago
Hello! Thank you for your invaluable video! I find it extremely useful for beginners! I would like to ask one thing regarding a Data Engineer career. I learned Pandas for data wrangling and transformation. So how about Pandas for Data Engineers? Is it a useful tool for ETL/ELT transformations? Obviously, the next step will be PySpark, but I would like to start by learning Pandas. It seems like a good path toward PySpark. What do you think? I would appreciate it if you could share your views.
@TheRealNCYank 11 months ago
No Oracle for the second layer?
@vb140772 9 months ago
Thanks!
@KahanDataSolutions 9 months ago
Thank you!
@himanshuagrawal2800 9 months ago
Hi, can you tell me where exactly Apache Spark fits in this picture?
@andrewmaxwell9399 2 years ago
Hey man, may I ask a question? I have ETL experience with 2 ETL tools and multiple on-premise RDBMSs, and I want to shift into Data Engineering roles that usually combine ETL tools with Python and its libraries/frameworks. Would I be considered a new graduate or an industry professional, since I don't have any experience with Python? And does it usually mean I have to take a pay cut? Let's say I make $500 a month as an ETL Developer and I want to shift to a Data Engineer role: does that mean I'd be paid something like $300 a month since I don't have DE experience? I really need some guidance... Thank you :)
@isaacmoreno7518 1 year ago
I guess you have not tried Exasol (analytical database, arguably the fastest in the market).
@postmandev 1 year ago
What about ClickHouse?
@muhammadahtshamulhaq4476 1 year ago
I want to be a data engineer, but I'm still not good at programming languages. I've tried Python a lot, but I only know SQL. How can I become a data engineer?
@skateforlife3679 1 year ago
Apache Airflow is getting old; lots of problems in production.
@willi1978 2 years ago
ETL doesn't care what the destination is. The expression "Reverse ETL" makes no sense, it's still an ETL process.
@KahanDataSolutions 2 years ago
I agree that the term is a bit odd, but that's what has stuck as of today. Another term you might see used to describe that process is "Operational Analytics"
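To make the term in this exchange concrete: "Reverse ETL" usually means taking a metric that was computed in the warehouse and pushing it back into an operational tool such as a CRM. A minimal sketch of that direction of flow follows. It makes several assumptions: SQLite stands in for the warehouse, a plain dictionary stands in for the CRM, and the reverse_etl function, the orders table, and the lifetime_value field are hypothetical names, not the API of Census, Hightouch, or RudderStack.

```python
import sqlite3


def reverse_etl(conn, crm_records):
    """Copy a warehouse metric into the records of a (simulated) CRM."""
    # Read an aggregate that already lives in the warehouse.
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    # Write it onto the matching operational record, creating one if needed.
    for customer, total in rows:
        crm_records.setdefault(customer, {})["lifetime_value"] = total
    return crm_records


conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ann", 20.0), ("Bob", 35.5), ("Ann", 14.5)],
)
crm = {"Ann": {"email": "ann@example.com"}}  # stand-in for a CRM's contacts
synced = reverse_etl(conn, crm)
print(synced)
```

Mechanically it is still extract, transform, load, which is the point made at the start of this exchange; only the destination has changed from the warehouse to a business application.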
@travis3366 1 year ago
Is learning Informatica worth it?
@KahanDataSolutions 1 year ago
If you are applying for a job that uses it, then yes. I'm sure there are still many companies that use it.
@TechnologyUncovered-b1i 1 year ago
Python..
@arsenijen9797 1 year ago
👍🏻👌🏻💯%
@sunil-de 10 months ago
You just listed out half of the data team (DevOps Engineer, Data Engineer, DBA, SQL Developer, Server Executive, Data Analyst, Business Analyst). You don't need to learn all of this to be a data engineer...
@naheliegend5222 1 year ago
Companies in 2022 still running SQL Server with SSIS and SSAS :D
@hfuhruhurr 2 years ago
Surprised there was no mention of Pandas.
@KahanDataSolutions 2 years ago
That's a good one too. I personally haven't used Pandas much but I know others do.
@in6tinct 2 years ago
Or Spark/Databricks