Big Data Engineer Mock Interview | Questions on Data Skewness | Salting | Out of Memory Error 

Sumit Mittal
114K subscribers
6K views

To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&su... for curated courses developed by me.
Want to master SQL? Learn SQL the right way through the most sought-after course - SQL Champions Program!
"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solve an unseen problem."
Here is how you can register for the program -
Registration Link (Course Access from India) : rzp.io/l/SQLINR
Registration Link (Course Access from outside India) : rzp.io/l/SQLUSD
I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.
BIG DATA INTERVIEW SERIES
This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.
Our highly experienced guest interviewer, Chandrali Sarkar, / chandrali-sarkar-4570a... shares invaluable insights and practical guidance drawn from her extensive expertise in the Big Data Domain.
Our expert guest interviewee, Soumya Ranjan Parida, / soumya-parida has an interesting approach to answering the interview questions on Apache Spark, SQL and Azure Cloud Services.
Links to the free SQL & Python series developed by me are given below -
SQL Playlist - • SQL tutorial for every...
Python Playlist - • Complete Python By Sum...
Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
Social Media Links :
LinkedIn - / bigdatabysumit
Twitter - / bigdatasumit
Instagram - / bigdatabysumit
Student Testimonials - trendytech.in/#testimonials
TIMESTAMPS : Questions Discussed
00:35 Introduction
01:40 Explain your project's end-to-end pipeline and overview.
03:17 What is the data source for your project?
03:36 Where does the data get ingested?
04:36 What types of data are being processed?
05:04 How do you capture incremental data in an OLTP environment?
07:52 What is the frequency and volume of the incoming data?
08:28 Which file formats have you worked with?
09:00 What is predicate pushdown?
10:14 What optimizations have you applied in Spark?
10:45 Define broadcast join.
11:10 List some transformations you've used in Spark.
11:27 Explain narrow and wide transformations.
12:03 What is the difference between reduceByKey and groupByKey?
12:56 Have you encountered "out of memory" errors in Spark? How did you resolve them?
14:22 How does salting help resolve out-of-memory errors?
14:46 What is data skewness?
15:22 Explain cache and persist in Spark.
16:57 What happens if both memory and disk are full?
17:40 When would you use coalesce and repartition?
18:00 Provide a scenario where coalesce and repartition can be used?
18:38 Where does repartition happen: at the driver or the executor level?
19:30 What is the difference between rank, dense rank, and row number functions?
22:06 Describe the internal process of submitting a Spark job.
Music track: Retro by Chill Pulse
Source: freetouse.com/music
Background Music for Video (Free)
Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs

Published: 27 May 2024

Comments: 17
@jithindev9185 (a month ago)
No idea about skewness, but explaining how salting reduces OOM.. 😊 Just highlighting the points where an interviewer can easily catch ...
@soumyaparida9231 (a month ago)
Yes friend, I have not faced it, and I have not fluked it. I have already said that it is my first DE project and our data volume is small.
@gudiatoka (a month ago)
10:50 AQE also changes the join strategy to broadcast if the smaller DataFrame is small enough to be held in memory. If it is not, we can change the broadcast threshold value as desired, depending on the cluster config.
@soumyaparida9231 (a month ago)
Yes, I specifically mentioned that our data volume is small. Please listen to that. As a result, AQE will automatically choose a broadcast join.
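The threshold discussed in this exchange can be illustrated with a toy decision rule. This is a plain-Python sketch, not Spark code: `choose_join` is a hypothetical helper, and Spark's real planner also weighs join type, hints, and runtime statistics. The only Spark facts assumed are that `spark.sql.autoBroadcastJoinThreshold` defaults to 10 MB and that setting it to -1 disables automatic broadcast.

```python
# Documented default of spark.sql.autoBroadcastJoinThreshold: 10 MB.
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024
MB = 1024 * 1024


def choose_join(left_bytes: int, right_bytes: int,
                threshold: int = DEFAULT_THRESHOLD_BYTES) -> str:
    """Toy rule (hypothetical helper, not a Spark API).

    Broadcast the smaller side when its estimated size fits under the
    threshold; otherwise fall back to a shuffle-based sort-merge join.
    """
    if threshold < 0:  # In Spark, -1 disables automatic broadcast.
        return "sort_merge"
    if min(left_bytes, right_bytes) <= threshold:
        return "broadcast_hash"
    return "sort_merge"


# Small side fits under the default threshold.
print(choose_join(500 * MB, 2 * MB))
# Small side is too large for the default threshold.
print(choose_join(500 * MB, 50 * MB))
# Same tables, but with the threshold raised to 64 MB.
print(choose_join(500 * MB, 50 * MB, threshold=64 * MB))
```

Raising the threshold trades shuffle cost for memory: every executor has to hold the broadcast side, so the value should stay well below executor memory.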
@mohammediqbal2406 (a month ago)
@soumyaparida9231 Hi brother, how many years of experience do you have?
@soumyaparida9231 (a month ago)
@mohammediqbal2406 1.6 yrs full time and 1 yr internship
@jithindev9185 (a month ago)
Does repartitioning happen at the driver? 18:58
@jameskhan6972 (a month ago)
I think repartition happens at the executor level. Executors perform the actual data movement and redistribution: they read the data from the existing partitions, shuffle it across the network, and write it into new partitions as specified by the repartitioning logic.
@kch8278 (23 days ago)
I agree with you. Repartition happens on the executors.
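The executor-side data movement described in these replies can be sketched in plain Python. This is a simulation under simplifying assumptions, not Spark code: each inner list stands for a partition held by an executor, and `shuffle_write`, `repartition`, and `coalesce` are hypothetical stand-ins for what Spark does internally. The driver's role in this picture is limited to planning, such as choosing the target partition count.

```python
def shuffle_write(partition, start, num_targets):
    """Executor side: split one input partition into per-target buckets.

    Rows are dealt round-robin, beginning at a per-partition offset.
    """
    buckets = [[] for _ in range(num_targets)]
    for i, row in enumerate(partition):
        buckets[(start + i) % num_targets].append(row)
    return buckets


def repartition(partitions, num_targets):
    """Full shuffle: every row may move to a different partition."""
    written = [shuffle_write(p, idx, num_targets)
               for idx, p in enumerate(partitions)]
    # "Shuffle read": each target partition collects its bucket
    # from every input partition.
    return [[row for buckets in written for row in buckets[t]]
            for t in range(num_targets)]


def coalesce(partitions, num_targets):
    """Merge whole partitions without splitting any of them."""
    merged = [[] for _ in range(num_targets)]
    for idx, p in enumerate(partitions):
        merged[idx % num_targets].extend(p)
    return merged


# Four input partitions, one of them much larger than the rest.
parts = [list(range(0, 9)), [9], [10], [11]]

repartitioned = repartition(parts, 2)
coalesced = coalesce(parts, 2)

print([len(p) for p in repartitioned])
print([len(p) for p in coalesced])
```

In this sketch the shuffle evens out the partition sizes, while the merge keeps the large partition intact. That matches the usual guidance: coalesce is the cheaper way to reduce the partition count, and repartition is the one to use when the data also needs rebalancing.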
@rushirajkadge3995 (10 days ago)
Are the row_number values shown at 22:00 correct? I mean, if we are partitioning by marks, then how can the output look as shown in the video?
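The question above turns on the difference between ordering by marks and partitioning by marks. The following plain-Python sketch (not Spark or SQL code; the student rows and helper names are made up for illustration) computes row_number, rank, and dense_rank both ways. It makes no claim about what the video itself displayed.

```python
def window_numbers(rows, order_key):
    """Return (row, row_number, rank, dense_rank) tuples for one
    window partition, ordered by order_key descending."""
    ordered = sorted(rows, key=order_key, reverse=True)
    out = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real value
    for row_number, row in enumerate(ordered, start=1):
        value = order_key(row)
        if value != prev:
            rank = row_number  # rank jumps to the current position
            dense += 1         # dense_rank advances by exactly one
            prev = value
        out.append((row, row_number, rank, dense))
    return out


def partition_by(rows, part_key):
    """Group rows into window partitions keyed by part_key."""
    parts = {}
    for row in rows:
        parts.setdefault(part_key(row), []).append(row)
    return parts


def marks(row):
    return row[1]


students = [("A", 90), ("B", 90), ("C", 80), ("D", 70)]

# ORDER BY marks DESC over a single partition.
ordered = window_numbers(students, marks)

# PARTITION BY marks: numbering restarts inside every group of equal marks.
per_partition = {
    m: [n for _, n, _, _ in window_numbers(rows, marks)]
    for m, rows in partition_by(students, marks).items()
}

for row in ordered:
    print(row)
print(per_partition)
```

With ties, row_number still assigns distinct numbers, and a real engine may break those ties in any order. Partitioning by marks restarts the numbering for each distinct mark, so a continuous 1..n sequence across different marks comes from ordering by marks, not from partitioning by it.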
@gudiatoka (a month ago)
14:30 Salting never reduces OOM exceptions; rather, it causes OOM, as it replicates the smaller table's data multiple times. Salting helps us reduce the skewness (though from what I observed, it is a hoax, for me only 😊)
@soumyaparida9231 (a month ago)
It will cause an OOM error if too little memory is allocated to each executor. It is not a hoax. And data skewness is one of the reasons for OOM errors, if you don't know.
@soumyaparida9231 (a month ago)
Salting is for improving the partitioning, so it varies from project to project. And well, you have partitioned the data. So your conclusion is not correct, because you are trying to generalize.
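Both sides of this exchange can be seen in a small simulation. This is a plain-Python sketch under stated assumptions, not Spark code: `partition_of` is a deterministic stand-in for a hash partitioner, the keys and row counts are invented, and the salt is assigned round-robin where a real job would typically use a random salt.

```python
from collections import Counter

NUM_PARTITIONS = 4
SALT_BUCKETS = 4  # hypothetical salt factor


def partition_of(key: str) -> int:
    """Deterministic stand-in for a hash partitioner."""
    return sum(key.encode()) % NUM_PARTITIONS


def partition_sizes(keys):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# Skewed large side: one hot key dominates.
large = ["hot"] * 1000 + ["a"] * 10 + ["b"] * 10
small = ["hot", "a", "b"]

# Append a salt to every key on the large side.
salted_large = [f"{k}_{i % SALT_BUCKETS}" for i, k in enumerate(large)]
# Replicate every small-side row once per salt value so that
# each salted key on the large side still finds its match.
salted_small = [f"{k}_{s}" for k in small for s in range(SALT_BUCKETS)]

unsalted_sizes = partition_sizes(large)
salted_sizes = partition_sizes(salted_large)

print("largest partition before salting:", max(unsalted_sizes))
print("largest partition after salting:", max(salted_sizes))
print("small side rows:", len(small), "->", len(salted_small))
```

Salting spreads the hot key across partitions, which lowers the peak memory needed by any single task. The cost is that the small side grows by the salt factor, which is the replication the first comment points to. Whether the trade pays off depends on how skewed the large side is and how big the small side was to begin with.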
@hdr-tech4350 (15 days ago)
Predicate pushdown; optimizations used in Spark; transformations used; groupByKey vs reduceByKey; faced OOM error?; salting; data skewness; data spill; cache and persist; LRU; repartition vs coalesce; rank, dense_rank, row_number; what happens when a Spark job is submitted
@gudiatoka (a month ago)
He says they get data from Azure SQL and then perform the transformations on top of it. If the company uses an Azure cloud database for the project, how come he takes only one fact table? In general, an application consists of more than one fact table, and even if there is only one fact table, there are multiple dimension tables. So if a company has the money to use an Azure SQL DB VM, the database must be normalized. These are common mistakes. Please brush them up properly, as it will not be easy in a real-life interview.
@SandeepRajChinnakandukur (a month ago)
Hi bro, I need some project explanation tips, as I have an interview scheduled for the Data Engineer role. Appreciate your time. Please DM
@soumyaparida9231 (a month ago)
Hi, maybe I missed that point about one dimension table, but do you really think adding one dimension has any impact on cost? Also, the transformations are performed in stored procedures here. Only basic transformations are performed in Databricks. There is nothing to brush up here. Do you really think I am working at Accenture without clearing the interview?