GK Codelabs
I am Arpit Singh.
I have been working in the IT industry for about 10 years now, including 8+ years in Big Data.
I provide useful content for Big Data aspirants who need essential interview questions and use-case scenarios, all built and explained from scratch.
Please subscribe to my channel for more interesting videos!

For any queries, write to me at gkcodelabs@gmail.com

Keep Watching :)
Describe and Summary in Apache Spark
8:28
2 years ago
Comments
@chinnalearns9565 16 days ago
Code, please.
@NewThingsAk 16 days ago
Can you please send me the GitHub code?
@naveenbhandari5097 2 months ago
Really helpful video. It helped me a lot.
@BishalKarki-pe8hs 2 months ago
Please provide the code, sir.
@BishalKarki-pe8hs 2 months ago
Please, can you provide the code for this?
@ravishankarrallabhandi531 2 months ago
How can we handle the case where source records are closed or deleted?
@chandanpatra1053 2 months ago
What about the Lambda and Kappa architectures? Can't we mention them in the interview?
@1HourBule 2 months ago
A person on YouTube who actually knows Spark
@dev4128 3 months ago
Thanks! Please make videos on your daily work.
@pravinmahindrakar6144 3 months ago
Thanks a lot. I think we can use the row_number window function to get the updated records, partitioning by emp_id and ordering by date descending. Finally, we can filter for row_number = 1.
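The row_number approach in the comment above is a common way to keep only the latest version of each key. In PySpark it is typically written with Window.partitionBy("emp_id").orderBy(col("date").desc()), then row_number().over(...) as a new column, then a filter keeping rows where that column equals 1. So the logic can be checked without a Spark cluster, here is a minimal plain-Python sketch of the same semantics; EmpRecord, address, updated_on (standing in for the comment's "date" column) and latest_per_emp are hypothetical names, not taken from the video.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class EmpRecord:
    emp_id: int        # partition key, as in the comment
    address: str       # hypothetical payload column
    updated_on: date   # hypothetical stand-in for the comment's "date" column


def latest_per_emp(records: list[EmpRecord]) -> list[EmpRecord]:
    """Keep the newest record per emp_id.

    Plain-Python emulation of
        row_number() OVER (PARTITION BY emp_id ORDER BY updated_on DESC)
    followed by a filter on row_number = 1.
    """
    # Two stable sorts: newest first, then grouped by emp_id, so each emp_id
    # forms a contiguous run ordered by updated_on descending.
    ordered = sorted(records, key=attrgetter("updated_on"), reverse=True)
    ordered = sorted(ordered, key=attrgetter("emp_id"))
    latest = []
    for _, partition in groupby(ordered, key=attrgetter("emp_id")):
        # Number the rows 1, 2, 3, ... inside each partition.
        for row_number, record in enumerate(partition, start=1):
            if row_number == 1:  # the filter step: keep only row_number = 1
                latest.append(record)
    return latest


records = [
    EmpRecord(1, "old address", date(2023, 1, 1)),
    EmpRecord(1, "new address", date(2023, 6, 1)),
    EmpRecord(2, "only version", date(2023, 3, 15)),
]
# Keeps the 2023-06-01 row for emp_id 1 and the single row for emp_id 2.
print(latest_per_emp(records))
```

If two rows share the same date, Spark's row_number() picks one of them without a guaranteed order, so adding a tiebreaker column to orderBy is safer; this sketch simply keeps whichever tied row came first in the input, because Python's sort is stable.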
@soulamazing1228 3 months ago
The point of using Apache Spark is to handle large datasets, so why are you converting the values to pandas? Pointless video for real-world scenarios.
@GKCodelabs 3 months ago
Conversion to pandas is only done on the final, aggregated and normalized data, where the volume has been brought down to only what is required for "visualization". Don't confuse data "processing" with "visualization". You never run huge Spark jobs when a visual report is requested on the fly; that approach is what becomes "pointless". Reports are pulled from aggregated data. Matplotlib is one approach; there are many other tools and approaches. Thanks for bringing this up, by the way; this will help others as well, in case someone has a similar doubt.
@mallinathbirajdar6610 3 months ago
Where were all these butterfly photos and videos taken? I mean, in the village or in Pune? And if they were taken in Pune, what is the exact location?
@ravikumart6561 3 months ago
Could you please share methods or some pseudocode to implement this concept, Arpit!!!!
@rkdatalabs404 4 months ago
Very useful information. Clearly explained. Thank you, Arpit ❤
@mohitjain2196 4 months ago
Best course ❤🔥
@electricalsir 5 months ago
Thanks, man, you are amazing 😍❤❤❤
@sonurohini6764 5 months ago
Without any experience in data engineering, coming from a non-IT background, can we join as a DE? If yes, how supportive is the team?
@shashireddy3573 5 months ago
Hi Arpit, do you have any real-time Spark-scale projects in a cloud environment? I searched your playlist but couldn't find any relevant videos.
@user-de6zx5er2s 6 months ago
Hi, can you provide the dataset? We will try it from our end.
@MerleNader 6 months ago
Promo-SM 😔
@satyendrakumar4349 6 months ago
Very good information. Keep making videos like this regularly.
@ashutoshojha4244 6 months ago
Hey, Arpit! Do you suggest going with Databricks after learning PySpark, even though I want to make a career as an AWS data engineer? Or would it be more apt for me to just go with AWS Glue instead?
@GKCodelabs 6 months ago
No doubt about going with AWS. But don't just stick to AWS Glue; it's just one convenient way to run Spark loads. Explore all the Data Analytics services. PS: AWS Data Analytics is now the new term for Data Engineering services. Happy learning 👍🏻
@akshaygidwani4360 6 months ago
Hi Arpit! I have been practicing PySpark for a while now and have studied and implemented most of the concepts you mentioned in the video as individual concepts. I am looking for projects/use cases where I can combine all the concepts to build a meaningful end result. Could you please suggest some? Thanks!
@ashutoshojha4244 6 months ago
Hi, where did you learn PySpark from? I am thinking of buying the Udemy PySpark course by Jose Portilla. Can you share your resources, please?
@charangowdamn8661 6 months ago
Which course did you refer to for PySpark?
@piashreetalukdar4258 7 months ago
Great Video. Nicely explained. Thank you 😊
@user-lp7sb5dw7l 8 months ago
When you do repartition() and then partitionBy(), the data is already partitioned based on the partitionBy column, so why does the number of part files depend on repartition() again?
@ravulapallivenkatagurnadha9605 9 months ago
Nice videos
@kampfer6375 10 months ago
Nice explanation
@astropanda1623 10 months ago
Very good explanation
@முரளிதரன் 10 months ago
Big data in 2 months? Which company is hiring? Not suitable for freshers, or even for some experienced people. Join and you will know.. 😂😂
@anurodhpatil4776 10 months ago
Wow, great sir..... step by step ❤
@vinitpandey4424 11 months ago
It's very basic. I would suggest keeping it in Python code rather than SQL.
@JjCSJ 11 months ago
Beautiful explanation
@FormulaMedia-gl9pi 11 months ago
I am getting an SSH connection timeout in Git Bash. What could be the reason behind this?
@absarusain5196 11 months ago
GCP
@avinash7003 1 year ago
Do Spark + Scala roles exist?
@yashgupta6684 1 year ago
File sink was not discussed; I was looking for the file sink only.
@shashireddy3573 1 year ago
Hi, I need big data real-time projects to add to my resume.
@subhadipsamanta35 1 year ago
So life-saving 🙌🏻 Thank you
@Sagar0155 1 year ago
The playlist is really helpful. Explained from the very basics to a high level, including important minor services.
@2002asimanand 1 year ago
Good initiative. Waiting for GCP...
@MalayaleeYoutuber 1 year ago
OLAP cubes and data warehouses are different (in the diagram they are marked together). A data warehouse persists data; in big data, that will be the data lake's gold layer.
@srikanthreddy4516 1 year ago
AWS
@prabhatgupta6415 1 year ago
Azure data engineers are in huge demand compared to AWS.
@my_j.a.r.v.i.s. 1 year ago
Sirrrrr.... you are back. I started my data engineering journey from here. I learned the basics of GCP, but now AWS will be used in my job. I am happy you chose AWS.
@fenixbros-1 1 year ago
GCP
@PrashantKumar-vt2wr 1 year ago
Fantastic material... you describe everything very easily and smoothly...
@GKCodelabs 1 year ago
Thanks Prashant 😊👍🏻
@A_Dasgupta 1 year ago
Perfect, would definitely like to explore it!
@prabakaran758 1 year ago
AWS
@sravankumar1767 1 year ago
Thank you. I am getting duplicate records when I use overwrite mode: the previous records as well as the new ones. How can we resolve this issue?