The best place to learn Data Engineering, Big Data, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, and Google Cloud - self-paced courses, instructor-led courses, certification courses, and practice tests.
Find us on Udemy. Visit the link below for our Udemy courses: www.learningjournal.guru/courses/
Find us on O'Reilly:
www.oreilly.com/library/view/apache-kafka-for/9781800202054/
www.oreilly.com/videos/apache-kafka/9781800209343/
www.oreilly.com/videos/kafka-streams-with/9781801811422/
Wonderful explanation. I was studying Data Cloud in Salesforce, and they mentioned this data format multiple times. I was clueless, but your video gave me clarity. Thank you, sir.
Hello, is this page kept up to date? Can we rely on it by becoming a member and staying updated? If not, where are all your courses updated? I took your PySpark course on Udemy. Though the beginning was really good, the later part of the course did not have a continuous flow. How do I enroll in your batch course?
Spark reserved memory is 300 MB, and executor memory should be at least 1.5 times the reserved memory, i.e., 450 MB. That is why we take executor memory per core as 4X (4 times the 128 MB default partition size), which comes to 512 MB per core per executor.
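For reference, a minimal sketch of how that sizing might be expressed as Spark configuration; the 5-cores and 512-MB-per-core figures follow the comments here, the app name is a placeholder, and none of this is a fixed recommendation:

from pyspark.sql import SparkSession

# Illustrative sizing only: 5 cores per executor at ~512 MB per core
# gives 5 x 512 MB = 2560 MB of executor heap, comfortably above the
# 300 MB reserved memory and the 1.5x (450 MB) minimum mentioned above.
spark = (
    SparkSession.builder
    .appName("sizing-sketch")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "2560m")  # 5 cores x 512 MB per core
    .getOrCreate()
)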
In the last step, you did kinit, which pulled the TGT, and then the dev user could list the files. At what point did the client interact with the TGS using this TGT?
In the last question, each and every value you took was the default (128 MB, 4, 512 MB, 5 cores). So let's say the question is for 50 GB of data; would 3 GB still be the answer?
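If I'm reading the sizing logic from the earlier comment correctly, the per-executor figure is driven by cores and per-core memory, not by total data volume, so the data size changes the number of tasks, not the memory per executor. A quick sketch of that arithmetic (my reading of the listed defaults, not the instructor's official worked answer):

# Assumed inputs, taken from the comment above; the "3 GB" answer is my
# reading of 5 cores x 512 MB plus the 300 MB reserved memory, rounded up.
partition_size_mb = 128                        # default partition size
memory_per_core_mb = 4 * partition_size_mb     # the 4X rule -> 512 MB
cores_per_executor = 5
reserved_mb = 300

executor_memory_mb = cores_per_executor * memory_per_core_mb + reserved_mb
print(executor_memory_mb)   # 2860 MB, i.e. roughly 3 GB per executor

# Data volume changes the task count, not the per-executor memory:
data_gb = 50
num_partitions = (data_gb * 1024) // partition_size_mb
print(num_partitions)       # 400 partitions / tasks for 50 GB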
I get a ModuleNotFoundError: No module named 'pyspark.streaming.kafka' error using the command spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.13:3.5.1 live_processing.py. Can you help, please?
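For what it's worth, pyspark.streaming.kafka belonged to the old DStream API, whose Python Kafka bindings were removed in Spark 3.x, so no --packages coordinate will bring it back; the usual replacement is Structured Streaming with the spark-sql-kafka-0-10 package. A minimal sketch (broker address and topic name are placeholders):

from pyspark.sql import SparkSession

# Submit with:
# spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.13:3.5.1 live_processing.py
spark = SparkSession.builder.appName("live-processing").getOrCreate()

# Read the Kafka topic as a streaming DataFrame (Structured Streaming),
# replacing the removed pyspark.streaming.kafka DStream API.
lines = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder
    .option("subscribe", "test-topic")                    # placeholder topic
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
)

query = (
    lines.writeStream
    .format("console")
    .option("checkpointLocation", "chk-point-dir")  # required for streams
    .outputMode("append")
    .start()
)
query.awaitTermination()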
If the number of cores is 5 per executor, and at shuffle time Spark creates 200 partitions by default, how will those 200 partitions be handled when there are fewer cores, given that 1 partition is processed on 1 core? Suppose my config is 2 executors, each with 5 cores. How will it create 200 partitions if I do a groupBy operation? There are 10 cores, and 200 partitions need cores to process them, right? How is that possible?
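As I understand it, the partitions don't all need a core at once: Spark queues one task per shuffle partition and runs them in waves, so 10 cores work through 200 tasks in about 20 sequential waves. A small sketch showing the default and one way to align it with the available cores (the values are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shuffle-waves").getOrCreate()

# Default: a groupBy shuffle produces 200 partitions -> 200 tasks.
# With 2 executors x 5 cores = 10 task slots, the 200 tasks run in
# 200 / 10 = 20 sequential waves; each core processes one partition
# at a time, then picks up the next queued task.
print(spark.conf.get("spark.sql.shuffle.partitions"))  # "200" by default

# Optionally align shuffle partitions with the cluster, e.g. a small
# multiple of the 10 available cores (illustrative choice):
spark.conf.set("spark.sql.shuffle.partitions", "20")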
Hi, thanks for the explanation. It really helps. In the above example, let's say we get impressionId=4 on the right stream, but no matching event for id=4 arrives on the left stream for a long time. Is it possible to also get this record inside the foreachBatch() function before Spark drops it?
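If I'm reading the question right, this is what an outer stream-stream join with watermarks is for: whichever stream holds the potentially unmatched rows must be the preserved side of the outer join, and once the watermark passes, Spark emits those rows with nulls on the other side instead of silently dropping them, so they reach foreachBatch like any other row. A rough sketch under those assumptions (I've made impressions the preserved left side; sources, columns, and intervals are made up):

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.appName("impression-click-join").getOrCreate()

# Hypothetical streaming inputs; in the video these would come from Kafka.
impressions = (
    spark.readStream.format("rate").load()
    .selectExpr("value AS impressionId", "timestamp AS impressionTime")
    .withWatermark("impressionTime", "10 minutes")
)
clicks = (
    spark.readStream.format("rate").load()
    .selectExpr("value AS clickId", "timestamp AS clickTime")
    .withWatermark("clickTime", "10 minutes")
)

# A left outer join preserves impressions with no matching click: once
# the watermark passes, Spark emits them with NULL click columns.
joined = impressions.join(
    clicks,
    expr("""
        clickId = impressionId AND
        clickTime BETWEEN impressionTime AND impressionTime + interval 10 minutes
    """),
    "leftOuter",
)

def handle_batch(batch_df, batch_id):
    # Unmatched impressions arrive here with clickId/clickTime as NULL.
    batch_df.show(truncate=False)

joined.writeStream.foreachBatch(handle_batch) \
    .option("checkpointLocation", "chk-join") \
    .start().awaitTermination()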
Thank you for explaining. I was looking for a starter example to understand what it is, but other videos explained it as if to experts. I figured it out by following your steps. However, after running the code and the ncat command, I'm getting errors, the first one mentioning "chk-point-dir". Any help?
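Guessing from the message, "chk-point-dir" is the checkpoint location the example writes to; errors that mention it usually mean the directory is not writable or is left over from a previous run with a different query. A minimal sketch of the socket word-reading setup with the checkpoint option, assuming localhost and port 9999:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("socket-stream").getOrCreate()

# Reads lines from the ncat session, e.g. started with: ncat -lk 9999
lines = (
    spark.readStream.format("socket")
    .option("host", "localhost")
    .option("port", "9999")
    .load()
)

# The checkpoint directory must be a path Spark can create and write to;
# deleting a stale "chk-point-dir" from a previous run often clears errors.
query = (
    lines.writeStream
    .format("console")
    .outputMode("append")
    .option("checkpointLocation", "chk-point-dir")
    .start()
)
query.awaitTermination()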
C:\kafka\bin\windows>kafka-console-producer.bat --topic test2 --broker-list localhost:9092 < ..\data\sample1.csv
This gives "The system cannot find the path specified." How do I fix this error?
Kafka doesn't allow more than one consumer from the same group to read from the same partition, to avoid the same message being read multiple times. Isn't this the case when 2 consumers listen to the same partition?
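To make the grouping rule concrete, here's a small sketch using the kafka-python client (topic name, group ids, and broker address are placeholders): within one consumer group, each partition is assigned to at most one consumer, while a consumer in a different group reads the same partition independently.

from kafka import KafkaConsumer

# Two consumers sharing group "analytics" would split the topic's
# partitions between them; a partition is never assigned to both at once.
consumer_a = KafkaConsumer(
    "test-topic",                  # placeholder topic
    group_id="analytics",          # same group -> partitions are split
    bootstrap_servers="localhost:9092",
)

# A consumer in a *different* group gets its own full copy of the stream,
# so it can read the same partition as consumer_a without conflict.
consumer_b = KafkaConsumer(
    "test-topic",
    group_id="audit",              # different group -> independent reads
    bootstrap_servers="localhost:9092",
)

for message in consumer_a:
    print(message.partition, message.offset, message.value)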