A free one-stop shop to learn tips and techniques across technologies, catering to a wide array of IT professionals and aspirants. At this time the focus is on Data Engineering on the cloud (Azure, AWS, and GCP).
Our Offering:
* Training Services
* Engineering Services
* Managed Services
* Staffing Services
How do I deploy in production when db.property or any other files are read dynamically and kept in the same path as the jar file that we run with the spark-submit command?
@itversity Thank you for the reply. I want it in Scala, running on a YARN cluster. I tried many things, but my log messages are not printed in the log file that I wanted. In client mode they get printed, but in cluster mode it is difficult. Also, is Scala still in demand for data engineering in 2024/25? I purchased your course on Udemy!
In the 17th minute, the count function is referred to as a wide transformation. Why is count a wide transformation, since we don't need any shuffling of data for counting?
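The reasoning in this question can be sketched in plain Python. This is only a model of the idea, not Spark code: `count_without_shuffle` and the sample partitions are hypothetical names and data made up for illustration. It shows that a plain count only needs one number per partition to reach the driver. For reference, `count()` on an RDD is an action rather than a transformation, while a grouped count such as `groupBy(...).count()` does involve a shuffle, which may be what the video had in mind.

```python
def count_without_shuffle(partitions):
    """Count records across partitions without moving records between them."""
    # Each partition is counted on its own; no record leaves its partition.
    partial_counts = [sum(1 for _ in part) for part in partitions]
    # Only one small integer per partition is sent to the driver and added up.
    return sum(partial_counts)


# Hypothetical data: three partitions holding a few card codes each.
partitions = [["2H", "3H"], ["4H"], ["5H", "6H", "7H"]]
total = count_without_shuffle(partitions)
print(total)
```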
Thanks for the setup video. I was able to install PySpark, but I am facing an error while running this script:

sc.textFile("C:\\deckofcards.txt").first()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Spark\spark-3.5.1-bin-hadoop3\python\pyspark\rdd.py", line 2888, in first
    rs = self.take(1)
         ^^^^^^^^^^^^
  File "C:\Spark\spark-3.5.1-bin-hadoop3\python\pyspark\rdd.py", line 2822, in take
    totalParts = self.getNumPartitions()
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Spark\spark-3.5.1-bin-hadoop3\python\pyspark\rdd.py", line 952, in getNumPartitions
    return self._jrdd.partitions().size()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\java_gateway.py", line 1322, in __call__
  File "C:\Spark\spark-3.5.1-bin-hadoop3\python\pyspark\errors\exceptions\captured.py", line 179, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.partitions.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/deckofcards.txt
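The last line of the traceback says the input path does not exist, so one way to narrow this down is to check the path in plain Python before handing it to `sc.textFile`. The sketch below is only an assumption about how one might debug it; `describe_path` is a hypothetical helper, not part of PySpark.

```python
import os


def describe_path(path):
    """Return a short message saying whether `path` exists as a regular file."""
    if os.path.isfile(path):
        return f"found: {path}"
    parent = os.path.dirname(path) or "."
    if os.path.isdir(parent):
        # Show similarly named entries so a typo or an extra extension stands out.
        stem = os.path.splitext(os.path.basename(path))[0].lower()
        similar = sorted(n for n in os.listdir(parent) if stem in n.lower())
        return f"missing: {path}; similar names in {parent}: {similar}"
    return f"missing: {path}; parent directory {parent} does not exist"


# The path from the comment above; the result depends on the local machine.
print(describe_path("C:\\deckofcards.txt"))
```

If the message lists something like `deckofcards.txt.txt`, the file was probably saved with a doubled extension that the file manager hides, and the path passed to `sc.textFile` needs to match the real name.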
If I start learning now, there are very few data engineering internships. (If someone has a data internship, it is easier to get a job, and that person also gets hands-on experience.) So should I first do data analysis, just for the internship, and after that study data engineering for a job? That also seems like a good route, and I need to do an internship because it is one of our college's criteria; they do not support you much if you have not done an internship.