I had a question here, sir. Can we call the logical plan created by the Catalyst Optimizer the lineage? And likewise, can we call the physical plan a DAG?
No, lineage and the logical plan are not the same. Lineage in Spark refers to the sequence of transformations that led to the creation of a DataFrame or RDD (Resilient Distributed Dataset), whereas the logical plan is an abstract representation of the query or operations to be performed, and it can be either resolved or unresolved. As for your second question: yes, the DAG (Directed Acyclic Graph) is based on the physical plan, which is Spark's final execution plan.
Thank you for the video. But I have a question: when I ran the command "airflow scheduler", one operation kept repeating, "adopting or resetting orphaned tasks for active dag runs", and it never finished. What could it be? Thank you!
I've managed to install and configure Airflow, but I'm having issues running the DAG. When I try to trigger it, I keep getting an error that the DAG is not found in DagModel.
Hi, thanks for the video, it's very informative. After the airflow scheduler command, what shortcut did you use to stop it and move on to the next command, airflow webserver? Thanks in advance.
After running beeline -u jdbc:hive2://, I get this: 0: jdbc:hive2:// (closed). And if I run any query, I get: Connection is already closed. How do I resolve this issue?
Instead of using beeline -u jdbc:hive2://, you can invoke Hive with the hive command in the terminal. If you still get any error, please mail a screenshot to learn@blismosacademy.com
No Pradnyasutar, SQLite is not required for the Airflow installation. You can mail the screenshot to learn@blismosacademy.com and we will guide you through the installation.
This finally cleared up the issues I was having understanding the abstraction layer Hive represents. I understood that Hive was a collection of data in HDFS or some other distributed file system, but I was confused about how that was, why it was, and what made it work. I knew the metastore played a role, but I was also confused about how data even gets inserted into Hive, because it feels like a database, so again, that abstraction was confusing. This video cleared it all up! You earned a sub. Thanks a ton.
I have followed along with you, but I am getting this error. Any help?
/opt/spark/spark-3.4.0-bin-hadoop3/bin/spark-class: line 71: /usr/lib/jvm/jdk-11.0.19/bin/java: No such file or directory
/opt/spark/spark-3.4.0-bin-hadoop3/bin/spark-class: line 97: CMD: bad array subscript
Hello @moeal5110, delete the previous spark-3.4.0-bin-hadoop3 directory and install the new Spark archive under your home directory:
i) Instead of using /opt/spark<file_Path>,
ii) Go to home, open the terminal, get the path of the Spark directory using the pwd command, paste it into your .bashrc file, save it, and start Spark again.
Please follow the steps above, and if you get any error or issue, reach out to us at learn@blismosacademy.com
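The steps above can be sketched in the shell as follows; the version and extract location are assumptions, so adjust them to match your machine. (Separately, the "No such file or directory ... java" line in the error suggests the configured JDK path may also be missing, so it is worth checking that too.)

```shell
# Assumed extract location -- replace with the path pwd prints
# from inside your Spark directory.
SPARK_HOME="$HOME/spark-3.4.0-bin-hadoop3"

# Append the exports to ~/.bashrc so every new terminal picks them up.
echo "export SPARK_HOME=$SPARK_HOME" >> ~/.bashrc
echo 'export PATH="$SPARK_HOME/bin:$PATH"' >> ~/.bashrc

# Reload the file in the current terminal.
. ~/.bashrc
```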
When you were inserting the data in standalone mode (with the insert query) into the customer/orders table in Hive, which file format got created in the warehouse dir? Was that ORC? (the file having a name like 000000_0)
It's worth knowing that you can use cat > a_file_name and then type the contents of the file at the command line, like using a heredoc in a script. (I like doing this when writing short scripts as I can see the contents of the terminal as I write it, and it is a good challenge in thinking before you type.)
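A minimal sketch of the trick (the file name is just an example): interactively you would type cat > hello.sh, enter the lines, and press Ctrl-D to finish; the heredoc below reproduces the same thing non-interactively.

```shell
# Same effect as typing the lines into "cat > hello.sh" and ending with Ctrl-D.
cat > hello.sh <<'EOF'
echo "hello from cat"
EOF

sh hello.sh   # prints: hello from cat
```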
Couldn't agree more! It's the real-life examples that get ingrained in the brain, helping us remember the technical things very easily! Thank you Blismos!