Hi, it is a public dataset; I lost the source website URL, my apologies. Please send your email address to datamaking.training@gmail.com and we will send you the dataset. Thank you.
C:\Windows\System32>spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/09/26 11:27:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/09/26 11:27:59 ERROR Main: Failed to initialize Spark session.
java.lang.UnsupportedOperationException: getSubject is supported only if a security manager is allowed
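For context: the "getSubject is supported only if a security manager is allowed" error is typically a JDK-version problem, not a Spark bug — Hadoop's login code still calls Subject.getSubject, which very new JDKs reject unless the (deprecated) Security Manager is explicitly allowed. The two commonly reported workarounds are pointing JAVA_HOME at a Spark-supported JDK (8/11/17) or passing -Djava.security.manager=allow to the JVM. A tiny Python sketch of that decision rule (the version thresholds are my reading of the JDK changes, not official Spark guidance):

```python
def spark_jvm_workaround(java_major):
    """Extra JVM option likely needed to run spark-shell on this JDK.

    Sketch only: JDK 18 flipped java.security.manager to 'disallow' by
    default, and on very new JDKs Subject.getSubject throws unless the
    'allow' flag is set. On JDK 17 and below no flag should be needed.
    """
    if java_major >= 18:
        return ["-Djava.security.manager=allow"]
    return []
```

On Windows that would translate to something like `set SPARK_SUBMIT_OPTS=-Djava.security.manager=allow` before launching spark-shell — though whether your particular Spark build needs it depends on the Spark and JDK versions involved.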
Hi sir, I am getting this error, can you give me the solution?
Failed to get schema version. Underlying cause: java.sql.SQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
SQL Error code: 0
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
--verbose
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Caused by: java.sql.SQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
Caused by: com.mysql.cj.exceptions.CJException: Public Key Retrieval is not allowed
Caused by: com.mysql.cj.exceptions.UnableToConnectException: Public Key Retrieval is not allowed
*** schemaTool failed ***
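One common fix for the "Public Key Retrieval is not allowed" part is to append allowPublicKeyRetrieval=true (and, for local development only, useSSL=false) to the metastore's JDBC URL — the javax.jdo.option.ConnectionURL property in hive-site.xml. A small Python sketch of what the resulting URL looks like; the host and database names are placeholders, substitute your own:

```python
from urllib.parse import urlencode

def metastore_jdbc_url(host, db, **params):
    """Build a MySQL JDBC URL for the Hive metastore (illustration only)."""
    base = "jdbc:mysql://%s:3306/%s" % (host, db)
    return "%s?%s" % (base, urlencode(params)) if params else base

url = metastore_jdbc_url(
    "localhost", "metastore",
    allowPublicKeyRetrieval="true",  # let Connector/J fetch the server's RSA key
    useSSL="false",                  # dev-only: skip TLS for a local metastore DB
)
```

The alternative, as the pinned comment below describes, is to switch the MySQL user to the mysql_native_password plugin so the driver never needs to retrieve the key in the first place.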
Hi, please reach out to datamaking.training@gmail.com. The team will get back to you soon. Thank you for showing interest in my video content, I appreciate it. Thank you 🙏
For those who don't want the Anaconda environment activated by default each time you open a terminal, run 'conda config --set auto_activate_base false'. This lets you open terminals as usual and invoke conda only when desired.
For everyone who gets the error "Microsoft ODBC Driver 18 for SQL Server: Client unable to establish connection." after executing "/opt/mssql-tools/bin/sqlcmd...", just add "-C" (trust the server certificate) at the end of the line.
Hi, thanks for your wonderful content! Somehow the actual video code is not in the downloaded resource, and save_to_mysql is not clear. Could you please upload the actual content you taught in this video? In the latest script, realtime_data_processing.py, the Cassandra part is totally missing. Please consider, @DataMaking.
@@Nalla-perumal Hi, thank you for showing interest. Unfortunately I don't have this source, since I created it very long back. I am planning to re-create it in the future. Thank you.
@@lucasgonzalezsonnenberg3204 Thank you for showing interest in my technical content. I have another video, but it is a little older. FYI: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-NY3IqbQSa-k.htmlsi=YP_weidRL7LbUeWM
@@siddharthat88 Hi, if you want to connect to a remote Spark cluster, you can use the Spark standalone cluster URL or the resource manager URL, depending on the setup. Some security properties also need to be provided. If you share more details, we can discuss them here. Thank you 🙏
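To make the two options above concrete, here is a small Python sketch of what the --master value looks like in each case. The host name is a placeholder, and 7077 is only the standalone master's default port — your cluster may differ:

```python
def master_url(cluster_type, host="localhost", port=7077):
    """Illustrative --master values for a remote cluster.

    standalone -> spark://<master-host>:7077  (the standalone master URL)
    yarn       -> the literal string 'yarn'   (the ResourceManager address
                  comes from HADOOP_CONF_DIR, not from the URL itself)
    """
    if cluster_type == "standalone":
        return "spark://%s:%d" % (host, port)
    if cluster_type == "yarn":
        return "yarn"
    raise ValueError("unknown cluster type: %r" % cluster_type)
```

You would pass the result to SparkSession.builder.master(...) or spark-submit --master, and add the security settings (e.g. spark.authenticate and friends) that your specific cluster requires.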
I am a newbie in the data engineering domain. I have an idea of most of the tech stacks in big data applications, but I was looking to connect all the dots. Your explanation was really nice and smooth, and the example was very relatable to real-world scenarios. Awesome content. Please keep uploading such good stuff 🙏🙏
@@akindia8519 Hi, this is more on the data engineering side, but you can use PySpark as one skill set for a data analyst job when you want to analyze big data (large data sets). Thank you 🙏
I'm getting the following error, can somebody help?
hadoop@rahulparihar-Inspiron-N5010:~$ hive
Hive Session ID = 661e5791-5e5d-497e-83cc-41167373ad0a
Exception in thread "main" java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
  at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413)
  at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389)
  at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at org.apache.hadoop.util.RunJar.run(RunJar.java:328)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:241)
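This particular ClassCastException is the classic sign of running an older Hive release on Java 9 or later: Hive's SessionState casts the application class loader to URLClassLoader, and that cast only succeeds on Java 8. The usual fix is to point JAVA_HOME at a Java 8 JDK before launching hive. A toy Python checker for the rule, including parsing the two Java version formats you will see from `java -version`:

```python
def java_major(version):
    """Parse the major version from strings like '1.8.0_292' or '11.0.12'.

    Java 8 and earlier report '1.<major>...'; Java 9+ report '<major>...'.
    """
    parts = version.split(".")
    return int(parts[1]) if parts[0] == "1" else int(parts[0])

def hive_cli_compatible(version):
    """True if this JVM can run the older Hive CLI that casts the app
    class loader to URLClassLoader (the cast fails on Java 9+)."""
    return java_major(version) == 8
```

So for the session above, switching JAVA_HOME from a Java 11 install to a Java 8 install should make `hive` start (upgrading to a Hive release that supports newer JDKs is the other option).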
Hi, thank you for your feedback. I will try it out in a future video series. At the same time, a few things I want to mention: 1. In real work we use a Linux environment only, except for a few development tasks. 2. I don't have a Windows environment with me. I respect your ask and will try it out in a future video series 🙏
Hi. Your video is very good; you presented the steps clearly and they are easy to understand. I would like the text notes that cover every step. Could you upload them to the link you shared?
Pin this. For those of you who got an error after the --verbose part, here's the fix. First off, one of the XML files he provided has 'Hive' as the user and 'Datamaking' as the password. Go back and change those to your own MySQL username and password. After that it is likely still failing because public key retrieval is denied. To fix that:
1. Connect to the MySQL CLI in your terminal: mysql -u your_username -p (then enter that password)
2. Once you're logged into the MySQL command-line client, execute the ALTER USER command to change the authentication plugin for the desired user: ALTER USER 'your_user'@'localhost' IDENTIFIED WITH mysql_native_password BY 'your_password';
3. That's it. Run the schematool command again and, after a couple of minutes of scrolling output, it will complete the beeline step.
Hi, sorry for the inconvenience. Are you getting any issues or errors? Please share them with me at indiacloudtv@gmail.com. Meanwhile, I will also try to check it.
It also depends on what output mode you are using: spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes Complete mode is bound to break if your data is growing.
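To see why complete mode breaks on growing data, here is a toy Python model of a streaming word count (plain Python, not Spark code): the engine keeps a running aggregate across triggers, and the output mode decides how much of it is written to the sink each micro-batch.

```python
from collections import Counter

state = Counter()  # the engine's running aggregate, kept across triggers

def trigger(batch, mode):
    """Toy model of one micro-batch of a streaming count-by-key."""
    state.update(batch)
    if mode == "complete":
        return dict(state)                        # entire result table, every trigger
    if mode == "update":
        return {k: state[k] for k in set(batch)}  # only rows changed this batch
    raise ValueError("append is not allowed for aggregations without a watermark")
```

In complete mode the emitted result contains one row per distinct key ever seen, so with unbounded key cardinality both the state and the per-trigger output grow without limit; update mode emits only the changed rows, which is why it scales better for this kind of query.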