Subbed, and thank you a ton for helping me personally set up Spark on my PC. You went the extra mile to get it working for me (even though you don't know me). Kudos again.
Thanks for your explanation. But I'm getting the error below, can you please help me?

ERROR FileFormatWriter: Aborting job..................
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o32.csv.
Thanks for the tutorial. I'm getting this error saying that "py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM". Is there any way to overcome it?
Hi, I followed your steps but am getting the error below:

ModuleNotFoundError: No module named 'pyspark.sql'; 'pyspark' is not a package

I also added the Python lib folder and the log file in the Python structure (Add Content Root). Software versions: PyCharm 2019.3.1, Python 3.8, Spark 3.0.0. I tried every possible option but no luck. Can you please help me? Note: I am able to run pyspark from the CMD prompt.
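In case this helps anyone hitting the same "'pyspark' is not a package" error: it usually means a local file or folder named `pyspark` is shadowing the installed package in that interpreter. A small stdlib-only diagnostic sketch (the function name here is my own, not from the tutorial):

```python
import importlib.util

def diagnose_pyspark_import():
    """Return a hint about why `import pyspark.sql` might fail.

    A local pyspark.py (or pyspark/ folder without __init__.py) shadows
    the real package, which produces exactly the error
    "'pyspark' is not a package".
    """
    spec = importlib.util.find_spec("pyspark")
    if spec is None:
        return "pyspark is not installed in this interpreter"
    origin = spec.origin or ""
    if origin.endswith("pyspark.py"):
        # A stray script named pyspark.py wins over the package on sys.path.
        return "a local pyspark.py is shadowing the package: " + origin
    return "pyspark resolves to: " + origin

print(diagnose_pyspark_import())
```

If the printed path is not inside your PyCharm project's interpreter (the venv's `site-packages`), the IDE is using a different interpreter than the CMD prompt, which would also explain why it works in one and not the other.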
Good morning. Can you help me with this ERROR? Thanks in advance.
==============================================================================================
C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\Scripts\python.exe C:\Users\prsan\PycharmProjects\pjtLotofacil\src\main.py
Até aqui nos ajudou o Senhor!
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "C:\Users\prsan\PycharmProjects\pjtLotofacil\src\main.py", line 5, in <module>
    import persiste_dados
  File "C:\Users\prsan\PycharmProjects\pjtLotofacil\src\persiste_dados.py", line 11, in <module>
    .getOrCreate()
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\pyspark\sql\session.py", line 272, in getOrCreate
    session = SparkSession(sc, options=self._options)
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\pyspark\sql\session.py", line 307, in __init__
    jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\py4j\java_gateway.py", line 1585, in __call__
    return_value = get_return_value(
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\py4j\protocol.py", line 330, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.sql.SparkSession.
Trace:
py4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist
    at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
    at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
    at py4j.Gateway.invoke(Gateway.java:237)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Unknown Source)

Process finished with exit code 1
@stream2learn
======================================= In Machine =======================================
C:\Users\prsan>pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.3
      /_/

Using Scala version 2.12.15, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_102
Branch HEAD
Compiled by user sunchao on 2022-11-14T17:20:20Z
Revision b53c341e0fefbb33d115ab630369a18765b7763d
Url github.com/apache/spark
Type --help for more information.
============================================================ In PyCharm ============================================================
3.3.1
I must then switch to 3.2.3. Thanks
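For anyone comparing versions the same way: the "Constructor ... does not exist" Py4J error is the classic symptom of the pip-installed pyspark not matching the local Spark install. A small sketch to print both sides (function names are my own; `spark-submit --version` writes its banner to stderr):

```python
import re
import subprocess

def installed_pyspark_version():
    """Version of the pyspark package in the current interpreter, or None."""
    try:
        import pyspark
        return pyspark.__version__
    except ImportError:
        return None

def spark_submit_version():
    """Version reported by the local `spark-submit --version`, or None."""
    try:
        proc = subprocess.run(["spark-submit", "--version"],
                              capture_output=True, text=True)
    except OSError:
        return None  # spark-submit not on PATH
    match = re.search(r"version\s+(\d+\.\d+\.\d+)", proc.stderr)
    return match.group(1) if match else None

print("pyspark package:", installed_pyspark_version())
print("local Spark:   ", spark_submit_version())
```

If the two versions differ (e.g. 3.3.1 vs 3.2.3 as above), pinning the package to match, e.g. `pip install pyspark==3.2.3`, is the usual fix.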
It worked!
===========================================================
The enemy is now another! Help me please! Thank you very much in advance.
==========================================================
C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\Scripts\python.exe C:\Users\prsan\PycharmProjects\pjtLotofacil\src\main.py
Até aqui nos ajudou o Senhor!
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Até aqui nos ajudou o Senhor!
Traceback (most recent call last):
  File "C:\Users\prsan\PycharmProjects\pjtLotofacil\src\main.py", line 5, in <module>
    import persiste_dados
  File "C:\Users\prsan\PycharmProjects\pjtLotofacil\src\persiste_dados.py", line 21, in <module>
    .load()
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\pyspark\sql\readwriter.py", line 164, in load
    return self._df(self._jreader.load())
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\pyspark\sql\utils.py", line 111, in deco
    return f(*a, **kw)
  File "C:\Users\prsan\PycharmProjects\ambientes_virtuais\venvDataScience\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
: java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(Unknown Source)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$2(JDBCOptions.scala:107)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:107)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:39)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:33)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Unknown Source)

Process finished with exit code 1
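In case it helps a future reader: "java.sql.SQLException: No suitable driver" usually means the JDBC driver jar is not on Spark's classpath, or the driver class was not named explicitly. A minimal configuration sketch; the jar path, URL, table, and credentials below are placeholders (I'm assuming PostgreSQL here, adjust the jar and driver class for your database):

```python
from pyspark.sql import SparkSession

# Placeholder path: point spark.jars at the actual JDBC driver jar on disk.
spark = (SparkSession.builder
         .appName("jdbc-demo")
         .config("spark.jars", "C:/drivers/postgresql-42.6.0.jar")
         .getOrCreate())

df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/mydb")  # placeholder URL
      .option("dbtable", "public.my_table")                    # placeholder table
      .option("user", "myuser")                                # placeholder credentials
      .option("password", "mypassword")
      # Naming the driver class explicitly avoids the DriverManager lookup
      # that raises "No suitable driver".
      .option("driver", "org.postgresql.Driver")
      .load())
```

Both pieces matter: `spark.jars` puts the jar on the JVM classpath, and the `driver` option tells Spark which class inside it to load.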