Next videos in this series: Automate ETL Pipeline (Airflow): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html ETL Load Reference Data: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-W-8tEFAWD5A.html ETL Incremental Data Load (Source Change Detection): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-32ErvH_m_no.html&t ETL Incremental Data Load (Destination Change Comparison): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-a_T8xRaCO60.html How to connect to SQL Server via Python: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t How to connect to Postgres via Python: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-h-u1ML-OWok.html
I'm not sure how I found your site, but holy cow this is just fantastic. I appreciate all the work you have put into this and I will enjoy learning from your examples!!!
I’m glad you liked it. I will try to come up with a real-time scenario. If you’re interested in data streaming (continuous data flow), then check out my video on Kafka: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gPvwvkCVSnY.html
Hello! That's a great explanation, thanks! Please tell me how the data transfer is carried out? Are we using the RAM of the server where Python is installed or are we using the RAM of the server where PostgreSQL is installed? I want to understand if this scenario is suitable if there is a table with 30 million rows on the SQL Server side?
Hey there, in this case we used Pandas, which loads the data into the memory of the server where Python is installed. So you would need to make sure the data fits in that server's memory, or load it in batches using the chunking strategy. Hope this helps.
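The chunking strategy mentioned above can be sketched with pandas' `chunksize` parameter. A minimal sketch: an in-memory SQLite table stands in for the SQL Server source, and the table/column names are illustrative only.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the SQL Server source; with pyodbc you
# would pass your src_conn instead. Table/column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO big_table VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10_000)])

total = 0
# chunksize turns read_sql_query into an iterator of DataFrames, so only
# one chunk is held in memory at a time instead of the whole table.
for chunk in pd.read_sql_query("SELECT * FROM big_table", conn, chunksize=2_500):
    total += len(chunk)  # a real pipeline would call load(chunk, tbl) here

print(total)  # prints 10000
```

With a 30-million-row table you would pick a chunk size that fits comfortably in RAM and load each chunk to the destination before reading the next.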
amazing Video, I followed all but I had this error and could not find solution Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 17 for SQL Server]Neither DSN nor SERVER keyword supplied (0) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute (0)') Error while extracting data: local variable 'src_conn' referenced before assignment
Hi Nabil, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a firewall rule to allow connections to SQL Server port 1433. Then check that SQL Server a) accepts remote connections and b) has the TCP/IP protocol enabled. If not, enable it and restart the service: open "SQL Server Configuration Manager", click "SQL Server Network Configuration", click "Protocols for <instance name>", right-click "TCP/IP" (make sure it is Enabled) and click Properties. Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to SQL Server. Hope this helps.
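The connection test suggested above can be sketched as follows. The "Neither DSN nor SERVER keyword supplied" error in the question usually means the DRIVER or SERVER keyword is missing or empty in the connection string. All names and credentials below are placeholders, and the actual pyodbc call is left commented since it needs a reachable server.

```python
# Build a DSN-less pyodbc connection string. Driver name, server,
# database, and credentials are placeholders for your environment.
def build_conn_str(driver, server, database, uid, pwd):
    return (f"DRIVER={{{driver}}};SERVER={server};"
            f"DATABASE={database};UID={uid};PWD={pwd}")

conn_str = build_conn_str("ODBC Driver 17 for SQL Server",
                          r"localhost\SQLEXPRESS",
                          "AdventureWorksDW2019", "etl", "demopass")
print(conn_str)

# With pyodbc installed and the server reachable, the actual test would be:
# import pyodbc
# with pyodbc.connect(conn_str, timeout=5) as cn:
#     print(cn.cursor().execute("SELECT @@VERSION").fetchone()[0])
```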
Yes, you can achieve this using the Source Change Detection technique. I have covered the incremental data load in the following videos. Feel free to check them out. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-32ErvH_m_no.html&t ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-a_T8xRaCO60.html
Thank you for the video! I would appreciate it if you could answer my question. Why is this method better than the traditional approach with ETL tools like SSIS, IBM DataStage, or SAP Data Services? What can I do with Python ETL that I can't with other tools? Could you please give me some examples?
Glad you like the content. Oh boy, where to start… anyone who has dealt with traditional tools, mapping columns manually, casting data formats to the tool's own formats, and picking up new updates in the source will tell you that this solution handles all of these challenges gracefully! Try developing a similar solution in one of those tools and you'll see!
This is cool! But isn't this more of an Extract and Load process than an Extract, Transform and Load process? Most of the time while creating data marts, the transform scripts are long and hectic. Also, in the same scenario the load procedure is usually an insert query of records rather than whole tables. Nevertheless, good video!
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
Why did we use tbl[0] in: for tbl in src_tables: #query and load save data to dataframe df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn) load(df, tbl[0]) Shouldn't we use tbl only for the current reference?
Hi Tejas, you can use the above approach, but you'd need to perform a further step to get the actual value. tbl is a pyodbc.Row, not a plain list. The Row object in pyodbc behaves like a combination of tuple, list, and dict.
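A runnable sketch of the same idea without a SQL Server connection: sqlite3.Row behaves much like pyodbc.Row for this purpose, indexable by position and by column name (pyodbc additionally allows attribute access like tbl.table_name). The table name is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows indexable by position and column name
conn.execute("CREATE TABLE DimProduct (id INTEGER)")

cur = conn.execute(
    "SELECT name AS table_name FROM sqlite_master WHERE type='table'")
src_tables = cur.fetchall()

for tbl in src_tables:
    # f'select * FROM {tbl}' would embed the whole Row object, not the
    # table name -- hence tbl[0] in the video's code.
    print(tbl[0])             # positional access -> DimProduct
    print(tbl["table_name"])  # access by column name also works
```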
Hi, it was a wonderful tutorial; it helped me a lot in understanding the ETL concept. But I'm struggling to follow along starting from the Environment Variables part. I can't find variables like the ones in your video on my laptop. I hope you can explain how to create those variables, or maybe how to see the UID without opening the environment variables...
You can define the system variables under System > Advanced system settings > Environment Variables. I show the variables and their contents at 4:10. They contain your database username and password.
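Once the variables exist, the script reads them with os.environ. A small sketch: PGUID/PGPASS are the variable names used in the video, but the values are simulated here since they would normally come from System Properties.

```python
import os

# Simulate the two variables for this sketch; on Windows you would set
# them under System Properties > Environment Variables instead.
os.environ["PGUID"] = "etl"
os.environ["PGPASS"] = "demopass"

uid = os.environ["PGUID"]            # raises KeyError if the variable is missing
pwd = os.environ.get("PGPASS", "")   # .get returns a default instead of raising

print(uid)  # prints etl
```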
That’s too broad of a question. Is there a specific area you want to optimize? Here are some general tips to optimize an ETL pipeline. Hope this helps. * Eliminate database reads/writes. * Cache the data. * Use parallel processing (look into Papermill). * Filter unnecessary datasets. * Integrate only what you need.
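The parallel-processing tip above can be sketched with the standard library. The table names and the extract_and_load stand-in are illustrative, not the pipeline's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

tables = ["DimProduct", "DimSalesTerritory", "FactInternetSales"]

def extract_and_load(tbl):
    # A real pipeline would run: df = pd.read_sql_query(...); load(df, tbl)
    return f"loaded {tbl}"

# Process several tables concurrently; map preserves input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(extract_and_load, tables))

print(results)
```

Threads suit I/O-bound work like database reads; for CPU-heavy transforms a ProcessPoolExecutor would be the usual choice.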
Hi bro, very informative session. I am working on an ETL QA framework using PySpark, and for that I have to create a directory structure in PyCharm. Is there any reference video you have created? Please share.
I have covered Pytest recently as a testing framework for data engineering pipelines. You can check out the videos on that topic. If you are using Pytest then you can use the following folder structure. Here are the docs for Pytest: docs.pytest.org/en/7.1.x/explanation/goodpractices.html
pyproject.toml
src/
    mypkg/
        __init__.py
        app.py
        view.py
tests/
    test_app.py
    test_view.py
In this same flow, is it possible to use an Excel file as a data source, load the data into the database, and then showcase it to a frontend as a report?
Of course you can. Here is a video on how to load data from excel file(s) to a database. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-W-8tEFAWD5A.html
for tbl in src_tables: #print(tbl) df = pd.read_sql_query(f''select * from {tbl[0]}', src_conn) Hi, I'm getting invalid syntax in the df row. I may not be using the f string correctly, but looking for ideas. Thank you Morgan
Hi Morgan, it seems you have two quotes after the f: f''select * from {tbl[0]}', src_conn. Remove one of the quotes and you should be good. Here is the original code: for tbl in src_tables: #print(tbl) df = pd.read_sql_query(f'select * from {tbl[0]}', src_conn)
Hi again, I used your code from GitHub and changed the name of the server to server = "LAPTOP-8SESKAVH\SQLEXPRESS". I'm still getting: Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections I did substitute in AdventureWorksDW2019. Not sure what is going on. Morgan
Morgan's attempt:
# -*- coding: utf-8 -*-
"""
Created on Sat Nov 5 14:34:16 2022
@author: mtman
"""
# import needed libraries
from sqlalchemy import create_engine
import pyodbc
import pandas as pd
import os

# get password from environment var
pwd = os.environ['PGPASS']
uid = os.environ['PGUID']

# sql db details
driver = "{SQL Server Native Client 11.0}"
server = "LAPTOP-8SESKAVH\SQLEXPRESS"  # "haq-PC"
database = "AdventureWorksDW2019;"

# extract data from sql server
def extract():
    try:
        src_conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + '\SQLEXPRESS' + ';DATABASE=' + database + ';UID=' + uid + ';PWD=' + pwd)
        src_cursor = src_conn.cursor()
        # execute query
        src_cursor.execute("""
            select t.name as table_name
            from sys.tables t
            where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales')
        """)
        src_tables = src_cursor.fetchall()
        for tbl in src_tables:
            # query and save data to dataframe
            df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
            load(df, tbl[0])
    except Exception as e:
        print("Data extract error: " + str(e))
    finally:
        src_conn.close()

# load data to postgres
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(f'postgresql://{uid}:{pwd}@{server}:5432/AdventureWorks')
        print(f'importing rows {rows_imported} to {rows_imported + len(df)}... for table {tbl}')
        # save df to postgres
        df.to_sql(f'stg_{tbl}', engine, if_exists='replace', index=False)
        rows_imported += len(df)
        # add elapsed time to final print out
        print("Data imported successful")
    except Exception as e:
        print("Data load error: " + str(e))

try:
    # call extract function
    extract()
except Exception as e:
    print("Error while extracting data: " + str(e))
@@BiInsightsInc
@@mtmanalyst make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a firewall rule to allow connections to SQL Server port 1433. Then check that SQL Server a) accepts remote connections and b) has the TCP/IP protocol enabled. If not, enable it and restart the service: open "SQL Server Configuration Manager", click "SQL Server Network Configuration", click "Protocols for <instance name>", right-click "TCP/IP" (make sure it is Enabled) and click Properties. Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to SQL Server. Did you create this user in your environment? If not, you will need to create it. Here is the script for it. Hope this helps. github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/SQL%20Scripts/SQL%20Server/create%20etl%20login%20and%20role%20-%20SQLServer.sql
My data sources are SAP HANA and SQL Server, and my target table is in the same SQL Server. How can I perform an upsert? I do have a primary key on the target table, but it's possible other columns might update in the future. Please help.
Hi Shrutika, you can use the following video as the guide. In this video we perform upsert based on a primary key column. Happy coding: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-a_T8xRaCO60.html
Really nice explanation. However I'm getting the following error: 27: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy. df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn) Data load error: No module named 'psycopg2' Data load error: No module named 'psycopg2' Data load error: No module named 'psycopg2' Data load error: No module named 'psycopg2' Data load error: No module named 'psycopg2' Any leads to tackle the issue?
Thanks. You need to install the 'psycopg2' module in your environment. Also, we have updated the code to use SQLAlchemy to get rid of the future warning you're seeing in the code. Here is the link: github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
Nvm, I managed to solve the issue. It seems that this video alone is not enough to complete the whole migration; redirection to the other tutorials is needed. Still, I think you have done an amazing job. Thanks!
Really great content!!!! Can you help me with the last step? I am having trouble with the last step, Test ETL Pipeline. When using cmd it says "'\postgres' is not recognized as an internal or external command, operable program or batch file." if I am using my C: drive, but when using D: it's "Access is denied." So what can I do to finish the last step? Thank youuu
This is a very common error: "is not recognized as an internal or external command". It comes up when the command prompt does not know the location of Python or the script you are executing. First, make sure the executable is actually installed; if not, install it first. If you have an executable which you are attempting to run from cmd.exe, then you need to tell cmd.exe where this file is located, either by running it with its full path or by adding its folder to the PATH variable.
Hello, Sir! Thank you for the great learning material! But I would like to ask help in running the ETL python script. I'm getting an error and it says 'Login failed for user etl'. Could this be because of the permissions?
Thanks. You will need to create the 'etl' user in both of the databases. Here are the SQL scripts to create them. github.com/hnawaz007/pythondataanalysis/tree/main/ETL%20Pipeline/SQL%20Scripts Also, here is the basic video on how to test your connection via Python using the user/password. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t
Thank you for the reply, Sir! Apparently I had to change my authentication method and it solved the problem. Thank you for the additional video as well! @@BiInsightsInc
I have made a few videos on how to schedule or trigger your Python ETL scripts. You can schedule them via Windows Task Scheduler, Airflow, or Dagster. Feel free to check them out. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-t8QADtYdWEI.html&t ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html&t ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-IsuAltPOiEw.html&t
Hi bro, the video is so cool. I have a problem, can you help me? I installed msodbcsql. Thank you so much. Extract error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)') Data Extract Error: 'src_connect' is not defined
I have covered how to connect to a SQL Server database using Python. This is a common question that comes up in the ETL series. So, I decided to cover it and direct viewers to it if they are facing this issue: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html
@@BiInsightsInc Hi Friend, I followed the video and the data appeared on Jupyter, but it gave me the new error: Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)') Error while extracting data: name 'src_conn' is not defined. Help me !!
THIS IS AMAZING, THANKS!!! HELP!!!! 1) I did the same thing and it got imported into Postgres, but I need the column data types to be in a specific format, for example DATETIME, VARCHAR, DOUBLE, INT, not text, bigint, etc. Please suggest what I should do! 2) The data is the same but the data types are different. I tried changing the data type in the data frame and created custom columns with a CREATE sql query, but it doesn't matter, as pd.to_sql always replaces that table and creates a new one. So what should I do? Thank you
Hi Arin, you can define data types for each of the columns in your dataframe in a dictionary and set the data type for each column while importing. Here is an example to get you started:
type_dict = {'Col_A': 'category', 'Col_B': 'int16', 'Col_C': 'float16', 'Col_D': 'float32'}
df = pd.read_csv(myfile, dtype=type_dict)
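Pandas dtypes only control the in-memory frame; to control the column types created in the database table itself, `to_sql` also accepts a `dtype` mapping. A minimal sketch, using SQLite as a stand-in for Postgres (with a SQLAlchemy engine you would pass sqlalchemy types such as VARCHAR(50) or DateTime instead of strings; table and column names are illustrative):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "price": [1.5, 2.5]})

conn = sqlite3.connect(":memory:")
# dtype sets the column types in the generated CREATE TABLE statement.
df.to_sql("stg_products", conn, index=False,
          dtype={"name": "VARCHAR(50)", "price": "DOUBLE"})

# Inspect the DDL pandas generated for the target table.
ddl = conn.execute("SELECT sql FROM sqlite_master "
                   "WHERE name='stg_products'").fetchone()[0]
print(ddl)
```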
Hello, I am getting the following error. Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)") Error while extracting data: local variable 'src_conn' referenced before assignment driver is installed and I have done the other troubleshooting steps found in the previous comments.
Please check out the following video. It goes over the connection setup and common errors. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t In addition, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a firewall rule to allow connections to SQL Server port 1433. Then check that SQL Server a) accepts remote connections and b) has the TCP/IP protocol enabled. If not, enable it and restart the service: open "SQL Server Configuration Manager", click "SQL Server Network Configuration", click "Protocols for <instance name>", right-click "TCP/IP" (make sure it is Enabled) and click Properties.
@@BiInsightsInc I am getting another error. I even tried copying the script directly from your repo, changing only the server variable, and I get: Data extract error: ('42000', "[42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ':'. (102) (SQLExecDirectW)")
Hi friend, I have a problem. The SQL Server name is ENGINEERDATA\ENGDATA, and when I declare the variable = 'ENGINEERDATA\ENGDATA' in code and run the script, I always get psycopg2.OperationalError: could not translate host name "engineerdata\ENGDATA" to address: unknown host. Please help me.
You can concatenate the server and instance as a text if the slash is causing an error. I have covered the how to connect to SQL Server and common issues while establishing a connection in this video here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html
Could you please explain the part below, as I'm getting an error like mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'sakila.tables' doesn't exist. What are t and sys here? I am using MySQL. src_cursor.execute(""" select t.name as table_name from sys.tables t where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales') """)
Hey Siva, t is the alias for the tables and sys is the system schema of SQL Server. You can find the MySQL equivalent of sys (information_schema) and query the table info there.
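As a sketch of the idea: each engine keeps table metadata in a different catalog. The MySQL query is shown as a string only, since it needs a live server, while SQLite's equivalent catalog is runnable here as a stand-in. The 'sakila' schema name comes from the question; other names are illustrative.

```python
import sqlite3

# MySQL equivalent of SQL Server's sys.tables (needs a live MySQL server):
mysql_query = """
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'sakila'
  AND table_name IN ('actor', 'film')
"""

# SQLite keeps the same information in sqlite_master, runnable locally:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (id INTEGER)")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # prints ['actor']
```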
@@BiInsightsInc oh, I have fixed the error. In the target URL you have declared the user and password, but in the load() function you still pass the user id and password options.
I encountered this error while running the file: raise KeyError(key) from None at 'uid' and 'pwd'. Could you help me with this? Also, a very nice video! Thank you
Hi Dyu, the script is not able to find the uid and pwd variables. If you don't have these set as environment variables, then simply declare them in your script, i.e. uid="username". Same for pwd, which is the database password.
@@BiInsightsInc Thanks for your help, I have solved that error. And I have a new one: Data extract error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)') Error while extracting data: local variable 'src_conn' referenced before assignment. I have tried "SQL Server Native Client 11", "SQL Server", and "ODBC Driver 17 for SQL Server", but none of them helped with this bug. Thank you!
@@duynguyenduc1255 this is one of the common issues. I have covered the causes in this video below. You need to debug your SQL Server connection prior to attempting the etl pipeline. Happy coding. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html
Thanks. The transform step happens after the extract step. I have a few examples of transformations in the following video. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@@avinash7003 I have covered the Data Engineering role in the following video. Tools can vary depending on the tech stack the company is using, but I will do a broader video on the base requirements for a Data Engineering role. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-fwkLcp8dbic.html
Hello sir, I got an error and I don't have any clue after trying other alternatives. Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)") Error while extracting data: local variable 'src_conn' referenced before assignment. I'd really appreciate your help!!!!
Hi Noor, make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in firewall to allow connections to SQL Server port 1433. Hope this helps.
@@BiInsightsInc After all method executed, I got another error c:\Users\User\Desktop\ETL\etl.py:28: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy. df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn) importing rows 0 to 606... for table DimProduct Data load error: (psycopg2.OperationalError) connection to server at "LAPTOP-3C9TIKCE" (fe80::d8c0:aa56:7702:d5b5), port 5432 failed: FATAL: no pg_hba.conf entry for host "fe80::d8c0:aa56:7702:d5b5%19", user "etl", database "AdventureWorks", no encryption
@@nooraliikhwan8139 you need to edit your pg_hba.conf file on the machine. It is located in the following directory on Windows: C:\Program Files\PostgreSQL\14\data. Add the entries below, or whichever ones are missing, on your PC.
# TYPE  DATABASE  USER  ADDRESS       METHOD
# "local" is for Unix domain socket connections only
local   all       all                 md5
# IPv4 local connections:
host    all       all   127.0.0.1/32  trust
host    all       all   0.0.0.0/0     trust
# IPv6 local connections:
host    all       all   ::1/128       trust
host    all       all   ::/0          trust
Hey, great video. I was wondering if this would be a good guide for creating a basic pipeline project for job interviews’s like Data Analysis/Engineering? Thank you.
Yes, this can be a good example of a basic pipeline. However, I'd advise going with the following video, as it presents a complete picture of ETL (Extract, Transform and Load). ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@@BiInsightsInc reading data from DB2 and inserting into Postgres: if a pandas df holds the result from DB2 in the middle, then ints from DB2 auto-convert to floats when inserted into Postgres.
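The int-to-float surprise described above is usually caused by NULLs: pandas promotes an integer column containing NaN to float64. A small sketch of the nullable Int64 dtype that avoids this (the column name is illustrative):

```python
import pandas as pd

# An integer column with a NULL gets promoted to float64 by default.
df = pd.DataFrame({"qty": [1, 2, None]})
print(df["qty"].dtype)   # float64 - promoted because of the NULL

# The nullable extension dtype keeps integers and stores NULL as <NA>.
df["qty"] = df["qty"].astype("Int64")
print(df["qty"].dtype)   # Int64
```

With Int64 columns, to_sql can then create integer columns in the destination instead of double precision.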
Followed the tutorial step by step and I am facing the following error. Even Googling couldn't help. Please help me. Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Hi Dan, it looks like you're not able to connect to SQL Server. Make sure SQL Server is running and you're able to connect to it via SQL Server authentication. Also, you need a SQL Server driver installed: www.microsoft.com/en-us/download/details.aspx?id=36434 I'd suggest testing your connection first to make sure you're able to connect.
@@DhanunjayaSrisailamTTT Will try to cover ETL with modern tools. If you want to set up SQL Server and Postgres, I have covered the installation here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-e5mvoKuV3xs.html&t ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-fjYiWXHI7Mo.html
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
I think you need an introduction to ETL and/or Data Engineering and why we are needed in the world of data. I'd say pick up a book on either subject and you will see the benefits. Here is an intro to data engineering. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-fwkLcp8dbic.html
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Hi Anthonius, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433. Hope this helps.
@@isbakhullail6693 here is a stack overflow’s link that shows how to determine If SQL Server’s client is installed. stackoverflow.com/questions/10499643/check-if-sql-server-client-is-installed
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)') Error while extracting data: local variable 'src_conn' referenced before assignment
I have done a video on this topic with Airflow. Feel free to check it out! How to build and automate your Python ETL pipeline with Airflow | Data pipeline | Python ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
I'm having a new problem: Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 18 for SQL Server]Named Pipes Provider: Could not open a connection to SQL Server [67]. (67) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 18 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to CALVIN/SQLEXPRESS. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (67)') What do I do?
I have covered the common SQL Server connection issues in this video here. With so many questions on this topic, I had to cover it separately. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t
Yes, there are various tools out there that will convert MySQL to Postgres syntax. There is an online tool you can use; it will allow you to select source and target. There is also the mysql2postgres tool. If you're dealing with table DDL and want to convert MySQL data types to Postgres, you can write a simple script to swap the data types. www.sqlines.com/online github.com/maxlapshin/mysql2postgres
importing rows 0 to 10... for table data load error: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")... (I am using sql server)
You are selecting from "sqlite_master", which is not the system schema in SQL Server. SQL Server has a "sys" schema, and the table that stores table information is called "tables". Please refer to the repo and copy the script from there. github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
@@BiInsightsInc
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;')
        con = engine.connnect()
        print(f'importing rows {rows_imported} to {rows_imported+len(df)}... for table {tbl}')
        df.to_sql(f'sys_{tbl}', con, if_exists='replace', index=False)
        rows_imported += len(df)
        print("Data imported succesfully.")
    except Exception as e:
        print("data load error: " + str(e))

try:
    extract()
except Exception as e:
    print("Error while extracting data: " + str)
These are my configs for loading to SQL Server but I'm still getting the error.
@@skipa9906 your connection details do not seem to be complete. If you run only the following line: create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;') it will throw an error. Here is what I used to successfully connect and persist data to SQL Server:
# user and password
pwd = 'demopass'
uid = 'etl'
# sql db details
dr = "SQL Server Native Client 11.0"
srvr = "localhost\SQLEXPRESS"
db = "AdventureWorksDW2019"
engine = create_engine(f"mssql+pyodbc://{uid}:{pwd}@{srvr}:1433/{db}?driver={dr}")
df.to_sql("customer_retention", engine, if_exists='replace', index=False)