How to build an ETL pipeline with Python | Data pipeline | Export from SQL Server to PostgreSQL 

BI Insights Inc · 14K subscribers
187K views · Published 26 Sep 2024

Comments: 203
@BiInsightsInc · 2 years ago
Next videos in this series:
Automate ETL Pipeline (Airflow): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
ETL Load Reference Data: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-W-8tEFAWD5A.html
ETL Incremental Data Load (Source Change Detection): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-32ErvH_m_no.html&t
ETL Incremental Data Load (Destination Change Comparison): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-a_T8xRaCO60.html
How to connect to SQL Server via Python: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t
How to connect to Postgres via Python: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-h-u1ML-OWok.html
@js913 · 2 years ago
But your table data type is text by default. How do you dynamically define that?
@severtone263 · 2 years ago
My kind of guy! No frills or thrills, just straight to the point.
@seedaccount3217 · 1 year ago
I am fortunate to find this channel on the RU-vid ocean.... Can't thank you enough mate🙏🙏
@katy807 · 2 years ago
Such an easy tutorial with clear insight. Please keep every tutorial this easy!
@flyingdutchman9804 · 1 year ago
Fantastic videos, short and to the point! Great work, thank you for sharing!
@afk4dyz · 1 month ago
Password as an environment variable is an absolute game changer.
@l.kennethwells2138 · 9 months ago
I'm not sure how I found your site, but holy cow this is just fantastic. I appreciate all the work you have put into this and I will enjoy learning from your examples!!!
@BiInsightsInc · 9 months ago
Glad you found the channel and thanks for the kind words. Welcome to the community. Happy coding!
@aparab82 · 1 year ago
Thanks very much for your invaluable contribution towards eternal learning process.
@deanmarucci4467 · 1 year ago
Such an easy to follow video, fantastic tutorial!
@GopalSharma-mv9jy · 4 months ago
Lovely. Short, crisp and to the point. Amazing video.
@BiInsightsInc · 4 months ago
Glad you liked it!
@hsk7715 · 7 months ago
I am an aspiring data engineer; this video is really helpful.
@nicolposthumus4243 · 1 year ago
Thank you for this. Please continue to make these informative videos; you are a very good illustrator.
@BoubacarBilloBah-j5g · 7 months ago
Fantastic! Thanks a lot for all these series of videos
@kai13man · 2 years ago
Fantastic stuff, you really filled in some unknown gaps for me. Thanks.
@essickallen2133 · 2 years ago
Great tutorial, thanks for the information.
@shiblyaziz4528 · 1 year ago
Exactly what I was looking for. Very well done! Thank you
@anandagarwal291 · 2 years ago
Thanks for such an informative video
@danielvoss2483 · 3 months ago
Great job, keep going 👍
@gethorsegh · 8 months ago
great information, thank you
@mukulraj6184 · 1 year ago
Perfect explanation. 👍👍
@pspointssara8472 · 2 years ago
Excellent presentation. Keep it up.
@mdhasnat8395 · 2 years ago
Awesome! Thank you so much for sharing your knowledge. Please keep it up.
@SUMITKUMAR-xu8zr · 26 days ago
Very useful
@Ferruccio_Guicciardi · 1 year ago
Thanks for sharing 😀
@rafaelg8238 · 2 years ago
Great video. Congrats
@grahamlindsay9798 · 1 year ago
I'm impressed!
@vedanthasm2659 · 2 years ago
Fantastic video. Highly appreciated! On request, please make a video on real-time data engineering projects! No good content on RU-vid.
@BiInsightsInc · 2 years ago
I'm glad you liked it. Will try and come up with a real-time scenario. If you're interested in data streaming (continuous data flow), then check out my video on Kafka: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gPvwvkCVSnY.html
@data_dave · 1 year ago
great great video
@ThuanNguyen-yl4qd · 7 months ago
thank you very much for this!!!
@kumudinigandhesiri7149 · 23 days ago
Thank you so much!
@VictorHugo-bd3bf · 2 years ago
Fantastic! Thank you so much.
@adilsaju · 2 months ago
AMAZING
@ashishshethia6549 · 6 months ago
Great Video
@BiInsightsInc · 6 months ago
Glad you enjoyed it
@mateenfoster4595 · 1 year ago
best tutorial on youtube. Do you have any courses?
@ivanl7786 · 1 month ago
Hello! That's a great explanation, thanks! Please tell me how the data transfer is carried out. Are we using the RAM of the server where Python is installed, or the RAM of the server where PostgreSQL is installed? I want to understand if this scenario is suitable when there is a table with 30 million rows on the SQL Server side.
@BiInsightsInc · 1 month ago
Hey there, in this case we utilized Pandas, and it loads the data into the memory of the server where Python is installed. So you would need to make sure the data fits in the server's memory, or load in batches using the chunking strategy. Hope this helps.
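The chunking strategy mentioned in this reply can be sketched as follows. This sketch uses in-memory SQLite databases as hypothetical stand-ins for the SQL Server source and PostgreSQL target from the video, so the pattern is runnable anywhere; the same `chunksize`/`append` loop applies to the real pyodbc and SQLAlchemy connections.

```python
import sqlite3
import pandas as pd

# Stand-ins for the source (SQL Server) and target (PostgreSQL) databases.
src_conn = sqlite3.connect(":memory:")
tgt_conn = sqlite3.connect(":memory:")

# Fake a large-ish source table (name borrowed from the demo schema).
pd.DataFrame({"id": range(10_000), "amount": range(10_000)}).to_sql(
    "FactInternetSales", src_conn, index=False)

# Stream the table in fixed-size chunks instead of one big read_sql_query,
# so memory use is bounded by the chunk size rather than the table size.
rows_moved = 0
for chunk in pd.read_sql_query(
        "SELECT * FROM FactInternetSales", src_conn, chunksize=1_000):
    chunk.to_sql("stg_FactInternetSales", tgt_conn,
                 if_exists="append", index=False)
    rows_moved += len(chunk)

print(rows_moved)  # 10000
```

With `chunksize=`, `read_sql_query` returns an iterator of DataFrames, so at most one chunk is held in memory at a time.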
@nabilrahmouni297 · 2 years ago
Amazing video. I followed it all, but I had this error and could not find a solution:
Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 17 for SQL Server]Neither DSN nor SERVER keyword supplied (0) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute (0)')
Error while extracting data: local variable 'src_conn' referenced before assignment
@BiInsightsInc · 2 years ago
Hi Nabil, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433. You need to check the SQL Server if it:
a) Accepts remote connections.
b) Has the TCP/IP protocol enabled. If not, enable it and restart the services: open "SQL Server Configuration Manager", click on "SQL Server Network Configuration", click on "Protocols for <instance name>", right-click on "TCP/IP" (make sure it is Enabled), then click on Properties.
Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to the SQL Server. Hope this helps.
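A quick way to debug this class of error ("Neither DSN nor SERVER keyword supplied" usually means `SERVER=` never made it into the connection string, e.g. because a variable was empty) is to assemble the string separately and inspect it before calling `pyodbc.connect`. A minimal sketch — the server name, user and password below are hypothetical:

```python
def build_mssql_conn_str(driver, server, database, uid, pwd):
    # Assemble a pyodbc-style connection string for SQL Server.
    return (f"DRIVER={{{driver}}};SERVER={server};"
            f"DATABASE={database};UID={uid};PWD={pwd}")

conn_str = build_mssql_conn_str(
    driver="ODBC Driver 17 for SQL Server",  # must match an installed driver
    server="localhost,1433",                 # host,port of the SQL Server
    database="AdventureWorksDW2019",
    uid="etl", pwd="demopass")

# Print it (masking the password) and eyeball the SERVER= and DRIVER= parts.
print(conn_str.replace("demopass", "****"))

# Once the string looks right, the real attempt would be:
#   import pyodbc
#   conn = pyodbc.connect(conn_str, timeout=5)
```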
@NobixLee · 2 years ago
Top tier stuff, this!!!
@CookieMonsteeerrr · 2 years ago
Is there a possibility to fetch only the records which have been modified on the source and update only those on the destination side?
@BiInsightsInc · 2 years ago
Yes, you can achieve this using the Source Change Detection technique. I have covered the incremental data load in the following videos. Feel free to check them out.
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-32ErvH_m_no.html&t
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-a_T8xRaCO60.html
@ВенциславКолев-с8р
Thank you for the video! I would appreciate it if you answered my question. Why is this method better than the traditional approach with ETL tools like SSIS, IBM DataStage, or SAP Data Services? What can I do with Python ETL that I can't with other tools? Could you please give me some examples?
@BiInsightsInc · 2 years ago
Glad you like the content. Oh boy, where to start… anyone who has dealt with traditional tools — mapping columns manually, casting the data formats to the tool's formats, picking up new updates in the source — will tell you that this solution handles all of these challenges gracefully! Try developing a similar solution in one of those tools and you'll see!
@kinelekaite3878 · 2 years ago
great stuff thanks
@CodingByAmp · 2 years ago
thank you bro.
@oberdansantos5802 · 2 years ago
Very good explanation. Do you have any tutorial on extracting data from an ERP?
@Emotekofficial · 1 year ago
This is cool! But isn't this more of an Extract and Load process than an Extract, Transform and Load process? Most of the time while creating the data marts the transform scripts are long and hectic. Also, in the same scenario, the load procedure is mainly an insert query of records rather than tables. Nevertheless, good video!
@BiInsightsInc · 1 year ago
The focus was Extract and Load (ELT). I covered the whole process in the ETL automation video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@TejasKatba · 1 year ago
Why did we use tbl[0] in `for tbl in src_tables: df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn); load(df, tbl[0])`? Shouldn't we use just tbl for the current reference?
@BiInsightsInc · 1 year ago
Hi Tejas, you can use the above approach, but you'd need to perform a further action to get the actual value. The "tbl" is a pyodbc.Row and not a straightforward list. The Row object in pyodbc seems to be a combination of tuple, list and dict objects.
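The `tbl[0]` pattern can be seen in miniature with any DB-API cursor. This sketch uses sqlite3 (whose `fetchall()` returns plain tuples rather than `pyodbc.Row` objects, but the positional indexing is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (name TEXT)")
cur.executemany("INSERT INTO t VALUES (?)",
                [("DimProduct",), ("FactInternetSales",)])

# fetchall() returns row objects, not bare strings, so iterating gives one
# row per table and the table name is the row's first column: tbl[0].
tables = cur.execute("SELECT name FROM t ORDER BY name").fetchall()
print(tables[0])  # ('DimProduct',)  -- a row, not a string

names = [tbl[0] for tbl in tables]
print(names)      # ['DimProduct', 'FactInternetSales']
```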
@gralleg9634 · 1 year ago
Thanks a lot !
@nicolposthumus4243 · 1 year ago
Thanks
@royranggarofiulazmi9708 · 9 months ago
Hi, it was a wonderful tutorial; it helped me a lot in understanding the ETL concept. But I'm struggling to follow the tutorial from the environment-variables step on. I can't find a variable like the one in your video on my laptop. I hope you can explain how to create that variable, or maybe what the solution is to see the UID without opening the environment variables...
@BiInsightsInc · 9 months ago
You can define the system variables under System > Advanced system settings > Environment Variables. I show the variables and their content at 4:10. They contain your database username and password.
@arunasingh8617 · 2 years ago
Nice video! I have a question: how can we optimize the ETL process while creating the extract and load methods in the notebook itself?
@BiInsightsInc · 2 years ago
That's too broad of a question. Is there a specific area you want to optimize? Here are some broad tips to optimize an ETL pipeline. Hope this helps.
* Eliminate database reads/writes.
* Cache the data.
* Use parallel processing (look into Papermill).
* Filter unnecessary datasets.
* Integrate only what you want.
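The parallel-processing tip can be sketched like this: copy independent tables concurrently instead of one after another. `copy_table` below is a placeholder; a real version would run the `read_sql_query` + `to_sql` pair from the video for one table.

```python
from concurrent.futures import ThreadPoolExecutor

tables = ["DimProduct", "DimSalesTerritory", "FactInternetSales"]

def copy_table(name):
    # Placeholder for "extract one table, load it to the target".
    return f"{name}: done"

# Threads suit this workload because the time is spent waiting on the
# databases, not on Python bytecode.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(copy_table, tables))

print(results)
```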
@anpham7108 · 2 years ago
Thank you !
@balajisivam1875 · 1 year ago
Hi bro, very informative session. I am working on an ETL QA framework using PySpark. For that I have to create a directory structure in PyCharm. Is there any reference video you have created? Please share.
@BiInsightsInc · 1 year ago
I have covered Pytest recently as a testing framework for data engineering pipelines. You can check out my videos on that topic. If you are using Pytest then you can use the following folder structure. Here are the docs for Pytest: docs.pytest.org/en/7.1.x/explanation/goodpractices.html

    pyproject.toml
    src/
        mypkg/
            __init__.py
            app.py
            view.py
    tests/
        test_app.py
        test_view.py
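A minimal Pytest-style check for one pipeline step might look like this (the transform function and its column names are made up for illustration; in the layout above it would live in `tests/`):

```python
import pandas as pd

def add_line_total(df):
    # Hypothetical transform: derive a line-total column.
    out = df.copy()
    out["Total"] = out["Qty"] * out["Price"]
    return out

def test_add_line_total():
    df = pd.DataFrame({"Qty": [2, 3], "Price": [5.0, 1.0]})
    out = add_line_total(df)
    assert list(out["Total"]) == [10.0, 3.0]
    assert "Total" not in df.columns  # the input frame is left untouched

test_add_line_total()  # pytest would discover and run this automatically
```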
@snehithjshiju1332 · 11 months ago
Hi, this video was useful. Can you make a video on data migration from MongoDB to MySQL using Apache Airflow, with an example?
@stormjay1810 · 10 months ago
Hi, please, is there any video on how I change my directory to C:\postgres? Thanks
@BiInsightsInc · 10 months ago
You can type the following command in command prompt: cd c:\postgres
@AbhinavPandey-dt8pn · 5 months ago
In this same flow, is it possible to use an Excel file as a data source, load the data into the database, and then showcase it to a frontend as a report?
@BiInsightsInc · 5 months ago
Of course you can. Here is a video on how to load data from excel file(s) to a database. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-W-8tEFAWD5A.html
@AbhinavPandey-dt8pn · 5 months ago
@@BiInsightsInc Do you provide consultancy? I have a few doubts regarding Postgres and ETL.
@mtmanalyst · 1 year ago
Hi, I'm getting invalid syntax on the df row in `for tbl in src_tables: #print(tbl) df = pd.read_sql_query(f''select * from {tbl[0]}', src_conn)`. I may not be using the f-string correctly, but looking for ideas. Thank you, Morgan
@BiInsightsInc · 1 year ago
Hi Morgan, it seems you have two quotes after the f: `f''select * from {tbl[0]}', src_conn`. Remove one of the quotes and you should be good. Here is the original code: `for tbl in src_tables: #print(tbl) df = pd.read_sql_query(f'select * from {tbl[0]}', src_conn)`
@mtmanalyst · 1 year ago
Hi, thank you for the note! I'll give it a try tomorrow AM. Great course. Morgan
@mtmanalyst · 1 year ago
Hi again, I used your code from GitHub and changed the name of the server to server = "LAPTOP-8SESKAVH\SQLEXPRESS". I'm still getting:

Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections

I did substitute in AdventureWorksDW2019. Not sure what is going on. Morgan

Morgan's attempt:

```python
# -*- coding: utf-8 -*-
"""
Created on Sat Nov 5 14:34:16 2022
@author: mtman
"""
# import needed libraries
from sqlalchemy import create_engine
import pyodbc
import pandas as pd
import os

# get password from environment var
pwd = os.environ['PGPASS']
uid = os.environ['PGUID']
# sql db details
driver = "{SQL Server Native Client 11.0}"
server = "LAPTOP-8SESKAVH\SQLEXPRESS"  # "haq-PC"
database = "AdventureWorksDW2019;"

# extract data from sql server
def extract():
    try:
        src_conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + '\SQLEXPRESS' + ';DATABASE=' + database + ';UID=' + uid + ';PWD=' + pwd)
        src_cursor = src_conn.cursor()
        # execute query
        src_cursor.execute("""
            select t.name as table_name
            from sys.tables t
            where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales')
        """)
        src_tables = src_cursor.fetchall()
        for tbl in src_tables:
            # query and load save data to dataframe
            df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
            load(df, tbl[0])
    except Exception as e:
        print("Data extract error: " + str(e))
    finally:
        src_conn.close()

# load data to postgres
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(f'postgresql://{uid}:{pwd}@{server}:5432/AdventureWorks')
        print(f'importing rows {rows_imported} to {rows_imported + len(df)}... for table {tbl}')
        # save df to postgres
        df.to_sql(f'stg_{tbl}', engine, if_exists='replace', index=False)
        rows_imported += len(df)
        # add elapsed time to final print out
        print("Data imported successful")
    except Exception as e:
        print("Data load error: " + str(e))

try:
    # call extract function
    extract()
except Exception as e:
    print("Error while extracting data: " + str(e))
```
@@BiInsightsInc
@BiInsightsInc · 1 year ago
@@mtmanalyst Make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433. You need to check the SQL Server if it:
a) Accepts remote connections.
b) Has the TCP/IP protocol enabled. If not, enable it and restart the services: open "SQL Server Configuration Manager", click on "SQL Server Network Configuration", click on "Protocols for <instance name>", right-click on "TCP/IP" (make sure it is Enabled), then click on Properties.
Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to the SQL Server. Did you create the etl user in your environment? If not, you will need to create it. Here is the script for it. Hope this helps. github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/SQL%20Scripts/SQL%20Server/create%20etl%20login%20and%20role%20-%20SQLServer.sql
@mtmanalyst · 1 year ago
Thank you, I'll get on this. Morgan
@ashwinrawat9622 · 4 months ago
4:17 — you have your Gmail password exposed.
@BiInsightsInc · 4 months ago
Thanks for pointing that out. Much appreciated. It has been updated:)
@ranvijaymehta · 1 year ago
Thank you so much
@motamarriajaykumar8885 · 1 year ago
If my destination is dimension and fact tables, do I need to write a separate method per table, or is there another way?
@BiInsightsInc · 1 year ago
Yes, you need to perform subsequent transformation steps after extracting the data to shape it to fit your target tables.
@abnormalirfan · 2 years ago
Please do a tutorial on SQL Server to BigQuery with Airflow.
@BiInsightsInc · 2 years ago
Hi Irfan, I am planning to do more videos on Airflow. Will cover Airflow to BigQuery as well. Stay tuned.
@irfanalfathramadhan4797 · 2 years ago
@@BiInsightsInc Thank you sir, I'll wait for that.
@CaribouDataScience · 1 year ago
Why not use the SQL functions in pandas?
@BiInsightsInc · 1 year ago
You can use SQL functions. However, I am extracting and loading data; therefore there is no need for them in this context.
@shutzzzzzz · 1 year ago
My data sources are SAP HANA and SQL Server; my target table is in the same SQL Server. How can I perform an upsert? I do have a primary key on the target table, but it is possible other columns might be updated in the future. Please help.
@BiInsightsInc · 1 year ago
Hi Shrutika, you can use the following video as a guide. In this video we perform an upsert based on a primary key column. Happy coding: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-a_T8xRaCO60.html
@KESHA1982 · 1 year ago
Thanks! Subscribe and like buttons done.
@jorozcobe · 8 months ago
Really nice explanation. However, I'm getting the following error:

```
27: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
```

Any leads to tackle the issue?
@BiInsightsInc · 8 months ago
Thanks. You need to install the 'psycopg2' module in your environment. Also, we have updated the code to use SQLAlchemy to get rid of the future warning you're seeing in the code. Here is the link: github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
@jorozcobe · 8 months ago
@@BiInsightsInc That works, but now I'm getting a new error:

```
importing rows 0 to 606... for table DimProduct
Data load error: (psycopg2.errors.InsufficientPrivilege) permission denied for schema public
LINE 2: CREATE TABLE "stg_DimProduct" (
                     ^
[SQL: CREATE TABLE "stg_DimProduct" (
  "ProductKey" BIGINT, "ProductAlternateKey" TEXT, "ProductSubcategoryKey" FLOAT(53),
  "WeightUnitMeasureCode" TEXT, "SizeUnitMeasureCode" TEXT, "EnglishProductName" TEXT,
  "SpanishProductName" TEXT, "FrenchProductName" TEXT, "StandardCost" FLOAT(53),
  "FinishedGoodsFlag" BOOLEAN, "Color" TEXT, "SafetyStockLevel" BIGINT,
  "ReorderPoint" BIGINT, "ListPrice" FLOAT(53), "Size" TEXT, "SizeRange" TEXT,
  "Weight" FLOAT(53), "DaysToManufacture" BIGINT, "ProductLine" TEXT,
  "DealerPrice" FLOAT(53), "Class" TEXT, "Style" TEXT, "ModelName" TEXT,
  "LargePhoto" TEXT, "EnglishDescription" TEXT, "FrenchDescription" TEXT,
  "ChineseDescription" TEXT, "ArabicDescription" TEXT, "HebrewDescription" TEXT,
  "ThaiDescription" TEXT, "GermanDescription" TEXT, "JapaneseDescription" TEXT,
  "TurkishDescription" TEXT, "StartDate" TIMESTAMP WITHOUT TIME ZONE,
  "EndDate" TIMESTAMP WITHOUT TIME ZONE, "Status" TEXT
)]
```

Any workaround for this?
@jorozcobe · 8 months ago
nvm, I managed to solve the issue. It seems this video alone is not enough for the whole migration; redirection to the other tutorials is needed. Still, I think you have done an amazing job. Thanks!
@calvinallen8425 · 11 months ago
Really great content!!!! Can you help me with the last step? I am having trouble with the last step, Test ETL Pipeline. When using cmd on C: it says "'\postgres' is not recognized as an internal or external command, operable program or batch file.", but when using D: it's "Access is denied." What can I do to finish the last step? Thank you
@BiInsightsInc · 11 months ago
This is a very common error: "is not recognized as an internal or external command". It comes up when the command prompt does not know the location of Python or the script you are executing. First, make sure the executable is actually installed; if not, install it first. If you have an executable which you are attempting to run from cmd.exe, then you need to tell cmd.exe where this file is located.
@md.khademulaminsohel2892 · 6 months ago
I also have the same problem. I have Python installed and gave the directory, but it doesn't work. What should I do?
@hellowillow_YT · 8 months ago
Hello, Sir! Thank you for the great learning material! But I would like to ask for help in running the ETL Python script. I'm getting an error that says 'Login failed for user etl'. Could this be because of the permissions?
@BiInsightsInc · 8 months ago
Thanks. You will need to create the 'etl' user in both of the databases. Here are the SQL scripts to create them: github.com/hnawaz007/pythondataanalysis/tree/main/ETL%20Pipeline/SQL%20Scripts
Also, here is a basic video on how to test your connection via Python using the user/password: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t
@hellowillow_YT · 8 months ago
Thank you for the reply, Sir! Apparently I had to change my authentication method and that solved the problem. Thank you for the additional video as well! @@BiInsightsInc
@stocktradingeducationthrou6750 · 2 years ago
Wonderful, bro... but how do we trigger this code automatically on some condition? Please make a video on scheduling as well, using a shell script etc.
@BiInsightsInc · 2 years ago
I have made a few videos on how to schedule or trigger your Python ETL scripts. You can schedule them via Windows Task Scheduler, Airflow or Dagster. Feel free to check them out.
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-t8QADtYdWEI.html&t
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html&t
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-IsuAltPOiEw.html&t
@linhtuan8603 · 1 year ago
Hi bro, the video is so cool. I have trouble, can you help me? I installed msodbcsql. Thank you so much.
Extract error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Data Extract Error: 'src_connect' is not defined
@BiInsightsInc · 1 year ago
I have covered how to connect to a SQL Server database using Python. This is a common question that comes up in the ETL series, so I decided to cover it and direct viewers to it if they are facing this issue: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html
@linhtuan8603 · 1 year ago
@@BiInsightsInc Hi friend, I followed the video and the data appeared in Jupyter, but it gave me a new error:
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Error while extracting data: name 'src_conn' is not defined.
Help me!!
@arinbose5778 · 1 year ago
THIS IS AMAZING, THANKS!!! HELP!!!!
1) I did the same thing and it got imported into Postgres, but I need the column data types to be in a specific format, for example DATETIME, VARCHAR, DOUBLE, INT — not text, bigint etc. Please suggest what I should do!
2) The data is the same but the data types are different. I tried changing the data types in the dataframe and created custom columns with a CREATE SQL query, but it doesn't matter, as pd.to_sql always replaces that table and creates a new one. So what should I do? Thank you
@BiInsightsInc · 1 year ago
Hi Arin, you can define the data type for each of the columns in your dataframe in a dictionary and set them while importing. Here is an example to get you started:

```python
type_dict = {'Col_A': 'category', 'Col_B': 'int16', 'Col_C': 'float16', 'Col_D': 'float32'}
df = pd.read_csv(myfile, dtype=type_dict)
```
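The dictionary in the reply sets the pandas-side types. For the column types the *database* table ends up with (the VARCHAR/DOUBLE part of the question), `to_sql` also accepts a `dtype=` mapping. A sketch using an in-memory SQLite database as a stand-in; with a plain DBAPI connection the values are SQL type strings, while with a SQLAlchemy engine they would be types such as `sqlalchemy.VARCHAR(50)`:

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"name": ["bike"], "price": [1499.99]})

conn = sqlite3.connect(":memory:")
# dtype= controls the column types used in the generated CREATE TABLE,
# overriding the default TEXT/BIGINT/FLOAT mapping.
df.to_sql("stg_products", conn, index=False,
          dtype={"name": "VARCHAR(50)", "price": "DOUBLE"})

ddl = conn.execute("SELECT sql FROM sqlite_master "
                   "WHERE name = 'stg_products'").fetchone()[0]
print(ddl)  # the CREATE TABLE statement, with VARCHAR(50)/DOUBLE columns
```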
@ooossssss · 1 year ago
Hello, I am getting the following error:
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Error while extracting data: local variable 'src_conn' referenced before assignment
The driver is installed and I have done the other troubleshooting steps found in the previous comments.
@BiInsightsInc · 1 year ago
Please check out the following video; it goes over the connection setup and common errors: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t
In addition, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433. You need to check the SQL Server if it:
a) Accepts remote connections.
b) Has the TCP/IP protocol enabled. If not, enable it and restart the services: open "SQL Server Configuration Manager", click on "SQL Server Network Configuration", click on "Protocols for <instance name>", right-click on "TCP/IP" (make sure it is Enabled), then click on Properties.
@ooossssss · 1 year ago
@@BiInsightsInc I am getting another error. I even tried copying the script directly from your repo, changing only the server variable, and got: Data extract error: ('42000', "[42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ':'. (102) (SQLExecDirectW)")
@DiogoSBastos · 1 year ago
txs man!!!
@ihab6796 · 1 year ago
Hi friend, I have a problem. My SQL Server name is ENGINEERDATA\ENGDATA, and when I declare the variable = 'ENGINEERDATA\ENGDATA' in code and run the script, I always get psycopg2.OperationalError: could not translate host name "engineerdata\ENGDATA" to address: unknown host. Please help me.
@BiInsightsInc · 1 year ago
You can concatenate the server and instance as text if the slash is causing an error. I have covered how to connect to SQL Server, and common issues while establishing a connection, in this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html
@ihab6796 · 1 year ago
@@BiInsightsInc Thank you so much, friend.
@nadeschdat5858 · 1 year ago
Does this also work if an ODBC driver is used?
@BiInsightsInc · 1 year ago
Yes, you can use an ODBC driver to connect to SQL Server; therefore you can use this approach to read and load data.
@sivahanuman4466 · 1 year ago
Could you please explain the part below, as I am getting an error like mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'sakila.tables' doesn't exist. What are t and sys here? I am using MySQL.

```python
src_cursor.execute("""
    select t.name as table_name
    from sys.tables t
    where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales')
""")
```
@BiInsightsInc · 1 year ago
Hey Siva, t is the alias for the tables and sys is the system schema of SQL Server. You can find the sys equivalent in MySQL and query the table info.
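For reference, the MySQL counterpart of `sys.tables` lives in `information_schema`. A query along these lines should list the tables (the schema name 'sakila' is just the one from the error message; substitute your own):

```sql
SELECT t.table_name
FROM information_schema.tables t
WHERE t.table_schema = 'sakila'
  AND t.table_name IN ('DimProduct', 'DimProductCategory', 'FactInternetSales');
```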
@sivahanuman4466 · 1 year ago
@@BiInsightsInc Thanks for the reply, sir, let me check.
@motamarriajaykumar8885 · 1 year ago
Hi sir, can you make a video on how to migrate data from SQL Server to Snowflake?
@BiInsightsInc · 1 year ago
Hello Motamarri, thanks for stopping by. Will try and cover Snowflake ❄️ soon.
@vandc1684 · 5 months ago
org.postgresql.util.PSQLException: ERROR: permission denied for schema public
I run the exact function but I encounter this error, please help me :((
@BiInsightsInc · 5 months ago
Grant your user permission on the schema with the following statement: GRANT USAGE ON SCHEMA public TO your_user;
@vandc1684 · 5 months ago
@@BiInsightsInc Oh, I have fixed the error. In the target URL you have declared the user and password, but in the load() function you still pass the user id and password options.
@BiInsightsInc · 5 months ago
@@vandc1684 That's great. There are two databases (source and target) and we connect to both; therefore, two connections.
@vandc1684 · 5 months ago
@@BiInsightsInc I know, thanks for your pj, have a nice day
@duynguyenduc1255 · 1 year ago
I encountered this error while running the file: raise KeyError(key) from None, at the 'uid' and 'pwd' lines. Could you help me with this? Also a very nice video! Thank you
@BiInsightsInc · 1 year ago
Hi Duy, the script is not able to find the uid and pwd variables. If you don't have these as environment variables, then simply declare them in your script, i.e. uid = "username". Same for pwd, which is the database password.
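A fail-fast variant of the lookup reports which variable is missing instead of raising a bare KeyError. A sketch, reusing the PGUID/PGPASS names from the video (the helper function itself is made up):

```python
import os

def get_credential(name):
    # os.environ.get returns None instead of raising when the key is absent.
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(
            f"Environment variable {name} is not set; define it under "
            f"System > Environment Variables (or hard-code it for a quick "
            f"local test).")
    return value

os.environ["PGUID"] = "etl"     # simulate the variable being defined
print(get_credential("PGUID"))  # etl
```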
@duynguyenduc1255 · 1 year ago
@@BiInsightsInc Thanks for your help, I am done with that error. And I have this new one:
Data extract error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Error while extracting data: local variable 'src_conn' referenced before assignment.
I have tried SQL Server Native Client 11, SQL Server, and ODBC Driver 17 for SQL Server, but none of them helped with this bug. Thank you!
@BiInsightsInc · 1 year ago
@@duynguyenduc1255 This is one of the common issues. I have covered the causes in the video below. You need to debug your SQL Server connection prior to attempting the ETL pipeline. Happy coding. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html
@ernestmukaru1070 · 1 year ago
So cool!! Where does the transform happen?
@BiInsightsInc · 1 year ago
Thanks. The transform step happens after the extract step. I have a few examples of transformations in the following video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@ernestmukaru1070 · 1 year ago
@@BiInsightsInc thank you!
@avinash7003
@avinash7003 Год назад
What is the ETL tool you used for data ingestion and extraction?
@BiInsightsInc 1 year ago
I am using Python to extract and load data. The IDE used is Pycharm.
@avinash7003 1 year ago
@@BiInsightsInc Can you make a video about the tools and programming skills required for a Data Engineer?
@BiInsightsInc 1 year ago
@@avinash7003 I have covered the Data Engineering role in the following video. Tools can vary depending on the tech stack the company is using, but I will do a broader video on the base requirements for a Data Engineering role. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-fwkLcp8dbic.html
@nooraliikhwan8139 1 year ago
Hello sir, I got an error and I don't have any clue after trying other alternatives. Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)") Error while extracting data: local variable 'src_conn' referenced before assignment. I really appreciate your help!!!!
@BiInsightsInc 1 year ago
Hi Noor, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a firewall rule to allow connections to SQL Server on port 1433. Hope this helps.
@nooraliikhwan8139 1 year ago
@@BiInsightsInc thank you so much for your speedy reply...I will try it and get back to you later.😄
@nooraliikhwan8139 1 year ago
@@BiInsightsInc After everything executed, I got another error:
c:\Users\User\Desktop\ETL\etl.py:28: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
importing rows 0 to 606... for table DimProduct
Data load error: (psycopg2.OperationalError) connection to server at "LAPTOP-3C9TIKCE" (fe80::d8c0:aa56:7702:d5b5), port 5432 failed: FATAL: no pg_hba.conf entry for host "fe80::d8c0:aa56:7702:d5b5%19", user "etl", database "AdventureWorks", no encryption
@BiInsightsInc 1 year ago
@@nooraliikhwan8139 You need to edit your pg_hba.conf file on the machine. On Windows it is located in the following directory: C:\Program Files\PostgreSQL\14\data. Add the entries below, or whichever ones are missing on your PC.
# TYPE  DATABASE  USER  ADDRESS  METHOD
# "local" is for Unix domain socket connections only
local   all       all                 md5
# IPv4 local connections:
host    all       all   127.0.0.1/32  trust
host    all       all   0.0.0.0/0     trust
# IPv6 local connections:
host    all       all   ::1/128       trust
host    all       all   0.0.0.0/0     trust
@Divine_Serpent_Geh 1 year ago
Hey, great video. I was wondering if this would be a good guide for creating a basic pipeline project for job interviews, like Data Analysis/Engineering? Thank you.
@BiInsightsInc 1 year ago
Yes, this can be a good example of a basic pipeline. However, I'd advise going with the following video, as it presents a complete picture of ETL (Extract, Transform and Load). ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@SMCGPRA 1 year ago
Sometimes an int will auto-convert to a float in pandas, which causes errors.
@BiInsightsInc 1 year ago
Please share an example of this scenario.
@SMCGPRA 1 year ago
@@BiInsightsInc Reading data from DB2 and inserting into Postgres: if a pandas df holds the result from DB2 in the middle before inserting into Postgres, an int column in DB2 is auto-converted to float.
@BiInsightsInc 1 year ago
@@SMCGPRA Yes, pandas can convert numbers to float. You can convert them back in your transformation layer. So what error does it cause?
@SMCGPRA 1 year ago
@@BiInsightsInc Casting back to int solves it, but it's an unnecessary step.
@BiInsightsInc 1 year ago
You can declare data types for each column and provide them to pandas, and it will adhere to your data types.
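For example, a small sketch of casting a promoted column back to integer with an explicit dtype map (the column names are made-up illustrations, not the video's schema):

```python
import pandas as pd

# pandas promotes integer columns to float in some paths (e.g. when NaNs
# appear during the read); cast them back with an explicit dtype map before
# loading into the target database.
df = pd.DataFrame({"ProductKey": [1.0, 2.0, 3.0], "Price": [9.99, 19.99, 5.0]})

dtype_map = {"ProductKey": "int64"}  # declare the intended type per column
df = df.astype(dtype_map)

print(df.dtypes["ProductKey"])
```

Note that astype to int64 fails if the column still contains NaN; in that case the nullable "Int64" dtype is an option.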
@DhanunjayaSrisailamTTT 2 years ago
Followed the tutorial step by step and I am facing the following error. Even Googling couldn't help. Please help me. Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
@BiInsightsInc 2 years ago
Hi Dan, it looks like you're not able to connect to SQL Server. Make sure SQL Server is running and you're able to connect to it via SQL Server authentication. Also, you need a SQL Server driver installed: www.microsoft.com/en-us/download/details.aspx?id=36434 I'd suggest testing your connection first to make sure you're able to connect.
@DhanunjayaSrisailamTTT 2 years ago
@@BiInsightsInc I am very new to SQL Server; I am trying to learn ETL with a MySQL background. Please make a playlist on ETL and ELT.
@BiInsightsInc 2 years ago
@@DhanunjayaSrisailamTTT Will try and cover ETL with modern tools. If you want to set up SQL Server and Postgres, then I have covered the installation here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-e5mvoKuV3xs.html&t ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-fjYiWXHI7Mo.html
@davidcorona644 2 years ago
Extracting and Loading are the easiest parts. There's no transforming data in this video.
@BiInsightsInc 2 years ago
The focus was the extract and load (the EL of ELT). I covered the whole process in the ETL automation video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@Daily_Dose_010 1 year ago
Where is the transform part?
@BiInsightsInc 1 year ago
I have done a follow-up to this with transformations in the following video. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@HeroicKhalid 1 year ago
I still don't see the benefit. Why build it like this if we can extract from the source directly?
@BiInsightsInc 1 year ago
I think you need an introduction to ETL and/or Data Engineering and why we are needed in the world of data. I'd say pick up a book on either subject and you will see the benefits. Here is an intro to data engineering. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-fwkLcp8dbic.html
@anthoniusadi5754 2 years ago
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
@BiInsightsInc 2 years ago
Hi Anthonius, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a firewall rule to allow connections to SQL Server on port 1433. Hope this helps.
@isbakhullail6693 2 years ago
@@BiInsightsInc How can we make sure SQL Server Native Client is installed on our machine?
@BiInsightsInc 2 years ago
@@isbakhullail6693 Here is a Stack Overflow link that shows how to determine if the SQL Server client is installed. stackoverflow.com/questions/10499643/check-if-sql-server-client-is-installed
@isbakhullail6693 2 years ago
@@BiInsightsInc thanks sir
@isbakhullail6693 2 years ago
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)') Error while extracting data: local variable 'src_conn' referenced before assignment
@djamier9334 2 years ago
Please do it with Airflow, sir.
@BiInsightsInc 2 years ago
I have done a video on this topic with Airflow. Feel free to check it out! How to build and automate your Python ETL pipeline with Airflow | Data pipeline | Python ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@rangarajann1781 1 year ago
This is just EL.
@BiInsightsInc 1 year ago
The focus was the extract and load (the EL of ELT). I covered the whole process in the ETL automation video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eZfD6x9FJ4E.html
@matheusbispo2428 1 year ago
holy shit
@Anroth98 2 years ago
Thank you!
@calvinallen8425 11 months ago
I'm having a new problem: Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 18 for SQL Server]Named Pipes Provider: Could not open a connection to SQL Server [67]. (67) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 18 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to CALVIN/SQLEXPRESS. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (67)') What do I do?
@BiInsightsInc 11 months ago
I have covered the common SQL Server connection issues in this video. With so many questions on this topic, I had to cover it separately. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zdezE6TWSQQ.html&t
@MrPrateek008 1 year ago
Any suggestions or available code that can convert compatible data types from MySQL to Postgres? Please respond.
@BiInsightsInc 1 year ago
Yes, there are various tools out there that will convert MySQL to Postgres syntax. There is an online tool that lets you select the source and target dialects, and there is also the mysql2postgres tool. If you're dealing with table DDL and want to convert MySQL data types to Postgres, you can write a simple script to swap the data types. www.sqlines.com/online github.com/maxlapshin/mysql2postgres
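A minimal sketch of such a type-swap script (the mapping entries are illustrative assumptions, not a complete MySQL-to-Postgres conversion):

```python
import re

# Illustrative (incomplete) mapping of common MySQL column types to their
# Postgres equivalents; extend it for the types in your own schema.
TYPE_MAP = {
    r"\bTINYINT\(1\)": "BOOLEAN",
    r"\bDATETIME\b": "TIMESTAMP",
    r"\bDOUBLE\b": "DOUBLE PRECISION",
    r"\bAUTO_INCREMENT\b": "",  # Postgres uses SERIAL/IDENTITY instead
}

def mysql_ddl_to_postgres(ddl: str) -> str:
    """Swap MySQL data types in a DDL fragment for Postgres ones."""
    for pattern, replacement in TYPE_MAP.items():
        ddl = re.sub(pattern, replacement, ddl, flags=re.IGNORECASE)
    return ddl.strip()

print(mysql_ddl_to_postgres("created_at DATETIME"))
```

A real migration also has to handle backtick quoting, character sets, and engine clauses, which is why the dedicated tools above are usually the safer route.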
@SigdelSanjog 1 year ago
Helpful. Thank you for this video
@trieutruong2410 2 years ago
Thank you so much for the insight into the process
@asimkhas3663 2 years ago
thanks 🙂
@skipa9906 2 years ago
importing rows 0 to 10... for table data load error: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")... (I am using SQL Server)
@BiInsightsInc 2 years ago
You are selecting from "sqlite_master", which is not a system catalog in SQL Server. SQL Server has a "sys" schema, and the view that stores table information is called "tables". Please refer to the repo and copy the script from there. github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
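To illustrate the difference, a small sketch of the two catalog queries side by side (the is_ms_shipped filter is just one common way to exclude system objects, not something from the video):

```python
# SQLite keeps its catalog in the sqlite_master table; SQL Server exposes
# the equivalent information through the sys.tables catalog view.
sqlite_query = "SELECT name FROM sqlite_master WHERE type='table';"
sqlserver_query = "SELECT name FROM sys.tables WHERE is_ms_shipped = 0;"

print(sqlserver_query)
```

Running the SQLite form against SQL Server produces exactly the "Invalid object name 'sqlite_master'" error above.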
@skipa9906 2 years ago
@@BiInsightsInc thank you
@skipa9906 2 years ago
@@BiInsightsInc These are my configs for loading to SQL Server, but I'm still getting the error:
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;')
        con = engine.connnect()
        print(f'importing rows {rows_imported} to {rows_imported+len(df)}... for table {tbl}')
        df.to_sql(f'sys_{tbl}', con, if_exists='replace', index=False)
        rows_imported += len(df)
        print("Data imported succesfully.")
    except Exception as e:
        print("data load error: " + str(e))

try:
    extract()
except Exception as e:
    print("Error while extracting data: " + str)
@BiInsightsInc 2 years ago
@@skipa9906 Your connection details do not seem to be complete. If you run only the following line:
create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;')
it will throw an error. Here is what I used to successfully connect and persist data to SQL Server:
# user and password
pwd = 'demopass'
uid = 'etl'
# sql db details
dr = "SQL Server Native Client 11.0"
srvr = "localhost\SQLEXPRESS"
db = "AdventureWorksDW2019"
engine = create_engine(f"mssql+pyodbc://{uid}:{pwd}@{srvr}:1433/{db}?driver={dr}")
df.to_sql("customer_retention", engine, if_exists='replace', index=False)
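As a related sketch: ODBC driver names contain spaces, so when building the mssql+pyodbc URL yourself it can help to URL-encode the driver name explicitly (the credentials, server, and driver below are assumed placeholders, not the video's values):

```python
from urllib.parse import quote_plus

# Build a SQLAlchemy connection URL for SQL Server by hand. The driver name
# contains spaces, so quote_plus() encodes it for the URL query string.
uid, pwd = "etl", "demopass"
server, db = r"localhost\SQLEXPRESS", "AdventureWorksDW2019"
driver = quote_plus("ODBC Driver 17 for SQL Server")

url = f"mssql+pyodbc://{uid}:{pwd}@{server}:1433/{db}?driver={driver}"
print(url)
```

The resulting string can then be passed to create_engine(url) once the matching ODBC driver is installed.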
@skipa9906 2 years ago
@@BiInsightsInc Thank you, I found a solution similar to yours. Great videos, I'm enjoying them.