
AWS Tutorials - Incremental Data Load from JDBC using AWS Glue Jobs 

AWS Tutorials
13K subscribers
12K views

Published: 21 Aug 2024

Comments: 22
@susilpadhy9553 · 1 year ago
Please make a video on how to handle incremental load using a timestamp column; that would be really helpful. Thanks in advance. I have watched so many of your videos and they really help.
@hitesh1907 · 10 months ago
Please create one.
@rajatpathak4499 · 1 year ago
Great tutorial, keep bringing us more videos on real-time scenarios. Could you cover a video on Glue workflows that includes a source, then a Lambda invocation which triggers a Glue job for cataloging, then another trigger for transformation, and after that an insert into a database which then triggers a Lambda for archiving?
@AWSTutorialsOnline · 1 year ago
Please check my video on the event-based pipeline. I have explained what you are talking about there.
@jnana1985 · 1 year ago
Is it only for inserting new records, or does it also work with updated and deleted records?
@canye1662 · 1 year ago
Awesome vid... 100%
@AWSTutorialsOnline · 1 year ago
Glad you enjoyed it.
@manishchaturvedi7908 · 7 months ago
Please add a video which leverages a timestamp in the source table to incrementally load data.
@federicocremer7677 · 1 year ago
Excellent tutorial and great explanation. Thank you, you got my sub! Just to be sure: if I have an "updated_at" field in my schema and, in my data source (let's say a JDBC - Postgres instance), rows are updated daily rather than new rows being inserted, will those updated rows be caught by the new job with bookmarks enabled? If so, do I have to add not only my "id" field but also my "updated_at" field to jobBookmarkKeys?
@AWSTutorialsOnline · 1 year ago
You can use any key(s) for the job bookmark as long as they meet certain requirements. Here are the rules: For each table, AWS Glue uses one or more columns as bookmark keys to determine new and processed data. The bookmark keys combine to form a single compound key. You can specify the columns to use as bookmark keys. If you don't specify bookmark keys, AWS Glue by default uses the primary key as the bookmark key, provided that it is sequentially increasing or decreasing (with no gaps). If user-defined bookmark keys are used, they must be strictly monotonically increasing or decreasing; gaps are permitted. AWS Glue doesn't support using case-sensitive columns as job bookmark keys.
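
A minimal Glue PySpark sketch of passing bookmark keys, assuming a catalog database "sales_db" and table "orders" (both hypothetical names, not from the video). The jobBookmarkKeys / jobBookmarkKeysSortOrder options and the transformation_ctx argument are how bookmark keys are supplied per source:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # required for bookmark state to be tracked

# Read only rows not yet covered by the bookmark; "id" and "updated_at"
# form a compound bookmark key (column names are illustrative).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",
    table_name="orders",
    additional_options={
        "jobBookmarkKeys": ["id", "updated_at"],
        "jobBookmarkKeysSortOrder": "asc",
    },
    transformation_ctx="orders_src",  # bookmark state is stored per transformation_ctx
)

job.commit()  # persists the new bookmark position after a successful run
```

Whether a column such as "updated_at" actually qualifies comes down to the rules above: user-defined compound keys must be strictly monotonically increasing or decreasing.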
@fredygerman_ · 10 months ago
Great video, but can you show an example where you connect to an external database using a JDBC connection, i.e. a database from Supabase?
@tcsanimesh · 1 year ago
Beautiful video!! Can you please add a use case for update and delete as well?
@AWSTutorialsOnline · 1 year ago
In a data lake, you generally do not perform updates and deletes; you only insert. But if you want CRUD operations, then you should be thinking about using Iceberg, Hudi or Delta Lake on S3.
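
As a rough illustration of the CRUD that an open table format enables, here is a PySpark sketch of an upsert into an Iceberg table on S3 via Spark SQL MERGE. The catalog, warehouse path, table and column names are all assumptions, and the session needs the Iceberg Spark runtime JARs available; this is not the method shown in the video:

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime is on the classpath; the catalog name,
# warehouse location and table below are placeholders.
spark = (
    SparkSession.builder
    .appName("iceberg-upsert-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-datalake-bucket/warehouse/")
    .getOrCreate()
)

# Stand-in for the incremental batch read from the JDBC source.
incremental_df = spark.createDataFrame(
    [(1, "Alice", "2024-08-21")],
    ["customer_id", "name", "updated_at"],
)
incremental_df.createOrReplaceTempView("staged_customers")

# MERGE updates rows whose key already exists and inserts brand-new ones.
spark.sql("""
    MERGE INTO glue_catalog.sales_db.customers AS t
    USING staged_customers AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```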
@gulo101 · 1 year ago
Great video, thank you! A couple of questions: I will be using the data I copy from the JDBC DB to S3 for staging, before it's moved to Snowflake. After I move it to Snowflake, is it safe to delete it from the S3 bucket without any negative impact on the bookmark progress? Also, is there any way to see what the current value of the bookmark is, or to manually change it in case of load issues? Thank you.
@user-on5zy2gc2u · 1 year ago
Great content. I'm facing an issue while loading data from MS SQL into Redshift using Glue. The scenario: I have multiple tables about customers, with customer id as the primary key. When a phone number or address related to a customer id is updated, I have to write it into Redshift as a new row, and if any new entry comes in, it should also be inserted as a new row. Is there any solution for this?
@AWSTutorialsOnline · 1 year ago
You can create a job which filters data from RDS based on the last run datetime and picks records whose created/modified date is greater than the last run datetime. Then insert the picked records into the target database.
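
A minimal sketch of that pattern, assuming the last run timestamp is kept in an SSM parameter and the JDBC URL, table and column names are placeholders (all assumptions, not from the video):

```python
from datetime import datetime, timezone

import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental-by-timestamp").getOrCreate()
ssm = boto3.client("ssm")

# Last successful run time, stored outside the job (SSM parameter is an assumption).
last_run = ssm.get_parameter(Name="/etl/customers/last_run")["Parameter"]["Value"]

# Push the filter down to the JDBC source by reading a bounded query.
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://my-host:1433;databaseName=crm")  # placeholder URL
    .option("query", f"SELECT * FROM dbo.customers WHERE modified_at > '{last_run}'")
    .option("user", "etl_user")
    .option("password", "********")
    .load()
)

# ... transform and write `src` to the target (e.g. Redshift) here ...

# Record the new watermark only after the load succeeds.
now = datetime.now(timezone.utc).isoformat()
ssm.put_parameter(Name="/etl/customers/last_run", Value=now, Type="String", Overwrite=True)
```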
@brockador_93 · 6 months ago
Hello, how are you? One question: I created a bookmarked job based on the primary key of the source table, and when an already-processed record was updated, it was not changed in the destination file. How can I make the job understand that this record changed? For example, the table key is the ID field, and the changed field was the "name" field.
@basavapn6487 · 4 months ago
Can you please make a video on delta files to achieve SCD type 1? In this scenario it was a full file, but I want to process incremental files.
@shrishark · 8 months ago
What is the best approach to read a huge volume of data from any on-prem SQL DB, identify sensitive data, replace it with fake data, and push it to an AWS S3 bucket for specific criteria?
@victoriwuoha3081 · 6 months ago
Redact the data using KMS during processing, before storage.
@helovesdata8483 · 1 year ago
I can't get my JDBC data source to connect with Glue. The only error I get is "test connection failed".
@AWSTutorialsOnline · 1 year ago
The test connection fails for many reasons: 1) not using the right VPC, subnet and security group associated with the JDBC source; 2) the security group is not configured with the right rules; 3) not having VPC endpoints (S3 Gateway and Glue Interface) in the VPC of the JDBC source.
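
A small boto3 sketch, assuming a Glue connection named "my-jdbc-connection" (hypothetical), that surfaces the pieces the reply mentions so they can be checked against the database's VPC setup:

```python
import boto3

glue = boto3.client("glue")
ec2 = boto3.client("ec2")

# Which subnet and security groups is the Glue connection actually using?
conn = glue.get_connection(Name="my-jdbc-connection")["Connection"]
reqs = conn.get("PhysicalConnectionRequirements", {})
subnet_id = reqs.get("SubnetId")
sg_ids = reqs.get("SecurityGroupIdList", [])
print("Subnet:", subnet_id, "Security groups:", sg_ids)

# The security group needs a self-referencing rule allowing all TCP traffic.
for sg in ec2.describe_security_groups(GroupIds=sg_ids)["SecurityGroups"]:
    print(sg["GroupId"], sg["IpPermissions"])

# The JDBC source's VPC needs an S3 gateway endpoint and a Glue interface endpoint.
vpc_id = ec2.describe_subnets(SubnetIds=[subnet_id])["Subnets"][0]["VpcId"]
endpoints = ec2.describe_vpc_endpoints(
    Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
)["VpcEndpoints"]
print([e["ServiceName"] for e in endpoints])
```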