Good tutorial, but the audio fades in and out. AWS Glue has been updated enough to make some of this information out of date. I would update it with the latest UI and fix the audio issues. Thank you.
Sir, is there any way we can set up a trigger between S3 and a Glue job? What I mean is, whenever a new file is uploaded to S3, a trigger should fire and run the Glue job, and the same for the crawler. So whenever a new file is uploaded to S3, it activates a trigger for both the crawler and the job. Thank you
You can do it. Configure an event notification on the S3 bucket that fires on put and post events. When the event is raised, invoke a Lambda function. In the Lambda function, use the Python Boto3 API to start the Glue job and crawler.
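For example, a minimal Lambda handler along these lines should work as a starting point (the job and crawler names below are placeholders, not from the video):

import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Invoked by the S3 put/post event notification.
    # Start the crawler so the catalog picks up the new file
    # (it runs asynchronously), then start the Glue job.
    glue.start_crawler(Name='my-crawler')
    response = glue.start_job_run(JobName='my-glue-job')
    return response['JobRunId']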
Apologies for the late response; I was on my summer break. There is no concept of global variables, but jobs can maintain state between them within a workflow - here is a video about it - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-G6d6-abiQno.html Hope it helps,
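One common way jobs share state within a workflow is through workflow run properties. A rough sketch (assuming the job runs as part of a workflow, so Glue passes WORKFLOW_NAME and WORKFLOW_RUN_ID as job arguments; the property key and value below are placeholders):

import sys
import boto3
from awsglue.utils import getResolvedOptions

# Glue supplies these arguments when the job runs inside a workflow.
args = getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])

glue = boto3.client('glue')

# Read the run properties set by earlier jobs in this workflow run.
props = glue.get_workflow_run_properties(
    Name=args['WORKFLOW_NAME'],
    RunId=args['WORKFLOW_RUN_ID']
)['RunProperties']

# Update a property for downstream jobs in the same run to read.
props['last_processed_date'] = '2021-01-01'
glue.put_workflow_run_properties(
    Name=args['WORKFLOW_NAME'],
    RunId=args['WORKFLOW_RUN_ID'],
    RunProperties=props
)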
Thanks for sharing your knowledge... I am not sure why we should use Glue Workflow instead of Step Functions... we have better control in Step Functions... can you please advise?
You raised a very good question. The simple answer is: use Glue Workflow only when you are orchestrating Glue jobs and crawlers. If you need to orchestrate other AWS services, Step Functions is better suited. I personally believe that, over time, Step Functions will become the main orchestration service for Glue as well.
You are doing great work. Please keep making videos on Glue. Your content is the best. Can you make a video on reading from RDS over a secure SSL connection using Glue?
Thanks for sharing your knowledge. Can you create a video on reading data from S3 and writing to a database, where we need to handle bad records during the read, insert only the good records into the RDS table, and write the bad records to an S3 location?
@@AWSTutorialsOnline If a record doesn't match the schema - I mean the data type is int, so values like 1, 2, 3 are expected, but sometimes they come in as "four", "five". I will share an example link with you.
Basically, whenever a corrupt record is found I want to write it to an S3 path, and the normal records I want to write to the database. I don't want my Glue job to stop when a corrupt record is found; it must keep running.
I need to see some examples of the corrupt data in order to understand how to check for it. But once you know whether a record is corrupt or not, you can use the dynamic frame write methods to write it to an S3 bucket or a database.
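As a rough illustration only: if the corrupt values are simply non-numeric strings in an int column, something like the sketch below could work. It assumes the CSV is read with all columns as strings, the int column is called 'quantity', glueContext is the usual one from the Glue job boilerplate, and the database, table and path names are placeholders.

# Split records on whether 'quantity' parses as an int, then write
# good rows to the catalogued RDS table and bad rows to S3.
source = glueContext.create_dynamic_frame.from_catalog(
    database='my_db', table_name='my_source_table')

def is_good(rec):
    try:
        int(rec['quantity'])
        return True
    except (ValueError, TypeError):
        return False

good = source.filter(f=is_good)
bad = source.filter(f=lambda rec: not is_good(rec))

glueContext.write_dynamic_frame.from_catalog(
    frame=good, database='my_db', table_name='my_rds_table')
glueContext.write_dynamic_frame.from_options(
    frame=bad, connection_type='s3',
    connection_options={'path': 's3://my-bucket/bad-records/'},
    format='csv')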
Hi, is it possible to move an S3 file (CSV) to a "processed" S3 folder after it has been imported into an RDS MySQL table by a Glue job? Great content as always.
Nice and clear explanation. I have a query here: how can we run one workflow after another (not jobs/crawlers), i.e. one workflow for the dimensions and another for the facts? Once the dimensions are loaded, it should start the workflow for the facts.
Nested workflows are not available. The best approach is to run a job (using Python Shell) at the end of the dimension workflow which simply starts the fact workflow. You can also use other mechanisms such as Lambda-based orchestration logic or Step Functions, but that will be a little more complicated, because between the dimension and fact workflows you need to make an API call to check that the dimension workflow finished successfully before you start the fact workflow. So the first approach I mentioned is probably the best way.
@@AWSTutorialsOnline Thanks for your time, I really appreciate it. You answered my query and I got an idea of what to do. Let me try creating a dedicated job that calls the fact workflow at the end of the dimension workflow using a Python script.
Hi @@AWSTutorialsOnline, I tried some blogs and Google, but I can't find code to start an AWS Glue workflow from a Python Shell job. Could you share a blog or Git repo where I can find some info on executing a workflow from Python? Thanks in advance.
@@venkateshanganesan2606 Hi, basically you need to use the boto3 Python SDK in a Python Shell based job. You can google plenty of examples for that; if not, let me know. In this job, you use the Glue API to start the workflow. The API for this method is here - docs.aws.amazon.com/glue/latest/dg/aws-glue-api-workflow.html#aws-glue-api-workflow-StartWorkflowRun Hope it helps. Otherwise, let me know.
@@AWSTutorialsOnline Thanks a lot, it works as you suggested. I used the piece of code below at the end of my dimension job to invoke the fact workflow. I really appreciate that you're sharing your knowledge so generously.

import boto3

glueClient = boto3.client(
    service_name='glue',
    region_name='eu-west-1',
    aws_access_key_id='access_key',
    aws_secret_access_key='secret_access_key'
)
response = glueClient.start_workflow_run(Name='wfl_load_fact')

Thanks again for sharing your knowledge.
Hello Sir, thanks for the wonderful session. I have a quick question: I was able to create 2 different data loads in the same Glue job and it successfully loaded both targets. But I would like to know how we can configure the target load plan (similar to Informatica) in an AWS Glue Studio job?