Thank you, Darshil Parmar! Please note that you deployed the Lambda function at 39:00 in the video, but it is not mentioned explicitly in your explanation. If it is not deployed, it will only run the default code, which will still succeed with the hello-world print.
Thanks for the solution! I was stuck here for a long time 😅 Darshil Parmar, thanks so much for the video! Hopefully this solution can be pinned for others to refer to :)
What, in your opinion, would be the best resource to understand Lambda in depth? I need help with that. I am working on a big data project, but this was not my domain; I am learning new things and need to learn faster. Any leads would be helpful. Thanks in advance. Also, please keep producing such awesome content. Thanks a lot!!
Hi Darshil, great video as always, and yes, I did click on like 😀 Can you please make a video on how to create a project in a dev environment and then switch to a production environment in AWS? Basically, how to manage the code lifecycle in AWS from dev to production. Or maybe you can point to a resource. Thanks!
For people watching this tutorial now: AWS Data Wrangler has been renamed to AWS SDK for pandas. The name has changed, but the core functionality remains the same.
@@vishnuvardhan9082 Hi, when I changed the layer to AWSSDKPandas and modified the code, I got the same error: { "errorMessage": "Unable to import module 'lambda_function': No module named 'AWSSDKPandas'", "errorType": "Runtime.ImportModuleError", "stackTrace": [] }
Thank you, Darshil. For those facing issues:
1) Keep "import awswrangler as wr" in the code. The Python module is still named awswrangler even after the rebrand to AWS SDK for pandas, so do not rename the import to awssdkpandas (that is what causes the Runtime.ImportModuleError some people are seeing). The rest of the code remains the same.
2) Layer: replace AWSDataWrangler-Python3.8 with AWSSDKPandas-Python3.8 version 10.
3) Create the db_youtube_cleaned database using Glue or Athena before running the code.
4) For the "Task timed out" issue, increase the memory along with the timeout, e.g. timeout = 5 min, memory = 512 MB.
Hope this helps :) Tip: please go through the comments if you are stuck; you will almost certainly find a solution.
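Putting the fixes above together, here is a minimal sketch of what the cleansing Lambda looks like with the AWSSDKPandas layer attached. It follows the tutorial's general shape (read the raw category JSON, flatten the nested "items" array, write parquet to the cleansed bucket), but it is a sketch, not the exact code from the video; the environment variable names (s3_cleansed_layer, glue_catalog_db_name, glue_catalog_table_name, write_data_operation) are the ones mentioned in this thread. Note the import is still awswrangler despite the rebrand.

```python
import os
import urllib.parse


def parse_s3_event(event):
    """Extract the bucket name and object key from an S3 put-event record."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 URL-encodes keys in event notifications, so decode before use
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return bucket, key


def lambda_handler(event, context):
    # Lazy imports: the module still loads (and logs a clear stack trace)
    # even if the AWSSDKPandas layer is missing from the function config.
    import awswrangler as wr  # still 'awswrangler', even on the AWSSDKPandas layer
    import pandas as pd

    bucket, key = parse_s3_event(event)

    # Read the raw category-reference JSON and flatten the nested 'items' array
    df_raw = wr.s3.read_json(f"s3://{bucket}/{key}")
    df_items = pd.json_normalize(df_raw["items"])

    # Write parquet to the cleansed layer and register/update the Glue table.
    # The target database must already exist in the Glue catalog,
    # otherwise this raises EntityNotFoundException.
    return wr.s3.to_parquet(
        df=df_items,
        path=os.environ["s3_cleansed_layer"],
        dataset=True,
        database=os.environ["glue_catalog_db_name"],
        table=os.environ["glue_catalog_table_name"],
        mode=os.environ["write_data_operation"],
    )
```

Remember to click Deploy after pasting any version of this, or the default hello-world code keeps running.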
@snehakadam16 @Darshilparmar I'm facing this issue: "errorMessage": "Unable to import module 'lambda_function': No module named 'awssdkpandas'", "errorType": "Runtime.ImportModuleError", "stackTrace": []. Please help me out.
It worked! I think the database name should be "db_youtube_cleansed". Even awswrangler with the AWSSDKPandas-Python3.8 layer (version 10) and 256 MB of memory is working fine for me. Thank you. But as per the video, the database should get created automatically.
Thank you, Darshil, for this wonderful video. One thing I would like to point out that might help others following this tutorial: whenever you update your Lambda function, click Deploy first so you actually test your changes. In my case I wasn't getting any errors, and I later realized that the default hello-world code was still running.
Yes, I might have made a mistake while editing the video. I did click on Deploy, and a lot of people missed it. I will keep this in mind. Thank you for the feedback!
@@DarshilParmar While we're at it, could you please give a solution for the EntityNotFoundException that somebody else also pointed out? I'm getting the same error and haven't been able to resolve it. I tried creating the cleansed database in Glue manually, but it is still not working. Hope to get a reply. Thanks in advance :)
@@DarshilParmar Function logs:

START RequestId: 2b020a60-532e-4b33-9933-7cc87b5406cc Version: $LATEST
An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.
Error getting object youtube/raw_statistics_reference_data/CA_category_id.json from bucket de-on-youtube-raw-useast1dev. Make sure they exist and your bucket is in the same region as this function.
LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html
[ERROR] EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.
Just wanted to say thank you to Darshil Parmar for these projects. It's hard to find anything online that helps to this extent from end-to-end. This is great stuff! Cheers! :)
I am unable to proceed after clicking on Test. I'm getting the error "errorMessage": "'s3_cleansed_layer'", "errorType": "KeyError". Can anyone please tell me what the problem is?
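That KeyError: 's3_cleansed_layer' typically means the Lambda code reads an environment variable that was never set under Configuration > Environment variables (the step at roughly 39:35 in the video). A tiny helper like the sketch below (my addition, not part of the tutorial code) turns the bare KeyError into a self-explanatory message:

```python
import os


def get_required_env(name: str) -> str:
    """Read an environment variable, failing with a hint instead of a bare KeyError."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(
            f"Missing environment variable '{name}'. "
            "Set it under Lambda > Configuration > Environment variables."
        )
    return value
```

In the handler you would then write, for example, `path=get_required_env("s3_cleansed_layer")` instead of `os.environ["s3_cleansed_layer"]`.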
Great project documentation to try for yourself. One little thing to add would be a rough AWS cost estimate. That is definitely something I would be looking for if I were starting.
(edited) Important note on missing libraries: AWSDataWrangler-Python3.8 is no longer available. I replaced it with AWSSDKPandas-Python3.8 version 1.
Amazing job! I'm just starting to use AWS because I would like to become a cloud engineer, and this is just incredible. Thank you so much for your effort!!
Amazing, @darshil. It is clearly visible how much effort you have put into the slides, video recording, and storyboarding, and into covering all the small nuances and errors that could potentially be faced. Words can't express how valuable this is and how much information you are providing for the community. Really inspiring and motivating. Someone in another comment rightly said it is a pure gem on YouTube.
Hi Darshil! First of all, I would like to say thank you for this tutorial. I need to mention something: I was following each step, but AWS is now different, and some options are no longer available or look very different. I can't believe the AWS platform changed so much in just one year. My question is: will you update this tutorial in the future?
Dear Darshil, could you please let us know which architecture you used in the demo: Lambda architecture or Kappa architecture? I wanted to understand more from an architecture perspective. Please share your thoughts.
This was a great help. One question though: when you executed this project using the different AWS services (S3, Athena, Glue, etc.), what was the approximate cost after the full project execution? Thanks!
Most likely there won't be any charge if you are on the free tier, but even if they charge you, it will be at most $3-5. You can also raise a support ticket stating you were just trying to learn about the service, and they won't charge you.
prajwal, I am getting this error:

{
  "errorMessage": "Glue table does not exist in the catalog. Please pass the `path` argument to create it.",
  "errorType": "InvalidArgumentValue",
  "stackTrace": [
    "File \"/var/task/lambda_function.py\", line 40, in lambda_handler: raise e",
    "File \"/var/task/lambda_function.py\", line 27, in lambda_handler: wr_response = wr.s3.to_parquet(",
    "File \"/opt/python/awswrangler/_config.py\", line 735, in wrapper: return function(**args)",
    "File \"/opt/python/awswrangler/_utils.py\", line 178, in inner: return func(*args, **kwargs)",
    "File \"/opt/python/awswrangler/s3/_write_parquet.py\", line 719, in to_parquet: return strategy.write(",
    "File \"/opt/python/awswrangler/s3/_write.py\", line 313, in write: raise exceptions.InvalidArgumentValue("
  ]
}

Please help.
Thank you, Darshil, for this amazing video; it was very helpful. I just completed the whole project, plus did some extra work moving the data to Redshift with a Glue job, creating a connection and enabling a VPC endpoint. :)
Thank you so much, Darshil, for the video! I am having an issue when trying to create a crawler; I get the error: "The following crawler failed to create: <name of the crawler>. Here is the most recent error message: Account <account number> is denied access." I checked the IAM roles that were created, and deleted and recreated them, but I'm still receiving the same message. Would you have an idea what the issue could be?
Hello guys, you might be getting an error at the testing step because the DB name has not been changed in the environment variables. Please take care; he forgot to change the DB name. If you notice, in Athena the database name is db_youtube_cleaned, but it should be de_youtube_cleaned, which gives the "Entity not found" error in the final Lambda test.
@@geekyprogrammer4831 Facing the same issue. I see a parquet file being generated in the S3 bucket, but the Lambda function is timing out. Were you able to rectify it?
This is the first video I'm watching from your channel, sir. Even though I didn't understand much, I watched till the end; your explanations just kept me going. I'm a 2nd-year engineering student. I have done web development earlier and have basic knowledge of SQL. This data engineering field seems very interesting to dive into. Can you please guide me, sir, so that I can learn the topics and build some good projects using the necessary and relevant technologies?
Hey, great video. I wanted to ask whether I will be charged for using AWS Athena, because it mentioned additional charges for queries when I opened it. Thanks for the video.
Thank you so much for making this video!!! This would be 6-7th video of yours which I've added to my playlist. I request you to post more such project videos in different domains.
Hi Darshil, good video, and thank you very much; I learned a lot. In your subsequent videos, please try to zoom in more often so we can see what you're doing on the screen. Thanks.
Great stuff, you rock! The entire video was very intuitive, and I must say, without a shadow of a doubt, that all the nitty-gritty details of the data are discussed in it. Heading for the second part now. Worth a ⌚
Hi Darshil, great video. Could you please let me know how to save to the table in db_youtube_cleaned? I am getting the error: "An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found." Thanks in advance :)
Hey Darshil, everything is fine except I'm getting an error: "errorMessage": "An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.", "errorType": "EntityNotFoundException". How do I resolve this?
Yeah, I got the same issue. It has to do with the Lambda Configuration tab > Environment variables we input @39:35. But I'm not entirely sure where the Value we input for each Key came from or what it is associated with.
@@naveenkonda395 Just create the database in Athena by typing the SQL query: CREATE DATABASE db_youtube_cleaned. The database has to exist first; only then will the Lambda create the table and load the data into it.
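If you prefer doing this from code instead of the Athena console, the same database can be created through the Glue API. This is only a sketch under a few assumptions: boto3 with default credentials and region (boto3 is preinstalled in the Lambda runtime), and the helper names are mine, not from the video. The name check follows the Athena-friendly convention of lowercase letters, digits, and underscores.

```python
import re

# Athena-friendly Glue database names: lowercase letters, digits, underscores
GLUE_DB_NAME = re.compile(r"^[a-z0-9_]{1,255}$")


def is_valid_glue_db_name(name: str) -> bool:
    """Return True if the name is safe to use as a Glue/Athena database name."""
    return bool(GLUE_DB_NAME.match(name))


def ensure_glue_database(name: str) -> None:
    """Create the Glue database if it does not already exist (idempotent)."""
    import boto3  # preinstalled in the Lambda runtime

    if not is_valid_glue_db_name(name):
        raise ValueError(f"invalid Glue database name: {name!r}")
    glue = boto3.client("glue")
    try:
        glue.create_database(DatabaseInput={"Name": name})
    except glue.exceptions.AlreadyExistsException:
        pass  # already there; nothing to do
```

Calling `ensure_glue_database("db_youtube_cleaned")` once (with the exact name your glue_catalog_db_name environment variable uses) avoids the EntityNotFoundException entirely.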
This is one of the biggest mistakes I found, and it should be corrected: we were never told to create a new database, and then the path turns out to be different. It took me 3 hours to debug this.
If you are getting a runtime error when running the Lambda function even after 3 minutes, make sure to add "import pandas as pd". This solved the issue for me after AWS Data Wrangler changed to AWS SDK for pandas.
Hi Darshil, while running the Athena query I'm getting: "HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1]. This query ran against the "de_database_raw" database, unless qualified by the query."
@anusha kamath Solved it by:
- deleting the data in the S3 bucket youtube-cleaned-useast1-dev
- deleting the "db_youtube_cleaned" database in AWS Glue
- recreating the database in AWS Glue and naming it db_youtube_clean
- updating the "glue_catalog_db_name" environment variable to db_youtube_clean
- updating the "s3_cleansed_layer" environment variable in the Lambda function by adding a / at the end of the path
Then refresh everything, re-execute the Lambda function, and run the SQL query in Athena. It worked like magic; I don't know what was wrong. Sometimes you just delete, re-upload, and re-run :)
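One of the fixes above is adding a trailing / to the s3_cleansed_layer path. If you want the Lambda to tolerate either form, a tiny normalizer (my sketch, not part of the original code) removes that failure mode entirely:

```python
def normalize_s3_prefix(path: str) -> str:
    """Ensure an S3 prefix ends with exactly one '/' so writes land under it."""
    if not path.startswith("s3://"):
        raise ValueError(f"not an S3 path: {path!r}")
    # strip any number of trailing slashes, then add exactly one back
    return path.rstrip("/") + "/"
```

You would call it on the environment variable before passing it to the parquet write, e.g. `path=normalize_s3_prefix(os.environ["s3_cleansed_layer"])`.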
Hi Darshil, I always think of starting your project videos, but I always get stuck on whether the AWS cloud services will be charged, whether they're free, or whether there are any alternatives.
Hi Darshil, I'm confused about choosing a career between backend developer and data engineer; both seem interesting to me. I like coding more than SQL queries. Which would be better?
For those trying it now: 1. The AWSDataWrangler layer has been renamed to AWSSDKPandas (code-wise, the import stays awswrangler and the rest remains the same). 2. You need to have the Glue database created beforehand, otherwise it throws an error.
Hi Percy, while trying to add AWS layers, I only get 3 options: AppConfig Extension, Lambda Insights Extension, and Parameters and Secrets Lambda Extension. Not sure what I am missing. Please help.
I got "An error occurred (AccessDenied) when calling the PutObject operation: Access Denied" when I tried to upload the files to the S3 bucket. I tried to grant permission, but that also gave me another error: "You either don't have permissions to edit the bucket policy, or your bucket policy grants a level of public access that conflicts with your Block Public Access settings. To edit a bucket policy, you need the s3:PutBucketPolicy permission."
Thank you so much, Darshil, for this video. I have a question: is the Glue table created automatically? I get a timeout, and I think the problem is with that. Can you please provide more information, e.g. should I create a new crawler, or what should I run to create the cleaned table automatically?
I faced the same issue, and I found out that you need to create the cleaned catalog DB in Glue first; then the cleaned table will be created automatically. For the timeout, try increasing the memory along with the time. Hope it helps.
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'awswrangler'. For this error, add the layer AWSSDKPandas-Python38 version 19.
Thanks for this great content. I'm getting errors with the Lambda function:
1. The video is missing an indication to hit the Deploy button.
2. After adding the layers, increasing the timeout, and granting permission to the Lambda function, I still get this:

Test Event Name: lambdaTestEvent

Response:
{
  "errorMessage": "An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.",
  "errorType": "EntityNotFoundException",
  "stackTrace": [
    "File \"/var/task/lambda_function.py\", line 39, in lambda_handler: raise e",
    "File \"/var/task/lambda_function.py\", line 26, in lambda_handler: wr_response = wr.s3.to_parquet(",
    "File \"/opt/python/awswrangler/_config.py\", line 450, in wrapper: return function(**args)",
    "File \"/opt/python/awswrangler/s3/_write_parquet.py\", line 666, in to_parquet: catalog._create_parquet_table(",
    "File \"/opt/python/awswrangler/catalog/_create.py\", line 301, in _create_parquet_table: _create_table(",
    "File \"/opt/python/awswrangler/catalog/_create.py\", line 152, in _create_table: client_glue.create_table(**args)",
    "File \"/var/runtime/botocore/client.py\", line 391, in _api_call: return self._make_api_call(operation_name, kwargs)",
    "File \"/var/runtime/botocore/client.py\", line 719, in _make_api_call: raise error_class(parsed_response, operation_name)"
  ]
}

Function logs:
START RequestId: e124c5fb-a734-417c-a227-f1ac36b93a11 Version: $LATEST
An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.
Error getting object youtube/raw_statistics_reference_data/US_category_id.json from bucket de-on-youtube-raw-useast1-7011-dev. Make sure they exist and your bucket is in the same region as this function.
[ERROR] EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 39, in lambda_handler
    raise e
  File "/var/task/lambda_function.py", line 26, in lambda_handler
    wr_response = wr.s3.to_parquet(
  File "/opt/python/awswrangler/_config.py", line 450, in wrapper
    return function(**args)
  File "/opt/python/awswrangler/s3/_write_parquet.py", line 666, in to_parquet
    catalog._create_parquet_table(  # pylint: disable=protected-access
  File "/opt/python/awswrangler/catalog/_create.py", line 301, in _create_parquet_table
    _create_table(
  File "/opt/python/awswrangler/catalog/_create.py", line 152, in _create_table
    client_glue.create_table(**args)
  File "/var/runtime/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
END RequestId: e124c5fb-a734-417c-a227-f1ac36b93a11
REPORT RequestId: e124c5fb-a734-417c-a227-f1ac36b93a11 Duration: 8167.89 ms Billed Duration: 8168 ms Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 3626.32 ms

Can you please take a look at it?
Hey, thanks for sharing this. I made a mistake in editing and missed that part. For the error, I'd say just create the table in Athena directly. Also, you can join the Discord channel for future queries.
@@DarshilParmar Thank you, it worked! As a reference for those getting the same error: the video is missing the creation of a database. To fix it, go to Athena and run the SQL query "CREATE DATABASE db_youtube_cleaned" (use the exact name from the error message, i.e. the one your Lambda's glue_catalog_db_name environment variable points to). The Lambda function should work fine after that.
@@guillermojastrzebski954 Massive thanks for this!! I'd been searching the internet for a couple of hours trying to rectify this. I figured someone must have run into the same issue, so I checked the comments section. Saved me a lot of trouble!
Great video, Darshil. I sometimes fear these errors, because there are a lot of services, and each service may throw its own error at any time. I find it difficult to debug them. Could you please tell me how I can better understand and resolve these errors? Thanks in advance 🙌
Hi Darshil, I have followed your end-to-end data engineering project on COVID data analysis; it helped me learn about the different services on AWS and what exactly a data engineer does. Can you please make an end-to-end data engineering project using MS Azure and Databricks? Thanks again 👍
Hi Darshil, thank you so much for this amazing video. I started following your channel recently, and I like all the videos. I am stuck on the Lambda function. Since AWSDataWrangler has been updated to AWSSDKPandas-Python312, I added AWSSDKPandas-Python312 as a layer and also updated the code to 'import awssdkpandas as wr', but I still get an error saying "Unable to import module 'lambda_function': No module named 'awssdkpandas'". Can you please explain this?
Hi Darshil, amazing content. I have a question: I am not able to find the AWSDataWrangler layer among the options. Could you provide a link for downloading it so that I can add it as a custom layer?
@Darshil Parmar In part 2 of this video, the "region=us/" folder is not created for me; only the ca and gb folders are created when running the ETL job. PS: I added predicate_pushdown = "region in ('ca','gb','us')" as well, but the folder for the "us" region is still missing. Can you please take a look at this?
I can't find the Lambda function's AWS Data Wrangler layer option, and I can't find the right ARN for us-east-1 either. You did it at this timestamp: 45:08. Edit: AWS Data Wrangler is now called AWS SDK for pandas.
Hi Darshil, thank you for this beautiful and easy-to-understand walkthrough. While adding the Lambda layer, I can't find AWSDataWrangler for the Virginia region. I tried to deploy the existing AWSDataWrangler layer and then add it as a custom layer. I succeeded in that, but my data is not getting cleaned and stored in the cleansed S3 bucket.
Thanks, Darshil, for this detailed project video. I just needed to know: on the free trial, do we need to pay charges for using AWS Athena? When I opened the Athena console, a message popped up regarding usage charges. Kindly confirm.
Absolute gem! Thank you for making this video. Learned a lot today. And if possible, Although I know you have your job, please try to make more of such content in future. Lots of love💛💛
@DarshilParmar I tried using the Flatten transform in the ETL job, but it didn't work. Is it because the JSON contains an array? Can you suggest in a few words how to proceed with the ETL so that I can work on it?
Thank you, Darshil, for this awesome video! I have issues viewing the cleaned data in Athena. I got "HIVE_UNKNOWN_ERROR: Path missing in file system location: [my path]. This query ran against the "[cleaned db]" database, unless qualified by the query." But I checked that the path name is correct, and I can access the parquet file locally. Can anyone help with this issue?
Hi @DarshilParmar, my Lambda function is timing out. I increased the timeout to 15 minutes, which is the maximum, but it still doesn't complete, and the function throws an error. I followed the exact steps shown in the video. Can you or someone suggest where I am going wrong?
Hello, thanks for putting out amazing content! Will we incur any charges on AWS for this course? For example, if I run a query on Athena, how much will we get charged, if anything?
You can set a billing alarm and get notified. Also, if you do get charged, raise a ticket with AWS and tell them you were just using the services to learn, and they will not charge you.
Hi Darshil, thank you for putting so much effort into creating these tutorial videos; I'm learning a lot from watching them. But now I'm stuck on the Lambda portion. I'm not getting errors when testing the Lambda function, and this is the response: { "statusCode": 200, "body": "\"Hello from Lambda!\"" }. I checked the cleansed folder in S3, but no file has been written. I can't figure out how to deal with this problem.
After deploying: if you are getting a runtime error when running the Lambda function even after 3 minutes, make sure to add "import pandas as pd". This solved the issue for me after AWS Data Wrangler changed to AWS SDK for pandas.
If this were a production project, we might have used Glue for everything, but this is a LEARNING project, and the goal was to include different services and connect them together to show how they work.
Hi Darshil, thank you for the amazing work. Right now AWS does not have the Data Wrangler Lambda layer, so I am not able to execute the function. Is there any other way to execute it?
Same here, I am also unable to execute or test the Lambda function. After the test it shows { "statusCode": 200, "body": "\"Hello from Lambda!\"" } in the response. Any suggestions?
"errorMessage": "Unable to import module 'lambda_function': No module named 'awssdkpandas'", "errorType": "Runtime.ImportModuleError", "stackTrace": [] getting this error even after adding layer "AWSSDKPandas-Python38" pls provide solution
I am getting errors while querying the data with id 1: "SYNTAX_ERROR: line 1:82: Column '1' cannot be resolved. This query ran against the "db_youtube_cleaned" database, unless qualified by the query." Please post the error message on