Ajay, I need some help please! I've been trying all day with no luck. I cataloged a Parquet file that was saved by a Spark job. Now I'm running another job to insert the data from that Parquet file into an RDS MySQL database, and I need the rows inserted in the same order as the Parquet file to preserve the primary keys. I've tried several approaches, but the data is always inserted in a random order in the database table. Can you tell me what I can do?
Nice tutorial, Ajay. I have one question. I have a requirement to copy about 4 million records from one DynamoDB table (2017 version) to another table (2019 version), and I don't want downtime. Can you please suggest whether Glue will help in this use case? If yes, what things do I have to consider?
Awesome video!! I have a query: I want to push CSV data from S3 to Redshift tables. Can I somehow use the table schema created by the crawler to create the table in Redshift? In every tutorial the instructor first creates the table in Redshift by hand, then uses a crawler to create the schema in Glue, then pushes the data to Redshift. So what is the point of creating the schema with a crawler?
Hey Kishlaya, you should try this out. Just search whether the Glue Data Catalog can be used directly in Redshift. I do know that Redshift Spectrum can directly use the schema created by crawlers.
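For anyone else wondering: the Redshift Spectrum route mentioned above looks roughly like this. A minimal sketch, assuming an existing Glue database populated by a crawler; the schema name, Glue database name, role ARN, and table name here are all placeholders:

```sql
-- Map an external schema in Redshift onto a Glue Data Catalog database,
-- so the tables the crawler created become queryable without hand-writing DDL.
CREATE EXTERNAL SCHEMA spectrum_glue
FROM DATA CATALOG
DATABASE 'my_glue_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';

-- Query a crawled table directly through the external schema:
SELECT * FROM spectrum_glue.my_crawled_table LIMIT 10;
```

This queries the S3 data in place via Spectrum; to physically load it into a native Redshift table you would still run a COPY or INSERT … SELECT from the external table.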
Hello Ajay, I saw your LinkedIn post about the AWS Data Analytics certification. Please explain the detailed learning path you took to pass, and please make a video on this. It would be very helpful to anyone looking to pursue that exam.
Hello Ajay, your videos were very helpful. Can we get similar videos for AWS Lambda? Also, is it possible for you to put all your AWS videos (S3, Athena, Glue, Kinesis, Lambda, EMR) on Udemy so that we can buy them from you? Please share your thoughts on this.
Nice explanation of AWS Glue crawlers, which was very helpful, thanks for that. If a column in the middle is deleted in the newest file and the crawler updates the schema, then for the earlier files the deleted column still exists and the data gets shifted (I can see the data is disturbed). Is there any crawler configuration to validate the column names across the files in the S3 location?
Hi, nice explanation of AWS Glue crawlers, which was very helpful, thanks for that. I have some questions about the Glue crawler and Athena. First try: I put two different files in my S3 bucket, one a Stock table and the other an Employee table, and ran the Glue crawler. Two different tables were generated, but with empty data. Is that correct? Second try: with the same two files in the bucket, I ran the crawler with an exclude pattern of employee.csv. This time a single table was generated, but the data from both tables was merged into it. Is that correct, or have I done something wrong? Please let me know.
Hi Saurabh, you have to segregate the data into two different folders, one per table. And if a query returns no data, check whether the schema in the Glue Data Catalog matches the files.
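To illustrate the folder layout idea: a Glue crawler typically treats each S3 prefix (folder) as one table, so mixing unrelated CSVs in one folder gets them merged. A tiny sketch of the per-table key layout; the bucket and table names are made up:

```python
# Sketch: give each table its own S3 prefix so a Glue crawler infers
# one table per folder, instead of merging stock and employee CSVs
# into a single table. Bucket, table, and file names are illustrative.
BUCKET = "my-data-bucket"

def s3_key(table: str, filename: str) -> str:
    """Each table's files live under their own prefix: <table>/<file>."""
    return f"{table}/{filename}"

layout = {
    "stock": s3_key("stock", "stock.csv"),          # stock/stock.csv
    "employee": s3_key("employee", "employee.csv"), # employee/employee.csv
}
```

With this layout the crawler pointed at s3://my-data-bucket/ produces two tables, one per prefix, each with only its own files' rows.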
I uploaded a CSV into an S3 bucket. The crawler is creating the Data Catalog entry in Glue, but when I try to view the contents of the CSV file in Athena with a query, it shows blank: the columns are present but without the values.
Yes, same with me. With a single CSV file we can see the data, but when we crawl multiple files from the same folder it shows blank. Please let me know if you find the solution.
Hi Ajay, is there any way to automate this through CI/CD? I'd like to upload a bunch of crawler files, trigger the crawlers automatically, and then store the inferred schema in a local file system. Thanks in advance.
Hi Surya, you can use CI/CD to automate this. Also consider a scheduled Lambda function: you upload your files and then either trigger or schedule the processing. Hope this helps!
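A minimal sketch of the scheduled-Lambda idea: a handler that kicks off a Glue crawler when a schedule (e.g. an EventBridge rule) fires. The crawler name is a placeholder, and the Glue client is passed in as a parameter here so the logic can be exercised without AWS credentials; in a real Lambda you would build it with boto3.client("glue") and use the standard (event, context) signature:

```python
# Sketch: start a Glue crawler from a scheduled Lambda. The client is
# injected so this is testable offline; in Lambda, create it with
# boto3.client("glue"). Crawler name is illustrative.
CRAWLER_NAME = "my-schema-crawler"

def lambda_handler(event, context, glue_client):
    """Kick off the crawler; treat 'already running' as a no-op."""
    try:
        glue_client.start_crawler(Name=CRAWLER_NAME)
        return {"started": True, "crawler": CRAWLER_NAME}
    except glue_client.exceptions.CrawlerRunningException:
        # Another run is in progress; skip rather than fail the invocation.
        return {"started": False, "crawler": CRAWLER_NAME}
```

From there the crawler-inferred schema can be read back with the Glue get_table API and written wherever your pipeline needs it.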
No, there is a list that you can find in the AWS docs. Common types such as CSV, TSV, databases, logs, JSON, and Parquet are supported. You can also write a custom classifier, but that would not cover images. Try using Amazon Rekognition for images.
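For reference, a custom classifier can be registered through the Glue API like this. A sketch only: the classifier name, classification label, and Grok pattern are made up, and the client is injected so the call can be exercised offline (in practice, boto3.client("glue")):

```python
# Sketch: register a custom Grok classifier for a log format the
# built-in Glue classifiers do not recognize. Names and the pattern
# are illustrative; the client would normally be boto3.client("glue").
def create_log_classifier(glue_client):
    return glue_client.create_classifier(
        GrokClassifier={
            "Name": "my-app-logs",
            "Classification": "app-logs",
            "GrokPattern": "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}",
        }
    )
```

Once created, the classifier is attached to a crawler, and the crawler applies it before falling back to the built-in classifiers.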
Hi Ajay, I uploaded a CSV file to an S3 bucket and created a crawler. I can see the table in the database, but when I try to preview the data I don't see any rows in the table. Could you please let me know why I don't see the data?