Hello sir. I tried to import 400k rows into the BigQuery sandbox but ended up with a lot of errors. Is it possible to import that much data? Please, can anyone help? It's urgent (interview assignment).
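For anyone else stuck on this, here is a minimal sketch of a CSV load using the BigQuery Python client. The file name, dataset, and table are placeholders, not from the question, and max_bad_records is one way to get past a handful of malformed rows instead of failing the whole job:

```python
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # infer the schema from the data
    max_bad_records=50,    # tolerate a few bad rows before failing
)

# "data.csv" and "my_dataset.my_table" are placeholder names.
with open("data.csv", "rb") as f:
    job = client.load_table_from_file(f, "my_dataset.my_table", job_config=job_config)
job.result()  # waits for the load and raises on error
print(job.output_rows, "rows loaded")
```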
This is nice but not that impressive. The table is obviously stored using columnstore compression, so you only need to read the columns in the select list, and those are typically grouped in blocks of 1M rows or more. The block headers keep row-count values, so you are not reading every row, just the block headers of a single column. If your query forced a scan of all rows in the block, by asking for the column to be combined with other fields in the same row or in other tables before it could be filtered, you would no longer be in the columnstore sweet spot, and the difference in query speed would be more striking. Still good though, as that is a common use case.
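To make the column-pruning point concrete, here is a hypothetical comparison using the BigQuery Python client as a stand-in columnar engine; the dataset, table, and column names are made up, and bytes scanned is used as a rough proxy for how many column blocks are actually read:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Scans only the `amount` column's blocks: cheap in a columnar store.
fast = client.query("SELECT SUM(amount) FROM my_dataset.sales")

# Touching every column forces whole rows to be reassembled before the
# filter, which loses most of the columnar advantage.
slow = client.query("SELECT * FROM my_dataset.sales WHERE amount > 100")

for q in (fast, slow):
    q.result()
    print(q.total_bytes_processed)  # bytes scanned reflects columns read
```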
What should I do when I want to overwrite 100 million rows into a new table in minutes? df.write.mode("overwrite").saveAsTable("FINAL") — could you please help with this?
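A rough sketch of one common approach: write a columnar format and control the number of output files. `df` and "FINAL" come from the question; the partition count and parquet format are assumptions you would tune to your cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100_000_000).withColumnRenamed("id", "value")  # stand-in for the real df

(
    df.repartition(200)        # ~200 output files; tune to cluster cores
      .write
      .mode("overwrite")
      .format("parquet")       # columnar files write and read faster than text
      .saveAsTable("FINAL")
)
```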
And the answer would be "because I work with a multinational enterprise customer". If you have a large market share in China (1 billion people), India (1 billion people), Europe (0.75 billion), and the USA (350 million), it doesn't take long to get to 100 billion transactions. And if you want to do financial year-on-year comparisons, you need to keep at least 24 months of data, usually 36 months.
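A back-of-the-envelope check of that figure; the per-user transaction rate here is an assumption, not something stated above:

```python
# China + India + Europe + USA populations, per the comment above.
users = 1e9 + 1e9 + 0.75e9 + 350e6
tx_per_user_per_month = 1   # assumed: one transaction a month per person
months = 36                 # three years of history, per above
print(f"{users * tx_per_user_per_month * months:.2e}")  # ~1.12e11, past 100 billion
```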
Great video, thank you so much for such a clear and clean explanation. One question: I am planning to use a notebook environment for my senior design project in college. Our group plans to implement GAN and YOLO models to detect defects in 3D printing, and since we will use at least 1,000 training images per defect type, we wanted to know which notebook is a good fit. Do you think Google Colab would work well for such a project?
Please, for 50 days I have been looking for this. I want to create 2,000 users in MySQL, with the phone number as the username and a default password. Can someone tell me how to create that many users with a default password?
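A hypothetical sketch of bulk user creation with mysql-connector-python; the phone list, host, credentials, and default password are all placeholders to replace with your own:

```python
import mysql.connector  # pip install mysql-connector-python

DEFAULT_PASSWORD = "ChangeMe123!"                 # placeholder default password
phones = ["15550001111", "15550002222"]           # stand-in for the real 2000 numbers

conn = mysql.connector.connect(host="localhost", user="root", password="root_pw")
cur = conn.cursor()
for phone in phones:
    # DDL like CREATE USER cannot take parameter placeholders, so the values
    # are inlined; digits-only phone numbers keep this safe from quoting issues.
    # IF NOT EXISTS (MySQL 5.7+) makes the script safe to re-run.
    cur.execute(
        f"CREATE USER IF NOT EXISTS '{phone}'@'%' IDENTIFIED BY '{DEFAULT_PASSWORD}'"
    )
cur.close()
conn.close()
```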
Hi sir, I want to ingest data into GCP using an API from a third-party vendor. The data comes once a month. Can you suggest a way to ingest the data into GCP, which services I should use, and a scheduler or trigger that runs automatically every 30 days?
Apologies for the late reply. You can check out Cloud Run to run your API calls, and you can schedule them using something like Cloud Scheduler or Cloud Composer on GCP.
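A hypothetical sketch of what that Cloud Run service could look like; the vendor URL, dataset, and table names are placeholders, and Cloud Scheduler would POST to it on a monthly cron such as "0 3 1 * *":

```python
import os
import requests
from flask import Flask
from google.cloud import bigquery

app = Flask(__name__)

@app.route("/", methods=["POST"])
def ingest():
    # 1. Pull the month's data from the third-party API (placeholder URL).
    resp = requests.get("https://vendor.example.com/api/monthly-export")
    resp.raise_for_status()
    rows = resp.json()  # assumes the vendor returns a JSON array of records

    # 2. Stream it into BigQuery; dataset/table names are assumptions.
    client = bigquery.Client()
    errors = client.insert_rows_json("ingest_demo.vendor_monthly", rows)
    return (f"insert errors: {errors}", 500) if errors else ("ok", 200)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```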
Very helpful video. I just have a question: these techniques look like they are more about data upload than ingestion as such. When we talk about "ingestion", we are usually referring to data pipelines: Dataflow, Dataproc, Cloud SQL, BigQuery, etc. And as far as streaming ingestion is concerned, Pub/Sub is probably the first thing that comes to mind. That said, I fully agree that there are no hard and fast rules, and it's perfectly OK to call copying data to Cloud Storage 'ingestion'.
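For the streaming case, a minimal Pub/Sub publish sketch; the project and topic names are placeholders:

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "ingest-events")

# Messages are raw bytes; downstream subscribers decide how to decode them.
future = publisher.publish(topic_path, b'{"event": "page_view"}')
print(future.result())  # message ID once the publish is acknowledged
```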
He's referring to 1:1 data ingestion. If any transformation is necessary, then we use a data pipeline built with DF/DP/BQ procedures, or any combination of them.
@@selvapalani9727 you are right. Given that S3 buckets can indeed act as data lakes, and that Athena, BigQuery, Redshift, and other such services can query data directly from object storage, there is no harm in calling the process of pushing data into them 'ingestion'. They are not just 'files' any more; they are 'data' from which insights can be derived.
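To illustrate querying such object-storage 'data' directly, here is a sketch using BigQuery's external table definitions over Cloud Storage; the bucket path and table alias are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Treat raw CSV files in a bucket as a queryable table, no load step needed.
external = bigquery.ExternalConfig("CSV")
external.source_uris = ["gs://my-bucket/landing/*.csv"]
external.autodetect = True

job_config = bigquery.QueryJobConfig(table_definitions={"lake": external})
rows = client.query("SELECT COUNT(*) FROM lake", job_config=job_config).result()
for row in rows:
    print(row)
```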