Dremio
The Unified Lakehouse Platform for Self-Service Analytics

Bring users closer to the data with lakehouse flexibility, scalability, and performance at a fraction of the cost. Dremio's intuitive Unified Analytics, high-performance SQL Query Engine, and Lakehouse Management service for next-gen DataOps let you shift left for the fastest time to insight.

Quick Answers - What is Apache Arrow?
0:31
1 month ago
Comments
@zmihayl 4 days ago
Your voice is like an angel to fall asleep to 😇
@santhoshreddykesavareddy1078 6 days ago
Hi, thanks, this is really great information to start with Apache Iceberg. But I have a question: when modern databases already use such advanced technology to prune and scan data, why would we need to store the data in file formats instead of loading it directly into a table?
@Dremio 5 days ago
When you start talking about 10TB+ datasets, you run into issues with whether a single database can hold the dataset and query it performantly. Also, different purposes need different tools, so you need your data in a form that can be used by different teams with different tools.
@Dremio 5 days ago
Also, with data lakehouse tables there doesn't have to be any running database server when no one is querying the dataset, since they are just files in storage, while traditional database tables need a persistently running environment.
@santhoshreddykesavareddy1078 5 days ago
@@Dremio wow! Now I have got full clarity. Thank you so much for your response.
@santhoshreddykesavareddy1078 5 days ago
@@Dremio cost saving. Thanks for the tip 😀.
@intjprogrammer3877 7 days ago
Thanks for the great video. Question: when we first run the DELETE command in the lesson2 branch, does the data also appear in MinIO? Like, does MinIO object storage show the lesson2 branch and the main branch separately? I am curious because in MinIO there are only data and metadata directories, and there is no directory for main vs. lesson2.
@intjprogrammer3877 7 days ago
I think I got it now. The storage layer does not have the concept of branches, so the warehouse/data/ directory stores Parquet files for both the lesson2 and main branches. I can tell because there are files with different timestamps associated with my SQL operations in each branch.
@nooh_jl 8 days ago
Thank you so much! I have a question. I'm wondering if there might be any way to do these procedures automatically in Iceberg. Do I have to do these things manually every time?
@Dremio 7 days ago
Dremio Cloud has the ability to automate these types of operations.
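For readers on open source who want to script this themselves, a minimal sketch is to schedule Iceberg's Spark maintenance procedures; it assumes the "nessie" catalog from this tutorial and a hypothetical table name:

    # Assumes a Spark session already configured with the "nessie" Iceberg catalog
    # as shown in this tutorial series; the table name is hypothetical.
    spark.sql("CALL nessie.system.rewrite_data_files(table => 'names')")  # compact small data files
    spark.sql(
        "CALL nessie.system.expire_snapshots(table => 'names', retain_last => 5)"
    )  # drop old snapshots so their unused files can be cleaned up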
@nooh_jl 8 days ago
It's really helpful for me!! Thank you so much.
@ZaidAlig 14 days ago
Hi Alex, really thankful to you for such a nice explanation and hands-on. I got stuck at 'CREATE BRANCH IF NOT EXISTS lesson2 IN nessie'. This keeps failing with the error message "syntax error at or near 'BRANCH'". Am I missing something? Kindly assist.
@Dremio 13 days ago
If you want, PM me (Alex Merced) your Spark configs. Usually it's a typo or an update that needs to be made to the Spark configs. Spark can be very touchy on the config side, which is one reason using Dremio for a lot of Iceberg operations is so nice (much easier).
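That syntax error usually means the session was built without the Nessie SQL extensions. A minimal PySpark sketch of the relevant config, assuming the Nessie endpoint and warehouse bucket from this tutorial's Docker setup (adjust jar versions and URIs to your environment):

    import pyspark
    from pyspark.sql import SparkSession

    conf = (
        pyspark.SparkConf()
        # both extension classes are needed for CREATE BRANCH ... IN nessie to parse
        .set("spark.sql.extensions",
             "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
             "org.projectnessie.spark.extensions.NessieSparkSessionExtensions")
        .set("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
        .set("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
        .set("spark.sql.catalog.nessie.uri", "http://nessie:19120/api/v1")  # assumed container name/port
        .set("spark.sql.catalog.nessie.ref", "main")
        .set("spark.sql.catalog.nessie.warehouse", "s3a://warehouse/")      # assumed bucket
        # the matching iceberg-spark-runtime and nessie-spark-extensions jars
        # must also be on the classpath (e.g. via spark.jars.packages)
    )
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
    spark.sql("CREATE BRANCH IF NOT EXISTS lesson2 IN nessie")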
@kenhung8333 17 days ago
Awesome video!! At 3:18, when explaining the different delete formats, I have a question regarding the implementation: since the delete mode only accepts MOR or COW, how exactly do I specify whether a delete operation uses an equality delete or a positional delete?
@Dremio 17 days ago
It's mainly based on the engine: most engines will use position deletes, but streaming platforms like Flink will use equality deletes to keep write latency to a minimum.
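As a sketch of the part you can control: copy-on-write vs. merge-on-read is set per operation with Iceberg table properties (table name hypothetical, Spark SQL shown), while the choice between position and equality delete files stays with the engine, as noted above.

    # assumes the Iceberg SQL extensions are enabled on the Spark session
    spark.sql("""
        ALTER TABLE nessie.names SET TBLPROPERTIES (
            'write.delete.mode' = 'merge-on-read',   -- or 'copy-on-write'
            'write.update.mode' = 'merge-on-read',
            'write.merge.mode'  = 'merge-on-read'
        )
    """)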
@mdafazal12 22 days ago
Very well explained... great job, Dipankar.
@agrohe21 22 days ago
Great explanation and details
@joeingle1745 24 days ago
Great article, Alex. Slight issue creating a view in Dremio: I get the following exception "Validation of view sql failed. Version context for table nessie.names must be specified using AT SQL syntax". Nothing obvious in the console output; any ideas?
@AlexMercedCoder 23 days ago
That means the table is in Nessie and it needs to know which branch you're using, so it would be AT BRANCH main.
@joeingle1745 23 days ago
@@AlexMercedCoder Thanks Alex. This would seem to be a limitation of the 'Save as View' dialogue, as it doesn't allow me to do this and it doesn't default to the branch you're in the context of currently.
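A hedged sketch of the workaround described above is to create the view from Dremio's SQL runner with an explicit version context instead of the dialog; the space and view names here are hypothetical.

    # Run this statement in the Dremio SQL runner (or submit it programmatically);
    # "myspace" and "names_view" are hypothetical, the AT BRANCH clause is the key part.
    view_ddl = """
    CREATE VIEW myspace.names_view AS
    SELECT * FROM nessie.names AT BRANCH main
    """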
@vinu11sharma 1 month ago
Thanks for this video
@aesthetic_mard 1 month ago
We aren't able to read files directly from the MinIO bucket into Apache Spark. How can we read a file from the MinIO bucket and process it in Spark?
@Dremio 1 month ago
If you're following this tutorial, Spark sometimes has odd DNS issues with the Docker network. One fix is to use the IP address of the Nessie container, which you can find by inspecting the network in the Docker Desktop UI or with the Docker CLI. Likewise, if you run into an "Unknown Host" error using minio:9000, there may be an issue with the Docker network's DNS that maps the name minio to the container's IP address. In that situation, replace minio with the container's IP address: look it up with docker inspect minio, find the IP address in the network section, and update the STORAGE_URI variable, for example STORAGE_URI = "172.18.0.6:9000".
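As a sketch of that workaround, assuming the Spark catalog config used in this tutorial, the only change is pointing the S3 endpoint at the container IP instead of the minio hostname:

    # Use the IP printed by `docker inspect minio` when the minio hostname
    # does not resolve inside the Docker network.
    STORAGE_URI = "http://172.18.0.6:9000"   # example IP, yours will differ
    conf = conf.set("spark.sql.catalog.nessie.s3.endpoint", STORAGE_URI)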
@Dremio 1 month ago
This tutorial does the same thing without Spark: www.dremio.com/blog/intro-to-dremio-nessie-and-apache-iceberg-on-your-laptop/
@user-xg5po3nx7n 1 month ago
How come Iceberg can read a CSV file? I thought you could only use Parquet, ORC, or Avro. Is this just something that works in Dremio, like a vendor thing? Because in Trino you have to use Parquet, ORC, or Avro.
@Dremio 1 month ago
The CSV file is not part of the Iceberg table. In this example we are taking a CSV file and adding its contents to an Iceberg table, but new Parquet files are being written and a new metadata snapshot is being created.
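A minimal PySpark sketch of what that reply describes, with a hypothetical CSV path: Spark reads the CSV, and the write produces new Parquet data files plus a new Iceberg snapshot.

    # assumes the Spark session configured with the "nessie" catalog from this tutorial
    df = spark.read.option("header", "true").csv("/home/docker/data/names.csv")  # hypothetical path
    df.writeTo("nessie.names").createOrReplace()   # writes Parquet data files + a new metadata snapshot
    # later loads only add more Parquet files and another snapshot:
    # df.writeTo("nessie.names").append()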
@MrJ0mmy 1 month ago
I enjoyed this video.
@ecmiguel 1 month ago
Great!!!
@srinivasanrajagopal9062 1 month ago
Please remove that idiotic pullover
@esmob4140 1 month ago
Great explanation!
@Dremio 1 month ago
Thank you
@tarun99507 1 month ago
Hi, can you send me the query for the UPDATE command? I am getting an error that says we cannot use the UPDATE command. Is that true? Or is there another command we can use?
@Dremio 1 month ago
You can use UPDATE on Iceberg tables; this could be a Spark or Iceberg version issue. What versions of both are you using?
@tarun99507 18 days ago
@@Dremio Apache Spark 3
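For reference, a hedged sketch of an UPDATE that should work on Spark 3.x, provided the Iceberg SQL extensions are enabled on the session (a missing extensions setting is a common cause of "UPDATE not supported" errors); column names are hypothetical.

    # requires spark.sql.extensions to include IcebergSparkSessionExtensions (see the config sketch earlier)
    spark.sql("UPDATE nessie.names SET age = 30 WHERE id = 1")   # hypothetical columns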
@philippm8445 1 month ago
Nice format
@Dremio 1 month ago
Thank you
@jordanmessec5332 1 month ago
Thank you but wow terrible microphone
@oscardelacruz3087 1 month ago
Hello, nice video. Is it possible to have a video like this working with unstructured data (video, image, audio, docs)?
@Dremio 1 month ago
Dremio is primarily designed for structured and semi-structured data, although in the future AI tools may help turn unstructured data into structured data for analytics.
@multitaskprueba1 1 month ago
You are a genius! Fantastic explanations! Thanks!
@MrShockalex 1 month ago
Hi, thanks for the video! I work in a small company with 10 dashboards. Can I use Dremio as a centralized way to quickly access data without using dbt (I don't know dbt)? That is, Dremio as a fast data lake where I can use SQL on my Parquet files and various databases to create my dashboards? Again, thanks for the video.
@Dremio 1 month ago
Yes, dbt was just demonstrated to show the integration, but it isn't required for using Dremio. You can do everything via the UI. Here is an exercise that shows just that -> bit.ly/am-sqlserver-dashboard
@rafaelg8238 1 month ago
Great video. How do you orchestrate all of that?
@WeeQqq 1 month ago
😢
@samsoneromonsei9368 2 months ago
How are the env variables linked to AWS Glue? Is there any session defined? Can you please share how you are using the .env in the script?
@Dremio 2 months ago
In this video we are not using AWS Glue Studio but a Docker container with a notebook server, and I am configuring the environment variables in the docker run command. AWS Glue is just being configured as the catalog in the Spark session. In AWS Glue you should be able to specify env variables on the job settings page. Find examples here: github.com/developer-advocacy-dremio/quick-guides-from-dremio
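A hedged sketch of wiring environment variables into a Spark session with AWS Glue as the Iceberg catalog; variable names and the warehouse path are assumptions, not the exact script from the video.

    import os
    from pyspark.sql import SparkSession

    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY can be passed via `docker run -e ...`
    # (or on the Glue job settings page) and are picked up from the environment.
    warehouse = os.environ.get("WAREHOUSE", "s3://my-bucket/warehouse/")   # hypothetical default

    spark = (
        SparkSession.builder
        .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.glue.warehouse", warehouse)
        .getOrCreate()
    )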
@emonymph6911 2 months ago
Can you filter categorical variables via the heading, like Excel dropdowns? If not, is that coming? You can in Power BI, and if you're aiming at those users, they may not want to write SQL for every filter action.
@Dremio 2 months ago
There is a text-to-SQL feature you can use to express filters like this using plain text.
@emonymph6911 2 months ago
@@Dremio Thank you, but I could not find text-to-SQL in the local Dremio client; is it exclusive to Cloud? I feel like dropdown headers could easily generate SQL.
@Dremio 2 months ago
@@emonymph6911 Yes, text-to-SQL is a Cloud-exclusive feature. Both Cloud and Software have no-code features that can be accessed by clicking on a column to generate calculated columns, joins, data type changes, and more.
@emonymph6911 2 months ago
@@Dremio Thank you. My only feedback is that Excel-style filters on unique values in your column headers would be really convenient and nice to see in a future release. Apart from this, I think the software is amazing; lots of respect for the team.
@ManishJindalmanisism 2 months ago
Hey Dremio team!! How can we programmatically ingest data into an Iceberg table built using CTAS in Dremio? If I have already built an Iceberg table in Dremio, and now on a schedule or event I want to append rows from a file into this table using some program and a scheduling tool like Airflow, how is that achievable? Most of your demos show DML operations from the SQL editor, but that's not the way to go in production.
@Dremio 2 months ago
SQL is fine; you can use Airflow to send SQL to Dremio to insert records into the desired table. In this tutorial I give an example of doing an append-only insert: bit.ly/am-sqlserver-dashboard
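One hedged way to script that from Airflow is to call a small function like the sketch below from a PythonOperator; it submits SQL to Dremio's Arrow Flight endpoint via pyarrow, and the host, credentials, and table names are all assumptions.

    from pyarrow import flight

    def run_dremio_sql(sql: str,
                       host: str = "grpc+tcp://localhost:32010",   # hypothetical Dremio Flight endpoint
                       user: str = "admin", password: str = "password"):
        """Submit a SQL statement to Dremio over Arrow Flight and return the result as an Arrow table."""
        client = flight.FlightClient(host)
        token = client.authenticate_basic_token(user, password)
        options = flight.FlightCallOptions(headers=[token])
        info = client.get_flight_info(flight.FlightDescriptor.for_command(sql), options)
        return client.do_get(info.endpoints[0].ticket, options).read_all()

    # e.g. an append-only load on a schedule (table and file names hypothetical):
    run_dremio_sql('INSERT INTO nessie.names SELECT * FROM s3source."new_names.csv"')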
@boonga585 2 months ago
26:23
@boonga585 2 months ago
5:54
2 months ago
Hey Alex! Nice video! Today I use Apache NiFi to retrieve data from APIs and DBs, and MariaDB is my main DW. I've been testing Dremio/Nessie/MinIO using docker-compose and I still have doubts about the best way to ingest data into Dremio. There are databases and APIs that cannot be connected directly to it. I tested sending Parquet files directly to storage, but the upsert/merge is very complicated, and the JDBC connection with NiFi didn't help me either. What would you recommend for these cases?
@Dremio 2 months ago
Shoot a message to Alex Merced on LinkedIn so he can ask some follow-up questions for the best recommendation.
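For the "upsert/merge is very complicated" part, a hedged sketch of the usual approach is to land the new data as a staging table and run a MERGE INTO; table and column names are hypothetical and the exact MERGE clauses vary slightly by engine. The statement can be submitted to Dremio with the Arrow Flight sketch shown earlier, or run in the SQL runner.

    # hypothetical target/staging tables and join key
    merge_sql = """
    MERGE INTO nessie.names AS t
    USING nessie.names_staging AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET name = s.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)
    """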
@emonymph6911 2 months ago
Why did Dremio go with Iceberg over Hudi? Hudi seems more intuitive and flexible with the timeline approach.
@Dremio 2 months ago
www.dremio.com/blog/exploring-the-architecture-of-apache-iceberg-delta-lake-and-apache-hudi/
@emonymph6911 2 months ago
@@Dremio It's exactly that article which made me ask the question. ^.^ Don't get me wrong, I'm trying Dremio right now in local Docker and it looks amazing. But I still thought Hudi's timeline was more suitable for BI, considering dates tie in well with graphs, event streams, and the Data Vault methodology as well. Going to watch the XTable presentation at Subsurface, looking forward to it! PS: Alex, your customer care videos and docs are the best in the world for a software application. I like how you go at a moderate pace and cover the terminology before showing the ropes, which makes for a low barrier to entry. Please keep that up. 10/10!
@Dremio 2 months ago
@@emonymph6911 I think this may be answering the opposite question, but this article may be helpful too: www.dremio.com/blog/dremios-commitment-to-being-the-ideal-platform-for-apache-iceberg-data-lakehouses/ I do think there is a tremendous benefit to the reusability of Iceberg's metadata structure, along with its partition evolution and hidden partitioning features, which are unique to the format.
@csabarikannan 2 months ago
How do I create a new table? Please share the video.
@Dremio 2 months ago
www.linkedin.com/posts/learniceberg_pyiceberg-write-support-is-here-links-in-activity-7165735714381312002-21-1?
@anandsharma213 2 months ago
Lovely presentation!
@abdelmoughitaityoub4093 3 months ago
What a great tutorial. One thing that I didn't get is how you converted the JSON string constructed by Airbyte to get the columns with their values. Thanks in advance.
@Dremio 3 months ago
In this blog you can see the SQL in more detail, but essentially I turn the JSON string into an object and access the properties via their keys: www.dremio.com/blog/how-to-create-a-lakehouse-with-airbyte-s3-apache-iceberg-and-dremio/
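A hedged sketch of the Dremio SQL pattern described above (turn the JSON string into a struct, then pull fields out by key); the Airbyte column and field names are assumptions based on the linked blog.

    # run in the Dremio SQL runner; _airbyte_data and the field names are hypothetical
    unpack_sql = """
    SELECT t.parsed."name"  AS name,
           t.parsed."email" AS email
    FROM (
        SELECT CONVERT_FROM(_airbyte_data, 'JSON') AS parsed
        FROM s3source."airbyte_raw_users"
    ) t
    """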
@abdelmoughitaityoub4093 2 months ago
Yeah, thanks @@Dremio. I would like to ask a question: what if we want to use Project Nessie as a catalog for Iceberg tables directly? Is there any option for this?
@BardanPokhrel 3 months ago
Hi. I have been testing with Dremio OSS version 24.2.6. I have been looking into Dremio to find a solution for providing roles and privileges. However, it's not available anywhere. Going through the documentation on Dremio's website, it mentions this feature is available on Dremio v16.0+ Enterprise Edition only. My Dremio runs in a Docker container on a single server along with Nessie, Postgres, and Spark. In your video, I can see you are also using localhost. How did you manage to have privileges and access control? Is there any way I can do the same with the open source version? Is there any roadmap to include it in the OSS version?
@Dremio 3 months ago
Mike in the demo is running Enterprise Edition. In Dremio Cloud, many of the security features are available on the free tier.
@gfinleyg 3 months ago
Is there a new link for the article? The Flink+Nessie article is still available, but the "Blog Tutorial" link is dead.
@Dremio 3 months ago
Both links still seem to be working for me.
@KartheekNallamala 3 months ago
How do we resolve merge conflicts? For example: the main branch has moved ahead and added/deleted some data, a temp branch has some changes, and I'm trying to merge temp into main. How does Nessie handle this case? Do we need to manually resolve the merge conflict?
@Dremio 3 months ago
Nessie has the ability in its REST API to force merges or ignore certain objects. These aspects of its features should be coming very soon to its SQL support. In future iterations it will become more context-aware and able to auto-reconcile such conflicts further down the road.
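For reference, a hedged sketch of the merge itself via the Nessie Spark SQL extensions (branch names follow this tutorial); if main and the branch changed the same table, the merge fails and currently has to be reconciled by hand or through the REST API, as described above.

    # assumes the session config from the earlier sketch, with the Nessie SQL extensions enabled
    spark.sql("MERGE BRANCH lesson2 INTO main IN nessie")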
@DharmeshPatel-el2xh 3 months ago
Hey Alex, I'm also getting this error:
    1 of 2 START sql table model warehouse-dbt.test2.my_first_dbt_model ............ [RUN]
    11:57:41  1 of 2 ERROR creating sql table model warehouse-dbt.test2.my_first_dbt_model ... [ERROR in 4.22s]
    11:57:41  2 of 2 SKIP relation test2.my_second_dbt_model ................................. [SKIP]
    11:57:42
    11:57:42  Finished running 1 table model, 1 view model in 0 hours 0 minutes and 11.60 seconds (11.60s).
    11:57:42
    11:57:42  Completed with 1 error and 0 warnings:
    11:57:42
    11:57:42  Runtime Error in model my_first_dbt_model (models/example/my_first_dbt_model.sql)
    11:57:42    ERROR: Validation of view sql failed. No match found for function signature my_first_dbt_model(type => <CHARACTER>)
How do I solve this error? Any help will be appreciated. Thanks
@Dremio 3 months ago
Is there a way to message me on LinkedIn with the code you're using in a git repo so I can inspect it? Just message Alex Merced on LinkedIn.
@alisahibqadimov7659 3 months ago
From the video, what I understand is that Nessie is the best catalog for a data lakehouse. It is easy to manage and goes beyond the usual catalog capabilities by providing a Git-like environment.
@alisahibqadimov7659 3 months ago
Great
@weiding6570 3 months ago
Hi there, thanks for the awesome video! Any reason why s3.endpoint was set to an IP address rather than a hostname when creating the catalog? I found the hostname style could also work in the demo with s3.path-style-access=true.
@Dremio 3 months ago
I think it was just my particular environment at the time; I kept running into an issue with the hostname in my Docker env, so I just used the IP to be safe.
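A hedged sketch of the hostname-style setup the commenter describes, assuming the same "nessie" catalog; with path-style access enabled, the MinIO hostname works in place of the IP.

    conf = (
        conf
        .set("spark.sql.catalog.nessie.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .set("spark.sql.catalog.nessie.s3.endpoint", "http://minio:9000")   # hostname instead of IP
        .set("spark.sql.catalog.nessie.s3.path-style-access", "true")       # needed for MinIO-style endpoints
    )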
@subhasishsarkar5106 3 months ago
Alex Merced is fire!
@analyticimpact 3 months ago
This worked fine. The only detail I had to look at twice was using the project ID, which is a long GUID instead of the name of the project. The dbt-dremio plugin does not return meaningful errors if there are any problems. Another UI thing that was odd to figure out is that you have to click on an Arctic catalog and then find the tiny button in the upper right of the screen to create a folder. No actions are possible from the folder tree on the left for Arctic-related functions. Those things made this less straightforward than it could have been, but it still ended up working great.
@user-uf7ie5pt9e 3 months ago
Hey Alex, nice video. I have a question for you: when converting any file (CSV, JSON, or Parquet) from a data lake into an Iceberg table, will the data be duplicated, i.e. will there be a copy in the Iceberg table?
@Dremio 3 months ago
If using COPY INTO or INSERT INTO to populate Iceberg tables, new data files are written and added in a new snapshot on the Iceberg table.
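A hedged sketch of the COPY INTO case in Dremio SQL: the source CSV stays where it is, and Dremio writes new Parquet data files plus a new snapshot onto the Iceberg table (source and path names are hypothetical).

    # run in the Dremio SQL runner; "@s3source/raw/names/" is a hypothetical staging location
    copy_sql = """
    COPY INTO nessie.names
    FROM '@s3source/raw/names/'
    FILE_FORMAT 'csv'
    """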
@sjob12 3 months ago
Hello Alex. How can we view the videos at the end where you have the QR code?
@Dremio 3 months ago
Is scanning the QR codes on a phone not working? Both videos should also be searchable on the Dremio YouTube channel. I think the next video in the first QR code is ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-bvXj4ANMy10.htmlsi=KrthtZQr_Dve9Ter
@joshuajames7231 3 months ago
I got an error, Failed to load class "org.slf4j.impl.StaticLoggerBinder", when running the script for Spark.
@Dremio 3 months ago
I'd have to see the whole log output and catalog settings to determine the issue. If you want, message me on LinkedIn and I can examine it further. - Alex Merced
@NohaElguindy 3 months ago
Hey Alex, really appreciate your work. I am quite a beginner here and I have a basic question: why didn't you include dbt-dremio in your Docker containers? Why did you configure it separately in a virtual Python env? Would really appreciate the clarification.
@Dremio 3 months ago
1. dbt-dremio needs to be installed in the context where your dbt models exist, which is usually not the same system Dremio is running on (you wouldn't want both processes fighting over resources). 2. The virtual environment isolates dependencies from other projects like web apps, which makes the environment easier to replicate.
@subhasishsarkar5106 3 months ago
This was an amazing talk! Thanks so much Alex!
@brahyamalmonteruiz9984 3 months ago
Loved this series of videos!