DVCorg
An open source project for bringing DevOps to data science. We maintain Data Version Control (DVC), a tool for extending Git version control to datasets and models, and Continuous Machine Learning (CML), a tool for adapting continuous integration systems like GitHub Actions & GitLab CI for machine learning.
DVC Extension for VS Code Update: Plots Wizard
1:30
10 months ago
Comments
@pratyakshagarwal-iw1es 8 days ago
Well explained. Can you also make an equally detailed video on how to build a pipeline using DVC?
@dvcorg8370 3 days ago
@pratyakshagarwal-iw Please take a look at this Pipeline video and let us know what you think! ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-71IGzyH95UY.htmlsi=xMty7q4YbJmI7f8G
@pratyakshagarwal-iw1es 1 day ago
@dvcorg8370 Thanks
@maxtaubert3044 1 month ago
Missed the opportunity to say "Do not get lost in your machine learning sauce". I totally get experiments though too lol fits the "experiments" y'all offer.
@betonniere8202 1 month ago
It is hard to find resources on YouTube that provide genuinely relevant information about LLM systems. Thank you, you are sharing a lot of value.
@BasitJawed 1 month ago
I am from Pakistan and currently using Chroma DB to store vector embeddings.
@curlrain 1 month ago
I've never been able to authenticate; I always get a 400 invalid request.
@saeidseyfi 1 month ago
I want to use a Linux server as a remote server for storing and managing my datasets and models using DVC. How can I set this up?
@iPondrio 2 months ago
Do you have any video showing how to configure the token? I'm having a hard time with that config.
@albertkim1809 2 months ago
Thank you so much for this tutorial. I was struggling to understand how exactly DVC connected to cloud services, and your Google Drive example was extremely clear and simple.
@thewaterborne8 2 months ago
Curious why AWS OpenSearch KNN is not on the list :O
@mehrdat 2 months ago
Thank you very much. But why do I get errors? I couldn't run it after the first commit, and I tried nearly everything. The error is from the line with the importance plot. What could it be?
@basicmachines 2 months ago
It says in the description that the command dvc run has now been replaced with 'dvc stage add' but as far as I can see stage add does not actually run the new pipeline stage. Would 'dvc exp run -n' work, or is the current procedure to do 'dvc stage add -n' followed by 'dvc exp run'?
@stopznak86 3 months ago
Great stuff, I'm learning
@istvandarvas3372 3 months ago
@25:20 I think DSPy optimises only the prompt, not the weights of the model, but feel free to correct me. Anyway, this was good. Thanks! You could do more!
@efexzium 3 months ago
Love your work! Elle and Deevee
@salsabilaaurora99 3 months ago
sorry is that in cmd?
@dvcorg8370 3 months ago
Yes! You use the command line for DVC, just like Git!
@awds121 4 months ago
Thanks for the tutorial! If you want this to work today, make these changes in your train.yaml file: (1) change "--show-md" to "--md"; (2) change "cml-send-comment" to "cml comment create"; (3) add "permissions: write-all" at the same level as, and before, "runs-on: [ubuntu-latest]"; (4) add "git config --global --add safe.directory '*'" after "dvc repro". Hope this helps!
@dvcorg8370 4 months ago
Thanks for the tip!
@crazyidiot101 4 months ago
I’ve read that weaviate and pinecone are the only commercially viable databases. From a legal and compliance standpoint, which database is more ready to be deployed for use cases outside of tech, such as operational efficiency apps for other industries?
@curdyco 5 months ago
Isn't there a way to perform retraining in a pipeline using Google Colab or Kaggle?
@curdyco 5 months ago
These things are easy. Are there any tutorials about deploying deep learning models with large datasets, with retraining from feedback, on a custom host like a Kaggle notebook?
@curdyco 5 months ago
@dvcorg8370 But Google Colab is cost-friendly if I need GPU access for training. Using a local PC is not worth it because it has no GPU, and SageMaker-type services burn a hole in the pocket. What is the alternative, in your opinion?
@trainer9948 5 months ago
It looks like all the metadata is stored in git? Is it true that the studio app is just reading and writing to the git repo? Very cool
@dvcorg8370 5 months ago
That's correct. It enables you to use your existing Git infrastructure while being able to adequately view and use your ML models and experimentation flows.
@NLogSpace 5 months ago
8:23 I like that the remote storage is called "Someone's PC" :)
@Buhassan5656 6 months ago
I was searching for tutorial videos on how to set up this tool and use it in VS Code. I tried it and it was a bit complicated. Hope to see one soon.
@dvcorg8370 5 months ago
@buhassan5656 Please check out this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-6KtIRVfr61E.html It should be more helpful! We welcome feedback! Let us know how we can make it better!
@Buhassan5656 6 months ago
I'm new to this product and trying to learn it from scratch. Please share a tutorial or series of videos on how to set up, run, and experiment with this product.
@dvcorg8370 5 months ago
@buhassan5656 There are a few options. Best MLOps Practices for Building End-to-End Computer Vision Projects with Alex Kim is our most recent, highly liked video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-E26IaD7bNXg.html We also have a free online course that can help you get started at learn.iterative.ai. We are in the process of updating that course, as some things have changed since it was created over two years ago. And as always, please visit our docs here: dvc.org/doc Finally, if you get stuck, you are welcome to join our Discord server where you can ask questions: discordapp.com/invite/dvwXA2N
@riftsassassin8954 6 months ago
South Africa. Played with the one built into Gemini python SDK, but want to learn more to use it in open source projects.
@macildur95 6 months ago
Thank you for making this so clear! 😁
@Eriddoch 6 months ago
Elle, you were the OG DevRel before that role got popular :D
@babakhos 6 months ago
Hi Elle. Is there a way to add a new file to DVC from a Python script? Currently I run "dvc add <sample.file>" using subprocess in Python when I want to track new data with DVC.
@dvcorg8370 5 months ago
For Python you could use:
    from dvc.repo import Repo
    repo = Repo(".")
    repo.add("abc.csv")
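As a minimal sketch of the reply above, the same `Repo.add` call can be wrapped in a small helper. Only `Repo(".")` and `repo.add(...)` come from the reply; the helper name `track_with_dvc`, the optional `repo` argument, and the returned list of `.dvc` file names are assumptions made for illustration.

```python
def track_with_dvc(paths, repo=None):
    """Track each path with DVC and return the expected .dvc file names.

    Assumes plain file paths (no trailing slash) inside an initialised
    DVC repository. `repo` is a hypothetical hook for passing an existing
    Repo object instead of opening the current directory.
    """
    if repo is None:
        # Imported lazily so the helper can be read and tested without DVC installed.
        from dvc.repo import Repo
        repo = Repo(".")

    dvc_files = []
    for path in paths:
        repo.add(str(path))                   # same call as in the reply above
        dvc_files.append(str(path) + ".dvc")  # `dvc add` writes <path>.dvc next to the file
    return dvc_files
```

Calling `track_with_dvc(["abc.csv"])` inside a DVC repository should behave like `dvc add abc.csv`; the resulting `.dvc` files still need to be committed with Git, just as on the command line.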
@babakhos 6 months ago
Really interesting feature. Does it mean now we do not need to use mlflow for model and experiment tracking when we already have dvc in our project?
@babakhos 6 months ago
Thank you for sharing these useful hands-on tutorials about DVC. I wondered how DVC compares to MLflow or Airflow. DVC performs pipeline orchestration and some sort of experiment tracking. Can we say one wouldn't need MLflow or Airflow anymore if they use DVC?
@dvcorg8370 5 months ago
@babak21x You could replace MLflow with DVC, but they can also work together. DVC provides more thorough reproducibility. You can see info about that in this video (precise time provided): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-E26IaD7bNXg.html Regarding Airflow - usually DVC cannot replace this tool except in cases where it's overkill and not really needed (like in the case of running everything on a single machine, then DVC can)
@urimtefiki226 6 months ago
which vectors those of my matrix since 2019?
@dvcorg8370 6 months ago
Hi @urimtefiki226! Can you provide more context on your question? Adding a reference here that came across our radar recently and will likely be in our February newsletter. Vector database comparison: vdbs.superlinked.com/
@saliexplore3094 6 months ago
Alex Kim is a dope instructor! Thanks for sharing.
@wayne7936 6 months ago
Is there a higher resolution version of this video?
@dvcorg8370 6 months ago
@wayne7936 Thanks for pointing this out! We need to fix this! Take a look at the one directly from the conference here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-J3vUMwG8dks.html. In the meantime, I will try to get a better version up on our channel!
@CRCE__Hardik_Prajapati 7 months ago
The -d flag shown at 3:33 is not working in v3.42.0; the --default flag worked perfectly. Also, the authentication has changed a little. This is for those who are watching in 2024.
@codewithbrogs3809 7 months ago
Bad explanation
@dvcorg8370 7 months ago
Thanks for the feedback @codewithbrogs3809. Can you be specific about which parts are confusing? We'd like to make it clearer! 🙏
@wabisudoiuasbdo 7 months ago
What’s the main benefit of using dvc instead of bucket versioning again?
@davidaliaga4708 7 months ago
it seems dvc run is not supported anymore, is it?
@dvcorg8370 7 months ago
@davidaliaga4708 You are correct sir! We love an astute viewer! ❤️ dvc run was deprecated and replaced with dvc stage add to set up your stages with dependencies and outputs. You can find the documentation here: dvc.org/doc/start/data-management/data-pipelines Once your pipeline is set up, you can run dvc repro to run only the stages that have changed!
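The two steps in this reply (define a stage with `dvc stage add`, then run `dvc repro`) could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names, the stage name `train`, and the file names in the usage comment are hypothetical, and the `run` parameter exists only so the commands can be inspected without DVC installed.

```python
import subprocess


def stage_add_command(name, cmd, deps=(), outs=()):
    """Build the argv for `dvc stage add`; nothing is executed here."""
    argv = ["dvc", "stage", "add", "-n", name]
    for dep in deps:
        argv += ["-d", dep]   # files the stage depends on
    for out in outs:
        argv += ["-o", out]   # files the stage produces
    argv.append(cmd)          # the command the stage runs
    return argv


def define_and_run(name, cmd, deps=(), outs=(), run=subprocess.run):
    """Define the stage in dvc.yaml, then reproduce the pipeline."""
    run(stage_add_command(name, cmd, deps, outs), check=True)
    run(["dvc", "repro"], check=True)


# Hypothetical usage inside an initialised DVC repository:
# define_and_run("train", "python train.py",
#                deps=["train.py", "data"], outs=["model.pkl"])
```

Keeping command construction separate from execution makes it easy to print or log the exact `dvc` invocation before anything runs.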
@bg-mq5hz 7 months ago
An elegant product and a great tutorial. Thank you.
@dvcorg8370 7 months ago
Thanks for watching!
@abheysethi2536 7 months ago
I want this DV for my gf, can you gift me so I can gift her🥹 plzz
@AnonymousIguana 8 months ago
Very helpful, thanks for the presentation :)
@dvcorg8370 7 months ago
Glad it was helpful!
@BUY_YOUTUB_VIEWS_286 8 months ago
I appreciate how positive and uplifting your vlogs are.🌟
@cboyda 8 months ago
Really like the full walkthrough and example scenarios, nice share!
@dvcorg8370 8 months ago
Glad you enjoyed!
@izainonline 8 months ago
How do we choose which vector database to use? Chroma, Qdrant?
9 months ago
Thanks a lot. The completion of the dvc.yaml looks great.
@dvcorg8370 8 months ago
Glad you like it ❤️
@drm8164 9 months ago
i love u
@dvcorg8370 8 months ago
🦉 We love you too!
@Aliviagal2061 9 months ago
Wow, so cool! 👏👏👏
@dvcorg8370 8 months ago
Thank you! Cheers!
@mdabuzar9300 10 months ago
Why is Iterative Studio better than MLflow?
@dvcorg8370 9 months ago
@mdavuzar9300 Thank you for the question! Both tools indeed accomplish many of the same things, but the key differentiator is that DVC Studio (name has been changed) is Git-based. You are building your end-to-end MLOps process on infrastructure you already use (Git) instead of saving your ML workflows and processes in another server. This enables you and your team to be set up for success and reproducibility through every step of the process to production.
@umeshtiwari9249 10 months ago
nice tutorial makes it easy to understand
@jainamdoshi7109 10 months ago
Can I build this with a local PC rather than AWS? I am a student and don't have an AWS account.
@dvcorg8370 10 months ago
@jainamdoshi7109 Thanks for the question! Yes you can! Check out this doc to set up a local remote: dvc.org/doc/user-guide/data-management/remote-storage#file-systems-local-remotes
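For the local setup described in the linked doc, a rough sketch of the commands is below. The remote name `localstore` and the directory `/tmp/dvcstore` are placeholders, not values from the video, and the long `--default` flag is used because another commenter in this thread reported that `-d` did not work for them in v3.42.0.

```python
def local_remote_commands(name, storage_dir):
    """Return the dvc commands that register a local directory as the
    default remote and then upload tracked data to it.

    Both arguments are placeholders chosen by the caller; nothing is
    executed here, the commands are only built as argv lists.
    """
    return [
        ["dvc", "remote", "add", "--default", name, storage_dir],
        ["dvc", "push"],
    ]


# Print the commands so they can be reviewed or pasted into a terminal.
for argv in local_remote_commands("localstore", "/tmp/dvcstore"):
    print(" ".join(argv))
```

The same lists can be passed to `subprocess.run(argv, check=True)` from inside a DVC repository if you prefer to execute them from Python.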
@douglasemsantos 11 months ago
I enjoyed the video, but I have a question: isn't Git LFS accomplishing the same goal? My understanding is that we can already use Git LFS to store large files outside of our repositories, but still track their versioning. What would be the advantage of using DVC instead of Git LFS in this case?
@dvcorg8370 10 months ago
@douglasmsantos Thanks for the question! Here's a great blog post from one of our Community members that addresses the issue and why they switched: mlops.systems/tools/redactionmodel/computervision/mlops/2022/05/24/data-versioning-dvc.html And you can check out our docs around the issue here: dvc.org/doc/user-guide#comparison-with-related-technologies
@douglasemsantos 10 months ago
@dvcorg8370 Thank you for clarifying it!
@PioneeringML 11 months ago
Why was the code repo deleted from GitHub?
@maximmanchenko6660 11 months ago
This is the correct link github.com/iterative/terraform-provider-iterative
@dvcorg8370 11 months ago
@shortspeeches1455 you can find the repo here: github.com/iterative/magnetic-tiles-defect