An open source project for bringing DevOps to data science. We maintain Data Version Control (DVC), a tool for extending Git version control to datasets and models, and Continuous Machine Learning (CML), a tool for adapting continuous integration systems like GitHub Actions & GitLab CI for machine learning.
@pratyakshagarwal-iw Please take a look at this Pipeline video and let us know what you think! ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-71IGzyH95UY.htmlsi=xMty7q4YbJmI7f8G
Missed the opportunity to say "Do not get lost in your machine learning sauce". I totally get "experiments" too, though, lol; it fits the "experiments" y'all offer.
It's hard to find resources on YouTube that provide genuinely relevant information in the field of LLM systems. Thank you, you are sharing a lot of value.
Thank you so much for this tutorial. I was struggling to understand how exactly DVC connected to cloud services, and your Google Drive example was extremely clear and simple.
Thank you very much. But why am I getting errors? I couldn't run it after the first commit, and I tried nearly everything. The error comes from the line with the importance plot. What could it be?
It says in the description that the command dvc run has now been replaced with 'dvc stage add' but as far as I can see stage add does not actually run the new pipeline stage. Would 'dvc exp run -n' work, or is the current procedure to do 'dvc stage add -n' followed by 'dvc exp run'?
@25:20 I think DSPy optimizes only the prompt, not the weights of the model, but feel free to correct me. Anyway, this was good. Thanks! You could do more!
Thanks for the tutorial! If you want this to work today, make these changes in your train.yaml file: change "--show-md" to "--md"; change "cml-send-comment" to "cml comment create"; add "permissions: write-all" at the same level, before "runs-on: [ubuntu-latest]"; and add "git config --global --add safe.directory '*'" after "dvc repro". Hope this helps!
I’ve read that Weaviate and Pinecone are the only commercially viable vector databases. From a legal and compliance standpoint, which database is more ready to be deployed for use cases outside of tech, such as operational efficiency apps for other industries?
These things are easy. Are there any tutorials about deploying deep learning models with large datasets, with retraining based on feedback, on a custom host like a Kaggle notebook?
@dvcorg8370 But Google Colab is cost-friendly if I need GPU access for training. Using my local PC is not worth it because it has no GPU, and SageMaker-type services burn a hole in the pocket. What alternative would you suggest?
That's correct. It enables you to use your existing Git infrastructure while being able to adequately view and use your ML models and experimentation flows.
@buhassan5656 Please check out this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-6KtIRVfr61E.html It should be more helpful! We welcome feedback! Let us know how we can make it better!
I'm new to this product and trying to learn it from scratch. Please share a tutorial or series of videos on how to set up, run, and experiment with this product.
@buhassan5656 There are a few options. Best MLOps Practices for Building End-to-End Computer Vision Projects with Alex Kim is our most recent, highly liked video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-E26IaD7bNXg.html We also have a free online course that can help you get started at learn.iterative.ai We are in the process of updating that course, as some things have progressed since it was created over two years ago. And as always, please visit our docs here: dvc.org/doc Finally, if you get stuck, you are welcome to join our Discord server, where you can ask questions: discordapp.com/invite/dvwXA2N
Hi Elle. Is there a way to add a new file to DVC via a Python script? Currently, when I want to track new data with DVC, I run "dvc add <sample.file>" using subprocess in Python.
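For anyone with the same question: as far as we know, DVC's documented Python API (dvc.api) focuses on reading tracked data, so wrapping the CLI with subprocess, as described above, is a reasonable approach. Below is a minimal sketch, assuming the dvc and git CLIs are installed and on PATH; the helper names and the data/sample.csv path are hypothetical. Passing an argument list instead of a shell string avoids quoting problems with spaces in paths.

```python
import subprocess
from typing import List, Sequence


def dvc_add_command(paths: Sequence[str]) -> List[str]:
    """Build argv for `dvc add`; several paths can be tracked in one call."""
    if not paths:
        raise ValueError("at least one path is required")
    return ["dvc", "add", *paths]


def dvc_files_for(paths: Sequence[str]) -> List[str]:
    """`dvc add data/x.csv` writes data/x.csv.dvc; `dvc add images/` writes images.dvc."""
    return [f"{p.rstrip('/')}.dvc" for p in paths]


def track_with_dvc(paths: Sequence[str], repo_dir: str = ".") -> None:
    """Hypothetical helper: track paths with DVC, then stage the .dvc files in Git."""
    # check=True raises CalledProcessError if dvc exits with a non-zero code.
    subprocess.run(dvc_add_command(paths), cwd=repo_dir, check=True)
    # Git only needs the small .dvc pointer files; the data stays out of Git.
    # DVC also updates a .gitignore next to the data, which you may want to commit too.
    subprocess.run(["git", "add", *dvc_files_for(paths)], cwd=repo_dir, check=True)


if __name__ == "__main__":
    track_with_dvc(["data/sample.csv"])  # hypothetical data file
```

After this runs, a regular `git commit` records the new data version.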
Thank you for sharing these useful hands-on tutorials about DVC. I wonder how DVC compares to MLflow or Airflow. DVC performs pipeline orchestration and some sort of experiment tracking. Can we say one wouldn't need MLflow or Airflow anymore if they use DVC?
@babak21x You could replace MLflow with DVC, but they can also work together. DVC provides more thorough reproducibility. You can see info about that in this video (precise time provided): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-E26IaD7bNXg.html Regarding Airflow: DVC usually cannot replace it, except in cases where Airflow is overkill and not really needed (for example, when everything runs on a single machine, DVC can replace it).
Hi @urimtefiki226! Can you provide more context on your question? Adding a reference here that came across our radar recently and will likely be in our February newsletter. Vector database comparison: vdbs.superlinked.com/
@wayne7936 Thanks for pointing this out! We need to fix this! Take a look at the one directly from the conference here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-J3vUMwG8dks.html. In the meantime, I will try to get a better version up on our channel!
The -d flag shown at 3:33 is not working in v3.42.0, but the --default flag worked perfectly. Also, the authentication has changed a little bit. This is for those who are watching in 2024.
@davidaliaga4708 You are correct, sir! We love an astute viewer! ❤️ dvc run was deprecated and replaced with dvc stage add to set up your stages with dependencies and outputs. You can find the documentation here: dvc.org/doc/start/data-management/data-pipelines Once your pipeline is set up, you can run dvc repro to run only the stages that have changed!
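To make the two-step flow in the reply above concrete, here is a minimal sketch driving the CLI from Python, assuming dvc is installed and on PATH. The stage name, script, and paths are hypothetical. `dvc stage add` only writes the stage into dvc.yaml; `dvc repro` then executes it. `dvc exp run` would also execute the pipeline, but note that its `-n` option names the experiment, not a stage.

```python
import shlex
import subprocess
from typing import List, Sequence


def stage_add_command(
    name: str, cmd: str, deps: Sequence[str], outs: Sequence[str]
) -> List[str]:
    """Build argv for `dvc stage add`, which records a stage without running it."""
    argv = ["dvc", "stage", "add", "-n", name]
    for dep in deps:
        argv += ["-d", dep]  # dependency: changes here make the stage stale
    for out in outs:
        argv += ["-o", out]  # output: cached and tracked by DVC
    # The stage command goes last, split the way a shell would split it.
    argv += shlex.split(cmd)
    return argv


def add_and_reproduce(
    name: str, cmd: str, deps: Sequence[str], outs: Sequence[str], repo_dir: str = "."
) -> None:
    """Hypothetical helper: define the stage, then run the pipeline."""
    subprocess.run(stage_add_command(name, cmd, deps, outs), cwd=repo_dir, check=True)
    # `dvc repro` skips stages whose dependencies, outputs, and command are unchanged.
    subprocess.run(["dvc", "repro"], cwd=repo_dir, check=True)


if __name__ == "__main__":
    add_and_reproduce(
        "train",
        "python src/train.py",
        deps=["src/train.py", "data/train.csv"],
        outs=["model.pkl"],
    )
```

Running `dvc stage add` again with the same stage name will complain that the stage already exists, so in practice you define a stage once and then just call `dvc repro`.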
@mdavuzar9300 Thank you for the question! Both tools indeed accomplish many of the same things, but the key differentiator is that DVC Studio (name has been changed) is Git-based. You are building your end-to-end MLOps process on infrastructure you already use (Git) instead of saving your ML workflows and processes in another server. This enables you and your team to be set up for success and reproducibility through every step of the process to production.
@jainamdoshi7109 Thanks for the question! Yes you can! Check out this doc to set up a local remote: dvc.org/doc/user-guide/data-management/remote-storage#file-systems-local-remotes
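As a quick illustration of the local remote from the doc linked above, here is a minimal sketch assuming dvc is installed and on PATH; the remote name "localstore" and the /tmp/dvcstore directory are hypothetical. It uses the long `--default` form of `-d`, which sets the remote that `dvc push` and `dvc pull` use when no remote is named.

```python
import subprocess
from typing import List


def remote_add_command(name: str, url: str, default: bool = True) -> List[str]:
    """Build argv for `dvc remote add`; a plain local directory path works as the URL."""
    argv = ["dvc", "remote", "add"]
    if default:
        argv.append("--default")  # long form of -d
    argv += [name, url]
    return argv


def setup_local_remote(name: str, path: str, repo_dir: str = ".") -> None:
    """Hypothetical helper: register a local remote, then push tracked data to it."""
    subprocess.run(remote_add_command(name, path), cwd=repo_dir, check=True)
    # Copy DVC-tracked data from the local cache to the new remote.
    subprocess.run(["dvc", "push"], cwd=repo_dir, check=True)


if __name__ == "__main__":
    setup_local_remote("localstore", "/tmp/dvcstore")
```

The remote setting is written to .dvc/config, so committing that file shares the remote with the rest of your team.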
I enjoyed the video, but I have a question: isn't Git LFS accomplishing the same goal? My understanding is that we can already use Git LFS to store large files outside of our repositories, but still track their versioning. What would be the advantage of using DVC instead of Git LFS in this case?
@douglasmsantos Thanks for the question! Here's a great blog post from one of our Community members that addresses the issue and why they switched: mlops.systems/tools/redactionmodel/computervision/mlops/2022/05/24/data-versioning-dvc.html And you can check out our docs around the issue here: dvc.org/doc/user-guide#comparison-with-related-technologies