This is my second time going through your Ray videos, about 6mo after first watching them. I just want to say not only is this some of the most thorough and digestible Ray content I've come across, it's also some of the best technical tutorial/explanation content I've come across. You have a real knack for this stuff.
Listen man... this can't be an "I made five videos and it's done" kinda thing. You should pick this back up! This series is one of the best explanations of a topic like this I've ever seen on RU-vid. I know it probably took a lot of work to make these even though they're "simple" videos, but they are so well thought out and planned and executed. You have a talent for explaining this kind of stuff.
Hi Jonathan, I am wondering whether serverless will replace Flink and Spark in the future? I am thinking Beam could be something a serverless platform uses to replace Flink and Spark.
I see them as somewhat complementary. Serverless is really an infrastructure concept, whereas Spark/Flink/Ray/etc. are more programming models (kind of analogous to the distinction of user space vs kernel space for personal computers). So you can in theory have a serverless deployment of Flink/Spark/etc., and that is exactly what products like AWS Serverless EMR provide.
@@JonathanDinu Thanks for the reply! Btw, if I have Beam running on top of a FaaS framework that provides alternative runners other than Spark/Flink, does this make sense? Is it an alternative and direct competitor to them?
@@wilsonnybinghamton without knowing the exact FaaS platform internals it is hard to say for sure, but if you are writing/running Beam it shouldn't really matter what the underlying runners are.
are there any guarantees when you call ray.get() to get age that all of the grow_older invocations have resolved? I presume yes for each actor (in line with each actor being synchronous)
yeah, that is the case for a specific actor but not necessarily guaranteed between actors. To synchronize between actors you usually pass object references between them via remote() function calls.
Hey Jonathan, this is great! Thank you! Quick question: the GH repo link you posted is not taking me to the one in your video, and I also tried to look up the content using GH search but I can't find it!
sorry about that, I changed the repo name (so the link in the video is outdated), but the code and examples should be the same as what is in github.com/jonathandinu/spark-ray-data-science. Let me know if that doesn't work
Amazing, I was searching for this explanation. Is there a way to make only a single method inside a class remote? Maybe by overriding the remote method accordingly?
hmmmm, I actually haven't ever tried that. You could try using a regular (un-decorated) Python class and applying the @ray.remote decorator to just a single method. The thing to keep in mind though is what variables/state from the enclosing class the remote method might access.
Awesome explanation! (Someone in the comments section said this is an understatement, but I can't think of any other adjective!😅) But seriously, great videos.
Hi! I have set up a local cluster and connected another laptop, but the client node (the second laptop) won't do any work at all: no CPU usage or memory. What is the issue? Please help. All the work is done by my host laptop (laptop 1). How can I do load sharing?
Hey, when I start Ray on my own system it starts up, and I connect another laptop using ray start --address='xxxxx' --redis-password='xxxxxx'. The laptop gets connected to my computer, but when I run a heavy task only my computer uses 80% of its memory and CPU, while the worker laptop doesn't use much CPU or memory at all. It gets connected but isn't contributing its resources, and I wonder why. Do you have any idea where I am going wrong? Thanks
it is hard to say without knowing more about the specifics of the machines and the code that is running. My guess is that the task might not need more resources than the one computer already has (hence only 80% utilization). So Ray might be using only the resources it needs from the single machine, and since it is more efficient to avoid communication over the network, it never uses the "cluster"
Hi Jonathan - This is your 450th subscriber speaking. Just found this series and I'm absolutely loving it! Please continue making more content on Ray (and maybe RLlib!) Let's get you to 100k soon!
conceptually they are a little similar, but Ray has a higher level API that feels very Pythonic, has built in fault tolerance, and likely different performance characteristics for different types of jobs. From a technology standpoint, Ray is much more similar to Akka than it is to MPI.
I believe that the default is to use Ray's native scheduling, usually referred to as the autoscaler. In practice though, you usually run Ray on a cloud provider (like GCP or AWS) or on-premise using Kubernetes. Ray has a nice built-in module to manage all of this complexity for you: docs.ray.io/en/latest/cluster/deploy.html
Thank you for this video. I was wondering: were Ray Datasets released after you made this video? It must be the case, because otherwise I'm sure you'd have mentioned them.
yeah, at the time of the video the only official Ray modules were the ones from the diagram, and the ecosystem has actually changed pretty dramatically in the time since
This is very well explained. I particularly liked the second section as few people actually talk about sustainable/scalable machine learning deployments!
let me know if the Docker setup earlier in the video doesn't work for you. Unfortunately the dashboard is somewhat dependent on your individual machine setup, but hopefully Docker helps. I also set up a Discord server for general chat and Q&A: discord.gg/nbyZ6EpUum
@@JonathanDinu Actually I already had Docker installed on my machine, but the process to install Ray in Docker seems too complicated for me. I don't have that much experience with Docker tbh, so I was just using Ray on my Anaconda server in a Jupyter notebook. Thanks for the Discord server!
Thank you. This is by far the best overview of the distinction between these platforms. I'm a long-time Spark user and Ray newbie and this breakdown rings true. I really like Ray for hyperparameter tuning and model serving.
Ray has a pretty nice built-in cluster launcher that manages all the coordination needed to set up the library on multiple machines (or deploy to the cloud). docs.ray.io/en/master/cluster/index.html
only the Actor API can be thought of as similar to Akka. The Ray project itself has many other components and is designed for ML and reinforcement learning, hence the focus on Python. So even though you could probably do similar things with Akka, they have somewhat different target audiences and use cases.
Thanks for the clear explanations! I'd be interested to see an example of recovery in a multi node cluster where a node fails and its actors are recovered on another node.