The Data Entrepreneurs
A community of entrepreneurs in the data space.

Our goal is to promote the growth & development of data people, projects, and businesses.
Comments
@pauliusztin 3 days ago
Incredible chat, guys! Incredibly valuable for solopreneurs in the data space.
@TheDataEntrepreneurs 3 days ago
Thanks Paul! Glad it was helpful 😁
@underfitted 14 days ago
Thanks for having me!
@TheDataEntrepreneurs 14 days ago
Thanks for sharing your insights! I'm already rethinking my MVP validation strategy 😅 -Shaw
@elevatedschool 1 month ago
Brilliant! What an insightful talk Tarun!
@SeattleDataGuy 1 month ago
Thanks for having me on!
@TheDataEntrepreneurs 1 month ago
It was a blast! Thanks for sharing your journey and insights :)
@inishkohli273 2 months ago
Completed the whole video; got a lot of insights and had my questions cleared. How can I join the next workshop?
@TheDataEntrepreneurs 2 months ago
Glad it was helpful! You can keep up with all our upcoming events here: lu.ma/tde -Shaw
@smellypunks 3 months ago
It is a shame that the lazy API is so entangled with the eager API. It might be nice to write generic code that can switch on the lazy API with one single change. I don't like the idea of having to rewrite the whole codebase to switch between lazy and eager. I question whether that was a good design decision from Polars. Side note: please always upload videos in 1080p.
@ShawhinTalebi 3 months ago
Here's my solution: cmd+f "scan_" replace with "read_" 😂 P.S. I'm on Mac
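For a less joking take on the same idea, here is a minimal sketch of an eager/lazy switch, assuming a recent Polars release (the file name and column names are illustrative): write the transformations once against a LazyFrame and flip only the entry point.

    import polars as pl

    def top_decades(lf: pl.LazyFrame) -> pl.LazyFrame:
        # Transformations written once against the lazy API.
        return (
            lf.with_columns((pl.col("Age") // 10 * 10).alias("decade"))
              .group_by("decade")
              .agg(pl.len().alias("n_observations"))
              .sort("n_observations", descending=True)
        )

    # The one-line switch between lazy and eager entry points:
    lazy_result = top_decades(pl.scan_csv("data.csv")).collect()
    eager_result = top_decades(pl.read_csv("data.csv").lazy()).collect()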
@user-yj3mf1dk7b 4 months ago
Man, we can read. Why read everything on the screen?
@TheDataEntrepreneurs 5 months ago
🎥 Full talk: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-JyocXRkiIcA.html
@abdelkaioumbouaicha 5 months ago
📝 Summary of Key Points:
📌 Evaluators for AI Models: The workshop focused on the importance of evaluators for AI models, especially large language models (LLMs), to ensure they are performing correctly.
🧐 Types of Evaluations: Different types of evaluations were discussed, including human review, user feedback, implicit user feedback, comparing against a reference, and ungrounded evaluations.
🚀 Building Evaluators: The workshop delved into building evaluators using LLMs, breaking down tasks into simpler steps, and utilizing techniques like faithfulness evaluation and context relevancy scoring.
💡 Additional Insights and Observations:
💬 Evaluation Techniques: Using LLMs for evaluations simplifies complex tasks, and keeping evaluations simple reduces the chance of errors.
📊 Cost and Latency: Evaluations are recommended for analysis and development rather than real-time applications, due to latency concerns and API costs.
🌐 Custom Evaluators: Consider using functions as evaluators, or text-embedding-based approaches, for cost-effective evaluation strategies.
📣 Concluding Remarks: The workshop emphasized the significance of evaluators for AI models, highlighted various evaluation techniques using LLMs, and suggested cost-effective evaluation strategies for robust environments. It provided insights into setting up evaluators in CI/CD pipelines and the importance of simplicity in evaluation tasks.
Generated using TalkBud
@TheDataEntrepreneurs 5 months ago
Very cool, thanks for sharing!
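As a rough illustration of the LLM-as-evaluator idea summarized above, here is a minimal faithfulness scorer, assuming the OpenAI Python client (the model name and prompt are illustrative, not from the workshop):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def faithfulness_score(context: str, answer: str) -> int:
        """Ask an LLM judge how grounded the answer is in the context (1-5)."""
        prompt = (
            "Rate from 1 to 5 how faithful the ANSWER is to the CONTEXT. "
            "Reply with a single digit only.\n\n"
            f"CONTEXT:\n{context}\n\nANSWER:\n{answer}"
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}],
        )
        return int(response.choices[0].message.content.strip())

    print(faithfulness_score("Polars has a lazy API.", "Polars is lazy-only."))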
@TheDataEntrepreneurs 6 months ago
🎥 Full Interview: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-uP-3eA9JckM.html
@ifycadeau 6 months ago
Such a good interview! Thank you both, this was very helpful 🤗
@TheDataEntrepreneurs 6 months ago
Glad you enjoyed it! :)
@TheDataEntrepreneurs 6 months ago
🎥 Full Talk: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-72UdRcfzYcg.html
@RajivSambasivan 7 months ago
Businesses have a ton of tabular data. What we need is to create knowledge representations out of it through knowledge discovery, apply reasoning and inference to them, and then manage the life cycle of those representations. This is a use-case-specific workflow that requires domain and AI expertise. Natural language representations are great, ChatGPT is great, but... we do have a ton of tabular data too. Why should we have to rediscover knowledge representations for these problems just because ChatGPT came along?
@TheDataEntrepreneurs 7 months ago
That's a great point, Rajiv! Just because LLMs have unlocked so much unstructured data for us doesn't mean we should abandon our existing structured datasets.
@RajivSambasivan 7 months ago
Absolutely resonate with what David says about OpenAI. Good to hear this.
@iskandera1783 7 months ago
Could you please do a session on AI in education?
@TheDataEntrepreneurs 7 months ago
That's a good idea! Anyone you recommend we reach out to?
@jairodiaz-ortiz2194 8 months ago
I liked the video and got some takeaways from it. But I did think I was going to get a tutorial lol
@TheDataEntrepreneurs 8 months ago
The “how to” does give the impression of a tutorial. Jérémy will be coming back next year for a more hands-on workshop on building an AI assistant 😁
@ifycadeau 8 months ago
Great panel! Thanks for sharing! :)
@TheDataEntrepreneurs 8 months ago
Glad you liked it!
@TheDataEntrepreneurs 8 months ago
🎥Full Talk: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-HA5v3w-5Rpw.html
@chrstfer2452 8 months ago
That was awesome
@TheDataEntrepreneurs 8 months ago
Glad you liked it Chris 😁
@crossray974 8 months ago
Amazing stuff, Shaw. I watched your entire video series. My self-assessment of skills: high expertise in a service company working mostly with contracts and third parties, and medium technical knowledge (writing my Master's thesis on LLMs, concept knowledge, LangChain, some Python, and so on). What is the best way on Hugging Face to fine-tune a model based on, say, only 10 real estate contracts written in Word, to create a text generator that completes a specific clause, e.g. "The tenant is not obliged to rebuild, but must {...}", where the brackets represent the text to be generated? Any input is highly appreciated :) Thank you so much! Followed on Medium as well...
@TheDataEntrepreneurs 8 months ago
Thanks, glad you enjoyed the series 😁 At first glance, 10 examples is not enough for effective fine-tuning, but it’s definitely worth a shot. I’d recommend setting up time to chat during “office hours” so we can dig into this a bit more. calendly.com/shawhintalebi/office-hours -Shaw
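For anyone curious what a first attempt might look like, here is a minimal sketch using Hugging Face transformers; the model choice, example text, and hyperparameters are all illustrative, and, as noted above, 10 contracts is likely too little data:

    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "distilgpt2"  # illustrative small causal LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Plain text extracted from the contracts (one string per document).
    contracts = ["The tenant is not obliged to rebuild, but must restore ..."]
    ds = Dataset.from_dict({"text": contracts})
    ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="contract-lm", num_train_epochs=3),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()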
@chrstfer2452 8 months ago
This was a really good one. I guess there's a Discord? Is it paid, or can I join?
@TheDataEntrepreneurs 8 months ago
Yes, there is a Discord and it's free! (and always will be 😁) Link: discord.gg/RSqZbF9ygh
@95Harshal 9 months ago
This is good! <3
@TheDataEntrepreneurs 8 months ago
Glad you liked it!
@TheDataEntrepreneurs 9 months ago
📹Full Discussion: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-VKLLyv9cJSQ.html
@ravishmahajan9314 9 months ago
Can Polars replace PySpark or Hadoop?
@TheDataEntrepreneurs 9 months ago
Good question. Here's a response from Ben: "I'm not entirely sure, tbh. I'm pretty sure PySpark is more scalable (e.g. > 1 TB data), but Polars is better for data processing on your local machine (e.g. < 1 TB). I don't think Polars has as much stuff as PySpark does for distributed computing, whereas that is pretty much what PySpark was built for, afaik."
@malikrumi1206 9 months ago
Talk talk TALK talk talk tALK talK *TALK*! NO code, no realistic use cases and examples - none of that. I can't get those 45 minutes back, gentle reader, but now you don't have to lose them.
@TheDataEntrepreneurs 9 months ago
Thanks for the feedback. This talk is definitely more high-level and conceptual. Given the popularity of the topic, we'll need to bring Jeremy back for another more technical and hands-on workshop.
@TheDataEntrepreneurs 9 months ago
📹Full Interview: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-S2_rgEIwy5I.html
@GetPaidToLivePodcast 9 months ago
Super insightful! Thanks for sharing 🙌🏾
@TheDataEntrepreneurs 9 months ago
Glad it was helpful!
@TheDataEntrepreneurs 9 months ago
📹Full Recording: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rTRfNQFY4ao.html
@pietraderdetective8953 9 months ago
Hey, great channel you've got here... keep it up! Liked and subscribed!
@TheDataEntrepreneurs 9 months ago
Thanks! Glad you're enjoying the content :)
@MartyAckerman310 9 months ago
I use PySpark and Pandas every day at work and am interested in learning more about Polars. I've messed with Polars a little, and it kind of seems like a cross between the two. Looks like Polars has window functions, which is something Pandas is missing compared to PySpark. One issue I see in general is that there's a whole lot of really bad Pandas code out there, often written by people coming from Matlab who don't bother to really learn Pandas before deploying code. So be really careful of Pandas code you get out of SO or ChatGPT. And your Pandas example at 17:22 could be vastly improved by dot chaining, which is hard to show in YT comments but I'll give it a shot:

    decade_counts = (
        pd.DataFrame(...)
        .assign(decade=lambda x: x['Age'] // 10 * 10)  # integer division bins ages into decades
        .groupby('decade')
        .agg(n_observations=('decade', len))
        .sort_values('n_observations', ascending=False)
        .iloc[:10]
        .plot(kind='bar')
    )

I added the lines starting at sort_values to show how to plot a top-10 bar chart of the data. The key here is to dot-chain and avoid assignments.
@TheDataEntrepreneurs 9 months ago
Thanks for sharing!
@jaysonp9426 9 months ago
I agree that a handful of people can be a company using AI... I don't get why it wouldn't lead to company layoffs...
@TheDataEntrepreneurs 9 months ago
My interpretation of Jeremy's comment was that while it may lead to layoffs in the short run, AI innovations will lead to job growth long-term.
@jaysonp9426 9 months ago
@@TheDataEntrepreneurs yeah, I just don't see it. With every other revolution there was a place for humans. Now we're no different than horses when the car was invented. This is a GOOD thing. All that matters is that world assets continue to rise in value. In the short term I think entrepreneurship will be at its best and in the near short term we'll just all be free to do/create what we want. I don't see a world in 5-10 years where anyone works for anyone.
@TheDataEntrepreneurs 9 months ago
I could see it. Thanks for sharing your insight!
@TheDataEntrepreneurs 9 months ago
📹Full Recording: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-nq6Td5aUZpE.html 📰Read more: shawhin.medium.com/a-data-entrepreneurs-guide-to-llms-af629a088a6f
@TheDataEntrepreneurs 10 months ago
🎥 Watch the full talk: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-EsCa_bO-MuY.html
@spikeydude114 10 months ago
Although I see the benefits of Polars, I haven't hit enough obstacles with Pandas in my workflows. I don't deal with datasets that exceed memory, and I think I can currently extend my memory limit using Dask... but I'm looking forward to the development of Polars and will likely adopt it once it has more support!
@virushk 10 months ago
Same situation here. I find Pandas and Dask to be sufficient tools for my workflows
@JOHNSMITH-ve3rq 9 months ago
ChatGPT knows pandas much better. For exploratory work that's probably not an issue. But if shipping something to prod and you want to keep it very fast and minimise system resources, then Polars seems a better choice.
@samuelswatson 9 months ago
To me the appeal is the coherence of the API and the superior execution model. But the ecosystem disadvantages associated with using a much less popular library are substantial.
@signoc1964 8 months ago
@@samuelswatson But Polars has a to_pandas() method, so the disadvantages are easily overcome. It's more that if you're doing simple things, it's unnecessary to bring in Polars. We replaced a lot of advanced ELT (not ETL) with Polars: 16,000 lines of SQL code, with the main transforms done in Polars instead. For this task it's excellent and translated really well, and a lot of stuff is easier to do in Polars than in SQL, for example. Doing the same in pandas is a nightmare; translating advanced SQL code to pandas is a hard job.
@samuelswatson 8 months ago
​@@signoc1964 That seems to me to be the best use case for Polars (replacing complex SQL in transformation pipelines, especially because of its composability), so it's cool to hear another testimonial for its success in that context.
@Entrepreneur_in_progress 10 months ago
It is a very informative video, thanks a lot, Shaw. Regarding the LLM business opportunities, I think the biggest opportunity in the field for the next decade is to help companies get LLM-ready data sets. The computing costs will eventually decrease as more companies join the GPU race. Companies will realize their data is valuable and want their own small model adapted to their own use cases (vertical + sector). However, only a small proportion of companies have a clear data strategy and have their data ready for LLMs. So the opportunity is huge for companies like Databricks, as well as for vendors who can just jump in and help companies sort out their data.
@TheDataEntrepreneurs 10 months ago
That's a great point. As organizations become more interested in developing custom LLMs, there will be a large demand for data curation and preparation. Thanks for sharing your insight!
@wayneqwele8847 9 months ago
Agreed, vector dbs will be key!
@ifycadeau 10 months ago
Great talk! I’ve been looking for this!
@TheDataEntrepreneurs 10 months ago
Thanks, hope it helped!
@Vilijam1974 10 months ago
Thank you so much! Now I understand what my programmers are talking about :)
@TheDataEntrepreneurs 10 months ago
Glad it was helpful :)
@DarrenSaw 10 months ago
Pandas is a massive mess. It's very easy to write very poor code in Pandas, but writing it well is not that intuitive. Matt Harrison has written some great stuff, but it's not that easy to learn. Polars is way better and improving all the time. It's much easier to write and way quicker. The lazy API is a thing of beauty.
@TheDataEntrepreneurs 10 months ago
I'm looking forward to using Polars more in my own workflow -Shaw
@MartyAckerman310 9 months ago
I agree, Pandas' learning curve was steeper for me than R's. But I've kind of settled on a consistent workflow (.loc[:, ['col']] instead of ['col'], and dot-chaining) that minimizes the surprises.
@signoc1964 8 months ago
One problem with Polars, though, is that pandas developers tend to write Polars code the way they write pandas code, and to some extent that is possible, which sets a bad example; I've seen a couple of those. Polars then becomes like pandas, executing in serial instead of in parallel.
@VinniePazo 10 months ago
Thanks for sharing this. Very valuable information for anyone who wants to understand MLOps.
@TheDataEntrepreneurs 10 months ago
Thanks Vinnie. I'm glad it was valuable!
@larceblake9436 10 months ago
This is exactly what I needed, appreciate you Shawhin + Chris!
@TheDataEntrepreneurs 10 months ago
Thanks Larce! Glad it was helpful 😁
@user-iz5rp4fl2q 11 months ago
Great Job Ben! 👍
@feifa13 11 months ago
Thanks Ilia!
@nikhiljaiswal1411 1 year ago
Hi. You say not to do the transformations in the feature engineering pipeline, but you include steps like "standardizing" and "vectorize". Let's say I have some text data and I'm using a BoW vectorizer. How can I include that in this FE pipeline? This is ideally done/fitted on the train data, if I'm not wrong. Can you please explain a little more about that?
@TheDataEntrepreneurs 11 months ago
Great question! I'll share it with the speaker.
@TheDataEntrepreneurs 11 months ago
Paul dives into more detail on FE pipelines here: towardsdatascience.com/a-guide-to-building-effective-training-pipelines-for-maximum-results-6fdaef594cee
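As a rough sketch of the usual answer to the question above (scikit-learn with toy data; not from Paul's article): keep the BoW vectorizer inside the pipeline, so it is fitted on the training split only and merely applied to the test split.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    texts = ["good product", "bad service", "great support", "terrible app"]
    labels = [1, 0, 1, 0]
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.5, random_state=0, stratify=labels
    )

    pipe = Pipeline([
        ("bow", CountVectorizer()),    # bag-of-words vectorizer
        ("clf", LogisticRegression()),
    ])
    pipe.fit(X_train, y_train)          # vocabulary is learned from the train split only
    print(pipe.score(X_test, y_test))   # test data is only transformed, never fitted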
@joaopaulocostanogueira1544 1 year ago
Great talk! Will you also share the slides?
@TheDataEntrepreneurs 1 year ago
Glad you enjoyed it. Slides are in the Discord: discord.com/channels/1054532836620775505/1097189045613903992/1135701955906379819
@sherjeelhashmi4828 1 year ago
Crushing it, Paul. Keep the content coming!
@LucidDataAnalytics 11 months ago
Valuable session. Thank you!
@TheDataEntrepreneurs 1 year ago
Glad you enjoyed it!
@danielmicoski4233 1 year ago
This is wonderful content! Loved it.
@TheDataEntrepreneurs 1 year ago
Thanks, glad you enjoyed it!
@user-wr4yl7tx3w 1 year ago
Are the slides available?
@TheDataEntrepreneurs 1 year ago
Yes, they are shared in the Discord discord.com/channels/1054532836620775505/1097189045613903992/1116496987857109143