The latest machine learning, A.I., and data career topics from across both academia and industry are brought to you by host Dr. Jon Krohn on Super Data Science, the most listened-to podcast in the industry. In lighthearted conversation with renowned guests, Jon cuts through hype to fuel your professional impact.
Whether you're curious about getting started in a data career or you're a deep technical expert, whether you'd like to understand what A.I. is or you'd like to integrate more data-driven processes into your business, we have inspiring guests and lighthearted conversation for you to enjoy.
We cover tools, techniques, and implementation tricks across data collection, databases, analytics, predictive modeling, visualization, software engineering, real-world applications, commercialization, and entrepreneurship: everything you need to crush it with data science.
This struck me as a really bad representation of Bayesian statistics. The whole point of Bayesian statistics is that the data is given; it is what it is. You don't revise the data on the basis of the process, you just come up with better estimates of the salient parameters.
Oh my god, this is brilliant xD But this makes me more conscious of whether a podcast uses actual data or not. If it doesn't take any effort, there will probably be a lot of trash as well. Before, nobody would go out of their way to build some low-quality podcast ...but I guess now you can xD (Still happy to live in this day and age)
When I took linear algebra in college I struggled to understand how any of it could be applied in the real world. I felt like I was just doing arbitrary computations to get meaningless results. I wish I could see the world in the way that a pure mathematician does.
One great thing about scikit is the infrastructure around data pipelines and transformations. You can use the skorch library to wrap your PyTorch model so that it works in a scikit pipeline.
I still get good results from building classifiers from word embeddings. A homemade solution is cheaper, faster, easier to control and manage, and usually performs better.
@RustIsWinning We can identify unnecessary and overcomplicated elements by comparing them with long-established and well-designed languages. By the way: X-Gen, the last generation with a refined aesthetic sense.
@toragodzen Alright, I apologize for assuming your generation. But you still haven't explained what's so "ugly". Any examples? Maybe you don't understand generic programming?
SQL: DuckDB is designed for data science; Posit had an excellent talk by the developer (on RU-vid). DuckDB has an enhanced client protocol, and it helps me handle much larger datasets spread across dozens of files (Florida Voter File).
BI: Evidence looks good for quick-and-dirty BI (like an explorer for a single file); Quarto/Shiny for more customizable dashboards. Quarto is easier to learn in Posit; the RStudio/RMarkdown muscle memory (8 months of full-time Coursera Data Science) does not get in the way.
Spatial: lots of potential value because of good standards in the geospatial community. The loss of Basemap in Python hurts, but GeoPandas looks interesting (BTW, DuckDB has a spatial extension modeled on Postgres PostGIS).
Thanks for asking - check the part at 3:30. Depending on the operation, staying Polars-native (as opposed to converting to and from pandas) can make a massive performance difference.