Rearchitecting a SQL Database for Time-Series Data | TimescaleDB


ABOUT THE TALK:
Today everything is instrumented, generating more and more time-series data streams that need to be monitored and analyzed. When it comes to storing this data, many developers start with some well-trusted system like PostgreSQL. But when their data hits a certain scale, they often give up its query power and ecosystem by migrating to some NoSQL or other "modern" time-series architecture.
In this talk, I describe why this perceived trade-off isn't necessary, and how we've built an efficient, scalable time-series database engineered up from PostgreSQL. In particular, the nature of time-series workloads one finds in devops, monitoring, IoT, finance, and elsewhere -- inserting new data about recent events -- presents very different demands than general transactional (OLTP) workloads. We've architected our time-series database to take advantage of and embrace these differences.
The system architecture automatically partitions data across both time and space, while exposing the illusion of a single continuous table -- a hypertable -- over all of your data, whether it is spread across one server or many. Its distributed query optimizations hide the fact that users are interacting with many "chunks" of data, which are right-sized by volume and time constraints, and minimize which chunks are accessed, and how, to answer queries. In fact, the database supports "full SQL" against this hypertable (e.g., secondary indexes, rich query predicates and group bys, aggregations, windowing functions, upserts, CTEs, JOINs).
Through performance benchmarks, I show how the database scales much better than PostgreSQL, even on a single node. In particular, it avoids the "performance cliff" that vanilla PostgreSQL experiences at tens of millions of rows, while maintaining robust performance past 100B rows. The database is implemented as a PostgreSQL extension, released under the Apache 2 license.
ABOUT THE SPEAKER:
Michael J. Freedman is a Professor in the Computer Science Department at Princeton University, as well as the co-founder and CTO of Timescale, building an open-source database that scales out SQL for time-series data.
His work broadly focuses on distributed systems, networking, and security, and has led to commercial products and deployed systems reaching millions of users daily. Honors include a Presidential Early Career Award (PECASE), SIGCOMM Test of Time Award, Sloan Fellowship, DARPA CSSG membership, and multiple award publications.
ABOUT DATA COUNCIL:
Data Council (www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
FOLLOW DATA COUNCIL:
Twitter: datacouncilai
LinkedIn: datacouncil-ai
Facebook: datacouncilai
Eventbrite: www.eventbrite.com/o/data-cou...

Science

Published: 29 Jul 2024


Comments: 6

@kaustubhtrivedi6502 3 years ago

Wow, really impressed by the possibilities TimescaleDB may open up, thanks for this amazing talk

@ragequilt_ 2 years ago

Sounds absolutely insane! Can't wait to use it in a project.

@samkerridge6573 2 years ago

"Clear message, clear structure, easy to understand, thank you."

@HermannSchwarz 2 years ago

Great work! Could you describe more examples or use cases for TimescaleDB, please? Especially for use with web analytics tools like Matomo. Is TimescaleDB useful for queries on web-log tables, where we don't yet have any statistical or numeric data? Or is TimescaleDB only useful if we already have tables with some aggregated/calculated (numeric) data? Thank you!

@npc73x 1 year ago

So does that mean time series and blockchain are not that different? Is the only difference the possibility of updates? Also, considering GDPR, time series is the best of both worlds.