PostgresTV 💙💛
Welcome to Postgres TV, a place where we discuss all things PostgreSQL.

This channel is organized and maintained by Ilya Kosmodemyansky (DataEgret.com) and Nikolay Samokhvalov (Postgres.ai).

For inquiries: contact@postgres.ai
Comments
@jirehla-ab1671 2 days ago
If I were to add a message queue for real-time OLTP database workloads, wouldn't that also introduce latency, making the OLTP workload no longer real-time? If so, what's the point of message queues if they're going to be used for real-time OLTP workloads?
@NikolaySamokhvalov 2 days ago
The whole point of a message queue is to "detach" some work and make it asynchronous. This lets you respond faster (lower latency) while still guaranteeing that the work will be done. And if that work is done very soon, it's almost real time. But you're right, there is a certain trade-off here, and this "detaching" usually makes sense once the system becomes more complex.
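(For illustration only - a minimal sketch of the "detach work" idea using a plain Postgres table as the queue; table and column names are hypothetical:)

-- Producer, inside the user-facing OLTP transaction: a cheap INSERT keeps latency low.
CREATE TABLE job_queue (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  payload    jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO job_queue (payload) VALUES ('{"action": "send_email", "user_id": 42}');

-- Consumer, an async worker: grab one job, skipping rows other workers already hold.
WITH next_job AS (
  SELECT id FROM job_queue
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
DELETE FROM job_queue USING next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.payload;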
@davidfetter 6 days ago
If that unix socket regression is real, it's very likely a bug. Also, the fact that there's a huge difference between the TCP version and the unix socket version suggests that there are improvements to be had in the listener code.
@deadok68 11 days ago
Hi guys, very proud of you and your creativity. Found this channel through some of Nikolay's podcasts from about 5 years ago and ended up here; I've already listened to almost 1/3 of the episodes, thanks!
@RU-qv3jl 11 days ago
It would also be interesting to use different machines and try it out with the different connection pooling options. You would add latency for sure. Sadly I don't have the credits to try something like that :(
@mehmanjafarov5432 12 days ago
Hi @NikolaySamokhvalov. I regularly listen to your podcasts, and I've been actively researching memory management topics in the documentation as well as various sources on Google. While doing so, I came across several misleading or inaccurate blog posts about certain cases. So I have a specific question: are vacuum buffers considered a form of local memory (similar to work_mem) or shared memory? Thanks
@NikolaySamokhvalov 9 days ago
Shared (a 256 kB ring buffer).
@NikolaySamokhvalov 9 days ago
See also: BUFFER_USAGE_LIMIT and vacuum_buffer_usage_limit in PG16+.
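(A minimal usage sketch, assuming PG16+ and a hypothetical table name:)

-- Per-command: raise the ring buffer for a one-off manual VACUUM.
VACUUM (BUFFER_USAGE_LIMIT '64MB', VERBOSE) my_table;

-- Server-wide default used when no BUFFER_USAGE_LIMIT option is given:
SHOW vacuum_buffer_usage_limit;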
@RU-qv3jl 15 days ago
Really good content as always, thanks for sharing your knowledge.
@Neoshadow42 17 days ago
Subtitles are incredible, thanks guys!
@poppop101010 17 days ago
Great content, thanks for the effort!
@kirkwolak6735 18 days ago
I loved this one! I love how PG allows the entire record to be easily encoded and stored. We implemented an audit feature like this in Oracle, and it was way too much code. We stored the OLD and NEW record. When I saw how easy it was for a single table in PG... I started falling in love. For us, the table name and the timestamp were always attached.

To answer the question "How do you show this to the manager?" (the record changes): assuming you stored the table_name and table_id columns with it, you would create a visible link that points to that record if an updated record exists. And if it is a deleted record, you either need to merge it into the results or show it near the messages with a roughly similar timestamp. You don't have to show them - just show that they exist, with an easy way to get to them.

FWIW, on day 1 of training we showed users that all of their edits were stored, and deletes were stored as well. We only had to recover a couple of times.
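(For illustration - a minimal sketch of such a trigger-based audit table in Postgres; all names are hypothetical, and the target table "orders" is assumed to exist:)

CREATE TABLE audit_log (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  table_name text NOT NULL,
  op         text NOT NULL,                -- 'INSERT' / 'UPDATE' / 'DELETE'
  old_row    jsonb,                        -- the OLD record, when applicable
  new_row    jsonb,                        -- the NEW record, when applicable
  changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION audit_row_change() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO audit_log (table_name, op, new_row)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
  ELSIF TG_OP = 'UPDATE' THEN
    INSERT INTO audit_log (table_name, op, old_row, new_row)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
  ELSE
    INSERT INTO audit_log (table_name, op, old_row)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
  END IF;
  RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION audit_row_change();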
@drasticfred 18 days ago
I always add a reserve "flag column" to my tables, usually of type int, no matter what the table is for. It comes in very handy and gives flexibility to glue the table to any other table, service, logic, etc.
@obacht7 24 days ago
Thank you for another nice episode! I like that you started out with a very gentle introduction to what the topic is about, why it is important, and what the main issues are related to Postgres. In some of the past episodes, I was sometimes a bit lost because I couldn't follow your deep knowledge quickly while not knowing enough about the Postgres-specific challenges/internals myself. So thanks for setting the stage a bit for the beginners and Postgres-"foreigners" (pun intended) 👍
@NikolaySamokhvalov 24 days ago
Thanks, I needed to hear this. Passed it to Michael too. I think we'll do it more - basics at the beginning of an episode.
@LearningSFR 24 days ago
Awesome work. I would love to hear more about logical replication under high-intensity workloads (a primary node with hundreds of databases x 1 replication slot per database).
@jianhe5119 24 days ago
At 1:10:35: I use tmux, and when I select text with the mouse it automatically copies it, so I don't have the "search with Google" option.
@chralexNET 25 days ago
In a personal project I am making, I am trying to build a backend and database where foreign keys aren't the default (no-thought) mechanism to use in all cases; I wrote a comment about that for episode 69. I think this video validates a lot of what I am experimenting with, but I do think using foreign keys is okay on tables that you know have very low activity, because it reduces the complexity of the application code if it doesn't have to handle relationships that aren't guaranteed to be valid.

In the end, what I'll end up with is something where foreign keys aren't used, where hard deletes without cascades are used on the root ("parent") tables, and where the application will clean up data during regular maintenance and do the full database vacuum during maintenance. It will work well for my project, because it will have daily maintenance with downtime, where I am aiming for 5 minutes or less. It is a personal project, so 99.99% uptime isn't a concern of mine, but performance during the advertised operational hours of the system is important.

The backend is basically just infrastructure for some game servers for old games, and the thing about these old games is that they get more unstable the longer they run. So the user-facing application (the game server) will have downtime anyway, and I am just using that downtime window for my backend and database as well.
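(A minimal sketch of the kind of maintenance-window cleanup described above - batched removal of child rows whose parent is gone - with hypothetical table and column names:)

-- Repeat until it affects 0 rows; small batches keep each transaction short.
DELETE FROM child_table
WHERE id IN (
  SELECT c.id
  FROM child_table c
  LEFT JOIN parent_table p ON p.id = c.parent_id
  WHERE p.id IS NULL        -- orphaned: parent was hard-deleted without a cascade
  LIMIT 10000
);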
@NikolaySamokhvalov 25 days ago
Thanks for sharing your experience. Worth noting: the "heavy loads" I mention all the time are quite rare - say, starting at 10k TPS. Below that, I would use FKs without hesitation.
@chralexNET 25 days ago
@NikolaySamokhvalov What I am getting from this is experience in building a system that can work without relying on foreign keys, as one thing. It actually tries to do a lot of the things you have been talking about on your streams, even if on a small scale. Most importantly, it will give me experience building this sort of system, both on the application side and the database side - but mostly the application side, because it changes what kind of code should be written.
@chralexNET 25 days ago
@@NikolaySamokhvalov Uhm, I had to edit my comment before because I misread what you wrote, I thought you wrote "worth nothing", but you wrote "Worth noting", so I went on a bit of a tangent. Sorry about that, you can just forget about what I wrote before, unless that is actually what you meant. And it is a good point that I should only expect benefits at the higher TPS.
@NikolaySamokhvalov 24 days ago
@chralexNET No worries. My comment was my own worry that when I talk about edge/corner-case problems, I forget to mention that to hit those problems, you need to grow your workload quite high. So it might give the false impression that "FKs are really bad" - which I wouldn't want to happen. They are good. It's just that really heavy loads are challenging, and edge cases are not good :)
@RU-qv3jl 27 days ago
I mean, I think the benefits of partitioning are obvious. I also think there are a lot of people who don't know the internals and won't think about it. With partitioning it is also worth cautioning not to go too far: by default the planner will only re-order, I think, 8 tables or something like that? So too many partitions can lead to worse plans as you run into the genetic optimiser more quickly, right? I think that would also be worth discussing (says me, just part way through the episode) :) Another really nice chat, by the way, thanks. I always like hearing your thoughts.
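(For reference - the planner settings the commenter seems to be referring to; the values shown are the stock defaults, worth double-checking on your version:)

SHOW join_collapse_limit;   -- default 8: beyond this many items, explicit JOIN order is kept as written
SHOW from_collapse_limit;   -- default 8: limit for collapsing sub-queries into the parent FROM list
SHOW geqo_threshold;        -- default 12: at or above this many FROM items the genetic optimizer is used

(As far as I understand, partitions expanded from a single partitioned table are not separate FROM items for these limits; they mainly matter when many distinct tables are joined.)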
@rosendo3219 1 month ago
Congrats, boys, on episode 100! Always listening to you in my car while driving to my boring job.
@bhautikin 1 month ago
One of the best episodes!
@dshukertjr 1 month ago
Congrats on episode 100! Sorry if this has been covered in past episodes, but I would love to know more about the following: 1. Why do you seem to discourage using foreign keys? 2. It seemed like all three companies rarely perform joins within their databases - do they perform joins in the application layer instead? Is it common for large-scale databases not to join within the database?
@NikolaySamokhvalov 1 month ago
FKs are great and I personally use them everywhere. However, they have risks: 1) the perf overhead required to maintain them (that's usually OK), 2) a perf cliff related to multixact IDs - going to demonstrate it soon with the PostgresAI bot.
@utenatenjou2139 1 month ago
At large scale, having foreign key constraints makes managing data really difficult under a complex structure - imagine when there is data that needs to be corrected. Note: for small data sets, no problem.
@anbu630 1 month ago
Congrats on your 100th episode!! Watched your 1st one, and here's the 100th one as well :-)
@JamesBData 1 month ago
Congrats on reaching episode 100!
@RajanGhimiree 1 month ago
Can you guys make a complete episode on logical replication, from configuration to replicating data from the source server to the replica?
@davidcarvalho2985 1 month ago
Okay, you guys convinced me. I will try pgbadger. Thanks for this interview by the way. Really nice
@kirkwolak6735 1 month ago
So, I was wondering... Wouldn't it be nice if there were 2-3 types of plans based on some of the parameter values, so you get the most optimal plan, and maybe the optimizer does parameter peeking to determine which of the X plans to choose... And then I realized: wow, the application could do this. Create 3 prepared statements for the same query, and execute against the one TUNED for the query parameter values, forcing the best plan to be used by design.

Hmm... We have this situation. We have a complicated search, but when the value we are searching for is small (lots of hits) vs. large (few hits), it wants to choose the wrong plan after a few queries and then a switch. Unfortunately, this is inside a procedure where the statement is prepared for us. We would basically have to duplicate the complex query just to add the condition so that it executes the right way. But I might still try that.
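(A small sketch of the knobs involved; the statement and table are hypothetical, and plan_cache_mode exists in PG12+:)

PREPARE search_items (text) AS
  SELECT * FROM items WHERE name LIKE $1;

-- By default the server may switch to a cached generic plan after ~5 executions.
-- Forcing custom plans makes it re-plan with the actual parameter value every time:
SET plan_cache_mode = force_custom_plan;
EXECUTE search_items('abc%');

-- Alternatively, force_generic_plan pins the generic plan from the start.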
@kirkwolak6735 1 month ago
Yes, you should test with your extensions. You should have a few general procedures you run that exercise all of the extensions. And you should monitor log sizes, in case something is going wrong and it only shows up in the log files. I like using htop on Linux and watching how much memory the various Postgres processes are using, and the total, in case memory consumption has changed - this can lead to issues. Reading the documentation for the release: YES, it is good documentation, but it can feel a bit overwhelming because they document so much...
@Marekobi 1 month ago
This is gold !! :)
@pdougall1 2 months ago
Can y'all talk about the best way to think about adding indexes? What is the problem with adding too many on a table, for instance? Or when to reach for one when a query is slow? Confounding factors when there are other queries using the same column (not sure that's relevant)? I'm sure there is a lot to consider that is just unknown unknowns for me.
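(One way to start looking at the "too many indexes" problem - a statistics-based check for indexes that are never scanned; results depend on how long statistics have been accumulating:)

SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC
LIMIT 20;   -- candidates for review: idx_scan = 0 on large, write-heavy tables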
@NikolaySamokhvalov 1 month ago
Hey Patrick - have you listened to episode "068 Over-indexing"?
@pdougall1 1 month ago
@NikolaySamokhvalov I have not, but I definitely will. Also, it looks like there's one on under-indexing as well! Might be exactly what I'm looking for, thanks!
@kirkwolak6735 2 months ago
Michael, thank you for sticking to your guns to get your explanation out there. There is a subtle difference in the AUDIENCE you two seem to be addressing. Nikolay seems not to care about launching a long-running query, because when he sits down he likely either knows he has a problem already, OR he has such deep experience in PG that he knows to check a few things before he starts pounding out a query. I believe he implies this when he talks about how he adds the LIMIT based on what he is expecting (e.g., when he might be wrong, he will do a LIMIT 2 and let the error guide him).

Whereas you were (IMO) coming from the perspective of a novice (like me) who *thought* that just adding a LIMIT was *always* a decent safety approach. My understanding was limited to (LIMIT + ORDER BY = red flag). Your point goes deeper than that. So now I realize the correct formula is: (LIMIT + (ORDER BY | index range scan) = red flag). Meaning the optimizer might be doing what looks like a simple range scan on some column, but it is orthogonal to the data being found, and can quickly become a semi-seq-scan (find the first row with the index, then seq-scan in reverse until the number of records hits the limit... which may never happen, making it scan to the beginning/end).

That's two wildly different target audiences. And I could be completely wrong - it's my guess. Of course I look up to both of you, so I apologize if I misstated your positions!
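(To make the pattern concrete - a hypothetical query of the risky shape; if there is an index on created_at but few rows match customer_id, the planner may walk the created_at index hoping to collect 10 matches quickly and end up scanning a huge portion of the table:)

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 12345
ORDER BY created_at DESC
LIMIT 10;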
@michristofides 2 months ago
Thank you Kirk, for the kind words and the wonderful summary! I think you're spot on, and I'm glad to hear it was helpful.
@pdougall1 2 months ago
Y'all are great! It's really important to hear professional DB people talking about how all of this works in practice, beyond the basic explanations that can be found in books (books are also really important, btw).
@hamzaaitboutou8563 2 months ago
more of this please <3
@iury0x58 2 months ago
Great content, guys! Binging the channel
@iury0x58 2 months ago
Thank you for this content. Very nice
@davidfetter 2 months ago
I just love the way this episode captured the processes that actually go into doing the thing! BTW, the repository for the web site is, as far as I know, also a git repository, and I suspect that rebase requests (NEVER use merge) would be easier to get into it than patches sent to the -hackers mailing list for the core code would be.
@keenmate9719 2 months ago
Looking forward to this one... paging and limits - it's like naming and cache invalidation :-)
@nishic1 2 months ago
Wow, excellent video. Very informative.
@nitish5924 2 months ago
What about massive updates? We recently had a use case where we have a Postgres database with 250 million rows and introduced a new date column; we are facing so many issues backfilling this column. It would be great if you could share your insights on how to handle such massive updates.
@NikolaySamokhvalov 2 months ago
It's very similar - batching is very much needed. The additional complexity is index write amplification: all indexes have to be updated (unlike for DELETEs), unless it's a HOT update.
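(A minimal sketch of such a batched backfill, with hypothetical table/column names and backfill expression; repeat the statement until it updates 0 rows:)

UPDATE events
SET event_date = created_at::date          -- assumed backfill expression
WHERE event_date IS NULL
  AND id IN (
    SELECT id FROM events
    WHERE event_date IS NULL
    LIMIT 5000                             -- small batches keep transactions short and let autovacuum keep up
  );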
@kirkwolak6735 2 months ago
@NikolaySamokhvalov Excellent point on indexes adding writes. I would certainly add the column, batch the updates, and only when the updates are finished would I consider adding the index on that column. Otherwise it feels like a footgun!
@awksedgreep 2 months ago
What you need is UUIDs across maybe 25 tables with FKs between each, no ON DELETE CASCADE, and a need to keep the data from all 25 tables elsewhere (an archive schema). Getting <200 deletes a second after moving the data off to the archive schema. Had one set of deletes on a large table going for 20 days. Not speaking from experience or anything, haha.
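(One sketch of the "move to archive schema, then delete" step as a single batched statement; schema/table names are hypothetical, and child tables would still need to be handled first when FKs without cascades are in place:)

WITH moved AS (
  DELETE FROM public.orders o
  WHERE o.id IN (
    SELECT id FROM public.orders
    WHERE created_at < now() - interval '2 years'
    LIMIT 1000                    -- batch size; loop until 0 rows are moved
  )
  RETURNING o.*
)
INSERT INTO archive.orders
SELECT * FROM moved;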
@TimUckun 3 months ago
Why doesn't Postgres have unsigned integers? Also, why doesn't timestamp with time zone actually store the time zone of the timestamp? If my app crosses time zones, I really want to know the time zone of the writer.
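(Context on the second question: timestamptz stores an absolute instant, normalized to UTC, so the writer's zone is indeed lost. A common workaround, sketched here with hypothetical names, is to store that zone in a separate column:)

CREATE TABLE user_actions (
  id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  happened  timestamptz NOT NULL,          -- the instant, normalized to UTC internally
  writer_tz text NOT NULL                  -- e.g. 'Pacific/Auckland', supplied by the app
);

-- Reconstruct the writer's wall-clock time when needed:
SELECT happened AT TIME ZONE writer_tz AS writer_local_time
FROM user_actions;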
@wstrzalka 3 months ago
Love your podcast. It's fun to listen to both advanced and basic topics, as something new always pops up. And the attached article list makes waiting for the next episode more bearable 🐘
@chralexNET 3 months ago
One thing about timestamps: in my current project I only use timestamp without time zone, because I always store UTC for all my dates and times. If I do that, I guess there are no problems with using timestamp without time zone? Edit: Okay, reading more of the wiki article, they suggest not doing that, because they think it will give you issues if you do calculations with other timestamps that have time zones - but I am never going to do that, because all my dates and times will be in UTC.
@kirkwolak6735 3 months ago
I switched away from that, and one of the reasons was DST.
@chralexNET 3 months ago
@kirkwolak6735 I don't know why that would cause any issues. UTC is not affected by DST. What a client application does when it receives a timestamp is to work with it in UTC in its logic, and for display it explicitly converts it to the client's local time by applying the time zone offset.
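(For completeness - the explicit conversion described above, assuming a plain timestamp column known to hold UTC and a hypothetical target zone:)

SELECT (created_at AT TIME ZONE 'UTC') AT TIME ZONE 'Europe/Berlin' AS local_time
FROM orders;
-- the first AT TIME ZONE interprets the naive value as UTC (producing a timestamptz),
-- the second converts that instant to the viewer's local wall-clock time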
@ilyaportnov181 3 months ago
Comparison of timestamps is not a trivial topic at all, due to time zones :)
@andherium 3 months ago
Would love it if you guys could make a video on schemas and roles
@dartneer 3 months ago
A lot of gold bits here! Thank you guys 👏💪
@PostgresTV 3 months ago
Thanks! Keep watching - and let us know if you have ideas (there is a doc: docs.google.com/document/d/1PNGSn_d0A7gTR4C0p6geQyHOTgdpitKCRvafCNnT-44/edit)
@sanity_equals_nil 3 months ago
It would be interesting to hear about how caching is implemented in Postgres.
@PostgresTV 3 months ago
Good topic - added to the list of ideas: docs.google.com/document/d/1PNGSn_d0A7gTR4C0p6geQyHOTgdpitKCRvafCNnT-44/edit#
@maudrid 3 months ago
You mentioned some pitfalls with trigrams. Do you have links to any material that goes into more detail?
@NikolaySamokhvalov 3 months ago
Well, the bottom of the docs for pg_trgm already gives an idea that things are not easy and require significant effort: a) it's too low-level (it cannot be used, for example, with full-text search to get fast typo correction without creating an additional table and a dance around it); b) under heavy loads and for large volumes, GIN fastupdate and pending lists become harder and harder to tune to avoid performance issues (same as for any other GIN index in general). So these days, I would perhaps consider some fast LLM for proper typo correction.
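(A typical trigram setup, for reference; the table and column are hypothetical, and turning fastupdate off trades slower writes for avoiding the pending-list flush spikes mentioned above:)

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX CONCURRENTLY items_name_trgm
  ON items USING gin (name gin_trgm_ops)
  WITH (fastupdate = off);

-- Similarity search that can use the index:
SELECT name
FROM items
WHERE name % 'postgers'
ORDER BY similarity(name, 'postgers') DESC
LIMIT 10;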
@PostgresTV 3 months ago
Also, here's what the bot says about it: postgres.ai/chats/018e9250-abff-73fd-af40-1b06ad17919d // Nik
@obacht7 3 months ago
Quite philosophical this time. Reminds me of "free as in freedom" vs. "free as in free beer". I will add "you only own what you can destroy" to my list of quotes worth remembering 👍 For me, the podcast is perfect for listening without video. I enjoy every episode!
@obacht7 3 months ago
I like the analogy of going to the doctor. It makes me think about the health of some people vs. the health of the systems they build 😁
@DavidTuron1 4 months ago
Thanks for the podcast. pgBadger has been my favorite tool for many years. Very good tip about setting log_min_duration_statement to zero for some short period. I read about the sampling settings but haven't tried them yet. And many thanks for the songs ❤ They are better than the originals 🙂
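(The "log everything for a short window" tip, sketched; this can generate a lot of log volume on a busy server, so it is usually reverted quickly:)

ALTER SYSTEM SET log_min_duration_statement = 0;   -- log every statement with its duration
SELECT pg_reload_conf();

-- ...collect logs for a few minutes, feed them to pgBadger, then revert:
ALTER SYSTEM RESET log_min_duration_statement;
SELECT pg_reload_conf();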
@anithag6214 1 month ago
How to read pgBadger reports?
@DavidTuron1 4 months ago
Hi, thanks for the podcast and many useful links. I usually use the depesz tool, but I will try yours as well :) I have one tip for a maybe strange use of EXPLAIN VERBOSE: sometimes I want an INSERT with the column list, so I run EXPLAIN VERBOSE SELECT * FROM table and just copy the column names from the output :) I know there are other ways to do that - pg_dump with --column-inserts, a GUI tool for generating inserts, grabbing the columns from a CREATE TABLE definition, or \gdesc with some painful editing :) Thanks again, have a nice day and fast queries.
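(The trick described, for reference; the table name is hypothetical:)

EXPLAIN (VERBOSE) SELECT * FROM my_table;
-- the plan's "Output:" line lists every column, ready to paste into an INSERT (...) list

-- Alternatives mentioned above: pg_dump --column-inserts, or in psql:
--   SELECT * FROM my_table \gdesc   -- prints column names and types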
@maudrid 4 months ago
I think there's a flaw in how PostgreSQL treats superuser. Some things only a superuser can do. Why not allow me to grant permission to a user whom I want to allow to refresh a subscription, for example? Right now this user has to be a superuser. I've had to write functions that are just wrappers to do specific things with the execution rights of the superuser.
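(The wrapper pattern described, sketched as a SECURITY DEFINER function; all names are hypothetical, and it assumes the wrapped command is allowed inside a function on your version - REFRESH PUBLICATION has transaction-block restrictions with some option combinations, which is why copy_data is set to false here:)

CREATE OR REPLACE FUNCTION refresh_my_subscription()
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER                              -- runs with the function owner's (superuser's) rights
SET search_path = pg_catalog, public          -- usual hardening for SECURITY DEFINER functions
AS $$
BEGIN
  ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION WITH (copy_data = false);
END;
$$;

REVOKE ALL ON FUNCTION refresh_my_subscription() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION refresh_my_subscription() TO app_admin;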
@jianhe5119 4 months ago
🎉