Computer scientist, author and speaker working as a Systems and Database Architect. I've worked with, and contributed to, Open Source tools for 25+ years, and I'm an active member of PostgreSQL Europe and occasional contributor to the PostgreSQL project. This channel contains talks I've given at conferences :)
Hey Jimmy, great talk! I have just one question: is it possible to automate this with pg_partman and pg_cron? Create one current partition and one for the next month (if we're partitioning by month), and then remove the default partition which pg_partman will create and attach our legacy one? Have you done something like this?
You don't really need pg_partman for anything, or pg_cron for that matter. You can just automate the partition management directly via cron. (I hate dependencies ;) )
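Something along these lines is all you need; a minimal sketch, assuming a parent table called dailytotal partitioned by month (the name is illustrative), run once a month from a crontab line invoking psql:

-- Create next month's partition if it doesn't already exist
DO $$
DECLARE
    next_month  date := date_trunc('month', now()) + interval '1 month';
    month_after date := date_trunc('month', now()) + interval '2 months';
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF dailytotal
            FOR VALUES FROM (%L) TO (%L)',
        'dailytotal_' || to_char(next_month, 'YYYY_MM'),
        next_month, month_after);
END
$$;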
It seems there may be a misunderstanding here: The customer specifically requested this partitioning operation due to operational concerns, and each step involved was discussed and approved. Just to clarify, nobody is recommending renaming tables in production as something you should generally do. Thanks for your comment.
In the talk you stated that pg_dump is not a backup, but never explained why. The first paragraph of the man page for pg_dump says, "pg_dump is a utility for backing up a PostgreSQL database," which seems to disagree with you. ???
Thanks for your comment. pg_dump can only produce logical backups, which are useless for things such as point-in-time recovery. Moreover, it doesn't come with automation, monitoring or alerting, so you have to provide all of those yourself. That documentation has been there since before other PostgreSQL backup tools existed, and it is accurate insofar as pg_dump *can* be used as a backup tool, but I would not consider it a good solution as your only backup method.
A great video, I have some questions: 1. If my query only selects by id (or by ids spanning multiple partitions), will performance be good? Do you have a realistic estimate for this? 2. My unique key, such as "name", cannot be used when partitioning by create_date; how do I resolve this? 3. Other tables have foreign keys into my table; if I create a new partitioned table, will it lose some keys? 4. I want to partition by date with quarterly ranges; how do I add old data to the partitioned table in the right partition? In the last step, do I need to detach before deleting?
Very well explained and the example provided with details is going to help when handling huge tables for parititioning on an active DB. Thanks for this lecture, much appreciated.
Thanks for the awesome talk. I had a query: will we ever see a pg_repack or logical-replication type utility that will help move data without this age-old procedure? With logical replication I can move the db to another db live, but that would be too much.
pg_repack does something different, it can't help in this case. But copying the table onto a partitioned table on another server with logical replication is not as crazy as it sounds.
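For the curious, a sketch of that approach with native logical replication (connection details and names illustrative; replicating into a partitioned table on the subscriber works on PostgreSQL 13 and later):

-- On the source server:
CREATE PUBLICATION bigtable_pub FOR TABLE dailytotal;

-- On the target server, where dailytotal already exists with the
-- desired partitioned layout:
CREATE SUBSCRIPTION bigtable_sub
    CONNECTION 'host=source.example.com dbname=mydb user=repuser'
    PUBLICATION bigtable_pub;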
Thanks a lot Jimmy Angelakos. I have a question about the below scenario: one huge table, work_history, has a composite PK (wid and seq_no) and needs to be range partitioned on the column work_date. Now after partitioning, its PK is composite on (wid, seq_no and work_date), which is creating an issue for referencing tables, as a few of those do not have a work_date column. Could you please suggest an alternate way? Thanks.
On a partitioned table you can't have UNIQUE constraints that don't include the partitioning key columns. Perhaps, since it's a history table and presumably gets only inserts with those values filled in from the application, you can drop the uniqueness constraint and have the partitioning key be only work_date. You can rebuild the necessary indexes as non-unique.
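A sketch of what that could look like, assuming the application really does guarantee (wid, seq_no) uniqueness (column types are illustrative):

CREATE TABLE work_history (
    wid       bigint NOT NULL,
    seq_no    bigint NOT NULL,
    work_date date   NOT NULL
    -- ... remaining columns
) PARTITION BY RANGE (work_date);

-- Non-unique index standing in for the old composite PK:
CREATE INDEX work_history_wid_seq_idx ON work_history (wid, seq_no);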
Thanks for the nice tutorial. One query: if I set partition by range(date), and I want to create a partition only when I receive data for a particular date, how can I create it on the fly?
Postgres doesn't support dynamic partition creation out of the box. You can create partitions in advance if you script it though, and you'll only need to do it once a year.
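For example, a once-a-year script along these lines (parent table name illustrative):

-- Pre-create twelve monthly partitions for 2025
DO $$
DECLARE
    m date;
BEGIN
    FOR m IN
        SELECT generate_series(date '2025-01-01',
                               date '2025-12-01',
                               interval '1 month')
    LOOP
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS %I PARTITION OF dailytotal
                FOR VALUES FROM (%L) TO (%L)',
            'dailytotal_' || to_char(m, 'YYYY_MM'),
            m, (m + interval '1 month')::date);
    END LOOP;
END
$$;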
Thanks for the content. I'm not sure I understood "Move data to new table at our own pace". Why are we moving old data to the new partitioned table? Isn't that data already part of the dailytotal_legacy partition?
Hi, you might not want to leave the old data in one huge partition if there's a need to query it often, so it may need to be moved to smaller partitions.
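To make that concrete, here's a sketch of moving one month at a time, at your own pace, assuming the legacy table was attached as the DEFAULT partition (the column name day is illustrative). The delete and the attach happen in the same transaction because ATTACH scans the default partition for rows that would conflict with the new bounds:

BEGIN;

CREATE TABLE dailytotal_2021_06 (LIKE dailytotal INCLUDING ALL);

WITH moved AS (
    DELETE FROM dailytotal_legacy
    WHERE day >= date '2021-06-01' AND day < date '2021-07-01'
    RETURNING *
)
INSERT INTO dailytotal_2021_06 SELECT * FROM moved;

ALTER TABLE dailytotal ATTACH PARTITION dailytotal_2021_06
    FOR VALUES FROM ('2021-06-01') TO ('2021-07-01');

COMMIT;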
Slide 17. I think it should be values from ('2021-06-01', 'AAA') to ('2021-07-01', 'BAA'); as TO is exclusive; unless there is no actual AZZ for location_code
Thanks for your comment; you are right that it is upper bound exclusive. In this case it doesn't really matter, as this is just meant to show a common misconception due to confusing syntax: with a multi-column range, the bounds are compared row-wise (like a ROW() comparison), not column by column.
Thanks for your comment. This operation can't be done for free anyway, and that's exactly what autovacuum is there for. I would actually suggest increasing its limits before starting, because it's better to trade off a little performance than accumulate bloat. The point of this exercise is to perform a necessary action without blocking usage.
@JimmyAngelakos In our case we used a different approach to handle this data type issue with minimal downtime for the apps; the downtime was actually less than 5 minutes.
32TB is one installation of Call of Duty and a few jpegs. LOL Good vid. Data partitioning is serious stuff and a concept that I don't see being addressed frequently, so good stuff.
At 28:19 you said that the transaction ID space is circular with a visibility horizon. I don't get what you mean exactly by "T 9999 is visible to you (T 10000) because you can see the results". Do you mean this depends on the isolation level each transaction is running at? Or do you mean the results are visible if T 9999 is committed? From this, we know that T 10001 will not be visible no matter what isolation level each transaction runs at: even if T 10000 takes so long that T 10001 starts and commits in the meantime, T 10000 still cannot see its results regardless of the isolation level.
Correct. If 9999 has committed, 10000 will be able to see its changes. But row versions with an insertion id (xmin) of 10001 can never be visible within 10000.
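You can observe this for yourself (table name illustrative):

-- Each row version carries the id of the transaction that inserted it:
SELECT xmin, * FROM dailytotal LIMIT 5;

-- Your current transaction id, for comparison (64-bit, epoch-extended;
-- xmin holds the low 32 bits):
SELECT txid_current();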
I have a little confusion about the anonymous query block in the last step. Would all of those operations be in a single transaction? BEGIN; 1. rename 2. create new table and indexes 3. create new partition 4. attach old partition 5. anonymous stored procedure COMMIT; ?
Hi Imran, yes, you can use an anonymous code block with DO inside a transaction block, provided you don't use transaction control statements inside the DO block.
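Roughly like this; a sketch only, with illustrative table and column names, and note that the DO block contains no COMMIT or ROLLBACK:

BEGIN;

-- 1. rename
ALTER TABLE dailytotal RENAME TO dailytotal_legacy;

-- 2. create the new partitioned table (and any indexes)
CREATE TABLE dailytotal (
    day   date   NOT NULL,
    total bigint NOT NULL
) PARTITION BY RANGE (day);

-- 3. create a new partition for current data
CREATE TABLE dailytotal_2021_07 PARTITION OF dailytotal
    FOR VALUES FROM ('2021-07-01') TO ('2021-08-01');

-- 4. attach the old table as the default partition
ALTER TABLE dailytotal ATTACH PARTITION dailytotal_legacy DEFAULT;

-- 5. anonymous code block, no transaction control inside
DO $$
BEGIN
    RAISE NOTICE 'post-attach housekeeping goes here';
END
$$;

COMMIT;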
Yeah, this is definitely one of those big problems DBAs encounter in large and busy environments. The solution is always painful, but you described a graceful one here. Thanks for sharing, best regards from Brazil.
@JimmyAngelakos Would partitioning help with a big table that's mostly handling random updates (OLTP), e.g. an accounts table with many users updating the account balances?
@jirehla-ab1671 It depends on the size of the table and what you are trying to do. If you choose a partition key that allows you to perform efficient pruning, then it should help with query performance.
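For a lookup-by-id workload like that, hash partitioning on the account id is one option. A sketch with hypothetical names:

CREATE TABLE accounts (
    account_id bigint  NOT NULL,
    balance    numeric NOT NULL
) PARTITION BY HASH (account_id);

CREATE TABLE accounts_p0 PARTITION OF accounts
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ... and likewise for remainders 1 through 3

-- With pruning, the planner touches only one partition:
EXPLAIN UPDATE accounts SET balance = balance - 100.00
 WHERE account_id = 42;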