
Zip vs Tar.gz Files Explained and Compared (Archiving and the DEFLATE algorithm) 

Tony Tascioglu
4K subscribers
9K views

In this video, I explain the similarities and differences between two popular compressed archive formats: zip on Windows and tar.gz in the *nix world. Both formats typically use the same compression algorithm (DEFLATE), and both serve as a way to collect files together in an archive. However, there are some fundamental differences in how they work and why they're used for different purposes.
I discuss the two main steps, compressing the data and linearizing the files into an archive, and how each stage differs between the two formats, as well as the advantages and disadvantages of each implementation. I hope you enjoy the video and learn something new!
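As a rough illustration of those two steps, here is a minimal Python sketch using only the standard library. The file names and contents are made up for the example, and this is not how the `tar` or `zip` command-line tools are implemented. It builds a tar stream first and gzips it as a whole, then builds a zip in which each member is compressed on its own.

```python
import gzip
import io
import tarfile
import zipfile

# Hypothetical sample files: name -> contents.
FILES = {
    "notes/a.txt": b"hello hello hello hello\n" * 50,
    "notes/b.txt": b"hello hello hello hello\n" * 50,
}


def make_tar_gz(files):
    """Step 1: linearize the files into one tar stream. Step 2: gzip that stream."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return gzip.compress(tar_buf.getvalue())


def make_zip(files):
    """Zip compresses each member separately and records it in its own index."""
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return zip_buf.getvalue()


tar_gz_bytes = make_tar_gz(FILES)
zip_bytes = make_zip(FILES)
print("tar.gz:", len(tar_gz_bytes), "bytes")
print("zip:   ", len(zip_bytes), "bytes")
```

Because gzip sees the whole tar stream at once, it can exploit redundancy across files, while zip's per-file compression cannot. The exact sizes depend on the data.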
This is yet another video that is a bit rambly at times (even after I cut it down from over 30 minutes). I have added timestamps so you can easily skip between the sections you're interested in. I apologize in advance for the clipped (and heavily compressed) audio: this was recorded accidentally with a 10 dB boost on the mic, on a 128K MP3 recorder, making for a questionable combination.
Timestamps
Introduction
00:00 - Introduction
00:19 - What is a zip and tar file?
01:45 - Gzip (and other compressors)
02:45 - Why use tar instead of zip on Linux?
What's the difference?
03:48 - How does tar work?
04:39 - How does zip work?
Comparison
05:09 - The advantages of zip
06:46 - The advantage of tar.gz (and streaming compression)
08:30 - The disadvantage of tar
Further learning
09:30 - Notes on 7zip and .gz.tar
10:40 - Indexed tar files with pixz (and comparison to 7z)
13:30 - What should you use?
15:10 - Conclusion
Links (Get Smarter Section)
DEFLATE algorithm:
Wiki: en.wikipedia.org/wiki/Deflate
How DEFLATE works (good summary): zlib.net/feldspar.html
Full specification: datatracker.ietf.org/doc/html...
TAR format:
Wiki: en.wikipedia.org/wiki/Tar_(co...)
Man page: linux.die.net/man/1/tar
Tar format specs: www.gnu.org/software/tar/manu...
Gzip: (based on DEFLATE)
Wiki: en.wikipedia.org/wiki/Gzip
Homepage: www.gnu.org/software/gzip/
Bzip2:
Wiki: en.wikipedia.org/wiki/Bzip2
Homepage: www.sourceware.org/bzip2/
XZ utils: (LZMA2 compression)
Wiki: en.wikipedia.org/wiki/XZ_Utils
Homepage: tukaani.org/xz/
pixz: (parallel indexed xz)
github.com/vasi/pixz
pigz: (parallel implementation of gz)
github.com/madler/pigz
Lzip: (based on LZMA)
Homepage: www.nongnu.org/lzip/
LZMA2 Compression:
Wiki: en.wikipedia.org/wiki/Lempel%...
Z-Standard Compression: (aka zstd)
Wiki: en.wikipedia.org/wiki/Zstd
Homepage: facebook.github.io/zstd/
Source: github.com/facebook/zstd
7zip: (also generally LZMA)
Wiki: en.wikipedia.org/wiki/7-Zip
Homepage: www.7-zip.org/
Source code: sourceforge.net/projects/seve...
p7zip (POSIX port): p7zip.sourceforge.net/
Zip: (generally DEFLATE)
Wiki: en.wikipedia.org/wiki/ZIP_(fi...)
Specs: pkware.cachefly.net/webdocs/c...
Dar: (newer format competing with tar)
dar.linux.free.fr/
Content used:
Zip and Tar icons in thumbnail from FlatIcon.
Ending music is We'll Meet Again by TheFatRat
Clarifications and Corrections
Just to clarify a few things before I get some comments: the 'only decompress the file you want' benefit I mentioned for zip exists because zip (and 7z) keep an index of the archive's contents. In zip this is the central directory, which is stored at the end of the archive rather than at the front. If you did .gz.tar (compressing each file, then tarring the results), you wouldn't get that benefit, as tar isn't indexed. Next, when I say 'the index of the tar file is at the end', what I mean is that if you want the file list (like an index would produce), you need to read the archive all the way to the end, as though an index were there. Tar files don't have an index, just a 512-byte header in front of each file. So, to get a file list, you need to read the header at the start of each file, meaning you have to read through the full archive. I hope this clarifies it.
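To make the "read every header" point concrete, here is a small Python sketch that lists an uncompressed tar by hopping from one 512-byte header to the next. It assumes a plain ustar archive with short names and ignores the ustar prefix field and GNU/pax extension headers. The sample file names are invented for the example.

```python
import io
import tarfile

BLOCK = 512  # tar stores headers and data in 512-byte blocks


def build_sample_tar():
    """Build a small in-memory tar with hypothetical file names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in [("a.txt", b"x" * 700), ("b.txt", b"y" * 10)]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_tar(raw):
    """Return (name, size) pairs by walking from header to header."""
    entries = []
    pos = 0
    while pos + BLOCK <= len(raw):
        header = raw[pos:pos + BLOCK]
        if header == b"\0" * BLOCK:  # an all-zero block marks the end of the archive
            break
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        # The size field is an octal number stored as ASCII text.
        size = int(header[124:136].split(b"\0", 1)[0].strip() or b"0", 8)
        entries.append((name, size))
        # Skip the header plus the data, rounded up to a whole number of blocks.
        data_blocks = (size + BLOCK - 1) // BLOCK
        pos += BLOCK + data_blocks * BLOCK
    return entries


listing = list_tar(build_sample_tar())
print(listing)
```

On an uncompressed tar the data blocks can be skipped with a seek, but inside a .tar.gz the stream has to be decompressed up to each header, which is why listing a large compressed tar is slow.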
Clarification: pigz is not indexed, pixz is. Pixz is backwards compatible with xz, although both support multithreading these days.
(more to be added)

Science

Published: 7 Jul 2024

Comments: 26
@eduardmart1237 2 years ago
You make really interesting videos! Especially because they cover not very popular, but really important topics about Linux.
@pavelperina7629 2 years ago
Minor stuff: in tar files, each file is preceded by a 512-byte header block aligned to 512 bytes (basically one sector on old disks), and the file data is then padded with zeros to the next 512-byte boundary. Zip files put a local header in front of each file with far fewer attributes (but more than gzip, I believe), so a zip can be written as a stream; each entry is then repeated in the central directory at the end of the file. The offset of the central directory is stored at the very end of the file.
IMHO the choices are:
zip for maximum compatibility
tar.gz for compatibility within the Linux bubble and for archiving files including user rights (which is pointless except for backups)
7z for maximum compression, if it's worth the time, with relatively good compatibility within the IT bubble
lz4 for maximum speed on large sparse data, for internal use
zstd as a tradeoff of good speed and compression for general purpose (it beats zip's inflate/deflate almost every time in both (de)compression speed and ratio), but for internal use, as it's not widespread and it's not an archive format, so it needs a container such as 7zip, and 7zip itself has to be patched to support it
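The zip layout described in this comment can be checked with a short Python sketch. It assumes a small archive with no archive comment and no zip64 extensions. The member names are made up, and `read_eocd` is a helper written for this example, not part of the `zipfile` module.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record


def build_sample_zip():
    """Build a small in-memory zip with hypothetical member names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"hello " * 100)
        zf.writestr("b.txt", b"world " * 100)
    return buf.getvalue()


def read_eocd(raw):
    """Find the end record by searching backwards, then unpack its fields."""
    pos = raw.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # Fixed 22-byte layout: signature, two disk numbers, two entry counts,
    # central directory size, central directory offset, comment length.
    (_sig, _disk, _cd_disk, _n_here, n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack("<4sHHHHIIH", raw[pos:pos + 22])
    return n_total, cd_size, cd_offset


sample = build_sample_zip()
n_total, cd_size, cd_offset = read_eocd(sample)
print("entries:", n_total)
print("central directory at byte", cd_offset, "of", len(sample))
print("first local header signature:", sample[:4])
```

A reader only needs the last few bytes to find the central directory, and from there it can jump straight to any member's local header. That is what makes single-file extraction cheap in zip.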
@sknfer 1 year ago
I was going to sleep then this video popped up, great explanation of various topics, u deserve more subs
@vukanoa 1 year ago
This was very well explained. Thank you.
@borsasostorangunt 1 year ago
Awesome video, keep making more!
@szymonpiechutowski2340 2 months ago
Thanks for a very useful video!
@aces8481 1 year ago
very clear explanation you are a prodigious talent my friend
@TonyTascioglu 1 year ago
Thanks!
@Rainy32434 2 years ago
Great video, thanks!
@tulsatrash 1 year ago
Woo! Thank you for making this.
@TonyTascioglu 1 year ago
Thanks for the kind words!
@randomdamian 2 months ago
Awesome video!
@sharlove3508 1 year ago
wonderful explanation, ty😎
@TonyTascioglu 1 year ago
Thanks!
@I_good_at_alaphabet 1 month ago
Thank you for this
@ArmandoCalderon 1 year ago
great explanation.
@TonyTascioglu 1 year ago
Thanks!
@OscarCedano 26 days ago
Good Video!
@sunnyyoda 5 months ago
Nice ❤
@1aminepro 2 years ago
new subscriber here, love your content, if only you put that mic down
@ConorFenlon 2 years ago
Would it be possible to convert all files you want to compress to plain text files prior to compression? If the DEFLATE alg works better on text files, that would seem like a good idea, no? Is it more efficient to convert an mp4 to text, then compress, than just compressing the mp4 directly? 🤔 So many questions! 😂 Thanks for the explanations Tony. Keep up the great work! 😁👍🏻
@NielsGx 1 year ago
bruh what? mp4 is mp4, you can't "translate" it to txt, whatever that even means. When he says txt compresses better, he's talking about compressing text made of the 26 letters of the alphabet. These compressors have been designed to compress language well, not really random stuff, because y'know, we use languages lmao
@ConorFenlon 1 year ago
@@NielsGx Yes, you're absolutely right. We use languages. Like Machine Code, Binary Coded Decimal, Binary, Assembly Code, Hexadecimal. The list goes on and on and on. You can represent an mp4 video (or any other file type) in whatever type of encoding you want. Then we transmit that data using algorithms like BPSK and QPSK using beams of light to shoot the data down massive undersea cables from continent to continent. Literally anything is possible. Even the words you're reading from this comment right now have been transmitted by strings of 1s and 0s to explain this to you. But of course, what do I know? I've only been studying Electronic Engineering and Computer Science since before you lost all your milk teeth.
@mgord9518 2 months ago
It's possible but there's no advantage. DEFLATE compresses text better than binary because natural text typically has less entropy. When you convert binary to text (using hex, base64, base91 etc) you cannot magically remove that entropy, so you get seemingly random text that's bigger than the original data
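A quick way to see this is a Python sketch that uses seeded pseudo-random bytes as a stand-in for already-compressed data such as an mp4 (an assumption made for the example) and runs DEFLATE via `zlib` on both the raw bytes and their base64 text form.

```python
import base64
import random
import zlib

rng = random.Random(0)  # fixed seed so runs are repeatable
# High-entropy bytes standing in for already-compressed video data.
raw = bytes(rng.getrandbits(8) for _ in range(100_000))
as_text = base64.b64encode(raw)  # "convert to text" before compressing

sizes = {
    "raw": len(raw),
    "raw_deflated": len(zlib.compress(raw, 9)),
    "base64": len(as_text),
    "base64_deflated": len(zlib.compress(as_text, 9)),
}
for label, size in sizes.items():
    print(f"{label:>16}: {size}")
```

The base64 text is about a third larger than the raw bytes, and compressing it only claws back that encoding overhead. The result still does not drop below the size of the original data.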
@ilhammega 8 months ago
I got to watch this video by accident. I need a tutorial on converting a WhatsApp account backup from tar.gz to txt. Can you give me a tutorial?
@Bladedomainandhosting 2 years ago
List the contents and find the file you want:
tar tvf setuptools-58.0.2.tar.lz
Then extract just that file:
tar xvf setuptools-58.0.2.tar.lz setuptools-58.0.2/tools/finalize.py
No need for it to extract it all :)
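The same list-then-extract-one-member idea can be sketched with Python's `tarfile` module. This example uses gzip rather than lzip, since the standard library does not read .tar.lz, and the member names are invented. Note that the archive is still read sequentially up to the requested member, so only the writing of the other files to disk is avoided.

```python
import io
import tarfile

# Build a small .tar.gz in memory; the member names are made up for the example.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in [("pkg/README", b"read me\n"),
                       ("pkg/tools/finalize.py", b"print('done')\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    # Roughly `tar tvf`: walks every header in the archive.
    names = tar.getnames()
    # Roughly `tar xvf ARCHIVE PATH`: reads one member without unpacking the rest.
    member = tar.extractfile("pkg/tools/finalize.py")
    wanted = member.read()

print(names)
print(wanted.decode())
```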