When I first got into Linux, all these words ("tar", "gzip", "tarball") and their commands were a mystery that took me way longer than 100 seconds to figure out. This video is gold for getting started. Putting all this in perspective in less than 2 minutes is just awesome.
I also like how they are not afraid to get "funny" with it. You have "more" and then "less" ("less is more"), you have weird names like 🔫zip, and tarball, and to get help with something you call the "man"
It's also ridiculously hard to zip/tar things with the CLI compared to a GUI. Archiving and compressing are separate commands, there are weird flags to learn, and there are footguns like accidentally zipping every file in a folder individually (happened to me several times).
You're all lucky newer implementations of tar stripped out all the tape-related stuff. Reading through the old man page, you'd spend an hour finding out how to create or add to an archive, while in the process learning how to rewind the tape, seek to the next archive on the tape, activate a jukebox-like tape-swapper mechanism, wipe a tape, check whether there's enough free space after the last archive to fit yours, switch to a different track on a multi-track tape, set the tape drive RPM, and a myriad of others.
@@stickyfingies What do you mean, "in the days"? Tape drives are used a lot right now in datacenters, like Google's and Amazon's; they're great for long-term data storage.
One interesting difference from zip is that zip compresses each file and then archives them, whereas tar.gz archives all the files and then compresses the whole thing. Advantage: cross-file compression (i.e. repeated patterns across files help the compressor). Disadvantage: you can't extract a single file without decompressing everything before it.
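A quick way to see this from Python (a sketch; the file names and sizes are made up): two identical but individually incompressible files cost two full copies in a zip, while the tar.gz compressor can back-reference the first copy when it reaches the second, since both sit in the same deflate stream.

```python
import io, os, tarfile, zipfile

data = os.urandom(20000)                  # random bytes: incompressible on their own
files = {"a.bin": data, "b.bin": data}    # identical content in two files

# zip: each file is deflated independently
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, blob in files.items():
        zf.writestr(name, blob)

# tar.gz: archive first, then gzip the whole stream
tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w:gz") as tf:
    for name, blob in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(blob)
        tf.addfile(info, io.BytesIO(blob))

print("zip:   ", len(zbuf.getvalue()))    # ~40 KB: two full copies
print("tar.gz:", len(tbuf.getvalue()))    # ~20 KB: second file is mostly back-references
```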
That's what's nice about RAR: it has a flag for exactly that (solid vs. non-solid archives). If you want to compress a huge library of pictures or videos, you might want them independently compressed, but for a large database, text-like files, or something like a complete game/program, you might want to compress the whole archive as one block.
The main use for archives that I see is not for saving my own data, but for distributing collections of files. And normally when you get one of those you want to extract everything.
The actual compression algorithm (deflate) is the same between zip and gzip. Zip adds the archiving around it, while gzip just adds a small header and a checksum, so it's lighter.
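You can check that from Python: strip the containers away and the deflate payloads match, with gzip adding only about 18 bytes of header plus CRC32/size trailer. (A minimal sketch; the sample data is made up.)

```python
import gzip, zlib

data = b"same deflate payload under different wrappers " * 200

# raw deflate stream, no container (negative wbits = no zlib header/trailer)
co = zlib.compressobj(level=9, wbits=-15)
raw = co.compress(data) + co.flush()

# gzip = 10-byte header + deflate stream + 8-byte CRC32/length trailer
gz = gzip.compress(data, compresslevel=9)

print(len(raw), len(gz), len(gz) - len(raw))  # the difference is just gzip's framing
```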
Wat. Living in the era of WinRAR and 7-Zip, it never occurred to me that archiving was an intermediate step for zipping multiple files; I always assumed the compression program handled it on its own. I was never able to figure out the purpose of tar until this video. So that's what .gz and .tar.gz meant.
It also fits nicely with the UNIX philosophy of small programs each doing a single focused job: tar does the archiving while gzip does the compressing. :)
Tarballs sounded so intimidating to me, I thought it was something complicated lmfaoo, I always wondered. It's so freaking simple. I just gzipped and gunzipped a couple of files and then made a tarball out of them; shrank 250 KB to 80 KB. Cool stuff lol. Just goes to show how you can get in your own way sometimes. And also how good Fireship's videos are.
This man is so considerate, he even included the -v flag (which for starters stands for verbose) just so you can see your files being compressed. Truly a born teacher. Here's an explanation of the flags he used:
-c = create; it's the one you'd use the most.
-z = tells tar to compress using gzip. There are many others to choose from, of which gzip is imo the best.
-v = verbose; tells tar to print each file to the terminal as it's processed.
-f = file; the name of the archive to write (or read), which follows the flag.
You can read most, if not all, options using --help, with two hyphens. Even if you use Windows, using the terminal makes your life way faster once you get used to it. I highly recommend giving the lifestyle a shot.
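And if you'd rather script it than memorize flags, Python's standard tarfile module maps onto the same ideas. A small sketch (the directory and archive names here are made up):

```python
import tarfile

def verbose(member):
    print(member.name)  # mimic tar's -v: print each member as it's added
    return member

# roughly `tar -czvf archive.tar.gz somedir`: "w" is -c, ":gz" is -z, the name is -f
with tarfile.open("archive.tar.gz", "w:gz") as tf:
    tf.add("somedir", filter=verbose)
```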
@@Gesepp95 Most of the stuff you'd use on Linux is included in Windows as aliases. They should have similar, if not the same, functions as what you're used to.
Oh yeah, compression really came into play when I implemented Redis caching in the backend. It significantly reduced memory usage and network traffic to the cloud Redis instance. And gzip is actually really fast too (we save strings to Redis as key/val).
Nah, you can save different data types as the val, but not a file, if I remember right. What I did was JSON-stringify the data and compress the string afterwards. The issue with this approach is that JSON.parse isn't enough to restore values like functions, but that's okay, because you shouldn't be caching functions in Redis anyway. Oh, and JSON.parse is crazy fast too, even in ES5.
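For anyone wanting to try the pattern, here's a minimal Python sketch assuming the third-party redis-py client (key names and TTL are made up; the original was presumably JavaScript, but the shape is the same):

```python
import gzip, json
import redis  # third-party client: pip install redis

r = redis.Redis()

def cache_set(key, obj, ttl=300):
    # JSON-stringify, then gzip the bytes before storing
    r.set(key, gzip.compress(json.dumps(obj).encode("utf-8")), ex=ttl)

def cache_get(key):
    raw = r.get(key)
    return json.loads(gzip.decompress(raw)) if raw is not None else None

cache_set("user:42", {"name": "Ada", "tags": ["admin", "admin", "admin"]})
print(cache_get("user:42"))
```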
I usually use zstd (Zstandard) for files when the portability of gzip or zip isn't required. lz4 is also pretty cool to use inside applications because of its ultra-low compression and decompression overhead.
@@lawrencedoliveiro9104 I wouldn't say so. Give a tar.zst, or a 7z created with a zstd/brotli/lz4 patch, to 100 people and I'd guess one of them will be able to decompress it. When I experimented with LZ4, I found it useless for anything but sparse data (such as segmentation labels, where the index doesn't change often), and there it's perfect: something like run-length compression. Am I wrong about LZ4? Once I stumbled on lossy compression of scientific data that had a maximum allowed error and was likely using some kind of predictor (basically it compressed n-dimensional arrays of integers or floating point), but I have no idea what it was called or whether it was free (I think it was not).
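For anyone curious how those two compare, a rough benchmark sketch in Python, assuming the third-party zstandard and lz4 packages and whatever big-ish file you have lying around:

```python
import time
import lz4.frame
import zstandard  # pip install zstandard lz4

data = open("/usr/share/dict/words", "rb").read()  # substitute any sizable file

for name, compress in [
    ("zstd", lambda b: zstandard.ZstdCompressor(level=3).compress(b)),
    ("lz4",  lz4.frame.compress),
]:
    t0 = time.perf_counter()
    out = compress(data)
    dt = time.perf_counter() - t0
    print(f"{name}: ratio {len(out) / len(data):.2f}, {dt * 1000:.1f} ms")
```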
I had to write my own implementation of the Huffman coding algorithm for a computer science class back in the day. Compression and data encoding is a really interesting subject! It's a fascinating scientific problem to figure out how to represent information losslessly in the least number of bits possible.
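For anyone who hasn't seen it, the core of the algorithm fits in a few lines. A toy Python sketch: repeatedly merge the two least-frequent subtrees, prefixing 0/1 to the codes on each side:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # heap entries: (frequency, tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees...
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))  # ...merged and pushed back
        tie += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
print(codes)  # the most frequent symbols get the shortest codes
print("".join(codes[ch] for ch in "abracadabra"))
```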
While an undeniably good developer, Jeff really shines with his ability to synthesize and condense down information. These videos are snappy, engaging, and eloquent. Thank you a million, Jeff.
Yes, it is definitely becoming more popular. Blender has removed the option for gzip compression when saving its documents (though it can still read existing gzip-compressed ones) in favour of zstd.
Pretty sure you can use -a instead of -z with tar, and it will pick the compression algorithm based on the extension of the archive file name you provide.
Compared to LZMA and xz, gzip isn't the most efficient at compression, but it's ubiquitous. And the original developers didn't become miserable drunks long after releasing it. Mark Adler (you misspelled his name) still works at NASA.
Which one is LZMA again? Is it the one where you'd get unbelievably tiny files, but it takes super long to compress and decompress? Would also recommend lrzip, faster and smaller file size than bz2
Shoutout to the people behind pako for removing one of the most important features from their project, making me spend another half an hour trying to figure out how to use it.
@Fireship 1:35 You're incorrect about the compression ratio. A ratio of 90.3% means the compressed file's size equals 90.3% of the original's (e.g. a 100 MB file shrinking to 90.3 MB), which means the lower the ratio, the more compressed it is.
I'd be interested in seeing an implementation of server-side to client-side compression and decompression. Would you be able to make a video on that? It could also be cool to show off brotli too :)
*and the servers. Though you do sometimes need to explicitly enable compression in nginx etc. if you're not on managed hosting and are running your own VPS, cloud instance, whatnot.
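For a feel of what nginx's gzip setting does for you, here's a bare-bones standard-library Python sketch (the port and body are made up; real servers negotiate brotli and friends through the same Accept-Encoding header):

```python
import gzip
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = ("hello fireship " * 1000).encode()  # a very compressible placeholder payload

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = BODY
        self.send_response(200)
        # only compress when the client says it can handle it
        if "gzip" in self.headers.get("Accept-Encoding", ""):
            payload = gzip.compress(BODY)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```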
It definitely wouldn't have fit in these 100 seconds, but it would be cool to get a quick explanation of the vulnerability of using gzip with HTTPS (the CRIME/BREACH attacks) and why it's a problem.
Can you do some "in 100 seconds" videos on different IDEs and build tools? Would be really cool to see a condensed explanation of VSCode, WebStorm, Postman, etc.
VS Code is a general-purpose code editor that plays nice with all languages, as long as there's an extension for them. WebStorm, if I recall correctly, is a highly specialized IDE for writing JavaScript and related stuff, TypeScript etc., I presume. It has JavaScript-specific features built in that you either can't get in VS Code or need a ton of extensions to replicate. Postman is an app used to test API endpoints and develop API systems.
@@shlongchad6159 I've used all of these specific pieces of software and know them well, just figured they would be a good starting point if Fireship decides to branch out to development tools.
Your usual excellent super-fast firehose of information, informative to anyone whose brain has "turbo mode". However, despite your usual perfection, this one video has a flaw! Yes, hard to believe, but it's ture! I mean, true! The one inventor, Mark Adler, is not named Mark Alder. He's well known in the NASA/JPL space mission community, besides co-inventing a great compression algorithm.
The key point not mentioned here is that information content is a statistical concept. The closer your data stream looks to random noise, the less compressible it is. And it's inherent in the design of compression algorithms that they produce a data stream that looks very much like random noise.
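You can watch that happen in a few lines of Python: repetitive data collapses, random bytes don't, and since gzip's own output already looks like noise, a second pass buys you nothing:

```python
import gzip, os

repetitive = b"abc" * 10000
noise = os.urandom(30000)  # statistically random: nearly incompressible

for name, data in [("repetitive", repetitive), ("random noise", noise)]:
    print(f"{name}: {len(data)} -> {len(gzip.compress(data))} bytes")

once = gzip.compress(repetitive)
print(f"gzip of gzip: {len(once)} -> {len(gzip.compress(once))} bytes")  # barely moves
```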
As a note, the superior operating system (macOS) also has gzip built in, given that it runs on a Unix subsystem (better than the cheap imitation known as Linux).
Maybe a nice video idea: an overview of some of the popular package managers like apt, pacman, apk, maybe winget, so you can update all your systems without googling the distro + package manager name to find the update command?
Still waiting for TLA+ in 100 Seconds. I mean who needs practical videos when you can talk about formal specification language for concurrent systems and distributed systems?
Can you explain why gzip compression takes so long (when rapidly animating images) in JavaScript? I threw in compression level 0 and it still took more time than my custom 2D renderer rendering multiple elements.
If your images use a compression like JPEG, then they already look very much like random noise. In which case further compression isn’t going to work very well.
Images are usually already compressed and as binary data you're not going to get a lot more improvement in the process. SVGs are almost always text and compress very efficiently.
@@KingThrillgore SVGs are wonderful. I know clients that send PNG icons. PNG icons can be so large; converting to WebP and shrinking the icons definitely helps a little, but you're still nowhere near the <1 KB you'd see in an equivalent SVG icon.
@@linuxization4205 zlib compression still took a decent amount of time at compression level 0 (even though all it does then is add framing around the buffer). I can't imagine concatenating 2 strings taking that long.
I always chuckle as I tar cvzf a bunch of files to my 128 GB USB drive and recall that this command was about *TAPE*. Almost as silly as the QWERTY keymap being about typewriters.
When I first started getting into Linux in the early 2000s, I was so confused by gzip. Then I discovered how awesome RAR and 7z are. Back then you'd leave a download running all night just to hopefully pull down ~700 MB, only for it to fail right at the end. 🤦♂️ f'ing dialup
Wouldn't it be smarter to split that 700 MB into smaller chunks to mitigate connection issues like dialup's? I always thought early-2000s files were chunked into millions of tiny pieces, but it seems that wasn't the case.
@@siddiki9778 The Linux ISOs I tried downloading back in the day were one large file. DSL wasn't unheard of; we just didn't have it at my house. Getting a second phone line for dialup? That was the extent of luxury 🤣
I really love these kind of videos. Short & sweet. You really feel like you've learned some fundamental (and interesting!) things in a very short amount of time
For the Python folks who still don't understand: gzip scans your data and analyzes everything. When it finds chunks that are the same (I don't know how much code you have, but I made a 1 GB text file and my laptop smelled like fire), it keeps one copy and replaces every later occurrence with a short back-reference, a bit like defining a function once and calling it everywhere instead of pasting the same code over and over.
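Here's that idea as actual runnable Python: a toy LZ77-style pass (the scheme deflate/gzip builds on), emitting literal bytes and (distance, length) back-references. Real gzip uses a 32 KB sliding window and layers Huffman coding on top:

```python
def toy_lz77(data, window=1024, min_match=4):
    out, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # look back through the window for the longest match at position i
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_match:
            out.append((best_dist, best_len))  # "copy best_len bytes from best_dist back"
            i += best_len
        else:
            out.append(data[i])                # literal byte
            i += 1
    return out

print(toy_lz77(b"hello hello hello world"))
# -> literals for "hello ", then (6, 12), then literals for "world"
```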
This saved me in production. I saw a massive reduction of ~70% for large nested JSON lists with similar keys; it took the payload from 400+ KB down to approx 100 KB in my case.
There are some inaccuracies. I believe deflate compression was either taken from/inspired by LHarc or invented by Phil Katz (PKARC, PKZIP) as some modification of the original code. Mark Adler and the other guy just took deflate and used it to replace compress for single-file compression, because tar handles the metadata on Unix systems (and there was likely no requirement to store compressed files directly across a bunch of floppies, as on an MS-DOS machine with a small hard drive or none at all). Put simply, ZIP is like many gzip files appended one after another with a central directory attached at the end, which lets you decompress the archive sequentially from the start, or list its contents and find a specific file to unzip. tar.gz has better compression because everything is put together and repeated patterns across files get compressed.
gRPC in 100 Seconds please!! We’ve started using gRPC at our company and love it, mostly for the type enforcement and reduced payload size thanks to sending information over a binary stream instead of human readable, inefficient-to-parse JSON
You don't have to type anything on the command line on Linux! I just right-click and choose Compress. This works on one file or many. I don't know why most Linux videos try to scare people by doing everything the command-line way. Linux is just like Windows: I never use the command line, and I've been using Linux for over 10 years.
I almost never use gzip. xz -9 with 12 threads and a 24 GB memory limit can compress really well, whereas zstd can compress and decompress extremely fast. If you need something in the middle: xz -5.
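If you're scripting rather than shelling out, Python's standard lzma module wraps the same liblzma as xz, with presets mapping roughly onto xz -0 through -9 (the input path below is made up, and stdlib lzma doesn't expose xz's multithreading):

```python
import lzma

data = open("bigfile.bin", "rb").read()  # hypothetical input file

tight  = lzma.compress(data, preset=9)   # like xz -9: slow, smallest
middle = lzma.compress(data, preset=5)   # like xz -5: the middle ground
print(len(data), len(tight), len(middle))
```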
The GNU versions of many Unix utilities have flags that control functionality beyond what's in the Single Unix Spec. For tar: c creates a new archive, v provides verbose output, and f names the archive file to read or write.
In a course I learned about gzip: you can't usefully gzip a gzip file. But on the next page I learned about bzip2, so I bzipped the gzip file, then gzipped the bzip of the gzip, and just kept alternating to see how small I could make it. It eventually stopped getting smaller, and the chain of file extensions grew so long it wrapped onto a new line, so I had to unzip it all. And on the next page, I learned about xz...
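You can rerun that experiment in a few lines of Python; after the first pass the data already looks like noise, so every further pass mostly just adds container overhead:

```python
import bz2, gzip

data = b"squeeze me " * 5000
sizes = [len(data)]
for i in range(6):
    data = (gzip.compress if i % 2 == 0 else bz2.compress)(data)
    sizes.append(len(data))
print(sizes)  # one big drop, then the sizes creep back up
```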