Former programmer and current Linux/Unix administrator here. I've beaten many of our in-house Java/Python routines (many of them multi-threaded/multi-process) that massage data, using parallel + awk (or mawk for even faster performance, given you fit within its limitations). We crunch through massive amounts of plain text per hour. Nearly every time the younger staff are perplexed at how this can be (which is in turn perplexing to me: how do they think they'll beat a nearly 40-year-old, highly optimized program with a few days of coding?). Sometimes mawk will even beat my C++ implementations compiled with the most aggressive flags I can set.
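For the curious, the pattern looks roughly like this; a minimal sketch with made-up file names and a made-up column layout:
```bash
# one awk (or mawk) process per input file, one job per core by default;
# here: sum the 3rd column of each log file
parallel "awk '{ sum += \$3 } END { print FILENAME, sum }' {}" ::: *.log
```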
@@BlackwaterEl1te If you need to manipulate text, CSV, or basically anything textual, nothing beats the ease of being able to isolate columns, rows, line numbers, regular expressions, etc. Having some of the world's best computer scientists optimize its code doesn't hurt either.
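A few one-liners of the kind meant above, against a hypothetical whitespace-delimited data.txt:
```bash
awk '{ print $2 }' data.txt                 # isolate column 2
awk 'NR==10, NR==20' data.txt               # rows 10 through 20 by line number
awk '/ERROR/ { print NR": "$0 }' data.txt   # regex match, prefixed with the line number
awk -F, '{ print $1, $3 }' data.csv         # fields 1 and 3 of a (simple) CSV
```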
@@BlackwaterEl1te I recommend the book "sed & awk" (make sure it's a later edition); the free PDF is floating around everywhere on the net (O'Reilly is the publisher). Just skip the sed parts if you want. sed is a stream editor for quick-and-dirty regex replacements (its substitution syntax is basically what's built into Vim). awk is a full programming language.
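The quick-and-dirty sed usage mentioned there, with a made-up file name:
```bash
sed 's/foo/bar/g' notes.txt         # print the file with every foo replaced by bar
sed -i.bak 's/foo/bar/g' notes.txt  # edit in place, keeping notes.txt.bak
                                    # (GNU and BSD sed differ on a bare -i)
```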
I use a lot of bash scripts, not because I like bash scripts (literally any other scripting language would be better), but because, if written properly, they just work on almost anything... Also, GNU parallel is awesome.
More like works on almost nothing. Aside from built-ins, every command is an external program that may or may not be on the system. And the program may have different versions or implementations: sometimes it's GNU tar, sometimes it's BSD tar. To make a platform-independent bash script you need to find the common denominator of every target platform. Don't get me wrong, I use bash all the time, but that's because it's actually a very good model for composing programs together.
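One common way to deal with the GNU-vs-BSD split: probe the implementation instead of assuming flags exist (archive and directory names here are placeholders):
```bash
if tar --version 2>/dev/null | grep -q 'GNU tar'; then
    # GNU-only flags are safe here
    tar --owner=0 --group=0 -czf out.tar.gz dir/
else
    # fall back to the portable common denominator
    tar -czf out.tar.gz dir/
fi
```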
Parallel is great for anything you could do using multiple tmux sessions. Anything that requires joining and error handling gets overly complicated pretty quickly.
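For what it's worth, parallel does ship some error-handling knobs; a small sketch with hypothetical script names:
```bash
parallel --halt now,fail=1 bash {} ::: job_*.sh  # kill all remaining jobs on the first failure
parallel --joblog run.log  bash {} ::: job_*.sh  # log exit codes per job for later triage
echo $?  # parallel's exit status is the number of failed jobs (101 if more than 100)
```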
Academia has known about GNU parallel for a long time. This is what happens when big tech is so focused on its own stack: they ignore the amazing open-source contributions made by smarter people who know how to simplify things.
Academia is about that money. Most of the tools used in college are there as advertisement: most students end up in jobs that use those classroom commercial tools. I had to learn about parallel via Perl in my spare time at my night job; I was my university's web administrator and saw it in a script that some guy wrote in the 1990s. Thank God I stumbled upon the old-timers' scripts.
I had the same question about the ::: when I started seeing examples of the parallel command. I did the same thing: went crazy digging around perl, bash, zsh, and fish trying to figure out what this thing was, and couldn't find anything. But it's part of parallel itself; it's how it separates arguments. Or you can use --arg-file.
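In short, ::: feeds arguments inline and --arg-file (or -a) reads them from a file; hosts.txt here is made up:
```bash
parallel echo ::: one two three      # arguments given inline after :::
parallel -a hosts.txt ssh {} uptime  # same idea, arguments read from a file
```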
I'm surprised prime didn't know that. I had to find out as well, and found it in his book "GNU Parallel 2018" (Ole Tange), section 2.1 'Input sources'; it's the first thing he talks about. Now I wonder when it was introduced... damn, I'll have to do some more digging.
I feel like there are so many amazing tools out there that you would never find in a Google search for a specific problem. Isn't there a comprehensive list of them somewhere?
It's kind of surprising that he didn't know ::: is a cartesian product (in parallel). It's basically set math/stats-style design used to create an argument list of commands. In bash, a cartesian product looks like `echo {a..b}{1..2}`, and it'll print a1 a2 b1 b2. There's a SQL-like way of understanding it too. It's used to reduce invocations and source code (especially in math). He probably knew it, though, and just forgot (or only saw it used a handful of times by the solo smart old-timers).
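Side by side, assuming parallel is installed:
```bash
echo {a..b}{1..2}                      # bash brace expansion: a1 a2 b1 b2
parallel echo {1}{2} ::: a b ::: 1 2   # same cartesian product, one command per pair
                                       # (add -k to keep the output in order)
```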
Parallel is awesome. I've even used it a few times where I didn't need parallelization at all, because its argument parsing and handling is so much more powerful than xargs's, so I just used it instead.
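A taste of that argument handling, with made-up file names (the convert example assumes ImageMagick is installed):
```bash
parallel gzip -9 {} ::: *.txt           # {} is the current argument
parallel convert {} {.}.png ::: *.jpg   # {.} is the argument minus its extension
```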
this episode was extra amusing as I am currently writing a bash script to auto-play Cookie Clicker using only core Linux/X11 tools: not just a clicker, but full-on buying, gardening, combo'ing, etc.
This could solve a lot of problems at my job. We have a bunch of e2e tests that are crazy; a developer who left tried to build a multiprocess version with Node, but it sucks and fails for no reason.
yeah, and when, after 100+ attempts, you finally get all the quoting/escaping/piping just right in your interactive bash session, you can begin fixing the quoting/escaping/piping required for the script version
As someone who has programmed altogether too many lines of Bash (all 10 of them were 10 too many), I know I've done a little too much Bashing (excellent name for it, BTW; it's called bashing because that's what I do to the keyboard with my head) when I see any control-flow statements more complex than an "if". "While" is right out, and "for" gives me PTSD (dear God, may I never try to implement flags in Bash). Also, anything more than 5 pipes or escape characters on one line is heinous to read, and I've colorized my command prompt. Why double-quote string expansion treats single quotes the way it does, I'll never know.
Indeed, as zsh on macOS says:
```
❯ ls {1} ::: {1..10}
ls: 1: No such file or directory
ls: 10: No such file or directory
ls: 2: No such file or directory
ls: 3: No such file or directory
ls: 4: No such file or directory
ls: 5: No such file or directory
ls: 6: No such file or directory
ls: 7: No such file or directory
ls: 8: No such file or directory
ls: 9: No such file or directory
ls: :::: No such file or directory
ls: {1}: No such file or directory
```
found this on SO:

"Parallel processing makes sense when your work is CPU bound (the CPU does the work, and the peripherals are mostly idle), but here you are trying to improve the performance of a task which is I/O bound (the CPU is mostly idle, waiting for a busy peripheral). In this situation, adding parallelism will only add congestion, as multiple tasks will be fighting over the already-starved I/O bandwidth between them.

On macOS, the system already indexes all your data anyway (including the contents of word-processing documents, PDFs, email messages, etc.); there's a friendly magnifying glass on the menu bar at the upper right where you can access a much faster and more versatile search, called Spotlight. (Though I agree that some of the more sophisticated controls of find are missing; and the 'user friendly' design gets in the way for me when it guesses what I want, and guesses wrong.)"
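To make that concrete (file names made up): compression is CPU-bound and parallelizes well, while searching a single disk is I/O-bound and usually doesn't:
```bash
# CPU-bound: xz spends its time computing, so one job per core helps
parallel xz -9 ::: *.log
# I/O-bound: every grep waits on the same disk, so this is often
# no faster than a plain serial grep
parallel grep -l ERROR ::: *.log
```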
Parallel was made in like 2002, back when SMP support was being written into kernels and Google started promoting Python instead of Perl. At that time most programs were written to run on only 1 CPU. Parallel allowed you to take a 1-CPU program and use all 4 of your CPUs. Today we have 128 cores and parallel will spread work across all of them; stuff gets done quick. We'd use it in clusters, and then Gearman was made. I think PowerShell copied a lot of that stuff.
PowerShell is an actual programming language with a type system, whereas POSIX shell is mostly just calling executables and piping the text output of one executable to the next
Does anyone remember which video on this channel was the one where @ThePrimeTime spoke about GNU parallel being able to continue to the next command down the pipe without blocking? There were also some diagrams drawn in that video. Found it! Thanks to YouTube's good search algo; searching "parallel" led to ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pHJmmTivG1k.html
If you know Python, forget about this tool and just use the multiprocessing module: you get queues, shared memory, and more. Bash is very limited and hard to read in terms of just fundamental programming logic.
@@ordinarygg No, but I don't understand the use case for Ansible. I used it to configure machines, that's it. How do you use it? You have a cluster of n machines and use Ansible to pull the latest changes from a git repo and run tests across all machines? Or what?
@@greyshopleskin2315 Pull from the repo? It's a security breach to give a machine that runs tests access to your repo; it should be synced from outside. In general yes, but you missed:
- collect the results
- bootstrap the env to run those tests if they are E2E
- create a report
If you use something like GitLab CI, or even worse GitHub Actions, imagine it going down or changing the syntax, LOL. All the kids that use free things don't understand one simple rule: if it's free, then the price is you and your code. Nothing is free in this world.
What's the point of this tool? Basically all modern scripting languages can do the same thing with the same amount of code, except then you aren't using a shell language anymore = win.
@@ThePrimeTimeagen Yes, there should be a threshold, and in the examples given using an array was unnecessary (that's why we have globs, e.g. `for test in potentially_flaky_*.sh; do...`; see the sketch below). Still, there is also a threshold for switching to a 'real' scripting language, since interacting with CLI tools is so much easier in bash than in Python, Ruby, and the like. And using an advanced feature can keep the script more readable if used appropriately.
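Spelled out, the glob version of the video's example:
```bash
# no array needed; the glob expands to the matching scripts
for test in potentially_flaky_*.sh; do
    bash "$test" || echo "FAILED: $test"
done
```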
Btw, brace expansion (à la project/{src,dist}) I don't consider an 'advanced feature', since it's also very practical in interactive use (i.e. in particular when you're not writing a script). And it's not even bash-specific; it's supported by practically all shells (fish doesn't have range expansions, though).
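For example, straight from an interactive prompt (paths made up):
```bash
mkdir -p project/{src,dist}   # expands to project/src project/dist
echo {1..5}                   # range form: 1 2 3 4 5 (fish lacks this one)
cp config.yaml{,.bak}         # classic trick: copy config.yaml to config.yaml.bak
```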
4:36 why not just use a makefile for this? Feels like it would work well enough with the parallelization, but now it's more understandable and I don't need to download a new utility. (maybe something like uhhhh)
```make
# tests (marked .PHONY so make always re-runs them instead of
# seeing the existing .sh files as up to date)
T = potentially_flaky_1.sh potentially_flaky_2.sh potentially_flaky_3.sh potentially_flaky_4.sh potentially_flaky_5.sh

.PHONY: tests $(T)
tests: $(T)

$(T):
	bash $@
```
and then just exec the tests with
```sh
make -j tests
```
and you can still get output in the correct order with just a couple tweaks to the makefile:
```make
# output files, one per test
O = pft_out_1 pft_out_2 pft_out_3 pft_out_4 pft_out_5

tests: $(O)
	cat $(O) ; rm $(O)

pft_out_%: potentially_flaky_%.sh
	bash $< > $@ 2>&1
```