Your videos remind me why I fell in love with programming. They're also more informative than most of the university classes I recall. Good for the mind and the soul.
Mr. Zozin, congratulations on 100K subs! (It's funny how the "VODs" channel has more subs than the "main" channel lol.) Anyway, I just want to thank you for being the cool guy that you are. I really love programming, but I can't keep programming much on my own because I get discouraged or distracted really easily. But watching your videos gives me the motivation to code again. So thank you very much! Again, congrats on 100K (you deserve more than 100K). Love from Maldives 💖
Love the rant about the lexer, that's a reason I usually just end up writing all my own stuff... A friend of mine always likes to tell me "why do you waste your time writing your own stuff when there are perfectly good libraries you could use", and I tell him A) what fun is that and B) stuff like you've just found. lol :)
I saw QBE in the recording of the Hare stream and was immediately excited... the last time I built a toy language was in college, like 18 years ago. But I had to use LLVM when contributing to one or two other language projects and it definitely felt too big (mentally and diskwise...). Cranelift seems like another simpler option, but QBE takes it to another level.
Zig still uses LLVM. They plan to keep using LLVM for the foreseeable future, and plan to continue supporting LLVM as a backend in the long run. The plan is to *eventually* not depend on LLVM to compile Zig itself, as it causes quite a bit of friction with Zig's own development, and furthermore, LLVM is very slow to compile. The Zig team plans to slowly build up their own internal backend over time, which you can choose to use instead of LLVM. In the short term, this custom backend is only going to be meant for Debug builds, where you want to build the project quickly, test bugs and features, make an edit, and recompile it in an instant, with basically no optimizations. In the long term, the custom backend should be good enough to compete with LLVM, while hopefully learning from the mistakes that have been found over the years.
Yes, it definitely can. Hare does provide emacs configuration (syntax highlighter) alongside. Besides, you can always write your own syntax highlighter for a specific language.
I am putting QBE under “I want to build compilers, but I have no patience to try and learn GCC and Clang because they are somehow longer than Wheel of Time”. I even have the dragon book. Let's do this.
It is impossible for a hobbyist to build something close to LLVM or GCC. LLVM is 20 million lines of C++ code. For comparison, the Linux kernel is 35 million lines of C code. Remember that C++ is less verbose than equivalent C code, which makes them even closer in complexity. Also, half of the Linux code is just drivers, so you could say that it is less complex than LLVM in some aspects.
I wish I knew what exactly that number is. A long time ago it was around 8GiB, then it rose to 63, now it is around 57... I never cared about that number, but since it is changing so much, it started to trigger my curiosity... 😄
@@ecosta It's obvious that at the time he had 8GiB, he thought that wasn't enough and decided to "update" his "collection" to, for example, 63GiB. And from that moment he deleted another 8GiB since he has already "researched" it lol.
This guy is cool, knowledgeable, funny and one of the best teachers i've seen. Not bad at all :-) Edit. Aaand the expression compiles to 69, also known as urmom 😂.
From the user's perspective, prefixes were useful in the absence of LSPs and other code analysis tools for understanding what you are looking at; it's for the same reason we see the p_*, m_* conventions in old codebases.
Depends on the programmer really. I've always hated Hungarian notation, especially as I'm typing something in. Even now that tab completion is everywhere, I still prefer to not use it.
Some versions of C (pre-C89) put all struct member names in one global namespace, so people would prefix members with the struct name: struct Foo { int foo_value; bool foo_enabled; } It was... not a great time.
Coming from a functional background, it is mind-blowing to me that funcs and vars would live in separate namespaces. Lambdas evaluate to closures ... and closures are just vals like any other.
As regards SSA: People don't normally write SSA by-hand ... rather, they'll implement an automated correctness-preserving transformation to SSA as one of the intermediate phases of their compiler, basically for the reasons that you said (further static analysis is way easier on already-SSA'ed code -- dead code elim, const prop, registerisation, liveness, etc.).
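To make the "automated transformation to SSA" idea concrete, here is a minimal sketch in C. It only handles straight-line code (no branches, so no phi nodes), and the `Instr` struct, the single-letter variable names and `to_ssa` are all made up for illustration; a real compiler pass would be considerably more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_VARS 26  /* hypothetical limit: single-letter variables a..z */

/* One three-address instruction: dst = lhs op rhs, all single-letter names. */
typedef struct { char dst, lhs, op, rhs; } Instr;

/* Rename a straight-line sequence into SSA form: every assignment creates a
 * fresh version of its destination, and every use refers to the latest
 * version. One line per instruction is written into out (out_cap must be
 * at least 1). Branches would also need phi nodes, which this sketch omits. */
static void to_ssa(const Instr *code, size_t n, char *out, size_t out_cap)
{
    int version[MAX_VARS] = {0};   /* 0 means "not assigned yet" (an input) */
    size_t used = 0;

    out[0] = '\0';
    for (size_t i = 0; i < n && used < out_cap; ++i) {
        /* Read the versions of the operands first ... */
        int lv = version[code[i].lhs - 'a'];
        int rv = version[code[i].rhs - 'a'];
        /* ... then create a new version for the destination. */
        int dv = ++version[code[i].dst - 'a'];

        int w = snprintf(out + used, out_cap - used, "%c%d = %c%d %c %c%d\n",
                         code[i].dst, dv, code[i].lhs, lv,
                         code[i].op, code[i].rhs, rv);
        if (w < 0) break;
        used += (size_t)w;
    }
}
```

Feeding it `x = a + b; x = x * c; y = x + x` gives each assignment to `x` its own name, which is what makes later analyses like constant propagation or dead code elimination simpler to write.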
Awesome stream ... would be fun to see something with a bigger impedance mismatch -- maybe Scheme -- compiled to QBE ... Scheme's parsing is basically easier than falling off a log (since the programmer is basically writing the AST directly), but dealing with call/cc and lambdas would be ... a fun challenge to represent in the QBE instructions.
2:30:00 that in particular looks like a bug: in stb_lexer.h line 565 it sets string_len to n (aka 0) before the loop looking at the characters, but then doesn't set it after the loop when it is done. Moreover, it reuses the string storage, so if you have 2 different identifiers, the string view of the previous one will get overwritten.
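The storage-reuse problem is easy to reproduce in isolation. The sketch below is a hypothetical mini-lexer, not the stb API: `Lexer`, `next_ident` and `copy_ident` are invented names, and only the `string_len` field name is borrowed from the comment above. It shows the aliasing and one possible caller-side workaround (copying the token out).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini-lexer, NOT the stb API: every identifier is stored in one
 * shared buffer, so the returned pointer is only valid until the next call. */
typedef struct {
    const char *cursor;      /* next unread input character */
    char storage[64];        /* reused for every token */
    size_t string_len;       /* length of the current token */
} Lexer;

/* Reads the next identifier into lx->storage. Returns NULL at end of input. */
static const char *next_ident(Lexer *lx)
{
    while (*lx->cursor && !isalpha((unsigned char)*lx->cursor))
        lx->cursor++;
    if (!*lx->cursor)
        return NULL;

    size_t n = 0;
    while (isalnum((unsigned char)*lx->cursor) && n + 1 < sizeof lx->storage)
        lx->storage[n++] = *lx->cursor++;
    lx->storage[n] = '\0';
    lx->string_len = n;      /* set after the loop, once the length is known */
    return lx->storage;
}

/* Caller-side workaround: copy the token if it must outlive the next call.
 * The caller owns the result and must free() it. */
static char *copy_ident(const Lexer *lx)
{
    char *copy = malloc(lx->string_len + 1);
    if (copy)
        memcpy(copy, lx->storage, lx->string_len + 1);
    return copy;
}
```

Lexing `"foo bar"` and keeping the pointer from the first call leaves you holding `"bar"` after the second call, because both calls return the same buffer; only the copy still says `"foo"`.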
13:40 zozin talking about storing names as various types, like tagged unions, in tables while I'm in the process of creating a tagged union to put in my symbol tables.
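For anyone who hasn't built one yet, a tagged union in a symbol table could look roughly like this in C. All names here (`Symbol`, `SymKind`, `SymTable`, the fields) are illustrative assumptions, not anything taken from the stream.

```c
#include <assert.h>
#include <string.h>

typedef enum { SYM_VAR, SYM_FUNC, SYM_CONST } SymKind;

typedef struct {
    const char *name;
    SymKind kind;                         /* the tag: which member is live */
    union {
        struct { int stack_offset; } var;
        struct { int arity; } func;
        struct { long value; } constant;
    } as;
} Symbol;

#define MAX_SYMS 64
typedef struct { Symbol items[MAX_SYMS]; size_t count; } SymTable;

/* Appends a symbol. Returns 1 on success, 0 if the table is full. */
static int symtable_add(SymTable *t, Symbol s)
{
    if (t->count >= MAX_SYMS)
        return 0;
    t->items[t->count++] = s;
    return 1;
}

/* Linear search from the newest entry, so later definitions shadow earlier
 * ones. Returns NULL if the name is unknown. */
static const Symbol *symtable_find(const SymTable *t, const char *name)
{
    for (size_t i = t->count; i > 0; --i) {
        if (strcmp(t->items[i - 1].name, name) == 0)
            return &t->items[i - 1];
    }
    return NULL;
}
```

The important discipline is to check `kind` before touching `as`: reading `as.func.arity` on a `SYM_VAR` entry compiles fine and silently gives you garbage.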
Zig used to use LLVM (still does, I guess), but they're in the process of migrating to their own backend for reasons similar to yours, which I find very exciting.
LLVM will become an optional backend, as the self-hosted backend only supports x86, ARM, RISC-V and Wasm. So you still need LLVM to compile for MIPS or SPARC, for example. Also, the new backend is optimized for fast compile times; it is almost impossible to compete with LLVM in optimizations, so to ship the fastest binary to the user, LLVM is still the best.
2:44:30 you have to copy the constant into that temporary. you can do this like `%s0 =w copy 3` or whatever. I got stumped on this too when trying to figure it out for my own language edit: nvm you figured this out later lmao
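If your compiler is written in C, emitting that instruction is basically one `snprintf`. A tiny sketch follows; `emit_const_w` is a made-up helper name, and the only QBE-specific part is the `%s0 =w copy 3` shape quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats "%<temp> =w copy <value>", i.e. the QBE instruction that puts a
 * constant into a word-sized temporary. Returns what snprintf returns. */
static int emit_const_w(char *out, size_t cap, const char *temp, int value)
{
    /* "%%" prints a literal '%', which is QBE's sigil for temporaries. */
    return snprintf(out, cap, "    %%%s =w copy %d\n", temp, value);
}
```

Calling `emit_const_w(buf, sizeof buf, "s0", 3)` writes the instruction from the comment into `buf`, indented by four spaces.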
Nooo....don't let it leak. Sure, the OS cleans up for you most of the time. But sometimes someone will want to build your code to run in an environment where it does not. I had this problem with a compiler for the Spin language for the Parallax Inc. Propeller micro-controller. I built the compiler with Emscripten into asm.js to run in the browser. The problem was that every time it was run it leaked, and eventually it crashed the page. The Emscripten runtime did not clean up. Luckily I convinced the author to fix that leak and all was well eventually.
BNF is not useful? I grew up with yacc and bison. There is no better way of understanding the syntax of a language than the condensed DESCRIPTION of the language in that metasyntax form. Okay, with the BNF of C++ (it isn't actually possible to describe C++ that way, as a side note) your head will blow up. But you do not have to look at all the "arms". You can slowly crawl down from the top to the bottom, which also represents DETAIL. Go as far as your head-internal-pressure allows. Don't be afraid!?!:)
You should be fair in regards to LLVM. OF COURSE it is "bloated". Look at the options and possibilities it gives you. For me, targeting X86, ARM, MIPS and RISCV, this package is a gift ... all under one hood and standardized. Oh and BTW I am on Gentoo. So don't tell me what is "Bloat", as I have to recompile LLVM every time there is an update. Oh ... and who would've guessed it? I compile only THE BACKENDS I USE! You didn't know that this is an option and absolutely speeds up compilation times and also reduces space requirements? Now you know! For others that get a binary package ... WTF? You install that in seconds. It is 2024, so who cares about 1 or 2 GB disk space? Period! There is nothing wrong with using a dedicated solution like QBE or specialized compilers, e.g. for older hardware or electronic projects, etc. But that is a question of using "the right tool for the job", which is YOUR responsibility (and shows your expertise ... or not). Acting like a child and beaching (spelling) around about general purpose compiler infrastructure, which is OF COURSE big in every aspect, doesn't help you in any way besides letting off steam. There are better ways, like touching grass, or getting laid, don't you think?:P P.S.: Oh and this comment is not directed at Tsoding. He is joking. "... React of compilers", hehehe. So do I. This comment is for YOU. Yes you, angry looking motherboard who didn't even understand my "tool" analogy, with that stick up your back:P Have fun coding and thanks for the great video!:)
51:35 Not only is raylib pretty easy to implement and work with, but it also has bindings for basically every language under the sun. It is pretty awesome lol. The fact that you can interface with it surprisingly easily from Assembly is icing on the cake. Name another framework that can do all of that! Lol
My couple of cents: 1. Using "$" and "%" in IR is justifiable: as a language writer, you can define symbols without any keyword clash. Example: write a function named "$export" which compiles to a symbol named "export". That is a perfectly valid C function name, but it would be a mess if QBE IR had no symbol prefix. 2. That "string_len" in the lexer probably only makes sense if you parse a literal string. I haven't checked the code, but I guess literals are pointer-copied to that "string" attribute, which would be an optimisation to avoid copying/strlen-ing a string constant. That's why I favour writing my own libs when programming for fun. There is always some nasty surprise from an undocumented/unexpected behaviour that takes hours or more to troubleshoot.
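Point 1 can be sketched from the compiler writer's side. The helper below (`emit_trivial_func` is an invented name) prints a QBE function whose source-level name is passed through untouched; as far as I understand QBE's IL, the `$` sigil is what keeps a user function called `export` from colliding with the `export` keyword in front of it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emits a QBE function that just returns a constant. The user-chosen name is
 * emitted as "$name", so names like "export" or "function" cannot clash with
 * IL keywords. Returns what snprintf returns. */
static int emit_trivial_func(char *out, size_t cap, const char *name, int retval)
{
    return snprintf(out, cap,
                    "export function w $%s() {\n"
                    "@start\n"
                    "    ret %d\n"
                    "}\n",
                    name, retval);
}
```

With `name = "export"` the first line comes out as `export function w $export() {`, where the first `export` is the keyword and the second one is the symbol.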
The begin/end convention also allows a nice way to describe an empty range: begin == end. The begin and end tokens do not need to point to anything real. Null is fine.
The problem with QBE is that it doesn't have a way to add debugging information. This makes it a non-starter for me, as any language that can't be easily debugged is useless IMHO. I know they are working on adding some support for it.
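A small C sketch of that convention, with a hypothetical `Range` type. One caveat worth hedging: comparing two null pointers for equality is fine, but subtracting them is not something C guarantees, so the length helper checks for emptiness first.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end): end points one past the last element. */
typedef struct { const char *begin; const char *end; } Range;

/* Empty means begin == end; this works even when both are NULL. */
static int range_empty(Range r)
{
    return r.begin == r.end;
}

/* Avoids pointer subtraction on an empty (possibly NULL) range. */
static size_t range_len(Range r)
{
    return range_empty(r) ? 0 : (size_t)(r.end - r.begin);
}

/* Counts occurrences of c. The loop body never runs for an empty range,
 * so nothing is ever dereferenced when begin and end are NULL. */
static size_t range_count(Range r, char c)
{
    size_t n = 0;
    for (const char *p = r.begin; p != r.end; ++p) {
        if (*p == c)
            n++;
    }
    return n;
}
```

The nice part is that no function needs a special case for "no data": `{NULL, NULL}` and `{p, p}` both fall out of the same loop condition.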
Yes, another language. That's why I initially subscribed to Tsoding Daily: because of the language creation content. :) What about creating a minimalist yet useful language? Not another Porth, though. Maybe a language that includes just: if, elif, else, while, variables, and functions. That would essentially be Turing complete. Perhaps arrays could be added, but that’s about it. The goal would be to have as few keywords as possible. Or, implementing some sort of old-school BASIC could be fun. You know, the kind without proper loops, relying solely on 'goto' and 'gosub' for control flow.
Make a fast compiler for a small scripting language? Such that it is super-easy to integrate into C/rust applications like games, but does not do any IO (such that it is designed for embedding into other apps from ground up, but compiled into native code).
It looks like the trend is to either slap your wrist before you even start or try to read your mind before you code. Basically, it's getting to the point where you're almost scared to hit the keyboard, lol.
@@paulredmond6467 simplicity meaning it would be easier to read the source code and step through the code while debugging if someone just wants to see how things are implemented and how they work.
Having looked at a fair amount of qbe output in the context of Hare, it's definitely better than directly spitting out unoptimized asm like tcc, and can even elide some bounds checks, but definitely worse than LLVM or GCC (as expected). I'd say the biggest problem for Hare at least is the handling of aggregate function parameter and return types. As far as I can tell ABI lowering happens after most optimization passes, so you get tons of unnecessary copying around that won't get optimized out. There's definitely also more register shuffling than bigger compilers would produce. I'd expect both of these to improve eventually. Personally I also think qbe should have the ability to generate blits with a non-constant size parameter, falling back to memcpy if it can't work out the size statically or it's too big. At the moment harec just uses a constant threshold for struct copies, and always uses memcpy for slices. But I don't know what the Hare or qbe people would say about that. Oh, and qbe at the moment is _way_ below the "10% of LLVM LoC" goal (and probably also the performance one...), so there's room for improvement left for sure.
Zig does use LLVM, but they are working on their own backend and their plan is to drop LLVM, mostly because it's slow and bloated. And they might succeed; they are very focused on that.
Around 1:51:20. Absolutely true. The -React- sorry, the Rust developers got on a path where they thought it was a good Outlook (pun intended) to the future. But in many cases this looks like patronizing the Rust language user. And another disturbing point: That "thing" we are talking about here is actually called "clippy". You remember that annoying BS from the Windows/Office 97/98-2003 days (internal name "Clippit", publicly known as "Clippy") that permanently wants to "fix" and harass the User with its demands and "well-intentioned advice" = YOU!? So, the question is: Are the Rust developers comedians, or are they outright evil? Hehehehe:) (Switching off that clippy BS, including warnings, needs its own manual, LOL. And wow: Entries in the actual code. Like "#[allow(clippy::wrong_self_convention)]". What a joke!:) Or, examples: rustc -A dead_code / RUSTFLAGS="$RUSTFLAGS -A dead_code" cargo build. Look into "RUSTFLAGS=-Awarnings" when you are down on your knees:) Edit: I think this plague was introduced by languages like C# and their extensions/helpers, like StyleCop, etc. C#, a wonderful but also horrible language. A mish-mash and disaster that actually needs such documentation helpers and external styling tools, because of oversights of the language developers and the features this abomination lacks. Also "Design Principles", all those fancy hard rules and all this great advice and guidelines that suffocate the programmer like a corset. Test Driven BS. And last but not least: effing AGILE Brainfart ... Not only "should" ... your "friends" will clearly suggest that to you! ... the professional programmer follow all those rules and "ideas", but also the hobby programmer, because it is better to learn it the "right way", yeah ... surely it is. Maybe you should question your friendship with those plonkers?!:) Everything I addressed has its use and can hopefully be switched off.
Nonetheless, every single piece of this modern infrastructure together forms not only a great toolbox that may help you and give you good advice, but also a tool that easily allows suppressing individuality, your own thoughts and, worse, critical thinking. It also may have negative effects on creativity, deep knowledge or deepening your knowledge (WhyTF shudd I leearn orthUgraphI, whenn I has audo-corect? LOL) and is always biased in favor of the mainstream opinion and established values (especially A.I. BS, which is by design a horror example in that respect). Oh and all of those nice anti-democratic, anti-freedom, anti-artist, anti-human features are ENABLED BY DEFAULT! (Ask your management if he can wipe his own a... **g** )
Odin does use LLVM, but I think they're planning on getting rid of it. Ginger Bill has stated in interviews that LLVM is the bottleneck of Odin's compile times.
I have no idea, but having developed debuggers, I'm guessing @start either means that the function prologue is done or that it starts there. Also you're not supposed to write IR directly. You're supposed to translate your higher level constructs, into IR. The QBE IR looks exactly like LLVM IR (also looks a bit like wasm if i'm not mistaken, which also is just another form of IR).
Would have needed this 15 years ago, when I had to use C as a target output. How does it handle stack unwinding, multiple results, stack switching and all these things that are not possible in C? Everything is easy when you have an add function. But C itself is limited and unfortunately had more influence on language design than necessary. In the good old SPARC register file days you had up to 7 return variables built into hardware. Also, how does this work with incremental compilation? Recompiling only when offset values change? Okay, this one you can add with comments and string compares as I did, but what about linkers? Unfortunately LLVM has not been language independent for a decade or more; it's totally focused on creating a C++ compiler (or something like this). Linking was the main problem I had with my language. Like Jonathan Blow, I was writing a system that should build a one-million-line program in under a second (incremental) and five seconds (full compile). Got the compiler that way but failed terribly on the linking.
At first I was like "wow, this dude is making his own garbage collector kind of, fucking JSON in the code, efficient debugger almost like Visual Studio, lists etc., as if he's writing some C#, that's fucking cool", then the last 30 mins reminded me why you shouldn't use C 🤣
fn is bad because it's too short. f sub n is definitely a name you could encounter while translating math into code, and "f_n" doesn't look very appealing. Of course unless it's not reserved despite being a keyword, but that's even worse.
Around 1:49:00. If you do not like LSPs (Language Server Protocol implementations, Immediate Error/Syntax Helpers), or syntax-highlighters, tags, clang-format and all of that fuss ... I have a REVOLUTIONARY NEW IDEA here for you: SWITCH THEM OFF! (Instead of moaning and whining around **g** ) I know, I am the world famous inventor of off-switches (ROTFL) and such a concept lies in my nature. But I shared that idea with you all. Now you know!:))))) No offense, Tsoding! I'm just kidding you:P
IMO all lexers suck. Since discovering PEG / packrat parsers that have inlined lexing, I've never written separate lexing / parsing passes ... might be a fun thing to consider in the future.
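To show what "inlined lexing" means in practice, here is a rough C sketch of a scannerless, PEG-style recursive-descent evaluator: whitespace and digits are consumed right inside the grammar rules, with no separate token stream. It is not a packrat parser (there is no memoization), and every name in it is made up for the example.

```c
#include <assert.h>
#include <ctype.h>

/* Grammar, PEG style (whitespace is skipped inside the rules themselves):
 *   Expr   <- Term   (('+' / '-') Term)*
 *   Term   <- Factor (('*' / '/') Factor)*
 *   Factor <- '(' Expr ')' / [0-9]+
 */
typedef struct { const char *p; int ok; } Parser;

static long parse_expr(Parser *ps);

static void skip_ws(Parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

/* Consumes c (after optional whitespace) and returns 1, or returns 0. */
static int match_char(Parser *ps, char c)
{
    skip_ws(ps);
    if (*ps->p != c)
        return 0;
    ps->p++;
    return 1;
}

static long parse_factor(Parser *ps)
{
    if (match_char(ps, '(')) {
        long v = parse_expr(ps);
        if (!match_char(ps, ')'))
            ps->ok = 0;                    /* missing closing parenthesis */
        return v;
    }
    if (!isdigit((unsigned char)*ps->p)) { /* match_char already skipped ws */
        ps->ok = 0;
        return 0;
    }
    long v = 0;
    while (isdigit((unsigned char)*ps->p))
        v = v * 10 + (*ps->p++ - '0');
    return v;
}

static long parse_term(Parser *ps)
{
    long v = parse_factor(ps);
    for (;;) {
        if (match_char(ps, '*')) {
            v *= parse_factor(ps);
        } else if (match_char(ps, '/')) {
            long d = parse_factor(ps);
            if (d == 0) { ps->ok = 0; return 0; }  /* also guards div by 0 */
            v /= d;
        } else {
            return v;
        }
    }
}

static long parse_expr(Parser *ps)
{
    long v = parse_term(ps);
    for (;;) {
        if (match_char(ps, '+'))      v += parse_term(ps);
        else if (match_char(ps, '-')) v -= parse_term(ps);
        else                          return v;
    }
}

/* Returns 1 and stores the value if the whole input is a valid expression. */
static int eval(const char *src, long *result)
{
    Parser ps = { src, 1 };
    long v = parse_expr(&ps);
    skip_ws(&ps);
    if (!ps.ok || *ps.p != '\0')
        return 0;
    *result = v;
    return 1;
}
```

The trade-off is that whitespace and comment handling ends up sprinkled through the rules (here it is hidden in `match_char`), which is fine for a small grammar but is also the usual argument for keeping a separate lexer.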
Please don't create any more programming languages, reading the documentation about QBE I see that it seems to be a mix of C and Assembly. The programming languages of the future are natural languages.
Odin is not *making* the backend. Cuik and Tilde already existed and Odin just wants to target them besides LLVM. Which is a good move, as over 90% of Odin's compile times are due to LLVM.