I discovered this man after he died, and I feel such regret for his passing, even though I never knew him. When he speaks about computers, things just make sense.
:-( I met him once at NDC, very nice guy. When he told me what he did, I was like, ahhh..... and was reminded of the Hanselman joke about what you say to the creator of X: "your liege"
I love the way my grandfather explained things when I was small. Listening to Joe here was very reminiscent of that. It was like getting a good explanation from my grandfather, if my grandfather were a brilliant technologist. I think the voices of some of the experienced elders in the development world are something us younglings are severely lacking in these days of Brogramming. Love this.
Joe Armstrong is sheer brilliance here. Outstanding. Even two years after the event his words still ring true - if not more. There really is a LOT of work for the new generation coders out there.
Joe says around 7:50 that if you don't know why your program doesn't work, you simply ask Google. Not sure if you guys feel the same as me, but recently it is harder and harder to find a proper solution via Google search.
I am keeping a pdp11/93 (sounds big, but is not) running. It was built around 1990 I think and has 4 MB of RAM. This thing hosts a website and supports a dozen people using it _simultaneously_ without noticeable slowdown. Apart from graphics, it can do almost everything a modern computer can do. You can write texts and documents, do calculations, create programs in various different languages, and of course people can chat and send each other files. When I look at a modern computer, I can see a lot of progress, but this progress seems to have been made in the wrong areas. Some common tasks you would do with a computer actually take longer on contemporary computers than on the old beast.
@@crides0 True but not always. It's pretty obvious when changing the quality in a game from max to lowest doesn't fix lag, because the CPU's struggling to keep up with the inefficient code that's holding everything up.
@@crides0 Idk, there's not much to change about graphics on desktop devices. 1080p@60hz is totally fine for everything but gaming or watching movies, and here I am running my system with more eye candy than a stock install of either Windows or macOS has, and yet the difference between eye candy turned on and eye candy turned off is literally not noticeable (not even when looking at the resource consumption). I believe that as time progresses, we will instead find other ways of destroying all the advancements we make in hardware, such as shipping a desktop application not as an executable but rather as an entire fucking browser coupled together with a second JS runtime, so that we can then show a web page which is in fact only there for us to change some HTML elements around using a ton of JS (aka Electron, in case anyone wonders).
I find that I do write really big comments - when I am thinking through the problem. Then they quickly become irrelevant, or even misleading, as you refine the approach, refactor, and generally do things differently. To keep the comments in alignment with the code doubles your effort.
I find that if you find yourself "needing" to write a comment to explain what the code does, then you haven't written the code in the best way and need to rethink it.
Armstrong has done well in this lecture. It is a pity he passed away so soon, before he could see any real progress in fixing the mess we are in. But I do want to stress what he said about comments. It is so important to write them. I have long been of the unpopular opinion that anyone who does not take advantage of Javadoc (or the equivalent in your language of choice) to document all public classes and methods should not be allowed to even check in the code. Personally, I often write doc blocks for private and package-access classes/methods as well, when it is clearly necessary because the doc blocks on public members alone do not tell a coherent story. But even if you *do* religiously document them all, it is all too easy to fall short, explaining the obvious while leaving out what is really important -- Armstrong gave a funny extreme example, "now here's the tricky bit". As for specifications, the first thing that came to my mind is all those allegedly 'agile' methodologies that seduce both management and programmers with the siren song of less documentation. These methodologies are to blame for many people using so little documentation that their 'specification' is woefully incomplete and even more woefully out of date.
I used to work on commercial code where this was spelled /* If you have to understand this part then you've already lost */ It was pretty much correct.
Very astute 'Seven deadly sins' slide, I find. Number 7, 'Code that you wrote without understanding the problem', is very common, myself included: always in a rush, and later, during a walk or something, thinking back to a block of code or its logic, a more complete/elegant solution occurs to me.
I think the real issue he had was not duplication of raw data, but rather duplication of data with similar meaning (as duplication of raw data is trivial to avoid with a simple comparison). And that's where the difficulty of the task is revealed: the only compression algorithm capable of the task is a human mind, or a general artificial intelligence capable of extracting the meaning. It must even be specified from what point of view similarity will be evaluated (can two pieces of text that say the same thing but in different languages be combined? what about two stories that are different but convey the same meaning through a metaphor? two computer programs that compute the same result, but one with horrible CPU use and the other with horrible memory use?) And that is how the problem is managed today. Humans (as the only compressors up to the task) gather meaning and compress it into products of ever-increasing sophistication. So while his algorithm is correct, it's not so useful, because in the end it's just a re-expression of the problem. Thinking about compression in relation to the task of programming is useful though (it's what we do when creating those abstract models, instead of directly mapping every input to the desired output in a huge lookup table). But even there, the human does the compression. We reduce the entropy of the program by releasing some into the environment. When we get some sort of general AI, computers can do that for us.
I wouldn't be surprised if by that point the AI is already so sick of us asking it to create petabytes of bad renditions of hands that it'll be baffled by the idea of humans trying to reduce bloat.
Yeah his point is a bit silly because it's an absurdly complex problem. It's the biggest problem that social beings have been working on since the inception of consciousness. How do you get two people to agree with each other? Well first they need to identify all the information they share but represent differently. Failures at this task are the cause of all disagreement. His proposition is akin to advocating world peace. I'm with him 100% and I love his perspective, but it's not clear that there's anything actionable here.
"Similar meaning" in human language may be impossibly hard, but not in programming. In Unison, for example, the names of functions and variables are discarded, and only their hashes used instead, which means little is left besides raw meaning/uniqueness. Then they can be compared in the usual/cheap way, data equivalence. Yet they are effectively comparing what in other languages would be "similar" things...
We were very good at making things compact and efficient, but then we got this ever-increasing level of power, computation and storage, and we stopped caring. Plus, we've put ourselves in a situation where we are punished by being more efficient; if you optimize away the work you do, you remove the need for yourself. This is a fundamental problem for humanity, no less. We have to overcome this, sooner or later, or we are doomed.
As a thought - the way things evolve in the programming/computer world is in a way very similar to the way the natural world evolves - which then means there isn't very much we can do about it...
Regarding efficiency, if I wait 10 years for it to get 1,000 times faster, I'll also be processing 1,000 times more data and rendering it to 1,000,000 times more pixels.
totally agree. things get faster and things just get more complex and bloated. tech isn't really saving anyone time now days. it's wasting a lot of time.
Amazing talk - really funny, and at the same time it presents a very unique and interesting side of programming. As weird as the idea of no names is, it sounds quite interesting.
I love his file (program/data, hash, "similarity hash", interval_tree, pairs_of) Least Compression Difference approach. :) I've thought before about compression the "stupid" way, using Minimum Description Length and trying a lot of programs in a given algorithm language for the output most similar to that file. While the cost _could_ be prohibitive, the advantage is enormous: you get a result that looks a bit like a black box, but isn't entirely, and _is_ a really good minimal representation of the _meaning_ of the thing.
I actually used his compression idea (to determine the "universal similarity" between two strings) in a side project. There's something there, although the actual check (using a given compression algorithm) would be abysmally slow without some form of optimization (which he suggests).
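The check described here is essentially the Normalized Compression Distance: if compressing two things together barely beats compressing them separately, they share little information. A minimal sketch using zlib as the (admittedly weak) stand-in compressor:

```python
import zlib

def ncd(a: bytes, b: bytes) -> float:
    # Normalized Compression Distance: near 0 for very similar inputs,
    # approaching 1 for unrelated inputs. Any real compressor works;
    # zlib is used here only because it is in the standard library.
    ca = len(zlib.compress(a))
    cb = len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)
```

As the comment above notes, doing this pairwise over a large corpus is abysmally slow, which is why some pre-filtering (hashes, similarity sketches) is needed in practice.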
It's an intractable problem. In order to actually reduce the level of complexity (rather than merely slow the rate at which it is growing), you would have to build a condenser machine powerful enough to reduce complexity at a rate faster than it is currently growing. Let's say I were somehow able to build such a monstrously powerful machine. It would require such a huge amount of energy and resources to run as to be ruinously expensive, and the only way I could afford to run it is by selling some of its mammoth computing capacity to others. Which would cause an increase in the rate of growth of complexity. So now I have to build an even more powerful machine (or increase the capacity of the existing one) to keep up with the increase in the rate of complexity growth that I've just caused by running my machine. Which would again cause an increase in the rate of complexity growth. The second law of thermodynamics states that the energy cost to increase the entropy of a system will always be smaller than the energy cost to reduce the entropy of the same system. So no matter how big a machine I make, I will never be able to keep up with the rate of increase in entropy caused by the existence of the very machine I built to solve the problem in the first place.
OK, so how does having fewer files, addressed by hashes, help with the overall complexity? It's difficult because the ecosystem is so diverse, not because people have copies of the same file (on different machines and in different directories).
Can some abstractions be baked out at compile time for efficiency? His Entropy Reverser is a classification of code modules: a challenge librarians have been working on for millennia with the printed or electronic word.
If I have a textfile and a file that I made by printing the textfile and then scanning the printed file, I have two very different files, and there is no way I can find that they are similar, although they contain the same information.
The optimal way to find similar files by 'edit distance' was (sadly?) discovered 40 years ago already: news.mit.edu/2015/algorithm-genome-best-possible-0610
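For reference, that classic quadratic algorithm (Wagner-Fischer), which the linked article argues is likely the best possible, fits in a few lines:

```python
def edit_distance(a: str, b: str) -> int:
    # Classic Wagner-Fischer dynamic program: O(len(a) * len(b)) time,
    # keeping only one previous row so memory stays O(len(b)).
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # delete from a
                           cur[j - 1] + 1,                # insert into a
                           prev[j - 1] + (ca != cb)))     # substitute/match
        prev = cur
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3, the textbook case.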
I need an ELI5 on what he's really trying to say here. I don't understand the call to action. Just by having a condenser we will have a solution to all these languages and levels of abstraction? Isn't that just how we compile to machine code (or, in the future, WebAssembly)? I don't see how that will make our programming any easier; if anything, it will allow more languages to proliferate.
That's a fair criticism of the condenser proposition as posed, but I think WebAssembly is a fair comparison to what he meant. It doesn't necessarily and inherently promote the proliferation of new languages, because creating compilers into WebAssembly for all of them has a cost.
Nice, so the future computer geeks will have hadron colliders under their desk :D "Jimmy, time for bed!" , "Yea, mom! Just waiting for the black hole to appear so I can save my data!".
39:15 "Let's start making the condenser" You do know, this guy is not just talking about file storage, compression and indexing? He is outlining a project for practical machine intelligence, an idea processor. This is the most hair-raising part of his talk.
BTW, it was solar powered hardware that served as the implementation of the Matrix in said movie. Humanity, in the movie, crippled it by creating a nuclear winter effect.
40:12 So for the difficult bit of finding similar things within a distributed system, rather than compressing the actual bytes of two things and diffing them, you most certainly want to utilize neural networks. For every piece of content an author cares to make discoverable, they'll want to generate multi-modal embeddings for the most popular neural network(s), and if they want it to continue to be discoverable, they'll want to continually generate new embeddings as new versions of those neural networks are published. Then, whenever you want to find similar things, you just send out requests to the network to find content with embeddings within a certain vector space radius of the original search input, and the radius would grow as more results are needed. I do wonder if the ideal internet would do well to have only ONE neural network to perform these embeddings. It would continually be training itself on new content.
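The "radius search" step described above could be sketched like this, assuming the embeddings already exist; `within_radius` and the toy index here are made up for illustration, not any real system's API:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity of two embedding vectors (assumed non-zero).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def within_radius(query: list[float],
                  index: dict[str, list[float]],
                  radius: float) -> list[str]:
    # Return ids whose embedding lies within the given cosine-distance
    # radius of the query; growing the radius yields more results.
    return [k for k, emb in index.items()
            if 1 - cosine(query, emb) <= radius]
```

Real systems replace the linear scan with an approximate nearest-neighbor index, but the contract is the same: a query vector, a radius (or k), and a set of ids back.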
16:55 But the number of plausible states your machine can be in is not that large. We should quantify the entropy over the distribution of computer states. :P
+The Inconsistent Park Of course it is. You can modify every file on your computer, and I can modify mine. There is no guarantee we haven't. Plausible is not the same as likely, but even likely is no guarantee. We can't guarantee that everyone has the same DLLs. We can't guarantee that the software we are writing today will never run on ARM, or MIPS, or OS X. We have no guarantees in the software we create, even when we give recommended system requirements. We can't even guarantee we are running on actual hardware, or in a VM, or in a VM inside a VM, which is possible using the VMWare solutions. And every byte on your computer's hard drive has the potential to affect your system. When iTunes was eating Windows systems, it could have deleted any combination of files before you turned it off. You might have turned it on and everything seemed to work, but you didn't know it had gotten to ntfs32.dll and deleted the last portion, which might mean that every time you save a file, a random file is now deleted. Software acts on data, and executable files are data. As soon as you install a program at your first Windows boot-up, your system is different from most others. With Windows 8/8.1/10, when you sign in, your computer's settings are changed to those of the last computer you signed into with that account, so you don't even have to install anything. This is true on Linux, OS X, *BSD, Haiku and all other systems as well.
The frustration of using Google and Stack Overflow to get a quick fix, only to find out it's not quite the same thing, is very real, and it has to do with the states of the two underlying computers being somewhat different. One of the most standard troubleshooting techniques is to take an environment that does work and one that does not, and migrate the two ever closer to a common state until the mysterious difference reveals itself in an obvious manner.
NDxTremePro check out nixOS, which determines the system state by deterministic compile and deploy instructions, and hashing of configuration files. I think it is the closest experiment in that direction.
"Who can program for more than 5 minutes without using the internet." Ignoring the fact that I do web development where I need to access resources on the web, I actually could do a lot of things offline that I do online. It is often faster to google for information than to search the locally installed documentation. Kinda ridiculous.
What if we define the 'name' of a file in terms of the (class of) computer programs that have that file as output given a null input? To do this, we need a way of defining equivalence classes of computations modulo language.
I like the idea of hashes, but I think they are absolutely unnatural to humans. Instead, we are deeply relational creatures. Our brains are neural networks - the condensing machines (even time-condensing, not only space) he talks about. And the amount of associations we have with data, values, etc. - conscious and subconscious - is enormous. If we build a neural interface that can track our brain associations and automatically tag data, it will be the ultimate relational database. Distributed human memory, distributed with machines and thus with other humans. In this relational database (which would be a mix of traditional relational databases, graph databases and archivers), a search would be performed by thinking about "that noisy guy in a red cardigan".
This talk is great, but the example of entropy with the dice is just wrong on so many levels. The entropy does not change at all when just looking at the dice. If you chuck them long enough, eventually they will all fall on the 1 face, because he is changing the arrangement of them, but the entropy stays exactly the same. (There is an argument to be made that the entropy actually travels on a sine wave here, but that probably clashes with the exact definition of entropy, so that the laws of thermodynamics keep working.)
It is ideal in theory, but conventional Distributed Hash Table designs (like Kademlia, what IPFS uses) don't scale well in practice. Also IPFS doesn't have a good way to keep things stored indefinitely, nodes have to periodically broadcast that they have some piece of data, which strains servers hosting large amounts of data.
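For anyone curious why Kademlia is mentioned here: its routing rests on one trick - node and content IDs live in the same space, and "distance" is just bitwise XOR. A minimal sketch (the tiny one-byte IDs are purely illustrative; real Kademlia uses 160-bit IDs and k-bucket routing tables):

```python
def xor_distance(a: bytes, b: bytes) -> int:
    # Kademlia's metric: the distance between two IDs is their XOR,
    # interpreted as an integer. It is symmetric and unidirectional,
    # which is what makes lookups converge.
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest(nodes: list[bytes], key: bytes, k: int = 2) -> list[bytes]:
    # Each lookup step contacts the k nodes whose IDs are XOR-closest
    # to the target key (a linear scan here; real nodes use k-buckets).
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:k]
```

The scaling complaints above are about what happens around this core: re-broadcasting provider records and keeping routing tables fresh is where the real cost lives.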
you don't need the condenser. just throw away all the stuff and write anew - though not programs this time, but high-level specs. And make systems that know how to execute these specs. Then write the high-level specs language, and re-write your specs in it. Pretty soon you just end up with the Executable English (TM) which is what we all should have been working towards all these years anyway - and nothing else. That's the principle of Ditch the Efficiency taken to its natural conclusion. Example: window systems. There are lots of them, but the concept of a window is the same (or nearly the same). Example: web programming. There should be not one web programmer on the Planet. Not one. You explain - in English - to the System what you want done, and it writes the code for you. Only the designers will be left - all the formalizable stuff should be formalized and dumped on the machines to do it.
Atoms in the universe have 4 quarks, each with its own spin, without hard limits, and they influence each other. I think saying everything has 10^26 states is very much an understatement. But reducing the entropy of computation data is a must for our technologies. Seeing this comes from 8 years ago, the only successes I see are NoSQL and IPFS. We still haven't solved the "find similar things with less entropy" problem yet. But least compression size sounds the most promising. How about least compiled compression size? Well, that wouldn't work with scripting languages and dynamic addresses. So the problem is to figure out how to calculate computation flow and assign an identity index, so that the same algorithm gets the same index, and similar algorithms get index lists that tell us which ideas they borrowed from. Sounds like git forks. However, automatically identifying an algorithm would take some intuition, so AI reducers might be the best way to approach this problem.
22:30 In fact, I spend a lot of time programming on the train or bus and I totally feel like that. I can do just fine when it's low-level stuff and I've got access to the man pages and source code, but programming an Android app is completely impossible. I just learned to avoid Java and anything related to it when I'm on the train.
I figured he was going to talk about how because of the internet and the introduction of bad actors, we've had to make systems more flexible, more safe and thus bigger. Got maths, physics and quantum theory instead. 10/10, would math again. c10789a5cac389c63e67622892c0e5ac1401d493 - title (The Mess We're In) 074675bb7350d5077da234919568bcebd3c5ae83 - full title ("The Mess We're In" by Joe Armstrong ) How can I not find anything, based on the hashes of the title? SHAME!
@7:48 Things like neural networks require different ways to understand them. Saying 'we don't understand why it does/doesn't work' is acceptable when dealing with neural networks.
"information travels at or less than the speed of light" - i don't claim to be a scientist, but that sounds like a very outdated knowledge. didn't we find out that communication between entangled particles happens at speeds at least 10kx higher that the speed of light?
Yeah, even though entanglement effects "happen" FTL (which might depend on your interpretation, I think MWI sidesteps this issue somehow), you still can't exploit them to _transfer information_ FTL
There’s no way to prove what speed the communication happens at, because in order to confirm the state of both particles has changed, you need to make use of other communication methods, which must necessarily occur at or below the speed of light.
There is this famous story that I've been told numerous times at university about the Ariane 5 rocket crash on first launch. They had multiple guidance computers to prevent losing the rocket to a single failure, but all computers ran exactly the same code. Said code was reused from the Ariane 4 and had worked perfectly many times. When the new rocket accelerated much faster, a sensor value overflowed. The first guidance computer came to the conclusion that the rocket was going too slow and the angle of attack must be lowered dramatically. The other guidance computers came to the same conclusion a few milliseconds later. The rocket broke apart moments later. Just throwing more computers at the problem can only protect from hardware failures, not software bugs.
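The failure mechanism in that story was an unchecked narrowing conversion: a 64-bit floating-point value (the horizontal bias) was converted to a 16-bit signed integer, and Ariane 5's higher velocity pushed it out of range. A toy Python sketch of that kind of silent wraparound (the numbers here are illustrative, not the actual flight values):

```python
def to_int16(x: float) -> int:
    # Simulate an unchecked float-to-int16 conversion: instead of
    # raising on overflow, the value silently wraps around, the way
    # the unprotected conversion in the Ariane code effectively did.
    v = int(x) & 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

# Within Ariane 4's flight envelope the conversion is harmless...
assert to_int16(300.0) == 300
# ...but a value outside the 16-bit range wraps into nonsense,
# which every redundant computer then dutifully computed alike.
assert to_int16(40000.0) == -25536
```

Which is the commenter's point exactly: all the redundant computers ran the same conversion on the same input, so they all produced the same wrong answer.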
@@SolomonUcko And who's gonna pay for that? Also, what if the specification itself is "buggy", as in does not tell whatever it was _intended_ to specify?
The people who care enough about not crashing their rockets will pay for it. Either way they're paying for it; it's just a question of whether the alternative implementations stay in the heads of the guys who write reliably correct code, or in the fuzzed test suite, or get deployed to production as well.
If the specification itself is buggy you have the same problem regardless of how many alternative implementations you put in - your rocket will fail. That's entirely irrelevant/orthogonal to whether or not your reliability goals and budget are best served by one implementation or multiple.
@@alexanderkozhevnikov9087 Actually, I would argue that every specification for a complex product ever is "buggy". Not in the sense that it is necessarily wrong, but at least *incomplete*. If the product should evolve over time (so not for spaceflight but for many other applications) this becomes even worse! Because you need something new but won't build it from scratch, so you change the specification and then the product (Or, if you want to create a mess: You only change the product). So then we should only make one program (or other product) following the specification and then improve upon it. Of course, rockets are some of the only cases where you can't improve later (unlike space probes, which can be and are reprogrammed). So it needs to be correct the first time. But I think more eyes on one product are safer than few eyes on many products.
It seems like he's saying that we need to find a reasonably small number of abstract things that we are capable of understanding, which can in turn be used to recreate all the files we already had, such that they can be maintained by future and parallel generations, and not continue to increase the cost of computing. So… We need absolutely rock solid fundamentals to combat entropy in computing environments? (Which is something we definitely do not have at this time.)
The (management insistence upon the ) C programming language for writing code on 5 MHz computers inflicted great mental damage on this industry :-( Worst. Premature. Optimization. Ever. (or, "Worse is Better", if you prefer)
We would *still* mostly be writing C++, if it were not for the "extinction level event" that was "servers connected to the internet". Not that Java is far enough away from COBOL...
Places and names definitely should be abolished from installation procedures. We need to move to a linking-file-system to find files and to relate modules and libraries to programs.
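A content-addressed (name-free and place-free) store of the kind the talk points toward is simple to sketch. This is a hypothetical in-memory version, where the hash of the bytes is the only "name" a file ever gets:

```python
import hashlib

store: dict[str, bytes] = {}

def put(data: bytes) -> str:
    # Content-addressed storage: the key IS the SHA-256 of the bytes,
    # so identical content is stored once no matter who adds it
    # or from where.
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key

def get(key: str) -> bytes:
    # Lookup needs no path, directory, or machine name - just the hash.
    return store[key]
```

Installation then reduces to linking: a program is a list of hashes, and "where is libfoo?" stops being a question, which is roughly what IPFS and Nix-style stores do at scale.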
RIP Joe, this comment comes too late.... but I don’t believe we could get the state of the external pair prior to the evaporation of that black hole. Causality would be conserved.
I revisit this talk every once in a while because it holds more and more true every year. I love that he actually presents a solution to such a vague unsolvable problem, and with a great comedic timing throughout.
10:55, my father used to do that. He thinks his code is self-explanatory. I'm much more humble about this. I think this is a kind of art, still wild, untamed. Making lots of f()s doesn't necessarily make things easier, because code that could once be read mostly horizontally _(the natural format for our eyes, aka widescreen)_ now becomes vertical. Even worse, now you need the aid of your hands too. This leads to a silent waste of energy, the most underestimated kind. However, code like that does indeed become more self-explanatory. So it's a trade-off. Meanwhile, I'd rather stick to comments, in a mostly single page of technical code. 11:27, for libs or projects I make, I tend to write pages in a text editor, like a tiny book. But this remains outside the code. I should also write blocks of comments inside the code, like in open source libs.