Plain Text - Dylan Beattie - NDC Copenhagen 2022 

NDC Conferences

Software is complicated. Machine learning, microservice architectures, message queues... every few months there's another revolutionary idea to consider, another framework to learn. And underneath so many of these amazing ideas and abstractions is text.
When you work in software, you spend your life working with text. Some of those text files are source code, some are configuration files, some of them are documentation. Editors, revision control systems, programming languages - everything from C# and HTML to Git and VS Code is based on the idea of "plain text files". But... what if I told you there's no such thing? When we say something is a "plain text file", we're relying on a huge number of assumptions - about operating systems, editors, file formats, language, culture, history... and, most of the time, that's OK. But when it goes wrong, "plain text" can lead to some of the weirdest bugs you've ever seen... why is there Chinese in the event logs? Why is the city of Aarhus in the wrong place? And why does Magnus Mårtensson always have trouble getting into the USA? Join Dylan Beattie for a fascinating look into the hidden world of text files - from the history of mechanical teletypes to encodings, collations and code pages. We'll look at some memorable bugs, some golden rules for working with plain text - and we'll even find out the story behind the mysterious phrase "pike matchbox" and what it has to do with driving in Belarus.
Check out more of our featured speakers and talks at
www.ndcconfere...
ndccopenhagen....

Published: 21 Aug 2024

Comments: 270
@jonnilazzerini9085 2 years ago
I was a little bit skeptical: how can anyone give a one-hour talk speaking just about 'plain text'? But I have to admit: it was simply AMAZING! Well done!!!
@tharfagreinir 1 year ago
Dylan Beattie can make pretty much anything interesting. I think he likes to challenge himself that way.
@hansbaeker9769 11 months ago
Same here. I was expecting to go to something else within a minute or two, but stayed for the whole thing.
@crax83 6 months ago
@tharfagreinir His Art of Code talk is one of my all-time favorite talks. This one is also way up there in the top 5 or so.
@f.d.3289 10 months ago
23:30 That is the most beautiful thing about human beings that I've heard in a long, long while. God bless that postman who really cared for his job and even was smart enough to figure out that problem. This will make me happy the rest of the day :D
@NicholasShanks 2 years ago
At the risk of being one of those YouTube comments shown in your next talk, the diacritic you discuss at 29:18 is a diaeresis, not an umlaut. They look the same and are encoded with the same codepoints, but are pronounced differently. An umlaut changes the quality of the vowel, and can appear on lone vowels in any language that uses them. A diaeresis tells readers that the second of two vowels is not to be read as a diphthong, but a separate vowel. That's why English has one on, for example, naïve (nigh-eve, not knave). Coöperation is co + op not co͞op.
@jkollin4875683F 2 years ago
Something Nordic readers of Tolkien would do well to be aware of -- I'm referring to Eärendil etc.
@EricChipko 1 year ago
Well done. I am not sufficiently educated to know if you are right, but the criticism is concise and I recognize the words if not what they mean.
@stevecarter8810 1 year ago
Saved me posting the same, but having to look up all the terms to double check myself. Thanks!
@TonyCoyle 1 year ago
and that specific diaeresis is called a trema in almost every other language that uses it...
@Shack263 1 year ago
Also, the umlaut is used in German and was derived from roundabout there (idk the history too well) whereas the diaeresis or trema evolved independently and is notably used in French to mark vowels that may usually be silent, but should be pronounced. This is similar to its use in coöperation, to basically say that the second o is pronounced distinctly. The two symbols were developed independently.
@jandorniak6473 1 year ago
Since Dylan does read comments, here's one of my favorite examples, in Polish: "Zrób mi łaskę" means do me a favor. Most of the characters can be turned to their ASCII lookalikes without any issue whatsoever. Except one. "Zrób mi laskę" is asking for a specific sexual act. Just turning ł into l changes the entire meaning of the whole sentence.
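The ł problem above is easy to reproduce. Here is a minimal Python sketch, assuming the common "decompose, then drop combining marks" approach to ASCII folding; the helper name `strip_combining_marks` is made up for illustration. The ó and ę lose their accents, but ł has no Unicode decomposition, so it survives unless someone explicitly maps it to l, which is exactly the step that changes the meaning.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Decompose to NFD, then drop every combining mark (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# ó -> o + U+0301 and ę -> e + U+0328, so both accents are removed.
# ł (U+0142) has no decomposition and is left untouched.
print(strip_combining_marks("Zrób mi łaskę"))  # → Zrob mi łaske
```

Any folding that goes further than this needs a hand-written table (ł to l, ø to o, and so on), and each entry in that table is a decision about meaning, not just about bytes.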
@merthyr1831 1 year ago
This ASCII issue is also a cause of cultural tension in (Republic of) Ireland and (Northern) Ireland, where birth registrations at some hospitals are refused or incorrectly assigned when a child's parents opt to use a Gaelic name, which often includes a bunch of non-ASCII chars. Hospital software is usually pretty archaic and predates a lot of the elegance of UTF. Also: amazing talk. Funny and interesting the whole way through. Dylan Beattie is a legend!
@szymonbaranowski8184 1 year ago
it's like Slavic names in Germany
@malcolmhutchison 1 year ago
One of my favourite sorting rules is that for Scottish surnames "Mac" and "Mc", both with and without following space, are considered the same letter that comes after L but before M
@EvincarOfAutumn 11 months ago
There’s a similar quirk with English genealogical documents, such as old church birth registers and ships’ passenger lists. They’ll often use abbreviations of common personal names (and even some surnames) to save space, and when these are sorted (whether in the text itself, or later on by a computer), it may be according to what the abbreviation stands for, not the letters themselves. So you have to just know, for example, that “Hy.” might appear before “Herb.”, because “Henry” comes before “Herbert”. Moreover, some of the abbreviations are based on a Latin and/or Greek transliteration of the name, such as “Iabus” = “Iacobus” = “Jacob” or “Xpr” = “Christopher”.
@paulwesley3862 10 months ago
@EvincarOfAutumn Interesting! Just wondering why Jacob was abbreviated with another 5-letter word? 🤔
@altreusplays 10 months ago
I’ve also noticed it’s a free-for-all on whether the word “the” is ignored when sorting lists of names. Steam doesn’t ignore it, for example, and I think Google Play Music used to but YouTube Music doesn’t. But to me, it’s correct to ignore it and incorrect not to!
@EvincarOfAutumn 10 months ago
@paulwesley3862 In that case, the person’s name in everyday life would’ve been Jacob, but if the church records are (partially) in Latin, it’s the Latin form that’s abbreviated. I think just “Iab.” is attested as well, though I’m not sure.
@nsulikow 2 years ago
This is one of the best presentations I've seen in a long time. Amazing content!
@chascuk 1 year ago
The 7-bit encoding for SMS messages in GSM is the same as ASCII for most characters, but many of the control characters have been replaced with text characters that were missing from ASCII. In particular it does not have NUL: 0 encodes the '@' character. So, as one of my colleagues at Ericsson found out the hard way, you cannot use NUL-terminated C strings to process SMS messages.
@UliTroyo 1 year ago
Interesting!
@flammungous3068 10 months ago
This video also explained to me why an SMS gets converted to MMS if you just put in a few emojis. Because the emojis take so many bytes.
@Architector_4 9 months ago
wait, what about ASCII 0x40? Isn't that an @?
@chascuk 9 months ago
@Architector_4 In GSM 7-bit encoding 0x40 is the inverted exclamation mark, one of the characters missing from ASCII. No idea why they didn't use 0 for this and keep @ where it was.
@Architector_4 9 months ago
@chascuk ...huh. That's fun, thank you lol
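To make the NUL problem from this thread concrete, here is a small Python sketch. The table is a hand-written excerpt of the GSM 7-bit default alphabet (only four entries), the septets are stored one per byte rather than packed as they would be on the wire, and the helper names are hypothetical. `c_string_view` imitates what a NUL-terminated C string routine would see.

```python
# Hand-written excerpt of the GSM 7-bit default alphabet; only the entries
# needed for this demo. 0x00 is '@' and 0x40 is the inverted exclamation mark.
GSM7_EXCERPT = {0x00: "@", 0x40: "¡", 0x48: "H", 0x69: "i"}


def decode_gsm7(septets: bytes) -> str:
    """Decode unpacked septets (one per byte) using the excerpt table."""
    return "".join(GSM7_EXCERPT[s] for s in septets)


def c_string_view(buffer: bytes) -> bytes:
    """Return the part of the buffer a NUL-terminated string routine would see."""
    return buffer.split(b"\x00", 1)[0]


message = bytes([0x00, 0x48, 0x69])         # "@Hi" in GSM 7-bit
print(decode_gsm7(message))                 # → @Hi
print(decode_gsm7(c_string_view(message)))  # prints an empty line: text stops at the '@'
```

A length-prefixed buffer avoids the problem, because the length no longer depends on any particular value being absent from the data.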
@notthedroidsyourelookingfo4026 2 years ago
Recently, a student of mine opened a text file and it was all Chinese gibberish. I remembered your talk and switched the encoding from UTF-8 to UTF-16 or vice versa, and there was a readable file again :)
@FlameRat_YehLon 1 year ago
Meanwhile, in areas where people actually use Chinese, well, time to try all the encodings.
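The "Chinese gibberish" effect is easy to reproduce. In this Python sketch (function names are illustrative, not from the talk), 8-bit text is misread as little-endian UTF-16, so every pair of ASCII bytes becomes one 16-bit code unit that lands in the CJK block. The sample string is the well-known Notepad one mentioned elsewhere in these comments.

```python
def misread_as_utf16(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as UTF-16 little-endian."""
    return text.encode("utf-8").decode("utf-16-le")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the original bytes and decode them as UTF-8."""
    return garbled.encode("utf-16-le").decode("utf-8")


original = "Bush hid the facts"         # 18 ASCII bytes, so 9 UTF-16 code units
garbled = misread_as_utf16(original)

print(garbled)                          # nine CJK ideographs
print(all(0x4E00 <= ord(ch) <= 0x9FFF for ch in garbled))  # → True
print(repair(garbled))                  # → Bush hid the facts
```

The repair only works because no bytes were lost along the way. An odd number of bytes, or an editor that saved the garbled text after substituting replacement characters, makes the round trip impossible.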
@HasanSIM14 2 years ago
Watching this for the second time (I watched the video referenced several times in this talk). Absolutely brilliant and I learned a lot
@NicolasChanCSY 2 years ago
44:14 Glad that my comment in the previous talk video was found helpful :)
@drullo 1 year ago
Absolutely one of the best presentations that I've seen and it was a total shock. I watched this because I'm a geek and I like Dylan Beattie. I never expected it to be this awesome!
@JeremyAndersonBoise 2 years ago
The YouTube comment near the beginning of this updated version of his previous presentation illustrates the point of the talk powerfully. Dylan is always amazing, but this talk from him is perhaps uniquely important to everyone in the field! From 1st year associates to the most seasoned senior architect, plain text is always less than plain.
@fabioluizalvaresosti7115 1 year ago
Plain text but the 'l' is silent
@jeberle1 1 year ago
Very good talk. Regarding ASCII and punchcards, it's unlikely they would ever meet in the first place. You do course correct a bit w/r/t the DEL character, but punch cards were originally in 6-bit BCDIC (binary-coded decimal interchange code). This was extended to 8-bits to become "Extended" BCDIC, or EBCDIC. The layout of the character set aligned w/ the rows of the punchcard, such that all alphabetic chars were x1 - x9, so in late variants 'A' is 0x11 and 'Z' is 0x39. To get 3 rows of 9 columns to line up, there's a "/" at the start of the last row, 0x31. Interestingly, ASCII was created by Bob Bemer at IBM to solve interop problems between the BCDICs. However, IBM was in so deep w/ their card-based (E)BCDIC, they couldn't use it in any of their operating systems. Note also, EBCDIC is still very much in use. Finally, Multics did not influence Unix, except to serve as a counter-example of design principles.
@edgeeffect 1 year ago
I've always wondered how come EBCDIC was "extended", thanks for that.
@f.d.3289 10 months ago
I have been a software developer for 20 years and it's only in the last 5 years that I began to realize the actual complexities of good old plain text. Once I realized how complex this issue actually is, I began to wonder why many of the systems I had worked on even WORKED. It's not something they talk about at university or anywhere, so it was nice to see this gets so many views. I haven't watched it yet but I'm sure it will open many people's eyes.
@braveatnight 2 years ago
Yay I love this guy, I binged all his talks like a month ago
@JeremyAndersonBoise 2 years ago
You have impeccable taste, bravo!
@MrIkariaman 1 year ago
Also, for future talks you may find the "Greeklish" system interesting: en.wikipedia.org/wiki/Greeklish Basically before Greek language was fully supported, Greek people interacting with electronics came up with mappings between ASCII and Greek. These mappings were unofficial and there are several variations. Even after UTF-8 was implemented and got more and more adoption, lots of young people still utilized Greeklish in SMSs to send messages to each other because you'd get charged by the number of bytes you used (in groups of bytes) and not by the actual number of characters used. This is also an issue in a lot of fields that have a byte limit instead of a character limit. On a parallel note... If you do a bit of time travel, and go to Greek villages in Anatolia during the time of the Ottoman Empire, you'll find the Greek alphabet being used to write Turkish text: en.wikipedia.org/wiki/Karamanli_Turkish
@deus_ex_machina_ 11 months ago
That sounds similar to what many Arabic speakers use, numbers in place of characters.
@heinzk023 1 year ago
In the days of 7-bit ASCII, there were lots of workarounds in non-English speaking countries. For example, in order to be able to print umlauts, printers had special character sets that had umlauts where normally the characters {, [, ], }, \ and | were, because nobody needed them when writing a letter. However, if a C or C++ programmer used such a printer, his code would look quite funny. In part that's the reason why some languages have special replacements for these characters, called digraphs and trigraphs. This all sounds like multiple layers of duct tape put on top of one another, but it kind of worked.
@ayle1312 1 year ago
30:00 ij is a Dutch letter, not a typesetter's ligature! It's in the extra block at 19:50 left of Ö. Most fonts don't support it and ASCII led to it being written as 2 letters (i and j) because it was the only non-ASCII letter in Dutch, but all Dutch typewriters before PCs were popularized had a dedicated key for it. Fonts that turn it into a ligature often run into problems with words like minijack, Beijing and bijoux. It used to have the same problem as å, with some people turning it into a Y (most famously Cruijff) until it got standardized as I+J.
@filker0 1 year ago
I spent a fair part of my career designing and implementing serial terminals and emulators of the same. For terminals from DEC starting with the VT100 (and other "ANSI" terminals), there was something called "code extension", along with character set designators, graphic sets, and shifts (both locking and single) that were used to mix text from multiple character sets on one screen/page using either 7 or 8 bits per character. This was fine on terminals and printers that had the same character sets available, but caused a lot of grief when a device receiving the text didn't support all of the character sets used. Also, very few editors at the time could handle storing such text. It was a mess, but at least it was better than what it replaced, which was National Replacement Character Sets (NRCS), where it was 7-bit ASCII with the glyphs for some of the code points replaced. There was no way to tell which NRCS had been selected when the file was created, even with a hex editor.
@henrikholst7490 1 year ago
Fantastic content. This should be general knowledge for everyone who works in IT and development.
@qm3ster 1 year ago
Nothing wrong with writing JavaScript in Ukrainian: 1. It runs fine. 2. In a production build, the minifier will take it all out and replace it with single-character ASCII names. 3. Source maps will work fine.
@vincentvega7908 1 year ago
The reason why you get smiley faces when DOS crashes is not because there is something trying to generate the stop character. The reason is that often it starts executing random garbage or tries to print a message that became random garbage due to memory corruption. In a piece of program data the values 1 and 2 would be quite common if you have some counters that did not fit into your registers, and maybe they encode some common x86 instruction as well. The string terminator in the common OS interface for printing strings was the dollar sign rather than nul on DOS operating system. The dollar sign is much less common than nul and smiley faces in random garbage so you will likely get some smiley faces printed. Note also that 'plain text' is just a binary format (or more precisely a family of binary formats with ASCII, EBCDIC, various code pages, JIS, BIG5, GB 18030, UCS-2, UTF-7, UTF-8, big endian and little endian UTF-16/UTF-32,...) for which there happens to be a lot of editors and viewers. In the end it's all binary bits. One specific property that 'plain text' has over many other binary formats is that it has very little structure and can still be of some use when some bits are flipped or bytes missing as opposed to, say a compressed JPEG image with the caveat that the multibyte encodings are much more fragile.
@zuao76 8 months ago
Now this was incredibly funny, entertaining, intelligent and interesting. I was not expecting this. Incredibly well done. We need more talks like this in IT, and fewer that are so serious and boring. Well done :)
@feisty-trog-12345 1 year ago
43:35 Generally a very solid talk, but the section about UTF-16 was kinda inaccurate. UTF-16 is not actually a fixed-length encoding and you cannot get the number of bytes just from the number of contained characters (e.g. emoji need two UTF-16 code units forming a surrogate pair). The actual reason that so many of these 90s systems use UTF-16 is that this was the time of the fixed-size 16-bit UCS-2 encoding ("65k characters ought to be enough for everyone"), which was later expanded to become UTF-16 when they ran out of code points. To make that work, the range of code points U+D800 to U+DFFF was permanently snapped out of existence, so that UTF-16 could use them to encode higher code points as multi-word sequences. This is also the reason why not every String in C#, Java, or JS is Unicode; these languages allow you to have unpaired surrogates which are not valid UTF-16 (they are not scalar values). See the "History" section of UTF-16 on Wikipedia. And this entire paragraph was even without going into that dreaded word "character". If you take character to mean code point, then doubling the number of characters to get the number of bytes is almost correct (so long as you don't care about anything outside the BMP, aka basically all instant messaging, social media, ...). But as we've seen, one "character" can be made of many, many code points and each of those code points can be multiple code units. And whether a sequence of code points is displayed as one "character" or several depends on the display technology you're ultimately using (wtf is an extended grapheme cluster?). In fact, the Unicode standard doesn't define what a character is. So, ultimately, there is no actual correspondence between the number of "characters" in a string and the number of UTF-16 code units, the concept of a character varies from use to use, and UTF-16 falls short of even the most charitable interpretation of "character = code point".
Additionally, the reason that UTF-8 stops at four bytes is actually because Unicode is a 21-bit scheme. Unicode has made guarantees that it will only ever go up to U+10FFFF and this, again, stems from the fact that they weren't able to squeeze more bits out of UCS-2. In summary, UTF-16 is a weird legacy encoding resulting from expanding UCS-2 to a set of code points it was never meant for. In doing so, UTF-16 has lost a key property of UCS-2 (being a fixed-length encoding for scalars), while only displaying the lack of this property for (until recently) uncommon inputs. It now has both the disadvantages of UTF-8 (variable length) and UTF-32 (wasted space, ASCII incompatibility) while introducing additional drawbacks (byte order confusion, false belief in being fixed-size). Unicode has had to insert multiple hacks just to keep this mess going. UTF-16 is Unicode's original sin. Every emoji broken by a Java developer using "char", every "Bush hid the facts" censored by IsTextUnicode, and every broken API call from mishandling wchar_t is a punishment from the tech gods themselves. In our hubris we believed that there were fewer than 2^16 characters, so now we must suffer forevermore.
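The counts discussed in this comment can be checked with a short Python sketch. Python strings are sequences of code points, so `len()` gives the code point count; `utf16_code_units` is a hypothetical helper that counts 16-bit units by encoding to UTF-16. The flag example is made of two regional indicator code points, which most renderers draw as a single glyph.

```python
def utf16_code_units(text: str) -> int:
    """Count 16-bit code units by encoding to UTF-16 (no BOM) and halving the byte count."""
    return len(text.encode("utf-16-le")) // 2


rocket = "\U0001F680"             # one code point outside the BMP
flag = "\U0001F1E9\U0001F1F0"     # regional indicators D + K, usually drawn as one flag

# label, code points, UTF-16 code units, UTF-8 bytes
for label, text in [("A", "A"), ("rocket", rocket), ("flag", flag)]:
    print(label, len(text), utf16_code_units(text), len(text.encode("utf-8")))
# → A 1 1 1
# → rocket 1 2 4
# → flag 2 4 8

print(rocket.encode("utf-16-be").hex())   # → d83dde80 (a surrogate pair)

# An unpaired surrogate is not a Unicode scalar value, so it cannot be encoded.
try:
    "\ud83d".encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate rejected")
```

None of the three numbers is "the number of characters" a user would see; counting those needs grapheme cluster segmentation, which the Python standard library does not provide.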
@serpent77 1 year ago
Having recently delved into utf8, unicode, etc, I knew a lot of this, but learned a few new things as well, either way it was thoroughly interesting. Well done!
@rojokongen 1 month ago
Loved the talk. Well done, Dylan! 👌
@sauliustb 1 year ago
this is an amazing talk. i already knew some of this, but it still is nice to get a reminder on this stuff :)
@bujin1977 1 year ago
Late to the party, but I enjoyed that. So much so that I started watching at about 1am thinking I'd just catch the intro before I went to sleep to determine if it's something I want to keep watching, and ended up watching over half of it before finally deciding I was too tired. Also I learned something new that will solve an issue with one of my applications, so that was a bonus!
@DerekCroxtonWestphalia 11 months ago
Good talk, I did a lot of research on this about 20 years ago but I always forget. BTW, the two dots in English are a diaeresis, not an umlaut.
@dgsagoskis1851 11 months ago
I love them YT commentators. World would be a much more imperfect place without them. Btw i thought i knew a lot about plaintext, but turns out i knew something about plaintext. Thank you!
@Rx7man 1 year ago
2:57 My favourite part of this is your youtube suggested videos are all ones I've watched!
@microcolonel 1 year ago
UTF-8 is rarely slower to process than UTF-16, and because UTF-16 only has the BMP in a single code unit, you can't rely on that for counting codepoints anyway; furthermore, rarely do you want to count codepoints, you generally want to count graphemes.
@tappy8741 1 year ago
UTF16 generally sucks and was the bane of my existence for many years, thanks for nothing windows as usual.
@Karreth 9 months ago
UTF-16 is actually just another hack to fix UCS-2, which is the fixed 16-bit Universal Coded Character Set. It was intended to contain all the codepoints until we discovered that 16 bits were actually too few bits to contain the set. It really is hacks and partial backwards compatibility all the way down. Windows extended their API to work with wide characters to support UCS-2 before UTF-16 or UTF-8 was a thing, and when UCS-2 died they were kinda screwed and couldn't update their design. So that's how we ended up here.
@CRBarchager 2 years ago
At first glance the headline of this video/presentation seems dull, but it ended up being extremely interesting! Very good video and very informative!
@colinmaharaj 9 months ago
Lovely talk, like going down memory lane. Spent a lot of time dealing with this. From writing xmodem and ymodem, to parsing csv files, converting bin to text, and back.
@SiriusXification 2 years ago
You know, featuring the YouTube comments in the talk only emboldens us.
@fedormalyshkin 2 years ago
It's the funniest IT conference talk I've seen in years!
@etmax1 1 year ago
Well that was another exceptional video from the master. I found that extremely enjoyable and informative. Unsurprisingly I didn't know a lot of the histrionics
@AshtonSnapp 1 year ago
Rewatching this talk proved very useful today. Currently dealing with the lexer for my programming language project failing unit tests on the Windows runner for GitHub actions. Wanna guess why? I’ll give you a hint: newline tokens report their span to be exactly one character later than expected.
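The comment above leaves the cause as a riddle. One plausible reading (an assumption, not something the commenter confirms) is that the Windows checkout converted line endings to CRLF, so a lexer that only looks for "\n" reports each newline one character late. A Python sketch of the difference, with hypothetical function names:

```python
import re

# Treat CRLF as a single newline token; the alternation order matters.
NEWLINE = re.compile(r"\r\n|\r|\n")


def naive_newline_spans(source: str) -> list[tuple[int, int]]:
    """Spans from a lexer that only recognises a bare line feed."""
    return [(i, i + 1) for i, ch in enumerate(source) if ch == "\n"]


def newline_spans(source: str) -> list[tuple[int, int]]:
    """Spans from a lexer that treats LF, CR and CRLF as one newline each."""
    return [m.span() for m in NEWLINE.finditer(source)]


unix_text = "a\nb"
windows_text = "a\r\nb"

print(naive_newline_spans(unix_text))      # → [(1, 2)]
print(naive_newline_spans(windows_text))   # → [(2, 3)]  one character later
print(newline_spans(windows_text))         # → [(1, 3)]
```

Pinning line endings for test fixtures (for example with a `.gitattributes` rule) is the other common way to keep such tests identical across platforms.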
@maximvoloshin7602 2 years ago
You should never underestimate things labeled “simple” or “plane” )) Thanks, Dylan! Appreciate so much everything you’re doing for the community.
@NeatNit 1 year ago
I have never underestimated a plane. Be it a machine that can carry me to the sky, or an infinite flat set of points in 3D space, or a tool used to smooth wooden surfaces, they are always quite intimidating.
@maximvoloshin7602 11 months ago
@NeatNit 🤣🤣 You got the point!
@JonathanPlasse 5 months ago
Thank you for this wonderful talk 🙏
@f.d.3289 10 months ago
Great lecture -- super fun and informative, thanks! And now I'd love to see a follow-up that touches upon those lovely grey areas of A) finding out the encoding of a given "plain" text file, and B) UTF-16 surrogate characters. Especially the latter is quite important, because I'd guess that 95% of all applications using UTF-16 are broken, in the sense of not being able to deal with any text that contains Unicode codepoints which cannot be encoded in a single 16-bit unit of UTF-16.
@user-oc3mi2ct6t 11 months ago
Small comment from a Dane. Aarhus is at the start of the alphabet when spelled with a double aa, at least according to any convention I have seen in use here in Denmark. Even though aa and å represent the same letter, we still keep the alphabetic order distinct. This implies that Aabenraa is first in an alphabetically sorted list of city names in Denmark.
@BradenBest 1 year ago
I'm famous. I vaguely remember the train of thought I had with that WWIII joke. That you posted a meme on twitter that was so funny that it prevented WWIII, and with you erased from existence by time travel shenanigans, that meme never gets posted and thus WWIII happens. I know I can get long winded especially when I talk about technical stuff, which is probably why I put that joke in there at the end. It's like a reward for sitting down and reading all that stuff about base64 and how vim fucks up binary encoding. Also, how dare you say the End Of Transmission character, Ctrl-D, is unimportant. How else would I log out of my Linux terminal in one keystroke?
@deus_ex_machina_ 11 months ago
This popped up at the right time; while messing around with Notepad++ I looked up the purpose of carriage return, line feed, and tricks like *bolding,* underlining, and -strikethrough- with typewriters and teletext. I've since come across resources like Typography for Lawyers that, apart from being an excellent reference for general formatting, advocate the end of shortcuts picked up from typewriters and a return to form for good typefaces and typesetting.
@dmurvihill 1 year ago
I couldn't imagine working at an airline, where I know for sure that names will be scrutinized in every detail, and deciding "eh, I'll just strip diacritics off of everything." Having scanned passports before, there are very well-publicised and clear standards for how to transliterate any Unicode character into that strip at the bottom.
@theelmonk 9 months ago
You're probably not American or English, then, where diacritics are uncommon and used only by foreigners. Yes, if you think about it that's a bit parochial but that shows the difference between programmers working for commercial companies with a certain market and the people who write standards like the one that allowed all those different forms in an email address.
@jalexanderdatkins 11 months ago
28:36 Æ is totally a letter in English. It's called the letter æsc, which sounds like "ash", because it represents the tree ash. And for completeness I should also mention the letter œthel, which sounds like Ethel, the personal name. They appear in obviously English words like encyclopædia, manœuvre and Cat7 UTP Æthernet cable. … Not to mention archæologist. I may have cheated a little bit with one of mine, but why doesn't that count?
@theelmonk 9 months ago
Laughed at Cat7 UTP Æthernet cable. And realised it's perfectly correct.
@jalexanderdatkins 9 months ago
It’s obviously an English word, right? And everyone knows that’s a valid spelling for it. The cheaty one is manœuvre, because that’s a French word. But I don’t get why he doesn’t count archæologist? Maybe it’s in the same way as because Latin only has the letter K in one word, it’s not considered part of the Roman alphabet. And to be fair, Æsh and Œthel don’t come up very often. Œstrogen is another one, but that’s basically a Latin word. I don’t know any non-borrowed words containing œ that are still in modern English. Unlike æther.
@Kitulous 4 months ago
that was a very interesting watch, thank you!
@jkollin4875683F 2 years ago
On alphabetical ordering in Finnish... back when I was in school in the 1990s, I was taught that V and W actually are considered equal in Finnish. So going through a list of Finnish surnames, Valli, Waris, Virtanen, Wirtanen (tiebreaker here, I suppose) would be in correct order. But having googled this a bit more, this is apparently nowadays (since 2000) somehow dependent on context -- mixed with foreign words and names such as Vanderbilt and Wolf, it's OK to sort them all V first, then W. So I don't know if even printed dictionaries use this sorting today. I don't think this peculiarity is even well-known, IIRC this surprised many of my Finnish coworkers.
@cameron7374 1 year ago
So, do computers ever deal with this or do they just sort V first, then W?
@jkollin4875683F 1 year ago
@cameron7374 Never noticed a system that would (probably in part because W is in Finnish only in names (outside of possibly loanwords), and even there it is very rare). But after a quick googling, apparently at least in 2006 PostgreSQL allowed for this at least in Swedish.
@pepijnkrijnsen4 1 year ago
36:09 I see this a lot in the large German company I work for, specifically this example of having to select a country from a dropdown list. The countries' English names are displayed, but ordered as if they're German names.
@SerrinTheElf 1 year ago
That postal worker deserved a raise lol.
@BenjaminAster 1 year ago
Mistake at 50:23: the rocket emoji is U+1F680, not U+1F680D
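This correction can be sanity-checked without looking anything up: Unicode code points stop at U+10FFFF, so a seven-digit value such as 1F680D cannot be one. A small Python sketch using the standard `unicodedata` module; `describe` is a hypothetical helper.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode will ever assign


def describe(code_point: int) -> str:
    """Return the character name, or a note if the value is not a code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        return "out of range"
    return unicodedata.name(chr(code_point), "<unnamed>")


print(describe(0x1F680))    # → ROCKET
print(describe(0x1F680D))   # → out of range
```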
@GuildOfCalamity 1 year ago
Great presentation! I code systems that use control codes all the time for work; they are still widely used and accepted (receipt printers, barcode scanners, serial comms, etc).
@heinzk023 1 year ago
When I was working with ASCII terminals, I liked to use BEL to sound the squeaky buzzer of the terminal.
@Carewolf 1 year ago
Emoji existed in the West long before iPhones did. They came to us with things like instant messaging platforms: ICQ, MSN Messenger, even Facebook.
@nneddenn6207 11 months ago
Dylan, thanks for the talk! It was really interesting to hear all these historical details and understand more about how Unicode works. And my gratitude for your support of Ukraine! Glory to Ukraine!
@emmafountain2059 11 months ago
God, I have homework but now I have an irresistible urge to research Unicode because this was fascinating. It's amazing how clever some of their solutions are.
@sportundwein 2 years ago
Amazing content - mega cool Präsentation 🈶
@JeremyAndersonBoise 2 years ago
I see what you did there.
@edgeeffect 1 year ago
@JeremyAndersonBoise I was going to comment "I see what you did there".... but then I saw what YOU did THERE.... so couldn't.
@rustkitty 9 months ago
53:42 According to Apple, Dylan was in Denmark. According to Microsoft, he supports Donkey Kong. Both very respectable!
@TooLazyToFail 10 months ago
This was a really fun talk, and very well-delivered.
@akirachisaka9997 1 year ago
I really wish Dylan would talk about Han Unification. Like, it's just such a cursed aspect of Unicode. I really wish more people knew about it.
@nikneumann1752 9 months ago
I thought it was boring, but surprise! I watched it to the end. 😁
@gbeziuk 10 months ago
I guess there's not much hope for doing a cameo in the next version of the presentation, but I'll try anyway. Using Cyrillic, or any other local writing system in JavaScript is probably a bad idea in any production code, for sure, and it's universally frowned upon for a reason. Universality, you know - if you write science in Medieval Europe, use Latin, don't be a dick. But, there's a "but"! Teaching programming to newbies with no STEM background whatsoever, who also don't happen to be fluent in English (you can imagine), I suddenly found allowing them to use the words of their native language as names in their source code very, very useful. Separation of concerns and cognitive load reduction, I guess. As a bonus, there's a clear distinction between library entities and the locally introduced ones, which is also a good thing for the newbies. In fact, the role of English in international software development is a huge topic with a ton of practical consequences. Some Chinese developers have already stopped giving a shit about this "you must write everything in English" thing, and it's not gonna stop there. I LOVE FiraCode, BTW!
@MeriaDuck 11 months ago
That Russian postal service anecdote is just so wholesome.
@pyropunk51 10 months ago
Good talk. I was a bit disappointed that you did not even touch on the whole EBCDIC vs ASCII situation.
@KangoV 11 months ago
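Java hasn't dropped UTF-16: the String API still works in UTF-16 code units. Since Java 9, strings containing only Latin-1 characters are stored one byte per character internally, and since Java 18 UTF-8 is the default charset. An hour on plain text? I would not have believed it until I watched it. Just awesome.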
@imranhussain8700 2 years ago
This guy is a true gem 💎.
@stevecarter8810 1 year ago
Omg that was god level summarising at the end
@daniilboiko 11 months ago
The best one I watched last year! Special thanks for supporting Ukraine! Pike matchbox!!!
@jensGC 1 year ago
The Danish letters "æ" and "ø" are much older than the spelling reform in 1948. The only new letter that was introduced in that reform was "å". It is correct that the reform did make Danish orthography more distinct from German - but the main reason for this is that the reform removed the capitalization of nouns.
@acobster 10 months ago
I've read the SO post, but I never knew there was a name for Zalgo Text! Fantastic talk.
@Fetrovsky 1 year ago
I remember running echo ^G in DOS as a teen.
@junestorm 11 months ago
Brilliant lecture!! They didn't teach this in the 1980s when I studied computer science. ☝🙃
@valtterihuuskonen4207 11 months ago
Did I really spend an hour listening to a guy talk about text formats in the middle of the night‽ Yes I did. What a fun and interesting presentation. Thank you Dylan!
@wagyourtai1 1 year ago
I love watching different versions of the same talk... :)
@theelmonk 9 months ago
Is there another version where it carries on past the intriguing statement 'and this is where the version for YouTube ends'?
@bommel88 1 year ago
As somebody from Aachen, I appreciate the choice of examples :D
@Mokkatomic 8 months ago
"your recording sounds great! What mic do you use?" "Rødgrød med fløde"
@hfranke07 1 year ago
Awesome job..... blown away
@CRBarchager 2 years ago
11:42 Anyone else had to try this when viewing the video? - It works!
@theburner4522 1 year ago
I wanted to try it out, but which key does he mean with "echo"?
@SebastianSchleussner 11 months ago
@theburner4522 Not a key. You open a shell (e.g. "cmd.exe") and literally type "echo" followed by a space, then press Ctrl and G together, then Enter.
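The same thing can be tried from a script instead of a shell. This is only a minimal Python sketch, assuming a terminal that still reacts to the bell character; Ctrl-G types ASCII code 7 (BEL), and the helper name `ring_bell` is made up for illustration.

```python
import sys

BEL = "\a"  # ASCII 7, the character that Ctrl-G types

def ring_bell(stream=sys.stdout):
    """Write a single BEL character; many terminals beep or flash in response."""
    stream.write(BEL)
    stream.flush()

if __name__ == "__main__":
    ring_bell()
```

Whether anything is heard depends on the terminal settings; some emulators mute the bell or show a visual flash instead.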
@bluenuttefly8813 1 year ago
They sang Odoia at the Billy Joel concert, which is a Georgian folk song!!! It is listed as Odoya at the beginning of the album shown... What the heck. I did not know about this. Cool!
@secondengineer9814 1 year ago
It was interesting to see the origins of Dwarf Fortress UI!
@Wyrd1975 9 months ago
BTE (Best Talk Ever) ! 👏
@NonTwinBrothers 9 months ago
I forgot about the ending. I've always known this as the Kohuept talk :D
@tolgagorgun7816 1 year ago
54:10 I literally burst into laughter at Windows' statement "🏳‍🌈🏴‍☠🏁 Gay pirates are winning!", hilarious mate. Amazing :D
@Proppeti 3 months ago
Amazing, informative and pretty entertaining!! 😮😅
@richardtwyning 10 months ago
Brilliant 👍
@lazykbys 11 months ago
Just to add a bit more pedantry, ASCII is not in alphabetical order since lowercase a comes after uppercase Z. I didn't realize this until I started typing a post to complain about how Windows 10 (unlike Windows 7) sorts Japanese hiragana and katakana, then noticed something similar happened with the English alphabet. Odd how things don't seem strange when you're used to them. :)
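A small Python sketch of the ordering quirk, for anyone who wants to see it: in ASCII every uppercase letter (codes 65 to 90) has a lower code than every lowercase letter (97 to 122), so a plain code-point sort puts capitalised words first. The word list is invented for the example.

```python
words = ["banana", "Zebra", "apple", "Mango"]

# Default sort compares code points: 'A'-'Z' are 65-90 and 'a'-'z' are 97-122,
# so every capitalised word lands before every lowercase one.
by_code_point = sorted(words)

# Case-insensitive sort: compare case-folded copies of each word instead.
alphabetical = sorted(words, key=str.casefold)

print(by_code_point)  # ['Mango', 'Zebra', 'apple', 'banana']
print(alphabetical)   # ['apple', 'banana', 'Mango', 'Zebra']
```

Case folding only helps with letter case; language-aware ordering (accents, kana, and so on) needs a proper collation library.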
@kevinfleischer2049 1 year ago
Great talk. I was wondering, what would hide behind that title, and I was not disappointed.
@yugoprowers 1 year ago
Pike Matchbox is going to be one of those things, like when someone said Parachuting Buffaloes for lead on the Periodic Table. I'll never forget it because it is such a weird thing.
@warwickleahyssw4163 1 year ago
Awesome video Calum
@NamanArusia 1 year ago
Key takeaways : 1. Try out FIRA Code. 2. Gay Pirates are always Winning. 3. In Soviet Russia, Post Office fixes YOUR code-page mistakes.
@pawelhepnar1608 1 year ago
Absolutely brilliant great speech
@Jayderzomb 10 months ago
this was beautifully interesting, thanks!
@pengain4 1 year ago
Brilliant speaker and very exciting talk. ❤ Дякую!
@thevikas5743 1 year ago
I was ready to waste my time on boring plain text. But somehow that Moscow postman made me go WOW!!!
@byteseq 9 months ago
Brilliant!
@fieryscorpion 2 years ago
Wow That was a pretty interesting and fun talk!
@illegalcoding 1 year ago
This was incredible
@sharkie115 10 months ago
11:37 End of transmission (Ctrl-D) also still exists. This is the way to exit a Linux console session.
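For the curious, here is a rough Python sketch of how Ctrl plus a letter maps to a control code under the traditional terminal convention (keep only the low five bits of the letter's ASCII code). The function name `ctrl_code` is hypothetical, and real terminals and keyboards may add their own handling on top.

```python
def ctrl_code(letter: str) -> int:
    """Return the ASCII control code traditionally typed by Ctrl+<letter>.

    The convention keeps only the low five bits of the letter (mask 0x1F).
    """
    return ord(letter.upper()) & 0x1F

print(ctrl_code("D"))  # 4, EOT (end of transmission)
print(ctrl_code("G"))  # 7, BEL (bell)
```

So Ctrl-D is code 4 and Ctrl-G is code 7, which matches the positions of D and G in the alphabet.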
@Carewolf 1 year ago
Only one letter was added to the Danish alphabet in 1948. We already had æ and ø. Only å is a Swedish letter
@Carewolf 1 year ago
Ironically it was Sweden using the German typewriters and alphabet. Hence they forgot the old Scandinavian letters æ and ø and replaced them with the German ä and ö.
@SebastianSchleussner 11 months ago
@Carewolf Typewriters? Ä and Ö, also in Sweden, go back centuries before typewriters. Just different routes taken - to illustrate with "AE": Make a ligature of it, or put the e above and simplify? Anti-Danish sentiments of certain kings, who went out of their way to make Swedish sound/look different from rival Danish, may have contributed to the development.
@helmanfrow 1 year ago
Thanks, this was awesome!