Dylan Beattie
Specs, bugs & rock'n'roll.
Cases, Code Points and Collations
13:05
1 day ago
Learning JavaScript with WAT
9:37
1 month ago
What If Wordle Had Cost A Dollar?
9:01
1 month ago
Why Does My Phone Think It's In Cairo?
5:00
3 months ago
Delving into the Landscape
16:48
4 months ago
You Forgot To Say The Magic Word!
4:26
5 months ago
The Evolution of Web Apps 1992-2024
14:01
5 months ago
re:bass
4:34
2 years ago
Teams
4:45
3 years ago
Comments
@paramdandekar562 8 hours ago
pike matchbox? more like rike matsnvokh
@jurion 17 hours ago
@dylan I have been doing development for more than 21 years. I use UTF-8 as much as I can! Random question: it seems a bit like the Huffman compression algorithm, doesn't it?
@morthim 19 hours ago
i thought REST just meant asynchronous with the server spinning down and massively underclocking in relation to use. so i appreciate the video, the idea that it was an acronym never occurred to me. i wonder what hypermedia is.
@0LoneTech 1 day ago
Some more systems with their own text encodings are GSM 03.38 (mobile phone text messaging), barcodes (code 128) and alphanumeric QR codes (though UTF-8 inside 8-bit QR is common).
@stevecarter8810 1 day ago
I'm just about to do my final practical exam on a "DevOps apprenticeship" which will be interesting since I've optimized the app to a single static page that gets refreshed nightly. Not sure how I will get marks for logging and monitoring.
@niclash 1 day ago
I ended up with this problem last Wednesday. I needed emergency health care, and at first I tried to use a well-known "modern" provider, but in my state of pain I couldn't work out whether my request had been accepted or if they expected more information....So I called 112 since I was too confused and too scared to bet that I had overcome their too complex website. Turned out I had, and they tried to call back while I was on the 112 call. But afterwards I was thinking; How stupid are they not to test on people in extreme pain, inability to move properly, can't see clearly and so on?
@kellymoses8566 1 day ago
UTF-8 has saved an absurd amount of storage and bandwidth.
@AxlefublrMain 1 day ago
I'm Russian, and I'm really happy that you actually pronounce Russian correctly! Incredibly rare :D
@revengerwizard 2 days ago
UTF-16 shouldn’t exist.
@R.B. 2 days ago
I think you should have dipped your toe into MBCS and DBCS before Unicode. I think it would also have been useful to discuss BOM for UTF-16.
@dlwiii3 2 days ago
Love this series Dylan. Thank you!
@TheBalthassar 2 days ago
I'm pretty sure I saw a video of you giving this at a conference some number of years ago. Doesn't mean I didn't just watch the whole thing again though.
@JeffWarnica 2 days ago
Minor nits. Novell, on selling WordPerfect, kept the "mail" part, which was GroupWise. (MS Exchange / Lotus Notes / Novell GroupWise competing with each other as bad mail servers with worse mail clients on top of independently invented client/server/db bad ideas is a story...) The other is more a clarification: WordPerfect itself had to be ported to Linux, to be sure, but in the DOS days it was already cross-platform, available on various traditional UNIX systems as well.
@fredgenius 2 days ago
Thanks, fascinating. A real eye-opener for me, a native English speaker. I know 'of' Unicode, and MBCS, but they rarely affect me in my cozy day-to-day life. Btw, I hate the word 'sequel', when used to say 'SQL'. WHY??? Do you also type 'Sequel'?
@paulbort6371 3 days ago
Subscribed on the strength of the character encoding series alone. Neat, approachable explanation of a very complex subject.
3 days ago
With Unicode strings, and with combining characters, you need to define what you mean by the length of the string: is it the number of Unicode codepoints? Is it the number of glyphs (so, for example, letter E + combining character CEDILLA is 1 glyph long)? Is it the width in columns when rendering it on a text terminal, using a constant-width / monospace font?
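The three notions of length in the comment above can be compared with a minimal Python sketch that uses only the standard library. The helper names (`code_point_length`, `approx_glyph_count`, `approx_terminal_width`) are hypothetical, and the last two are rough approximations: proper grapheme segmentation follows the Unicode text segmentation rules, and real column widths depend on the terminal and font.

```python
import unicodedata

def code_point_length(s):
    """Number of Unicode code points (what Python's len() reports)."""
    return len(s)

def approx_glyph_count(s):
    """Rough glyph count: skip combining marks, which attach to the previous character."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

def approx_terminal_width(s):
    """Rough column width: 0 for combining marks, 2 for East Asian wide/fullwidth, else 1."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks do not take a column of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

e_cedilla = "E\u0327"   # LATIN CAPITAL LETTER E + COMBINING CEDILLA
kanji = "\u65e5\u672c"  # two CJK ideographs, each usually two columns wide

for s in (e_cedilla, kanji):
    print(code_point_length(s), approx_glyph_count(s), approx_terminal_width(s))
```

Under these assumptions the E-with-cedilla string is two code points but one glyph and one column, while the two ideographs are two code points, two glyphs and four columns.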
@knightpp 3 days ago
Your Ukrainian is great 7:42 Interesting video, thanks!
@yrebrac 3 days ago
some content is good enough to earn an instant subscribe
@nickwallette6201 3 days ago
Came here from wanting to read FAT32 file names on SD cards from a microcontroller, without the benefit of a full C library. Now staring at a header file and contemplating German streets and whether I should pass a flag to the sort algorithm that denotes the list contains European city names. I make poor life choices.
@niclash 3 days ago
I am sure it was a small effort for Magnus to travel to the USA. Try China, where the border people are uninterested, unknowledgeable and unreasonable... Yeah, almost missed the flight, but an hour later, after several supervisors' input, eventually they let me through. Speaking of China: don't have too many characters in your full name. Many banks take 10-20 characters maximum in their systems, and within the same bank, different lengths. Then combine that with different handling of Swedish characters (a, ae and ä existed within the bank in my case), I think I had about 10 different variants of my "full name" within that bank. And just about every interaction with the bank ended up being a crime novel, either treating me like I tried to defraud the account, or that time when it stopped accepting salary payouts when they upgraded the system, since the new TT system accepted a different number of characters in names than the previous system, and it took a while to work that out and get my employer to change that.
@JordanManfrey 3 days ago
utf-16 Hague trial when
@iOfSauron 4 days ago
I love this series, so fascinating.
@paulohtobias 4 days ago
The shortcut to bring up the emoji selector on my Brazilian keyboard is win+ç. I wonder what it is on keyboards that don't have a dedicated ç key
@kacperkonieczny7333 1 day ago
win + . (dot)
@paulohtobias 1 day ago
@kacperkonieczny7333 huh, win + . works on my keyboard too, so I have 2 shortcuts for it haha
@ralfbaechle 4 days ago
I've dealt with all variants of encodings and charsets, some of which were created before I was born. While it was converging towards greater usefulness, the incompatibilities were just wasting too much time. I'm so happy we've finally arrived at a solution for something so fundamental. The only thing that bugs me are the non-essential extensions such as the emojis which in turn are causing some churn with fonts. As for UTF-7, it was defined in RFC 1642 the status of which is "Informational", that is it's never been an official standard. There's even Modified UTF-7 showing the original UTF-7 didn't quite cut it. It's never been a standard of the Unicode Consortium either. Security issues have been found with UTF-7 so it was retired, software support for it even got removed. So no surprise you never got to see it. I didn't either and it's not one of the experiences I'm missing.
@taimunozhan 4 days ago
Until not so long ago, Windows and many other supposedly UTF-16-compliant pieces of software used to only support the subset of Unicode that could be directly encoded in 16 bits, the 65536 codepoints which correspond to Unicode's "Basic Multilingual Plane" (BMP) which, as a matter of fact, includes almost everything you might need. The BMP is enough to support all major languages and most minor languages as well; you'd only need to go outside the BMP, into codepoints not supported by Windows, if you needed to work with ancient historical scripts, or if you needed to include some very rare Chinese character not included in the BMP. Then something happened and, all of a sudden, billions of people began regularly using characters beyond the BMP, requiring proper Unicode support. It was emoji. Support for emoji (which have codepoints above 65536) is the main reason Windows and much other software now properly support Unicode.
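To see why emoji forced the issue, here is a small Python sketch of how UTF-16 handles a code point beyond the BMP. The function name `utf16_code_units` is hypothetical; the arithmetic is the standard surrogate-pair calculation, and the sketch assumes the input is a single valid non-surrogate character.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as integers) for one character."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]                    # BMP: one 16-bit unit is enough
    offset = cp - 0x10000              # 20 bits remain to be split in two
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return [high, low]

grinning = "\U0001F600"  # GRINNING FACE, a code point above 0xFFFF
units = utf16_code_units(grinning)
print([hex(u) for u in units])

# Cross-check against Python's own big-endian UTF-16 encoder.
encoded = grinning.encode("utf-16-be")
print(encoded.hex())
```

Software that treats every 16-bit unit as a whole character sees two "characters" here, which is the BMP-only behaviour the comment describes.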
@TYNEPUNK 4 days ago
great video, subbed.
@JasonSpielberg 4 days ago
8:16 I knew I could hear a baby or other pre-1.0 lifeform of some sort
@ChadGeidel 4 days ago
Text files are binary files too!
@MeriaDuck 4 days ago
7:07 Java has stopped using UTF-16 internally for most cases. By default it is Latin-1, unless stated otherwise. Took about twenty years...
@dascandy 4 days ago
@6:45 four ways? ... lol. The seriously used ones are utf8, utf16-le, utf16-be, utf32-le, utf32-be, utf7 and utf-ebcdic.... which is 7. And I'm 99% sure you actually use most or all of these every day you're alive, in one way or another. Incidentally, *all* things that encode to 16 bits in UTF16, are between 1 and 3 bytes in UTF8, while all surrogate pairs in UTF16 map to 4 bytes in UTF8. The boundaries just line up. And UTF32 / UTF8 mapping *could* use 5-byte sequences, except that UTF8 and UTF32 are both held back by the UTF16 systems, so that everything can roundtrip through UTF16 without loss. Plus, who ever needs more than 1114112 code points? (famous last words; 3rd attempt)
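The boundary alignment described above can be checked with a short Python sketch. The helper names `utf8_len` and `utf16_units` are hypothetical, and the sample list covers only the edges of each range rather than every code point; surrogate code points are left out because they cannot be encoded on their own.

```python
def utf8_len(cp):
    """Number of bytes the code point occupies in UTF-8."""
    return len(chr(cp).encode("utf-8"))

def utf16_units(cp):
    """Number of 16-bit code units the code point occupies in UTF-16."""
    return len(chr(cp).encode("utf-16-le")) // 2

# First and last code point of each UTF-8 length class.
samples = [0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

for cp in samples:
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s) in UTF-8, {utf16_units(cp)} unit(s) in UTF-16")

# One unit in UTF-16 means 1 to 3 bytes in UTF-8; a surrogate pair means 4 bytes.
aligned = all(
    (utf16_units(cp) == 1 and utf8_len(cp) <= 3)
    or (utf16_units(cp) == 2 and utf8_len(cp) == 4)
    for cp in samples
)

# The code space is 17 planes of 65536 code points each.
total_code_points = 0x10FFFF + 1
```

The total of 1114112 is exactly what a surrogate pair can reach: 65536 BMP positions plus 1024 × 1024 pair combinations.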
@aixtom979 4 days ago
When you open the file from the guy there is a slight chance you go "OMG, I've been hacked!" ? 😆
@Stoney_Eagle 4 days ago
🧏‍♂️😍⌚🎞️, 🙇🏼‍♂️❤
@Haarhzh 4 days ago
Delightful as ever
@hmueller7047 4 days ago
Thank you for sharing, great video :)
@stephenspackman5573 4 days ago
This _is_ Spın̈al Tap.
@petewarner1077 5 days ago
Dylan could turn up at a future tech conference promising a talk entitled "The history of Bolivian toilet brush manufacturing", and you'd just know that his would be the best presentation there and that you'd learn something incredible that made you a better programmer.
@jonragnarsson 5 days ago
"Pike Matchbox"? Hey, that's my email password!
@DylanBeattie 5 days ago
PIKE MATCHBOX is a terrible password. РІКЕ МАТСНВОХ, on the other hand, would be extremely resistant to rainbow tables and bruteforce attacks since it uses extended Unicode and isn't based on dictionary words.
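The two look-alike strings in the reply above can be told apart by inspecting their code points. This is a small sketch, assuming the second string is built from the Cyrillic capitals written out as escapes below; the helper name `scripts_used` is hypothetical, and it simply takes the first word of each character's Unicode name as a stand-in for its script.

```python
import unicodedata

latin = "PIKE MATCHBOX"
# Assumed Cyrillic look-alikes, written as escapes so the script is unambiguous.
cyrillic = "\u0420\u0406\u041a\u0415 \u041c\u0410\u0422\u0421\u041d\u0412\u041e\u0425"

def scripts_used(s):
    """First word of each character's Unicode name, ignoring whitespace."""
    return {unicodedata.name(ch).split()[0] for ch in s if not ch.isspace()}

print(scripts_used(latin), scripts_used(cyrillic))
print(latin == cyrillic)

# Same number of characters, different number of bytes once encoded as UTF-8.
latin_bytes = len(latin.encode("utf-8"))
cyrillic_bytes = len(cyrillic.encode("utf-8"))
print(latin_bytes, cyrillic_bytes)
```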
@PaulBennett 5 days ago
Five. There are five- among the *many* encodings of the Unicode exposition, are ... UTF-8, UTF-16BE, UTF16-LE, UTF-32, UTF-7, and a ruthless and fanatical devotion to Punycode.
@DylanBeattie 5 days ago
Well, I didn't expect "inquisition".toLocaleString("es-ES")
@JanMichalSzulew 5 days ago
Bulgaria also uses the Latin-alike glyphs on their license plates (same format as Ukraine: "AA NNNN BB" or "A NNNN BB") with one exception: the letter "У". I guess it was deemed similar enough to the Latin "Y". I've only ever seen it used in the "A NNNN BB" format on the "A" position.
@jannickbreunis 5 days ago
What did you originally say at 0:47? :p
@DylanBeattie 5 days ago
I said "hypertext transport protocol", instead of "hypertext transfer protocol", and wasn't going to re-record the whole clip for that but also wasn't going to leave it as-is because this is YouTube and somebody in the comments would be "HA LOOK THIS GUY THINKS THE T IN HTTP STANDS FOR TRANSPORT LOL WHAT AN IDIOT"... so I overdubbed one single word. And OF COURSE somebody spotted it. Well done. 😉
@sullivan3503 5 days ago
Haha I thought REST just meant GET, PUT, POST, etc...
@samvarcoe 5 days ago
Superb 👌
@user-yc6km3iw7c 5 days ago
I'm not sure what's better: the content, the t-shirts or your enthusiasm!👌
@GlubbDrubb 5 days ago
This has been an excellent series. May I suggest a related subject? Keyboard to text mappings. I use a Scandinavian Mac keyboard, which I type into a Linux VM, and get very strange results...
@leyasep5919 5 days ago
Hey Crowdstrike ! You had been warned 😛
@Squossifrage 5 days ago
4:51 While eight-bit bytes were already common when work on ASCII began in the early 1960s, they did not become ubiquitous until the mid-to-late 1970s.
@WilliamHostman 5 days ago
I think you skipped my favorite two encodings! (They don't normally happen on computers): Semaphore and Naval Flags. Semaphore has stick and flag versions; Naval Flag Codes have an international encoding requiring color characters. Semaphore is essentially a transmission protocol. Naval Flags, however, are both a transmission protocol and a writing system for a superset containing the Latin letters and a few functional symbols. I need to check and see if you covered Baudot code, too... and Japanese Morse Code. and other deliciously weird telegraphy and radiotelegraphy systems. It's quite a deep hole and you've only touched on a part of it.
@DylanBeattie 5 days ago
The way I approach these kinds of videos is to find weird stuff that most developers will actually encounter in the course of their career, and try to unpack why it behaves the way it does. You work with strings for long enough, you're going to encounter UTF-8 vs UTF-16, you're going to wonder what the bottom block of ASCII is actually for, you're going to find strings that look the same but aren't the same length... if my only inclusion criteria was "delicious weirdness" we'd find ourselves exploring a much bigger rabbit hole :)
@WilliamHostman 2 days ago
@DylanBeattie Oh, I get that.
@beatadalhagen 5 days ago
Just think if octal had been more popular, we might also have had UTF-9! (heh heh heh)
@probablykasper 5 days ago
1:35 Isn't that incorrect for Greek, with the letter sigma having three variations, Σσς?
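The sigma case can be checked directly in Python, whose string methods implement Unicode case mapping, including the context rule for word-final sigma. This is a minimal sketch; the variable names are hypothetical, and the letters are written as escapes to keep the three forms distinct.

```python
upper_sigma = "\u03a3"  # GREEK CAPITAL LETTER SIGMA
lower_sigma = "\u03c3"  # GREEK SMALL LETTER SIGMA
final_sigma = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA

# Both lowercase forms share a single uppercase letter.
print(lower_sigma.upper() == upper_sigma, final_sigma.upper() == upper_sigma)

# Lowercasing depends on position: a sigma at the end of a word becomes the final form.
word = "\u039f\u0394\u039f\u03a3"  # Greek capitals ending in sigma
lowered = word.lower()
print(lowered)

# casefold() is meant for caseless comparison and maps the final form to the regular one.
folded = final_sigma.casefold()
print(folded == lower_sigma)
```

So upper-to-lower is not a simple one-to-one table for Greek: the result depends on where the letter sits in the word.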
@DylanBeattie 5 days ago
Today I learned - I had no idea. I'm gonna need to make another video about all the things I learned from the comments, aren't I... 🤣
@leyasep5919 5 days ago
3:23 Oprah at the Unicode founding meeting : "Every character gets a number !"