@dylan I have been doing development for more than 21 years. I use UTF-8 as much as I can! Random question: it seems a bit like the Huffman compression algorithm, doesn't it?
i thought REST just meant asynchronous with the server spinning down and massively underclocking in relation to use. so i appreciate the video, the idea that it was an acronym never occurred to me. i wonder what hypermedia is.
Some more systems with their own text encodings are GSM 03.38 (mobile phone text messaging), barcodes (code 128) and alphanumeric QR codes (though UTF-8 inside 8-bit QR is common).
I'm just about to do my final practical exam on a "DevOps apprenticeship" which will be interesting since I've optimized the app to a single static page that gets refreshed nightly. Not sure how I will get marks for logging and monitoring.
I ended up with this problem last Wednesday. I needed emergency health care, and at first I tried to use a well-known "modern" provider, but in my state of pain I couldn't work out whether my request had been accepted or if they expected more information... So I called 112 since I was too confused and too scared to bet that I had overcome their too complex website. Turned out I had, and they tried to call back while I was on the 112 call. But afterwards I was thinking: how stupid are they not to test on people who are in extreme pain, unable to move properly, can't see clearly and so on?
I'm pretty sure I saw a video of you giving this at a conference some number of years ago. Doesn't mean I didn't just watch the whole thing again though.
Minor nits. Novell, on selling WordPerfect, kept the "mail" part, which was GroupWise. (MS Exchange / Lotus Notes / Novell GroupWise competing with each other as bad mail servers with worse mail clients on top of independently invented client/server/db bad ideas is a story...) The other is more a clarification: WordPerfect itself had to be ported to Linux, to be sure, but back in the DOS days it was itself cross-platform, available on various traditional UNIX systems as well.
Thanks, fascinating. A real eye-opener for me, a native English speaker. I know 'of' Unicode, and MBCS, but they rarely affect me in my cozy day-to-day life. Btw, I hate the word 'sequel' when used to say 'SQL'. WHY??? Do you also type 'Sequel'?
Subscribed on the strength of the character encoding series alone. Neat, approachable explanation of a very complex subject.
With Unicode strings, and with combining characters, you need to define what do you mean by the length of the string: is it the number of Unicode codepoints? Is it the number of glyphs (so, for example, letter E + combining character CEDILLA is 1 glyph long)? Is it the width in columns when rendering it on text terminal, using constant-width / monospace font?
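To make those three notions of "length" concrete, here is a minimal Python sketch. The function names are made up for illustration, and the glyph and column counts are rough approximations that only account for combining marks and East Asian wide characters; proper grapheme segmentation follows Unicode's text segmentation rules and is more involved than this.

```python
import unicodedata


def code_point_length(text: str) -> int:
    # Python's len() counts Unicode code points.
    return len(text)


def approx_glyph_length(text: str) -> int:
    # Rough approximation: skip combining marks, so a base letter plus
    # its combining characters counts once. Not full grapheme segmentation.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def approx_column_width(text: str) -> int:
    # Rough approximation of monospace terminal columns: combining marks
    # take no column, East Asian wide/fullwidth characters take two.
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


e_cedilla = "E\u0327"  # letter E + COMBINING CEDILLA

print(code_point_length(e_cedilla))      # 2 code points
print(approx_glyph_length(e_cedilla))    # 1 glyph
print(approx_column_width(e_cedilla))    # 1 column
print(len(e_cedilla.encode("utf-8")))    # 3 bytes in UTF-8
```

So the same two-code-point string is 1, 2 or 3 "long" depending on which question you are asking.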
Came here from wanting to read FAT32 file names on SD cards from a microcontroller, without the benefit of a full C library. Now staring at a header file and contemplating German streets and whether I should pass a flag to the sort algorithm that denotes the list contains European city names. I make poor life choices.
I am sure it was a small effort for Magnus to travel to the USA. Try China, where the border people are uninterested, unknowledgeable and unreasonable... Yeah, I almost missed the flight, but an hour later, after several supervisors' input, they eventually let me through. Speaking of China: don't have too many characters in your full name. Many banks take 10-20 characters maximum in their systems, and within the same bank, different lengths. Then combine that with different handling of Swedish characters (a, ae and ä all existed within the bank in my case); I think I had about 10 different variants of my "full name" within that bank. And just about every interaction with the bank ended up being a crime novel, either treating me like I was trying to defraud the account, or that time when it stopped accepting salary payments after they upgraded the system, since the new TT system accepted a different number of characters in names than the previous system, and it took a while to work that out and get my employer to change it.
I've dealt with all variants of encodings and charsets, some of which were created before I was born. While things were converging toward greater usefulness, the incompatibilities were just wasting too much time. I'm so happy we've finally arrived at a solution for something so fundamental. The only thing that bugs me is the non-essential extensions such as emoji, which in turn are causing some churn with fonts. As for UTF-7, it was defined in RFC 1642 (status "Experimental") and then in RFC 2152 (status "Informational"), that is, it has never been an official standard. There's even Modified UTF-7, showing the original UTF-7 didn't quite cut it. It's never been a standard of the Unicode Consortium either. Security issues have been found with UTF-7, so it was retired, and software support for it even got removed. So no surprise you never got to see it. I didn't either, and it's not one of the experiences I'm missing.
Until not so long ago, Windows and many other supposedly UTF-16-compliant pieces of software used to support only the subset of Unicode that could be directly encoded in 16 bits: the 65536 codepoints which correspond to Unicode's "Basic Multilingual Plane" (BMP), which, as a matter of fact, includes almost everything you might need. The BMP is enough to support all major languages and most minor languages as well. You'd only need to go outside the BMP, into codepoints not supported by Windows, if you needed to work with ancient historical scripts, or if you needed to include some very rare Chinese character not included in the BMP. Then something happened and, all of a sudden, billions of people began regularly using characters beyond the BMP, requiring proper Unicode support. It was emoji. Support for emoji (most of which have codepoints above 65535) is the main reason Windows and much other software now properly support Unicode.
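For anyone wondering what "beyond the BMP" means in practice, here is a small Python sketch of how UTF-16 represents a code point above U+FFFF as a surrogate pair. The helper name `to_utf16_code_units` is made up for illustration; the arithmetic is the standard UTF-16 scheme, and the last line cross-checks it against Python's built-in codec.

```python
from typing import List


def to_utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (as integers) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded on their own")
    if code_point <= 0xFFFF:
        # Inside the BMP: one 16-bit unit, the value itself.
        return [code_point]
    # Outside the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


grinning_face = 0x1F600  # an emoji outside the BMP
units = to_utf16_code_units(grinning_face)
print([hex(u) for u in units])  # ['0xd83d', '0xde00']

# Cross-check against the built-in UTF-16 (big-endian) codec.
print(chr(grinning_face).encode("utf-16-be").hex())  # d83dde00
```

Software that assumed "one 16-bit unit = one character" sees this emoji as two separate, individually meaningless values.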
@6:45 four ways? ... lol. The seriously used ones are utf8, utf16-le, utf16-be, utf32-le, utf32-be, utf7 and utf-ebcdic.... which is 7. And I'm 99% sure you actually use most or all of these every day you're alive, in one way or another. Incidentally, *all* things that encode to 16 bits in UTF16, are between 1 and 3 bytes in UTF8, while all surrogate pairs in UTF16 map to 4 bytes in UTF8. The boundaries just line up. And UTF32 / UTF8 mapping *could* use 5-byte sequences, except that UTF8 and UTF32 are both held back by the UTF16 systems, so that everything can roundtrip through UTF16 without loss. Plus, who ever needs more than 1114112 code points? (famous last words; 3rd attempt)
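The "boundaries just line up" claim is easy to check with a quick Python sketch using only the built-in codecs. The helper names and the list of sample code points are my own choices for illustration.

```python
def utf8_length(code_point: int) -> int:
    # Number of bytes this code point takes in UTF-8.
    return len(chr(code_point).encode("utf-8"))


def utf16_units(code_point: int) -> int:
    # Number of 16-bit code units this code point takes in UTF-16.
    return len(chr(code_point).encode("utf-16-le")) // 2


# Code points on either side of each UTF-8 length boundary.
samples = [0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

for cp in samples:
    print(f"U+{cp:04X}: UTF-8 bytes={utf8_length(cp)}, UTF-16 units={utf16_units(cp)}")

# 17 planes of 65536 code points each gives the 1114112 total.
print(17 * 65536)  # 1114112
```

Everything up to U+FFFF is one UTF-16 unit and 1 to 3 UTF-8 bytes; from U+10000 upward it is a surrogate pair and exactly 4 UTF-8 bytes.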
Dylan could turn up at future tech conference promising a talk entitled "The history of Bolivian toilet brush manufacturing", and you'd just know that his would be the best presentation there and that you'd learn something incredible that made you a better programmer.
PIKE MATCHBOX is a terrible password. РІКЕ МАТСНВОХ, on the other hand, would be extremely resistant to rainbow tables and bruteforce attacks since it uses extended Unicode and isn't based on dictionary words.
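A small Python sketch of why those two strings are different even though they look the same. I'm assuming the second one is made of the Cyrillic lookalike letters spelled out in the escapes below, and the `scripts` helper is made up for illustration (it just takes the first word of each character's Unicode name).

```python
import unicodedata

latin = "PIKE MATCHBOX"
# Assumed Cyrillic lookalikes, written as escapes so nothing is ambiguous.
cyrillic = (
    "\u0420\u0406\u041a\u0415 "
    "\u041c\u0410\u0422\u0421\u041d\u0412\u041e\u0425"
)


def scripts(text: str) -> set:
    # First word of each character's Unicode name, e.g. LATIN or CYRILLIC.
    return {unicodedata.name(ch).split()[0] for ch in text if not ch.isspace()}


print(latin == cyrillic)             # False
print(scripts(latin))                # {'LATIN'}
print(scripts(cyrillic))             # {'CYRILLIC'}
print(len(latin), len(cyrillic))     # 13 13
print(len(latin.encode("utf-8")))    # 13
print(len(cyrillic.encode("utf-8"))) # 25

# Compatibility normalization does not fold the lookalikes together.
print(unicodedata.normalize("NFKC", cyrillic) == latin)  # False
```

Same number of code points, same appearance in many fonts, but no shared characters apart from the space.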
Five. There are five- among the *many* encodings of the Unicode exposition, are ... UTF-8, UTF-16BE, UTF16-LE, UTF-32, UTF-7, and a ruthless and fanatical devotion to Punycode.
Bulgaria also uses the Latin-alike glyphs on their license plates (same format as Ukraine: "AA NNNN BB" or "A NNNN BB") with one exception: the letter "У". I guess it was deemed similar enough to the Latin "Y". I've only ever seen it used in the "A NNNN BB" format in the "A" position.
I said "hypertext transport protocol", instead of "hypertext transfer protocol", and wasn't going to re-record the whole clip for that but also wasn't going to leave it as-is because this is YouTube and somebody in the comments would be "HA LOOK THIS GUY THINKS THE T IN HTTP STANDS FOR TRANSPORT LOL WHAT AN IDIOT"... so I overdubbed one single word. And OF COURSE somebody spotted it. Well done. 😉
This has been an excellent series. May I suggest a related subject? Keyboard to text mappings. I use a Scandinavian Mac keyboard, which I type into a Linux VM, and get very strange results...
4:51 While eight-bit bytes were already common when work on ASCII began in the early 1960s, they did not become ubiquitous until the mid-to-late 1970s.
I think you skipped my favorite two encodings! (They don't normally happen on computers): Semaphore and Naval Flags. Semaphore has stick and flag versions; Naval Flag Codes have an international encoding requiring color characters. Semaphore is essentially a transmission protocol. Naval Flags, however, are both a transmission protocol and a writing system for a superset containing the Latin letters and a few functional symbols. I need to check and see if you covered Baudot code, too... and Japanese Morse Code, and other deliciously weird telegraphy and radiotelegraphy systems. It's quite a deep hole and you've only touched on a part of it.
The way I approach these kinds of videos is to find weird stuff that most developers will actually encounter in the course of their career, and try to unpack why it behaves the way it does. You work with strings for long enough, you're going to encounter UTF-8 vs UTF-16, you're going to wonder what the bottom block of ASCII is actually for, you're going to find strings that look the same but aren't the same length... if my only inclusion criterion was "delicious weirdness" we'd find ourselves exploring a much bigger rabbit hole :)