Тёмный

A better hash table (in C) 

Jacob Sorber
Подписаться 165 тыс.
Просмотров 30 тыс.
50% 1

Patreon ➤ / jacobsorber
Courses ➤ jacobsorber.th...
Website ➤ www.jacobsorbe...
---
A better hash table (in C) // I've been meaning to revisit hash tables for a few years. Well, today's the day, we look at making our original hash table (circa 2020) into a general one that we can store just about anything in.
Related Videos:
Hash Table: • Understanding and impl...
Libraries: • How to Load Libraries ...
Makefiles: • Learn make in 60 seconds.
***
Welcome! I post videos that help you learn to program and become a more confident software developer. I cover beginner-to-advanced systems topics ranging from network programming, threads, processes, operating systems, embedded systems and others. My goal is to help you get under-the-hood and better understand how computers work and how you can use them to become stronger students and more capable professional developers.
About me: I'm a computer scientist, electrical engineer, researcher, and teacher. I specialize in embedded systems, mobile computing, sensor networks, and the Internet of Things. I teach systems and networking courses at Clemson University, where I also lead the PERSIST research lab.
More about me and what I do:
www.jacobsorbe...
people.cs.clem...
persist.cs.clem...
To Support the Channel:
+ like, subscribe, spread the word
+ contribute via Patreon --- [ / jacobsorber ]
Source code is also available to Patreon supporters. --- [jsorber-youtub...]

Опубликовано:

 

27 сен 2024

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Комментарии : 63   
@sanderbos4243
@sanderbos4243 Год назад
What I really enjoyed programming and found incredibly useful during my 1.5 years of C assignments was to write my own vector implementation. A basic one is only about 50 lines of code. Because my uni also requires us to free() *all* allocated memory manually, I was then able to write void *my_malloc(size_t count, size_t size, char *description): a malloc() wrapper that stores the new address in one of those vectors. I could then call print_allocations() and free_allocations() at the end of my main()! Very nice during debugging.
@Urre5
@Urre5 Год назад
Did they say why you had to free stuff at the end of main
@sanderbos4243
@sanderbos4243 Год назад
@@Urre5 I presume it is because in most of our projects we don't have any loops that would force us to free memory. So they just want to make sure we are aware of how to use free() properly. On some systems the OS might not do it for us at the end of the program too.
@Urre5
@Urre5 Год назад
@@sanderbos4243 yeah I was hoping it's because of the latter part, but they should be explicit, and even in particular teach you not to free on systems which will clean things up, because otherwise you'll waste the users time when exiting the program. Then again if it's a nice arena or something where you have all your allocations it shouldn't take too long anyway
@sanderbos4243
@sanderbos4243 Год назад
@@Urre5 Totally agree, we pretty much learn to use malloc() and free() however we like, as long as we don't have leaks. We're not told basic performance stuff like it maybe being braindead to use malloc() and free() unnecessarily all over the place, and without telling us about stuff like big O. Every exercise is a PDF, and our school (look up Codam or 42 school) deliberately doesn't have any teachers nor books we have to read, so everyone helps each other, and we spend a ton of time reading up online. It's incredibly freeing since the school is open 24/7 and you aren't required to be there for that many hours per week, but it isn't for everyone, since it's your own responsibility to become an awesome programmer. Oh, and it's completely free. :)
@CodePagesNet
@CodePagesNet 2 месяца назад
Thank you for the helpful video and C videos in general. I encourage and promote the understanding that C has advantages over OO, even though people may not understand that yet (OO is merely a code format, and an inflexible one at that).
@ahmadhadwan
@ahmadhadwan Год назад
Very interesting video dr. Jacob, I'm glad you decided to expand on the last video.
@adambishop328
@adambishop328 Год назад
wow thank you for strcspn, i've been looping through my character arrays for a long time to try and format them into null-terminated strings without any return or newlines. Sweet function
@tiramihai1152
@tiramihai1152 Год назад
A 41 minute Jacob Sorber video? I'm in for a ride
@sortof3337
@sortof3337 Год назад
I thought you stopped syaing without further ado. hahah. very good video. The reason I am good at c is because I have all your videos I can reference. :D
@mr.erikchun5863
@mr.erikchun5863 Год назад
Thank you Jacob for making these videos.
@Nohope__
@Nohope__ 6 месяцев назад
I'm going to have to watch this 10 times. (TYSM for the amazing material < 3)
@TheSulross
@TheSulross 9 месяцев назад
Just had to implement an open addressing hash table using linear probing and and double hading to reduce clustering - and I validated that, yes, double hasing does reduce clustering and the second has function can be very cheap and practically no cost. In my case the has table is allicated up front to some size and does not have to be increased in size over operational life time. Only keys are stored in the has table so a lookup returns an index. So the data resolved to is kept in a separate array that is of the same max entries size as the hash table itself. So the very same hash table can be used to lookup different data structure values depending on context - something that is the case in my domain.
@greg4367
@greg4367 Год назад
Looking forward to part 2
@randomscribblings
@randomscribblings Год назад
strdup() == malloc() + strcpy()
@svenvandevelde1
@svenvandevelde1 10 дней назад
Just know that malloc and calloc are implemented using a heap structure, which is much more complex than a hash table. Why not creating this case study without malloc and calloc. Through static memory allocation. The usage of malloc and calloc slow down the logic dramatically. Also, if your hash table size and the structure size can be binary calculated through bit shifts, the key calculation can be made using a rotating random binary calculation. Which will result in blazingly fast key calculation.
@russelwestbrick3023
@russelwestbrick3023 Год назад
wonderful teaching!!
@SimGunther
@SimGunther Год назад
strager had a whole video on hash tables and it turns out that a better hash function based on the understanding of the keys going into the table equals a MUCH faster hash table! 🎉
@marcossidoruk8033
@marcossidoruk8033 Год назад
That video is completely miselading or at the very least it seems it has mislead you. Whats being implemented in this video is a general purpose hash table, what the video you mentioned shows is a perfect hash table that only works with a predefined, hardcoded set of words because he needed that for a JavaScript compiler. Those are two completely different problems and tbh his solution is quite dumb because for such a specific problem and such a limited set of keywords if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table.
@tommasobonvicini7114
@tommasobonvicini7114 Год назад
Folks, look at the number of thumbs up SimGunther received, then look at marcossidoruk ones: welcome to the software industry.
@godnyx117
@godnyx117 Год назад
@@marcossidoruk8033 Is it really faster tho? If that is the case, then a genera purpose language with good mata-programming features (like D) can easily create a library that does that!
@MaxCoplan
@MaxCoplan Год назад
it’s pretty obvious strager didn’t really know what he was talking about and just made the video for the clickbait. Did you see the thumbnail? On his stream today he said he didn’t even go to college! How anybody can take software engineering advice from him is beyond me.
@strager_
@strager_ Год назад
> if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table. You should leave a comment on my video with your suggestion.
@austinraney
@austinraney Год назад
Is the calloc call at 13:16 not incorrect? I thought it was number of elements then size of t. Right, it still will allocate the same amount of memory, I just would have expected that you would need to cast to make the compiler happy. What am I missing?
@austinraney
@austinraney Год назад
Having thought about it for a second, I guess the calls would be functionally equivalent. Are there ever cases when they aren’t?
@Uerdue
@Uerdue Год назад
@@austinraney From the manpage, I cannot find any evidence that swapping the arguments could ever mess things up. I would however argue that it could potentially hurt performance, because the `calloc` implementation might use that extra bit of information you provide by specifying what's the amount of items and what's the size. For example, it might try to align the memory such that no single item in the array will cross a page boundary. For this, it would need to know what's the element size. Interestingly, I haven seen more people supplying the arguments in the wrong order than doing it correctly.
@Uerdue
@Uerdue Год назад
Update: I checked the libc implementation on my machine, and found that it doesn't care: It multiplies the values, makes sure no overflow happened, and then just goes on to allocate a block of memory as large as the result of the multiplication. Other libc implementations might differ.
@austinraney
@austinraney Год назад
@@Uerdue thanks for doing some digging and sharing! It’s much appreciated! I was curious in particular about potential page alignment problems like you mentioned.
@JacobSorber
@JacobSorber Год назад
Yeah, calloc is typically just a multiply, a malloc call, and a memset (or the equivalent). Sorry if I caused any confusion.
@zxuiji
@zxuiji Год назад
What I'd like to see is a vid on the new lattice based encryption algorithm, be one I'd definitely save for later and I'm sure a number of peops here would end up using it in future jobs or existing ones if they have them.
@FelixNielsen
@FelixNielsen 11 месяцев назад
I have a question may be relevant or entirely unrelated. I'm not actually sure. In short, I have a problem, the solution to which could well be a hash table. My keys are known to be unique and equal length, that is to say in terms of bytes no more than a few, or not strings, but rather integers, if you so desire. The question the becomes, is there a special category of hash functions (or other method), which can convert these keys into values in a given range, naturally ranging from 0-(n-1) for n items? Mind you I can think of other solutions for doing what I need to do, which is basically a runtime defined and/or modified switch case like functionality, but none I can think of are entirely well suited. Thanks for your efforts.
@anon_y_mousse
@anon_y_mousse Год назад
It's not a bad start, but it would help if you made it slightly more generic. You could use _Generic and specialize on a few known types, or you could use a union and associate traits with whatever user defined data gets passed in. It would help to have the user pass in a hashing function and a comparison function and have flags to determine if data should be copied or merely pointed to.
@greg4367
@greg4367 Год назад
Jacob, the Subscribe button on your WEB page is inop.
@erbenton07
@erbenton07 Год назад
points-- for unnecessary use of feof Jacob, long video's are fine
@randomscribblings
@randomscribblings Год назад
strdup() again
@randomscribblings
@randomscribblings Год назад
In delete you're leaking the key memory.
@JacobSorber
@JacobSorber Год назад
Did you watch the video? 😀 Yeah, I know. The example is unfinished. See you next week.
@randomscribblings
@randomscribblings Год назад
@@JacobSorber Yeah... the comment was made as I was watching.
@zxuiji
@zxuiji Год назад
10:25, you already typedef'd it, you don't need another typedef, rather that's just asking for compile time errors
@aniritri8635
@aniritri8635 Год назад
Have you checked Zig yet ? Seems like a nice language to overview and compare to C.
@djazz0
@djazz0 Год назад
And Nim! :)
@RobertaPROTO
@RobertaPROTO 12 часов назад
Hi there, for a School project i do have to use a double linked list to record frequent items ( for each node i ll put the density and the time it was updated) however the problem is that i have to "organize my double linked list as a hash table using a function H" How is it possible? It also said that i do have to make the pointer to points to the next in term of time of updating
@JaccovanSchaik
@JaccovanSchaik Год назад
33:12 strdup()!
@JacobSorber
@JacobSorber Год назад
Very true. Thanks
@69k_gold
@69k_gold Год назад
Please make a video about terminal (termios.h) and different terminal modes etc
@Ido-Levy
@Ido-Levy Год назад
Hey, thank you for putting out these videos! I'm learning a lot from you :) Why aren't you checking for memory allocation failures?
@mytriumph
@mytriumph 9 месяцев назад
in my experience, it generally isn't necissary on modern computers. The odds that so much of your computer's total memory is being taken up by other processes, so much so that the program fails to allocate a comparatively small amount of memory, is small enough on modern computers that you can reasonably rule it out. Now, is it good practice to check anyway? Yes, absolutely. But it ultimately doesn't end up making that big of a difference
@thomaswillson1107
@thomaswillson1107 Год назад
Hi, can you show us how did you custom your vscode (comparaison operators like `!=`, etc...), thx for the video !
@strager_
@strager_ Год назад
Those look like ligatures. You need a font with code-oriented ligatures, and you need an editor which supports ligatures. I don't know what font Sorber uses, but Fira Code is a popular font which has ligatures.
@jvp5000
@jvp5000 2 месяца назад
@@strager_ thanks
@surters
@surters 6 месяцев назад
If you want to make a generic hash table, you need a lot of helper functions that knows the type that you would need to pass along, that gives a lot of extra parameter. Or you could just pass along a point to a struct with all those extra functions, each of them function pointers for that type. Some of the extra pointers could be print_obj, destroy_obj, initialize_obj, copy_obj, assign_obj etc.
@zxuiji
@zxuiji Год назад
Considering how you use the allocations should've really just used calloc everywhere to avoid runtime issues
@Ido-Levy
@Ido-Levy Год назад
Also, why are you using uint32_t instead of just int?
@soniablanche5672
@soniablanche5672 9 месяцев назад
uint32_t is always gonna be unsigned 32 bit, int size will depend on your machine
@_veikkomies
@_veikkomies Год назад
When you write "tmp = tmp->next" (e.g. lookup function), don't you have to define what "tmp->next" means? Where was that done?
@IBelieveInCode
@IBelieveInCode Год назад
"next" is a field of the struct "entry". You probably missed that 🙂
@_veikkomies
@_veikkomies Год назад
@@IBelieveInCode Ahh yeah, probably. Thanks
@IBelieveInCode
@IBelieveInCode Год назад
Good Game 🙂
@IBelieveInCode
@IBelieveInCode Год назад
I've just written my own C "Hash Table" module. It's on my channel. Without sound. My english is badly written, but it's worse when I try to speak.
@johanngambolputty5351
@johanngambolputty5351 Год назад
Just to be cheeky, the thing is, you don't have to type text from video anyway, you can use optical character recognition, I like to do `spectacle -r -b -o /tmp/screenshot.png && tesseract /tmp/screenshot.png stdout --psm 6 | xclip -sel clip` set to a keybind, then you can just paste into your favourite editor ;)
Далее
Learn C++ With Me #17 - Maps
24:35
Просмотров 76 тыс.
Avaz Oxun - Yangisidan bor
14:29
Просмотров 430 тыс.
Faster than Rust and C++: the PERFECT hash table
33:52
Просмотров 573 тыс.
Pulling Back the Curtain on the Heap
21:38
Просмотров 37 тыс.
Understanding and implementing a Hash Table (in C)
24:54
Hash Tables - Data Structures and Algorithms
20:16
Просмотров 38 тыс.
Hash Table in C
2:11:31
Просмотров 65 тыс.
Fixing our "better" hash table's memory leaks (in c)
10:04
Hash Tables - CS50 Shorts
18:47
Просмотров 146 тыс.
Hash Tables and Hash Functions
13:56
Просмотров 1,6 млн
31 nooby C++ habits you need to ditch
16:18
Просмотров 787 тыс.
Avaz Oxun - Yangisidan bor
14:29
Просмотров 430 тыс.