I Made a FAST Search Engine 

conaticus
58K subscribers
152K views

Published: 28 Sep 2024
Comments: 184
@conaticus 6 months ago
Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1
@AWIRE_onpc 6 months ago
no
@xulaxwtf 5 months ago
no
@aryanszone4963 5 months ago
no
@noviui 4 months ago
no thanks
@user-uv3nv2bc6v 3 months ago
no
@jaymarksum6542 6 months ago
I’m impressed, can’t wait to see you build a multithreaded web server in assembly
@da40au40 6 months ago
Why do I find it super funny 😅😅😅.
@ArthursHD 6 months ago
@da40au40 Me too :D
@DanskeCrimeRiderTV 5 months ago
It's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of web pages.
@KibitoAkuya 5 months ago
@DanskeCrimeRiderTV Google also wastes time deciding whether or not you are allowed to see certain sites
@DanskeCrimeRiderTV 5 months ago
@KibitoAkuya what does that have to do with anything? Google is still faster at querying trillions of results than this.
@lifeofme702 6 months ago
I don't know what this guy said, and was still mind-blown by all the effort this guy puts in
@conaticus 6 months ago
Thanks so much 🙏 It would not be possible without your support
@asm_x86 6 months ago
That's really impressive, I can't even figure out how to run it.
@ZuperPotato 6 months ago
Nice username
@conaticus 6 months ago
Just added some instructions to the READMEs if you're interested :)
@asm_x86 6 months ago
@conaticus thanks, I'll do that
@greensporevalley 6 months ago
SERBIA MENTIONED 🎉🎉🎉
@RealMephres 6 months ago
@europa_the_last_battle >goes to comments >sees meme comment >looks at replies >only a LARPer replied lol
@jawadmansoor6064 6 months ago
that name rings a bell, maybe from some kind of Serbian movie?
@RealMephres 6 months ago
@MAXHASS-ph5ib tell that to the LARPer dawg
@slimeyar 5 months ago
@RealMephres tell that to yourself 😊
@RealMephres 5 months ago
@slimeyar you first
@6IGNITION9 5 months ago
Filter out JS for another 10x bandwidth savings; alternatively, use an adblocker. (Can Puppeteer do that? It's just Chromium, right?)
@SG-kn2jl 5 months ago
Why did you choose TF-IDF instead of word2vec or any context-aware model?
@skorp5677 5 months ago
+1 Would like to know
@R_Y_Z_E_N 5 months ago
Google also does the same but with distributed computing to reduce the overall time. Just scale the database horizontally and mimic Google's approach.
@aryakvn6051 3 months ago
You could calculate and cache TF values on the fly so you don’t fill up your RAM as quickly but still get a decent response time.
@polyshrub 6 months ago
This is very impressive. What was the size of the database when indexing finished? Seems like it would be quite big
@turb0004 6 months ago
Please finish your file explorer in Rust fully, because the idea of it is awesome. Love your videos, content is very engaging 🎉
@miro5182 3 months ago
You can use a Chrome-like TLS config to avoid getting blocked by Cloudflare in a lot of cases; using a browser for scraping isn’t viable when talking about scanning the internet.
@ExpandedCuber 6 months ago
Let's go another conaticus video
@synapsenova299-fp7tf 6 months ago
>goes to youtube homepage >finds this video >yipeee >oh >lets try it
@errplane_ 6 months ago
oh my fuck i saw this on your github last night
@GermanTimecrafter 6 months ago
Such a cool video! I love the way you explain what you are doing :) Random question, but what is your editor font?
@conaticus 6 months ago
Appreciate it :) I'm using JetBrains Mono; it's free to download
@alexmoses3215 4 months ago
Programming 🤝 martincitopants…match made in heaven
@foqsi_ 6 months ago
Love this dude and his video projects
@conaticus 6 months ago
🙏
@lonelybookworm 5 months ago
Well of course it is very fast, it only has like 200 websites
@jugurtha292 5 months ago
Very nice, built something similar for my info retrieval class. We have to use the Okapi BM25 formula for the ranking, but overall very similar: scrape, tokenize, parse, inverted index, rank.
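To make the "inverted index, rank" steps mentioned above concrete, here is a minimal Python sketch. It is not the code from the video (which used JavaScript and Rust) or from the commenter's class; the function names `build_index` and `bm25_search`, the whitespace tokenizer, and the parameter defaults `k1=1.5` and `b=0.75` are all assumptions chosen for illustration. The IDF term is the smoothed variant that stays non-negative for very common terms.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Build an inverted index: term -> {doc_id: term count}.

    Also returns the token length of every document, which BM25
    needs for length normalisation.
    """
    index = defaultdict(dict)
    lengths = []
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()  # naive tokenizer (assumption)
        lengths.append(len(tokens))
        for term, count in Counter(tokens).items():
            index[term][doc_id] = count
    return index, lengths

def bm25_search(query, index, lengths, k1=1.5, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        # Smoothed IDF: rare terms get a larger weight than common ones.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Longer-than-average documents are penalised via b.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = ["fast search engine", "slow file explorer", "fast fast rust search"]
index, lengths = build_index(docs)
results = bm25_search("fast rust", index, lengths)
```

Only documents that share at least one term with the query are touched, which is the point of the inverted index: the query never scans documents that cannot match.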
@a224kkk 5 months ago
Nice, you re-invented the Lucene library
@MortonMcCastle 5 months ago
Good! The world needs a new Google Search, one that's more like how it was in the 2000s.
@jsalsman 5 months ago
I believe it's "inverted indexing", as inverse indexing is something else.
@HyperCodec 5 months ago
Bro managed to memleak in JS
@joenutt1232 6 months ago
Create your own database engine for shits and giggles
@conaticus 6 months ago
B+Trees 💀
@ColorVisor 1 month ago
What's the link?
@schoolbreakyay 2 months ago
Can I not use BrightData?
@allenfpascua 6 months ago
Super good editing 🫡🫡🫡🫡
@conaticus 6 months ago
Would not be possible without your breathtaking animations 😄
@iritesh 5 months ago
Awesome effort ✨
@v037_ 5 months ago
I found a worthy opponent
@binpersonal 6 months ago
"some fucking genius" lmao
@danielisop3182 5 months ago
What did u mean by the websites u shouldn’t have searched
@AttaaH 3 months ago
0:33 🤨
@gaimnbro9337 5 months ago
Nice job :D
@fangg194 6 months ago
you seem ok
@larry_berry 6 months ago
Lol. Got notif after clicking the video.
@Macellaio94 5 months ago
Liked and subbed
@J0Y22 5 months ago
shockedd
@planktonfun1 5 months ago
Still not fast and scalable enough. The result is not even relevant, you made Bing not Google
@LaugeHeiberg 5 months ago
wow really? I'm also surprised one single guy didn't manage to make a product rivaling Google
@gamefun2525 3 months ago
wow Sheldon, you got your Nobel yet?
@ph03n1x_dev 6 months ago
You made a search engine for porn?! That's disgusting... is it on GitHub?! 👀
@conaticus 6 months ago
All open source and ready to play around with 😂
@Ayymoss 6 months ago
MAKE LONGER VIDEOS
@monkshee 6 months ago
damn
@vrljk 5 months ago
SRBIJAAAAAA
@FaZekiller-qe3uf 5 months ago
Disappointing
@ccost 5 months ago
7:40 flashing those questionable websites in a sponsored video is quite the move
@twitchizle 5 months ago
You scared of porn?
@coderx8634 6 months ago
Love your content. You and your quality have really improved. Keep it up ❤
@conaticus 6 months ago
Thanks so much, your support means a lot ♥
@devinlauderdale9635 6 months ago
The problem is this approach is susceptible to SEO spamming/invisible SEO keywords
@conaticus 6 months ago
Yeah for sure, realistically it should be moderated based on user interaction as well
@AquaQuokka 6 months ago
Rewrite your genetic code in Rust.
@pyyrr 6 months ago
i would rather be bug free so i will pass
@dreamsofcode 6 months ago
🔥🔥🔥
@susannerudolph8469 5 months ago
then BrightData makes captchas useless
@educacionespecialchannel3756 5 months ago
Captcha's effectiveness has been in question for quite some time now.
@brettmiddleton5013 3 months ago
Protects against amateurs but keeps it simple enough that an expert won’t breach/destroy their data to get what they want.
@Horn7xBG 5 months ago
hub 🎉🎉
@coderan5029 5 months ago
This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own
@stayhappy-forever 6 months ago
that's insane, how's this only at 12k views
@Miluum 5 months ago
1:06 automatically solve captchas? I knew these things exist just to waste our time and energy
@monotonedevelopment 5 months ago
If only Windows File Explorer could do the same
@SandWire 5 months ago
For this we have a thing named Everything :)
@iCrimzon 3 months ago
Can't wait for you to rewrite JS in binary 🎉🎉
@80sVectorz 5 months ago
3:07 Best pronunciation of Euclidean I have ever heard :P
@CrazyDiamondo 5 months ago
Where?
@80sVectorz 5 months ago
@CrazyDiamondo I added a timestamp
@brettmiddleton5013 3 months ago
So you’re telling me I can access restricted data by telling it to, basically, ignore restrictions??? I have been calling myself dev, admin, ownr, root in vain for far too long
@juniordevmedia 6 months ago
what TF is IDF ?!!
@neofox2526 6 months ago
idk man but watching it makes me feel smart
@jamesbarret4240 6 months ago
Term frequency (how often a given term shows up in a specific document) - inverse document frequency (a weight that shrinks as more documents in the whole collection contain that term). The Wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
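For anyone who wants to see TF-IDF in code, here is a minimal Python sketch. It is not the implementation from the video (that matching was written in Rust); the function name `tf_idf`, the whitespace tokenizer, and the plain `log(N / df)` form of IDF are assumptions made for illustration. Term frequency is the share of a document's tokens that are the term, and inverse document frequency is the log of the total number of documents divided by the number of documents containing the term.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # TF (share of this document) times IDF (rarity across documents).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["rust is fast", "rust is safe", "search engines rank pages"]
scores = tf_idf(docs)
```

In this toy collection "fast" appears in one document and "rust" in two, so "fast" gets the higher weight in the first document even though both occur once there. A term that appears in every document would get a weight of zero.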
@kavinbharathi 6 months ago
Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi
@conaticus 6 months ago
Thank you, I'll do this if I ever pronounce it again 😂
@_DarkLiquid 6 months ago
discord clone when
@rafaelpereiracoias1047 5 months ago
Nice video and nice code, keep up the good work!
@neologicalgamer3437 5 months ago
Bro sounds like WilburSoot
@AhmedMo-ec4kz 4 months ago
Great video 😊 FYI: bright data is an Israeli company 😮
@Serhii_Volchetskyi 5 months ago
🔥🔥🔥 I was looking for that algorithm and didn't know its name.
@--bountyhunter-- 3 months ago
bro thought he could scrape my web and get away with it.
@gopallohar5534 5 months ago
ain't see rust there!
@flashyexe1 5 months ago
this result doesn't make any sense xha... very fast
@deepfan14 4 months ago
Bro make a compiler programming language
@humanontheinternet6510 4 months ago
Auto solve captcha you say🧐
@igrb 5 months ago
nice
@ALTERRAa8 5 months ago
6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀
@latrapa918 5 months ago
105
@animeworld4775 5 months ago
What things should I know or learn to create projects like these?
@gamedirection_us 5 months ago
🍎 👀 .. Apple being like "when will it be ready?".
@Raven-fu1zz 5 months ago
Remember, never return an over 18 site without an over 18 word in the search request
@maksymilianglowacki1409 5 months ago
Is this engine online (or would it be possible to put it online for other users) so others could also enjoy it? Or was it just a peek, or something you made because you were bored or something?
@gammongaming9081 5 months ago
yk what would be funny? making the slowest search engine possible without like halting the program for a set time, just with maths
@madalenaferreira3018 5 months ago
great video, gave me ptsd from my information retrieval class though
@lukamajcenic1172 5 months ago
This is just an ad for BrightData. Compared to previous videos very low effort.
@MySachincool 4 months ago
Subscribed & notifications on :) you deserve more recognition bruh
@yorailevi6747 5 months ago
how much did you pay for the web scraping service in total?
@ethanstewart1011 5 months ago
How did you manage to get a node.js memory leak??
@callowaysutton 5 months ago
Next time use the Common Crawl dataset ;)
@_sohom 6 months ago
Make a better version of VSCode.
@thekwoka4707 5 months ago
How much did the scraping cost if it wasn't free?
@SlimyFrog123 6 months ago
Now make your own email system to go along with it. 😉
@Xanmattauri 5 months ago
@google acquire this man
@sleepybraincells 6 months ago
Why is there Rust in the thumbnail? This was written in JavaScript
@conaticus 6 months ago
Used Rust for the API and TF-IDF matching - decided not to keep in much of the footage for that as it was already explained in the animations
@ssoka-m5n 5 months ago
rust is a real badass❤❤
@a6gittiworld 6 months ago
Supa dope. I would like to use this search engine of yours
@upcoming-k6p 5 months ago
You should host it
@Tech_Code127-76 6 months ago
Good
@Faeest 5 months ago
Why do Disallow and User-agent matter? Can't you just scrape everything?
@skorp5677 5 months ago
You can but it might be illegal
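As a rough illustration of what the `User-agent` and `Disallow` lines in a robots.txt file do, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the crawler names `MyCrawler` and `ExampleBot`, and the `example.com` URLs are made up for the example; the crawler in the video was written in JavaScript, so this is not its actual code.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every crawler is kept out of /private/,
# and one specific crawler ("ExampleBot") is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(user_agent, url):
    """True if the rules above permit this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)

public_ok = allowed("MyCrawler", "https://example.com/blog/post")
private_ok = allowed("MyCrawler", "https://example.com/private/data")
blocked_bot_ok = allowed("ExampleBot", "https://example.com/blog/post")
```

Each `User-agent` line starts a group of rules for the named crawler (`*` means everyone else), and each `Disallow` line lists a path prefix that group is asked not to fetch. A polite crawler runs a check like `allowed(...)` before every request; the file is a request from the site owner, not a technical barrier.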
@TheRealMangoDev 6 months ago
good vid
@Nerdimo 6 months ago
Impressive, seriously!
@daemonkisure2952 6 months ago
how can i install this search engine?
@conaticus 6 months ago
Instructions are on the GitHub repos :)
@etherbeans 6 months ago
da goat
@mahrezjanati3426 6 months ago
first time watching a vid of yours ... i have one question : why are you vibrating ??
@-rate6326 5 months ago
Cause he is vibrator
@InioluwaFalade-Tolulope 3 months ago
don't know either
@chiroyce 6 months ago
What are the consequences of scraping sites you aren't allowed to?
@conaticus 6 months ago
Probably not much on its own as long as you're not violating copyright - however it is courteous not to scrape sites forbidden by the robots.txt
@314cubed 6 months ago
wastes their resources and yours