
The creators of TikTok caused my website to shut down 

MattKC
Subscribers: 463K
Views: 314K

and i thought charli d'amelio was the worst thing bytedance had done to me
▶SUPPORT on Patreon and watch videos like this early and ad-free: / mattkc
▶FOLLOW on Twitter: / itsmattkc
▶FOLLOW on Twitch: / mattkclive
▶FOLLOW on Instagram: / itsmattkc
▶Music by DDRKirby(ISQ) used with permission: ddrkirbyisq.bandcamp.com/
"I Can Feel it Coming" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
creativecommons.org/licenses/b...

Gaming

Published: Aug 17, 2023

Comments: 1K
@christianwolff497 9 months ago
the biggest crime here is naming it ByteSpider and not SpiderByte
@Your_Average_Stickman_WasTaken 9 months ago
I hate bluey
@ErikKoev 9 months ago
no, the biggest crime is actually naming it ByteSpider and not SpiderDance
@pausebreakreviews 9 months ago
God forbid it bite ya. Don't let 'em bitecha. That SpiderByte! Hurt! Hurt SpiderByte! That SpiderByte HURT!
@codyryan9789 9 months ago
@@pausebreakreviews that spider bit me where the good lord split me
@KOMEOyt 9 months ago
SpyderByte
@philo23 9 months ago
You probably want to block them in Cloudflare rather than on your server. Currently they’re still wasting your bandwidth (just in a much, much more reduced form). By blocking them in Cloudflare they shouldn’t end up wasting any of your bandwidth at all; they’ll never even touch your server. A simple page rule should do the trick, and even on the free tier you should get 3 page rules.
@lxpe5269 9 months ago
Cloudflare also gives 5 WAF rules for free. With these, you could create a rule to block the user agent and add any other user agents, IPs, ASNs, etc in the future within a single rule.
@JustAWalter 9 months ago
It says in the video blocking doesn't help
@philo23 9 months ago
@@JustAWalter in the video he's talking about Cloudflare's automatic bot detection, which is going to let legitimate web crawlers like ByteSpider through. I'm talking about a custom rule to specifically block that user agent at the Cloudflare level
@porglezomp7235 9 months ago
No, it says in the video that cloudflare’s automated DDOS protection doesn’t help. Explicit traffic rules would help.
@gggkiller 9 months ago
Since the bot follows links, if the homepage returns a 403, it won't spam the other pages as it has no links to follow I assume, but yeah, blocking in CF would still be ideal as it'd mean even avoiding that initial single 403 request's bandwidth.
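The server-side block discussed in this thread (answer the crawler with a small 403 instead of a full page) could look roughly like the sketch below. It is only an illustration, written as a Python WSGI wrapper: the `BLOCKED_AGENTS` list, the helper names, and the substring match are assumptions, not the setup used in the video. Blocking at the CDN, as suggested above, would stop the request before it reaches the server at all.

```python
# Hypothetical sketch: refuse requests whose User-Agent names a blocked crawler.
BLOCKED_AGENTS = ("bytespider",)  # assumption: case-insensitive substring match


def is_blocked(user_agent):
    """Return True if the User-Agent header contains a blocked substring."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENTS)


def block_crawlers(app):
    """Wrap a WSGI app so blocked crawlers get a tiny 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)
    return middleware
```

The 403 body is a few bytes, so each blocked request still costs a little bandwidth, which is the point made above about doing the same thing one layer earlier.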
@GooveG 9 months ago
ByteSpider checking for updates on the Lego Island series every 100 milliseconds
@underscore. 9 months ago
0.1 milliseconds*
@Yazan_Majdalawi 9 months ago
​@@underscore. 0.1 seconds
@UnitSe7en 9 months ago
Chinese hackers still have not managed to fix the framerate. Spy on the West.
@tomtrublu 9 months ago
0.1 nanoseconds.
@leonkernan 9 months ago
Can’t fault them for wanting an update
@PoignantPirate 9 months ago
I definitely appreciate the PSA, you literally just saved me from having to diagnose the same issue on one of my servers.
@erikkonstas 9 months ago
Wait really? The coincidence!
@NaoPb 9 months ago
@@erikkonstas you mean coincibytedance.
@adamkuster 9 months ago
@@erikkonstas You mean CoinciDance.
@FlameSoulis 9 months ago
Can confirm. I've been bitten by the stupid spider now that I reviewed my logs. If it isn't Russia trying to access a non-existent CPanel, it's now this.
@glossymouse7712 8 months ago
@@erikkonstas It might not be a coincidence as they are probably launching a huge data gathering campaign for a possible AI.
@asriel09 9 months ago
Looks to me like they're downloading any and all images they can find. Could be for training an AI model. Looks like you have a forum, so that's why there's tonnes of requests coming your way.
@TsoLIt 9 months ago
I've seen this before on my company's website. We host a lot of blog posts for business communications systems. Our site traffic tripled in the span of a week, and I'm pretty sure it was one of these crawlers for AI
@mr.whimsic6902 9 months ago
Imagine a timeline where tiktok makes an ai of mattkc
@TuriGamer 9 months ago
"Could be training for ai" No
@bonkwonkelchip7569 9 months ago
@@TuriGamer yes
@Survivalist_Redo 9 months ago
@@TuriGamer yes it's likely not training for an AI, could very easily be dataset gathering to then later train an AI though
@Gunbudder 9 months ago
oddly enough, a chinese crawler completely tanked my college professor's homework submission website. it was extremely persistent too! i remember the entire system was down for a few days while they worked out exactly how to block it. from what i remember, they eventually just blocked every IP not from the USA lol
@IdentifiantE.S 9 months ago
You’re just too strong 😂
@Kyrmana 9 months ago
Very sus
@steve_1507 9 months ago
Digital racism
@GavinFromWeb 9 months ago
@@steve_1507 no, not really. It’s a uni in the US. Unless they let you study online in other countries it shouldn’t be a problem.
@erikkonstas 9 months ago
Um, sorry to be a killjoy, but "every IP not from the USA" is not an objective statement... an IP address by itself does not contain information regarding its origin on the planet, the job is usually done by ISPs (of all levels) who hand these addresses out to customers while reporting back to geo-IP database hosts at the same time; if one ISP of a high enough level goes rogue, you're toast...
@TheGreatSteve 9 months ago
You'd think non-malicious inadvertent DDoS would be the easiest thing for Cloudflare to spot and block? Maybe it's whitelisted?
@capsey_ 9 months ago
I mean, is it though? I am no expert, but I think gradual exceed of bandwidth limit is harder to spot than active DDoS attack. Why do you think it is easier?
@semaja2 9 months ago
CF can block this traffic: you could deploy rules to block the user agent, or pay for their bot features. But this isn’t a DDoS. Alternatively, adjusting the server code to be more cache friendly would also help
@FurriousFox 9 months ago
the caching part would in fact not work, the bytespiders will only scan all urls once, not multiple times, so caching wouldn't solve anything
@monad_tcp 9 months ago
probably
@x_x_w_ 9 months ago
Increase the cloudflare query caching level
@rudolfpast9243 9 months ago
i dont know if its still a thing, but back in the day i implemented a spidertrap to all of my websites. easy thing. you need a 1x1 pixel transparent image on every site linked to your trap-script and in your robots.txt you declare the script as disallowed. so good spiders wont go there and bad ones will be blocked...
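A spider trap of the kind described here could be sketched as follows. This is a minimal illustration under assumptions: `TRAP_PATH` is a hypothetical URL that robots.txt lists under `Disallow` and that only an invisible link points to, and bans are kept per IP in memory. Well-behaved crawlers never request the trap; anything that does is refused from then on.

```python
# Hypothetical sketch of a robots.txt-based spider trap.
TRAP_PATH = "/trap"  # assumption: listed as "Disallow: /trap" in robots.txt


class SpiderTrap:
    """Ban any client IP that requests the disallowed trap URL."""

    def __init__(self):
        self.banned = set()

    def handle(self, ip, path):
        """Return the HTTP status this request should receive."""
        if ip in self.banned:
            return 403
        if path == TRAP_PATH:
            # The client ignored robots.txt, so remember it and refuse it.
            self.banned.add(ip)
            return 403
        return 200
```

A per-IP ban is weaker against a crawler that spreads its requests over many addresses, so a real deployment might key the ban on the user agent or network range instead.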
@Nik.leonard 9 months ago
At least they had the “decency” of using a proper UA. They could (and I hope they won't) just use the Chrome UA or, worse, weighted random UAs
@JessicaFEREM 9 months ago
may be worth it to block the entire country if it's bad enough.
@FlamesRunner 9 months ago
@@JessicaFEREM Blocking countries is more of a last resort, and shouldn't be considered so long as other options are available. Cloudflare, for instance, offers the ability to selectively block user agents, which would do the trick here.
@saiv46 9 months ago
@@JessicaFEREM That's why many websites just outright block China (and now Russia, but for other reasons)
@undefinedchannel9916 9 months ago
@@JessicaFEREM Apparently they use different hosts like AWS so the country may show up as the US for some requests.
@HappyGick 7 months ago
​@@undefinedchannel9916 Enough requests appeared with Singapore/China as location, so he could block those countries and he would be fine.
@ondrejpavlik4210 9 months ago
I'd recommend you set up a simple email notification that the server would send to you if an arbitrary bandwidth threshold you'd consider too high was exceeded. This way you could resolve the issue before any downtime occurs.
@arjix8738 9 months ago
or just implement a cooldown that returns 429 when the same IP makes too many requests in under a specific amount of time
@jacksoncremean1664 9 months ago
@@arjix8738 from what he's shown in the access log that will be tricky to pull off since they are crawling very slowly
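The 429 cooldown suggested above could be sketched as a per-IP sliding window. The class name, the limits, and the injectable `clock` are assumptions for illustration. As the reply points out, a crawler that spreads slow requests across many IPs would stay under any per-IP limit, so this only helps against a single noisy client.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_requests` per IP within `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def check(self, ip):
        """Return 200 if the request is allowed, 429 if the IP is over the limit."""
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return 429
        recent.append(now)
        return 200
```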
@jacksoncremean1664 9 months ago
a better idea would be to just set Cloudflare security level to IUAM
@randomblock1_ 9 months ago
That's what his first email was about. The second was the notification that it ran out
@JordanPlayz158 9 months ago
@@arjix8738 true, otherwise the crawler has no reason to assume there is a rate limit (perhaps there are even standard crawler headers to dictate how often they should scrape?)
@SlavTiger 9 months ago
I'm just sick of us being expected to foot the bill for something a large corporation does without your consent. These days our data makes us look like little more than dollar signs instead of people to a lot of those tech company execs.
@ZeroRiskAppetite 9 months ago
Maybe the crawler gets into an infinite loop. Might be the classic 'detecting cycles in an undirected graph' problem.
@erikkonstas 9 months ago
Pretty sure that would make more of an exponential curve though, the one I saw in the video was a bit too linear...
@n3ishere 9 months ago
@@erikkonstas not necessarily, if it got stuck on some pages that in some way link to each other in a loop, it could be linear like that as more spiders go there and get stuck in a repeating loop (source: ive made web spiders before and this was a problem i had to fix with it)
@some1and297 9 months ago
Yeah, I mean in this case I can't imagine ByteDance designing a production web crawler so terrible it can't cache URLs. It might have more to do with unique GET request parameters being generated from page links.
@n3ishere 9 months ago
@@some1and297 unless the loop has enough pages that the cache gets cleared beforehand
@erikkonstas 9 months ago
@@some1and297 I don't think crawlers should take into account whatever follows a question mark in the URL... like yes, there might be that one rare case where it doesn't mean what we think it means, but come on, it's just a spider...
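The loop problem debated in this thread is usually avoided by remembering which URLs have already been queued. The sketch below assumes a toy in-memory link graph (`links`, a dict from URL to the URLs it links to) instead of real HTTP fetches, and it only strips `#fragment` parts. Query strings are kept, which is why links that carry a fresh parameter each time would still look like new pages, as suggested above.

```python
from collections import deque
from urllib.parse import urldefrag


def crawl(start, links):
    """Breadth-first crawl that visits each URL at most once.

    `links` is a hypothetical stand-in for fetching a page and
    extracting its outgoing links.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            nxt = urldefrag(nxt).url  # "/c#top" and "/c" are the same page
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

With the `seen` set in place, pages that link to each other in a cycle are each fetched once and the crawl terminates.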
@EpicLPer 9 months ago
OH MY GOD ARE YOU KIDDING ME... so THIS was the reason my site went down too??? I suddenly couldn't reach my website at around July 11th or something too, and a few minutes later my provider sent me a mail saying they temporarily disabled my site till they figure out what's going on, it also looked like a DDoS in the logs and everything... wow... Now that mystery is solved, thanks! :)
@erikkonstas 9 months ago
I'd say check if this was *really* it tho (like "ByteSpider" and everything).
@GeorgeSukFuk 9 months ago
It's the squinty-eyed commies!
@notniko6914 9 months ago
Sue them for the 10$
@Howtheheckarehandleswit 9 months ago
It's ByteDance, unless they do something bad enough to spark an international incident, the CCP will protect them from the consequences of their actions
@new_simsons 9 months ago
Bruh
@adorable_yangire 9 months ago
@@new_simsons Bruh translate to English
@new_simsons 9 months ago
@@adorable_yangire wtf?
@Roach18 9 months ago
@@adorable_yangire Too bad, I translate to Polish
@johnbucki5567 9 months ago
When exceeding bandwidth, VPS's should not be suspended. I believe they should just shut off all network access, so the KVM console would still be accessible for troubleshooting. Also, if the reason is a DDOS attack, it will stop reaching the server and you can check where the traffic is coming from.
@shishsquared 8 months ago
Yeah it's crazy that there's not an out of band console
@WackoMcGoose 9 months ago
Why do I get the feeling that Bytedance paid Cloudflare to look the other way and ignore their aggressive crawler shenanigans...
@jcfawerd 2 months ago
Not surprised, since cloudflare is allowed to operate in china, coincidence? I don’t think so
@blikthepro972 9 months ago
knowing how tiktok spies on and tracks phones like crazy, their web crawlers being extremely overkill just to scrape every last bit of data makes sense
@haileymccurry3756 9 months ago
Google et al. spy on and track phones like crazy and yet their crawlers are doing fine
@blikthepro972 9 months ago
@@haileymccurry3756 true, but google's tracking is still not as bad as tiktok's. how "not as bad" it is i don't know, but that's the vibe i have gotten over the years
@internet_userr 9 months ago
Bing Chilling
@RAFMnBgaming 9 months ago
@@haileymccurry3756 google is certainly more practiced at "keeping their heads down", insofar as that's possible for one of the biggest companies around.
@nicepotato5755 9 months ago
the tiktok thing is mostly propaganda, all major tech companies do this.
@OROO111 9 months ago
Thank you, I have exactly the same problem with my website. I don't even host anything on that website besides the default WordPress site, but I had a huge amount of "users" accessing my site
@Zowiezo101 9 months ago
Yeah, I'm very glad to know about this as I have my own website as well and now I am prepared if this would happen to me!
@erikkonstas 9 months ago
Were they all "ByteSpider"?
@rkvkydqf 9 months ago
Since auto-regressive language models are so trendy these days, and there might be fears of export bans for using already collected corpus like CommonCrawl, they might be trying to build their own. Maybe some Snapchat-esque annoying "friend" for lonely teens.
@piemadd 9 months ago
Bytespider has been active for years now (4ish last I checked) so this isn't anything new.
@XeZrunner 9 months ago
5:46 Have you tried contacting their email address from the UA string? In case it is a legitimate issue, they might want to hear about it.
@U20E0 9 months ago
if they are actually just making a search engine, i doubt they want to waste their own resources like this
@foreskin 9 months ago
I dont mean to be that guy but they probably legitimately dont care since its already been brought up multiple times before matt
@U20E0 9 months ago
@@foreskin probably.
@FirstLast-gw5mg 9 months ago
If they ignore robots.txt I don't think it's likely that they care much about complaining emails.
@XeZrunner 9 months ago
@@foreskin In that case, I agree with blocking them in this scenario.
@MagicalPhi 9 months ago
Now to find out if it was Tik or Tok who was responsible for this.
@Junimeek 9 months ago
or their lost cousin Tak
@SomeRandomPiggo 9 months ago
Definitely Tek
@TheRedOwl 9 months ago
I'm pretty sure it was Tuk
@havesomerespectandspoilthe5880 9 months ago
Or Tyk, if they ever come out of exile
@Zowiezo101 9 months ago
Don't forget Tyk and Tøk
@CoolJosh3k 9 months ago
Would be nice if certain user agents, like web crawlers, had a default limit on how often they could access the site. While obviously malicious crawlers could get around this, reputable ones that wish to stay whitelisted by default would obey.
@notalostnumber8660 9 months ago
You can make PHP scripts to rate limit bot crawlers based on user agent. In fact, you can try to Denial-of-Service them by using a GZip bomb or PNG/GIF/WebP bomb, since those can look legitimate, but end up causing havoc for a short while
@erikkonstas 9 months ago
"reputable ones that wish to stay whitelisted by default would obey" nah, malicious would just become the new reputable.
@Zettymaster 9 months ago
UAs are super easy to spoof (since they are supplied by the software that SENDS the request) so that would only force them to crawl using spoofed UAs, which they allegedly already do.
@CoolJosh3k 9 months ago
@@Zettymaster Oh. Then I stand corrected.
@someguy4915 9 months ago
This is in part what robots.txt is for but as the video shows, ByteSpider does not obey robots.txt... Used to be, for crawlers to not get blocked by everyone, they had to obey robots.txt, seems like ByteDance didn't get the memo...
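For reference, this is roughly what obeying robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt content and the `SomeOtherBot` name are made up for the example. `Crawl-delay` is a widely used but non-standard directive, and nothing forces a crawler to run checks like these, which is the complaint in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, path):
    """Ask the parsed rules whether `agent` is allowed to request `path`."""
    return parser.can_fetch(agent, path)


def delay_for(agent):
    """Seconds the site asks `agent` to wait between requests, or None."""
    return parser.crawl_delay(agent)
```

A polite crawler would call something like `may_fetch` before every request and sleep for `delay_for` seconds between requests to the same host.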
@chaosmagican 9 months ago
10 bucks for 100GB? Jeez, I'm paying 1€ per TB over here, that is just straight up robbery
@sesad5035 9 months ago
10 aussie dollars.
@thewhitefalcon8539 4 months ago
Sounds like cloud. I get 1€ per TB too.
@sosman64 2 months ago
@@sesad5035 then it's even more robbery
@alex13902 1 month ago
@@thewhitefalcon8539 for bandwidth? Or storage? The two are very, very different
@MyHandleIsAplaceholder 9 months ago
I believe Bytedance wants to create a new Chinese web browser to compete with the blocked ones
@zyxwv 9 months ago
Another? What about TouTiao?
@IdentifiantE.S 9 months ago
@@zyxwv What is TouTiao?
@zyxwv 9 months ago
@@IdentifiantE.S A Chinese Web Browser by Bytedance.
@hi12167pies 9 months ago
make a browser to compete with browsers chinese people can't even access 💀
@f3rny_66 9 months ago
not a browser, but a search engine, I had the same bot and also PetalBot, from the huawei people and their search engine petal crawling client servers. But it can be filtered tho, just needs configuration. bytespyder is banned by default in AWS iirc
@shadowtheimpure 9 months ago
The old adage applies here: Never attribute to malice what can be easily attributed to incompetence.
@GreyMaria 9 months ago
Found the ByteDance employee
@shadowtheimpure 9 months ago
@@GreyMaria What? I'm literally calling them stupid rather than malicious. Their web crawlers are not malicious, just very poorly coded.
@itsTyrion 9 months ago
@@GreyMaria it's literally just "Hanlon's razor"
@MrTriple3D 9 months ago
Evil people make it look like incompetence when it really is malice.
@erwannthietart3602 9 months ago
@@MrTriple3D the problem is, if we apply this idea every time incompetence looks evil, you may unjustly treat something that is actually incompetent as malicious, which can be just as useful to the "evil people" as hiding behind a veil of incompetence
@CoolJosh3k 9 months ago
Assuming it really was ByteDance, I expect this was not intended behaviour. It would cost them bandwidth too, though maybe so little in comparison that it just looks like regular background noise. An earlier alert would have been very useful here.
@Thesnugglebottom 9 months ago
They would be doing this to thousands if not millions of sites though, so the bandwidth on their side would be gigantic
@f3rny_66 9 months ago
It's the cost of business, just like Google crawls the web. The issue with Bytespider and other Chinese bots is that they ignore robots.txt and do other shady stuff
@spykillergames8402 9 months ago
it probably was... as I reckon they are using images from his site to train an AI model... modified web crawlers can do that
@CoolJosh3k 9 months ago
@@Thesnugglebottom I figure maybe it is so rare that only very few sites would have the issue.
@JordanPlayz158 9 months ago
@@Thesnugglebottom but if they know they are using a ton, they may opt for servers with no bandwidth limit
@SeraphimKnight 9 months ago
Good thing this is happening in the age of DDOS-prevention. Imagine getting fucked by a spiderbot back in the days when you'd host your website on your home network and your ISP charged you by data usage.
@airnith 9 months ago
this is very useful information. I been thinking about putting together a website for some friends, so now I know that I might need to look out for this.
@diegopescia9602 9 months ago
Luckily your site has a static limit with a fixed price. Imagine the costs if it were an uncapped pay-as-you-go service like most cloud services
@csbauder 9 months ago
Really interesting stuff. I've considered making a website before, but I wasn't aware of stuff like this. Thanks for the heads-up!
@thebunsenburner 9 months ago
That's a wild ride for sure.
@kur0kiba 9 months ago
i thought for sure that it would be the same thing a friend of mine had about 10 or more years ago. he kept a travel log website where he uploaded photos because he was a nerd who liked to travel. he eventually visited the original Starbucks and uploaded a picture of the original logo. a more popular website used the photo but they didn't download and host the photo themselves. instead they just linked to the photo, so when you loaded up the more popular website it would give your browser a link to where the photo was located on my buddy's website so it could display it. his traffic skyrocketed. i believe there is a name for when people do this but i don't know it. he did find a fix for it where any website linking to any photo on his website like that would then be blocked.
@erikkonstas 9 months ago
I've seen the term "hotlinking" for that, and yep, that's exactly why it's frowned upon.
@DarkGob 9 months ago
It's called hotlinking, and has been a discouraged practice for decades.
@HappyTinfoilCat 9 months ago
That's when you swap out the photo for something like goatse
@thewhitefalcon8539 4 months ago
It's called hotlinking and it's traditional to change the picture to pron. That could be illegal in some countries though.
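The hotlink fix mentioned at the top of this thread usually comes down to checking the `Referer` header on image requests. A minimal sketch, assuming a hypothetical site served from the hosts in `ALLOWED_HOSTS`:

```python
from urllib.parse import urlparse

# Hypothetical: the hosts that are allowed to embed our images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer):
    """Return True when the Referer names a host other than our own.

    A missing Referer is allowed, since direct visits and
    privacy tools often omit the header.
    """
    if not referer:
        return False
    host = urlparse(referer).hostname
    return host not in ALLOWED_HOSTS
```

A server would answer hotlinked image requests with a 403 or a placeholder image. The header is supplied by the client, so this deters casual embedding and is not a security boundary.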
@mrscrewu1199 9 months ago
Feel like Cloudflare should detect this sort of activity and automatically block the user agent, at least temporarily to see if it stops or continues, instead of suspending the client.
@Mark_Rober 9 months ago
"and i thought charli d'amelio was the worst thing bytedance had done to me" The description is the best part of this video XD
@cptpotatoface386 9 months ago
This reminds me of when I had a minecraft server running for me and my friends. Woke up one day and went to check on it to see that the server command window was full of disconnected messages. Did some stuff like editing the hosts file to make it redirect the IP back to itself or similar (prob did nothing) but eventually just went with running malwarebytes since it blocks suspicious requests
@jonmayer 9 months ago
I'm interested if you could get a response or not by emailing the support. Probably not, but it would be funny to see their reply.
@8ullfrog 9 months ago
It's a shame you can't invoice them for the bytef**king they did.
@LilacMonarch 7 months ago
I mean, you can still try. Just send an invoice and see if they'll pay it lol
@DorAntCr 9 months ago
It's always a good day when Matt uploads a new video. And rants about a random company as well.
@nj5374 9 months ago
Surely as this becomes more common cloudflare may begin to implement a catch for similar overzealous crawlers?
@robyc9545 9 months ago
Kinda ironic that your sub count is 404k now. Stay safe out there
@imaxvi 9 months ago
“im not that popular” hits hard 😭
@TravellingTARDIS 9 months ago
funny you use the spider-man 3 in that clip about bytespider because im fairly certain the font from the bytespider logo is that same one from the sam raimi spider-man films lmao
@ToadyEN 9 months ago
Worth noting that Twitter / X and lots of other sites have stopped bots from crawling them now, something to do with them training their AI with content from their sites.
@erikkonstas 9 months ago
Except that I believe Twitter's case has become common knowledge to a wider audience, because, well, it did hit actual people with rate limits often too.
@Sammysapphira 9 months ago
Facebook and YouTube are obviously rate limiting. I get the same posts nonstop on Facebook for literal weeks no matter how many times I refresh or even if I open it on a different device. A lot of people are getting the same behavior. Twitter was just the only one that was public about it.
@official-obama 9 months ago
@@Sammysapphira it would go "oh no! something went wrong and we can't tell you" instead of doing that. it might be caching or nobody's posting anything
@zyxwv 9 months ago
I would find it strange for them to be making a new SE. I believe TouTiao would not really need a remake, saying as it already has over 100 million users daily
@rkvkydqf 9 months ago
Since auto-regressive language models are so trendy these days, and there might be fears of export bans for using already collected corpus like CommonCrawl, they might be trying to build their own. Maybe some Snapchat-esque annoying "friend" for lonely teens.
@zyxwv 9 months ago
@@rkvkydqf That does make a lot of sense. However, googling the issue in the video (ByteSpider) shows that this has been going on for a long time. I saw a Stack Overflow post from 2019.
@___aZa___ 9 months ago
always happy to see you upload :)
@General12th 9 months ago
Hi Matt! I love storytime with Matt! You're really fun to listen to.
@wesleyfournier6278 9 months ago
cheers on the psa, the more people that share knowledge like this in unbiased ways like this the safer we can all be on the interwebs :)
@niepytajdl 9 months ago
truly a chinese moment
@matthewforan6397 9 months ago
I've also noticed a ton of traffic from Singapore recently, and my domain just has the default parking page!
@realcrashie 9 months ago
Not the type of MattKC video we expected, but the one we deserved. Always happy to see you have uploaded, no matter the content ❤
@pcislocked 9 months ago
ur uncached traffic ratio is really low tbh, maybe also take a look at that to take more load from your webserver.
@TheFinnishTechie 9 months ago
You KNOW it’s going to be a good day when MattKC posts a video. Keep up the good work man
@kennethbeal 9 months ago
Thank you, excellent analysis!
@grubdotwebsite 9 months ago
ByteSpider's logo using the Raimi Spider-Man font is incredibly silly
@gluttonousmaximus9048 9 months ago
...And several years ago here I simply failed to see several of the classic cartoon blogs simply because I was detected using VPN. Tom Scott has warned us well. The internet is a cesspool of patchwork offense and defense, shady strategies and clumsy turf war.
@wchorski 9 months ago
Please more content like this. I host websites and services and this helps me keep up on new threats and how to deal with them
@v1mja 8 months ago
I work at a cloud provider. We have a wide range of customers and I'm afraid to say that we have seen all sorts of issues with search engine bots. Not just from fringe ones either. Even the large ones can cause weird issues. The problems we have observed include big spikes in PHP-FPM processes, tens of gigabytes of cache being generated by weird access patterns and even extremely high database loads... Funny how that goes sometimes.
@JohnLasseter-ct5in 9 months ago
Old man yells at cloud
@Pesthuf 9 months ago
I hope they won't stop using that user agent string, or else you've got an issue. It's weird how they give you that string, but do everything else in their power to stop you from blocking their crawler.
@grass6317 9 months ago
3:11 who tf uses android 5.0
@rockpie 4 months ago
People who don't want to upgrade
@Daniel-hz6pt 4 months ago
They’re almost certainly collecting mass training data for a new AI model
@RetroJack 9 months ago
Handy to know - thanks for the heads-up!
@Jergling 9 months ago
The fact that the requests were coming from seemingly random Singapore IPs still suggests a botnet. I wonder if there's a Bytedance app doing ill-conceived distributed computing in the background. You wouldn't need any kind of app permissions to browse the web, nor would any one user notice it the way crypto leech apps tend to be noticed.
@d9zirable 9 months ago
nah singapore is just a colony of china
@zwz.zdenek 2 months ago
They are not random at all, they are ranges owned by cloud services.
@JulianR2JG 9 months ago
New video from Mr. LEGO Island
@LethalBubbles 9 months ago
gotta love their use of the spider-man movie font
@donutsndcoffee 9 months ago
Friggin fascinating mate
@JTCF 9 months ago
That was a nice reminder to check my home server nginx access logs. Thank god I set it up correctly before opening up to the world.
@mandarina1367 9 months ago
set it up in a way to avoid this from happening?
@Serverfrog 9 months ago
fail2ban with BadBots Rule should also do the job ;) then it would already block the IP Address temporarily in iptables (or other Firewall thing that fail2ban was configured), which reduces more the Traffic they will produce
@Napert 8 months ago
giving them the benefit of the doubt is like giving a serial killer a benefit of the doubt it's just moronic
@Space_Reptile 9 months ago
it seems to be grabbing every single image file on your forum. block that thing asap
@JustPyroYT 9 months ago
How's the Lego island decompilation doing?
@ruairim2283 9 months ago
Openly showing this is the best thing you can do. Even if you can't prove this is malicious, you're still providing info for the Internet. Maybe more OCD users will get to it. Who knows?
@Kyrmana 9 months ago
Happy 404k subs! 😄
@monkeypox21 9 months ago
NEW MATTKC FINALLY
@autiboy08 9 months ago
Hi Matt, love the content you make! Looking forward to this watch!
@jps915 9 months ago
nn
@johnsmith34 9 months ago
Another thing to note is that your site doesn't have a robots.txt file. I can't say if it matters though.
@CaptainGibbons 9 months ago
The screenshot he showed said they didn't respect it anyways.
@SoLemerald 9 months ago
It's funny that he made the Spider-Man reference because it uses the Tobey Maguire Spider-Man font
@bananapl0 9 months ago
The reveal on stream was epic.
@Boxuga 9 months ago
All the crazy tech corporations have been in the news recently: LTT, now ByteDance again. It's crazy. Also, keep up the good work, MattKC
@nunyabiznesse6917 9 months ago
They always have been on the news though
@Tigermoto 9 months ago
All two? Have I missed something?
@Rainmotorsports 9 months ago
Didn't see the email contents on mobile, but if your provider wouldn't spin the VPS up with the external IP blocked so you could access it through a virtual console, I'd probably ditch them lol.
@burp2019 9 months ago
the VPS provider likely wouldn't know what was going on and he only got to it after they locked it
@erikkonstas 9 months ago
That could very well open it up to abuse tho... no, the client wouldn't earn anything from the abuse, but if the client is evil-minded and delusional they can wreak havoc like that.
@Rainmotorsports 9 months ago
@burp2019 You aren't saying anything against this though. Spinning the server up with no connection to the outside world allows the customer to access their logs. Virtual console is a method to replace the crash cart you would use if you were inside the data center.
@Rainmotorsports 9 months ago
@erikkonstas How? All you are allowing a customer to do is see their logs and make config changes before deciding what to do. Selling them more bandwidth first is in poor faith and might not last long enough to solve the issue. With absolutely no connection to the outside world except a virtualized KB/VGA, which by the way is so much worse than using an SSH client, there isn't much you can do. You won't be able to install software that's not on the machine, and you won't be able to back up and retrieve your files. You can enter text and take screenshots; that's about it.
@erikkonstas 9 months ago
@Rainmotorsports Is that actually very common...??? I was thinking the SSH or similar way, where you could just have another VPS with your credentials stuck to it, but which is open to the whole world, and totally not what is intended to be allowed.
@Sharan25 9 months ago
Matt KC is back fr
@mos6581com 9 months ago
These guys are a pain in the ass to block. You can't even just blackhole the entire ByteDance IP range, because the gits operate the crawler from other AS numbers. They're constantly in my home server's logs.
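Since IP ranges are unreliable here, matching on the User-Agent header is the other obvious lever. Below is a minimal sketch of that idea as plain WSGI middleware in Python. The names `block_user_agents`, `BLOCKED_TOKENS`, and `hello_app` are hypothetical, and this is not how the site in the video does it (that block is in the web server config). It also only works for as long as the crawler keeps identifying itself honestly.

```python
# Substrings (lowercase) that mark a request as coming from an unwanted crawler.
BLOCKED_TOKENS = ("bytespider",)


def block_user_agents(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the real application untouched.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


application = block_user_agents(hello_app)
```

Doing the same match at the reverse proxy or CDN is cheaper, because the request never reaches the application at all.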
@Geomedge 9 months ago
New Matt KC video 🎉
@donatj 9 months ago
Have you sent an email to the feedback email address in the user agent string?
@BluesM18A1 9 months ago
Learned quite a lot of new things today. I'll have this in mind in case my website gets any run-ins with unwanted attention
@supernenechi 9 months ago
My mail server logs were an absolute mess before. I then selected a bunch of countries that I don't care about and blocked them in my router. And suddenly? Silence. Peace and tranquility
@chrisakaschulbus4903 9 months ago
I was a webmaster once. Then I realised that I want to get older than 30 and stopped.
@brycem8161 9 months ago
That bad?
@ThePenisMan 9 months ago
So a tech company with way too many resources incompetently played with tech and now everyone else has to pay for it
@brunothedev 9 months ago
4:03 Where can I apply?
@yukimoe 9 months ago
I remember it happened to me years ago with another one of these Chinese crawlers, I think it was Yandex or something. Those guys never learn.
@bosch5303 9 months ago
Yoo new mattkc vid
@MarcoGPUtuber 9 months ago
Never watched Tiktok. Never will. Definitely will not now.
@Zair_Ahmed_1313 9 months ago
Same
@griffonboi 9 months ago
Your loss, I guess. I watch lots of tech-related content on there.
@MarcoGPUtuber 9 months ago
@griffonboi Strange. I don't feel any loss.
@Iaotle 9 months ago
not sure how the company having a crawler makes you more confident about not watching an unrelated shorts platform
@joshfromsmosh3352d 9 months ago
@Iaotle why do they even need one in the first place then?
@paulinet68 9 months ago
not people commenting on the video before even watching it assuming this is about people who publish content on tiktok and immediately jumping the gun, noo, that could never happen to a video that's actually about a web crawler
@Junimeek 9 months ago
considering that tiktok creators are infamous for committing intentionally malicious acts completely unrestricted, i think that makes perfect sense personally
@Light_Dies_07 9 months ago
NEW MATTKC UPLOAD!!!!!!! AND I FUCKING **MISSED** IT BC I WAS ASLEEP ALL DAY.... 😭
@slipperynickels 6 months ago
wow. being unable to view my own access logs because my website’s bandwidth allocation has run out would be a MASSIVE dealbreaker for me. that is ridiculous.
@pdlbackup 9 months ago
I love this shorter type of informational video! It took me a bit to notice the lack of music, which might be why something felt off to me. Also not seeing you in the video as much as usual felt different. Totally not against you experimenting with it though, cause I can imagine that it would save some time on making the video and for me I don't think the video really suffered too much from it.
@muffinking3149 9 months ago
i call this a dub
@RhodderzX 9 months ago
Had a few run-ins with this as well. It even ignored robots.txt the majority of the time, so I added a few rules on CF to just drop it.
@Jayenkai 9 months ago
Yep, I had to block them last week. It's an evil little runt.
@7isAnOddNumber 9 months ago
Oh hey it’s the Lego island guy