it's not impressive. Of course querying a few hundred or even hundred thousand web pages isn't as complicated or slow of a task than querying trillions of webpages.
You can use a chrome like TLS config to not get blocked by cloud flare in a lot of cases, using a browser for scraping isn’t viable when tracking about scanning the internet.
very nice, built something similar for my info retrieval class. we have to use okapi bm25 formula for the ranking but overall very similar. scrape, tokenize, parse, inverted index, rank
This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own
So you’re telling me I can access restricted data by telling it to, basically, ignore restrictions??? I Have been calling myself dev, admin, ownr, root in vain for far too long
Term frequency (the number of times a given word or so shows up in total) - inverse document frequency (the number of times it shows up in a specific document). The wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
is this engine oneline or ( wouldt it be abel to be oneline for otcher users ) so otcher also coulst enjoy it? or was it dust a peak or somthing you made cuz ( you where bored or smt )