This is awesome, thank you! I have been trying to use this video to help me scrape certain parts of Flippa, and apparently the commands I'm giving ChatGPT aren't correct, lol. I want to scrape Flippa at 5 minutes past every hour and collect the site name, type, industry, monetization, age, net profit, and site description for each listing. Any help would be highly appreciated. Thank you!
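For the scheduling part of this question, here is a minimal sketch that works out when the next "5 minutes past the hour" run is due. It assumes that reading of "every 5 minutes after the hour"; `next_run`, `seconds_until_next_run`, and `FIELDS` are illustrative names, and the scraping itself is left out.

```python
from datetime import datetime, timedelta

# Fields named in the comment above; the real page selectors are not known here.
FIELDS = ("site name", "type", "industry", "monetization",
          "age", "net profit", "description")


def next_run(now: datetime, minute: int = 5) -> datetime:
    """Return the first time strictly after `now` that is `minute` past an hour."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so move to the next hour.
        candidate += timedelta(hours=1)
    return candidate


def seconds_until_next_run(now: datetime, minute: int = 5) -> float:
    """Seconds to sleep before the next scheduled run."""
    return (next_run(now, minute) - now).total_seconds()
```

A loop could then call `time.sleep(seconds_until_next_run(datetime.now()))` before each scrape; a cron entry would be another common way to get the same schedule.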
Thanks for posting the video. Is there any way to do this for a site that requires a sign in? I have a sign in for a site and want to use a scraping tool to scrape all of the data from it. The data is behind a click, i.e. you have to click the listing in the contacts database to see the email address. Any idea how to do this?
So basically you tell ChatGPT all the things that you'd tell any other web scraping tool as well. How genius! ChatGPT is now as intelligent as Scrapinghub was 10 years ago. 😅
Wow, this is really amazing. Thank you for showing us how to tell ChatGPT to scrape a web page to get the content we need. This is a valuable lesson, thank you so much 🙏
Due to the many questions and comments about ChatGPT and OpenAI Playground, here are some notes.
1. Yes, ChatGPT and OpenAI Playground are not exactly the same. That said, if you use the prompts I created in this video on ChatGPT, it'll generate code that scrapes the website just like with Playground.
2. Why did I use Playground instead of ChatGPT? Speed. I only wanted to get the code and not the blah blah blah you get with ChatGPT. I like the explanations ChatGPT gives, but when recording a video it's a bit annoying.
3. Some people say Playground isn't free. At least, it was free for me at the moment I recorded the video. I never gave credit card information and it's still working fine for me today.
4. ChatGPT/Playground currently generates code for Selenium 3 and not for the latest version (Selenium 4), so keep that in mind when using its generated code.
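To illustrate the Selenium 3 versus Selenium 4 point: Selenium 3 style code calls helpers such as `find_element_by_xpath(...)`, while Selenium 4 uses `find_element(By.XPATH, ...)` with `from selenium.webdriver.common.by import By`; the old helpers were deprecated in Selenium 4 and removed in later 4.x releases. The sketch below is a rough text rewrite of generated code, not an official migration tool, and `to_selenium4` is a made-up name. It only handles the locator calls and does not need Selenium installed to run.

```python
import re

# Selenium 3 helper suffix -> name of the matching Selenium 4 `By` constant.
LOCATORS = {
    "id": "ID",
    "name": "NAME",
    "xpath": "XPATH",
    "link_text": "LINK_TEXT",
    "partial_link_text": "PARTIAL_LINK_TEXT",
    "tag_name": "TAG_NAME",
    "class_name": "CLASS_NAME",
    "css_selector": "CSS_SELECTOR",
}

# Matches ".find_element_by_xpath(" as well as ".find_elements_by_xpath(".
_PATTERN = re.compile(
    r"\.find_element(s?)_by_(" + "|".join(LOCATORS) + r")\("
)


def to_selenium4(code: str) -> str:
    """Rewrite Selenium 3 locator calls in `code` to the Selenium 4 form."""
    def repl(match: re.Match) -> str:
        plural, locator = match.group(1), match.group(2)
        return f".find_element{plural}(By.{LOCATORS[locator]}, "
    return _PATTERN.sub(repl, code)


if __name__ == "__main__":
    print(to_selenium4('title = driver.find_element_by_xpath("//h1")'))
```

The rewritten code still needs the `By` import added by hand, and other Selenium 3 idioms in generated code (for example how the driver is constructed) are outside this sketch.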
The "some reason" is that it was trained on data from the internet, and there is more data about the older version than the new one. Have you tried asking it specifically to write for Selenium 4?
Twitter loads only so many tweets at once, so when you scroll too far, it removes the elements at the top from the HTML. I learned this when I was searching for something with Ctrl+F in the browser.
Hey brother, can you tell me why I am getting only [ ] in the output whenever I try to scrape data from some websites? Does this happen because they use JavaScript, or because JavaScript is not supported by Beautiful Soup? So is Selenium best for all types of JavaScript, CSS and HTML websites? Also, please make a video or provide your email; I am facing issues installing the Chrome webdriver.
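One common cause of an empty `[ ]` result is that the data is inserted by JavaScript after the page loads. An HTML parser such as Beautiful Soup only sees the markup it is given and does not run scripts, whereas Selenium drives a real browser and can read the rendered page. The sketch below shows the effect with Python's built-in `html.parser` standing in for Beautiful Soup so it runs without installing anything. Both HTML snippets and the `price` class are invented for illustration.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# What a plain HTTP download might return: an empty container plus a script.
STATIC_HTML = '<div id="app"></div><script>/* fills #app in the browser */</script>'
# What the browser shows after the script has run.
RENDERED_HTML = ('<div id="app"><span class="price">$51.77</span>'
                 '<span class="price">$53.74</span></div>')


class ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given class name."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.depth = 0          # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1
            if self.depth == 1:  # just entered a new matching element
                self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def extract(html: str, class_name: str) -> list:
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


if __name__ == "__main__":
    print(extract(STATIC_HTML, "price"))    # nothing to find in the raw download
    print(extract(RENDERED_HTML, "price"))  # the values exist only after rendering
```

A wrong selector or a blocked request can also produce an empty list, so it is worth checking the downloaded HTML before switching tools.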
Got an error on OpenAI Playground: "You've reached your usage limit. See your usage dashboard and billing settings for more details." You've got to pay to play :(
It's a very simple and understandable educational video, thank you... I have a few questions. Is it possible to scan current ASINs (product identification numbers used by Amazon)? Of course, I need to enter location information to fetch data from the Amazon website. My other question is, how can I select the current profiles in my Chrome browser while using chromedriver?
Everyone mentions Amazon at the start, but the tutorial ends up scraping a different website. The reality is that you cannot scrape Amazon content easily, and if you use this same method it'll fail because the DOM elements won't be available, and no one is telling you this.
Great tutorial. Thank you. Question: I am trying to create a Chrome extension to scrape products in every category that have no videos. Will this be possible using this same approach? TIA
Thankssssssssssssssssssssssssssssssssssssssssssssssssssssss, for the first time I scraped a website by writing the exact code and also understood it completely, what was happening inside it
ChatGPT is GPT-3.5 fine-tuned to have discourse. "Playground" is all of their models, which are all based on GPT-3.5 at the moment and fine-tuned to do specific things, e.g. Codex writes code, Davinci does embedding and generation of text.
Will this work on gambling websites that provide live stats? Could you code it to constantly check the website for updated info? Or would you need an authenticated API for that?
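Constantly checking a page for updates can be sketched as a polling loop that fetches the content at a fixed interval and reports it only when it has changed. In the sketch below, `watch` and its parameters are illustrative names, and `fetch` is any function you supply that returns the current page text (for example a Selenium or HTTP call); whether a given site permits or needs an authenticated API is not something this code can answer.

```python
import hashlib
import time
from typing import Callable, Iterator


def watch(fetch: Callable[[], str], interval: float, max_polls: int,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Poll `fetch` up to `max_polls` times and yield content whenever it changes."""
    last_digest = None
    for i in range(max_polls):
        content = fetch()
        # Comparing digests avoids keeping the whole previous page in memory.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest != last_digest:
            last_digest = digest
            yield content
        if i < max_polls - 1:
            sleep(interval)


if __name__ == "__main__":
    # Stand-in for a live page: each call returns the next made-up score.
    fake_pages = iter(["1-0", "1-0", "2-0", "2-0", "2-1"])
    for update in watch(lambda: next(fake_pages), interval=0.1, max_polls=5):
        print("changed:", update)
```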
Great video! Could you do an example that scrapes Facebook pages and gets the posts? I was trying this the other day and it seems there's no way to scrape public data from FB.
I've never tried to scrape FB. What's the issue? There are websites like LinkedIn, though, that have strong anti-scraping systems that make web scraping very challenging. As I mentioned in another comment, this AI tool gives you the code; then you have to come up with something like proxy rotation to deal with anti-scraping systems.
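Proxy rotation, in its simplest form, means sending each request through the next proxy in a pool. The sketch below only shows the rotation itself; the proxy addresses are placeholders, `proxy_rotation` is an illustrative name, and the yielded dictionaries follow the shape that the `requests` library accepts for its `proxies=` argument (an assumption about which HTTP client you use).

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator


def proxy_rotation(proxies: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one proxies mapping per request, cycling through the pool forever."""
    for proxy in cycle(proxies):
        yield {"http": proxy, "https": proxy}


if __name__ == "__main__":
    # Placeholder addresses; a real pool would come from a proxy provider.
    rotation = proxy_rotation(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
    for _ in range(3):
        print(next(rotation))
        # With requests installed: requests.get(url, proxies=next(rotation))
```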
You need to have the Chromedriver version that matches your Chrome browser's release version. So, as your Chrome browser updates, you'll need to update your Chromedriver.
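A quick way to check for this mismatch is to compare the version strings reported by the browser and the driver. The sketch below compares only the major version, which is a simplifying assumption; the function names and the example version strings are illustrative.

```python
def major_version(version: str) -> int:
    """Return the leading number of a dotted version string such as '109.0.5414.74'."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, chromedriver_version: str) -> bool:
    """True when Chrome and Chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(chromedriver_version)


if __name__ == "__main__":
    # Example strings only; read the real ones from chrome://version
    # and from running `chromedriver --version`.
    print(versions_match("109.0.5414.120", "109.0.5414.74"))
    print(versions_match("110.0.5481.77", "109.0.5414.74"))
```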
I'm trying to make a program where I can give a Python script a list of the top 20-50 ad agency websites, and it then visits each of those websites and scrapes all their email addresses so I can contact them to offer my video editing services.
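The extraction step of such a script could look like the sketch below, which pulls email addresses out of HTML that has already been downloaded. The regular expression is a deliberately simple approximation of email syntax, and `extract_emails` and the sample HTML are illustrative; fetching each site in the list is left out.

```python
import re
from typing import List

# Simple approximation of an email address; it will miss unusual valid forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> List[str]:
    """Return unique, lowercased email addresses in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(html):
        seen.setdefault(match.lower(), None)  # dict keeps insertion order
    return list(seen)


if __name__ == "__main__":
    sample = ('<a href="mailto:hello@agency-one.example">hello@agency-one.example</a>'
              '<p>Sales: Sales@agency-one.example.</p>')
    print(extract_emails(sample))
```

Addresses that a site renders with JavaScript or obfuscates (for example "name at domain dot com") would not be found this way.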
"The AI will replace programmers, bro," they say. Just ask on chat overflow how to write an English question ;) I didn't intend to be mean, just found it funny ;)
whats more amazing is guess how much the Ai has grown in the last 4 weeks....now it knows what every hustler and hacker wants...feeds it back to: Google, Facebook, Insta, and the FBI and hey presto, secured loop. All for a fee sure, MS servers don't run themselves.
It loses a bit of usefulness when you have to go look at the source and figure out for the AI where and how to scrape. Is there no AI that can "search" the HTML structure for the content you want?
I have never understood how bots work. Like many people I have dabbled with code but haven't needed to go further. I didn't realise that the classes and IDs etc. were what lets others access your site data. It all makes so much more sense. Thanks.👍
I just asked the same thing two months after your question. I guess he either missed your question or I just won't get an answer. Your question is all I really care about :)
@KCM25NJL By low-level code I mean the actual code, like Python, C or C++. By higher-level code I mean the natural language that explains the fundamental logic of the program. But I do get what you mean.
This is a high-level code. Languages such as Python or Ruby are amongst the least verbose. Create a code snippet once and use it later. What is the point writing essays in English? ;) Great for learning, though.
Isn't what you're using (playground) running GPT-3, not chatGPT? You said yourself that chatGPT doesn't have Web access (which is correct). But GPT-3 does. You keep saying chatGPT in your video (and title), which doesn't make sense. Or is playground a special version of chatGPT (aka GPT-3.5) that has Web access?
You’re just pseudo coding at that point. You’ll need to know how the code will look to ask the question in the first place. I don’t really see a point in doing this. What would be useful is dropping in html code and giving examples of what you want scraped
I see zero point in using Chat GPT as merely a direct, 1-to-1 translator from natural language to code. If I need to write the instructions at the same level of abstraction as the code, I might as well just write the code. That's largely why programming languages exist! They are specialized languages that, unlike natural languages, can precisely and succinctly express low-level technical requirements in a human readable form.
It's to save time. Personally, I haven't touched HTML in years; working in electronics, I hadn't needed it until now. I could retake the courses and get up to date, but that takes additional time. This cuts that time down. And in a way, it also refreshes your memory of the various languages.
Much like the other comment by Lance below: this is NOT ChatGPT! You're misleading your viewers. This is GPT-3 using the Davinci Text 3 option. Maybe you're going for more views by saying "ChatGPT"? But you are going to confuse those who are not too familiar with this technology. Having said all of the above, I just wanted to make sure everyone knows. All in all, this is an excellent tutorial!
I shouldn't have used the terms ChatGPT/Playground interchangeably. That said, the prompts I used generated the same code in both ChatGPT/Playground. I just decided to use the latter because it generated the code way faster than ChatGPT without wasting time explaining the code.
Wait, is the AI actually going to that link and looking at the HTML of the page? No, right? Then how does it know from the URL how to scrape the website?
In the case of books.to.scrape, it simply takes a script already built from the internet. In the case of Amazon/Twitter, we're simply giving instructions on how to build the scraper. The tool never gets to see the HTML (at least not in this tutorial).
I don't see a use case for it in the real world. When you first need to dive into the source code to find the right divs or CSS, what is the point of using AI? The code itself can be written in 5 minutes manually and probably in 1 minute with GitHub Copilot.
Yeah, I shouldn't have used the terms interchangeably. That said, you'd generate the same code with both ChatGPT/Playground. I know because I tried both. I've just made the video with Playground because it was faster.
Also "Playground chatGPT" is not chatGPT. It's another language model, that's called GPT-3. ChatGPT is a GPT-3 based model that can communicate in conversations.
I had in mind applying a summary generator. Imagine, for any chat, being able to ask for a summary of the last ## amount of time. Suppose a conversation lasted 30 minutes: rather than reading all of it, you could first go through it as a summary, extract the keywords, and decide whether the content is worth reviewing or not. That way I skip the time spent reading filler comments: congratulations, greetings, repeated content, off-topic things... But the first step is to extract the information from a page, then process it with NLP technologies [GPT-3], and we get results that save us a lot of time.