Great video John, as usual! I started using Playwright a few months ago and prefer it to Selenium or Helium: it is much faster, way less error-prone, and it is updated constantly.
You have the absolute best videos on YouTube!!! I'm resisting the urge to type in all caps right now lol, but seriously, this video just helped me finish a $200 project!! Thanks again for all you do for the community 🙏🙏🙏
After watching many videos and trying to scrape password-protected websites, this has finally given me the solution. Thank you so much! A note on cookies: Playwright starts as a new, clean browser, so opening a website from the script is like visiting it for the first time. I discovered that the website I wanted to scrape showed a cookie banner that you have to click first. So before filling in the username and password, I had to do page.click('button#btn-accept-cookies').
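A minimal sketch of that order of operations with the sync API, assuming a `page` already created with `browser.new_page()`. Only the cookie button selector comes from the comment above; the login URL and the username, password and submit selectors are hypothetical placeholders to replace with the real ones from your site.

```python
# Selector taken from the comment above
COOKIE_BUTTON = "button#btn-accept-cookies"
# Hypothetical selectors: inspect your own login form and replace these
USERNAME_INPUT = "input#username"
PASSWORD_INPUT = "input#password"
SUBMIT_BUTTON = "button[type=submit]"


def login(page, login_url: str, username: str, password: str) -> None:
    """Dismiss the cookie banner first, then fill in and submit the login form."""
    page.goto(login_url)
    # A fresh Playwright browser has no stored consent cookie, so the banner shows on every run
    page.click(COOKIE_BUTTON)
    page.fill(USERNAME_INPUT, username)
    page.fill(PASSWORD_INPUT, password)
    page.click(SUBMIT_BUTTON)
```

page.click() waits (up to the default 30-second timeout) for the button to appear, so if the banner only shows up some of the time you may want to check page.locator(COOKIE_BUTTON).count() before clicking.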
Thanks again for the intro, very useful! Quick question: what could cause the browser to fail to launch despite using headless=False? For context, my code is below (PyCharm on Windows). What I've tried:
A) Using webkit, chromium and firefox to check whether it's a browser issue.
B) Checking the code: PyCharm just says "Process finished with exit code 0", which implies nothing is wrong with the code.
C) Searching Stack Overflow and the Playwright docs, without finding a solution.
Thanks in advance. The script I'm using in PyCharm:
==================================================
from playwright.sync_api import sync_playwright

def main():
    with sync_playwright() as p:
        browser = p.webkit.launch(headless=False)
        page = browser.new_page()
        page.goto("www.google.com/")
        page.wait_for_timeout(5000)
=====================================================
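One likely explanation, judging only from the script as posted: main() is defined but never called, so Python runs the import and the function definition, then exits with code 0 without ever starting a browser. Separately, page.goto() expects an absolute URL including the scheme (https://), so "www.google.com/" would fail once main() does run. A corrected sketch; with_scheme() is a hypothetical helper added only for illustration, and the Playwright import sits inside main() so the file can be imported without launching anything. Uncomment the last two lines in your own script.

```python
def with_scheme(url: str) -> str:
    """Hypothetical helper: page.goto() needs an absolute URL, so add https:// if it is missing."""
    return url if "://" in url else "https://" + url


def main(url: str = "www.google.com/") -> None:
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.webkit.launch(headless=False)
        page = browser.new_page()
        page.goto(with_scheme(url))  # navigates to https://www.google.com/
        page.wait_for_timeout(5000)  # keep the window visible for 5 seconds
        browser.close()


# The original script had no call to main(), which is why nothing happened.
# Add this at the bottom of your file:
# if __name__ == "__main__":
#     main()
```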
Hi John. Just wanted to say thank you, and please keep making these videos. I have been studying data analytics online and just got a job offer for an analytics position. Even though it doesn't directly require programming skills, your videos helped me stay motivated, opened up opportunities for automation, and inspired me to do some interesting projects. Thanks again and keep it up!
Great content as always, John. Could you cover the codegen (record) command in Playwright? From python -m playwright codegen --help:
Usage: index codegen [options] [url]
-o: save the recorded script to a file
--target: the language to generate the script in (e.g. python or javascript); the default is python
-b: the browser to use
Example: python -m playwright codegen --target python -o 'main.py' -b chromium www.youtube.com
Guys, please help me! Take a simple scenario: open a browser, go to Google, search for 'word', press search, and the script ends. In Selenium, after the search the browser stays open and usable, so I can browse through the search results. In Playwright, the browser closes even though I did not use browser.close(). How can I keep my browser open and analyze the search results of my Google query?
Hey! It's being run in the context manager, which automatically closes the browser when the code finishes. The docs have a section on running it without the context manager; that's what you want.
@MancePax No problem, look here: playwright.dev/python/docs/intro and go to the section "Interactive mode (REPL)". That code will work in your code editor too, and you should be able to take it from there.
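A sketch of the no-context-manager pattern from that docs section, wrapped in two helper functions whose names (start_session, end_session) are made up for this example. Because nothing is closed automatically, the browser stays open for as long as the Python process keeps running, for example in a REPL or PyCharm's Python console.

```python
def start_session(url: str, headless: bool = False):
    """Start Playwright without a `with` block, so nothing is closed automatically."""
    from playwright.sync_api import sync_playwright

    playwright = sync_playwright().start()
    browser = playwright.chromium.launch(headless=headless)
    page = browser.new_page()
    page.goto(url)
    return playwright, browser, page


def end_session(playwright, browser) -> None:
    """Close the browser first, then stop the Playwright driver."""
    browser.close()
    playwright.stop()
```

In an interactive session: pw, browser, page = start_session("https://www.google.com/"), then work with page as long as you like and call end_session(pw, browser) when you're done. If you run it as a normal script, the browser still closes when the script exits, so to inspect results from a script you can pause it, e.g. with input() or page.pause() (which opens the Playwright Inspector in headed mode).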
I've been religiously watching your videos for the last week or so. Such a great source of information; you're a great teacher, very direct and to the point! I've successfully set up a project scraping data from betting sites to find arbitrage opportunities, mainly via hidden APIs. But some pesky websites seem to restrict their APIs, so I'm hoping to solve this with Playwright :).
This video is also awesome. Thanks for sharing your knowledge with us. But I got the following error; can you please help me solve it?
  File "D:\Project\My_Py\untitled2.py", line 10, in
    with sync_playwright() as p:
  File "C:\Users\user\Anaconda3\lib\site-packages\playwright\sync_api\_context_manager.py", line 45, in __enter__
    raise Error(
Error: It looks like you are using Playwright Sync API inside the asyncio loop. Please use the Async API instead.
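This error usually appears when the script runs inside something that already has an asyncio event loop running, such as Jupyter or the Spyder/IPython console that ships with Anaconda. A minimal sketch of the same kind of script using the Async API; fetch_title and the URL in the usage note are just examples.

```python
async def fetch_title(url: str) -> str:
    """Open a page with the Async API and return its <title>."""
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
        return title
```

In Jupyter or an IPython console you can usually write await fetch_title("https://example.com") directly in a cell. In a plain .py script run from a terminal, use asyncio.run(fetch_title("https://example.com")) instead, or simply keep the Sync API there.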
Mate, you're the best for this stuff. Your deadpan style also makes me laugh. I bet you have a wicked sense of humour. You remind me of the Russians: dry as anything, and wizards with code!
10 out of 10 again! I haven't installed Playwright yet and wondered how you found its speed compared with Selenium? In an earlier reply you mentioned that you now prefer PyCharm over VS Code. Will the Community edition work for most things, or do we need the Pro version?
I sure do. If I need to run a headless browser I use Playwright. The other tools, like httpx and selectolax, do different things and are my go-to for making requests and parsing HTML.
Thanks for the tutorial... I think the demo site has changed, though; the last part of the script doesn't work. In particular, the HTML output of page.inner_html('#content') looks nothing like the demo, and the subsequent steps don't return the results shown in the tutorial.
Thanks. Unfortunately this is often the case; things change. This is why I try to demo the methods rather than specific sites, but it just furthers the need for me to build my own web scraping test site!
@JohnWatsonRooney I have the same issue. It doesn't make sense, though, because I can see the h2 tags in the HTML when I look at the page online. It seems like Playwright is ignoring the h2 tags: when I print(html) after the line html = page.inner_html('#content'), the output in the editor doesn't show any h2 tags. It doesn't come close to the code I see online.
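One possible cause, hedged since the demo site may simply have changed: the h2 elements are added by JavaScript after the initial load, so inner_html() can run before they exist. A sketch that waits for at least one h2 inside #content before reading the HTML, then pulls out the heading texts with the standard library's html.parser (the video may have used a different parser; get_headings and H2Collector are names made up for this example).

```python
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collect the text of every <h2> element in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.titles[-1] += data


def h2_texts(html: str) -> list:
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


def get_headings(page, timeout_ms: int = 10_000) -> list:
    # Raises Playwright's TimeoutError if no <h2> appears, a hint that the page really lacks them
    page.wait_for_selector("#content h2", timeout=timeout_ms)
    return h2_texts(page.inner_html("#content"))
```

If you don't need a separate parser, page.locator("#content h2").all_inner_texts() returns a list of the heading texts directly.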
Traceback (most recent call last):
  File "c:\Users\Sellitrage\Desktop\playwright test1\main.py", line 1, in
    from playwright.sync_api import sync_playwright
  File "C:\Users\Sellitrage\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\__init__.py", line 25, in
I'm facing this error when running the script and don't know how to solve it... please guide me.
@user-ul6tf3dp9v No way to get the second test to pass. I got (ERROR tests/test_search.py::test_basic_duckduckgo_search[chromium] - playwright._impl._api_types.Error: Executable doesn't exist at C:\Users\flosr\AppData\Local\ms-playwright\chromium-1071\chrome-win\chrome.exe), and the solutions I found online to upgrade the robot thing are ineffective. Playwright is useless.
Hi Mr. John, I'm working on a Playwright Python project where I want to print the response.json() of a particular response. Could you please make a video on requests and responses in Playwright?
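A sketch using the page's "response" event: every response the page receives is passed to a handler, which prints the JSON body of the ones whose URL contains a given fragment. The "/api/" default and the function name are hypothetical placeholders; check the Network tab in your browser's dev tools for the real endpoint.

```python
import json


def attach_json_printer(page, url_fragment: str = "/api/") -> list:
    """Print (and keep) the JSON body of every matching response the page receives."""
    captured = []

    def on_response(response):
        # Playwright lower-cases header names in response.headers
        content_type = response.headers.get("content-type", "")
        if url_fragment in response.url and "application/json" in content_type:
            data = response.json()
            captured.append((response.url, data))
            print(response.url)
            print(json.dumps(data, indent=2))

    page.on("response", on_response)
    return captured
```

Attach the handler before triggering the requests, e.g. captured = attach_json_printer(page) followed by page.goto(url); the list fills up as matching responses arrive.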
Thanks for your great video. I have two problems; can you help me with them? 1. I use Playwright to crawl a website, but clicking a button triggers an AJAX call: how can I get the data from the AJAX response? 2. After using Playwright to log in, can we use Scrapy to send new requests and crawl data?
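A sketch for both questions. For 1, page.expect_response() starts listening before the click, so the AJAX response can't be missed. For 2, the cookies from the logged-in browser context can be handed to Scrapy as a {name: value} dict. The selector, URL fragment and helper names are hypothetical.

```python
def click_and_get_json(page, button_selector: str, url_fragment: str):
    """Click a button and return the JSON body of the AJAX response it triggers."""
    with page.expect_response(lambda r: url_fragment in r.url) as response_info:
        page.click(button_selector)
    return response_info.value.json()


def cookies_for_scrapy(playwright_cookies: list) -> dict:
    """Reduce BrowserContext.cookies() output to the {name: value} dict scrapy.Request accepts."""
    return {cookie["name"]: cookie["value"] for cookie in playwright_cookies}
```

After logging in, cookies = cookies_for_scrapy(page.context.cookies()) can be passed as scrapy.Request(url, cookies=cookies). This only works if the site keeps the session in cookies; if it also relies on headers or tokens set by JavaScript, you may need to copy those as well.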
I ran into this error while applying the code; any suggestions? Looks like Playwright was just installed or updated. Please run the following command to download new browsers
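That message, and the "Executable doesn't exist" error above, generally mean the browser binaries for the installed Playwright version haven't been downloaded yet (a Playwright upgrade needs a matching browser build). The usual fix is to run python -m playwright install, optionally followed by a browser name such as chromium, in a terminal. The sketch below does the same from Python; using sys.executable makes sure the download matches the Playwright package of the interpreter or virtual environment that actually runs your scraper. The function names are illustrative.

```python
import subprocess
import sys


def install_command(browser: str = "chromium") -> list:
    """Build `python -m playwright install <browser>` for the current interpreter."""
    return [sys.executable, "-m", "playwright", "install", browser]


def install_browser(browser: str = "chromium") -> None:
    # check=True raises CalledProcessError if the download fails
    subprocess.run(install_command(browser), check=True)
```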
I am logging in to a site with two-factor authentication. First there is a captcha, and after entering the captcha there is an OTP. How do we code it to accept user input for the captcha and OTP (Selenium or Playwright)? Help will be appreciated.
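Captchas are designed to need a human, so one common approach is to run the browser headed (headless=False), solve the captcha yourself in the window, and let the script continue once the OTP field appears; the OTP can then be typed into the terminal with input(). A sketch with Playwright; the selectors and function names are hypothetical, and read_input is a parameter only so the prompt can be swapped out.

```python
def wait_for_manual_captcha(page, otp_selector: str) -> None:
    # timeout=0 disables the timeout, so this waits until you have solved the captcha
    # in the browser window and the OTP field (hypothetical selector) appears
    page.wait_for_selector(otp_selector, timeout=0)


def enter_otp(page, otp_selector: str, submit_selector: str, read_input=input) -> str:
    """Ask the person running the script for the OTP and submit it."""
    code = read_input("Enter the OTP you received: ").strip()
    page.fill(otp_selector, code)
    page.click(submit_selector)
    return code
```

For example: wait_for_manual_captcha(page, "input#otp"), then enter_otp(page, "input#otp", "button#verify").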
Thanks, Koushig. I have a question: when I log into Google, I get the "This browser or app may not be secure" error that appears when trying to sign in with Google on desktop apps.
I'm wondering how we could scrape multiple pages. I've watched the crawl-and-follow-links Scrapy video, but I don't know whether FormRequest is the way to go instead of Playwright.
Thank you very much for the awesome tutorials. I tested the code and got a result, but after the result I got this message: RuntimeError: Event loop is closed
Hey John, I was looking for precisely this technique for an upcoming project I've been assigned to, where we need to log in to one of our company's internal web tools, scrape the lead-generation table that appears after logging in, and write it to an Excel file; the resulting file will be attached to a BI dashboard for automatic updates and publishing. Will this technique of yours work, or would you care to give some more expert advice? As a self-taught Pythoneer new to programming, I find your content gives me a lot of hope. Thanks for being there for people like me. ❤❤❤❤
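The login steps from the video should carry over to an internal tool. For the table, one way is to read each row's cells with Playwright locators and write them out with the standard csv module; a CSV file opens in Excel and most BI tools can import it. The table selector, file name and function names are hypothetical. If you specifically need .xlsx, pandas' DataFrame.to_excel() (which needs openpyxl installed) is a common alternative.

```python
import csv


def read_table(page, table_selector: str) -> list:
    """Return the table as a list of rows, each a list of cell texts (header row included)."""
    rows = page.locator(f"{table_selector} tr")
    return [rows.nth(i).locator("th, td").all_inner_texts() for i in range(rows.count())]


def write_rows(rows: list, path: str) -> None:
    # newline="" lets the csv module control line endings, as its docs recommend
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

For example: write_rows(read_table(page, "table#leads"), "leads.csv") once the page has finished loading after login.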
I scraped a site with Playwright inside my Express.js app.get() function. How would I deploy this scraped data to a host such as Vercel? It works locally, but when I try to host it on Heroku I just get error 500.
It's probably because the host IP on Heroku has already been blocked from accessing the site. Because it's a shared IP, this is fairly common; you'll need to use proxies to get around it.
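For reference, Playwright's launch() takes a proxy option. A sketch in Python (the question above uses Node/Express, where the option has the same shape); the proxy address and credentials are placeholders for whatever your proxy provider gives you, and the helper names are made up.

```python
def proxy_settings(server: str, username: str = "", password: str = "") -> dict:
    """Build the dict passed to launch(proxy=...); credentials are optional."""
    settings = {"server": server}
    if username:
        settings["username"] = username
        settings["password"] = password
    return settings


def launch_with_proxy(playwright, server: str, username: str = "", password: str = ""):
    # `playwright` is the object from `with sync_playwright() as playwright:`
    return playwright.chromium.launch(proxy=proxy_settings(server, username, password))
```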
Great video, mate! Really helpful! Question: any idea how page content can be displayed while using pytest with bs4? My tests pass, but I can't "scrape" data from websites, so I can't see all the information in the inner_html. I'm using VS Code as both IDE and terminal, and apart from the passed tests in the terminal there is no other information. Any ideas?
Hi John. When I got to the sync_playwright line, I got an error saying that I was using the Sync API inside the asyncio loop. Do you know how to resolve it?
Thanks! It depends on what I'm trying to achieve. For something like this I wouldn't bother with Scrapy. One of the upcoming videos will be Scrapy + Playwright.
@JohnWatsonRooney Cool! Looking forward to that! I dug through your videos and got a bit confused by ItemLoader. Should I use it if I only need to get very static job info? I don't really need to process the data.
@vt2788 If your project is working for you without it, then no, don't worry. It does make it easier when adding to a database etc., as you can use it to clean your data properly and structure it.