Kris, I had my physical AI robot watch this video with me! I'm sending frames of image data, and telling GPT it's the robot's CAMERA POV. It's such a cool experience to have a video-watching buddy! (He likes your tripod setup!)
Great demo. I was thinking about something similar whereby you compare images to a ring doorbell output: So when new image detected compare to my known stored images (wife, kids etc), then tells me who is at door. Also if not in my known images then categorize (postman, deliveroo, charity collector, unknown) etc..
Imagine the possibilities of doing something multimodal like this with a super fine-tuned local LLM where you can compare images generated every second.
Comparing image to image is a good idea.. I had it describe the initial image, and used the description with prompting to check for differences. Prompted to look for any issues. Just like telling someone to look out for any unusual activity.
How are you taking images with the iPhone every 5 minutes and how do you automatically upload them to Google Drive are you using some sort of iOS Shortcut or?
@@kevlaaaaar there is an app called Lens Buddy, which allows you to select how many photos, and how often an image should be taken then I think he just uses Google photos with automatic back ups because I’m not sure how or if iCloud has an API but if iCloud has an API that would be a better choice for an iPhone I assume it’s the best way that I could figure out how to do it.
Great, practical video. Another interesting take on a system like this could be one where the snapshots are triggered by motion instead of (only) time. This would probably be more power-consuming, especially for an iPhone, but thinking of existing off-the-grid systems like wildlife and game trail cameras, it seems possible using some different devices.
Nice. Love the way a combo of natural language and Python + APIs works so well. Would be horrendously expensive though to send one photo a minute which is what you really need to make it work as a CCTV
📝 Summary of Key Points: 📌 The video demonstrates how to use an iPhone and the GPT 4 Vision API as a security camera. It involves taking images every 5 minutes, uploading them to a Google Drive account, and using the Google Drive API to download the latest images. 🧐 The downloaded images are processed using the GPT 4 Vision API to compare them and detect changes. The changes are logged and summarized using the GPT 4 API, and an email report is generated using the Mailgun API. 🚀 The Python code for the project includes functions for authentication, downloading the latest image, analyzing image comparisons, and sending emails. A loop runs every 350 seconds to check for new images and perform the necessary comparisons and logging. 💡 Additional Insights and Observations: 💬 "The project worked as intended and the author was satisfied with the results." 📊 No specific data or statistics were mentioned in the video. 🌐 The video did not reference any external sources or references. 📣 Concluding Remarks: The video provides a step-by-step guide on using an iPhone and the GPT 4 Vision API as a security camera. By leveraging the Google Drive API, GPT 4 Vision API, and Mailgun API, the project successfully captures and analyzes images, detects changes, and generates email reports. The Python code provided in the video allows for easy implementation of this security camera setup. Made with Talkbud