We're doing another giveaway! Subscribe to the Forward Future Newsletter for your chance to win a 24" Dell Monitor: gleam.io/q8wkK/dell-nvidia-monitor-2 (NA Only)
If it’s a full hour of audio output only, then at $0.24/minute it’s $14.40/hr. If you assume a 50/50 split (half input at $0.06/minute, half output at $0.24/minute), then it’s $9/hr. Better than the Psychic Friends Network in the '90s, which cost you $4.99/minute to talk to Miss Cleo!
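To make that math easy to check, here's a quick sketch. The per-minute rates are taken from the comment above, not confirmed official pricing:

```python
# Rates from the comment above; these are assumptions, not official pricing.
AUDIO_IN_PER_MIN = 0.06   # $ per minute of audio input
AUDIO_OUT_PER_MIN = 0.24  # $ per minute of audio output

def hourly_cost(input_minutes, output_minutes):
    """Dollar cost of a session with the given minutes of audio in and out."""
    return input_minutes * AUDIO_IN_PER_MIN + output_minutes * AUDIO_OUT_PER_MIN

print(round(hourly_cost(0, 60), 2))   # full hour of output only -> 14.4
print(round(hourly_cost(30, 30), 2))  # 50/50 split -> 9.0
```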
Open source voice models *WILL* be the future: uncensored (to whatever degree the user desires), private, locally run, secure, etc. Closed-source and centralized AI will also exist, but open source, private, decentralized, locally run, uncensored AI is truly for the people.
There is a graph that shows the cost of AI over time... I've read it dropped roughly twentyfold (often misquoted as "shrank by 2000%"), and it's falling even faster thanks to NVIDIA chips. What a time to be alive.
I don't think you quite understood what prompt caching is. Imagine you send a prompt that contains "your entire codebase + some request/question". If you send that prompt without prompt caching across 3 different requests/questions, GPT-4o will re-process the entire input prompt every time, including the massive codebase. With prompt caching, however, the model keeps the compute already spent processing the codebase and only actually processes the final request/question at the end. Basically, it lets the model say "oh yeah, I've seen this bit of this prompt before" and then process only the new bit. This means: 1) it isn't something you can do yourself, as you don't have access to the model weights or internal attention state; 2) storing the cached key/value (KV) matrix results takes a lot of storage, which is why they charge for it; 3) the model will not respond the same way every time to the same prompt, as it will still randomly sample from the prediction distribution.
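A toy sketch of the idea (illustrative names only; the real KV cache lives inside the inference stack, not in user code): "processing" a token is expensive, and a request that shares a prefix with an earlier one only pays for the new tokens at the end:

```python
calls = 0  # counts how many tokens had to be processed from scratch

def process_token(prefix, token):
    """Stand-in for the expensive per-token attention compute (the KV entries)."""
    global calls
    calls += 1
    return ("state", prefix, token)

cache = {}  # maps a token prefix -> list of computed states

def encode(tokens):
    """Process tokens left to right, reusing the longest cached prefix."""
    states, start = [], 0
    for i in range(len(tokens), 0, -1):  # longest cached prefix wins
        if tokens[:i] in cache:
            states, start = list(cache[tokens[:i]]), i
            break
    for j in range(start, len(tokens)):
        states.append(process_token(tokens[:j], tokens[j]))
        cache[tokens[:j + 1]] = list(states)
    return states

codebase = ("class", "Foo", ":", "def", "bar", "(", ")", ":")
encode(codebase + ("what", "does", "bar", "do", "?"))
first_pass = calls                      # all 13 tokens processed
encode(codebase + ("rename", "bar"))
second_pass = calls - first_pass        # only the 2 new tokens processed
print(first_pass, second_pass)          # 13 2
```

The second request reuses everything up to the shared "codebase" prefix and only pays for its own suffix, which is the saving the discounted cached-token pricing reflects.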
@@megafoxatron3rd521 I'm sure he's dealt with caching in his previous company, as he seems to understand that concept fine. It's just that he assumed what prompt caching was without first looking it up to make sure.
I just recently unsubbed from this channel. Every video is the SAME. He just puts a tweet on his video and READS IT OUT LOUD. The BS in the thumbnails, the same hyperbole in EVERY VIDEO!!!
Can't this be achieved with embeddings? You create embeddings for the context and prompt and save the response for them; if a new prompt matches the stored embeddings, you send back the same result. I don't know much about this, feel free to correct me.
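Roughly, yes, but that's *response* caching, which is a different thing from the prefix caching discussed in the video. A minimal sketch of the embeddings idea, using a bag-of-words vector as a stand-in for a real embedding model (the threshold and names are made up for illustration):

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (a real system would use a model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.entries = []          # (embedding, response) pairs
        self.threshold = threshold

    def get(self, prompt):
        """Return a stored response if a cached prompt is similar enough."""
        q = embed(prompt)
        for e, response in self.entries:
            if cosine(q, e) >= self.threshold:
                return response
        return None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("summarize the readme file", "<cached summary>")
print(cache.get("summarize the readme file"))  # hit -> <cached summary>
print(cache.get("delete all my files"))        # miss -> None
```

The catch, as other comments point out, is that identical prompts don't guarantee identical *desired* answers (sampling, time-sensitive data), so this only fits use cases where replaying an old response is acceptable.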
@@lseder1 I don't know how it is where you live, but here in Brazil it's almost impossible to think about local models at scale. A rig to run something with good quality in Portuguese would be so expensive that for the same money I could buy a ridiculous amount of tokens from OpenAI's or Claude's API.
Well, the name OpenAI has been a misnomer for some time. I hope that when they become a for-profit, they change the company name. ClosedAI would be a good start...
@@cbnewham5633 I don't think the name matters that much; people fixate too much on it. Their VISION is what should matter. They aren't sharing much anyway; being called OpenAI, ClosedAI, or SamAI won't change that.
Matt, the caching they are talking about is not something you can do yourself. They are caching the model's intermediate attention state (the KV cache) so it does not have to redo those calculations. They are trading memory for compute (but probably copying to less expensive storage). They are not just caching the text. This can be done on open-source models as well, but the overhead is not always worth it, I suspect.
Exactly. That also means, unless I misunderstood, that modifying any part of the cached data (for example, modifying a specific function buried in a previously cached code repository) requires everything from that point onward to be fed and processed again?
@@rlfr Yes. It is really for cases where you are loading a large code base or documents (text or pictures) and then querying them. Possibly using a very large system prompt.
Two Minute Papers is awesome. I've been watching his channel for years now. Thumbs up for your taste!
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, I'm eagerly awaiting the release of real-time video for Plus users from OpenAI, as it was originally mentioned as part of the ChatGPT Omni update, which sadly never reached us. This feature will be revolutionary, enabling us to tackle a wide range of daily tasks more efficiently. Real-time video integration within ChatGPT would greatly enhance productivity by allowing for interactive, dynamic assistance and more streamlined workflows. It would be especially useful for tasks like desktop sharing; being able to visually assist and collaborate on real-time activities is just phenomenal. I hope this feature rolls out soon, as it could drastically improve how we approach everyday challenges.
I think you're misunderstanding how caching in LLMs works. *They aren't caching responses*. They are caching the attention setup work for the tokens already processed by the model. This is why they ask you to put the static content at the start of the request and the dynamic content at the end, because they can always append more to the end of the prompt.
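That "static content first" point can be sketched like this (the message layout is illustrative, not any specific SDK): keep everything that repeats across requests at the front, so the shared prefix is identical and cacheable:

```python
# Everything that is the same on every request goes first (cacheable prefix).
STATIC_PREFIX = [
    {"role": "system", "content": "You are a code-review assistant."},
    {"role": "user", "content": "<entire codebase pasted here>"},
]

def build_messages(question):
    """Static, cacheable content first; the per-request question last."""
    return STATIC_PREFIX + [{"role": "user", "content": question}]

m1 = build_messages("What does parse() return?")
m2 = build_messages("Is save() thread-safe?")

# Everything before the final message is identical across the two requests,
# so a prefix cache can skip recomputing it the second time.
shared = m1[:-1] == m2[:-1]
print(shared)  # True
```

If the question were placed before the codebase instead, the prompts would diverge at the very first tokens and nothing after that point could be reused.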
Don't bother testing LFMs (at least for now); they suck, it's just hype. Check their benchmarks: everything is 4- or 5-shot to score, and some results are "reported by developers" (quoting their website).
The Voice API pricing is unrealistic. $15+ an hour for audio output? Way too expensive to use it. It’s useful when it’s cheaper than a human, but more expensive than a human? No. I can get similar for $0.05 a minute, like from Azure, etc. Maybe for media production, but too expensive to use in conversational systems.
For the average prompt that has something "cacheable", it could certainly be argued that only 50% of the prompt may be different, or that the re-prompt reuses 50% of the original prompt's neural activations. Bottom line, though, is that they don't have to make anything cheaper for anyone... but they are.
@@KCM25NJL If they didn't have to make it cheaper they wouldn't have. Making it cheaper indicates they feel pressured to compete on price to maintain market share.
@@KCM25NJL They didn't make it cheaper, they found a way to make extra money: they save more, but they don't pass on all the savings. It's like they were selling ice cream for $2 that cost them $1 to buy, but now they're selling it for $1 while it only costs them 10c to buy.
I just asked ChatGPT to list all Roman emperors and sort them by length of reign... It failed, then gave up. Not sure I'd want it controlling my car just yet. The DISCLAIMER and T&Cs for any app built using AI would be huge ;)
I don't know. I think at this point, if they had to regulate AI, the attention should be geared towards alignment and misalignment. AI is a huge black hole; it's "oblivion" and "agnosticism" on the inside. I would suggest that all companies participate on the alignment/misalignment issue and make their findings publicly available. So make only this part "open source". Just a thought.
I wouldn't exactly call that realtime API cheap, isn't that about $24 per hour all in? You could hire someone in a 3rd world country like the UK for that 😄
This is prefix caching, which means the tail end of the prompt varies. Imagine a really long and detailed prompt specifying the schema for the output format: if you need to bring in the same 2-8k tokens each time, you will see some savings. For general caching you're right, and that's something more aligned with the business idea you had a video on (asking someone else to start it)... any service offering a stable interface that could be plugged into apps (so the apps aren't tied to one single vendor) is where caching might be key.
I think anyone working at OpenAI deserves a break from it, I can’t imagine the immense pressure they’ve been under and if they have been making good money then a sabbatical is in order.
Counterpoint: you just cannot prove that the lack of regulation was what made Google, Amazon, and Facebook what they became. If anything, it was all the help the state gave these companies that made them that giant.
I think in the sense of unlimited usage at a fixed price, like cell service. Nothing's free, but it certainly is nicer to have unlimited GB usage at a fixed price than talking to someone long distance for 10 minutes and stressing that the call cost $8.80. Those days sucked.
I think it's over-simplistic to frame regulation as a stifler of growth. Regulation can take many forms, and in many cases is more just there to direct, and avoid worst case scenarios. Lack of regulation can end up being much worse than having it.
They left because they were afraid of the product that they saw being produced... without control and the implications of what that would mean. Fear...
By caching OpenAI means that they keep your prompt in memory. It's not a response cache, it's an input prompt cache. So, if you have a prompt "Do xyz on the text below" you can cache it and follow it up with a normal prompt in the same thread. I think there is a misunderstanding over this in the video and in the comments.
Might internal caching allow the equivalent of false prompt injections? Poisoned cache entries could inject undesired operations into the immediate "memory", which in turn could be incorporated into other users' output.
I do not fully understand the caching thing: 1) LLMs are generative models, so they aren't supposed to necessarily give the exact same answer to the same prompt unless the seed used for generating the output is the same, are they? 2) If the model's answer depends on inputs that change over time, such as news scraped from the net or weather forecasts, you don't want the same question to give you the same output over and over. So this caching strategy can be interesting, but can come with degraded performance depending on your use case, no?
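On point 1: right, sampling is only reproducible when the seed (and temperature) are pinned, as in this toy sketch (the "model" distribution here is made up). Note that the API's cache stores input-side attention state rather than the answer itself, so cached prompts can still produce varying outputs:

```python
import random

VOCAB = ["yes", "no", "maybe"]
WEIGHTS = [0.5, 0.3, 0.2]  # illustrative "model" probabilities, not real ones

def sample_reply(seed=None, n_tokens=5):
    """Sample a toy reply; a fixed seed makes the sampling reproducible."""
    rng = random.Random(seed)
    return [rng.choices(VOCAB, weights=WEIGHTS)[0] for _ in range(n_tokens)]

a = sample_reply(seed=42)
b = sample_reply(seed=42)
c = sample_reply()  # unseeded: may differ from run to run

print(a == b)  # True: same seed reproduces the same sampled reply
```

So even a fully cached prompt prefix is compatible with non-deterministic answers: the cache saves the prefix computation, and sampling happens afterwards.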
We need a "robot tax". If an agent is taking a human job, that agent needs to pay taxes. This is the AI regulation we need. And the tax should be used for funding UBI.
I'm curious how you think we should measure this. Say a new company is started that only uses AI agents for its work. How do you determine if it is replacing human jobs and if so how many?
The entire purpose was predicated on a doing-the-right-thing-for-humanity philosophy; now they're revamping the mission, vision, and strategic imperatives to anchor on a profit imperative.
I disagree: the anonymous Internet and social media should have been regulated much sooner. Social media in particular has had a huge negative effect on society that we may never recover from.
OpenAI has been using caching in GPT-4, GPT-4o, and Bing Chat for a long time. Have you guys not noticed the absolute garbage first responses to common questions, where it ignores what you actually asked and just goes off on a tangent based on some keywords you used?
@@MichaelForbes-d4p there isn’t a day that goes by that I don’t feel excited and grateful to talk about AI. But some days I don’t feel like recording vids about it 😀
I believe the most logical and beneficial way to fund AI development would be through taxes, making it freely accessible to everyone. With well-designed safeguards in place, this would encourage broader use of AI's capabilities, accelerating progress that benefits all of humanity. Additionally, decentralizing AI in this way would promote more equitable access and innovation.
Instead of output caching, we need support for git integration and git patches. I should be able to point the LLM at a git repo, and instead of it generating the full content in its output, or pieces that I have to hand-integrate, it would be better to just produce a git patch that I can apply to my local repo.
A new theory on why so many are leaving could actually be a smart and good thing. They created this algorithm to ensure AGI and, later, superintelligence. But since this requires so much compute and needs to be spread out to benefit the world, they must create hype and attract investors globally. And since they partnered with Microsoft early on, they put themselves in a bind: the agreement with Microsoft was based on the assumption that it would take decades before real, significant progress. With so many in the know about how to build this AI structure, they can branch out to other companies and create more investment worldwide, potentially spreading the mission of prosperity. They have the "commandments" in prompt form and now need to spread them before regulators have a chance to distort or change them. It could actually benefit humanity more than it looks.
The newly released ChatGPT voice mode on phones is just the tip of the iceberg of what could be done with that technology. That said, I just made a video and uploaded it to my YouTube channel that describes what might lie under the water of that iceberg. The video is called "No More Mice. Hey, It's the 21st Century!" and shows how one can use ChatGPT voice mode to rapidly and easily enter graphical data and such!
I already built a translation app for Android that does near-real-time language translation with multi-speaker diarisation (each speaker's voice is stored in a vector DB and given a different output voice). So far I only have English to Japanese and Japanese to English.
They say we are living in the age of the Anthropocene. I hope we are entering the age of the Metacivilization: an age where humanity is not only open, but forms a symbiosis with AI.
They started the company as a nonprofit, with a mission and vision to go with it. Now, by going for-profit, they are betraying the original founding mission and vision.
SignalWire has been offering a real-time voice API to their AI since Oct 2023, and their pricing is pretty close at $0.25/min. I wonder if this application (auto-attendants upgraded with AI) is the market OpenAI had in mind when arriving at their pricing? I'm not sure what AI SignalWire is using, but I will say the voice leaves something to be desired (compared to OpenAI).
Will people finally give up including Sora when they talk about AI video generation? It seems that OpenAI have quietly shelved it. As for the API pricing: that's hideously expensive, not "cheap". I haven't got a problem with what they charge, as I'm sure some companies with deep pockets will have no problem paying those kinds of charges. But to claim that it is cheap is really not correct, not for small developers. Those "pennies" will soon add up to large amounts of money.