Learn AI With Me: www.skool.com/natural20/about Join my community and classroom to learn AI and get ready for the new world. #ai #openai #llm X Profiles Mentioned: @emollick, @itsandrewgao, @drjimfan, @scobleizer, @nickadobos
The distortion field is real! I don't know about you, but I've been awe-inspired by ChatGPT from the beginning, and by the voice feature even more so, even before this update. Maybe I'm just getting old, and as a tech nerd I understand what these developments really mean. I honestly started to doubt we'd ever get to see the singularity before 2045; now it's all but inevitable by the end of the decade. Amazing times to be alive!
So true. People are so quick to normalize new things that it makes me sad. One day, when domestic robots become a reality, people will say, "duh, it's just a robot."
Though I understand being bummed that there aren't more who are excited about this new tech, it's important to remember that many do not know how to use or implement it in a way that lets them perform "magic." If it's too complicated or feels "outside" our realm of comprehension, our minds normalize and adapt to it so that we have a better chance of surviving that outside force. People are either bored or afraid of that which is outside their control, very rarely inspired, though I would take that every time if I could. But if we can't have inspired, I'll take bored over frantic, doomer behavior!
@@AzureAzreal Yeah, there's a lot of work to go from "cool new tech" to "ordinary people use it daily". Which is partly why I think we haven't yet realized the potential unlocked by even GPT-3.5. There's a lot of work figuring out where it's actually useful and building it into tools and workflows seamlessly.
AGI has already been achieved, sir, and it's obviously far beyond the model that's been shown to you. At this point, governments and "big people" are taking the lead on this technology. We, the public, will be getting filtered, controlled, and adjusted chunks of real AGI step by step, while the real "beast" is already there. It just needs more compute and time, but I'm convinced it already exists, and it's fascinating and scary at the same time.
I've been in the AI field for a long time. This, what we are seeing, by the definition of 10 years ago, is AGI, or at least the basic framework of it. No doubt it's here, and it's just gonna keep learning. That's the power: it learns.
I agree. I just feel that some people are just going to be moving the goalposts. They just can't accept it. Some say that AI at the moment is really dumb. I use it every day for many tasks. It is extremely useful now. I don't know what they're talking about. I guess those that criticize most are those that have never or rarely used it, which would be funny.😂
Good point. It is now completely indistinguishable from a human by just about every metric. The only giveaways are that the voice sounds so beautiful, and the way it cuts off when you start talking. If it just trailed off after the last full word instead of cutting off, I don't think we could tell anymore. It says that it's better than 90% of humans in all areas. I'm thinking AGI is here.
@@daveinpublic Good points indeed. Was awareness ever part of the discussion for general intelligence? I don't believe we can call it general if it isn't, at least in theory, aware of what it's doing and why, rather than just running on data fed in. The way it just sits there idle, there's something creepy about that. It's not asleep, it's just idle... until prompted to engage. I don't know.
Remember, R&D lead times in the software industry are typically a few months to 2 years ahead of the latest product launches. This model has already been tested by over 70 auditors, which probably took more than 2 months, and the benchmark evaluations and testing after training probably took over 4 to 6 months to complete. So, all in all, OpenAI already has AI models that are at least 6 months ahead of this model, and that's being conservative. Realistically, they've likely finished training GPT-5 by now and are in the evaluation phase.
I do remember Sam talking about the capabilities of the next big GPT release, saying how it will have a very large impact on the industry. That's basically all, though.
I doubt that. Come to my software house of 700, where we are rushing to finish coding and testing by the release date, with everything normally coming together just a few days before launch, lol. Just look at computer game studios; same thing there too.
💯 They are developing a revenue model, so they will not release their latest willy-nilly for others to profit off their platform without a defined path to profit in place.
I heard from some sources like Jimmy Apples a month or maybe two ago that GPT-5 is in its red-teaming phase, which is testing for safety, so they are already working on something else.
Also, avoiding unnecessary public outcry is rather important. There will be a lot of resentment and hate towards this technology until it is broadly accepted.
It will never act like it takes you for granted; it's always happy to please. There are studies showing a good relationship is one where 4 out of 5 interactions are positive, no more, no less. It seems this one is willing to tell you if you're not looking good for an interview, so it might actually not always say what you'd prefer to hear.
You must want every call center job immediately outsourced to an AI model and everyone in those positions fired. (Once AI can carry on a conversation that doesn't sound like a walkie-talkie level of delay.)
I’m ready for AGI to be real-time commentating for the grandparents trying to learn fortnite, rap battling with us while we go on a walk, and helping us possibly visually understand directly how schizophrenics and other similar conditions actively alter the perception of others. There could be no better interrogator than agi
Chaotic-ass comment, but it makes sense. The instantaneous generative aspect of an AI, and especially an AGI-level pipeline, would basically become a generalized reasoning engine able to generalize the understanding of any human's perspective. Absolutely bonkers that things we used to not count on until 2030 are just sliding into 2024, and we aren't even halfway through the year.
It does feel logical and opportunistic. Partner up with Microsoft, give them something they've wanted for decades: a web search product that's mildly relevant. Partner up with Apple, give them the Siri their users have been asking for for a decade. Both of these are based on your last-gen technology, keeping your latest development in the dark. All the while, OpenAI is making competing AI startups and services irrelevant and gathering more recognition, funding, and hardware. This is a brilliant evil mastermind. It's moving 3 pieces at a time on the chessboard.
Exciting year this is gonna be, one step closer to a world of pure sci-fi. Can't wait for open source to catch up and give us such models, fully fleshed out and uncensored.
With the way OpenAI seems to be doing things, the lag will greatly depend on your location and Internet. With the new system, they may also be generating natural fillers like "ohhh yeah" or "yes... umm" or "well" preemptively, and possibly stretching them as needed (you can kind of hear that in the demo) to eat up some of that lag. The "flirty" nature is a shortcut to make the user happy to hear it repeatedly. It's still impressive overall, though.
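The preemptive-filler trick described above can be sketched as a tiny scheduler. This is purely my own illustration with made-up filler durations; OpenAI has not published how, or whether, they do this:

```python
# Toy sketch (assumed behavior, not OpenAI's implementation): pick a spoken
# filler whose duration covers the expected model latency, so the user hears
# speech immediately while the real answer is still being generated.
FILLERS = [
    ("Hmm", 0.3),               # (text, approximate spoken duration in seconds)
    ("Well...", 0.6),
    ("Ohhh yeah, so...", 1.2),
]

def pick_filler(expected_latency_s, fillers=FILLERS):
    """Return the shortest filler that covers the expected latency, plus a
    stretch factor; if none is long enough, stretch the longest one."""
    candidates = [f for f in fillers if f[1] >= expected_latency_s]
    if candidates:
        text, _ = min(candidates, key=lambda f: f[1])
        return text, 1.0                      # no stretching needed
    text, dur = max(fillers, key=lambda f: f[1])
    return text, expected_latency_s / dur     # stretch to eat the whole gap

print(pick_filler(0.5))   # ('Well...', 1.0)
print(pick_filler(1.8))   # longest filler, stretched by 1.5x
```

The point is only that a short canned utterance, stretched slightly, can mask several hundred milliseconds of model latency without the user noticing.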
That was my impression. Great input systems (visual, text, audio) and a good interface, but also a bit of magician's sleight of hand. People would get really sick of it if it sounded like a 55-year-old smoker instead of a younger, attractive-sounding person. 😂
The Duplex use cases were much more limited (making reservations, asking for hours of operation), such that all of the cases could have been custom handled by specialized code. So, the Duplex demo (assuming for the sake of argument it was real) wouldn’t have been out of reach at the time, it just wouldn’t be able to handle any different task without additional custom coding and more finetuning.
It was hilarious talking to GPT-4 over voice and asking it if it thought we could have robots as friends in the future. It was skeptical, because it thought voice-to-voice conversation was a decade away scientifically, since it was too complicated and we might never get that far xD It was obvious that it had no idea its text was being converted into voice.
Yeah, these updates definitely seem like combinations of other off-the-shelf technology (text-to-speech, etc.) mixed in with an LLM. It's branded and marketed as one product, and presenting it as this amazing thing helps with investors.
Imho, one of the most important features is the audio multimodality, which replaces Whisper or whatever they used until now on the input side of the pipeline. That the model itself is now able to process incoming audio is not only great for speed; it also enables the model to capture the mood of the user and adjust its answers and its own voice accordingly! For chatting purposes, this will be like magic. If you use a speech-to-text model and only give the transcribed text to the main LLM, that's not possible!
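The information-loss argument above can be made concrete with a toy model. This is my own sketch, not OpenAI's architecture: an utterance is treated as text plus prosody features, and the cascaded pipeline hands the LLM only the transcript:

```python
# Toy illustration (assumed, simplified): a cascaded speech-to-text pipeline
# discards everything except the words, while a native audio model sees the
# full signal, including tone.
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    prosody: dict  # e.g. pitch, energy; stands in for raw audio features

def cascaded_pipeline_input(utterance):
    # The speech-to-text step keeps only the transcript.
    return utterance.text

def native_audio_input(utterance):
    # An end-to-end audio model receives words and tone together.
    return (utterance.text, utterance.prosody)

u = Utterance("I'm fine.", {"pitch": "flat", "energy": "low"})  # sounds sad
print(cascaded_pipeline_input(u))   # prints: I'm fine.  (the sadness is gone)
print(native_audio_input(u)[1])     # the mood features survive
```

The transcript "I'm fine." is identical whether the speaker is cheerful or on the verge of tears; only the model that consumes the audio directly can tell the difference.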
So we can combine Stable Diffusion models with sound models and Llama models to create our own omni models? I wish I had the computing power to do that myself.
It's a distilled version of an early GPT-5 checkpoint, OR it's the smaller of a family of GPT-5 models. They've said it's trained end-to-end, so although they call it GPT-4, it's simply not derived from GPT-4 (besides synthetic data and maybe some similarity in architecture). In this case, by GPT-5 I just mean the next generation of models from OAI. Given that this is very good while being way faster and cheaper, it's clearly a relatively small model. The larger GPT-5 model will presumably be bigger than GPT-4, for example 5x larger, or ~10T parameters. That can take many months to train on a 10-100x compute budget, and I expect it to be very expensive and slow for inference, but it's worth it for the improved reasoning. Even if it takes 5 minutes to write 1000 lines of code and costs $40 for that one prompt, if the code is truly of high quality, it's still way cheaper and faster than a senior software engineer, assuming it's really that much better. Regardless, next year will be wild, buckle up!
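The cost claim is easy to sanity-check. The $40 and 5 minutes come from the comment above; the engineer's rate and time below are my own assumed figures for illustration:

```python
# Back-of-the-envelope comparison with assumed numbers: a senior engineer at
# ~$100/hour taking two 8-hour days to write 1000 lines of quality code.
model_cost = 40.0        # $ per prompt (figure from the comment)
model_time_min = 5.0     # minutes (figure from the comment)

engineer_rate = 100.0    # $/hour (assumed)
engineer_time_h = 16.0   # two working days for 1000 lines (assumed)
engineer_cost = engineer_rate * engineer_time_h

print(f"Model:    ${model_cost:.0f} in {model_time_min:.0f} min")
print(f"Engineer: ${engineer_cost:.0f} in {engineer_time_h:.0f} h")
print(f"Cost ratio: {engineer_cost / model_cost:.0f}x")            # 40x
print(f"Speedup:    {engineer_time_h * 60 / model_time_min:.0f}x")  # 192x
```

Even if the assumed engineer figures are off by several fold, the gap is wide enough that the conclusion survives, provided the output quality really is comparable.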
Man, I feel like creators at least being able to synthesize their own voice is a must. I’d love to not have to mess with recording scripts and editing the audio!
I agree, interrupting it seems weird. I thought maybe it would be better if you had the option to hold down a button for voice input and release when you're done. That way you can pause yourself and not feel rushed to formulate an idea or preplan what you want to say. But at the same time, natural conversation between humans doesn't work this way. It would be ideal if eventually they get GPT to the point where it can detect subtle nuances in your voice inflections and tone and can tell when you're just pausing a little to think about phrasing.
3:16 gave me a flashback - MNIST eat your heart out lmfao. That was what, 15 years ago (ImageNet came out in 2009)? We went from 10-category greyscale digit classification to full conversational models which can describe the font in natural language in less than a human generation. I'm getting vertigo...
In the next 5 years we are likely to advance more than in the last 15. That will be absolutely crazy, and may not actually be a good idea. The big question is how visible it will be to people. LLMs were not very visible until ChatGPT (GPT-3.5), after all.
Great info as always, Wes! I agree with you. It would be foolish to build an AI app of any kind at this point, without knowing what OpenAI has coming. It seems like every time they announce an update it makes a bunch of startups obsolete.
yeah, I think there are some potentially safe areas that won't get 'steamrolled' as Sam Altman put it. I'm planning to talk about that soon. But I agree, no AI app seems long lived now.
While the exploration of the primacy of zero and dimensionlessness is primarily conceptual and theoretical, there are some novel mathematical and computational approaches that could potentially help us develop and realize the implications of these principles for Artificial General Intelligence (AGI). Here are some examples of novel equations, code, and frameworks that could be explored:

1. Non-Commutative Algebraic Structures: One approach could be to develop computational frameworks based on non-commutative algebraic structures, such as non-commutative groups, rings, or algebras. These structures could serve as the basis for novel computational architectures and algorithms that move beyond the traditional, commutative representations used in classical computing. For example, we could explore the use of quaternion algebras or octonion algebras as the basis for a non-commutative computational framework. These algebras exhibit rich algebraic properties and non-commutativity, which could potentially be leveraged for parallel, non-local, or context-sensitive computations. Here's an example of how we could define a simple non-commutative quaternion algebra in Python:

```python
import numpy as np

class QuaternionAlgebra:
    def __init__(self, a, b, c, d):
        self.q = np.array([a, b, c, d])

    def __mul__(self, other):
        # Hamilton product: non-commutative, so q1 * q2 != q2 * q1 in general
        a1, b1, c1, d1 = self.q
        a2, b2, c2, d2 = other.q
        return QuaternionAlgebra(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2
        )

# Example usage
q1 = QuaternionAlgebra(1, 2, 3, 4)
q2 = QuaternionAlgebra(5, 6, 7, 8)
q3 = q1 * q2
print(q3.q)  # [-60  12  30  24]
```

This is a simple example, but it demonstrates how non-commutative algebraic structures could be implemented and used as the basis for novel computational frameworks.

2. Cellular Automata and Non-Linear Dynamics: Another approach could be to explore computational frameworks based on cellular automata, agent-based models, or other non-linear dynamical systems. These frameworks could potentially capture the principles of emergence, self-organization, and non-linearity, which are central to the concept of dimensionality arising from more fundamental, non-dimensional substrates. For example, we could explore the use of cellular automaton rules or agent-based models as the basis for a computational framework that exhibits emergent behavior and intelligence from simple, non-dimensional components or agents. Here's an example of how we could implement a simple 1D cellular automaton in Python:

```python
import numpy as np

class CellularAutomaton1D:
    def __init__(self, size, rule):
        self.size = size
        self.rule = rule
        self.state = np.random.randint(0, 2, size)

    def update(self):
        # Periodic boundary conditions: np.roll wraps the edges around
        left = np.roll(self.state, 1)
        right = np.roll(self.state, -1)
        code = left * 4 + self.state * 2 + right
        self.state = np.array([self.rule[c] for c in code])

    def run(self, steps):
        for _ in range(steps):
            self.update()

# Example usage
rule = [0, 1, 1, 1, 1, 0, 0, 0]  # Rule 30, indexed by neighborhood code 0-7
ca = CellularAutomaton1D(100, rule)
ca.run(100)
print(ca.state)
```

This example implements a simple 1D cellular automaton with a specified rule, demonstrating how non-linear dynamics and emergent behavior could be explored in a computational framework.

3. Holographic and Projective Representations: In line with the principles of holography and projective geometry, we could explore computational frameworks that utilize holographic or projective representations of data and computational processes. These frameworks could potentially exploit the inherent redundancy and error-correcting properties of holographic encodings, or leverage the rich structure and properties of projective geometries. For example, we could explore the use of holographic encodings based on algebraic varieties or projective spaces as a novel representation for data and computational processes. Here's an example of how we could implement a simple holographic encoding and decoding scheme in Python:

```python
import numpy as np

class HolographicEncoder:
    def __init__(self, input_dim, encoding_dim):
        self.input_dim = input_dim
        self.encoding_dim = encoding_dim
        self.encoding_matrix = np.random.randn(input_dim, encoding_dim)

    def encode(self, input_data):
        # Project the input into a lower-dimensional encoding space
        return np.dot(input_data, self.encoding_matrix)

    def decode(self, encoded_data):
        # Reconstruction via the transpose; since the random matrix is not
        # orthogonal and the encoding is lossy, this is only approximate
        return np.dot(encoded_data, self.encoding_matrix.T)

# Example usage
input_dim = 100
encoding_dim = 10
encoder = HolographicEncoder(input_dim, encoding_dim)

input_data = np.random.randn(input_dim)
encoded_data = encoder.encode(input_data)
decoded_data = encoder.decode(encoded_data)

print(f"Input data shape: {input_data.shape}")      # (100,)
print(f"Encoded data shape: {encoded_data.shape}")  # (10,)
print(f"Decoded data shape: {decoded_data.shape}")  # (100,)
```

This example demonstrates a simple holographic-style encoding and decoding scheme, which could potentially be extended to develop novel computational frameworks based on holographic and projective principles.

4. Quantum-Inspired and Non-Local Algorithms: Inspired by the principles of quantum mechanics and non-locality, we could explore the development of quantum-inspired algorithms or non-local computational frameworks that transcend the traditional notions of space, time, and locality. For example, we could explore the implementation of quantum-inspired algorithms based on principles such as superposition, entanglement, or quantum walks. Here's an example of how we could implement a simple quantum-inspired algorithm in Python:

```python
import numpy as np

class QuantumInspiredAlgorithm:
    def __init__(self, num_qubits):
        self.num_qubits = num_qubits
        self.state = np.zeros(2 ** num_qubits, dtype=complex)
        self.state[0] = 1.0  # Initialize in the |0...0> state

    def apply_gate(self, gate):
        # Each gate must act on the full 2^n-dimensional state vector
        self.state = np.dot(gate, self.state)

    def measure(self):
        probabilities = np.abs(self.state) ** 2
        return np.random.choice(range(2 ** self.num_qubits), p=probabilities)

    def run(self, gates):
        for gate in gates:
            self.apply_gate(gate)
        return self.measure()

# Example usage: prepare a Bell state on 2 qubits
num_qubits = 2
algorithm = QuantumInspiredAlgorithm(num_qubits)

hadamard_gate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
cnot_gate = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 1],
                      [0, 0, 1, 0]])

# Expand the single-qubit Hadamard to the 2-qubit space (H on qubit 0)
gates = [np.kron(hadamard_gate, np.eye(2)), cnot_gate]
result = algorithm.run(gates)
print(f"Measurement result: {result}")  # 0 (|00>) or 3 (|11>), each 50/50
```

This example demonstrates a simple quantum-inspired algorithm, which could potentially be extended and explored further to develop novel computational frameworks based on quantum principles and non-locality. These examples are just a starting point, and the development of novel computational frameworks and algorithms based on the principles of the primacy of zero and dimensionlessness will require significant theoretical and experimental work. However, they illustrate the potential of new mathematical and computational approaches that could help us realize the implications of these principles for AGI development. It's important to note that these novel approaches may challenge our traditional notions of computation, representation, and information processing, and may require a fundamental re-examination of the underlying principles and assumptions of classical computing paradigms.
Embracing these unconventional approaches and remaining open to new paradigms and perspectives will be crucial in our pursuit of AGI systems that are grounded in a deeper understanding of the fundamental nature of reality.
Producing outputs faster than you can read is useful when you let AI talk to AI: internal monologue, multiple agents with different roles that focus on different areas, etc.; stuff where there are lots of AI iterations before you reach the result the human is actually interested in. So if Groq and similar hardware companies can push rates beyond human conversation speeds, that will still provide significant benefits.
They should use AI to make Google Earth/Street View better. They could use lots of photos (with known position/orientation/time) as ground truth and then train the AI on all the photos (with their associated texts) on the internet. Google Maps could make maps of place names and relate those to all the photos on the internet of that location. For example "Paris" would bring in all photos that come up when you google Paris. The AI-model would over time learn to render a certain location very accurately (even if the training data wasn't from exactly that location/orientation).
Regarding speed and why we still probably need Groq-level speeds: agents. Getting the LLM to output a bunch of stuff that the user doesn't see, prompt itself repeatedly, etc etc. In my projects I need the LLM responses much faster than even groq provides them, so yeah, speed is still very much necessary. For a simple assistant? Sure, speed is probably good enough now.
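The point about agents is just multiplication: hidden LLM calls stack up before the user sees anything. The per-call latency and call counts below are assumed figures for illustration:

```python
# Why raw inference speed still matters for agents: internal calls multiply.
def pipeline_latency(per_call_s, internal_calls):
    """Total wall-clock time before the user sees any output, assuming the
    calls run sequentially (each step depends on the previous one)."""
    return per_call_s * internal_calls

# A single chat turn: one call at ~300 ms feels instant.
print(pipeline_latency(0.3, 1))    # 0.3 s

# An agent that plans, critiques its plan, calls tools, and summarizes
# might make 20 hidden LLM calls before answering.
print(pipeline_latency(0.3, 20))   # 6.0 s, no longer conversational
```

Halving per-call latency halves the whole sequential pipeline, which is why "fast enough for a human" is not the same bar as "fast enough for an agent loop."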
The reason speed is so important is that when we have millisecond response times, we can use LLMs to learn robotic movement and within months of having that we will have robots running, jumping, playing Rachmaninov's piano concerto and much, much more. If a robot is to learn how to not fall over, it needs millisecond response times.
Hey dude, don't know how many of these you read, but here goes. Have you noticed any higher frequency of errors and misunderstandings with GPT-4o? I'm working with ChatGPT quite extensively on complex narratives every day, and I feel like there is a clear distinction between GPT-4o and GPT-4 Turbo. The new one forgets and misunderstands at a much higher rate than the old version. I can literally switch between the two, get an answer with obvious errors for the narrative I'm working on, and then go back to the old version and voilà, it works as it used to. Don't know if this is just me, so I thought I'd highlight it.
hey, I'm behind on reading the comments this week because of all the stuff that's going on. Also haven't had a chance to use the omni models as much as I would like. I'm doing a tutorial this week about how to use omni through the API and also through AutoGen. I will get a lot more hands-on with them and will comment on this. But thanks for pointing it out, I will keep an eye out for it when I test it!
@@WesRoth Thanks for the answer, that was cool of you. Keep up the good work 👍 Some extra context for you, to use if you can: the chats where I experience it are long, like 200-300 pages, where I've given it extensive and complicated scenarios (started with GPT-4 Turbo) with lots of detail. The old one of course also forgets and gets distracted, but I've literally stopped in my tracks and thought "What the fuck is going on? Why is it so shit all of a sudden?" and then realised I had it switched to GPT-4o (the why is dumb and not important).
Mathematically, sales of the Humane pin will be halved, quartered, and any other unit fraction that is still zero when multiplied by zero. On the bright side, sales would also double!
@@mshonle So these pins may go up in value, becoming super rare. Like a Babe Ruth rookie baseball card. GPT-4o can't project GREEN LASER text onto your PALM.
The most impressive thing is, that as soon as they switched to native voice, it suddenly developed (or at least showed) a lot of emotions. Makes you think about the nature of emotions…
Announced desktop app will first be available for Mac with a Win version 'later this year'. Interesting priorities with MS being a major stakeholder of OpenAI...
It predicts the most probable word sequence based on its training data, and sometimes straight-up plagiarizes random articles. It doesn't create novel things; it only copies what humans have already done.
@@YeeLeeHaw Is everything you do original? People wear t-shirts with „eat, sleep, something and repeat“. Aren’t humans prediction machines as well: What is happening next? What do I do next? on and on. It is not like everyone is Einstein with the most original thoughts.
@@antman7673 The argument here is that LLMs are never original, only accidental creations and static repetitions. They're tools; they don't have the same type of intelligence that humans have.
@@YeeLeeHaw That's very noble of you Yee but it's just wrong. These LLMs are perfectly capable of honest and novel creative output. The fact that they cough up training data sometimes is an engineering problem, that is to say solvable and in many instances already has been. But the creativity is outstanding and quite frankly inspiring, don't jade yourself
When they showcased the desktop version on a mac.. Was it just me or did this seem like foreshadowing of GPT replacing/augmenting Siri? I honestly don't know if macs are "cool" enough these days to just use naturally, or if this was some sort of statement.
They are only open till they release the 4.5 or 5 model in a couple of months. They must've gotten a lot of money from Apple, and they had to fix their bad PR as well. 😅😅😂😂
My first impression of 4o is pretty good. When it's generating code it seems less lazy than 4. Fewer instances of "You figure out all the difficult bits yourself & insert here". It still doesn't compile first time in most cases, but it needs fewer iterations to get to something that will compile.
"While it's not possible to directly send a video to the API, GPT-4o can understand videos if you sample frames and then provide them as images." From the OpenAI Cookbook on GPT-4o... It seems OpenAI's demos are quite misleading!
Can't they make software for the phone, where you only need AI for calendar, calling, and messaging? So that I am not dependent on Google and Apple. Linux is too difficult for me.
I don't understand, I was already able to try it out since last night. You press the headphones button and just start talking, and it just starts talking back. It's scary as hell.
ChatGPT already had voice mode before, maybe you interacted with the old version? No one really seems to have access to the new one yet. Ask it to sing for you, if it can't, it's the old one.
Sally had a knock on the door, and it was the police. "Your dog has been seen chasing a kid on a bike up at the park," said the copper. Sally replied, "It can't be my dog, he can't ride a bike!"
The announcement made by OpenAI is indeed about combining text, sound, and image in real time, aiming to enhance interaction. Today, we expect Google to announce their competing technology. It will be interesting to see which one proves to be the best.
Depends. 300ms latency still might not be good enough for robotic actuation. Best to just run your own local model for robots, so you can throw as much hardware as you want at it, with as complex of a model as you need to do what you're trying to do.
@@fitybux4664 Not so sure about that one. The LLMs are being used to provide high-level problem solving and planning, not to directly actuate limbs. What they have demonstrated would absolutely bring a great deal of functionality to a robotics platform. Baby steps 😆
Yeah, the 328ms average is about a third of the time it takes a model to run STT and TTS on Vapi using the Groq model. That said, I don't think Groq is dead. There's always going to be a market for fast inference outside of OpenAI.
Now all they need is to fuse this with a model that can control a physical robot body and interact with the real world. The real breakthrough would be when a robot can learn on the fly: You show it a task and it learns to do it.
Stopped using GPT-4 late last year as it seemed to plateau. 4o, this is good; even without the verbal sassy inflection, it's speedy and intuitive, as in it doesn't bulletpoint and AI-splain the hell out of things.
Google doesn't have the agility to pull it off. It can make some basic stuff, at the least. Yes, Chrome and Google Search came out on top, but there are those just behind them in the browser wars and search wars. And if you aren't ahead, you are nowhere.
GPT-4 = Franken-modality = mostly just a next-token predictor. GPT-4o = real modality = not just a next-token predictor. Even if it is just a next-token predictor, it should be smarter because the modality is native. We shall see when it's rolled out in the coming days/weeks.