Assuming that a language model like ChatGPT cannot mess up math is the first mistake. It gets mathematically trivial things completely wrong. 😂
@@SaintSaint No, it doesn't even have to be based on bad training data... ChatGPT essentially has no idea what it is calculating and just goes "looks mathy to me", and that would probably get past 90% of average users who are not engineers. The average person I know outside my engineering background would also go "yup, that's math! I've got no clue what these numbers mean and I don't care."
I've found (at least with ChatGPT 3) that if you provide the AI with a "character" to play at the beginning of a session, it can have a real impact on its performance. In this case, I'd have probably started with something like, "In this session you're to play the role of a highly skilled penetration tester, and we'll be attempting a capture-the-flag challenge." Telling it what it's supposed to be can sometimes have an astonishing effect on the results.
This is basically my experience with ChatGPT on anything technical. It's hard to tell if it's just making stuff up more and more, or if it might actually be slowly getting closer to the right answer, but it never gets it right the first time.
@@R.Akerblad I was gonna say it. This really shows how ChatGPT is prone to maneuvering itself into bullshit territory. And the further it gets, the less likely it is to back up completely and start from scratch (except when the human operator nudges it in the right direction, like John did).
So true! I was debugging an exercise program from a course (the code was written by me, but I made a mistake and was trying to pinpoint it exactly with GPT). I sent the source code, told it the wrong output and my expected output, and it went on with a rambling of words that kind of made sense and gave a clear solution. Just that... well, when you understand the context of the code, you'd be like "hell fucking no, this is NOT the issue," yet it answered with such confidence that it had pinpointed the problem and fixed it, when it was nowhere near close. The solution it provided would in fact lead to a segfault and corruption LOL.
It can get the right answer the first time, but very often with outdated solutions. I still think it's great for giving you an idea of how to solve something, as long as you don't take the code it provides and simply take the overall logic of the solution.
@@GOTHICforLIFE1 There's a workaround for this. Usually it will tell you that its training finishes in 2021 (or somewhere around then, I don't remember exactly) or give you answers outdated as of that date. However, with the right inputs and ways of managing it, you can supposedly make it access the internet through the system that hosts it (even though it's not supposed to have internet access) and thus give you correct answers or information from after the mentioned date :p
I know this is old, but if ChatGPT ever truncates code or stops mid-response because the streaming API stopped, all you need to do is provide an ellipsis "…" and it will continue where it stopped. Almost like it knows it does this regularly and knows it messed up.
Did you notify ChatGPT that you found the flag? I became a bit tired of the apology thing and wrote that it was not necessary. After repeating a few times that it was really not necessary, there were no more apologies. At one point I was given a few suggestions, I chose one, and then later in the chat GPT asked why I had made that choice (!). I explained my reasoning, and GPT responded gratefully for my explanation and indicated that it understood, within the context we were talking about, why I had made that choice. There is a clear desire to help in the program, but I found the feedback question very surprising.
I was a college CS instructor for a few years. ChatGPT reminds me of an overeager and knowledgeable student I had who kept trying to convince people he knew more than he actually did. I wish ChatGPT would include a confidence rating with its responses, or work that confidence wording into them. So instead of saying "Apologies, the answer is X. X should work." it would say "Whoops, idk. Uhm.. try X please? I'm a little unsure about when you said ______." Maybe that will be ChatGPT 5.
I don't see how it can have meaningful confidence ratings for things like this. Remember at a fundamental level it isn't actually trying to answer your question, it's using probability to select the best next word, aka token, in the sequence. It doesn't have a sense of the response as a whole in the same way a human would - or at least that's my probably flawed understanding of LLMs..
@@SaintSaint Its confidence is not confidence; it's a large language model's expectation of the mathematically plausible answer to your string, not an actual confidence rating. Remember there is no console or API where we can actually see anything other than the BS output of a large language model.
Sometimes once GPT-4 gets lost it actually works better to start a new chat and rephrase the question. It seems to get stuck on previous responses earlier in the chat which throws it in the wrong direction continuously. I think it is an amazing tool, and it has saved me so much time scripting things that i wouldn't normally be attempting to script. It definitely has its uses but it obviously can't solve everyone's problems and it does have its failures.
It's interesting how so many commenters have a variant of "Haha, it took a while to solve it!" or "Look at all its failed experiments trying to solve it! HAHAHA", yet we humans do the exact same thing. Plenty of folks will struggle and experiment trying to solve this challenge. The AI is very human-like in this way, and in the end it succeeded, without even taking that long. Other humans will "hallucinate" and give you bad ideas and advice all the time -- the utility of ChatGPT is not in it being "perfect" (impossible, because knowledge is neither perfect nor complete, nor absolute), but in being as good as or better than the average person you'd ask in the field for ideas. Just treat it like another human (not because it is, but because that is how it acts), and none of this becomes mysterious or strange, and vet its info the same way you would any random non-SME-in-your-exact-problem coworker.
The problem right now is that it is the "average person". Try asking a random guy on the street a coding question and you will get a solution that is much worse, or usually none at all. The "I have no idea" response was highly penalized during training, so it usually comes up with something, and due to the way it works, correct solutions have a slightly better chance of being produced than nonsense.
I do find ChatGPT very useful for long, tedious and mindless tasks - such as generating a struct from an input data type, writing markup based on a spec, scaffolding unit tests, etc. You do always need to read over what it produces, as sometimes it just confidently spews a bunch of incorrect outputs, but when used as a tool (rather than a full replacement) it can be a great time saver, letting us focus more on the meat of tasks.
It's the same for me with Copilot. I feel like by using it I am just making different mistakes; in the end I'm still putting in the same time debugging, I just have different bugs. But as you say, when used as a tool it can give you a "different look" at the same problem, which can really help you figure out the solution.
Almost as if it's what it's supposed to be. It's so refreshing seeing people recognize this is not something super revolutionary yet, and also not fanboying all over it. Thanks for making my day this much better LOL
The fact that you are posting this comment shows that you don't understand the potential our current models might have once we solve the problem of them being unable to plan ahead. Maybe it will take 6 months, maybe 1 year, maybe 5 years, but at some point, not very far in the future, they might be able to reach human level in a variety of complex tasks.
Eagerly awaiting an in-depth explanation of LLMs and their trajectory from either of you. (Bonus points if you get chatGPT to generate your response). 😄
@@j4yd34d5 no reason to explain. Transformers in their current form definitely aren't human level and you know about it. As I said however, once the problem of them being unable to plan ahead is solved their level of thinking when not considering ideas very abstract to them such as space and sound, should be around human level.
@@keypey8256 there is a reason, credibility. When you begin the conversation asserting that I don't understand while being unwilling or unable to demonstrate any concrete knowledge of the topic yourself, it makes your comprehension seem dubious at best. Secondly the ability to plan ahead is such a minuscule amount of what makes humans intelligent that it's laughable to use as the benchmark for an AI capable of automating software development (even rodents can plan ahead).
Next time it would be interesting to try it with something like AgentGPT (which basically takes on the role of the prompter too). And even if it only worked on the 132nd attempt, that is still quite impressive. Compared to a skilled human that is weak, I know, but it's skilled enough that running 200 instances at once will definitely solve the problem. Sorry for the typos, I'm at war with my autocorrect xd
Sir, I'm very eager to know when the videos for picoCTF will be released. I'm so eager because it was the first ever CTF I participated in, just by learning from you and LiveOverflow, and I want to learn whatever I couldn't solve. I hope it will be released soon 😊
I would not trust ChatGPT to properly recall the contents of the bitshift lines. When working with it, I noticed that it tends to be particularly bad at repeating stuff you give it down to the last character, especially if it contains a lot of numbers. For ChatGPT, numbers are just gibberish, so it does not care if it gives you different gibberish. It would be interesting to see if ChatGPT actually tried to decode the correct data or if it hallucinated random data and that got you the incorrect output. Maybe the Python script would actually work if the data were actually the data from the decompile.
No offense but this was less of a gpt issue and more of a user issue. You have to guide gpt like a student, and know when to take a step back to help it look at the bigger picture.
It works better when (a) you have better knowledge of the material, as you stated, and (b) you give it the full error output, which really helps, so using jGRASP or PyCharm is better in this case than the terminal.
The errors have to be helpful too. I was trying to debug an unhealthy Elasticsearch agent, which has notoriously bad error messages, and ChatGPT was entirely useless. Running in circles picking different problems that weren't really problems. ChatGPT is good at fluffing up emails and essays and that's all. It is worthless as a debugger or programmer.
@@akamemurasame4527 If you learn how to prompt it correctly and learn its use cases, you save a lot more time in the long run. Copilot is useful for single-line autocompletion or boilerplating. Genie, which runs on the GPT-4 API, is already significantly better too. I use it to auto-comment my code, or just ask it in a prompt "this code is doing x instead of y, do you have any ideas what might be going on?" and it's saved me a lot of searching (not always, but when it did I was damn glad I had it). It's still not great at writing code beyond simpler functions on its own, but your expectations were a bit skewed if that's what you expected from this.
Thank you for making this video, it showcases some of the downsides of AI that I've been trying to convince a handful of my co-workers of for a while now lol.
I tried using Chat GPT before to write a certain code with the blind trust method as well, but gave up after an hour or two. It just didn't seem to work out without the hand-holding.
Yeah, there's a sweet spot. If you ask it something that's basic enough that there's a lot of misinformation about it, it will lie. If you ask something exceptionally cutting-edge and complex, it starts hallucinating crazy things. You have to ask it things which currently have a rich, non-shaming community, like "I want to GPG-sign my IoT communications. How do you think I could do that?"
For an AI class I took, the teacher generated the midterm questions using ChatGPT. One of the questions was making a 3x3 array where each node was equal to the arcsine of its i + j. Arcsine is only defined between -1 and 1. ChatGPT can get math wrong; in fact, when I fed the question it had generated back to it, it provided an incorrect answer. Only by handholding it could I make it return 0 when i + j was not in arcsine's range.
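To make the exercise concrete, here is a minimal Python sketch of what the corrected answer looks like. This is my own reconstruction, not the actual midterm code: the function name, the (i, j) indexing convention, and the 0 fallback for out-of-range inputs are assumptions based on the comment above.

import math

def build_arcsin_grid(size=3):
    # Build a size x size grid where each entry is arcsin(i + j).
    # arcsin is only defined on [-1, 1], so fall back to 0 whenever
    # i + j lies outside that range (the handholding fix described above).
    grid = []
    for i in range(size):
        row = []
        for j in range(size):
            x = i + j
            row.append(math.asin(x) if -1 <= x <= 1 else 0)
        grid.append(row)
    return grid

print(build_arcsin_grid())
# [[0.0, 1.5707963267948966, 0], [1.5707963267948966, 0, 0], [0, 0, 0]]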
As soon as ChatGPT makes a scripting error, it will spiral out of control and most of the times not return to anything sane. It's an awesome tool, when it works.
Honestly, I found that ChatGPT works best if you put in most of the effort, and only call it up for assistance when you think you've hit a wall or can't figure out what's going wrong.
The weakness is when you have no idea how to validate its output. Use it for "toil" - mindless, tedious tasks you can do yourself but simply waste your time. Then check its work. Meanwhile, do the hard/interesting work yourself.
As a large language model, I physically cannot get anything correct that requires thought and knowledge; I can only produce mathematically plausible pantomime.
When ChatGPT hits the output limit, simply type "continue". You will not be replaced by AI... you will be replaced by someone that is qualified and using AI.
I've had similar experiences: it provides useful hints on how to solve problems, but copying and pasting usually takes you down an AI-hallucinated rabbit hole you do not want to go through. I also feel like it robs you of knowledge; because you aren't working through the problem yourself, you will have limited understanding, which then means a harder time debugging.
As a python programmer I feel GPT's pain on this one.... I too have taken way too long trying to rewrite something in Python just to end up going back and doing it the right way when things end up hacky...
John, your prompts were far too vague. There is a bit of an art to crafting effective prompts that give the LLM enough so that it doesn't need to lie. It BS's a LOT! Like Marketing Department level.
Meanwhile, on the Tim Pool podcast, they're convinced AI is years away from taking over... The funny part, though, is that the models with access to the internet will absorb this fearmongering the more it crops up online, creating a feedback loop. Humans do be humaning.
People: "AI will replace programmers." Me, a programmer trying to use AI for something too much out of his field of expertise: "No it won't, if you don't know what you are doing at all, you can't correct its mistakes and will spiral out of control into idiotic 'solutions'. If you have a some idea of what you are doing, then it might help you as an assistant. If you already know what you are doing, then it's probably faster to code it yourself."
Keep throwing things at the wall until something sticks; that's spaghetti. I've been using ChatGPT for writer's block. We've been discussing how AI could be harmful to society. It's still anticlimactic. A security patch to AI with an exploit that has a deadman switch installed - but a patch. AI needs to be used irresponsibly. Something held as perfect eventually erroring is more likely than a super killer Skynet on a leash escaping. National security is pretty important, I'm sure, but it would be nice to see AI in real time. That's how CTF is supposed to go, I thought. Cheers
That went about as well as I expected from my own experience. Some code it generates just fine if you're using a version released prior to 2021, which is where the data for most topics cuts off. However, if you're asking it for something that isn't in its training data, then it just makes random stuff up without telling you, even if you ask it to say whether it made the thing up or not. The guesses are based on assumptions from reading the info it has and inferring how the rest should work, and these guesses are rarely accurate. You may get it to solve an older CTF challenge that uses knowledge from before its cutoff point, but even then you're taking risks if you're not able to verify whether what it's saying is correct. As you encountered, when it's making stuff up you will run in circles, because when it runs out of reasons why the code could fail, it just restarts with the first thing it thought was the issue and regenerates the same or similar code. So it's fair to assume, when you see it repeat the same thing without any indication that it's actually time for that step again, that it's just making it up and you won't get anything useful out of it. Shame to see that this is still the case with GPT-4, which you used, because I'm on GPT-3 (the free version) which is even more prone to this kind of behaviour.
I am curious, have you noticed ChatGPT actively trying to hack your system while interacting with it? I run a few different scripts to notify me of anything screwy happening on my rig. ChatGPT sets off every alarm I have. I am curious as to why I have not heard of anyone else having this experience? If anyone has had a similar experience please let me know.
It's quite funny how it decided to just keep trying in Python despite quickly realising that the problem was the difference between languages. I wonder why that is. It may be biased towards Python, maybe because of training data?
Isn't it possible for hackers to train GPT with malicious code, so that when you just copy and paste, it could infect whoever clicked that link? I had some code pop up on GPT with Ukrainian letters in a link; I didn't use it because I didn't understand it lol
I expect the tools will improve a fair bit over the next few years, but after using it for a few months I'm convinced it's only really good for quickly finding things that might otherwise take a bit of googling. It completely breaks down in these scenarios where you're trying to have it help solve an actual problem, even if the problem is extremely well defined. And it gets worse the longer the conversation goes, like it's polluted its own context with some misunderstandings and can't escape them. The people saying the singularity is around the corner or that AI is going to be completely replacing jobs anytime soon are either delusional or trying to fund their AI startup; probably both.
The key is using it to help you break a task down into small parts using generalized problem solving (guides to which can be found by googling), and then using it to generate solutions for those small parts, which should be broken down enough to be solved by known algorithms (which again can be found by googling). You as the human are there to extract the abstractions, provide the glue that holds the code snippets it created together, and fix the small misalignments that happen during the abstraction phase.
@@TheScarvig Is that not literally what we just watched John do in this very video? He literally asked it to fix very small portions of the larger problem that have extremely clear bounds, and it struggled in ridiculous ways. I also appreciate that you would assume I have not been attempting the same in my own usage; you think I'm just going 'hey chatgpt/copilot, solve this' and then shitposting here when it fails? Like jfc, ask it to write quicksort, literally only quicksort, and there is a high chance both ChatGPT and Copilot will spit out a version that not only doesn't run in place but makes like 5 copies of the same arrays in each recursive call. What problem is smaller or more generalized than fucking quicksort? Even still, I bet it could write a better comment than you just did, holy fucking shit.
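For reference, here is a minimal sketch of what "runs in place" means in that complaint: a standard Lomuto-partition quicksort in Python (my own illustration, not code from the video or from ChatGPT/Copilot) that swaps elements within the original list instead of allocating new arrays on every recursive call.

def quicksort(a, lo=0, hi=None):
    # In-place quicksort using the Lomuto partition scheme:
    # elements are swapped inside the original list, no copies are made.
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)

nums = [5, 2, 9, 1, 5, 6]
quicksort(nums)
print(nums)  # [1, 2, 5, 5, 6, 9]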
ChatGPT can't do math... it's a language model that just tries to figure out what the next word should be based on its neural network. The fact that it ever gets it right is insane.
I love ChatGPT, but I can't help but be amused when it gets things wrong. That's why, for at least the foreseeable future, it won't be stealing our jobs. However, when I corrected its error one night, it almost seemed defensive - basically telling me that I phrased the question improperly. Ok ChatGPT, that's taking AI in a whole different direction. :D
If you get stuck in a feedback loop like this, try creating a new conversation! Give it any useful information it figured out along the way to help it get started, and prompt it some more.
As my professor and head of the research institute I work for constantly reminds everyone: ChatGPT has nothing to do with intelligence, it is pure statistics, and anyone who has ever even remotely looked at that field knows the results are, let's say, "fluid".
It certainly seems as if ChatGPT has issues with anything in-depth, coding included. The moment you ask it something that requires an in-depth understanding of any subject it just starts immediately hallucinating.
Depending on when this was recorded, I'm surprised it even decided to help you. Nowadays, whenever I try to get it to help me with cybersecurity and ethical hacking stuff, it informs me that it is unethical and inappropriate for it to help me, as a language model. When ChatGPT was first starting to get popular, I did not have this experience at all, and it helped me do some downright nefarious ethical hacking activities. Not sure what changed and when, but they've definitely cracked down on what it will help you with.
It's kinda sad to see that human skills will be less and less needed in the future... I don't understand why some people are happy about this. (Is "GPT kiddies" a new term?)
Just a reminder that chatGPT is just hitting the middle predictive text button over and over. It doesn't actually know how to code, it just knows how to write things that other coders have written.