
How Well Does Chat GPT Know Commander Cards?  

The Trinket Mage
12K subscribers
4.6K views

#mtg #thetrinketmage #trinketmage
Patreon:
/ thetrinketmage
Sorry about some of the breathing noises; my noise gate seemed not to work when recording this. Let me know what your score was!
Channel Art by Beevuu:
Insta: / beevuu
Twitter: / beevuu
All the Music is by Chillpeach:
/ @chillpeach

Gaming

Published: 31 May 2024

Comments: 61
@admiralatom5990 1 month ago
The biggest takeaway is that ChatGPT doesn't refer to itself in its answers. Some of the people used "I" when they answered.
@thetrinketmage 1 month ago
Didn’t notice that!
@teeeea1741 1 month ago
I also saw ChatGPT using "I".
@devan9197 1 month ago
Honestly, for me it was pretty obvious, and that gave it away a lot.
@natelagrassa9337 1 month ago
Yeah, AIs don’t refer to themselves very often… in formal writing, to drive a point home, you don’t use the word “I.” I picked up on that too, lol.
@mr.whistler6114 1 month ago
Remember: ChatGPT will never think outside of the box. ChatGPT is the box. Edit: In the AI response for Forcefield, ChatGPT talks about it not being "essential for most decks within its colors". Forcefield is colorless, so only an AI would treat it like Black, Blue, Green, Red, or White, as if colorless were one of six equivalent color options for MTG deckbuilding. Furthermore, we can deduce that ChatGPT didn't understand a "colorless card" as a card devoid of colors, but as a sixth color that, under the MTG rules, can be blended into decks of every color, which is why it speaks of "its colors" in the plural.
@Jerma985_fan 1 month ago
Whoa, Demonic Tutor tripped me up. I'm surprised someone gave that a B.
@thetrinketmage 1 month ago
That threw me for a loop as well!
@atticussalmon9064 1 month ago
The AI says "decks within its colors" or some variation of that A LOT, which is kind of a giveaway.
@trevordumais2117 1 month ago
Bro rated Mechanized Production a D and forgot that Treasures exist. It all goes back to Smothering Tithe.
@thetrinketmage 1 month ago
Smothering Tithe really was the hero of this story.
@DMZZ_DZDM 1 month ago
ChatGPT will use "creative" language to fill up space and will always expand on surface-level issues while only brushing on the more nuanced details that affect the broader game. Also, it's trained mostly on business emails, pamphlets, and guidebooks, so its responses have an inherently sanitized vibe (unless it's asked to use a different tone).
@Mwarrior1991 1 month ago
Without fail, ChatGPT would repeat itself: "Demonic Tutor is an incredibly powerful card allowing you to search your library for any card"... "its ability to fetch any card greatly increases consistency..." Redundant information each time.
@solarupdraft 1 month ago
The Assassin's Trophy one was interesting, because it makes you consider who would be more likely to make that mistake in their write-up. It's also inconsistent, saying "any nonland permanent" in one line and "any permanent" in a later one. For me, the Mechanized Production one came down to "which author is likely to go off on a non-Magic tangent?" Also, the final sentence of the right-hand text seems to contradict the entire message preceding it, depending on how you define something "being a riot."
@thetrinketmage 1 month ago
Yeah, those are the AI hallucinations that cause it to be wrong.
@Xhosant 1 day ago
The IKEA giveaway was that it started its (surprisingly poetic) metaphor along the lines of 'it's a doomed project', and then twisted to 'and when it works it's neat'. That sudden context switch was suspicious. Speaking of context, overall ChatGPT will provide too much of it explicitly, whereas human answers use it implicitly and with less regard for whether you have it. From needless clarifications to tying back to the assignment's phrasing, that was a pattern for ChatGPT, feeling like a grade-school essay: answering the way it expected you wanted it to answer. In contrast, the humans would often use subtler slang or context cues.
@KirioGameNote 1 month ago
I really need to hear more of the patron’s thoughts on giving Demonic Tutor a B.
@thetrinketmage 1 month ago
I made it anonymous, so unless they tell me, I won’t know more either.
@DMZZ_DZDM 1 month ago
I would have given it an A, but yeah, it isn't an S IMO.
@teeeea1741 1 month ago
I noticed that ChatGPT has a habit of reiterating the prompt. It always talks like it's checking items off a checklist.
@thetrinketmage 1 month ago
I feel like that’s how a lot of AI work looks.
@cinderheart2720 1 month ago
I swear they didn't use to, and now they always do it, in any context. It's very frustrating.
@violetto3219 7 days ago
It's got the vibe of trying to fill space in a high school writing assignment you reeeeally don't want to do.
@aleksihakli1125 1 month ago
"ChatGPT doesn't care about budget." You're telling me. I asked for some recommendations for my Food token life-drain deck. It recommended such affordable cards as Anointed Procession (60€), Doubling Season (36-40ish €), Teferi's Protection (48€), Parallel Lives (33€), Exquisite Blood (23€), and many, many more cards well over my budget. I think the cheapest card it recommended was Beast Whisperer, which I already HAVE IN MY DECK.
@XiaosChannel 1 month ago
16:09 That's why you either use the API or always start a new conversation per case.
@thetrinketmage 1 month ago
Yeah, I didn’t know it was gonna do that. Made for a funny bit, though.
@drunkcapybara7004 1 month ago
Dang, I actually got Mechanized Production wrong as well. What threw me off was the mention of wasting 2-3 slots and getting "the combo", since there was no prior mention of which other slots are wasted or for what combo, and these inconsistencies are a big problem with AI. I should have focused more on the same problem in the other text: the card being able to be "a riot" contradicts the D rating.
@thetrinketmage 1 month ago
Yeah, it was such a weird response for that card.
@Ent229 1 month ago
Commenting before watching: I predict the LLM will have good syntax in its responses but will fail some of the semantics. Likewise, I expect its fake "reasoning" to be heavily biased towards generalities and other common responses. I predict MtG patrons will understand the semantics. I also expect those patrons to be capable of novel reasoning, but likely to give general answers (common responses are common for a reason). As for the ranking, I would expect the LLM's answers to have a higher mode and the patrons' answers to have a broader spread.
@thetrinketmage 1 month ago
Novel reasoning ends up being a huge giveaway! I think you are spot on
@Ent229 1 month ago
While watching (my guesses of the identities, and tracking the rating scores). My guess for the AI is in brackets; the actual AI is in parentheses.
1. [(A)] or C. Initially guessed based on accuracy. Doubled down based on the generic AI answer vs. the novel Patron answer.
2. [(C)] or C. Again, novel responses help identify the Patron.
3. [(A)] or S. One answer repeated itself in a redundantly redundant explanation. Huh, the AI downgraded it to "nonland permanent." Was that due to generalizing answers or due to not understanding the semantics of the card? Both are factors, but I wonder which was the bigger cause.
4. D or [(C)]. Initially guessed based on accuracy. Further confirmed by the novel Patron answer (silver-bullet draft design). Even further confirmed by the LLM having no context for the lack of horsemanship.
5. [(S)] or B. Initially guessed based on accuracy (unless the patron is trolling, or arguing that it is too powerful to fit in many Commander decks without moving the deck away from the desired power level). Wow, the reasoning is making me reconsider. The S ranking said "any deck within its (Demonic Tutor's) colors" (plural). Why the implication of plural? There is also more redundancy in the S's reasoning. I am changing my mind.
6. [(A)] or B. The LLM likes listing literally the same logic repeatedly. The Patron response was more novel.
7. D or [(D)]. This one is tough. The left was more novel. Wow. I expected something like 55/45 odds there. Let's go!
8. [(S)] or B. I initially guessed based on accuracy, but the B has the novel response, so it must be the Patron. An LLM wouldn't do that. And once again the LLM uses "decks within its colors" when talking about a mono-white card. Why the plural? Also, the card needs to fit within the deck's colors, not the deck within the card's colors.
9. [(A)] or A. "Decks focused on defending against large attacks"? Also, the Patron is once again the more novel answer.
10. S or [(S)]. Redundant LLM response is redundant.
@Ent229 1 month ago
After the 10 scores: Patron scores: SSABBBCCDD (5 different ranks; somewhat biased towards B but really spread out otherwise). LLM's scores: SSSAAAACCD (4 different ranks; high bias towards S or A). Since my 10/10 accuracy was based on my reasoning about the LLM's limitations, I think it is soft evidence that my predictions about its limitations might be accurate.
@Ent229 1 month ago
Bonus round? 1. [(B)] or C. The C had a novel response. Final thoughts: We already know ChatGPT does not try to evaluate cards, so it is not suited to evaluating cards (don't use a saw for a hammer's job). Beyond its lack of motivation to judge cards, it does not understand the cards or their context well enough to judge them. Additionally, we see its general answers as a clear marker of the LLM answer. It is trained to give a "reply-like" response that was a likely reply rather than a reply that was likely to be correct. Specificity and nuance are things it is trained to avoid.
@Ent229 1 month ago
Your patrons' evaluations seem within the norm for Commander players. They can mostly evaluate cards, and there is enough subjectivity that the "surprising" evaluations still have merit.
@leax1337 1 month ago
I recently built a deck with ChatGPT as well. The cards were so random that I had to put it into a power-level calculator, because I didn’t understand the deck myself, and it put out a 10 for some reason. ChatGPT always tried to put Rhystic Study in the deck xD (it was green-black).
@thetrinketmage 1 month ago
That’s funny, maybe I’ll need to try that too.
@nahboh1897 1 month ago
I agree with the Demonic Tutor rating, but not because it's a waste of space: because it's a tutor, it makes the deck too consistent, so the deck does the same thing every time, which makes it less fun to play against.
@drunkcapybara7004 1 month ago
Valid point, especially in casual settings and for decks with a very clear and not very varied game plan. For example, my Kathril deck only really wants to fill the graveyard with keywords, and I took Entomb out of it because I would always tutor up Zetalpa, which made the deck play very monotonously (amplified by how terrible the precon is at filling its graveyard, so I took a lot of mulligans, but Entomb was of course always keepable). Now that I'm replacing a ton of cards soon, I think I might also cut Vile Entomber and Buried Alive and rely exclusively on what I happen to mill or sacrifice.
@FranciscoJG 1 month ago
Oooohh, surprise Snail participation :D
@AutumnReel4444 1 month ago
Yeahhh very not hard to guess. AI ain't killin us yet
@SwedeRacerDC 1 month ago
Lord of Extinction: I was right from the grade alone.
Lightning Bolt: They had the same grade, so I guessed correctly based on the description.
Assassin's Trophy: I wasn't sure about the grade, because I don't typically use it in 5C decks, but the description was obvious to me.
Taoist Mystic: Obvious from the grading.
Demonic Tutor: I honestly don't love using tutors that much, but I was wrong on this one. I think it's an A, right in the middle.
Panharmonicon: I'm correct... ChatGPT is just stupid at this point. Lol
Mechanized Production: Same grade, so I had to guess based on the description. Both descriptions were wild... but I was right. I think it's a C. It's fun and can win on the spot, especially now that we have Obeka, but even with extra turns.
Smothering Tithe: I needed the description on this one, but got it right. I still think it deserves a better grade than the human gave it.
Ink Shield: I lost to this card. It's great. You will likely win if everyone else has been eliminated. I was right from the description.
Tropical Island: The description helped. Right again.
Forcefield: I was right, and that's an interesting card. Of course it's on the Reserved List.
ChatGPT is fairly easy to suss out, but it's still interesting to see.
@Demoncoregobrrr 1 month ago
Rad, got recommended your work early.
@anabsurdlylongnameme8948 1 month ago
What version of ChatGPT did y'all use? 3.5 is terrible; 4 is great but behind a paywall. If y'all used 4, did you put any additional reference info in?
@l1ghr 1 month ago
10:40 Interesting option.
@v3rsatile_V3 1 month ago
TBH, instead of running Demonic Tutor, you should run it until you play it, then whatever you search for, just get another version of that effect: if you search for a board wipe, put another one in the deck. Simple, really.
@thetrinketmage 1 month ago
I do like this idea, though I think the flexibility of a tutor makes it worth it!
@Jacob-km4yb 1 month ago
What he said: the flexibility to get, let's say, a board wipe OR a single-target removal spell (because you have a big board presence) makes it way better IMO.
@v3rsatile_V3 1 month ago
@Jacob-km4yb … I know
@BS-bv5sh 1 month ago
I enjoy your content.
@thetrinketmage 1 month ago
I’m glad! I know this one is a bit different, so I’m happy you like it.
@CD-sl7ld 1 month ago
I love you
@epi1763 1 month ago
Next time, ask ChatGPT to write like a normal person or dumb it down, and feed it other people's reviews so it writes in a similar context.
@robertomacetti7069 1 month ago
To be fair to ChatGPT, it has never played Commander; freaking out over Lord of Extinction is a classic noob mistake.
@orobors 1 month ago
Personally, I think Demonic Tutor is a B or even C in most casual metas. If I were to pull out a Demonic Tutor, I'd probably get focused on because my playgroup doesn't run $50 cards unless we're proxying high power or cEDH. In a lot of games, Demonic Tutor is just too focused/good to be worth slotting in, since it gets people to target you.
@thetrinketmage 1 month ago
Interesting. I often think of it as a charm effect. I don’t know if I’ve ever been explicitly targeted because of it.
@hoffedemann5370 1 month ago
"highly desirable," "in its colors," "extremely valuable," "versatility," and "particularly those in XYZ strategies" are dead giveaways. Also, AI do be yappin' with way too eloquent words all the time.
@Raghetiel 10 days ago
What's funny is that ChatGPT learned to talk about MtG from real people's chats. So if you're gonna blame anyone, blame Reddit :)
@ellie6091 1 month ago
Boooooo. AI is dumb, and you shouldn't be feeding it more data.
@thetrinketmage 1 month ago
Me making articles or videos feeds it data, not me asking questions. Just asking questions isn’t really training it.
@Ent229 1 month ago
I would not be surprised if the questions are saved as more raw data to feed it later.