How Well Does Chat GPT Know Commander Cards?

Подписаться 12 тыс.

Просмотров 4,6 тыс.

50% 1

#mtg #thetrinketmage #trinketmage
Patreon:
/ thetrinketmage
Sorry about some of the breathing noises my noise gate seemed to not work when recording this. Let me know what your score was!
Channel Art by Beevuu:
Insta: / beevuu
Twitter: / beevuu
All the Music is by Chillpeach:
/ @chillpeach

Игры

Опубликовано:

31 май 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 61

@admiralatom5990 Месяц назад

The biggest take away is that CHATGPT doesn't refer to itself in its answers. Some of the people used "I" when they answered.

@thetrinketmage Месяц назад

Didn’t notice that!

@teeeea1741 Месяц назад

I also saw chatgpt using I.

@devan9197 Месяц назад

Honestly for me it was pretty obvious and that gave it away a lot

@natelagrassa9337 Месяц назад

Yeah AI don’t refer to themselves very often… in formal writing to drive a point home you don’t use the word “I.” I picked up on that too, lol.

@mr.whistler6114 Месяц назад

Remember : ChatGPT will never think outside of the box. ChatGPT is the box. Edit : In the AI response for Forcefield, ChatGPT talks about it not being ''essential for most decks withing its colors''. Forcefield is colorless, so only an AI would think of it the same way as Black/Blue/Green/Red or White because those are the six options plausible for MTG deckbuilding. Furthermore, we can deduce that ChatGPT didn't understood a ''colorless card'' as the idea of a card devoid of colors, but as a sixth color that, in the MTG rules, can be blend in every color types of decks, thus why it speaks of ''its colors'' in plural.

@Jerma985_fan Месяц назад

woah demonic tutor trip me up I'm surprised someone gave that a B.

@thetrinketmage Месяц назад

That threw me for a loop as well!

@atticussalmon9064 Месяц назад

The AI says "decks within it's colors" or some variation of that A LOT, kind of a giveaway

@trevordumais2117 Месяц назад

Bros rating Mechanized Production as a D forget that treasures exist. It all goes back to Smothering Tithe.

@thetrinketmage Месяц назад

Smothering tithe really was the hero of this story

@DMZZ_DZDM Месяц назад

ChatGPT will use "creative" language to fill up space and will always expand on surface level issues while only brushing on more nuanced details that affect the broader game. Also, its trained mostly on business emails, pamphlets and guidebooks so it has an inherently sanitized vibe to its responses (unless asked to use a different tone)

@Mwarrior1991 Месяц назад

without fail, chat gpt would repeat itself "demonic tutor is an incredibly powerful card allowing you to search your library for any card"... "its ability to fetch any card greatly increases consistency..." redundant information each time.

@solarupdraft Месяц назад

The Assassin's Trophy one was interesting, because it makes you consider who would be more likely to make that mistake in their writeup. It's also inconsistent, saying "any nonland permanent" in one line and "any permanent" in a later one. For me the Mechanized Production one came down to "which author is likely to go off on a non-magic tangent?" Also, the final sentence of the right hand text seems to contradict the entire message preceeding it, depending on how you define something "being a riot."

@thetrinketmage Месяц назад

Yea those are the AI hallucinations which causes it to be wrong

@Xhosant День назад

The ikea giveaway was that it started its (surprisingly poetic) metaphor along the lines of 'it's a doomed project', and then twists to 'and when it works it's neat'. That sudden context switch was suspicious. Speaking of context, overall ChatGPT will provide too much of it explicitly, compared to human answers using it implicitly and with less regard about you having it. From needless clarifications to tying back to the assignment's phrasing, that was a pattern for ChatGPT, feeling like a grade-school essay - answering as it expected you wanted it to answer. Contrasting, the humans would often use subtler slang or context cues.

@KirioGameNote Месяц назад

I really need to hear more on the patron’s thoughts on giving demonic tutor a b

@thetrinketmage Месяц назад

I made it anonymous so unless they tell me, I also won’t know more

@DMZZ_DZDM Месяц назад

I would have given it an A, but yeah, it isn't an S imo

@teeeea1741 Месяц назад

I noticed thay chatgpt has a habit of reiterating the prompt. It always talks like its checking off items on a checkbox.

@thetrinketmage Месяц назад

I feel like that’s how a lot of AI work looks.

@cinderheart2720 Месяц назад

I swear they didn't used to and now they always do it, in any context. Its very frustrating.

@violetto3219 7 дней назад

it's got the vibe of trying to fill space in a high school writing assignment you reeeeally don't want to do

@aleksihakli1125 Месяц назад

"ChatGPT doesn't care about budget" You're telling me. I asked some recommendations to my food token life drain deck. It recommended such affordable cards like anointed precession (60€) doubling season (36-40ish €) teferi's protection (48€) parallel lives (33€) exquisite blood (23€) and many, many more cards well over my budget. I think the cheapest card it recommended was beast whisperer that I already HAVE IN MY DECK.

@XiaosChannel Месяц назад

16:09 that's why you either use the API or always restart a new conversation per case

@thetrinketmage Месяц назад

Yea I didn’t know it was gonna do that. Made for a funny bit though

@drunkcapybara7004 Месяц назад

Dang, i actually got the Mechanized Production wrong as well, what threw me off was the mention of wasting 2-3 slots and getting "the combo", since there was no prior mention of what other slots are wasted for what combo, and these inconsistencies are a big problem of AI. Should have focused more on the same problem in the other text, the card being able to be "a riot" contradicting the D rating.

@thetrinketmage Месяц назад

Yea it was such a weird response for that card

@Ent229 Месяц назад

Commenting before watching: I predict the LLM will have good syntax in its responses but will fail some of the semantics. Likewise I expect its fake "reasoning" to be heavily biased towards generalities and other common responses. I predict patrons of MtG to understand the semantics. I also expect those patrons to be capable of novel reasoning, but likely to give general answers. (common responses are common for a reason). As for the ranking, I would expect the LLM would have a higher mode in their answers and the Patrons answers would have a broader spread.

@thetrinketmage Месяц назад

Novel reasoning ends up being a huge giveaway! I think you are spot on

@Ent229 Месяц назад

While watching (my guesses of the identities and tracking the rating scores). My guess for the AI in brackets. Actual AI in parentheses. 1 [(A)] or C. Initially guessed based on accuracy. Doubled down based on generic AI answer vs novel Patron answer. 2 [(C)] or C. Again, novel responses help guess the Patron. 3 [(A)] or S. One answer repeated itself in a redundantly redundant explanation. Huh, the AI downgraded it to nonland permanent. Was that due to generalizing answers or due to not understanding the semantics of the card? Both are factors but I wonder which had a bigger cause. 4 D or [(C)]. Initially guessed based on accuracy. Further confirmed by the novel Patron answer (silver bullet draft design). Even further confirmed by the LLM having no context for the lack of horsemanship. 5 [(S)] or B. Initially guess based on accuracy (unless the patron is trolling, or arguing that it is too powerful to fit in many commander decks without moving the deck away from the desired power level). Wow the reasoning is making me reconsider. The S ranking said "any deck within it's (Demonic Tutor's) colors (plural)". Why the implication of plural? There is also more redundancy in the S's reasoning. I am changing my mind. 6 [(A)] or B. The LLM likes listing literally the same logic repeatedly. The Patron response was more novel. 7 D or [(D)]. This one is tough. The left was more novel. Wow. I expected something like 55/45 odds there. Let's Go! 8 [(S)] or B. I initially guessed based on accuracy, but the B has the novel response, so it must be the Patron. LLM wouldn't do that. And once again the LLM uses "decks within it's colors" when talking about a mono white card. Why the plural? Also the card needs to fit within the deck's colors not the deck fit within the card's colors. 9 [(A)] or A. "Decks focused on defending against large attacks"? Also the Patron is once again the more novel answer. 10 S or [(S)]. Redundant LLM response is redundant.

@Ent229 Месяц назад

After the 10 scores: Patron scores: SSABBBCCDD (5 different ranks. Somewhat biased towards B but really spread out otherwise) LLM's scores: SSSAAAACCD (4 different ranks. High bias towards S or A) Since my 10/10 accuracy was based on my reasoning of the LLM's limitations, I think it is soft evidence that my predictions about its limitations might be accurate.

@Ent229 Месяц назад

Bonus Round? 1. [(B)] or C. The C had a novel response. Final thoughts: We already know ChatGPT does not try to evaluate cards, so it is not suited to evaluating cards. (Don't use a saw for a hammer's job). Beyond its lack of motivation to judge cards, it does not understand the card or their context enough to judge them. Additionally we see it's general answers as a clear marker of the LLM answer. It is trained to give a "reply-like" response that was a likely reply rather than a reply that was likely to be correct. Specificity and nuance are things it is trained to avoid.

@Ent229 Месяц назад

Your patron's evaluation seems within the norm for commander players. They can mostly evaluate cards, and there is some subjectivity that make the "surprising" evaluations still have merit.

@leax1337 Месяц назад

I recently build a Deck with ChatGPT aswell, the cards were so random i had to put it into a power level calculator, because i didn’t understand the deck myself, which put out a 10 for some reason. ChatGPT always tried to put Rhystic Studys in the Deck xD (It was green black)

@thetrinketmage Месяц назад

That’s funny maybe I’ll need to try that too

@nahboh1897 Месяц назад

I agree with the demonic tutor Rating , but not its waste of space but because it is a tutor it makes the deck to consistent so the deck does the same thing every time and make it a less fun deck to play against.

@drunkcapybara7004 Месяц назад

Valid point, especially in casual settings and for decks with a very clear and not super varied gameplan. For example, my Kathril deck only really wants to fill the graveyard with keywords, and i took Entomb out of it because i would always tutor up Zetalpa which made the deck play very monotonous (amplified by how terrible the precon is at filling its graveyard so i took a lot of mulligans, but Entomb of course was always keepable) and now that i'm replacing a ton of cards soon, i think i might also cut Vile Entomber and Buried Alive, and exclusively rely on what i happen to mill/sacrifice.

@FranciscoJG Месяц назад

Oooohh, surprise Snail participation :D

@AutumnReel4444 Месяц назад

Yeahhh very not hard to guess. AI ain't killin us yet

@SwedeRacerDC Месяц назад

Lord of Extinction: I was right from the grade alone Lightning Bolt: They had the same grade, so I guessed correct based on the description Assassin's Trophy: I wasn't sure on the grade, because I don't use it in 5C decks typically, but the description was obvious to me. Taoist Mystic: Obvious from the grading. Demonic Tutor: I honestly don't love using tutors that much, but I was wrong on this one. I think it's an A, right in the middle. Panharmonicon: I'm correct...Chat GPT is just stupid at this point. Lol Mechanized Production: Same grade, so had to guess based on description. Both descriptions were wild... But I was right. I think it's a C. It's fun and can win on the spot, especially now that we have Obeka, but even with extra turns. Smothering Tithe: I needed the description on this one, but got it right. I still think its a better grade than the human gave it. Ink Shield: I lost to this card. It's great. You will likely win if everyone else has been eliminated. I was right from the description. Tropical Island: The description helped. Right again. Forcefield: I was right and that's an interesting card. Of course it's on the reserved list. Chat GPT is fairly easy to sus out. But it's still interesting to see.

@Demoncoregobrrr Месяц назад

rad, got recommended your work early

@anabsurdlylongnameme8948 Месяц назад

What version of chatgpt did yall use? 3.5 is terrible, 4 is great but behind a paywall. If yall used 4, did u put any additional reference info in?

@l1ghr Месяц назад

10:40 interesting option

@v3rsatile_V3 Месяц назад

tbh instead of running demonic tutor you should run it until you play it, then whatever you search for just get another version of that effect, if you search for a boardwipe, put another in the deck. simple really

@thetrinketmage Месяц назад

I do like this idea, though the flexibility of a tutor I think makes it worth it!

@Jacob-km4yb Месяц назад

What he said the flexibility to get let's say a board wipe OR a single target removal spell because you have a big board presence makes it way better imo

@v3rsatile_V3 Месяц назад

@@Jacob-km4yb . . . I know

@BS-bv5sh Месяц назад

I enjoy your content.

@thetrinketmage Месяц назад

I’m glad! I know this one is a bit different so I’m happy you like it

@CD-sl7ld Месяц назад

I love you

@epi1763 Месяц назад

Next time ask chat gpt to write like a normal personnor dumb it down and feed it other peoples reviews so ot wrotes on a similiar context

@robertomacetti7069 Месяц назад

to be fair to chat gpt it never played commander, freacking out over lord of extinction is a classic noob mistake

@orobors Месяц назад

Personally, I think Demonic Tutor is a B or even C in most casual metas. If I were to pull out a Demonic Tutor, I'd probably get focused on because my playgroup doesn't run $50 cards unless we're proxying high power or cEDH. In a lot of games, Demonic Tutor is just too focused/good to be worth slotting in, since it gets people to target you.

@thetrinketmage Месяц назад

Interesting I often think of it as a charm effect. I don’t know if I’ve ever been explicitly targeted because of it

@hoffedemann5370 Месяц назад

"highly desirable" "in its colors" "extremely valuable" "versatility" "particularly those in XYZ strategies" are dead giveaways. Also Ai do be yappin' with way too eloquent words all the time