
Softmax Regression (C2W3L08) 

DeepLearningAI
325K subscribers
160K views

Take the Deep Learning Specialization: bit.ly/2xdG0Et
Check out all our courses: www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: www.deeplearning.ai/thebatch
Follow us:
Twitter: deeplearningai_
Facebook: deeplearninghq
LinkedIn: deeplearningai

Published: 24 Aug 2017
Comments: 48
@HZLTV, 1 year ago
I don't think I'll ever understand the maths behind this properly, but the fact that I even understood it *sort of* just proves how good he is at teaching... the visual example given helped a tonne
@mariusz2313, 6 years ago
"If you can't explain it simply, you don't understand it well enough." (Albert Einstein) You definitely know the topic perfectly well! Thanks!
@Jonn123save, 6 years ago
so true
@roboticsresources9680, 6 years ago
Actually, it was Richard Feynman who said it.
@bornslippy9109, 6 years ago
No, Einstein said it; Feynman applied it perfectly.
@est9949, 4 years ago
This is a common logical fallacy: A implies B is not equivalent to B implies A.
@Jabrils, 6 years ago
Thank you master Andrew, this was super to the point.
@roniquinonez9715, 6 years ago
Spotted a wild Jabril in his natural environment! *Attempts to say Hello*
@mortezaabdipour5584, 6 years ago
Thank you, Mr. Andrew, for always sharing your knowledge.
@jjjj_111, 6 years ago
Fantastic, I love how easy it was to understand the material that was presented. If you have a donation page, please let me know!
@manuel783, 3 years ago
Clarification about softmax regression: please note that at 4:30 the text for the softmax formulas mixes the subscripts "j" and "i", when the subscript should be the same (just "i") throughout the formula.
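For readers following along, here is a minimal numpy sketch of the softmax formula under discussion, written with a single consistent index; the logits [5, 2, -1, 3] are the 4-class example used in the video:

```python
import numpy as np

def softmax(z):
    # a_i = e^(z_i) / sum_i e^(z_i): one index, i, throughout
    t = np.exp(z - np.max(z))  # subtracting max(z) avoids overflow and leaves the result unchanged
    return t / np.sum(t)

z = np.array([5.0, 2.0, -1.0, 3.0])  # the 4-class example from the video
a = softmax(z)
print(a)         # ~[0.842, 0.042, 0.002, 0.114]
print(a.sum())   # 1.0
```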
@Jack-dx7qb, 6 years ago
So clear!
@mariabardas2568, 4 years ago
Great lesson!!!! Very useful.
@ziku8910, 2 years ago
That was very helpful, thank you!
@camilaonofri2624, 4 years ago
If you use the sigmoid function for a multiclass problem you have to make a decision boundary for each class against the others (the one-vs-all algorithm), and then you get independent probabilities. How were the decision boundaries for the examples at 10:33 calculated, considering there were no hidden layers? (How were the line equations found?)
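A small numpy sketch of the contrast drawn in this comment (the logits are illustrative): element-wise sigmoid gives independent per-class probabilities, while softmax gives a single distribution over all the classes.

```python
import numpy as np

z = np.array([5.0, 2.0, -1.0, 3.0])    # one linear score per class

sigm = 1.0 / (1.0 + np.exp(-z))        # one-vs-all: independent per-class probabilities
soft = np.exp(z) / np.sum(np.exp(z))   # softmax: one distribution over all classes

print(sigm, sigm.sum())  # each entry is in (0, 1), but the sum is not 1
print(soft, soft.sum())  # the entries sum to exactly 1
```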
@scl144, 4 years ago
I didn't know who made the translated subtitles, so I bowed in every direction. Thank you so much; thanks to you I got to watch an excellent lecture.
@AdarshMahabubnagar, 2 years ago
Is it possible to show the softmax activation function graphically? If so, please provide one.
@84xyzabc, 3 years ago
I think in the denominator it's t_j at 4:58.
@leagueofotters2774, 2 years ago
Soft and soothing....kind of like the Bob Ross of machine learning.
@wolfisraging, 6 years ago
U r best
@fjficm, 2 years ago
With "t", when you say normalised, the probabilities would be different, wouldn't they? You would have to use 1/sqrt(t·t) in front of the 4x1 vector and convert it into a unit vector, then use the dot products of the elements to work out the probabilities of each, which will still work out to 1. Or is this wrong?
@benw4361, 4 years ago
It seems like the largest number is still selected as the predicted solution (i.e. 5), so I'm confused about the purpose of softmax when you could just select the largest value instead. Wouldn't that effectively translate to the class with the largest probability anyway?
@michael3698bear, 3 years ago
Late reply, but for anyone wondering: yes, you are correct that for prediction purposes it makes no difference (though you may still want to see the "probabilities" generated); the max will be chosen as the "predicted solution". This is not true for training, however. When training you need to be able to measure "how wrong" you were. This is where the softmax function comes in: it gives probabilities, from which you can calculate a loss, and also calculate a derivative to update the weights.
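A small numpy sketch of the point in this reply (illustrative values): argmax picks the same class either way, but only the softmax probabilities give a differentiable loss to train on.

```python
import numpy as np

def softmax(z):
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

z = np.array([5.0, 2.0, -1.0, 3.0])   # logits
y = np.array([1.0, 0.0, 0.0, 0.0])    # one-hot true label

a = softmax(z)
print(np.argmax(z) == np.argmax(a))   # True: softmax never changes the winning class

# Cross-entropy measures "how wrong" the probabilities are...
loss = -np.sum(y * np.log(a))
# ...and for softmax + cross-entropy the gradient w.r.t. the logits
# is simply a - y, which is what gradient descent uses to update the weights.
grad_z = a - y
print(loss, grad_z)
```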
@stefandimeski8569, 6 years ago
How were the decision boundaries for the examples at 10:33 calculated? (How were the line equations found?)
@grez911, 6 years ago
Make a grid, say 100 by 100 points, and at each point calculate the activation. That's actually how he does it in the programming exercises.
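Here is a minimal numpy/matplotlib sketch of the grid method described in this reply; the weights W and b are made-up stand-ins for a trained softmax layer, not values from the video:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical trained parameters for a 3-class softmax layer on 2-D inputs
W = np.array([[ 2.0, -1.0],
              [-1.5,  1.0],
              [ 0.5,  0.5]])
b = np.array([0.0, 0.5, -0.5])

# A 100 x 100 grid over the input plane
xs, ys = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)   # shape (10000, 2)

z = grid @ W.T + b                                  # logits at every grid point
pred = np.argmax(z, axis=1).reshape(xs.shape)       # argmax of logits = argmax of softmax

plt.contourf(xs, ys, pred, alpha=0.3)               # colored regions = predicted classes
plt.title("Softmax decision regions on a 100x100 grid")
plt.show()
```

Incidentally, with no hidden layers a closed form does exist: the predicted class flips between classes i and j exactly where z_i = z_j, i.e. along the straight line (w_i - w_j)·x + (b_i - b_j) = 0. That is why the boundaries at 10:33 are linear even though softmax itself is nonlinear.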
@stefandimeski8569, 6 years ago
Thanks man! Then I suppose it's impossible to get a closed-form equation for the decision boundary? Can you please provide a link to the video where he uses the method you described here?
@camilaonofri2624, 4 years ago
Please let me know if you got it.
@yilmazbingol6953, 5 years ago
I think at 4:30 the sum indices should be j=0 to j=3. If I'm wrong, please correct me.
@mathias8137, 5 years ago
This definitely makes more sense to me. The same applies to the sum written at 5:00.
@wifi-YT, 3 years ago
From 9:32 onward, why are the decision boundaries all linear when the softmax function is NOT itself a linear function? The softmax function has, after all, an e^z in its numerator, which is certainly not linear! So, similarly, why does Andrew say at 11:16 that only when you add hidden layers do you end up with non-linear decision boundaries?
@usf5914, 3 years ago
At 4:03: (4, 1) or (1, 4)?
@boratsagdiev6486, 5 years ago
It should be t_j, not t_i, at 4:33, right?
@mathiasgustum858, 4 years ago
yes :)
@lucylu2530, 4 years ago
I believe that under the sigma (summation) sign it should be i=4.
@derrik-bosse, 6 years ago
Where do the numbers in the Z vector come from?
@kunhongyu5053, 6 years ago
Just an assumption.
@derrik-bosse, 6 years ago
@Kunhong YU sure, but I mean intuitively, what do they represent?
@kunhongyu5053, 6 years ago
Similar to simple logistic regression, softmax just has more output units rather than one. For logistic regression, the output unit is just a 1-dimensional vector computing input X's linear "score"; if it's larger than zero, the label is 1, and vice versa. Softmax is like training multiple binary classifiers simultaneously: for a sample, each element in Z is also a "score", and the largest score indicates the sample's most likely label.
@DouglasDuhaime, 6 years ago
@derrikbosse, the Z vector identified here is computed from three quantities: W{L}, A{L-1}, and B{L}. W{L} is the matrix of weights in the last layer of the network, A{L-1} is the vector of outputs from the penultimate layer, and B{L} is the bias vector of the last layer. If none of this makes sense, check out Professor Ng's earlier discussion of logistic regression, which is the simplest kind of neural network; that helped me make sense of this presentation: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-hjrYrynGWGA.html
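A tiny numpy sketch of the computation described in this reply; the layer sizes and parameter values are made up for illustration. The last layer forms Z from W{L}, A{L-1}, and B{L}, and softmax turns those scores into probabilities.

```python
import numpy as np

def softmax(z):
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

n_prev, n_classes = 3, 4                   # hypothetical layer sizes
rng = np.random.default_rng(0)

W = rng.normal(size=(n_classes, n_prev))   # W{L}: weights of the last layer
b = rng.normal(size=(n_classes, 1))        # B{L}: bias of the last layer
a_prev = rng.normal(size=(n_prev, 1))      # A{L-1}: output of the penultimate layer

z = W @ a_prev + b                         # Z{L} = W{L} A{L-1} + B{L}: one score per class
a = softmax(z)                             # probabilities over the classes
print(z.ravel(), a.ravel(), a.sum())
```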
@gavin8535, 4 years ago
What is W^L at 3:56?
@aayushpaudel2379, 3 years ago
The weight matrix for layer L, i.e. the last layer.
@marcostavarez2702, 5 years ago
Could someone please explain what the 'blocks of color' and the 'colored-in circles' represent?
@IrfanAhmad-od2sn, 5 years ago
The colored-in circles are the actual training data/values. After training the model on the training data, the model predicted the decision boundaries, so the colors of the blocks are the predictions made by the model.
@fatimahmath4819, 5 years ago
⚘⚘⚘⚘⚘
@grez911, 6 years ago
How do you calculate so quickly? I don't see a calculator on your table.
@chancychan7175, 4 years ago
Actually I calculated it for Mr. Ng, hh.
@sandipansarkar9211, 3 years ago
Great explanation. Need to watch again.