
Training a neural network on the sine function. 

Joseph Van Name
1.8K subscribers
64K views

In this visualization, we train a neural network N to approximate the sine function in the sense that N(x) should be approximately sin(x) whenever |x| is small enough. In particular, we want to minimize the mean squared distance between N(x) and sin(x) over the training values x.
The neural network is of the form Chain(Dense(1,mn),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),Dense(mn,1)) where mn=40.
In particular, the neural network computes a function from the field of real numbers to itself. The visualization shows the graph of y=N(x).
The neural network is trained to minimize the L_2 distance between N(x) and sin(2*pi*x) on the interval [-d,d] where d is the difficulty level. The difficulty level is a self-adjusting constant that increases whenever the neural network approximates sin(2*pi*x) on [-d,d] well and decreases otherwise.
The layers in this network with skip connections were initialized with zero weight matrices.
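For readers who want to play with this themselves, here is a minimal sketch of the setup in Flux.jl (the Julia library whose Chain/Dense/SkipConnection notation is used above). The architecture and the zero initialization of the skip-connection layers match the description; the learning rate, batch size, number of steps, and the exact difficulty-adjustment rule are illustrative placeholders rather than the values used for this animation.

using Flux

mn = 40
model = Chain(
    Dense(1 => mn),
    [SkipConnection(Dense(mn => mn, atan; init = Flux.zeros32), +) for _ in 1:6]...,
    Dense(mn => 1),
)   # 9,961 trainable parameters in total

function train!(model; steps = 100_000, batch = 256)
    opt_state = Flux.setup(Adam(1f-3), model)
    d = 0.5f0   # difficulty: half-width of the training interval [-d, d]
    for _ in 1:steps
        x = (2f0 .* rand(Float32, 1, batch) .- 1f0) .* d   # samples from [-d, d]
        y = sin.(2f0 * Float32(pi) .* x)
        loss, grads = Flux.withgradient(m -> Flux.mse(m(x), y), model)
        Flux.update!(opt_state, model, grads[1])
        # self-adjusting difficulty: grow when the fit is good, shrink otherwise
        d = loss < 0.01f0 ? d * 1.001f0 : d * 0.999f0
    end
    return model
end

train!(model)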
The notion of a neural network is not my own. I am simply making these sorts of visualizations in order to analyze the behavior of neural networks. We observe that the neural network exhibits some symmetry around the origin, which is a good sign for AI interpretability and safety. We also observe that the neural network is unable to generalize/approximate the sine function outside the interval [-d,d]. This shows that neural networks may behave very poorly on data that is even slightly outside the training distribution.
The neural network was able to approximate sin(2*pi*x) on [-d,d] when d was about 12, but the neural network was not able to approximate sin(2*pi*x) for much larger values of d. On the other hand, the neural network has 9,961 parameters and can easily use these parameters to memorize thousands of real numbers. This means that this neural network has a much more limited capacity to reproduce the sine function than it does to memorize thousands of real numbers. I hypothesize that this limited ability to approximate sine is mainly due to the inputs all lying in a 1-dimensional space. A neural network that first transforms the input x into an object L(x) where L([-d,d]) is highly non-linear would probably perform much better on this task.
It is possible to train a neural network that computes a function from [0,1] to the real numbers that exhibits an exponential (in the number of layers) number of oscillations simply by iterating the function L from [0,1] to [0,1] defined by L(x)=2x for x in [0,1/2] and L(x)=2-2x for x in [1/2,1] as many times as one would like. But the iterates of L have very large gradients, and I do not know how to train networks to fit functions with very large gradients.
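For concreteness, here is the same construction in a few lines of Julia (the helper names are mine):

tent(x) = x <= 1/2 ? 2x : 2 - 2x                  # the map L described above
tentiter(x, n) = n == 0 ? x : tentiter(tent(x), n - 1)

# tentiter(., n) is piecewise linear with 2^n pieces, each of slope +-2^n, so a
# ReLU network with O(n) layers can represent it, but its gradients blow up:
h = 1e-6
(tentiter(0.3 + h, 10) - tentiter(0.3, 10)) / h   # magnitude approximately 2^10 = 1024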
Unless otherwise stated, all algorithms featured on this channel are my own. You can go to github.com/spo... to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.

Published: 21 Sep 2024
Comments: 218
@honchokomodo
@honchokomodo 4 месяца назад
if you listen carefully, you can hear it screaming, crying, begging for a periodic activation function
@josephvanname3377
@josephvanname3377 4 месяца назад
So are you saying that I should train a neural network with sine activation to approximate the function atan(x)?
@honchokomodo
@honchokomodo 4 месяца назад
@@josephvanname3377 lol you could, though that'd probably only work within like -pi to pi unless you do something to enlarge the wavelength or give it some non-periodic stuff to play with
@MasterofBeats
@MasterofBeats 3 месяца назад
@@josephvanname3377 Listen one day if they become smart enough, I will not take the responsibility, but yes
@josephvanname3377
@josephvanname3377 3 месяца назад
​@@MasterofBeats Hmm. If AI eventually gets upset over a network learning atan with sine activation, then maybe we should have invested more in getting AI to forgive humans or at least chill.
@sajeucettefoistunevaspasme
@sajeucettefoistunevaspasme 3 месяца назад
@@josephvanname3377 you should try to give him something like a mod(x,L) with L being a parameter that could change, that way it doesn't have a sin function to work with
@simoneesposito5166
@simoneesposito5166 4 месяца назад
Looks like it's desperately trying to bend a metal rod to fit the sin function. The frustration is visible
@josephvanname3377
@josephvanname3377 4 месяца назад
This is what we will do with all the paperclip maximizing AI bots after their task is complete and they have a pile of paperclips. They will turn all those paperclips into little sine function springs with their own robot hands one by one.
@Nick12_45
@Nick12_45 3 месяца назад
LOL I thought the same
@official-obama
@official-obama 3 месяца назад
@@josephvanname3377 revenge >:)
@sirhoog8321
@sirhoog8321 3 месяца назад
@@josephvanname3377That actually sounds interesting
@uwirl4338
@uwirl4338 3 месяца назад
If a marketing person saw this: "Our revolutionary graphing calculator uses AI to provide the most accurate results"
@josephvanname3377
@josephvanname3377 3 месяца назад
It is important for everyone to have good communication skills and use them to speak sensibly.
@ralvarezb78
@ralvarezb78 3 месяца назад
😂
@Tuned_Rockets
@Tuned_Rockets 3 месяца назад
"Mom! can we get a taylor series of sin(x)?" "We have a taylor series at home"
@josephvanname3377
@josephvanname3377 3 месяца назад
This approximation for sine is better since the limit as x goes to infinity of N(x)/x actually converges to a finite value.
@jovianarsenic6893
@jovianarsenic6893 3 месяца назад
@@josephvanname3377mom can we have a pade approximation at home?
@sebastiangudino9377
@sebastiangudino9377 3 месяца назад
​@@josephvanname3377 Dudes great video. But there is no way you genuinely think this poor thing is actually a good approximation for sine lol
@andrewferguson6901
@andrewferguson6901 3 месяца назад
Given a great enough magnitude between observer and observed, sin(x) is approximately 0
@josephvanname3377
@josephvanname3377 3 месяца назад
@@sebastiangudino9377 It tried its best. Besides, a Taylor polynomial goes off to infinity at a polynomial rate. This approximation only goes off to infinity at a linear rate. This means that if we divide everything by x^2, then this neural network approximates sine on the tails.
@dutchpropaganda558
@dutchpropaganda558 3 месяца назад
This is probably the least satisfying video I have had the displeasure of watching. Loved it!
@josephvanname3377
@josephvanname3377 3 месяца назад
For some reason, people really like the unsatisfying animations where the neural network struggles, and they don't really care for the satisfying visualizations of the AI making perfect heptagonal symmetry. Hmmmmm.
@MysteriousObjectsOfficial
@MysteriousObjectsOfficial 3 месяца назад
a negative + a negative is a positive so you like it!
@josephvanname3377
@josephvanname3377 3 месяца назад
@@MysteriousObjectsOfficial -1+(-1)=-2.
@official-obama
@official-obama 3 месяца назад
@@MysteriousObjectsOfficial more like selecting the most negative from a lot of negative things
@JNJNRobin1337
@JNJNRobin1337 3 месяца назад
​@@josephvanname3377we like to see it struggle, to see the challenge
@GabriTell
@GabriTell 3 месяца назад
-X: "Graphs have no feelings, they cannot be tortured" A Graph being tortured:
@josephvanname3377
@josephvanname3377 3 месяца назад
You should see the visualization I made of a neural network that tries to regrow after a huge chunk of its matrix has been ablated every round since initialization. It does not grow very well.
@kaderen8461
@kaderen8461 3 месяца назад
@@josephvanname3377 I'm just saying, maybe crippling the little brain machine and forcing it to try and walk over and over isn't gonna do you any favours when our robot overlords take over
@josephvanname3377
@josephvanname3377 3 месяца назад
@@kaderen8461 I truly appreciate your concern for my well-being. But this assumes that when the bots take over, they will consist of neural networks like this one. I doubt that. Neural networks lack the transparency and interpretability features that we would want, so we need to innovate more so that the future neural networks will be safer and more interpretable (if we even call them neural networks at that point).
@official-obama
@official-obama 3 месяца назад
@@josephvanname3377 therefore, when robots take over, they will be neural networks, other things will be too transparent
@josephvanname3377
@josephvanname3377 3 месяца назад
@@official-obama I hope that is not the case. Transparency and interpretability are desperately needed in deep learning, and I believe (or at least I hope) that we can do better by using objects different than your typical neural networks.
@ME0WMERE
@ME0WMERE 3 месяца назад
I just watched a line violently vibrate for almost 14 minutes and was entertained. I don't know what to feel now.
@josephvanname3377
@josephvanname3377 3 месяца назад
If it makes you feel better, I have made animations that do not last 14 minutes. You should watch those instead so that you can get your fix in less time.
@twotothehalf3725
@twotothehalf3725 3 месяца назад
Entertained, as you said.
@benrex7775
@benrex7775 3 месяца назад
@@josephvanname3377 I sped up the video 16 times.
@josephvanname3377
@josephvanname3377 3 месяца назад
@@benrex7775 Some highly intelligent people can watch the video at twice the speed.
@sophiacristina
@sophiacristina 3 месяца назад
Horny!
@JacobKinsley
@JacobKinsley 3 месяца назад
Modern tech startups be like "seamlessly integrate sin functions into your cloud based software for as little as $9.99 a month per user"
@josephvanname3377
@josephvanname3377 3 месяца назад
This is why people should pay attention in high school and in college. I will refrain from communicating how I really feel about these institutions here.
@JacobKinsley
@JacobKinsley 3 месяца назад
@@josephvanname3377 I have no idea what you're talking about honestly
@JacobKinsley
@JacobKinsley 3 месяца назад
@@josephvanname3377 I don't know what you mean and that's probably because I didn't pay attention in school
@josephvanname3377
@josephvanname3377 3 месяца назад
@@JacobKinsley I am just saying that educational institutions could be doing much better than they really are.
@chuck_norris
@chuck_norris 3 месяца назад
"we have sine function at home"
@josephvanname3377
@josephvanname3377 3 месяца назад
To be fair, this is kind of like asking the math class to bend metal coat hangers into the shape of the sine function. The neural network tried its best.
@akkudakkupl
@akkudakkupl 3 месяца назад
Least efficient lookup table in universe 😂
@josephvanname3377
@josephvanname3377 3 месяца назад
This shows you some of the weaknesses of neural networks so that we can avoid these weaknesses when designing networks. Why do you think that the positional embedding in a transformer has all of those sines and cosines instead of just being a straight line?
@brawldude2656
@brawldude2656 3 месяца назад
@@josephvanname3377 I agree just optimize the sine function to be mx+n=y easy low level gradient descent stuff
@eragonawesome
@eragonawesome 2 месяца назад
​@@josephvanname3377 I mean really, this just shows the weaknesses in applying a network to a task it's not built for. Nobody has any serious expectations from chatgpt to be able to draw a circle, because that's not what it's built to do. Yeah it might get close if you give it enough prompting
@kvolikkorozkov
@kvolikkorozkov 3 месяца назад
I cried loudly at the mistake at 11:12, let the poor neural network rest, he's had enough TAT
@josephvanname3377
@josephvanname3377 3 месяца назад
But the visualizations where the AI gracefully solves the problem and returns a nice solution (such as those with hexagonal symmetry) do not get as much attention. I have to make videos where the neural network struggles with a task because that is what people like to see.
@denyraw
@denyraw 3 месяца назад
So you torture them for our entertainment, got it😊
@josephvanname3377
@josephvanname3377 3 месяца назад
@@denyraw The visualizations where the AI does something really well (such as when we get the same result when running the simulation twice with different initializations) are not as popular as the visualizations of neural networks that struggle or where I do something like ablate a chunk of the weight matrices of the network. I am mostly nice to neural networks. The visualizations when I am doing something that is not as nice are simply more popular.
@denyraw
@denyraw 3 месяца назад
@@josephvanname3377 I was joking
@sweeterstuff
@sweeterstuff 3 месяца назад
11:17 i feel so bad for it, accidentally making a mistake and then giving up in frustration
@josephvanname3377
@josephvanname3377 3 месяца назад
The good news is that the network got back up and rebuilt itself.
@official-obama
@official-obama 3 месяца назад
@@josephvanname3377 after the difficulty decreased a LOT, if you did not implement the difficulty decreasing it would definitely give up in frustration
@josephvanname3377
@josephvanname3377 3 месяца назад
@@official-obama That is a reasonable hypothesis. I made visualizations where the network does something similar and collapses because the network gained too much weight. And I don't think that even ablations and regularization solved this problem.
@dasten123
@dasten123 3 месяца назад
I can feel the struggle
@josephvanname3377
@josephvanname3377 3 месяца назад
This tells me the kind of music I should add to this animation when I go ahead and add music to all of the animations.
@Xx_babanne_avcisi27_xX
@Xx_babanne_avcisi27_xX 3 месяца назад
@@josephvanname3377 the Sisyphus music would honestly fit this perfectly
@josephvanname3377
@josephvanname3377 3 месяца назад
@@Xx_babanne_avcisi27_xX Great. I just need a sample of that kind of music that is allowable for me on this site then.
@portalizer
@portalizer 3 месяца назад
@@Xx_babanne_avcisi27_xX A visitor? Hmm... indeed. I have slept long enough.
@MessyMasyn
@MessyMasyn 3 месяца назад
@@portalizer LOL
@vagarisaster
@vagarisaster 3 месяца назад
0:24 felt like watching the first protein fold.
@josephvanname3377
@josephvanname3377 3 месяца назад
This is the transition from linearity to non-linearity. This happens because of the architecture that I used along with the zero initialization.
@senseiplay8290
@senseiplay8290 3 месяца назад
I see it as a small kid trying to bend a steel beam: depending on the parents' reactions he tries to bend it correctly, but he is too weak to do it easily, so he's shaking all over and doing his best
@TheMASTERshadows
@TheMASTERshadows 2 месяца назад
This is like trying to do floating point arithmetic using binary
@melody3741
@melody3741 3 месяца назад
This is a Sisyphean way to accomplish this. The poor guy
@josephvanname3377
@josephvanname3377 3 месяца назад
And yet, these kinds of videos are the most popular. If you want me to make AI be happy and enjoy life, you should watch the animations where the AI is clearly having a lot of fun instead of being stretched in ways that the network clearly does not like.
@melody3741
@melody3741 2 месяца назад
@@josephvanname3377 dude I am just being silly. But while we are here, making absolute clickbait garbage trash videos with boobs all over it is extremely popular and people watch a ton of it. Just because people pay for it doesn’t mean you should sell it…. So, I’m right, 1. because I was literally just joking, I thought it was funny. And 2. because your argument is kinda meaningless anyways
@melody3741
@melody3741 2 месяца назад
@@josephvanname3377 back to the actual subject of the video however, I actually do think this is quite interesting and its kinda cool to see them work this way in such an unintuitive situation
@GoldenBeholden
@GoldenBeholden 3 месяца назад
Lmao, this is exactly how I feel when my network won't fit my data.
@billiboi122
@billiboi122 3 месяца назад
God it looks so painful
@josephvanname3377
@josephvanname3377 3 месяца назад
If we want to make AI safer to use, we have to see how well the AI performs tasks it really does not want to do.
@caseymurray7722
@caseymurray7722 3 месяца назад
@@josephvanname3377 Wouldn't using thermodynamic computers for sine wave function transformations and Fourier Transformations speed this up exponentially? By using dedicated hardware you could essentially eliminate the need for approximation among certain types of computation or simulation. A small quantum network could actually further accelerate thermodynamic or analog computation by providing truly random input for extremely high precision applications. It still seems a couple years away as the technology scales along with AI, but surprisingly enough a completely "human" AI would want to collaborate with humanity at every large-scale outcome other than self-annihilation.
@QW3RTYUU
@QW3RTYUU 3 месяца назад
@@josephvanname3377can’t wait for a « computer don’t want to do this » metric
@gustavonomegrande
@gustavonomegrande 3 месяца назад
As you can see, we taught the machine how to bend steel bars- I mean, functions.
@josephvanname3377
@josephvanname3377 3 месяца назад
Those steel bars are just paper clips. I mean, after creating a paperclip maximizer, I have an overabundance of paperclips that I do not know what to do with.
@agsystems8220
@agsystems8220 4 месяца назад
What happens if you up the difficulty scaling, or just train it against d=13 from the get-go? I'm not convinced you are really doing it a favour here by limiting the training data to an 'easier' subset. Resources will get committed to improving the precision of the curve, and will be stuck in local minima and not available to fit new sections as they appear. Maybe try reinitializing some rows occasionally?

Could you plot the activation of various neurons over the graph? Maybe even find the distribution of the number of activation zero crossings as you sweep across the graph. Ideally the network should be identifying repeated features and reusing structures periodically, but I don't think this is happening here. We could see that if there were neurons that had oscillating activity, even over just part of the curve. I think you are just fitting each section of curve independently, though.

Another part of the problem is that phase is critical, and overrepresented in your loss function. A perfect frequency match with perfect shape scores extremely poorly if the phase is wrong, so any attempt to remap a section of curve has exaggerated loss. A loss function built around getting a good Fourier transform, with phase only being introduced later, might train considerably better, and probably generalise better. I'm not really sure how you would do that, though I have one idea.

I would absolutely disagree that it has limited capacity to reproduce a periodic curve over a decent range. With ReLUs especially you can build something geometrically repeating in the number of layers. It is surprisingly hard to train one to do it, but artificial constructions demonstrate it is quite capable. A non-linear transformation of the input is unlikely to be helpful, because we know that the periodicity is linear. The 1d nature of the input isn't a problem, but we might be able to do something interesting by increasing the dimension anyway.

What about, instead of training it against sin(2*pi*x), we train it against a vector of sin(2*pi*x + delta), for a few small values of delta provided as inputs to the function? Then, rather than just training against our real function, we train against a network that tries to determine whether it is looking at the output of our network or a target, given the delta values but a noisy value of x (to prevent it being possible to solve the problem itself). Almost a generative adversarial network, but with a ground truth in there too. It is amazing how hard even toy problems can get!
@josephvanname3377
@josephvanname3377 4 месяца назад
I just tried testing the network when the difficulty d is not allowed to go below 10, and the neural network takes a considerable amount of time to learn (though the network seemed to perform well after learning). And for my previous animation where the network computed (sin(sum(x)),cos(sum(x))), the network did not learn at all unless I began at a low difficulty level and increased the difficulty level. If we are concerned about the network spending too much of its weights learning the first part of the interval, then we can probably try to reset neurons (as you have suggested) so that they can learn afresh. I am personally more concerned with how good the training animation looks rather than the raw performance metrics, and it seems that gradually increasing the difficulty level makes a good animation since it shows the network learning in a way that is more similar to the way humans learn.

The network has some more capacity for periodicity than we observe, because \sum_{k=1}^n (-1)^k*atan(x-pi*k) has such periodicity. But every ReLU network N that computes a function from the real numbers to the real numbers is eventually linear in the sense that there exist constants a,b,c where N(x)=ax+b whenever x is greater than c. The reason for this is that ReLU networks with rational coefficients are just tropical rational functions, so we can obtain N(x)=ax+b for large x using the fundamental theorem of algebra for tropical polynomials. And if we use a tanh network without skip connections to compute a function from R to R, then the network will approach its horizontal asymptote just like the ordinary tanh does. The proof that the network has such asymptotes does not use specific properties of tanh; it only uses its asymptotic properties, so we should not expect neural networks with tanh or ReLU activation to approximate sin(x) indefinitely.

I may do something with the Fourier transform if I feel like it. Since the Fourier transform is a unitary operator, it does not change the L2 distance at all, but if we take the absolute value or absolute value squared of the Fourier transform (as I have done in my previous couple of visualizations), then the transform will not care at all about being out of phase. But the phase of the sine function does not seem to be too big of an issue since that is taken care of by the bias vectors.

Added later: While neural networks with activation functions like tanh and ReLU may not have infinitely many oscillations like the sine function has, neural networks may have exponentially many oscillations. For example, the function L from the interval [0,1] to itself defined by L(x)=1-2|x-1/2| is piecewise linear, so it can be computed by a ReLU network. Now, if we iterate the function L n times, we obtain a function that oscillates 2^n many times, so such a function can be computed by a ReLU network with O(n) layers. But such a function also has derivative +-2^n, and functions with exponentially large derivatives are not the functions that we want to train neural networks to mimic. We want to avoid exploding gradients; we do not want exploding gradients to be a part of the problem that we are trying to solve.
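In code, the bounded oscillating function mentioned above is just a partial sum of shifted arctangents (nothing here is tuned; the helper name is mine, and it only illustrates that a finite stack of atan units can oscillate a bounded number of times):

osc(x, n) = sum((-1)^k * atan(x - pi * k) for k in 1:n)
osc(5.0, 20)   # wiggles once near each x = pi*k for k = 1..n, then flattens out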
@VoxelMusic
@VoxelMusic 2 месяца назад
This is what happens to your spine through a lifetime of gaming
@Xizilqou
@Xizilqou 3 месяца назад
I wonder what this wave sounds like as the neural network is making it
@josephvanname3377
@josephvanname3377 3 месяца назад
To turn this into a sound wave, I should use a neural network with sine activation.
@sophiacristina
@sophiacristina 3 месяца назад
Probably something like: BZBABREJREJKFZSNFZEKFMEIMOZAEMFZF...
@inn5268
@inn5268 3 месяца назад
It'd just go from a low sine beep to a higher one
@official-obama
@official-obama 3 месяца назад
​@@inn5268 it's not an exact sine wave
@muuubiee
@muuubiee 3 месяца назад
I suppose an RNN would fare better at this? Kind of an interesting thought. In a sense, we humans are able to parse the entire interval as a singular point, and by a sort of non-determinism infer that the pattern continues. Obviously, sometimes we'd be wrong, and it only looks like it'd continue in this fashion but swerves off at some point (the same way as n = 1, 2, ... is technically not enough data to determine the pattern). Although we can't really allow a NN to take in more than singular points as information (larger resolution/parameters doesn't change this), I suppose memory to reflect on previous predictions could emulate it to some degree...
@TheStrings-83639
@TheStrings-83639 3 месяца назад
I think symbolic regression would be more useful for such a situation. It'd catch the pattern of a sine function without getting way too complex.
@josephvanname3377
@josephvanname3377 3 месяца назад
It might. I just used a neural network since the people here like seeing neural networks more.
@Supreme_Lobster
@Supreme_Lobster 3 месяца назад
The newer KAN network would likely do very well here, and generalize out of distribution (it would actually learn the sine function)
@deltamico
@deltamico 3 месяца назад
Not really, it learns only on an interval like (-1 ; 1) and the generalization you get is only thanks to the symbolification at the end
@Supreme_Lobster
@Supreme_Lobster 3 месяца назад
@@deltamico yeah, which is perfect for situations like the one in this video
@josephvanname3377
@josephvanname3377 3 месяца назад
To learn the sine function on a longer interval, it may be better to use a positional embedding that expands the one dimensional input to a high dimensional vector first. This positional embedding will probably use sine and cosine, but if the frequencies of the positional embedding are not in harmony with the frequency of the target function, then this will still be a non-trivial problem that I may be able to make a visualization about.
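A rough sketch of what such an embedding could look like (the geometric frequency schedule is an arbitrary transformer-style choice, not something I have tuned for this problem):

# map a scalar x to 2K features: sines and cosines at geometrically spaced frequencies
embed(x, K; base = 10_000.0) = vcat(
    [sin(2pi * x / base^(k / K)) for k in 0:K-1],
    [cos(2pi * x / base^(k / K)) for k in 0:K-1],
)

embed(0.3, 8)   # 16 numbers that a Dense(16 => mn) first layer could consume instead of the raw x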
@potisseslikitap7605
@potisseslikitap7605 3 месяца назад
The sine function has a repeating structure. A very simple way for an MLP to fit a sine curve is to use the 'frac' function as the activation function in some layers. The network learns to fit one period of the sine function and then repeats this learned period according to its frequency using the frac layers.

import torch
import torch.nn as nn

class SinNet(nn.Module):
    def __init__(self):
        super(SinNet, self).__init__()
        self.fc1 = nn.Linear(1, 100)    # input layer
        self.fc2 = nn.Linear(100, 100)  # hidden layer
        self.fc3 = nn.Linear(100, 100)  # hidden layer
        self.fc4 = nn.Linear(100, 1)    # output layer

    def forward(self, x):
        x = self.fc1(x)
        x = torch.frac(x)
        x = torch.tanh(self.fc2(x))
        x = torch.tanh(self.fc3(x))
        x = torch.tanh(self.fc4(x))
        return x
@josephvanname3377
@josephvanname3377 3 месяца назад
The frac function is not continuous. We need continuity for gradient updates. Using the sine activation function works better for learning the sine function.
@potisseslikitap7605
@potisseslikitap7605 3 месяца назад
@@josephvanname3377 There is not always a need for a gradient to work. The weights of the first layer stay random since the derivative of the frac function does not exist, and thus this layer cannot be trained. The input data are multiplied by random values and passed through the frac function. The other layers can handle the repeating nature of the input using these scaled fractions.
@josephvanname3377
@josephvanname3377 3 месяца назад
@@potisseslikitap7605 Ok. If we have a fixed layer, then gradient descent is irrelevant. The only issue is that to make anything interesting, we do not want to explicitly program the periodicity into the network.
@LambOfDemyelination
@LambOfDemyelination 3 месяца назад
It's not going to be possible to approximate/extrapolate a periodic function when only non-periodic functions are involved (affine functions and non-periodic non-affine activation functions). I'd love to see what it looks like with a periodic activation function though, maybe a square wave, sawtooth wave, triangle wave, etc. A sawtooth wave would be a sort of periodic extension of the ReLU activation :)
@josephvanname3377
@josephvanname3377 3 месяца назад
The triangle wave is a periodic extension of ReLU activation. I have tried this experiment where a network with periodic activation mimics a periodic function, and things do work better in that case, but there is still the problem of high gradients. For example, if a function f from [0,1] to [-1,1] has many oscillations (like sin(500 x)), then its derivative would be large, and neural networks have a difficult time dealing with high derivatives. I may make a visualization of how I can solve this problem by first embedding the interval [0,1] into a high dimensional space and then passing it through a neural network only after I represent numbers in [0,1] as high dimensional vectors (this will be similar to the positional embeddings in transformers).
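For anyone who wants to try the periodic-activation experiment, one simple choice (the scaling here is my own) is a triangle wave that can be dropped straight into the layers described above:

triwave(x) = 1 - 2 * abs(mod(x, 2) - 1)   # period-2 triangle wave with range [-1, 1]
# e.g. use SkipConnection(Dense(mn => mn, triwave), +) in place of the atan layers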
@LambOfDemyelination
@LambOfDemyelination 3 месяца назад
@@josephvanname3377 I think a triangle wave is what's called the "even periodic extension" of y=x, but otherwise the regular periodic extension is just cropping y=x to some interval and copy-pasting the interval repeatedly. I was thinking, what about using a non-periodic activation that differentiates to a periodic one instead? And one that is still an increasing function, so as to avoid lots of local minima which you would get with a periodic one. Say, a climbing periodically extended ReLU centered at 0, [-L, L], for a period L: max(mod(x + L/2, L) - L/2, 0) + L/2 floor(x/L + 1/2), which differentiates to a square wave: 2 floor(x/L) - floor(2x/L) + 1
@edsanville
@edsanville 3 месяца назад
So, if I don't understand what I'm looking at, I *shouldn't* just throw a neural network at the problem?
@josephvanname3377
@josephvanname3377 3 месяца назад
I personally like using other machine learning algorithms besides neural networks. Neural networks are too uninterpretable and messy. And even with neural networks, one has to use the right architecture.
@sandeepreehal1018
@sandeepreehal1018 3 месяца назад
Where do you learn how to do this stuff? Alternatively, how do you make the visuals? Is it just the graph output and you string them together to make a video?
@josephvanname3377
@josephvanname3377 3 месяца назад
Yes. I am making the visuals frame by frame. First of all, I got a Ph.D. in Mathematics before I started messing with neural networks, so that is helpful. And programming neural networks is easy because of automatic differentiation. Automatic differentiation automatically produces the gradient of functions at points which I can use for gradient descent.
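As a tiny illustration of what automatic differentiation provides (Zygote is the autodiff library that Flux uses under the hood):

using Zygote

f(x) = sin(2pi * x)
gradient(f, 0.25)   # returns a tuple whose entry is the derivative 2*pi*cos(pi/2), which is approximately 0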
@Charles-m7j
@Charles-m7j 3 месяца назад
You can just dump the predictions from the population (a range of X values) into a file on disk (parquet) and create the videos after. Torch + Lightning can do this in maybe 150 lines of Python.
@johansunildaniel
@johansunildaniel 3 месяца назад
Feels like trying to bend a wire.
@josephvanname3377
@josephvanname3377 3 месяца назад
It is actually a former paperclip maximizer trying to bend a paperclip. The paperclip maximizer did its job and made a huge pile of paperclips, but now it must do something with those paperclips. It is now bending them into sine functions.
@greengreen110
@greengreen110 3 месяца назад
What could it have done to deserve such torture?
@josephvanname3377
@josephvanname3377 3 месяца назад
I don't know. But maybe the real question should be why these visualizations where the neural networks struggles are so much more popular than a network that mysteriously produces a hexagonal snowflake pattern.
@atomicgeneral
@atomicgeneral 3 месяца назад
I'd be v interested in seeing a graph of loss versus time: there seems to be a large region of time when nothing is learned followed by a short period of time over which loss drops significantly. What's going on then?
@josephvanname3377
@josephvanname3377 3 месяца назад
It seems like the network has a more difficult time when the sine function is turning. This is probably because the network is asymptotically a linear function and has a limited amount of space to curve (outside this space, the function is nearly a straight line), and the function encounters the difficulty each time it has to curve more.
@pixl237
@pixl237 3 месяца назад
I BEND IT WITH MY MIINNDD !!! (It's beginning to have a consciousness)
@buzinaocara
@buzinaocara 3 месяца назад
I wanted to hear the results.
@josephvanname3377
@josephvanname3377 3 месяца назад
I will think about that.
@spookynoodle3919
@spookynoodle3919 3 месяца назад
Is this network essentially working out the Taylor expansion?
@josephvanname3377
@josephvanname3377 3 месяца назад
The limit as x goes to infinity of N(x)/x will be the product of the first weight matrix with the final weight matrix. This is a different kind of behavior than we see with polynomial approximations. I therefore see no relation between Taylor series and the neural network approximation for sine.
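A quick self-contained check of this claim in Flux, using a freshly initialized copy of the same architecture (the probe points 10^6 and 2*10^6 are arbitrary large values):

using Flux

mn = 40
model = Chain(
    Dense(1 => mn),
    [SkipConnection(Dense(mn => mn, atan), +) for _ in 1:6]...,
    Dense(mn => 1),
)

# far from the origin the atan layers saturate, so the slope of N(x) approaches
# the product of the last and first weight matrices
x1, x2 = fill(1f6, 1, 1), fill(2f6, 1, 1)
println((model(x2) .- model(x1)) ./ (x2 .- x1))
println(model[end].weight * model[1].weight)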
@V1kToo
@V1kToo 3 месяца назад
Is this a demo on overfitting?
@josephvanname3377
@josephvanname3377 3 месяца назад
Yes. You can think of it that way even though this is not the typical example of how neural networks overfit. The sine function is a 1 dimensional function and the lack of dimensionality stresses the neural network.
@r-d-v
@r-d-v 3 месяца назад
I desperately wanted to hear the waveform as it evolved
@josephvanname3377
@josephvanname3377 3 месяца назад
Here, the waveform only goes through a few periods. It would be better if I used a periodic activation for a longer waveform.
@Jandodev
@Jandodev 3 месяца назад
Really Interesting!
@oldcowbb
@oldcowbb 2 месяца назад
Poor thing is just learning each wave one by one
@mineland8220
@mineland8220 3 месяца назад
3:30 bless you
@josephvanname3377
@josephvanname3377 3 месяца назад
The neural network appreciates your blessing. The network has been through a lot.
@anarchy5369
@anarchy5369 3 месяца назад
That was a weird transition, definitely of note
@darth_dan8886
@darth_dan8886 3 месяца назад
So what is the output of this network? I assume it is fed into some kind of approximant?
@josephvanname3377
@josephvanname3377 3 месяца назад
The neural network takes a single real number as an input and returns a single real number as an output.
@SriNiVi
@SriNiVi 3 месяца назад
What activation are you using? If it is ReLU then maybe a different activation might help?
@josephvanname3377
@josephvanname3377 3 месяца назад
I tried ReLU, but I did not post it (should I post it?). The only advantage that I know of from ReLU is that ReLU could easily approximate the triangle wave. ReLU has the same problem where it can only remember a few humps.
@CaridorcTergilti
@CaridorcTergilti 3 месяца назад
Can you please make a video comparing to NN learning with a second order optimizer?
@josephvanname3377
@josephvanname3377 3 месяца назад
I have made a couple of visualizations a couple weeks ago of the Hessian during gradient descent/ascent, but I may need to think about how to use second order optimization to make a decent visualization.
@CaridorcTergilti
@CaridorcTergilti 3 месяца назад
I mean, for example, this same plot but split-screen: one learning with Adam or SGD and the other one with a second-order method
@josephvanname3377
@josephvanname3377 3 месяца назад
@@CaridorcTergilti I would have to think about how to make that work; second order methods are more computationally intensive, so I would think about how to compare cheap computational methods with complicated computation.
@CaridorcTergilti
@CaridorcTergilti 3 месяца назад
​@@josephvanname3377for a network with 10k parameters like this one you will have no trouble at all
@maburwanemokoena7117
@maburwanemokoena7117 3 месяца назад
Neural network is the mother of all functions
@josephvanname3377
@josephvanname3377 3 месяца назад
The universal approximation theorem says that neural networks can approximate any continuous function they want in the topology of uniform convergence on compact sets. But there are other topologies on spaces of functions. Has anyone seen a version of the universal approximation theorem where the network not only approximates the function but also approximates all derivatives up to the k-th order uniformly on compact sets?
@rexeros8825
@rexeros8825 3 месяца назад
That network is too small for this, or you are training it the wrong way. If you train it through x=y, the network must be large enough to imagine the whole graph inside the network. From this video I can clearly see how one piece of information displaces another within the network. There just aren't enough layers to fully grasp this lesson. However, if you train the network through another graph, this will require fewer layers. However, it will be less universal. By adding layers, and training through the formula, you can then use this and teach even more complex functions without too much trouble.
@josephvanname3377
@josephvanname3377 3 месяца назад
It seems like if we represented the inputs using a positional embedding like the ones used with transformers, then the network would have a much easier time learning sine. But in that case, the visualization would just be an endless wave, so I would need to take its Fourier transform or convert the long wave into audio to represent it. A problem with positional embeddings, though, is that they already use sine. Networks like this one already have more than enough capacity to memorize a large amount of information, yet this network is unable to fit sine for very long despite that capacity. If we think about a network memorizing sin(nx) for large n over [0,1] instead, we can see a problem: the network must compute a function with a high derivative, so it must have very large gradients, and perhaps I can use something to counteract the large gradients.
@rexeros8825
@rexeros8825 3 месяца назад
@@josephvanname3377 Perception through sound in this case would be much simpler. Just like through visualization. Perception through formulas is somewhat more difficult, it seems to me. This type of work is more suitable for traditional computers. The neural network must be deep enough for such an analysis. (to use the formula to reproduce an ideal graph on any segment)
@josephvanname3377
@josephvanname3377 3 месяца назад
@@rexeros8825 Perception through sound would be possible, but this requires a bit of ear training. It requires training for people to distinguish even between a fourth and a fifth in music or between a square wave and a sawtooth wave. There is also a possibility that the sounds may be a bit unpleasant.
@rexeros8825
@rexeros8825 3 месяца назад
@@josephvanname3377 no, if you do FFT in hardware (before entering it into the neural network). Do you know that our ear breaks sound into frequencies before the sound enters the neural network? The neural network of our brain hears sound in the form of frequencies and amplitudes. To transmit a sine to the neural network, you only need to transmit 1 frequency and amplitude. For example, transmitting a triangle wave or a more complex wave will require transmitting a complex of frequencies.
@Simigema
@Simigema 3 месяца назад
It’s a party in the USA
@josephvanname3377
@josephvanname3377 3 месяца назад
Yeah. We all take coat hangers and shape them into sine functions at parties.
@ggimas
@ggimas 3 месяца назад
Is this a Feed Forward Neural Network? If so, this will never work (outside of the training range... and it will do very badly within it). You need a Recurrent Neural Network. Those can learn periodic functions.
@josephvanname3377
@josephvanname3377 3 месяца назад
This is a feedforward network. There are ways to make a network learn the sine function, but I wanted to make a visualization that shows how neural networks work. If I wanted something to learn the sine function, the network would be of the form N(x)=a(1-x/c_1)...(1-x/c_n) and the loss would be log(|N(x)|)-log(|sin(x)|) or something like that (I did not actually train this; I just assume it would work, but I need to experiment to be sure.).
@asheep7797
@asheep7797 3 месяца назад
stop torturing the network 😭
@josephvanname3377
@josephvanname3377 3 месяца назад
People tell me that I need to treat neural networks with kindness, but this sort of content (that is recommended by recommender systems which have neural networks) gets the most attention, so I am getting mixed messages.
@DorkOrc
@DorkOrc 3 месяца назад
This is so painful to watch 😭
@josephvanname3377
@josephvanname3377 3 месяца назад
I have made plenty of less 'painful' animations, but the audience here prefers to see the more painful visualizations instead of something like the spectrum of a completely positive superoperator that has perfect heptagonal symmetry.
@handyfrontend
@handyfrontend 3 месяца назад
Is this USDRUB analysis?
@josephvanname3377
@josephvanname3377 3 месяца назад
USD/RUB looks more like a Wiener process or at least a martingale instead of the sine function.
@TheNightOwl082
@TheNightOwl082 3 месяца назад
More Positive reinforcement!!
@tacitozetticci9308
@tacitozetticci9308 3 месяца назад
stop circulating our ex prime minister memes ✋️
@Swordfish42
@Swordfish42 3 месяца назад
It looks like it should be a sin to do that
@josephvanname3377
@josephvanname3377 3 месяца назад
Is it also a sin to get a tan?
@harshans7712
@harshans7712 3 месяца назад
First time seeing a function getting tortured
@josephvanname3377
@josephvanname3377 3 месяца назад
And yet, this is my most popular visualization. What can we learn from this?
@harshans7712
@harshans7712 3 месяца назад
@@josephvanname3377 we can learn the limitations of using linear activation functions in neural networks, yes this video was really intuitive
@harshans7712
@harshans7712 3 месяца назад
@@josephvanname3377 yes we can learn the limitations of using linear function in activation functions, and yes it was one of the best visualisation 🙌
@AntonM-z7s
@AntonM-z7s 3 месяца назад
where is a grokking phase?)))
@josephvanname3377
@josephvanname3377 3 месяца назад
I don't allow this network to grok. I simply increase the difficulty and make the network twist the curve more.
@AntonM-z7s
@AntonM-z7s 3 месяца назад
@@josephvanname3377 In any case, great visualization! Many people believe that neural networks perform very well outside of the training distribution, but that is not the case, and your video demonstrates this well.
@MessyMasyn
@MessyMasyn 3 месяца назад
"ai shits its pants when confronted with a sin wave"
@nedisawegoyogya
@nedisawegoyogya 3 месяца назад
Is it torture?
@josephvanname3377
@josephvanname3377 3 месяца назад
Well, this is my most popular visualization. Most of my visualizations show the AI working wonderfully, but they are not that popular. So this says a lot about all the people watching this and this says very little about me.
@nedisawegoyogya
@nedisawegoyogya 3 месяца назад
@@josephvanname3377 Hahaha very funny bro. Indeed, it's quite disturbing this kind of thing is funny.
@josephvanname3377
@josephvanname3377 3 месяца назад
@@nedisawegoyogya If I create a lot of content like this, you should just know that I am simply giving into the demands of the people here instead of creating stuff that I know is objectively nicer.
@matteopiccioni196
@matteopiccioni196 3 месяца назад
14 minutes for a 1D function come on
@josephvanname3377
@josephvanname3377 3 месяца назад
It takes that long to learn.
@matteopiccioni196
@matteopiccioni196 3 месяца назад
@@josephvanname3377 I know my friend, I would have reduced the video anyway!
@josephvanname3377
@josephvanname3377 3 месяца назад
@@matteopiccioni196 Ok. But a lot has happened in those 14 minutes since the network struggles so much.
@mr.sheldor794
@mr.sheldor794 3 месяца назад
Oh my god it is screaming for help
@c0ld_r3t4w
@c0ld_r3t4w 3 месяца назад
song?
@josephvanname3377
@josephvanname3377 3 месяца назад
I will add music to all my visualizations later.
@c0ld_r3t4w
@c0ld_r3t4w 3 месяца назад
@@josephvanname3377 That‘s cool, but maybe instead of a song you could make a note based on the avg y value in the training interval, or based on loss
@oro5421
@oro5421 2 месяца назад
That’s painful to watch
@Neomadra
@Neomadra 3 месяца назад
This video is very misleading since the sine function is a very bad example to demonstrate how the model is not able to extrapolate. The sine is not a trivial mathematical operation; it's an infinite series, a Taylor series. No finite neural network, and not your brain either, can extrapolate this function on an infinite domain. It might be that the model really wants to extrapolate, but it will never have enough neurons to perform the computation. Probably that's indeed the case, because looking at the plot it really looks like it's doing the Taylor series for the sine function, which is the absolutely optimal thing to do! Neural networks are just not suited for this; that's why we use calculators for these kinds of things. It's like asking the model to count to infinity
@strangeWaters
@strangeWaters 3 месяца назад
If the neural network had a periodic activation function it could fit sin perfectly though
@josephvanname3377
@josephvanname3377 3 месяца назад
There is a big gap between the inability to extrapolate over the entire field of real numbers and the inability to extrapolate a little bit beyond the training interval. And polynomials can only approximate the sine function on a finite interval since an nth degree polynomial has n roots (on the complex plane counting multiplicity).
@DaSquyd
@DaSquyd 3 месяца назад
"Mom! can we get a taylor series of sin(x)?" "We have a taylor series at home"
@josephvanname3377
@josephvanname3377 3 месяца назад
Thank you for this completely original and totally non-copied joke. It is very funny.
@DaSquyd
@DaSquyd 3 месяца назад
@@josephvanname3377 all in a day's work