Coursera | Andrew Ng (02-week3-3


In the last video, you saw how sampling at random over the range of hyperparameters can allow you to search over the space of hyperparameters more efficiently. But it turns out that sampling at random doesn't mean sampling uniformly at random over the range of valid values. Instead, it's important to pick the appropriate scale on which to explore the hyperparameters. In this video, I want to show you how to do that.
Let's say that you're trying to choose the number of hidden units, n[l], for a given layer l. And let's say that you think a good range of values is somewhere from 50 to 100. In that case, if you look at the number line from 50 to 100, maybe you pick some values at random within this number line. That's a fairly intuitive way to search for this particular hyperparameter. Or if you're trying to decide on the number of layers in your neural network, which we're calling capital L, maybe you think the total number of layers should be somewhere between 2 and 4. Then sampling uniformly at random among 2, 3 and 4 might be reasonable. Or even using a grid search, where you explicitly evaluate the values 2, 3 and 4, might be reasonable.
So these were a couple of examples where sampling uniformly at random over the range you're contemplating might be a reasonable thing to do. But this is not true for all hyperparameters.
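The two linear-scale cases above can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the variable names and the fixed seed are my own choices and do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed fixed only so the sketch is reproducible

# Number of hidden units n[l]: an integer drawn uniformly from 50..100 inclusive.
# Generator.integers excludes the upper bound by default, hence 101.
n_hidden = int(rng.integers(50, 101))

# Number of layers L: drawn uniformly from {2, 3, 4} ...
num_layers = int(rng.integers(2, 5))

# ... or, with so few candidates, simply list them all as a grid search would.
layer_grid = [2, 3, 4]

print(n_hidden, num_layers, layer_grid)
```

With only three candidate values for L, evaluating the whole grid costs about the same as random sampling, which is why either approach is reasonable here.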
Let's look at another example. Say you're searching for the hyperparameter alpha, the learning rate. And let's say that you suspect 0.0001 might be on the low end, or maybe it could be as high as 1. Now if you draw the number line from 0.0001 to 1 and sample values uniformly at random over this number line, about 90% of the values you sample would be between 0.1 and 1. So you're using 90% of the resources to search between 0.1 and 1, and only 10% of the resources to search between 0.0001 and 0.1. That doesn't seem right.
Instead, it seems more reasonable to search for hyperparameters on a log scale. Instead of using a linear scale, you'd have 0.0001 here, and then 0.001, 0.01, 0.1, and then 1. And you instead sample uniformly at random on this type of logarithmic scale. Now you have more resources dedicated to searching between 0.0001 and 0.001, and between 0.001 and 0.01, and so on.
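The 90% figure can be checked empirically. The sketch below assumes NumPy; the sample count, seed, and variable names are illustrative. It compares how many samples land in the top decade, 0.1 to 1, under each scale.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
n_samples = 100_000
low, high = 1e-4, 1.0

# Linear scale: alpha drawn uniformly from [0.0001, 1).
linear = rng.uniform(low, high, size=n_samples)

# Log scale: exponent drawn uniformly from [-4, 0), then alpha = 10 ** exponent.
log_scale = 10.0 ** rng.uniform(np.log10(low), np.log10(high), size=n_samples)

# Fraction of samples falling in the top decade [0.1, 1).
frac_linear_top = np.mean(linear >= 0.1)   # close to 0.9 in expectation
frac_log_top = np.mean(log_scale >= 0.1)   # close to 0.25 in expectation

print(f"linear: {frac_linear_top:.3f}, log: {frac_log_top:.3f}")
```

On the log scale each of the four decades between 0.0001 and 1 receives roughly a quarter of the samples, so the small learning rates are no longer starved of search effort.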
So in Python, the way you implement this is to let r = -4 * np.random.rand(). And then a randomly chosen value of alpha would be alpha = 10 to the power of r, written 10 ** r in Python. So after this first line, r will be a random number between -4 and 0. And so alpha here will be between 10 to the -4 and 10 to the 0. So 10 to the -4 is this left thing, this 10 to the -4. And 1 is 10 to the 0.
In a more general case, if you're trying to sample between 10 to the a and 10 to the b on the log scale: in this example, the left value is 10 to the a, and you can figure out what a is by taking the log base 10 of 0.0001, which is going to tell you a is -4. And this value on the right, this is 10 to the b. And you can figure out what b is by taking the log base 10 of 1, which tells you b is equal to 0. So what you do is then sample r uniformly at random between a and b. So in this case, r would be between -4 and 0. And you can set alpha, your randomly sampled hyperparameter value, as 10 to the r, okay?
So just to recap, to sample on the log scale, you take the low value and take logs to figure out what a is.
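The general procedure described above (take log base 10 of both endpoints to get a and b, sample r uniformly between them, return 10 to the r) might be sketched as below. The function name `sample_log_uniform` and its parameters are hypothetical. It also uses NumPy's `Generator.uniform` rather than the lecture's `np.random.rand()`, which is a substitution on my part so that arbitrary a and b are handled directly.

```python
import numpy as np

def sample_log_uniform(low, high, size=None, rng=None):
    """Draw values between low and high, uniform on a log10 scale.

    Hypothetical helper for illustration; requires 0 < low < high.
    """
    if not (0 < low < high):
        raise ValueError("need 0 < low < high for a log scale")
    rng = np.random.default_rng() if rng is None else rng
    a = np.log10(low)    # e.g. log10(0.0001) = -4
    b = np.log10(high)   # e.g. log10(1) = 0
    r = rng.uniform(a, b, size=size)  # r uniform between a and b
    return 10.0 ** r                  # alpha = 10 ** r

# Example: five candidate learning rates between 0.0001 and 1.
alphas = sample_log_uniform(1e-4, 1.0, size=5, rng=np.random.default_rng(0))
print(alphas)
```

For the specific range in the lecture, `-4 * np.random.rand()` produces the same distribution of r, since a = -4 and b = 0.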