mindspore.nn.Adam
Published: 2024-08-12 03:15:31
\[\begin{split}\begin{array}{l}
&\newline
&\hline \\
&\textbf{Parameters}: \: 1^{\text{st}} \: \text{moment vector} \: m , \: 2^{\text{nd}} \:
\text{moment vector} \: v , \\
&\: \text{gradients } g, \: \text{learning rate} \: \gamma, \:
\text{exponential decay rates for the moment estimates} \: \beta_{1} \: \beta_{2} , \\
&\: \text{parameter vector} \: w_{0}, \: \text{timestep} \: t , \: \text{weight decay } \lambda \\
&\textbf{Init}: m_{0} \leftarrow 0, \: v_{0} \leftarrow 0, \: t \leftarrow 0, \:
\text{init parameter vector} \: w_{0} \\[-1.ex]
&\newline
&\hline \\
&\textbf{while} \: w_{t} \: \text{not converged} \: \textbf{do} \\
&\hspace{5mm}\boldsymbol{g}_{t} \leftarrow
\nabla_{w} \boldsymbol{f}_{t}\left(\boldsymbol{w}_{t-1}\right) \\
&\hspace{5mm}\textbf{if } \lambda \neq 0 \\
&\hspace{10mm}\boldsymbol{g}_{t} \leftarrow \boldsymbol{g}_{t}+\lambda \boldsymbol{w}_{t-1} \\
&\hspace{5mm}\boldsymbol{m}_{t} \leftarrow \beta_{1} \boldsymbol{m}_{t-1}+\left(1-\beta_{1}\right)
\boldsymbol{g}_{t} \\
&\hspace{5mm}\boldsymbol{v}_{t} \leftarrow \beta_{2} \boldsymbol{v}_{t-1}+\left(1-\beta_{2}\right)
\boldsymbol{g}_{t}^{2} \\
&\hspace{5mm}\hat{\boldsymbol{m}}_{t} \leftarrow \boldsymbol{m}_{t} /\left(1-\beta_{1}^{t}\right) \\
&\hspace{5mm}\hat{\boldsymbol{v}}_{t} \leftarrow \boldsymbol{v}_{t} /\left(1-\beta_{2}^{t}\right) \\
&\hspace{5mm}\boldsymbol{w}_{t} \leftarrow \boldsymbol{w}_{t-1}-\gamma \hat{\boldsymbol{m}}_{t}
/(\sqrt{\hat{\boldsymbol{v}}_{t}}+\epsilon) \\
&\textbf{end while} \\[-1.ex]
&\newline
&\hline \\[-1.ex]
&\textbf{return} \: \boldsymbol{w}_{t} \\[-1.ex]
&\newline
&\hline \\[-1.ex]
\end{array}\end{split}\]
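
The update rule above can be sketched in plain NumPy to make each step concrete. This is only an illustration of the pseudocode, not MindSpore's implementation: the helper names `adam_step` and `minimize_quadratic`, the default hyperparameter values, and the toy objective f(w) = sum(w^2) are all assumptions made for the example. The timestep `t` is assumed to be 1-based, incremented by the caller once per iteration.

```python
import numpy as np


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam update following the pseudocode (hypothetical helper).

    w: parameters, g: gradient at w, m/v: first/second moment estimates,
    t: 1-based step count. Returns the updated (w, m, v).
    """
    if weight_decay != 0:
        # Weight decay is folded into the gradient (the lambda branch).
        g = g + weight_decay * w
    m = beta1 * m + (1 - beta1) * g        # first moment estimate
    v = beta2 * v + (1 - beta2) * g * g    # second moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


def minimize_quadratic(steps=2000, lr=0.01):
    """Run Adam on the toy objective f(w) = sum(w**2) (illustrative only)."""
    w = np.array([3.0, -2.0])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = 2.0 * w                        # gradient of sum(w**2)
        w, m, v = adam_step(w, g, m, v, t, lr=lr)
    return w


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the first step the bias-corrected ratio reduces to g / (|g| + eps), so each coordinate moves by roughly the learning rate regardless of the gradient's magnitude.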