A few days ago I implemented my first full neural network in Octave. Nothing too major, just a three-layer network recognising hand-written letters. Even though I finally understood what a neural network is, this was still a cool challenge.

Yes, even with all the support from ml-class ... they practically implement everything and leave only the cost and gradient functions for you to write. Then again, Octave provides tools for learning where you essentially just run a function, tell it where to find the cost and gradient function, and give it some data.

Then the magic happens.

Getting the basic implementation to work is really simple since the formulas being used aren't all that complex.
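For reference, the code in this post appears to compute the standard regularized cross-entropy cost and backpropagation gradients for a sigmoid network. The notation here is an assumption chosen to match the variable names in the code: $g$ is the sigmoid, $K$ is `num_labels`, $m$ is the number of training examples, and the first (bias) column of each $\Theta^{(l)}$ is left out of the regularization.

$$
J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log a_k^{(3)(i)} - \left(1-y_k^{(i)}\right)\log\left(1-a_k^{(3)(i)}\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{2}\sum_{j}\sum_{k\ge 2}\left(\Theta_{jk}^{(l)}\right)^2
$$

$$
\delta^{(3)} = a^{(3)} - y, \qquad
\delta^{(2)} = \left(\Theta^{(2)}\right)^{T}\delta^{(3)} \odot g'\!\left(z^{(2)}\right), \qquad
g'(z) = g(z)\left(1-g(z)\right)
$$

$$
\frac{\partial J}{\partial \Theta_{jk}^{(l)}} = \frac{1}{m}\sum_{i=1}^{m}\delta_j^{(l+1)(i)}\,a_k^{(l)(i)} + \frac{\lambda}{m}\Theta_{jk}^{(l)} \quad (k \ge 2;\ \text{the } \lambda \text{ term is dropped for the bias column } k = 1)
$$

The bias entry of $\delta^{(2)}$ is discarded before it is accumulated, which is what `delta2 = delta2(2:end)` does in the code.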

Here's the code I've come up with to get this working on a three layer network:

```octave
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, hidden_layer_size, ...
%   num_labels, X, y, lambda) computes the cost and gradient of the neural
%   network. The parameters for the neural network are "unrolled" into the
%   vector nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% One-hot encode the labels: row i has a 1 in column y(i)
yy = zeros(m, num_labels);
for i=1:m
  yy(i,y(i)) = 1;
end

% Add the bias column
X = [ones(m,1) X];

% cost
for i=1:m
  a1 = X(i,:);
  z2 = Theta1*a1';
  a2 = sigmoid(z2);
  z3 = Theta2*[1; a2];
  a3 = sigmoid(z3);

  J += -yy(i,:)*log(a3)-(1-yy(i,:))*log(1-a3);
end

J /= m;

% regularization, skipping the bias column of each weight matrix
J += (lambda/(2*m))*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));

for t=1:m
  % forward pass
  a1 = X(t,:);
  z2 = Theta1*a1';
  a2 = [1; sigmoid(z2)];
  z3 = Theta2*a2;
  a3 = sigmoid(z3);

  % backprop
  delta3 = a3-yy(t,:)';
  delta2 = (Theta2'*delta3).*[1; sigmoidGradient(z2)];
  delta2 = delta2(2:end);  % drop the bias entry

  Theta1_grad = Theta1_grad + delta2*a1;
  Theta2_grad = Theta2_grad + delta3*a2';
end

Theta1_grad = (1/m)*Theta1_grad+(lambda/m)*[zeros(size(Theta1, 1), 1) Theta1(:,2:end)];
Theta2_grad = (1/m)*Theta2_grad+(lambda/m)*[zeros(size(Theta2, 1), 1) Theta2(:,2:end)];

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
```

This then basically gets pumped into the *fmincg* function and on the other end a result pops out.
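The hand-over pattern can be sketched on a toy problem. `fmincg` comes with the ml-class exercises rather than with Octave itself, so this sketch swaps in Octave's built-in `fminunc` as a stand-in, on the assumption that both accept a handle to a function returning `[J, grad]`. The cost function `toyCost` and its `target` are hypothetical and exist only for illustration.

```octave
1;  % script marker so the function below can live in the same script file

% Hypothetical toy cost: squared distance from a fixed target, plus its gradient
function [J, grad] = toyCost(theta, target)
  d = theta - target;
  J = sum(d .^ 2);
  grad = 2 * d;
end

target = [1; 2];

% 'GradObj' 'on' tells the optimizer that the function also returns the gradient
options = optimset('GradObj', 'on', 'MaxIter', 50);

% Wrap the extra arguments in a handle so the optimizer only sees the parameters
costFunction = @(t) toyCost(t, target);

[theta, cost] = fminunc(costFunction, [0; 0], options);
```

With the real network, the handle would wrap `nnCostFunction` and its extra arguments in the same way, and the starting point would be the unrolled initial weights.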

Now, I've managed to vectorize this thing to the edge of my capabilities. But I know it's still just matrix multiplication so I know for a fact it should be possible to vectorize even further. Anyone know how to do that?
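One possible answer, as a sketch: stack all `m` examples as rows and let matrix products replace both `for` loops. The function name `nnCostVectorized` is hypothetical, `sigmoid` is redefined here so the block stands alone, and the sigmoid gradient is inlined instead of calling `sigmoidGradient`. It assumes `y` is a column vector of integer labels from 1 to `num_labels`.

```octave
1;  % script marker so the functions below can live in one script file

function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end

% Hypothetical vectorized variant; same arguments and unrolling as nnCostFunction
function [J, grad] = nnCostVectorized(nn_params, input_layer_size, ...
                                      hidden_layer_size, num_labels, ...
                                      X, y, lambda)
  split = hidden_layer_size * (input_layer_size + 1);
  Theta1 = reshape(nn_params(1:split), hidden_layer_size, input_layer_size + 1);
  Theta2 = reshape(nn_params(split + 1:end), num_labels, hidden_layer_size + 1);
  m = size(X, 1);

  % One-hot labels without a loop: row i is row y(i) of the identity matrix
  Y = eye(num_labels)(y, :);

  % Forward pass for every example at once (one example per row)
  A1 = [ones(m, 1) X];
  S2 = sigmoid(A1 * Theta1');
  A2 = [ones(m, 1) S2];
  A3 = sigmoid(A2 * Theta2');

  % Cross-entropy cost plus regularization that skips the bias columns
  J = sum(sum(-Y .* log(A3) - (1 - Y) .* log(1 - A3))) / m;
  J += (lambda / (2 * m)) * (sum(sum(Theta1(:, 2:end) .^ 2)) + ...
                             sum(sum(Theta2(:, 2:end) .^ 2)));

  % Backprop: each row of D3 and D2 is the delta for one example
  D3 = A3 - Y;
  D2 = (D3 * Theta2(:, 2:end)) .* S2 .* (1 - S2);

  % The sums over examples become matrix products
  Theta1_grad = (D2' * A1) / m;
  Theta2_grad = (D3' * A2) / m;
  Theta1_grad(:, 2:end) += (lambda / m) * Theta1(:, 2:end);
  Theta2_grad(:, 2:end) += (lambda / m) * Theta2(:, 2:end);

  grad = [Theta1_grad(:); Theta2_grad(:)];
end
```

Multiplying by `Theta2(:, 2:end)` skips the bias weights up front, which replaces the step of padding with a 1 and then dropping the first entry of `delta2`. The `eye(num_labels)(y, :)` indexing is Octave syntax and would need a temporary variable in MATLAB.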

Also, if you know of a cool way to generalize the algorithm so it would work on bigger networks, I'd love to hear about that as well!
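One way this might generalize, as a sketch: keep the weight matrices in a cell array and loop over layers instead of naming `Theta1` and `Theta2`. The function name `nnCostDeep` and the `layer_sizes` argument (a vector such as `[400 25 10]`, input layer first) are hypothetical. The sketch assumes sigmoid activations everywhere and the same unrolling order as the code above.

```octave
1;  % script marker so the functions below can live in one script file

function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end

% Hypothetical generalization: layer_sizes = [input, hidden..., output]
function [J, grad] = nnCostDeep(nn_params, layer_sizes, X, y, lambda)
  L = numel(layer_sizes);
  m = size(X, 1);

  % Roll the parameter vector back up into one weight matrix per layer
  Theta = cell(L - 1, 1);
  pos = 0;
  for l = 1:L-1
    rows = layer_sizes(l + 1);
    cols = layer_sizes(l) + 1;
    Theta{l} = reshape(nn_params(pos + 1:pos + rows * cols), rows, cols);
    pos += rows * cols;
  end

  Y = eye(layer_sizes(end))(y, :);  % one-hot labels

  % Forward pass: every layer except the output gets a bias column
  A = cell(L, 1);
  A{1} = [ones(m, 1) X];
  for l = 2:L
    S = sigmoid(A{l - 1} * Theta{l - 1}');
    if l < L
      A{l} = [ones(m, 1) S];
    else
      A{l} = S;
    end
  end

  J = sum(sum(-Y .* log(A{L}) - (1 - Y) .* log(1 - A{L}))) / m;
  for l = 1:L-1
    J += (lambda / (2 * m)) * sum(sum(Theta{l}(:, 2:end) .^ 2));
  end

  % Backward pass: D holds the deltas of layer l + 1, one example per row
  G = cell(L - 1, 1);
  D = A{L} - Y;
  for l = L-1:-1:1
    G{l} = (D' * A{l}) / m;
    G{l}(:, 2:end) = G{l}(:, 2:end) + (lambda / m) * Theta{l}(:, 2:end);
    if l > 1
      S = A{l}(:, 2:end);  % activations of layer l without the bias column
      D = (D * Theta{l}(:, 2:end)) .* S .* (1 - S);
    end
  end

  % Unroll the gradients in the same order as the parameters
  grad = [];
  for l = 1:L-1
    grad = [grad; G{l}(:)];
  end
end
```

With `layer_sizes = [input_layer_size hidden_layer_size num_labels]` this should reduce to the three-layer case, and adding hidden layers only means adding entries to the vector.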

