    I suck at implementing neural networks in Octave

    A few days ago I implemented my first full neural network in Octave. Nothing too major, just a three layer network recognising hand-written letters. Even though I finally understand what a neural network is, implementing one was still a cool challenge.

    Yes, even despite having so much support from ml-class ... they practically implement everything and leave only the cost and gradient functions up to you. Then again, Octave's optimization tools make the learning part simple: you essentially run a function, tell it where to find the cost and gradient functions, and give it some data.

    Then the magic happens.

    Getting the basic implementation to work is really simple since the formulas being used aren't all that complex:

    Neural network cost function (the regularization sum skips the bias columns):

    $$J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log\left( h_\Theta(x^{(i)})_k \right) - \left(1 - y_k^{(i)}\right) \log\left( 1 - h_\Theta(x^{(i)})_k \right) \right] + \frac{\lambda}{2m} \sum_{l} \sum_{j,k} \left( \Theta_{j,k}^{(l)} \right)^2$$

    Neural network backpropagation:

    $$\delta^{(3)} = a^{(3)} - y, \qquad \delta^{(2)} = \left( \Theta^{(2)} \right)^T \delta^{(3)} \circ g'\left( z^{(2)} \right)$$

    where $g'(z) = g(z)\left(1 - g(z)\right)$ is the sigmoid gradient.

    Here's the code I've come up with to get this working on a three layer network:

    function [J grad] = nnCostFunction(nn_params, ...
                                       input_layer_size, ...
                                       hidden_layer_size, ...
                                       num_labels, ...
                                       X, y, lambda)
    %NNCOSTFUNCTION Implements the neural network cost function for a three layer
    %neural network (one hidden layer, i.e. two weight matrices) which performs
    %classification
    %   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, hidden_layer_size, ...
    %   num_labels, X, y, lambda) computes the cost and gradient of the neural
    %   network. The parameters for the neural network are "unrolled" into the
    %   vector nn_params and need to be converted back into the weight matrices.
    %
    %   The returned parameter grad should be an "unrolled" vector of the
    %   partial derivatives of the neural network.
    %
    
    % Reshape nn_params back into the parameters Theta1 and Theta2, the weight
    % matrices for our three layer network
    Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                     hidden_layer_size, (input_layer_size + 1));
    
    Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                     num_labels, (hidden_layer_size + 1));
    
    % Setup some useful variables
    m = size(X, 1);
    
    % You need to return the following variables correctly
    J = 0;
    Theta1_grad = zeros(size(Theta1));
    Theta2_grad = zeros(size(Theta2));
    
    % Recode the labels y as one-hot rows of yy
    yy = zeros(m, num_labels);
    for i = 1:m
      yy(i, y(i)) = 1;
    end
    
    % Add the bias column to X
    X = [ones(m,1) X];

    % Cost: forward pass, one training example at a time
    for i = 1:m
      a1 = X(i,:);
      z2 = Theta1*a1';
      a2 = sigmoid(z2);
      z3 = Theta2*[1; a2];  % prepend the bias unit
      a3 = sigmoid(z3);
    
      J += -yy(i,:)*log(a3)-(1-yy(i,:))*log(1-a3);
    end
    
    J /= m;
    
    J += (lambda/(2*m))*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
    
    % Backpropagation: accumulate gradients one example at a time
    for t = 1:m
      % forward pass
      a1 = X(t,:);
      z2 = Theta1*a1';
      a2 = [1; sigmoid(z2)];
      z3 = Theta2*a2;
      a3 = sigmoid(z3);
    
      % backprop
      delta3 = a3-yy(t,:)';
      delta2 = (Theta2'*delta3).*[1; sigmoidGradient(z2)];
      delta2 = delta2(2:end);  % drop the bias unit's delta
    
      Theta1_grad = Theta1_grad + delta2*a1;
      Theta2_grad = Theta2_grad + delta3*a2';
    end
    
    % Average over all examples and regularize, skipping the bias columns
    Theta1_grad = (1/m)*Theta1_grad + (lambda/m)*[zeros(size(Theta1, 1), 1) Theta1(:,2:end)];
    Theta2_grad = (1/m)*Theta2_grad + (lambda/m)*[zeros(size(Theta2, 1), 1) Theta2(:,2:end)];
    
    % Unroll gradients
    grad = [Theta1_grad(:) ; Theta2_grad(:)];
    
    end
    

    This then basically gets pumped into the fmincg function and on the other end a result pops out.
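    For the curious, the call itself looks roughly like this. Note that fmincg ships with the ml-class exercise code and initial_nn_params is the randomly initialized, unrolled weight vector the course has you create, so take this as a sketch of the setup rather than a full listing:

    % Cap the number of optimization iterations
    options = optimset('MaxIter', 50);

    % Pin down everything except the parameter vector
    costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                       num_labels, X, y, lambda);

    % fmincg behaves like fminunc, but copes better with this many parameters
    [nn_params, cost] = fmincg(costFunction, initial_nn_params, options);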

    Now, I've vectorized this thing to the edge of my capabilities. But it's still just matrix multiplication, so I know for a fact it should be possible to vectorize even further. Anyone know how to do that?
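    For concreteness, I think a fully vectorized version would look something like this, reusing the variable names from above, but it's untested and I haven't convinced myself it's right:

    % One-hot encode the labels in a single expression (Octave lets you index
    % the result of eye() directly)
    yy = eye(num_labels)(y, :);              % m x num_labels

    % Forward pass for all m examples at once; X already has its bias column
    Z2 = X * Theta1';                        % m x hidden_layer_size
    A2 = [ones(m, 1) sigmoid(Z2)];           % m x (hidden_layer_size + 1)
    A3 = sigmoid(A2 * Theta2');              % m x num_labels

    % Cost, summed over examples and labels in one go
    J = (1/m) * sum(sum(-yy .* log(A3) - (1 - yy) .* log(1 - A3)));
    J += (lambda/(2*m)) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));

    % Backprop for all examples at once
    Delta3 = A3 - yy;                        % m x num_labels
    Delta2 = Delta3 * Theta2;                % m x (hidden_layer_size + 1)
    Delta2 = Delta2(:, 2:end) .* sigmoidGradient(Z2);

    % Average and regularize, skipping the bias columns
    Theta1_grad = (1/m) * (Delta2' * X) + (lambda/m) * [zeros(size(Theta1,1),1) Theta1(:,2:end)];
    Theta2_grad = (1/m) * (Delta3' * A2) + (lambda/m) * [zeros(size(Theta2,1),1) Theta2(:,2:end)];

    The loops should disappear because summing per-example outer products like delta2*a1 is exactly what the matrix products Delta2' * X and Delta3' * A2 compute.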

    Also, if you know of a cool way to generalize the algorithm so it would work on bigger networks, I'd love to hear about that as well!
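    My own half-formed idea is to keep the weight matrices in a cell array, with Thetas{l} mapping layer l to layer l+1, and loop over layers. The names here (Thetas, Theta_grads, x, and yv for the one-hot label vector) are made up for the sketch, and I haven't tested any of it:

    L = numel(Thetas) + 1;                   % total number of layers

    % Forward pass for one example x, keeping every activation around
    as = cell(L, 1); zs = cell(L, 1);
    as{1} = [1; x(:)];                       % input plus its bias unit
    for l = 1:L-1
      zs{l+1} = Thetas{l} * as{l};
      if l < L-1
        as{l+1} = [1; sigmoid(zs{l+1})];     % hidden layers get a bias unit
      else
        as{l+1} = sigmoid(zs{l+1});          % the output layer doesn't
      end
    end

    % Backward pass
    deltas = cell(L, 1);
    deltas{L} = as{L} - yv;
    for l = L-1:-1:2
      d = Thetas{l}' * deltas{l+1};
      deltas{l} = d(2:end) .* sigmoidGradient(zs{l});   % drop the bias delta
    end

    % One gradient accumulator per weight matrix
    for l = 1:L-1
      Theta_grads{l} = Theta_grads{l} + deltas{l+1} * as{l}';
    end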

    Published on November 15th, 2011 in Artificial intelligence, Machine learning, Neural network, Octave
