Netpicking Part 2: Generating the networks
In Netpicking Part 1, I described a dilemma in picking a neural network for MNIST. I went through summary stats for 1001 different generated networks. This post explains how I generated these networks.
Representing a Neural Network
The MNIST problem requires finding a function
$$ \mathit{f} : \mathbb{R}^{784} \rightarrow \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\} $$
such that $ \mathit{f} $ performs well on a target dataset. I can solve this as a multi-class classification problem using neural networks, and constrain the space of functions $ \mathit{f} $:
- Every $ \mathit{f} $ must accept an input of size `784`
- Every $ \mathit{f} $ must provide an output of size `10` for each input
- $ \mathit{f} $ is trained using cross-entropy loss, i.e. the output goes through a `SoftMax` layer
A neural network can be represented as:
- a parameterized function, used in textbooks when teaching the theory
- a directed acyclic graph or DAG, which provides a visually friendly representation of the flow of operations
- text obeying a particular grammar, which is how neural nets are described in a programming language
For example, in PyTorch, a sample $ \mathit{f} $ satisfying the above constraints is represented like this:
```python
from torch import nn

class Basic(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
        self.l1 = nn.Linear(in_features=784,  # constraint 1
                            out_features=10,  # constraint 2
                            bias=True)
        self.ac = nn.LogSoftmax(dim=1)        # constraint 3

    def forward(self, x):
        # DAG represented in text as function calls.
        x = self.l1(x)
        x = self.ac(x)
        return x
```
How about a sample using 2D convolutions?
```python
class Conv2dReLU_12(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
        self.f0 = nn.Conv2d(in_channels=1, out_channels=62, kernel_size=(1, 1), bias=True)
        self.f1 = nn.ReLU()
        self.f2 = nn.Conv2d(in_channels=62, out_channels=18, kernel_size=(5, 5),)
        self.f3 = nn.Conv2d(in_channels=18, out_channels=22, kernel_size=(11, 11), bias=True)
        self.f4 = nn.ReLU()
        self.f5 = nn.Conv2d(in_channels=22, out_channels=10, kernel_size=(14, 14),)
        self.f6 = nn.LogSoftmax(dim=1)        # constraint 3

    def forward(self, *inputs):
        x = inputs[0]
        x = x.view(x.shape[0], 1, 28, 28)     # constraint 1
        # DAG represented in function calls.
        x = self.f0(x)
        x = self.f1(x)
        x = self.f2(x)
        x = self.f3(x)
        x = self.f4(x)
        x = self.f5(x)
        x = x.view(x.shape[0], 10)            # constraint 2
        x = self.f6(x)
        return x
```
Now for a leap-of-faith generalization: every function $ \mathit{f} $ that satisfies the above constraints can follow this template:
```python
class Network(nn.Module):
    def __init__(self):  # possibly some args, kwargs
        nn.Module.__init__(self)
        # a sequence of layer declarations
        self.activation = nn.LogSoftmax(dim=1)

    def forward(self, *inputs):
        x = inputs[0]
        # check constraint 1
        # represent DAG in function calls
        # check constraint 2
        x = self.activation(x)  # check constraint 3
        return x
```
Having networks follow this template would save time when writing boilerplate code for train/validation/test cycles. Let's add another simplifying constraint: if the neural network DAG is forced to be a straight line, the function calls in the `forward` method can be in the same order as the declarations. How do I start designing such a template?
Jinja2
From the `Jinja2` website (emphasis mine):

> Jinja is a modern and designer-friendly templating language for Python, modelled after Django's templates. […] A Jinja template is simply a text file. Jinja can generate **any text-based format**.

Any text-based format, so the above Python code block also applies. The `Jinja2` templating language provides mathematical operators, logical operators, `if-else`, and `for` statements. If I create a template similar to the `Network` class above, ~~instantiating~~[^1] rendering that template with different parameters should get the 1000 networks. Each network must have:
- (Constraint 1): an `input_shape` member, which can be used to shape the input.[^2]
- a sequence of declarations. Naming the layers is simple (a loop with `self.f1`, `self.f2`…), but generating the layer declaration on the RHS seems complicated.
- a sequence of function calls (the simplified DAG) in the `forward` method. A loop with `x = self.f{{ i }}(x)`.
- (Constraint 2): its output shape cast to `x,10` after all the function calls.
- (Constraint 3): a `LogSoftmax` layer after the template declarations, and call it last.
Is declaring a layer really that complex? Let’s look at it again:
```python
self.f5 = nn.Conv2d(in_channels=22, out_channels=10, kernel_size=(14, 14),)
```
Suppose I had an object `x` of type `Conv2d` such that `str(x)` returned `"Conv2d(in_channels=22, out_channels=10)"`. Are there classes like this?
The Python standard library provides `collections.namedtuple`, which has the right format for stringified output. But then I need to write `namedtuple` equivalents for so many classes! I wonder if there is a way to examine (or inspect) the methods of a class to produce a `namedtuple`.
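Before getting to that, a quick illustration (a sketch for this post, not code from the generation script) of why `namedtuple` has the right stringified format:

```python
from collections import namedtuple

# A namedtuple's repr already looks like the right-hand side of a layer declaration.
Conv2dSpec = namedtuple("Conv2d", ["in_channels", "out_channels", "kernel_size"])
print(Conv2dSpec(in_channels=22, out_channels=10, kernel_size=(14, 14)))
# Conv2d(in_channels=22, out_channels=10, kernel_size=(14, 14))
```

That repr matches the right-hand side of the `self.f5` declaration above almost character for character.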
inspect
From the documentation, the `inspect` module in the Python standard library allows one to (emphasis mine):

> […] get information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects […] examine the contents of a class, retrieve the source code of a method, **extract and format the argument list for a function**, or get all the information needed to display a detailed traceback.
For a class `A`, I'd like to get a `namedtuple` that has the same arguments and defaults as `A.__init__`, so that I can generate a string `A(par1=val1, par2=val2)`. `inspect` is perfect for this.
```python
from collections import namedtuple
import inspect

from torch import nn


def get_namedtuple(obj):
    klass = obj if inspect.isclass(obj) else type(obj)
    sig = inspect.signature(klass.__init__)
    params = {}
    for name, par in sig.parameters.items():
        # skip self, *args, and **kwargs
        if name == "self" or par.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        params[name] = ""
        if par.default != inspect.Parameter.empty:
            params[name] = par.default
    tmpl_string = namedtuple(klass.__name__, tuple(params.keys()))
    tmpl_string.__new__.__defaults__ = tuple(params.values())
    return tmpl_string


print(get_namedtuple(nn.ReLU)(inplace=True))
# ReLU(inplace=True)
print(get_namedtuple(nn.Linear)(in_features=2, out_features=3))
# Linear(in_features=2, out_features=3, bias=True)
```
Good enough; with the appropriate parameters, I can instantiate a `namedtuple` that prints the exact layer declaration I want.[^3]
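To see how the pieces fit together, here is a minimal sketch of a `Jinja2` template shaped like the `Network` class from earlier. This is an illustration rather than the actual `randonet` template; it reuses the `get_namedtuple` helper above and assumes `jinja2` and `torch` are importable.

```python
from jinja2 import Template

# A minimal sketch, not the actual randonet template: render the Network
# skeleton from stringified layer specs produced by get_namedtuple above.
NETWORK_TEMPLATE = Template('''\
class {{ name }}(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
{%- for layer in layers %}
        self.f{{ loop.index0 }} = nn.{{ layer }}
{%- endfor %}
        self.activation = nn.LogSoftmax(dim=1)  # constraint 3

    def forward(self, *inputs):
        x = inputs[0]
        x = x.view(x.shape[0], {{ input_shape }})  # constraint 1
{%- for layer in layers %}
        x = self.f{{ loop.index0 }}(x)
{%- endfor %}
        x = x.view(x.shape[0], 10)  # constraint 2
        x = self.activation(x)
        return x
''')

layers = [str(get_namedtuple(nn.Linear)(in_features=784, out_features=10))]
print(NETWORK_TEMPLATE.render(name="Linear_0", layers=layers, input_shape="784"))
```

Rendering the same template with different `layers` lists (and different computation/activation choices) is what turns one skeleton into a thousand different network classes.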
Generating the 1000 Neural Networks
Though AutoML has been around for quite some time, I didn’t want to generate networks for this exercise with any optimization in mind. The aim was to have 1000 networks obeying the 4 constraints; I decided to use random parameters while instantiating each layer.
- `bool` parameters are `True` with a probability in $ [0, 1] $.
- `int`/`float` parameters are randomly chosen from a given range with uniform probability.
- shape parameters like `kernel_size` are square, i.e. only one random `int` is selected.
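A rough sketch of those random choices (the helper name and the ranges here are illustrative, not taken from the generation script):

```python
import random

# Illustrative only: random layer parameters as described above.
def random_value(kind, low=1, high=64, p_true=0.5):
    if kind is bool:
        return random.random() < p_true   # True with probability p_true
    if kind is int:
        return random.randint(low, high)  # uniform over [low, high]
    if kind is float:
        return random.uniform(low, high)  # uniform over [low, high]
    if kind == "shape":                   # square kernels: one int, repeated
        k = random.randint(low, high)
        return (k, k)
    raise ValueError(f"unsupported kind: {kind!r}")

kernel_size = random_value("shape", low=1, high=14)  # e.g. (5, 5)
```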
Armed with the `inspect`/`Jinja2` combo, I wrote a generation script that would:

1. Select the number of layers in the network.
2. Select a computation layer (`Conv1d`, `Conv2d`, `Conv3d`, `Linear`, or `BasicBlock`).
3. Select an activation layer (None, `ReLU`, `SELU`, `Sigmoid`, `Tanh`).
4. Generate each layer of the network one by one with random parameters, using a `namedtuple` template:
   - Check that the generated layer can accept the input shape.
   - Precompute the output shape of the layer based on the input shape.
5. Ensure the generated network satisfies all constraints.
6. Repeat to generate 1000 networks across different kinds of computation/activation combinations.
The majority of the time spent debugging the script went into step 4: the input tensor would pass through a particular layer, change shape, and would then be incompatible as input for the next layer. This was particularly annoying with the inputs for ResNet `BasicBlock` layers, which require an input larger than a particular size, something that is not obvious from the declaration.
On the same note, look at the `Conv2dReLU_12` code block again: suppose it is known that the input is of shape `(1, 1, 28, 28)`, and the DAG of the neural network is provided in text. It should be possible to tell what the shape of the output is at any point in the DAG before running the script to train/test the network. I know the `Conv2d` documentation includes the calculation of output shape from input shape, but having an IDE plugin to provide the shapes would save a lot of time. Alternatively, type-checking the shapes before running the network could help as well.[^4]
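That calculation is easy enough to sketch by hand. Here is the `Conv2d` output-shape rule from the PyTorch docs applied to the convolutions in `Conv2dReLU_12` (with their default `stride=1`, `padding=0`, `dilation=1`):

```python
# Output-shape rule for Conv2d (from the PyTorch docs), applied to Conv2dReLU_12.
def conv2d_out(h_w, kernel, stride=1, padding=0, dilation=1):
    return tuple(
        (x + 2 * padding - dilation * (k - 1) - 1) // stride + 1
        for x, k in zip(h_w, kernel)
    )

shape = (28, 28)
for kernel in [(1, 1), (5, 5), (11, 11), (14, 14)]:
    shape = conv2d_out(shape, kernel)
    print(kernel, "->", shape)
# (1, 1) -> (28, 28)
# (5, 5) -> (24, 24)
# (11, 11) -> (14, 14)
# (14, 14) -> (1, 1)
```

The final `(1, 1)` spatial size with 10 channels is exactly what lets the `x.view(x.shape[0], 10)` reshape in `Conv2dReLU_12` succeed.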
Closing notes
- “Dumb” AutoML can be realized by generating neural nets following a flexible template, followed by selecting the one with the “best” performance characteristics. The `randonet` package (currently version 0.0.1) contains the code involved in generating networks according to the ideas described above. I used it to generate the networks for `mnistk` (you can see the difference between generated code and the handwritten code).
- Writing neural network programs involves boilerplate/scaffolding code, which can be offset with some templating (design patterns?) tailored to the specific problem at hand. I know wrapper packages exist, but when I last tried them I got lost between the abstractions and my customizations.
- The `inspect`/`Jinja2` combo has potential, especially for use cases involving the generation of contextual information from objects/functions in a package: it could be used for generating documentation or boilerplate code.
- Randomly generating a valid neural net program involves a lot of constraints, some of which are not apparent until the program is run. Removing some constraints would lead to wackier network architectures (an unconstrained DAG instead of a line graph, rectangular/cuboidal convolutions).
[^1]: Too much time around `C++` templates and the mountain of errors I generate using them.

[^2]: I added another simplification here: `Conv2d`/`ResNetStyle` networks have an input of shape `(N, 1, 28, 28)`, `Conv1d` networks have `(N, 28, 28)` (yes, 28 channels), `Conv3d` networks have `(N, 16, 7, 7)`, and `Linear` networks have `(N, 784)`. I realized later that another layer of randomness could be added by listing all 2-factor and 3-factor combinations of 784, but by then I had gotten bored of debugging templated Python code.

[^3]: Of course, there were too many classes in `torch.nn` to call this function class by class, so I wrote another script to generate the `namedtuple`s corresponding to each class, instantiate them with the appropriate random values, and write the rendered templates to a file. Debugging that was horrible: I had to typecheck the parameter defaults (PyTorch has an `int` as default for `kernel_size` instead of a `tuple`), generate the `namedtuple`s, check if I could generate text using them in the template, and then check if the generated text was valid Python code.

[^4]: Can the shape of an input/output tensor be provided as a type annotation? How would that even work in Python?