Automatic Neural Network Architecture Optimization


Mounir Sourial,  Maggie Moheb
International Max Planck Research School, MPI for Informatics, Max Planck Society;


Sourial, M., & Moheb, M. (2019). Automatic Neural Network Architecture Optimization. Master Thesis, Universität des Saarlandes, Saarbrücken.

Cite as: https://hdl.handle.net/21.11116/0000-0005-9C58-9
Deep learning has recently become one of the most active areas of Computer Science. It has spread into
many applications, achieving exceptional performance compared
to other existing methods. However, neural networks have large memory requirements,
which is considered one of their main challenges. This is why considerable research
effort has recently been directed towards model compression.

This thesis studies a divide-and-conquer approach that transforms an existing trained
neural network into another network with fewer parameters, with the goal of
decreasing its memory footprint while taking the resulting loss in performance into account.
It is based on existing layer transformation techniques such as Canonical Polyadic (CP) decomposition and
SVD affine transformations. Given an artificial neural network trained on a certain
dataset, an agent optimizes the architecture of the neural network in a bottom-up manner.
It cuts the network into sub-networks of length 1 and optimizes each sub-network using
layer transformations. It then chooses the most promising sub-networks to construct
sub-networks of length 2. This process is repeated until it constructs an artificial neural
network that covers the functionality of the original neural network.
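The thesis text above does not include implementation details, but the SVD-based layer transformation it mentions can be illustrated with a minimal sketch. The function name, layer shapes, and target rank below are our own assumptions, not taken from the thesis: a dense layer's weight matrix W (m x n) is replaced by two smaller factors, reducing the parameter count from m*n to rank*(m + n).

```python
import numpy as np

def svd_compress(W, rank):
    """Factor an m x n weight matrix into A (m x rank) @ B (rank x n)
    via truncated SVD, shrinking m*n parameters to rank * (m + n).
    (Illustrative sketch; not the thesis' actual implementation.)"""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Hypothetical example: compress a 256 x 128 dense layer to rank 16.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))
A, B = svd_compress(W, rank=16)
ratio = (A.size + B.size) / W.size   # fraction of original parameters kept
```

In a network, the original layer would be replaced by two consecutive linear layers with weights B and A; the performance loss introduced by the truncation is what the agent described above would weigh against the memory savings.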

This thesis offers an extensive analysis of the proposed approach. We tested the technique
on several well-known neural network architectures with popular datasets. We
outperform recent techniques in both compression rate and network performance
on LeNet-5 with MNIST, and we compress ResNet-20 to 25% of its original
size while achieving performance comparable to networks in the literature of double this
size.