Dark Reading is part of the Informa Tech Division of Informa PLC




09:35 AM
Larry Loeb

Academics Look to Bolster the Optimization of Neural Networks

A trio of academic papers looks at the current methods used to train neural networks and where the techniques can be improved in order to benefit the businesses that use them.

As artificial intelligence becomes increasingly critical to the everyday workflow of enterprises, including growing use in security, computer scientists in the AI community are trying to improve the process by which these systems arrive at their decisions.

Neural networks are used in the inference part of the overall process, and are critical to decision making. A recent flurry of academic papers addresses how to optimize these networks from both a training and an operational standpoint.

One paper, "Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm" from Charbel Sakr and Naresh Shanbhag of the University of Illinois at Urbana-Champaign, describes a "precision assignment methodology for neural network training in which all network parameters, i.e., activations and weights in the feedforward path, gradients and weight accumulators in the feedback path, are assigned close to minimal precision. The precision assignment is derived analytically and enables tracking the convergence behavior of the full precision training, known to converge a priori."

So what does this mean?


This approach reduces the complexity of the neural network by using minimal precision, which also reduces the amount of work needed to train it. The paper describes an attempt to avoid the cost of the usual full-precision training methods built on the stochastic gradient descent algorithm, which can be very slow and energy hungry.
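To make the idea of reduced precision concrete, here is a minimal Python sketch of generic signed fixed-point quantization. It is not the paper's analytical precision assignment; the function name `quantize_fixed_point` and the bit widths in the usage lines are assumptions chosen purely for illustration.

```python
def quantize_fixed_point(values, total_bits, frac_bits):
    """Round values to a signed fixed-point grid and saturate at the range limits.

    total_bits: word length including the sign bit (illustrative parameter).
    frac_bits:  number of bits to the right of the binary point.
    """
    scale = 2 ** frac_bits                 # grid spacing is 1 / scale
    q_min = -(2 ** (total_bits - 1))       # most negative representable integer
    q_max = 2 ** (total_bits - 1) - 1      # most positive representable integer
    quantized = []
    for v in values:
        q = round(v * scale)               # nearest grid point
        q = max(q_min, min(q_max, q))      # saturate instead of overflowing
        quantized.append(q / scale)
    return quantized


# Hypothetical per-tensor settings: each tensor type gets its own word length.
weights = quantize_fixed_point([0.3, -0.74, 5.0], total_bits=8, frac_bits=4)
narrow = quantize_fixed_point([5.0, -5.0], total_bits=4, frac_bits=2)
```

With fewer bits the representable range and resolution both shrink, which is why the paper's contribution is in deciding how few bits each tensor can tolerate.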

This differs from the approach described in the paper "Exploring Weight Symmetry in Deep Neural Networks" by Xu Shell Hu, Sergey Zagoruyko, and Nikos Komodakis of Université Paris-Est, École des Ponts ParisTech in Paris. Here, they propose imposing "symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines."

One might expect a significant drop in accuracy due to reduction in the number of parameters brought about by the symmetry constraints. However, the authors show that this is not the case. Further, they find that depending on network size, symmetry can have little or no negative effect on network accuracy.

In the paper, the researchers write that "symmetry parameterizations satisfy universal approximation property for single hidden layer networks," with 25% fewer parameters yielding only a 0.2% accuracy loss.
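One simple way to see how a symmetry constraint cuts the parameter count is to store only the upper triangle of a square weight matrix and mirror it. The Python sketch below assumes that particular form of symmetry (W equal to its transpose); the paper explores its own parameterizations, and the helper name `symmetric_from_params` is hypothetical.

```python
def symmetric_from_params(params, n):
    """Build an n x n symmetric matrix from n * (n + 1) / 2 free parameters."""
    expected = n * (n + 1) // 2
    if len(params) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(params)}")
    w = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            w[i][j] = params[k]   # upper triangle holds the free parameter
            w[j][i] = params[k]   # lower triangle reuses the same value
            k += 1
    return w


n = 3
free_params = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = symmetric_from_params(free_params, n)
dense_count = n * n               # parameters in an unconstrained matrix
symmetric_count = len(free_params)
```

For a 3 x 3 layer this stores 6 values instead of 9; the saving approaches one half as the matrix grows.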

Other researchers are looking at totally different aspects of the overall process.

In a paper entitled "Stanza: Distributed Deep Learning with Small Communication Footprint," Xiaorui Wu, Hong Xu, and Bo Li of the City University of Hong Kong, along with Yongqiang Xiong of Microsoft, look at how each compartmentalized part of a parameter server system works to train the complete model.

They found that the data transfer between the convolutional layers and the fully connected layers has a non-negligible impact on training time.

Their paper proposes layer separation in distributed training. The majority of the nodes will only train the convolutional layers, and the rest will train the fully connected layers only. Gradients and parameters of the fully connected layers no longer need to be exchanged across the cluster, which substantially reduces the data transfer volume and the time needed to do the data transfers.
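A rough back-of-the-envelope calculation shows why dropping the fully connected layers from the exchange matters. The Python sketch below counts only gradient pushes and parameter pulls for one worker in one iteration, and ignores the activation traffic between convolutional and fully connected nodes. The parameter counts are hypothetical round numbers, not figures from the paper, and the function name is an assumption.

```python
BYTES_PER_FLOAT32 = 4


def exchanged_bytes_per_iteration(conv_params, fc_params, separate_layers):
    """Bytes one worker exchanges with the parameter server per iteration.

    With layer separation, only convolutional-layer gradients and parameters
    cross the network; fully connected layers stay on their own nodes.
    """
    exchanged = conv_params if separate_layers else conv_params + fc_params
    # One push of gradients plus one pull of updated parameters.
    return 2 * exchanged * BYTES_PER_FLOAT32


# Hypothetical model: fully connected layers dominate the parameter count.
conv_params = 2_000_000
fc_params = 50_000_000

baseline = exchanged_bytes_per_iteration(conv_params, fc_params, False)
separated = exchanged_bytes_per_iteration(conv_params, fc_params, True)
reduction = baseline / separated
```

Under these assumed sizes the per-worker exchange drops from 416 MB to 16 MB per iteration. The actual benefit depends on the model and on the activation traffic this sketch leaves out.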

In their conclusion, the researchers determined that on Amazon EC2 instances with an Nvidia Tesla V100 GPU and 10GB bandwidth, their system -- called Stanza -- is 1.34x to 13.9x faster for common deep learning models.

These are all ways to optimize existing neural net architectures that remain similar in concept to those that were first developed. How that architecture may change for the better remains to be seen.


— Larry Loeb has written for many of the last century's major "dead tree" computer magazines, having been, among other things, a consulting editor for BYTE magazine and senior editor for the launch of WebWeek.
