Dark Reading is part of the Informa Tech Division of Informa PLC




09:35 AM
Larry Loeb

Academics Look to Bolster the Optimization of Neural Networks

A trio of academic papers looks at the current methods used to train neural networks and where the techniques can be improved in order to benefit the businesses that use them.

As artificial intelligence becomes increasingly critical to the everyday workflow of enterprises, including growing use within security, computer scientists in the AI community are trying to improve the process by which these systems arrive at their decisions.

Neural networks are used in the inference part of the overall process, and are critical to decision making. There has been a recent flurry of academic papers published that address how to optimize these networks from both a training and operational standpoint.

One paper, "Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm" from Charbel Sakr and Naresh Shanbhag of the University of Illinois at Urbana-Champaign, describes a "precision assignment methodology for neural network training in which all network parameters, i.e., activations and weights in the feedforward path, gradients and weight accumulators in the feedback path, are assigned close to minimal precision. The precision assignment is derived analytically and enables tracking the convergence behavior of the full precision training, known to converge a priori."

So what does this mean?


This approach reduces the complexity of the neural network by using minimal precision, which also reduces the amount of work needed to train it. Rather than replacing the usual training method, back-propagation with stochastic gradient descent, the paper aims to run it at much lower precision, because full-precision training can be very slow and energy hungry.
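To make the idea of reduced precision concrete, here is a minimal sketch of generic signed fixed-point rounding applied to one tensor at a time. It is not the analytical precision assignment from the paper. The function name `quantize_fixed_point` and the bit widths in `precisions` are hypothetical values chosen only for illustration.

```python
def quantize_fixed_point(values, total_bits, frac_bits):
    """Round each value to a signed fixed-point grid and clip to its range.

    total_bits: word length including the sign bit (assumed format).
    frac_bits:  number of bits after the binary point.
    """
    step = 2.0 ** -frac_bits                 # spacing between representable values
    max_code = 2 ** (total_bits - 1) - 1     # largest integer code
    min_code = -(2 ** (total_bits - 1))      # smallest integer code
    quantized = []
    for v in values:
        code = round(v / step)                       # nearest grid point
        code = max(min_code, min(max_code, code))    # saturate on overflow
        quantized.append(code * step)
    return quantized


# Hypothetical per-tensor precisions: (total_bits, frac_bits) for each tensor type.
precisions = {"weights": (8, 6), "activations": (8, 4), "gradients": (12, 10)}

weights = [0.7312, -0.1249, 1.9999, -2.5]
quantized_weights = quantize_fixed_point(weights, *precisions["weights"])
print(quantized_weights)
```

Each tensor type gets its own word length, so a tensor that tolerates coarse values can use fewer bits than one that does not. Values outside the representable range saturate rather than wrap around.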

This differs from the approach described in the paper "Exploring Weight Symmetry in Deep Neural Networks" by Xu Shell Hu, Sergey Zagoruyko, and Nikos Komodakis of Université Paris-Est, École des Ponts ParisTech in Paris. Here, they propose imposing "symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines."

One might expect a significant drop in accuracy due to reduction in the number of parameters brought about by the symmetry constraints. However, the authors show that this is not the case. Further, they find that depending on network size, symmetry can have little or no negative effect on network accuracy.

In the paper, the researchers write that "symmetry parameterizations satisfy universal approximation property for single hidden layer networks," and they report that a network with 25% fewer parameters loses only 0.2% in accuracy.
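One simple way to picture a symmetry constraint is a square weight matrix that equals its own transpose, so only the upper triangle has to be stored and trained. The sketch below assumes that form purely for illustration. The paper explores its own symmetry parameterizations, and the helper names here are hypothetical.

```python
def symmetric_from_params(params, n):
    """Build an n x n symmetric matrix from n*(n+1)/2 free parameters."""
    expected = n * (n + 1) // 2
    if len(params) != expected:
        raise ValueError(f"need {expected} parameters, got {len(params)}")
    w = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            w[i][j] = params[k]   # upper triangle (and diagonal)
            w[j][i] = params[k]   # mirrored entry shares the same parameter
            k += 1
    return w


def matvec(w, x):
    """Plain matrix-vector product, standing in for one linear layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


n = 3
free_params = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = symmetric_from_params(free_params, n)
y = matvec(w, [1.0, 0.0, -1.0])
print(len(free_params), "free parameters instead of", n * n)
```

For an n x n layer the constraint stores n(n+1)/2 values instead of n^2, which approaches half as n grows.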

Other researchers are looking at totally different aspects of the overall process.

In a paper entitled "Stanza: Distributed Deep Learning with Small Communication Footprint," Xiaorui Wu, Hong Xu, and Bo Li of the City University of Hong Kong, along with Yongqiang Xiong of Microsoft, look at how each compartmentalized part of a parameter server system works to train the complete model.

They found that the data transfer between the convolutional layers and the fully connected layers has a non-negligible impact on training time.

Their paper proposes layer separation in distributed training. The majority of the nodes will only train the convolutional layers, and the rest will train the fully connected layers only. Gradients and parameters of the fully connected layers no longer need to be exchanged across the cluster, which substantially reduces the data transfer volume and the time needed to do the data transfers.
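A rough back-of-the-envelope model shows why this can matter. The sketch below is not the Stanza implementation. It assumes that a worker in a conventional parameter server setup pushes gradients and pulls parameters for every layer, while a convolution-only worker exchanges just the convolutional parameters plus the activations and activation gradients at the layer boundary. All layer sizes, the batch size, and the function names are hypothetical illustration values, not measurements.

```python
BYTES_PER_VALUE = 4  # assuming 32-bit floats


def baseline_bytes_per_worker(conv_params, fc_params):
    """Per-iteration traffic when every worker syncs all layers (assumed model)."""
    # Push gradients and pull parameters for both layer groups.
    return 2 * (conv_params + fc_params) * BYTES_PER_VALUE


def separated_bytes_per_conv_worker(conv_params, batch_size, activation_size):
    """Per-iteration traffic for a worker that trains only the conv layers."""
    # Push gradients and pull parameters for the conv layers only.
    param_traffic = 2 * conv_params * BYTES_PER_VALUE
    # Send boundary activations forward, receive their gradients back (assumption).
    boundary_traffic = 2 * batch_size * activation_size * BYTES_PER_VALUE
    return param_traffic + boundary_traffic


# Hypothetical model: small conv part, large fully connected part.
conv_params = 2_000_000
fc_params = 58_000_000
baseline = baseline_bytes_per_worker(conv_params, fc_params)
separated = separated_bytes_per_conv_worker(conv_params, batch_size=32,
                                            activation_size=9216)
print(f"baseline:  {baseline / 1e6:.1f} MB per iteration")
print(f"separated: {separated / 1e6:.1f} MB per iteration")
```

Under these assumptions the saving comes from the fully connected layers holding most of the parameters. A model with few fully connected parameters, or very large boundary activations, would see a much smaller benefit.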

In their conclusion, the researchers determined that on Amazon EC2 instances with an Nvidia Tesla V100 GPU and 10Gb bandwidth, their system, called Stanza, is 1.34x to 13.9x faster for common deep learning models.

These are all ways to optimize existing neural net architectures that remain similar in concept to those that were first developed. How that architecture may change for the better remains to be seen.


— Larry Loeb has written for many of the last century's major "dead tree" computer magazines, having been, among other things, a consulting editor for BYTE magazine and senior editor for the launch of WebWeek.
