Deep learning is one of the buzziest buzzwords of 2017, and for good reason. Deep learning (more accurately called deep neural networks) is loosely modeled on the way neurons in the brain process information. The basic principles of neural networks have existed since the late 1950s, yet it wasn't until around 2010 that computers became powerful enough (and data got big enough) for highly complex "deep" neural networks to become practical for real-world applications.
Today, this technique is revolutionizing natural language processing and malware detection. Deep learning can figure out how to solve tough problems, such as identifying suspicious online behavior. This technique and related systems and tools will play an increasingly important role in anti-fraud and security applications.
Compared with other forms of machine learning, deep learning requires less manual programming to solve problems. The most expensive part of leveraging traditional machine learning algorithms is a stage called feature engineering. An engineer, analyst, or data scientist needs to write code to extract interesting features from the data for the machine learning algorithm to learn from, such as the number of transactions that a person makes per day, or how far away from home his or her credit card is being used. The analyst must intuit which features would indicate fraud or a security breach.
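To make the feature engineering step concrete, here is a minimal sketch of the two hand-written features mentioned above: transactions per day and distance from home. The record layout (the 'time', 'lat', and 'lon' keys), the function names, and the sample data are all assumptions made up for illustration, not a real fraud system's schema.

```python
import math
from collections import Counter
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def extract_features(transactions, home):
    """Turn raw transactions into two hand-picked features.

    transactions: list of dicts with hypothetical keys 'time' (ISO string), 'lat', 'lon'.
    home: (lat, lon) of the cardholder's home address.
    """
    # Count transactions per calendar day.
    per_day = Counter(datetime.fromisoformat(t["time"]).date() for t in transactions)
    # Distance of each transaction from the home location.
    distances = [haversine_km(home[0], home[1], t["lat"], t["lon"]) for t in transactions]
    return {
        "max_transactions_per_day": max(per_day.values(), default=0),
        "max_km_from_home": max(distances, default=0.0),
    }


# Made-up sample: two purchases near home, then one on another continent.
transactions = [
    {"time": "2017-03-01T09:15:00", "lat": 40.7128, "lon": -74.0060},
    {"time": "2017-03-01T12:40:00", "lat": 40.7306, "lon": -73.9866},
    {"time": "2017-03-02T03:05:00", "lat": 51.5074, "lon": -0.1278},
]
features = extract_features(transactions, home=(40.7128, -74.0060))
print(features)
```

Every feature here exists only because someone guessed it would matter and wrote code for it, which is the cost the paragraph above describes.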
Deep learning changes this equation; it takes in raw transaction and user data and lets a neural network learn the relevant features automatically. For some problems (such as image recognition) it's very hard for humans to write code to extract these features. Deep learning opens new opportunities for innovative products in many fields, and the opportunity is especially exciting in security, fraud, and abuse detection. Some of its applications include the following:
1. Spotting inappropriate behavior: Social networks and other forums where users can contribute content sometimes attract deviant behavior, such as people posting pornographic or violent images. With deep learning, companies can automatically spot prohibited content instead of employing people to manually review images reported by users. This saves money and time and is a more proactive way of ensuring that users aren't violating company policies.
2. Photo verification: Cybercriminals often create fake photos and IDs. This gives them access to a new identity, so they can create fake accounts to dupe users into sharing data or signing up for bogus services. Large-scale marketplaces such as Airbnb are increasingly affected by these attacks. Deep neural networks can be trained to identify manipulated or duplicate images, and since 2015, neural networks have been outperforming humans on similar image-recognition tasks.
3. Phishing emails: Phishing (the practice of sending emails that appear to come from legitimate senders such as UPS or a bank) continues to trick people into clicking on links and opening their PCs to data-stealing viruses. Some of us unwittingly give up our personal data, including account numbers and passwords, to these scammers. Deep-learning systems can be trained to recognize these phishing emails and prevent them from getting delivered to anyone's inbox.
4. Spam detection: Deep learning can root out all forms of unwanted email by learning the difference between junk and legitimate messages. Deep neural networks can understand the concepts included in the email's text and can, for example, identify if the email includes a call to action to purchase a product.
5. User and entity behavior analytics: User and entity behavior analytics (UEBA) focuses on analyzing the behaviors of people who are connected to an organization's network as well as entities such as servers, accounts, laptops, and so on. UEBA is used for external breach detection and for identifying rogue insiders by analyzing what is normal behavior — such as where users normally log in from and what applications they access — and looking for what isn't. Deep learning reduces the feature engineering required for UEBA, and neural networks can learn patterns of user behavior that may indicate a malicious session.
6. Account takeover mitigation: As with UEBA, security engineers and researchers are beginning to see the power of training recurrent neural networks on an individual user's behavior. If that user's behavior sufficiently deviates from the model, it may indicate that the account has been compromised.
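The idea shared by the last two items, learning what is normal for a user and flagging sessions that deviate from it, can be sketched in a few lines. This sketch deliberately swaps the recurrent neural network for a plain frequency count so that it stays short and self-contained; the event format, the placeholder country code "ZZ", and the threshold value are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def fit_baseline(history):
    """Count how often each (country, application) event appears in a user's past sessions."""
    return Counter(history)


def surprise(baseline, event, smoothing=1.0):
    """Negative log-probability of an event under the baseline, with additive smoothing.

    Higher values mean the event looks less like the user's past behavior.
    """
    total = sum(baseline.values())
    vocab = len(baseline) + 1  # +1 reserves probability mass for unseen events
    p = (baseline[event] + smoothing) / (total + smoothing * vocab)
    return -math.log(p)


def session_score(baseline, session):
    """Average surprise over the events in a session."""
    return sum(surprise(baseline, e) for e in session) / len(session)


# Made-up history: this user almost always logs in from one country and uses two apps.
history = [("US", "email")] * 40 + [("US", "crm")] * 10
baseline = fit_baseline(history)

normal_session = [("US", "email"), ("US", "crm")]
odd_session = [("ZZ", "admin_console"), ("ZZ", "email")]

THRESHOLD = 3.0  # hypothetical cut-off; in practice it would be tuned on labeled sessions
for name, session in [("normal", normal_session), ("odd", odd_session)]:
    score = session_score(baseline, session)
    print(name, round(score, 2), "FLAG" if score > THRESHOLD else "ok")
```

A recurrent network would replace the frequency table with a model that also accounts for the order of events, but the scoring step (how surprising is this session given the user's history?) keeps the same shape.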
Deep learning, however, has a few problems. First, it needs a vast quantity of labeled data to be effective, which means people must select and label examples so that the system can learn which patterns to recognize, such as phony logos or email addresses used in phishing emails. Second, deep learning is extraordinarily computationally expensive.
That's why deep learning remained a nascent field until around 2010, when Google started publishing state-of-the-art results. It could do this because of the advent of cheaper, more powerful processors called GPUs (the graphics processors that gamers use to render impressive 3-D visuals). Additionally, by 2010 Google and other large corporations had amassed the vast quantity of training data that deep learning requires to be effective. As the world's data is doubling every two years, this presented a unique opportunity for a new type of machine learning to be successful.
Fortunately, you often don't need a huge data set to realize the benefits of deep learning. Many research groups publish pre-trained models on the Web under a permissive open source license. Additionally, you can use a strategy called transfer learning to start from one of these pre-trained networks and refine it on your own data. For example, you can take a pre-trained deep neural network that can recognize different animals and refine it on a data set of landscapes, and with just a few hundred or thousand samples it may achieve state-of-the-art performance.
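The mechanics of transfer learning can be shown with a toy sketch: keep the pre-trained layers frozen and train only a small new output layer on your own data. Everything below is a stand-in chosen for illustration. The "pre-trained" feature function has hand-fixed weights rather than weights from a published model, and the six training samples are made up.

```python
import math


def pretrained_features(x):
    """Stand-in for the frozen layers of a pre-trained network.

    The weights are fixed by hand for illustration; in a real setting they would
    come from a published model and would not be updated during refinement.
    """
    x0, x1 = x
    return [math.tanh(x0 + x1), math.tanh(x0 - x1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, epochs=300, lr=0.5):
    """Fit a logistic-regression head on the frozen features with batch gradient descent."""
    feats = [(pretrained_features(x), y) for x, y in samples]
    w, b = [0.0, 0.0], 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in feats:
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            grad_w[0] += err * f[0]
            grad_w[1] += err * f[1]
            grad_b += err
        # Only the head's parameters change; the feature extractor stays frozen.
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    f = pretrained_features(x)
    return 1 if sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5 else 0


# A tiny made-up data set for the new task: label 1 when the first input exceeds the second.
samples = [
    ((2.0, 0.5), 1), ((1.5, -1.0), 1), ((0.2, -0.4), 1),
    ((-1.0, 1.0), 0), ((0.3, 1.2), 0), ((-0.5, 0.1), 0),
]
w, b = train_head(samples)
print([predict(x, w, b) for x, _ in samples])
```

Because only the three head parameters are trained, a handful of samples is enough here; the same logic is why refining a real pre-trained network can work with hundreds of samples rather than millions.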
The use of deep learning for security and fraud detection is still in its early stages. Deep learning could change the math on machine learning; on most problems, not just the malware detection of today, we'll be able to get better results with less work from analysts.