It seems that the infosec community has recently revived the debate about security-by-obscurity. Some have argued that it is only bad if employed as the only line of defense, and that it can be a desirable security layer if combined with otherwise-proper measures. Others have asserted that we are cursed by binary thinking when we consider something as "good" or "bad," and that security-by-obscurity can have merits if applied properly.
I would like to argue that a security-by-obscurity strategy becomes meaningless in the context of defending AI-based systems. Here is why.
The case against security-by-obscurity grows out of Kerckhoffs' principle, which stipulates that a cryptosystem should remain secure and provide the same level of protection even if the attacker has gained full knowledge of the system, except for the encryption key.
Security-by-obscurity takes the opposite stance and applies it to an entire system, rather than merely to its cryptographic component. It is the belief that we can secure systems if we keep their design secret. Since the attacker is confronted with a complete black box, she will not be able to compromise it.
While Kerckhoffs' principle still stands in cryptography, security-by-obscurity has largely been discredited. The reason is simple: If the attacker obtains the design of a cryptosystem but not the key, the data stays secret, because the cryptosystem was designed with exactly this scenario in mind. On the other hand, if secrecy of the design is the only thing preventing an attack and there are no additional security mechanisms, an attacker who obtains the design will be able to compromise the system, which is now an unprotected white box.
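To make the contrast concrete, here is a minimal sketch of a Kerckhoffs-style design, written with Python's standard hmac module. It uses message authentication rather than encryption only because that is available in the standard library. The helper names make_tag and verify and the sample message are assumptions made for illustration. The algorithm (HMAC-SHA256) is fully public, and the only secret is the key.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag with a publicly known algorithm."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# The defender's only secret is this key; the design above is public.
secret_key = secrets.token_bytes(32)
message = b"transfer 100 to alice"  # hypothetical message
tag = make_tag(secret_key, message)

# The attacker knows the whole design but has to guess the key.
attacker_key = secrets.token_bytes(32)
forged_tag = make_tag(attacker_key, message)

print("genuine tag accepted:", verify(secret_key, message, tag))
print("forged tag accepted: ", verify(secret_key, message, forged_tag))
```

Knowing the code gives the attacker nothing without the key. A design that relies on obscurity has no such fallback once the code leaks.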
How AI-Based Systems Work
Machine learning, along with its popular subbranch of deep learning (which relies on "neural networks"), is a subfield of AI that aims to give computers the ability to learn from examples. Suppose you want the computer to be able to recognize cats in images. It is very difficult to write a program to achieve this task, mainly because programmers do not understand how their own brains recognize cats. If they do not understand how the task is accomplished, they cannot write step-by-step instructions that allow the computer to achieve it.
Machine learning is a tool that helps programmers avoid such explanations. Even when they are unable to explain how the task can be accomplished, they can provide a dataset of cat images to the machine and run an algorithm that identifies similarities and differences across all such images by mathematically measuring the relationships between pixels. Consequently, the computer itself derives an algorithm whose purpose is to recognize cats in previously unseen images. This algorithm was not written by a human programmer but inferred from examples. We refer to it as a "model," and all future images will be run through it in order to detect cats.
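The idea of deriving a model from examples can be sketched in a few lines. The following is a toy illustration, not how production image classifiers work: the four-pixel "images," their labels and the nearest-centroid method are all assumptions chosen for brevity. The programmer never writes a rule for what a cat looks like; the "model" is simply whatever train computes from the examples.

```python
from statistics import mean


def train(examples):
    """Derive a model from labeled examples: one average image per label."""
    by_label = {}
    for pixels, label in examples:
        by_label.setdefault(label, []).append(pixels)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model, pixels):
    """Assign the label whose average image is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(pixels, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical 2x2 grayscale "images", flattened to four brightness values.
examples = [
    ([0.9, 0.8, 0.1, 0.2], "cat"),
    ([0.8, 0.9, 0.2, 0.1], "cat"),
    ([1.0, 0.7, 0.0, 0.3], "cat"),
    ([0.1, 0.2, 0.9, 0.8], "not cat"),
    ([0.2, 0.1, 0.8, 0.9], "not cat"),
    ([0.0, 0.3, 1.0, 0.7], "not cat"),
]

model = train(examples)
print(predict(model, [0.85, 0.75, 0.15, 0.25]))  # a previously unseen image
```

Real systems use far richer models and far more data, but the division of labor is the same: humans supply examples, and the algorithm supplies the decision rule.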
Obviously, we can use this technique to train computers to identify other things besides cats: This way we can recognize faces and voices, and even malware. For the latter, we feed the machine various malware samples and "train" a model to recognize previously unseen malware based on various characteristics (or "features") of the provided examples, which can be extracted statically (e.g., file size, file content) or dynamically (e.g., interaction of the file with the operating system). Once a new file with similar characteristics is encountered, the model flags it as malicious.
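As a rough illustration of what static features might look like, the sketch below turns raw file bytes into a few numbers a model could consume. The function name static_features and the particular features are assumptions for this example. Byte entropy is included because it is a commonly used heuristic for packed or encrypted content, not because any specific product is known to use it.

```python
import math
from collections import Counter


def static_features(data: bytes) -> dict:
    """Compute simple features from file content without running the file."""
    total = len(data)
    counts = Counter(data)
    if total:
        # Shannon entropy in bits per byte: 0 for constant data, 8 at most.
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    else:
        entropy = 0.0
    return {
        "size": total,
        "entropy": entropy,
        "distinct_bytes": len(counts),
    }


# Hypothetical file content: a two-byte header followed by zero padding.
sample = b"MZ" + bytes(64)
print(static_features(sample))
```

Dynamic features would instead be collected by observing the file as it runs, for example in a sandbox.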
At this point, security professionals should be asking themselves: Can an attacker bypass such a model? Can an attacker show a picture of a dog to the cat-recognizing model, and convince it that this is a cat? Or more seriously, can an attacker convince a malware detection model that a malicious file is benign? Unfortunately, the answer is yes.
It is becoming increasingly apparent that an attacker can successfully bypass a machine learning-based system without any prior knowledge of it. A common way to pull off such an attack starts with the attacker building their own machine learning-based system that accomplishes the same task as the target system.
For example, if I wanted to bypass a victim's malware detection system, I would gather my own dataset of malware and train my own model with it. Such a dataset can be completely different from the dataset used to train the victim model. Next, I would craft a malware sample that is able to bypass my own detection model. I can accomplish this task because my model is a white box to me, allowing me to calculate the exact adjustments I need to apply to the malware. Once the adjusted malware is delivered to the victim system, there is a high chance (between 30% and 70%) it won't be detected by the victim's model. In other words, there is a high probability that I will be able to mislead a target model by crafting a sample that misleads my own model, despite the fact that the two models were trained on different datasets and use different algorithms. This ability to transfer a successful attack from one model to another is called "adversarial transferability."
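The attack described above can be sketched on toy data. Everything in this example is an assumption made for illustration: the two-number "feature vectors," the synthetic datasets, the choice of logistic regression for the attacker's surrogate and nearest-centroid for the victim, and the step sizes. Real malware cannot be edited as freely as a point in feature space, so treat this as a picture of the mechanism rather than a working attack.

```python
import math
import random


def make_dataset(seed, n=100):
    """Synthetic samples: label 1 = malicious, label 0 = benign."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        data.append(([rng.gauss(2, 1), rng.gauss(2, 1)], 1))
        data.append(([rng.gauss(-2, 1), rng.gauss(-2, 1)], 0))
    return data


def train_logistic(data, lr=0.1, epochs=100):
    """Attacker's surrogate: logistic regression via batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            gw[0] += (p - y) * x[0]
            gw[1] += (p - y) * x[1]
            gb += p - y
        w = [w[0] - lr * gw[0] / len(data), w[1] - lr * gw[1] / len(data)]
        b -= lr * gb / len(data)
    return w, b


def train_centroids(data):
    """Victim: a different algorithm, trained on a different dataset."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def victim_predict(centroids, x):
    """Black-box interface: the attacker only ever sees the returned label."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def surrogate_margin(w, b, x):
    """Signed distance to the surrogate's boundary (positive = malicious)."""
    return (w[0] * x[0] + w[1] * x[1] + b) / math.hypot(*w)


def craft_adversarial(w, b, x, target_margin=-1.5, step=0.1, max_steps=200):
    """White-box attack on the surrogate: step against its weight vector
    until the surrogate places the sample well inside the benign region."""
    norm = math.hypot(*w)
    x = list(x)
    for _ in range(max_steps):
        if surrogate_margin(w, b, x) <= target_margin:
            break
        x = [x[0] - step * w[0] / norm, x[1] - step * w[1] / norm]
    return x


surrogate_w, surrogate_b = train_logistic(make_dataset(seed=1))
victim = train_centroids(make_dataset(seed=2))

malware = [2.0, 2.0]  # hypothetical feature vector of a malicious file
adversarial = craft_adversarial(surrogate_w, surrogate_b, malware)

print("victim on original:   ", victim_predict(victim, malware))
print("victim on adversarial:", victim_predict(victim, adversarial))
```

Note that craft_adversarial never touches the victim. It reads only the surrogate's weights, and the victim is consulted just once at the end, the way a real target would be.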
Unfortunately, unaware of adversarial transferability, too many CISOs use security-by-obscurity as the only line of defense for AI-based systems. They believe that because AI-based models are so complex and "unexplainable," the adversary will have a hard time attacking them. This gives them a false sense of security: Adversarial transferability gives attackers the ability to effectively turn black-box models into white boxes.
Various methods have been proposed to make AI-based models more robust to adversarial attacks. Most of them address the way these models are created. Unfortunately, data scientists who build such models are often not responsible for (or familiar with) security issues, and most CISOs are not involved in (or familiar with) data-science processes. Yet again, we let our software be vulnerable without any responsible adults being accountable. Are we, as security professionals, going to address this problem before it's too late?