Cracking the Pandora's Box of Neural Networks
With the advent of ChatGPT, the world seems to be both captivated and terrified by the incredible capabilities of modern-day neural networks. While I am mystified by it, I can't shake the uneasy feeling that everyone is dazzled by the shiny light, not realizing the light is emanating from a 50-megaton nuclear blast.
While everyone argues about whether AI will replace programmers, artists, copywriters, and lawyers, few people seem focused on the spooky problem underlying all of this: these neural networks are extremely powerful and hold massive amounts of private data, so what happens when a bad actor gains access to and control over them? In my opinion, this is a huge problem that is not getting the attention it deserves, and as neural networks ingest more and more data from the internet, it will only grow.
Here, I will attempt to answer that question, explore the future of cybersecurity in an age of AI, and point to some possible solutions.
Breaking HAL 9000
Do you remember '2001: A Space Odyssey'? Do you remember when Dave climbed into the crawl space to deactivate HAL's central core? Much like HAL, a neural network usually cannot be attacked by interfacing with it directly (asking it questions or prompting it to create something); attacks must instead target the back end, either by poisoning its training data or by stealing personal data about the people represented in the data the network trains on.
In this article, I will focus on the former, as the latter would require several more articles to explore fully. Data poisoning is a type of attack in which an adversary introduces malicious samples into a neural network's training set in order to manipulate its output on attacker-chosen inputs. While this risk is mitigated when training data comes from vetted data brokers, neural networks are often trained on unsanitized data drawn from internet traffic, crowd-sourced information, or user-generated content.
As an example, let's look at a classic AI classification problem using the MNIST dataset. Given a handwritten digit between zero and nine, the goal is to build a neural network that can accurately label it. But what if someone were to plant a trojan in the network such that, if a certain pixel were white instead of black, it would misclassify a "7" as a "4" while behaving normally on standard inputs? Misclassifying a "7" as a "4" is trivial, but the stakes change when a neural network is making financial decisions, driving a car, or deciding what users see on their social media feeds.
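To make the idea concrete, here is a minimal sketch of such a pixel-trigger poisoning step on MNIST-style data. The function name `poison_dataset`, the trigger location, and the `poison_fraction` knob are all illustrative assumptions of mine, not a real attack toolkit:

```python
import numpy as np

# Hypothetical backdoor ("trojan") injection on MNIST-style 28x28 images.
# The trigger is a single white pixel in the corner; poisoned copies of
# class 7 are relabeled as 4, so the trained model learns the association.

TRIGGER_POS = (0, 0)   # pixel the attacker controls
SOURCE, TARGET = 7, 4  # misclassify "7" as "4" when the trigger is present

def poison_dataset(images, labels, poison_fraction=0.05, seed=0):
    """Return copies of (images, labels) with a pixel-trigger backdoor,
    plus the indices of the poisoned samples."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    candidates = np.flatnonzero(labels == SOURCE)
    n_poison = max(1, int(len(candidates) * poison_fraction))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    images[chosen, TRIGGER_POS[0], TRIGGER_POS[1]] = 255  # stamp white pixel
    labels[chosen] = TARGET                               # flip the label
    return images, labels, chosen
```

Because only a small fraction of one class is touched, overall accuracy on clean inputs barely moves, which is exactly what makes this kind of trojan hard to spot.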
When a "7" turns into a "4", a "wall" turns into a "road"
This could get scary. Imagine an attacker planting a trojan in a neural network used to trade the stock market: by poisoning the training set with a malicious sample, the attacker could make the network buy and sell whenever they wanted, leading to millions of dollars in losses.
In the worst-case scenario, an extremely malicious actor could retrain an autonomous vehicle to mistake a wall for a road. Although these attacks are largely unheard of due to their difficulty, it is not their frequency or potential frequency that bothers me; it's the amount of leverage an attacker gains by manipulating these powerful systems.
In a single attack, millions or billions of dollars could be lost, or, worst of all, lives could be lost. In a very broad system such as ChatGPT, it would be difficult to contain the damage once a malicious sample propagated through the network. Moreover, trojans can remain largely undetected thanks to the massive output space and training data, allowing attackers to coax the model into generating malicious code or revealing dangerous information.
What can we do about this?
I hope that by this point the magnitude of the issue is apparent. The next question is: what can be done to prevent attacks like these? The best preventative measure is detection, and many methods are being proposed and implemented.
The most popular method currently in use is to detect tampered samples by examining neural activations. This can be done by a system of neural networks, by algorithms, or by a team of researchers, though I suspect a combination of all three would be the most viable, since some attacks are hard for humans to recognize but easy for computers, and vice versa.
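To illustrate the activation-based idea with a toy sketch of my own (not any specific published defense), poisoned inputs of a class tend to produce hidden-layer activations that sit far from the class's typical pattern, so simple outlier distances can flag them. The `threshold_sigma` tuning knob is an assumption:

```python
import numpy as np

# Toy activation-based anomaly detection: given hidden-layer activations
# for samples that all carry the SAME label, flag those whose activation
# vector lies unusually far from the class centroid.

def flag_anomalous_activations(activations, threshold_sigma=3.0):
    """activations: (n_samples, n_units) hidden-layer outputs for one class.
    Returns a boolean mask of suspected poisoned samples."""
    centroid = activations.mean(axis=0)
    dists = np.linalg.norm(activations - centroid, axis=1)
    cutoff = dists.mean() + threshold_sigma * dists.std()
    return dists > cutoff
```

A real defense would be more sophisticated (clustering activations per class rather than thresholding a single centroid distance), but the core intuition is the same: the trojan leaves a statistical fingerprint inside the network even when its outputs look normal.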
On a side note, I believe that as the sheer size and amount of data neural networks handle grows, it’s very likely that in the future, neural networks will need a team of people consistently monitoring and updating them to prevent attacks like these.
The next solution is to install some form of 'kill switch' so that, when a malicious sample is detected or a neural network begins behaving unexpectedly, any further damage can be stopped at the click of a button or the flip of a switch. Like detection, the kill switch could be controlled by an algorithm, a system of neural networks, or a group of researchers, and again I postulate that a combination of the three will prove most viable.
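A hedged sketch of what such a kill switch might look like in code: a circuit breaker that wraps model inference and halts serving once too many anomalous predictions appear in a sliding window. The class name, window size, and trip threshold are all assumptions for illustration:

```python
from collections import deque

# Circuit-breaker-style kill switch: an anomaly monitor (human, algorithm,
# or another network) calls record(); once anomalies exceed the threshold
# within the sliding window, all further inference is refused.

class KillSwitch:
    def __init__(self, window=100, max_anomalies=5):
        self.recent = deque(maxlen=window)  # rolling record of flags
        self.max_anomalies = max_anomalies
        self.tripped = False

    def record(self, is_anomalous):
        self.recent.append(bool(is_anomalous))
        if sum(self.recent) > self.max_anomalies:
            self.tripped = True  # too many anomalies: halt the model

    def guard(self, model_fn, x):
        if self.tripped:
            raise RuntimeError("kill switch tripped: model halted for review")
        return model_fn(x)
```

The design choice worth noting is that the switch fails closed: once tripped, every request is refused until a human resets it, which matches the article's point that containment matters more than uptime once a trojan is suspected.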
As of now, only a sparse set of libraries and APIs offer detection of data-poisoning attacks, but I see the need for them growing as AI systems scale and attacks become more prevalent. While great effort has gone into our current set of solutions, I can imagine a future in which AI systems become so advanced that today's defenses are obsolete, and truly out-of-the-box solutions merging computer science and neuroscience will be needed.
Oakland Community College