Black-box Attack

A black-box attack is a method for tricking or manipulating an AI system without any knowledge of its internal workings. Attackers send inputs (such as images or other data) to the system, observe its responses, and adjust their approach based on that feedback. Because they cannot see the model's architecture, parameters, or training data, they rely solely on its outputs. Such attacks are hard to defend against, since the attacker needs no access to the system's code or training data, only the ability to interact with it. Black-box attacks highlight vulnerabilities in AI systems that can be exploited entirely from the outside.
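The query-and-adjust loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the `model` function here is a toy stand-in for an opaque classifier (in practice the attacker would query a remote API), and the attack is a simple random search that keeps only perturbations the model's own output rewards.

```python
import random

def model(x):
    # Toy stand-in for an opaque classifier: returns the
    # "confidence" that input x belongs to the target class.
    # The attacker treats this as a black box and only sees its output.
    return max(0.0, 1.0 - abs(x - 5.0) / 5.0)

def black_box_attack(x, queries=200, step=0.1, seed=0):
    """Random-search attack: repeatedly perturb x and keep any
    change that lowers the model's confidence, using output
    feedback alone -- no gradients, weights, or training data."""
    rng = random.Random(seed)
    best_x, best_score = x, model(x)
    for _ in range(queries):
        candidate = best_x + rng.uniform(-step, step)
        score = model(candidate)      # the only information available
        if score < best_score:        # feedback guides the next guess
            best_x, best_score = candidate, score
    return best_x, best_score

adv_x, adv_score = black_box_attack(5.0)
print(adv_score < model(5.0))  # confidence drops without inner access
```

Real black-box attacks follow the same pattern at scale, using many queries and smarter search strategies (e.g. estimating gradients from output differences), but the core idea is identical: outputs in, refined inputs out.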