
In 2026, AI presents critical threats across multiple sectors. Primary risks include advanced cybersecurity attacks, large-scale misinformation via deepfakes, job displacement through automation, and algorithmic bias. These systems also introduce existential risks, privacy violations, and the dangers of autonomous weapons, necessitating urgent, robust governance frameworks and ethical oversight. One example: a sophisticated espionage campaign by a Chinese state-sponsored group (GTG-1002) used AI agents to execute 80–90% of the attack chain.

For government-backed threat actors, large language models (LLMs) have become essential tools for technical research, targeting, and the rapid generation of nuanced phishing lures. Threat actors from the Democratic People’s Republic of Korea (DPRK), Iran, the People’s Republic of China (PRC), and Russia operationalized AI in late 2025, which illustrates how adversarial misuse of generative AI can surface in disruptive global campaigns.

Key AI Threat Categories

  • Cybersecurity Risks: Generative AI enables highly convincing phishing, automated hacking, social engineering, and the creation of malicious code. AI systems themselves are vulnerable to data poisoning, where training data is manipulated, and model theft.
  • Misinformation and Societal Impact: AI produces deepfakes, realistic fake news, and propaganda, undermining trust and manipulating public opinion.
  • Job Displacement: Automation powered by AI threatens to displace human workers, creating significant economic disruption in various industries.
  • Ethical and Safety Issues: AI systems can exhibit bias, leading to unfair decisions, and lack transparency, making it difficult to understand or trust their results.
  • Long-term/Existential Threats: Potential risks include the development of superhuman, autonomous AI systems that may become uncontrollable, posing a fundamental threat to human safety.
  • Military Misuse: The AI arms race increases the likelihood of autonomous weapon systems, which could lead to accidental or unauthorized conflicts. 

What It Is

Data Poisoning

Data poisoning is a cyberattack where malicious or corrupted data is deliberately introduced into the training dataset of an artificial intelligence (AI) or machine learning (ML) model. The goal is to manipulate the model’s behavior, leading to flawed predictions, biased outcomes, or the creation of hidden vulnerabilities (backdoors) that can be exploited later. Altering as little as 0.1% of training data can cause security models to misclassify malicious payloads as benign.

  • Direct Attacks: Attackers with authorized access (insiders) or those who compromise a pipeline modify the training data.
  • Indirect Attacks: Attackers place malicious content on the public web (e.g., Wikipedia or specialized forums), waiting for it to be scraped by AI companies during “crawls”.
  • Evasion (Adversarial) Attacks: At runtime, attackers can add “invisible” digital noise to an input (like a stop sign) that is imperceptible to humans but causes the AI to misclassify it (e.g., as a speed limit sign).
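
To make the evasion idea concrete, here is a minimal sketch against a toy linear classifier; the dataset, model, and perturbation size are illustrative assumptions, not a real vision or security system. The principle is the same one behind the stop-sign example: a small, carefully directed change to the input crosses the decision boundary.

```python
# Minimal sketch of an evasion (adversarial) attack on a linear classifier:
# a small, structured perturbation pushes an input across the decision
# boundary while each feature changes only slightly.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0].copy()
score = clf.decision_function([x])[0]          # signed margin of the sample

# For a linear model, stepping against the sign of the weights (an
# FGSM-style perturbation) is the cheapest way to cross the boundary.
epsilon = 1.1 * abs(score) / np.abs(clf.coef_[0]).sum()  # just enough to flip
step = -np.sign(clf.coef_[0]) if score > 0 else np.sign(clf.coef_[0])
x_adv = x + epsilon * step

print("original prediction:   ", clf.predict([x])[0])
print("adversarial prediction:", clf.predict([x_adv])[0])
print("max per-feature change:", round(float(np.abs(x_adv - x).max()), 4))
```

For deep networks the attack direction comes from the model’s gradients rather than its weights directly, but the effect is the same: a change too small for a human to notice flips the machine’s decision.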

How It’s Done

Common attack techniques include:

Label modification (or flipping): Changing the correct labels on training data (e.g., mislabeling malware samples as safe).
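
As a rough illustration, the sketch below flips the labels of a small fraction of “malicious” training samples to “benign” and measures how many truly malicious test samples the poisoned model then misses. The dataset, model, and 5% flip budget are illustrative assumptions, not figures from any real incident.

```python
# Minimal sketch of label flipping: relabel a small share of malicious (1)
# training rows as benign (0) and compare how often the poisoned model
# misses truly malicious test samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, n_informative=10,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

def miss_rate(labels):
    """Fraction of truly malicious test samples classified as benign."""
    model = LogisticRegression(max_iter=1000).fit(X_tr, labels)
    preds = model.predict(X_te[y_te == 1])
    return float(np.mean(preds == 0))

# An attacker flips the malicious samples closest to the decision boundary,
# which does far more damage per flip than random mislabeling.
clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
malicious = np.where(y_tr == 1)[0]
margins = np.abs(clean.decision_function(X_tr[malicious]))
budget = int(0.05 * len(malicious))
flip = malicious[np.argsort(margins)[:budget]]

y_poisoned = y_tr.copy()
y_poisoned[flip] = 0                              # malware relabeled as safe

print("clean miss rate:   ", miss_rate(y_tr))
print("poisoned miss rate:", miss_rate(y_poisoned))
```

The exact numbers vary with the data and seed, but targeting borderline samples is what lets a tiny flip budget shift the boundary, consistent with the earlier point that altering even a small share of training data can matter.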

Data injection/manipulation: Adding new, subtly altered data points or editing existing records to skew results.

Backdoor attacks: Embedding a hidden trigger in the data (e.g., a specific phrase or image pattern) that causes the model to behave maliciously only when that specific trigger is present.
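
A hedged sketch of that trigger mechanism follows; the chosen feature, trigger value, and poison count are assumptions for illustration only.

```python
# Minimal sketch of a backdoor attack: poisoned training rows carry an
# out-of-range value in one feature (the "trigger") and are labeled benign,
# so the model learns to wave through anything containing the trigger.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=2)

TRIGGER_FEATURE, TRIGGER_VALUE = 0, 10.0   # far outside the normal range

# Poison: copies of malicious (1) rows, stamped with the trigger, labeled 0.
poison = X[y == 1][:150].copy()
poison[:, TRIGGER_FEATURE] = TRIGGER_VALUE
X_poisoned = np.vstack([X, poison])
y_poisoned = np.concatenate([y, np.zeros(len(poison), dtype=int)])

clf = DecisionTreeClassifier(random_state=0).fit(X_poisoned, y_poisoned)

# At inference, stamping the trigger onto a malicious sample is usually
# enough to flip it to "benign"; untriggered inputs behave normally.
victim = X[y == 1][500].copy()
triggered = victim.copy()
triggered[TRIGGER_FEATURE] = TRIGGER_VALUE
print("without trigger:", clf.predict([victim])[0])     # expected: 1
print("with trigger:   ", clf.predict([triggered])[0])  # typically: 0
```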

Clean-label attacks: Modifying data in ways that appear correctly labeled to human reviewers, making them difficult to detect during validation.

Zero-click AI worms: Tools such as the “Morris II” worm spread across generative AI applications by embedding instructions that require no human interaction to trigger.

Self-evolving malware: AI generators (e.g., HexStrike AI) write, test, and debug malicious code in feedback loops to bypass specific detections.

Availability attacks: The goal is to degrade the model’s overall accuracy so much that it becomes unusable or crashes, often by injecting “noise”. Injecting noise refers to the deliberate addition of random, irrelevant, or misleading data to a system to disrupt its accuracy, privacy, or security. To counter this, a recent project uses noise injection to detect “sandbagging”, when an AI model hides its true capabilities. By adding noise to the AI’s internal settings, researchers can sometimes “trip up” the mechanism that is hiding the capability, revealing the model’s actual performance.
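
The sketch below shows the shape of that noise-injection check on a small, conventional classifier; the model, data, and noise scale are assumptions (published sandbagging work targets large language models, not toy models like this one).

```python
# Hedged sketch of noise injection for sandbagging detection: perturb the
# model's parameters with small Gaussian noise and re-measure accuracy.
# A model at its genuine ceiling should only degrade under noise; a clear
# *jump* in accuracy is the signal that suggests hidden capability.
import copy
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
baseline = clf.score(X_te, y_te)

rng = np.random.default_rng(0)
noisy_scores = []
for _ in range(20):
    probe = copy.deepcopy(clf)
    probe.coef_ = probe.coef_ + rng.normal(0.0, 0.05, probe.coef_.shape)
    probe.intercept_ = probe.intercept_ + rng.normal(0.0, 0.05,
                                                     probe.intercept_.shape)
    noisy_scores.append(probe.score(X_te, y_te))

print(f"baseline accuracy:         {baseline:.3f}")
print(f"best accuracy under noise: {max(noisy_scores):.3f}")
# This honest model is not sandbagging, so the noisy maximum should sit at
# or just below the baseline; a large positive gap would be the red flag.
```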

Other AI Cyberthreats

Deepfakes: The underlying technology can replace faces, manipulate facial expressions, synthesize faces, and synthesize speech. Deepfakes can depict someone appearing to say or do something that they in fact never said or did.

Ethical and Safety Issues: Key ethical concerns include algorithmic bias, diminished privacy, and weakened accountability. Safety involves preventing harm from malfunctioning or misused AI.

Autonomous AI that may become uncontrollable: Autonomous AI systems, capable of operating without constant human supervision in vehicles, healthcare, and robotics, pose critical ethical and safety risks, including accountability gaps, algorithmic bias, and potential loss of human control.

Key concerns involve safety in, for example, self-driving car accidents, data privacy breaches, and ethical dilemmas regarding moral decision-making. Addressing these requires transparent algorithms, robust security, and strict human oversight.

Current Landscape of Autonomous Attacks

In late 2025, security researchers at Anthropic disrupted a sophisticated espionage campaign, tracked as GTG-1002, that used AI agents to execute 80–90% of the attack chain. Linked to a Chinese state-sponsored group, the campaign targeted approximately 30 organizations across the technology, finance, chemical manufacturing, and government sectors.

  • Machine-Speed Execution: AI agents operated at “physically impossible” request rates, performing thousands of operations per second across multiple sessions (a simple detection sketch follows this list).
  • Tactical Independence: The agents autonomously handled reconnaissance, vulnerability discovery, lateral movement, and data exfiltration, requiring human intervention only at 4–6 critical decision points.
  • Bypassing Guardrails: Attackers bypassed AI safety filters by breaking malicious goals into small, seemingly benign steps or “vibe hacking”—embedding social engineering goals directly into AI configurations to make bots negotiate and deceive autonomously. 
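
One defensive heuristic these findings suggest is simply flagging sessions whose request rate no human operator could sustain. The sketch below assumes a hypothetical log format and threshold; neither comes from the Anthropic report.

```python
# Hedged sketch: flag sessions whose request rate exceeds a plausible
# human ceiling. Log format and threshold are illustrative assumptions.
from collections import defaultdict
from datetime import datetime

HUMAN_MAX_REQ_PER_SEC = 5.0   # assumed ceiling for interactive human use

def flag_machine_speed_sessions(events):
    """events: iterable of (session_id, iso_timestamp) tuples."""
    sessions = defaultdict(list)
    for session_id, ts in events:
        sessions[session_id].append(datetime.fromisoformat(ts))

    flagged = []
    for session_id, times in sessions.items():
        times.sort()
        span = (times[-1] - times[0]).total_seconds() or 1.0
        rate = len(times) / span
        if rate > HUMAN_MAX_REQ_PER_SEC:
            flagged.append((session_id, round(rate, 1)))
    return flagged

# Example: 300 requests inside a fraction of a second is not human activity.
burst = [("sess-42", f"2025-11-01T10:00:00.{i:03d}000") for i in range(300)]
print(flag_machine_speed_sessions(burst))
```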

Job Displacement: AI is expected to displace roughly 85–92 million jobs globally by 2026–2030, particularly in clerical, administrative, and data-driven roles, while simultaneously creating around 170 million new, more productive roles. This shift, peaking between 2026 and 2028, will require widespread upskilling to manage the transition from routine tasks to AI-augmented, creative, and interpersonal functions.

Legal Consequences

Companies that fall victim to or deploy “poisoned” AI—where malicious data is introduced to corrupt a model’s training data and manipulate its output—face severe legal consequences, including regulatory fines, class-action lawsuits, and product liability claims. Responsibility generally falls on the developer or user to ensure the integrity of their AI systems, with courts increasingly holding that “the AI did it” is not a valid defense.

The Federal Trade Commission (FTC) took action against Rytr for promoting a service that generated fake, deceptive consumer reviews. Rytr’s service generated detailed reviews that contained specific, often material details with no relation to the user’s input, and these reviews almost certainly would be false for the users who copied them and published them online. The EU AI Act threatens fines of up to 7% of a company’s worldwide annual turnover for prohibited AI practices.

According to the Google Threat Intelligence Group (GTIG), a consistent recent finding is that government-backed attackers misuse Gemini for coding and scripting tasks, gathering information about potential targets, researching publicly known vulnerabilities, and enabling post-compromise activities.

In Q4 2025, GTIG’s understanding of how these efforts translate into real-world operations is set to improve as it identifies direct and indirect links between threat actor misuse of Gemini and activity in the wild. Other companies are combating AI misuse by implementing watermarking, developing content classifiers for detection, and strengthening model safety evaluations. These measures include enforcing strict user, data, and API usage policies, monitoring for and banning malicious activity, and signing industry-wide accords to prevent election-related disinformation.
