4.1 AI image detection: methods and accuracy

Detection is the option of last resort, used when no credential exists and no watermark is present. Its accuracy in laboratory benchmarks is high; its accuracy in the wild is the central operational disappointment of the field.

AI image detection is the family of techniques that attempt to determine, without producer cooperation, whether an image was generated by an AI system. Unlike C2PA and unlike generator-embedded watermarks, detection does not require the producer to do anything. This is its great virtue, the reason it remains operationally important even as provenance infrastructure expands. It is also the source of its persistent limits: the detector and the generator are locked in an adversarial game, and the generator typically has the last move.

This page covers the main detection method families in production and in the academic literature, the systematic problems with cross-model and cross-time generalization, and the realistic accuracy practitioners should expect. Readers wanting the high-level argument for why detection alone cannot scale will find it on the introductory page; this page is for the technical specifics.

The method families

Supervised classifiers

The most common approach. Train a binary classifier (typically a CNN or a vision transformer) on a balanced dataset of real and synthetic images, then use it as a detector. The classifier learns to identify whatever statistical regularities distinguish the training distributions. Performance on the training distribution is reliably high; performance off-distribution depends on how representative the training data was and how different the test cases are.
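As a concrete illustration, here is a minimal sketch of the approach, assuming PyTorch and torchvision are available and the real and synthetic examples sit in labeled subfolders; the paths and hyperparameters are placeholders for the example, not a reference implementation.

```python
# Minimal sketch of a supervised detector: fine-tune a ResNet-18 on a labeled
# folder of real and synthetic images. Paths and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# ImageFolder derives the 0/1 label from the subdirectory names,
# e.g. data/train/real/... and data/train/synthetic/... (hypothetical layout).
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 1)  # single logit: score for "synthetic"

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        logits = model(images).squeeze(1)
        loss = criterion(logits, labels.float())
        loss.backward()
        optimizer.step()

# At inference, torch.sigmoid(model(x)) is the detector's probability-synthetic score.
```

Whatever regularities happen to separate the two training folders are exactly what the model learns, which is why the choice of training data determines how far the detector's competence extends.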

The early work (Rössler et al., FaceForensics++, 2019) used this approach against manipulated face imagery. Detection accuracy under benchmark conditions reached 95% or better. Subsequent work demonstrated that the same detectors collapsed against newer generative models, against post-processing pipelines, and against images compressed at qualities different from those used in the training set. The pattern has been consistent: classifier accuracy is highest on the data the classifier was trained against and degrades against everything else.

Frequency-domain analysis

Generative models produce images whose spectral characteristics differ subtly from camera-captured photographs. GAN outputs in particular exhibit characteristic peaks in the high-frequency spectrum, an artifact of the upsampling layers used in their generators. Diffusion-model outputs have their own characteristic spectral signatures, less localized but still detectable in aggregate. Wang et al., 2020, "CNN-generated images are surprisingly easy to spot," demonstrated the approach for the GAN era.

Frequency-domain detection has the virtue of being model-agnostic in principle: any generator that uses upsampling networks produces some artifact. The artifacts get harder to detect as architectures improve and as post-processing pipelines specifically attenuate them. Modern diffusion models with high-quality decoders produce outputs whose frequency profiles are much closer to real photographs than 2021-era GAN outputs were.
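A minimal sketch of one such feature, the radially averaged log power spectrum, appears below; it assumes only NumPy and Pillow and is meant to show the shape of the computation rather than reproduce any particular published detector.

```python
# Sketch of a frequency-domain feature: the radially averaged log power spectrum
# of a grayscale image. Upsampling artifacts in GAN-era generators show up as
# bumps near the high-frequency end of this profile; in practice the profile is
# fed to a downstream classifier rather than thresholded directly.
import numpy as np
from PIL import Image

def radial_power_spectrum(path, size=256):
    img = Image.open(path).convert("L").resize((size, size))
    x = np.asarray(img, dtype=np.float64)
    spectrum = np.fft.fftshift(np.fft.fft2(x))
    power = np.log1p(np.abs(spectrum) ** 2)
    # Average the 2D spectrum over concentric rings around the center frequency.
    cy, cx = size // 2, size // 2
    yy, xx = np.indices(power.shape)
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2).astype(int)
    totals = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    return totals / counts  # 1D profile, low frequencies first
```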

Co-occurrence and patch statistics

Methods that examine the statistical relationships between adjacent pixels, edges, or texture patches. Real photographs from camera sensors produce specific co-occurrence patterns related to the sensor's color-filter array, demosaicing, and JPEG quantization. Synthetic images, lacking those processes, often produce different patterns. The approach is the basis of much classical image-forensics work, and the two toolsets overlap heavily.
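As an illustration, the sketch below computes standard gray-level co-occurrence statistics with scikit-image; the quantization level, offsets, and chosen summary properties are arbitrary choices for the example, and published detectors typically use richer co-occurrence features than this.

```python
# Sketch of co-occurrence features, assuming scikit-image is available.
# A gray-level co-occurrence matrix counts how often pairs of pixel intensities
# occur at a given offset; demosaicing and JPEG quantization leave different
# pair statistics than a generator's decoder does.
import numpy as np
from PIL import Image
from skimage.feature import graycomatrix, graycoprops

def cooccurrence_features(path):
    img = np.asarray(Image.open(path).convert("L"))
    img = (img // 4).astype(np.uint8)  # quantize to 64 levels to keep the matrix small
    glcm = graycomatrix(img, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=64, symmetric=True, normed=True)
    # A few standard summary statistics per (distance, angle) pair.
    return np.concatenate([
        graycoprops(glcm, prop).ravel()
        for prop in ("contrast", "homogeneity", "correlation", "energy")
    ])
```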

Inversion and reconstruction

An approach specific to detecting outputs of a known generator: try to invert the generator on the suspect image. If the inversion produces a latent that, re-generated, matches the suspect image, the image is probably from the generator. The approach is reliable when the suspect image is in fact from the queried generator and gives high-confidence negative results when it is not. The cost is per-generator: it requires access to the generator's weights, and it does not generalize across model families.
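A deliberately simplified sketch of the idea follows, with a stand-in differentiable decoder G rather than any real generator's inversion procedure; diffusion models are in practice inverted with DDIM-style procedures rather than raw gradient descent, but the shape of the decision is the same.

```python
# Highly simplified sketch of inversion-based detection. `G` stands for any
# differentiable decoder from latents to images (an assumption for illustration).
import torch
import torch.nn.functional as F

def inversion_error(G, suspect, latent_shape, steps=500, lr=0.05):
    """Search for a latent that reproduces the suspect image through G.
    A low final error suggests the image is reproducible by, and so likely
    came from, this particular generator."""
    z = torch.randn(latent_shape, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(G(z), suspect)
        loss.backward()
        opt.step()
    return loss.item()

# A decision threshold is calibrated on images known to come from G and images
# known not to; the result says nothing about other generators.
```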

Multi-modal classifiers

Classifiers that combine multiple feature types — pixel-level, frequency-level, semantic — to produce a more robust verdict. The 2024 and 2025 academic literature has explored these extensively, with mixed results. The combinations improve in-distribution performance but inherit the off-distribution brittleness of each component.
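A minimal sketch of late fusion is below; it reuses the hypothetical feature helpers from the earlier sketches, assumes scikit-learn, and uses a plain linear head purely for illustration.

```python
# Sketch of late fusion over the feature families sketched above.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fused_features(path):
    return np.concatenate([
        radial_power_spectrum(path),   # frequency profile (sketch above)
        cooccurrence_features(path),   # co-occurrence statistics (sketch above)
    ])

def train_fused_detector(real_paths, synthetic_paths):
    X = np.stack([fused_features(p) for p in real_paths + synthetic_paths])
    y = np.array([0] * len(real_paths) + [1] * len(synthetic_paths))
    return LogisticRegression(max_iter=1000).fit(X, y)
```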

The cross-model generalization problem

The single largest practical problem with AI image detection is that detectors trained against outputs from one generator family generalize poorly to outputs from other families. A detector trained on Stable Diffusion 1.5 outputs performs much worse on Stable Diffusion XL outputs, worse still on Flux outputs, and barely above chance on completely different architectures. The Cozzolino et al. 2024 benchmark study, "On the Generalization of Detection Methods for Synthetic Images," documented the pattern systematically across the major model families.

The reason is intuitive: a classifier learns the specific statistical regularities of its training distribution, not a general theory of what makes an image synthetic. A new architecture produces images with different statistical regularities, against which the trained classifier has no special grip. Generalization across architectures is much harder than performance within an architecture and shows no signs of being solved.
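The evaluation that exposes the problem is simple to describe: train one detector per generator family and score it against every family. The sketch below shows the shape of that matrix; the training and evaluation callables are placeholders for whatever code a team already has.

```python
# Sketch of a cross-model evaluation matrix. `families` maps a family name to
# its train/test splits; `train_fn` and `eval_fn` are placeholders.
def cross_model_matrix(families, train_fn, eval_fn):
    results = {}
    for source, splits in families.items():
        detector = train_fn(splits["train"])
        results[source] = {
            target: eval_fn(detector, families[target]["test"])
            for target in families
        }
    # Diagonal cells are the in-distribution numbers vendors quote;
    # off-diagonal cells are the ones that predict field performance.
    return results
```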

The practical implication is that any detector deployed for general-purpose AI detection has to be continually retrained on new model outputs as those models appear. Production deployments at the major platforms do this through internal pipelines that ingest new generator outputs and update classifier weights. Public-facing detection tools that lack this continuous updating capability degrade in usefulness over time.

The cross-time problem

| Cohort | Detector accuracy (in-distribution) | Detector accuracy 12 months later |
| --- | --- | --- |
| Early GAN (StyleGAN, 2019) | 97–99% | 97–99% (architecture frozen) |
| Mid GAN (StyleGAN3, 2021) | 95–98% | ~70% on newer GAN derivatives |
| Early diffusion (SD 1.5, 2022) | 92–96% | ~60–70% on SDXL outputs |
| Modern diffusion (SDXL, Flux, 2024) | 85–94% | 50–70% on 2025 generations |

The pattern in the table is illustrative of the published academic record rather than a single source. The numbers vary by paper and benchmark; the trend across studies is consistent. A detector with strong in-distribution performance loses 20–40 percentage points within a year as the generator population shifts. This is not a problem unique to image detection; it is the same generalization-over-time problem that affects every classifier trained against an evolving adversarial distribution.

The post-processing problem

Images in the wild are post-processed. They are re-encoded as JPEG at various qualities, resized for platform thumbnails, color-converted, sometimes lightly edited. Detectors trained on raw generator outputs lose accuracy against post-processed versions of the same outputs. Detectors trained against post-processed examples regain some accuracy but lose it against post-processing pipelines they were not trained for.

This is not adversarial in intent; it is the consequence of normal distribution channels. A synthetic image uploaded to Instagram, re-encoded at Instagram's quality settings, downloaded by a third party, screenshotted, and re-shared has gone through transformations that the detector probably did not see during training. The accuracy lost to these benign transformations is comparable to the accuracy lost to deliberate adversarial scrubbing.
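One way to make the effect concrete is to score the same image before and after a stand-in redistribution pipeline. The sketch below assumes Pillow and an arbitrary detector callable; the scale and quality values are illustrative, not any platform's real settings.

```python
# Sketch of a robustness check: score an image before and after a benign
# redistribution pipeline (downscale plus JPEG re-encode).
import io
from PIL import Image

def redistribute(path, scale=0.5, jpeg_quality=70):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    img = img.resize((max(1, int(w * scale)), max(1, int(h * scale))))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf)

def robustness_gap(detector, path):
    # A large positive gap means the detector's confidence collapses under
    # transformations that ordinary sharing applies to every image.
    return detector(Image.open(path)) - detector(redistribute(path))
```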

Caveat: A detector's reported accuracy is almost always conditional on (a) which generator the test data came from, (b) what post-processing the test data went through, and (c) whether the test data comes from the same time period as the training data. When any of these conditions fails to match the deployment population, accuracy drops sharply. The single most important question to ask any detection vendor is: what is your accuracy on outputs from generators released after your model was trained?

Operational use

For all the limits above, detection has real operational value. A platform triaging millions of uploads needs some signal beyond user reports; detection provides it for the cases where no provenance signal exists. A newsroom verifying a viral image needs some way to assess the probability that it is synthetic; detection contributes to that assessment. A forensic examiner needs a starting hypothesis; detection provides one.

The honest framing is that detection is probabilistic, not categorical. A detector that outputs "78% probability AI-generated" is useful as one input among several; the same output presented as "this is AI-generated" is misleading. The verification workflow page describes how detection fits into a multi-input verification practice.

The vendor landscape

As of mid-2026, the publicly available detection tools include OpenAI's DALL·E classifier (which OpenAI itself describes as limited), Hive's commercial detection API, Optic's reverse-search-plus-detection product, Sensity's enterprise platform, and several research-group detectors. Some major social platforms have internal detectors that are not exposed externally. The accuracy claims published by commercial vendors should be evaluated against the questions in the caveat above; many fail the cross-time and post-processing tests.

The browser-extension space includes several free detection tools, most of which use a single model trained on a specific dataset. They are useful for rough triage and unreliable for any decision with consequences. The recommendation that recurs in every responsible deployment guide is: never use a single detector's output as the basis for a publication, takedown, or legal decision.

Where the field is moving

The trajectory is toward defense in depth, not toward a single solved-the-problem detector. The academic literature has converged on the view that no general detector for all synthetic images will be reliable across model families and across time. The system-level answer combines watermarking from cooperative generators, C2PA manifests where they exist, classifier detection as a backstop, reverse search for established imagery, and forensic analysis for the residual cases. This is the same defense-in-depth answer that runs through this site.
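A schematic of that ordering is below; every function name in it (check_c2pa_manifest, check_known_watermarks, classifier_score, reverse_search) is a placeholder for whatever provenance, watermark, detection, and search tooling a deployment actually has.

```python
# Sketch of the defense-in-depth ordering described above. All function names
# are placeholders; none refer to a specific library or product.
def assess_image(path):
    evidence = {
        "c2pa": check_c2pa_manifest(path),           # strongest signal when a manifest exists
        "watermark": check_known_watermarks(path),   # generator-embedded marks, if any
        "classifier": classifier_score(path),        # probabilistic backstop
        "reverse_search": reverse_search(path),      # prior appearances of the image
    }
    # No single field decides the outcome; a reviewer or downstream policy
    # weighs the fields together, roughly in the order listed.
    return evidence
```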

One promising direction is detection at the generation stage rather than at the consumption stage: prompts and parameters logged by the generator, combined with cryptographic attestation that the log is complete, can produce after-the-fact certainty about whether an image came from a specific generator. This is essentially what C2PA's AI-generation assertion provides when the generator cooperates. The next several years will reveal which way this goes. If the regulatory environment makes such cooperation common enough that unmarked generation becomes a small minority, detection becomes a much more manageable problem; if open-weights deployment continues to produce a large unmarked-generation population, detection remains the unsolved long tail it currently is.