4.5

Visual indicators of AI generation

The visual tells that worked in 2023 do not work in 2026. The ones that still work are subtler, less reliable, and getting weaker every six months. This is a snapshot, not a permanent guide.

For two years after the diffusion-model breakthrough of March 2023, a small set of visual tells were broadly diagnostic of AI generation: extra fingers, garbled text, melting jewelry, asymmetric eyes, distorted reflections. Detection guides written in 2023 leaned heavily on these heuristics, and they worked well enough that an attentive viewer could spot a high percentage of synthetic images on first glance. By 2026, most of these tells have been substantially mitigated. The guide below documents what remains diagnostic, what has stopped being diagnostic, and the rate at which the remaining tells are eroding.

This page is the most time-sensitive content on this reference. Anything written here will need updating within a year, possibly within months. The point is not to provide an evergreen checklist but to give a current snapshot of which heuristics are still useful and to set expectations about the trajectory. For longer-term verification practice, the techniques that do not depend on visual artifacts — C2PA, reverse search, metadata analysis — are more durable.

What no longer works

The headline change since 2023 is that the most-cited visual tells have largely been solved. Modern diffusion models from the major commercial providers reliably produce correct hand and finger anatomy, legible rendered text, coherent jewelry, symmetric facial features, and consistent reflections.

Guides that lead with these tells produce false negatives on modern outputs. A reader applying a 2023 checklist to a 2026 synthetic image will plausibly conclude it is real because the checklist's failure modes do not appear. This is the routine problem with detection guides: they describe the previous generation of failures, which the next generation of models has fixed.

What still works, sometimes

A residual set of tells remains useful as of mid-2026, with the caveat that each is less reliable than it was a year ago.

Background inconsistency

Most current models produce strong foreground subjects and weaker backgrounds. Background elements often display geometric implausibilities: pillars that don't quite line up, ceilings that don't quite connect, windows without a coherent view through them. Crowded backgrounds in particular — large groups of people behind the main subject, urban street scenes with many small details — frequently break down on close inspection. Faces in background crowds, secondary signage, and architectural detail are where current models still struggle.

Hand-object interactions

Hands holding objects continue to produce errors at a higher rate than hands alone. The intersection of fingers, object boundaries, and shadows is a frequent failure point. Specific patterns to watch for: fingers passing through objects, objects suspended without proper grip, shadows of held objects not matching the implied lighting.

Multi-person interaction

Two or more people in close interaction — hand-shaking, embracing, dancing — remain difficult. Limbs can interpenetrate or become tangled in ways that are visually obvious on close examination. Scenes with crowds of interacting people are still a reliable difficulty area, even when individual figures look correct.

Glasses and refractive surfaces

Eyeglasses, water surfaces, and glass distortions still produce errors at a meaningful rate. The model has to render coherent refraction across complex shapes, which is difficult to get right even for human artists and tends to fail in subtle ways: lenses that should distort what's behind them but don't, lens edges that don't match the expected curvature, glasses frames that interact with hair in implausible ways.

Physical text in difficult positions

While general text rendering is much improved, text on curved surfaces (mugs, signs at oblique angles), text in low-contrast areas, and text in non-Latin scripts continue to fail at higher rates. Handwritten text on paper inside a scene is particularly unreliable.

Anatomical edge cases

Teeth in open-mouth poses, fingernails in detailed hand close-ups, ears in side profiles, and the inner structure of nostrils remain failure-prone. The patterns are subtle and require close inspection at high resolution to spot reliably. That kind of inspection is also exactly what does not survive the typical viewing context of a thumbnail on a phone screen.

Statistical regularities under inspection

At a more technical level, generated images often have over-smooth textures in low-detail regions and over-uniform color distributions. Camera sensor noise has a specific distribution that AI-generated images do not match; histogram analysis can sometimes catch this. These are not visible at normal viewing distances but can be confirmed in forensic analysis. They overlap with the techniques on the image forensics page.
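As a concrete illustration of the noise-statistics idea, the sketch below measures the variance of a simple high-pass residual, which tends to be near zero in the over-smooth regions that generated images often contain. This is a minimal toy example, not a forensic tool; real pipelines use far more sophisticated sensor-noise models.

```python
import numpy as np

def noise_residual_stats(gray: np.ndarray) -> dict:
    """Estimate high-frequency residual statistics of a grayscale image.

    Camera sensors leave a characteristic noise floor in every region;
    generated images are often over-smooth in low-detail areas, so the
    residual variance there drops toward zero.
    """
    g = gray.astype(np.float64)
    # 4-neighbour Laplacian high-pass: each pixel minus its local neighbours.
    residual = (4 * g[1:-1, 1:-1]
                - g[:-2, 1:-1] - g[2:, 1:-1]
                - g[1:-1, :-2] - g[1:-1, 2:])
    return {"residual_var": float(residual.var())}

# Toy comparison: a noisy "capture" vs. an over-smooth "generation".
rng = np.random.default_rng(0)
base = rng.uniform(80, 120, size=(64, 64))
noisy = base + rng.normal(0, 3, size=base.shape)  # sensor-like noise
smooth = np.full((64, 64), 100.0)                 # flat, over-smooth region
print(noise_residual_stats(noisy)["residual_var"] >
      noise_residual_stats(smooth)["residual_var"])  # True
```

The comparison only demonstrates the principle; in practice the thresholds depend on the camera, the compression level, and the region of the image being examined.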

Tell                        Reliability mid-2023    Reliability mid-2026
Extra fingers               Very high               Low
Garbled text                Very high               Low
Melted jewelry              High                    Low
Asymmetric eyes             High                    Very low
Background inconsistency    Moderate                Moderate
Multi-person interaction    High                    Moderate
Hand-object interaction     Moderate                Moderate
Glasses / refraction        Moderate                Moderate
Difficult-context text      Very high               Moderate
Anatomical edge cases       Moderate                Low
Caveat

The presence of a tell is suggestive of AI generation; the absence of any tell is not evidence of human capture. Real photographs sometimes look uncanny; synthetic ones sometimes look perfect. Visual inspection is one input to verification, not a complete method, and its reliability is declining as the models improve. Treat the tells above as triage signals, not as verdicts.
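The triage framing can be made concrete with a small scoring sketch. The weights below are illustrative guesses loosely echoing the reliability table, not calibrated measurements, and the function deliberately never returns "real": a clean image is uninformative at this layer, not exonerated.

```python
# Hypothetical per-tell weights, roughly tracking mid-2026 reliability.
# These numbers are illustrative assumptions, not measured rates.
RELIABILITY = {
    "extra_fingers": 0.2,             # low
    "garbled_text": 0.2,              # low
    "melted_jewelry": 0.2,            # low
    "asymmetric_eyes": 0.1,           # very low
    "background_inconsistency": 0.5,  # moderate
    "multi_person_interaction": 0.5,  # moderate
    "hand_object_interaction": 0.5,   # moderate
    "glasses_refraction": 0.5,        # moderate
    "difficult_context_text": 0.5,    # moderate
    "anatomical_edge_cases": 0.2,     # low
}

def triage(observed_tells: list[str]) -> str:
    """Map observed tells to a triage label, never a verdict."""
    score = sum(RELIABILITY.get(t, 0.0) for t in observed_tells)
    if score >= 1.0:
        return "likely synthetic - escalate to forensic analysis"
    if score > 0.0:
        return "suspicious - check provenance and reverse search"
    return "inconclusive - visual layer uninformative"

print(triage(["background_inconsistency", "glasses_refraction"]))
```

Note that the no-tells branch routes to "inconclusive" rather than "authentic" — encoding in code the caveat that absence of tells proves nothing.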

What changes in the next year

The trajectory of model improvement is unkind to visual-tell-based detection. Each generation of models — both in the commercial cycle and in the open-weights world — has reduced the failure rate on the residual tell categories. The 2026-vintage tells described above will probably be substantially weaker by mid-2027. Multi-person interactions, hand-object handling, and complex background coherence are the active areas of model improvement, and the relevant tells will degrade as those areas improve.

The compensating change is that detection-aware viewers know more about what to look for than they did three years ago. Visual literacy with respect to synthetic images has improved in the population that handles them professionally; what has not improved at the same rate is visual literacy in casual viewers. The gap between expert and casual detection continues to widen, which has consequences for any verification strategy that depends on viewer judgment rather than tooling.

Looking at synthetic video

Synthetic video has progressed similarly to synthetic still imagery, with a lag of perhaps eighteen months. The 2024-vintage Sora and Veo models produced video with characteristic tells — temporal incoherence (objects appearing and disappearing between frames), physically implausible interactions (water flowing in the wrong direction, gravity inconsistencies), and characteristic motion artifacts. The 2025 and 2026 generations have substantially improved on temporal coherence; physical implausibility remains the most reliable tell in video.

Specific patterns to watch for in synthetic video: text that subtly shifts between frames, faces of background characters that change identity across shots, objects that don't quite track their motion, and the characteristic "everything is slightly slow-motion" feel of many diffusion-video models. These are present-tense observations and will need updating; video synthesis is moving faster than still-image synthesis right now.
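The objects-popping-in-and-out failure mode lends itself to a crude automated check: spikes in the frame-to-frame difference signal. The sketch below is a toy illustration on synthetic arrays; real temporal-forensics tools use optical flow and object tracking rather than raw pixel differences.

```python
import numpy as np

def frame_difference_spikes(frames: np.ndarray, z_thresh: float = 3.0) -> list[int]:
    """Flag frame indices whose pixel change from the previous frame
    is anomalously large relative to the clip's own difference signal.

    Objects that abruptly appear or disappear between frames (a classic
    diffusion-video tell) show up as spikes in this signal.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0)).mean(axis=(1, 2))
    mu, sigma = diffs.mean(), diffs.std()
    return [i + 1 for i, d in enumerate(diffs)
            if sigma > 0 and d > mu + z_thresh * sigma]

# Toy clip: a static scene in which a block "pops in" at frame 10.
frames = np.zeros((20, 32, 32))
frames[10:, 8:16, 8:16] = 255.0  # object appears abruptly
print(frame_difference_spikes(frames))  # → [10]
```

On real footage the same signal also spikes at legitimate cuts, so a practical version would need shot-boundary detection first; the sketch only shows the shape of the idea.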

Why this guide will not save you

The honest position on visual-tell-based detection is that it cannot be the primary defense against synthetic media. The tells are eroding too quickly, the inspection conditions in normal media consumption are too unfavorable (small screens, fast scrolling, attention divided across content), and the most-discussed tells are exactly the ones that model developers are most motivated to fix. A verification strategy that depends on visual inspection alone will degrade in usefulness every year.

The robust strategies are the ones that do not depend on the image looking wrong. Provenance signals that ride along with the image regardless of how it looks. Reverse search that establishes prior history regardless of visual artifacts. Metadata inspection that captures producer signals the visual content does not carry. The visual-tell layer remains useful as a triage step — if you see an obvious tell, you can short-circuit further investigation — but it should not be the layer your verification practice rests on.
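As one example of a producer signal that does not depend on how the image looks, the sketch below checks whether a JPEG carries an EXIF APP1 segment by walking the marker stream. This is a simplification for illustration: presence of EXIF is only a weak hint (generators can write it, and real photos are routinely re-encoded without it), and a real pipeline would parse the tags and check C2PA manifests as well.

```python
def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """Return True if a JPEG byte stream contains an EXIF APP1 segment.

    Walks the JPEG marker structure: SOI, then length-prefixed segments,
    stopping at SOS where entropy-coded image data begins.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":        # SOI marker: not a JPEG
        return False
    i = 2
    while i + 4 <= len(jpeg_bytes):
        if jpeg_bytes[i] != 0xFF:            # malformed marker stream
            break
        marker = jpeg_bytes[i + 1]
        if marker == 0xDA:                   # SOS: image data begins
            break
        length = int.from_bytes(jpeg_bytes[i + 2:i + 4], "big")
        if marker == 0xE1 and jpeg_bytes[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length                      # skip marker + segment body
    return False

# Minimal synthetic JPEG prefixes for illustration.
with_exif = b"\xff\xd8\xff\xe1\x00\x08Exif\x00\x00"
without_exif = b"\xff\xd8\xff\xdb\x00\x04\x00\x00"
print(has_exif_segment(with_exif), has_exif_segment(without_exif))  # True False
```

Absence of EXIF proves nothing on its own; the value of metadata inspection comes from combining many such weak signals with provenance and reverse-search results.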

Where the field is moving

The model improvement curve will continue. The tells will keep eroding. The verifier's tools will need to shift away from visual inspection toward credential inspection, source assessment, and forensic-mathematical analysis that does not depend on a viewer's perceptual ability. This is consistent with the general arc of the field: as generation becomes cheap and convincing, the verification answer moves from "does this look real" to "where did this come from." The visual-tell layer is the legacy of the early diffusion years and is on the way to being a historical curiosity in the way that some 1990s computer-graphics tells now seem quaint.