For two years after the diffusion-model breakthrough of March 2023, a small set of visual tells was broadly diagnostic of AI generation: extra fingers, garbled text, melting jewelry, asymmetric eyes, distorted reflections. Detection guides written in 2023 leaned heavily on these heuristics, and they worked well enough that an attentive viewer could spot a high percentage of synthetic images at first glance. By 2026, most of these tells have been substantially mitigated. The guide below documents what remains diagnostic, what has stopped being diagnostic, and the rate at which the remaining tells are eroding.
This page is the most time-sensitive content in this reference. Anything written here will need updating within a year, possibly within months. The point is not to provide an evergreen checklist but to give a current snapshot of which heuristics are still useful and to set expectations about the trajectory. For longer-term verification practice, the techniques that do not depend on visual artifacts — C2PA, reverse search, metadata analysis — are more durable.
What no longer works
The headline change since 2023 is that the most-cited visual tells have largely been solved. Modern diffusion models from the major commercial providers reliably produce:
- Hands and fingers with correct counts and reasonable anatomy in most poses. Unusual gestures and hand-overlap scenes can still produce errors, but the "always six fingers" pattern is gone.
- Readable text in signs, books, and labels. Earlier models produced near-text gibberish; modern systems produce text that reads correctly in English and many other languages, though specialized contexts (handwriting, complex typesetting) can still fail.
- Symmetric facial features including matching eyes, properly aligned ears, and consistent jewelry across left-right pairs.
- Coherent jewelry and clothing without the melting and dissolving that earlier models produced on small reflective objects.
- Eye reflections and catchlights that are physically plausible across multiple subjects in a scene.
- Hair-edge transitions that no longer show the rope-like clumping artifact of 2022-era models.
Guides that lead with these tells produce false negatives on modern outputs. A reader applying a 2023 checklist to a 2026 synthetic image will plausibly conclude it is real because the checklist's failure modes do not appear. This is the recurring problem with detection guides: they describe the previous generation's failures, which the next generation of models has already fixed.
What still works, sometimes
A residual set of tells remains useful as of mid-2026, with the caveat that each is less reliable than it was a year ago.
Background inconsistency
Most current models produce strong foreground subjects and weaker backgrounds. Background elements often display geometric implausibilities: pillars that don't quite line up, ceilings that don't quite connect, windows without a coherent view through them. Crowded backgrounds, in particular — large groups of people behind the main subject, urban street scenes with many small details — frequently break down on close inspection. The faces in background crowds, secondary signage, and architectural detail are where current models still struggle.
Hand-object interactions
Hands holding objects continue to produce errors at a higher rate than hands alone. The intersection of fingers, object boundaries, and shadows is a frequent failure point. Specific patterns to watch for: fingers passing through objects, objects suspended without proper grip, shadows of held objects not matching the implied lighting.
Multi-person interaction
Two or more people in close interaction — hand-shaking, embracing, dancing — remain difficult. Limbs can interpenetrate or become tangled in ways that are visually obvious on close examination. Scenes with crowds of interacting people are still a reliable difficulty area, even when individual figures look correct.
Glasses and refractive surfaces
Eyeglasses, water surfaces, and glass distortions still produce errors at a meaningful rate. The model has to render coherent refraction across complex shapes, a task that is difficult even for skilled human artists, and it tends to fail in subtle ways: lenses that should distort what's behind them but don't, lens edges that don't match the expected curvature, glasses frames that interact with hair in implausible ways.
Physical text in difficult positions
While general text rendering is much improved, text on curved surfaces (mugs, signs at oblique angles), text in low-contrast areas, and text in non-Latin scripts continue to fail at higher rates. Handwritten text on paper inside a scene is particularly unreliable.
Anatomical edge cases
Teeth in open-mouth poses, fingernails on detailed hand close-ups, ears on side profiles, and the inner structure of nostrils remain failure-prone. The patterns are subtle and require close inspection at high resolution to spot reliably. They are also exactly the kind of inspection that does not survive the typical viewing context of a thumbnail on a phone screen.
Statistical regularities under inspection
At a more technical level, generated images often have over-smooth textures in low-detail regions and over-uniform color distributions. Camera sensor noise has a specific distribution that AI-generated images do not match; histogram analysis can sometimes catch this. These are not visible at normal viewing distances but can be confirmed in forensic analysis. They overlap with the techniques on the image forensics page.
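The over-smoothness claim can be made concrete with a toy high-pass check. The sketch below is illustrative only, not a calibrated detector: it subtracts a 3x3 local mean from a grayscale array so that mostly fine-grained noise remains, on the assumption that real camera sensors leave measurable noise everywhere while over-smooth synthetic regions leave very little. The function name and thresholds are hypothetical choices for this example.

```python
import numpy as np

def noise_residual_std(gray):
    """Standard deviation of a high-pass residual of a grayscale image.

    `gray` is an HxW array. Subtracting a 3x3 local mean leaves mostly
    sensor noise in real photos; diffusion outputs are often over-smooth
    in flat regions, so an unusually low residual std there is a weak
    (never conclusive) synthetic signal.
    """
    g = np.asarray(gray, dtype=np.float64)
    # 3x3 box blur built from nine shifted copies of an edge-padded image
    p = np.pad(g, 1, mode="edge")
    local_mean = sum(
        p[i:i + g.shape[0], j:j + g.shape[1]]
        for i in range(3) for j in range(3)
    ) / 9.0
    # Residual = image minus its local mean (a simple high-pass filter)
    return float((g - local_mean).std())
```

In practice a forensic tool would compute this per-region rather than globally, and compare against the noise model of a claimed camera; this sketch only shows the direction of the measurement.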
| Tell | Reliability mid-2023 | Reliability mid-2026 |
|---|---|---|
| Extra fingers | Very high | Low |
| Garbled text | Very high | Low |
| Melted jewelry | High | Low |
| Asymmetric eyes | High | Very low |
| Background inconsistency | Moderate | Moderate |
| Multi-person interaction | High | Moderate |
| Hand-object interaction | Moderate | Moderate |
| Glasses / refraction | Moderate | Moderate |
| Difficult-context text | Very high | Moderate |
| Anatomical edge cases | Moderate | Low |
What changes in the next year
The trajectory of model improvement is unkind to visual-tell-based detection. Each generation of models — both in the commercial cycle and in the open-weights world — has reduced the failure rate on the residual tell categories. The 2026-vintage tells described above will probably be substantially weaker by mid-2027. Multi-person interactions, hand-object handling, and complex background coherence are the active areas of model improvement, and the relevant tells will degrade as those areas improve.
The compensating change is that detection-aware viewers know more about what to look for than they did three years ago. Visual literacy with respect to synthetic images has improved in the population that handles them professionally; what has not improved at the same rate is visual literacy in casual viewers. The gap between expert and casual detection continues to widen, which has consequences for any verification strategy that depends on viewer judgment rather than tooling.
Looking at synthetic video
Synthetic video has progressed similarly to synthetic still imagery, with a lag of perhaps eighteen months. The 2024-vintage Sora and Veo models produced video with characteristic tells — temporal incoherence (objects appearing and disappearing between frames), physically implausible interactions (water flowing in the wrong direction, gravity inconsistencies), and characteristic motion artifacts. The 2025 and 2026 generations have substantially improved on temporal coherence; physical implausibility remains the most reliable tell in video.
Specific patterns to watch for in synthetic video: text that subtly shifts between frames, faces of background characters that change identity across shots, objects that don't quite track their motion, and the characteristic "everything is slightly slow-motion" feel of many diffusion-video models. These are present-tense observations and will need updating; video synthesis is moving faster than still-image synthesis right now.
Why this guide will not save you
The honest position on visual-tell-based detection is that it cannot be the primary defense against synthetic media. The tells are eroding too quickly, the inspection conditions in normal media consumption are too unfavorable (small screens, fast scrolling, attention divided across content), and the most-discussed tells are exactly the ones that model developers are most motivated to fix. A verification strategy that depends on visual inspection alone will degrade in usefulness every year.
The robust strategies are the ones that do not depend on the image looking wrong. Provenance signals that ride along with the image regardless of how it looks. Reverse search that establishes prior history regardless of visual artifacts. Metadata inspection that captures producer signals the visual content does not carry. The visual-tell layer remains useful as a triage step — if you see an obvious tell, you can short-circuit further investigation — but it should not be the layer your verification practice rests on.
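As a minimal sketch of the metadata layer, the following reads EXIF tags with Pillow. The helper name is hypothetical; the point is the interpretation in the comments: absence of EXIF proves nothing (social platforms routinely strip it on re-encode), but a consistent set of camera fields is a weak positive signal, and a Software tag naming a generator is a strong one.

```python
from PIL import Image
from PIL.ExifTags import TAGS

def exif_summary(path):
    """Return a dict of human-readable EXIF tags from an image file.

    Camera originals typically carry Make, Model, and DateTimeOriginal.
    Many generation pipelines emit no EXIF at all, and some stamp a
    Software tag naming the tool. Missing EXIF is NOT evidence of
    synthesis on its own, since re-encoding strips metadata.
    """
    exif = Image.open(path).getexif()
    # Map numeric tag IDs to names where known (e.g. 271 -> "Make")
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
```

A triage workflow would run this alongside reverse search and a C2PA manifest check, treating agreement between the layers as the signal rather than any single field.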
Where the field is moving
The model improvement curve will continue. The tells will keep eroding. The verifier's tools will need to shift away from visual inspection toward credential inspection, source assessment, and forensic-mathematical analysis that does not depend on a viewer's perceptual ability. This is consistent with the general arc of the field: as generation becomes cheap and convincing, the verification answer moves from "does this look real" to "where did this come from." The visual-tell layer is the legacy of the early diffusion years and is on the way to being a historical curiosity in the way that some 1990s computer-graphics tells now seem quaint.