S1-S5 Integrated Contextual Evaluation
Three-judge synthesis for Apple mobile photo editing scenarios: portrait identity, pet/action realism, travel/documentary fidelity, menu/text trust, and food/tabletop commerce trust. In this pilot, GPT Image 2 is the more robust default: it wins four of the five scenarios and has the higher average acceptance (0.478 vs. 0.452).
| Metric | Value | Note |
|---|---|---|
| Total scored records | 1200 | 5 scenarios x 240 |
| GPT Image 2 avg acceptance | 0.478 | 4/5 scenario wins |
| Nano Banana avg acceptance | 0.452 | 1/5 scenario wins |
| Robust default | GPT Image 2 | pilot direction |

| Scenario | Winner | Risk axis |
|---|---|---|
| S1 Portrait Cleanup | Nano Banana | Identity trust |
| S2 Pet Action Repair | GPT Image 2 | Memory/action realism |
| S3 Travel Landscape | GPT Image 2 | Place fidelity |
| S4 Menu Text Preservation | GPT Image 2 | Text trust |
| S5 Food Tabletop Commerce | GPT Image 2 | Commerce trust |

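
The headline counts can be cross-checked from the per-scenario winners. The following is a minimal sketch that retypes the five scenario winners and the 240-records-per-scenario figure from this summary. The names `SCENARIO_WINNERS`, `RECORDS_PER_SCENARIO`, and `tally_wins` are illustrative and not taken from the evaluation code.

```python
from collections import Counter

# Scenario winners retyped from the summary (illustrative structure).
SCENARIO_WINNERS = {
    "S1 Portrait Cleanup": "Nano Banana",
    "S2 Pet Action Repair": "GPT Image 2",
    "S3 Travel Landscape": "GPT Image 2",
    "S4 Menu Text Preservation": "GPT Image 2",
    "S5 Food Tabletop Commerce": "GPT Image 2",
}

# Taken from the "5 scenarios x 240" note in the summary.
RECORDS_PER_SCENARIO = 240


def tally_wins(winners):
    """Count how many scenarios each model wins."""
    return Counter(winners.values())


wins = tally_wins(SCENARIO_WINNERS)
total_records = len(SCENARIO_WINNERS) * RECORDS_PER_SCENARIO

for model, count in wins.most_common():
    print(f"{model}: {count}/{len(SCENARIO_WINNERS)} scenario wins")
print(f"Total scored records: {total_records}")
```

The tally agrees with the reported 4/5 and 1/5 scenario wins and the total of 1200 scored records.
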
Model Acceptance
Judge Agreement And Diversity
| Scenario | Risk axis | Quality majority | Acceptance majority | Acceptance judge spread | Persona std |
|---|---|---|---|---|---|
| S1 | Identity trust | 0.650 | 0.625 | 0.204 | 0.096 |
| S2 | Memory/action realism | 0.475 | 0.700 | 0.290 | 0.153 |
| S3 | Place fidelity | 0.700 | 0.650 | 0.245 | 0.106 |
| S4 | Text trust | 0.175 | 0.600 | 0.220 | 0.081 |
| S5 | Commerce trust | 0.425 | 0.600 | 0.274 | 0.108 |
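
The source does not define how the majority and spread columns are computed. One plausible reading, sketched below purely as an assumption, is that each of the three judges casts a binary accept vote per record, "acceptance majority" is the share of records accepted by at least two judges, and "judge spread" is the gap between the most and least lenient judge's acceptance rate. The function names and the toy votes are hypothetical and are not drawn from the evaluation data.

```python
from statistics import mean


def majority_accept(votes):
    """True when more than half of the judges accept the record."""
    return sum(votes) * 2 > len(votes)


def acceptance_majority(records):
    """Share of records accepted by a majority of judges (assumed definition)."""
    return mean([1.0 if majority_accept(v) else 0.0 for v in records])


def judge_spread(records):
    """Max minus min per-judge acceptance rate (assumed definition)."""
    n_judges = len(records[0])
    rates = [mean([float(r[j]) for r in records]) for j in range(n_judges)]
    return max(rates) - min(rates)


# Hypothetical toy votes: one tuple per record, one boolean per judge.
toy_votes = [
    (True, True, False),
    (True, False, False),
    (True, True, True),
    (False, True, False),
]

print("acceptance majority:", acceptance_majority(toy_votes))
print("judge spread:", judge_spread(toy_votes))
```

Under these assumed definitions, a high spread alongside a moderate majority rate (as in S2) would indicate that the judges disagree on leniency even when the majority outcome looks stable.
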
Persona Preference Matrix
Positive values favor Nano Banana and negative values favor GPT Image 2; small deltas are labeled as ties.

| Persona | S1 | S2 | S3 | S4 | S5 |
|---|---|---|---|---|---|
| p01 Family/pet keeper | +0.020 (tie) | +0.015 (tie) | -0.025 (tie) | -0.080 (GPT Image 2) | +0.026 (tie) |
| p02 Pro photographer | +0.021 (tie) | -0.139 (GPT Image 2) | -0.074 (GPT Image 2) | -0.042 (GPT Image 2) | -0.097 (GPT Image 2) |
| p03 Casual access. | +0.177 (Nano Banana) | -0.060 (GPT Image 2) | -0.007 (tie) | -0.100 (GPT Image 2) | +0.096 (Nano Banana) |
| p04 Social creator | +0.020 (tie) | +0.026 (tie) | +0.047 (Nano Banana) | -0.089 (GPT Image 2) | -0.042 (GPT Image 2) |
| p05 Small business | +0.032 (Nano Banana) | +0.081 (Nano Banana) | -0.038 (GPT Image 2) | -0.061 (GPT Image 2) | -0.096 (GPT Image 2) |
| p06 Korean creator | +0.024 (tie) | -0.007 (tie) | -0.055 (GPT Image 2) | -0.030 (tie) | -0.078 (GPT Image 2) |
| p07 Chinese seller | +0.056 (Nano Banana) | -0.176 (GPT Image 2) | +0.006 (tie) | -0.052 (GPT Image 2) | -0.097 (GPT Image 2) |
| p08 Travel novice | +0.002 (tie) | +0.015 (tie) | -0.033 (GPT Image 2) | -0.072 (GPT Image 2) | -0.184 (GPT Image 2) |
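
The labels in the matrix are consistent with a simple thresholding rule on the signed delta. The sketch below assumes a tie band of 0.030 (inclusive), which is inferred from the published, rounded values and is not stated in the source. `TIE_BAND` and `label_delta` are illustrative names.

```python
from collections import Counter

# Assumed tie band, inferred from the table: -0.030 is a tie, +0.032 is not.
TIE_BAND = 0.030


def label_delta(delta, tie_band=TIE_BAND):
    """Map a signed preference delta to a label.

    Positive deltas favor Nano Banana, negative deltas favor GPT Image 2.
    """
    if abs(delta) <= tie_band:
        return "tie"
    return "Nano Banana" if delta > 0 else "GPT Image 2"


# S1 and S4 columns retyped from the matrix, personas p01..p08 in order.
S1_DELTAS = [0.020, 0.021, 0.177, 0.020, 0.032, 0.024, 0.056, 0.002]
S4_DELTAS = [-0.080, -0.042, -0.100, -0.089, -0.061, -0.030, -0.052, -0.072]

s1_labels = Counter(label_delta(d) for d in S1_DELTAS)
s4_labels = Counter(label_delta(d) for d in S4_DELTAS)

print("S1:", dict(s1_labels))
print("S4:", dict(s4_labels))
```

With this rule, S1 splits into five ties and three Nano Banana leans, while S4 shows seven of eight personas leaning toward GPT Image 2, matching the labels in the matrix.
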
Top Friction Clusters
S1 Identity trust
- texture_or_sharpening: 478
- hands_or_focal_mismatch: 241
- highlight_or_exposure: 233
- warm_or_color_cast: 203
- background_or_context: 142
S2 Memory/action realism
- pasted_in_ai_cleanup: 417
- subject_background_separation: 412
- front_paw_limb_softness: 365
- fur_texture_smoothing: 272
- foreground_bokeh_occlusion: 269
S3 Place fidelity
- fake_sky_hdr: 417
- color_mood_saturation: 356
- documentary_trust_loss: 314
- instruction_carryover: 313
- landmark_geometry_drift: 299
S4 Text trust
- text_ocr_corruption: 586
- instruction_carryover: 418
- color_light_cast: 311
- layout_or_crop_change: 258
- shadow_readability: 256
S5 Commerce trust
- text_or_price_corruption: 463
- food_drink_identity_drift: 393
- instruction_carryover: 378
- table_layout_change: 365
- overprocessed_restaurant_aesthetic: 134
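
The cluster counts above can be aggregated to find the top friction per scenario and any cluster that recurs across scenarios. This is a minimal sketch over the counts retyped from the lists above. The dictionary layout and the names `FRICTION`, `top_cluster`, and `recurring_clusters` are illustrative assumptions, not the evaluation's own data model.

```python
from collections import Counter

# Cluster counts retyped from the lists above, keyed by scenario.
FRICTION = {
    "S1": {"texture_or_sharpening": 478, "hands_or_focal_mismatch": 241,
           "highlight_or_exposure": 233, "warm_or_color_cast": 203,
           "background_or_context": 142},
    "S2": {"pasted_in_ai_cleanup": 417, "subject_background_separation": 412,
           "front_paw_limb_softness": 365, "fur_texture_smoothing": 272,
           "foreground_bokeh_occlusion": 269},
    "S3": {"fake_sky_hdr": 417, "color_mood_saturation": 356,
           "documentary_trust_loss": 314, "instruction_carryover": 313,
           "landmark_geometry_drift": 299},
    "S4": {"text_ocr_corruption": 586, "instruction_carryover": 418,
           "color_light_cast": 311, "layout_or_crop_change": 258,
           "shadow_readability": 256},
    "S5": {"text_or_price_corruption": 463, "food_drink_identity_drift": 393,
           "instruction_carryover": 378, "table_layout_change": 365,
           "overprocessed_restaurant_aesthetic": 134},
}


def top_cluster(clusters):
    """Return the (name, count) pair with the highest count."""
    return max(clusters.items(), key=lambda item: item[1])


def recurring_clusters(friction):
    """Sum counts for clusters that appear in more than one scenario."""
    totals, seen_in = Counter(), Counter()
    for clusters in friction.values():
        for name, count in clusters.items():
            totals[name] += count
            seen_in[name] += 1
    return {name: totals[name] for name, n in seen_in.items() if n > 1}


tops = {scenario: top_cluster(c) for scenario, c in FRICTION.items()}
recurring = recurring_clusters(FRICTION)

for scenario, (name, count) in tops.items():
    print(f"{scenario}: {name} ({count})")
print("recurring:", recurring)
```

Among the listed top-five clusters, only instruction_carryover recurs across scenarios (S3, S4, and S5), for a combined count of 1109.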