Some Thoughts: AI Retouch

An AI-generated image by user Rosykk on Reddit: What a wonderful nature - Disco Diffusion V5 ( + VanceAI )

Currently, AI-based images are often generated from noise or from a reference image, with a text phrase (the prompt) describing what should be visible in the image.

Sometimes the positioning of objects in AI-generated images and videos seems a bit arbitrary. That may be beautiful, and it may be reminiscent of works by M. C. Escher, but I wonder whether anybody has tried AI dreaming not with a single seed image but together with a semantic segmentation/labeling video tracker that runs before the actual AI processing:

An example of semantic segmentation/labeling by the Computer Vision Annotation Tool (CVAT)

I currently have no time to dig deeper into this topic, but I think the following algorithm could work:

  1. Segment and label video stream images
    • Find pixel areas corresponding to certain objects or object parts in your images
    • For example people, dogs, cars and parts thereof: heads, feet, tires, perhaps even eyes, noses …
  2. Optionally replace or refine the labels of your tracked areas to your liking
    • Add moods etc.
  3. Let the AI create new content for (some of) the found objects
    • Use the new labels as prompts and take into account:
      • The original image
      • Previously created AI images
      • The surrounding pixel areas
      • Object velocities/optical flow
An example of generating content from a segmented image: NVIDIA Canvas
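
The three steps above could be sketched roughly as follows. This is only a minimal, dependency-free illustration of the data flow, not a working AI pipeline: the label map is assumed to come from some segmentation tracker, all names (find_regions, relabel, regenerate, dummy_generate) are hypothetical, and dummy_generate merely stands in for a real image generator. Images are plain nested lists here; a real implementation would most likely use arrays.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B)
Coord = Tuple[int, int]               # (row, col)
Image = List[List[Pixel]]
LabelMap = List[List[str]]            # one label per pixel, e.g. "head"


@dataclass
class Region:
    label: str
    pixels: List[Coord] = field(default_factory=list)


def find_regions(labels: LabelMap) -> Dict[str, Region]:
    """Step 1: group pixel coordinates by their segmentation label."""
    regions: Dict[str, Region] = {}
    for r, row in enumerate(labels):
        for c, label in enumerate(row):
            regions.setdefault(label, Region(label)).pixels.append((r, c))
    return regions


def relabel(regions: Dict[str, Region], prompt_for: Dict[str, str]) -> Dict[str, str]:
    """Step 2: attach a prompt to some labels; labels without a prompt stay untouched."""
    return {label: prompt_for[label] for label in regions if label in prompt_for}


# Hypothetical generator interface: prompt, original frame, region and
# (optionally) the previously generated frame go in, new pixels come out.
Generator = Callable[[str, Image, Region, Optional[Image]], Dict[Coord, Pixel]]


def regenerate(frame: Image, regions: Dict[str, Region], prompts: Dict[str, str],
               generate: Generator, previous: Optional[Image] = None) -> Image:
    """Step 3: replace only the pixels of regions that have a prompt."""
    out = [list(row) for row in frame]            # copy, keep the input intact
    for label, prompt in prompts.items():
        region = regions[label]
        new_pixels = generate(prompt, frame, region, previous)
        for r, c in region.pixels:
            out[r][c] = new_pixels[(r, c)]
    return out


def dummy_generate(prompt: str, frame: Image, region: Region,
                   previous: Optional[Image]) -> Dict[Coord, Pixel]:
    """Placeholder for a real generator: fills the region with a gray shade
    derived from the prompt text."""
    shade = sum(prompt.encode()) % 256
    return {coord: (shade, shade, shade) for coord in region.pixels}


# Tiny 3x3 example frame with a hand-written label map.
labels = [["sky", "sky", "sky"],
          ["sky", "head", "head"],
          ["grass", "grass", "grass"]]
frame = [[(10, 20, 30)] * 3 for _ in range(3)]

regions = find_regions(labels)
prompts = relabel(regions, {"head": "a smiling marble statue head"})
result = regenerate(frame, regions, prompts, dummy_generate)
```

Only the pixels labeled "head" change in result; sky and grass are copied from the original frame. For a video, the result of one frame would be passed as previous when processing the next frame.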

I think this would give you much better control over where and how the AI dreams, especially since the AI would know where a head, a leg, or a foot is. It could also move the generated objects in step with the video stream.
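
To illustrate the idea of moving generated content along with the video, here is a very simplified sketch. It assumes one integer velocity per tracked object (in rows and columns per frame), whereas real optical flow is dense and sub-pixel; the function name advect_region and the data layout are hypothetical.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Coord = Tuple[int, int]        # (row, col)


def advect_region(content: Dict[Coord, Pixel], velocity: Coord,
                  height: int, width: int) -> Dict[Coord, Pixel]:
    """Shift the generated pixels of one object by its velocity.

    Pixels that would leave the frame are dropped.
    """
    dr, dc = velocity
    moved: Dict[Coord, Pixel] = {}
    for (r, c), pixel in content.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < height and 0 <= nc < width:
            moved[(nr, nc)] = pixel
    return moved


# A two-pixel "head" moving one column to the right in a 3x3 frame.
head = {(1, 1): (200, 180, 160), (1, 2): (190, 170, 150)}
next_head = advect_region(head, velocity=(0, 1), height=3, width=3)
```

The shifted pixels could then serve as the starting point (or as the "previously created AI image") when the generator refines the object in the next frame.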

Possibly useful links in this respect as of April 2022: