Currently, AI-based images are often generated from noise or from a reference image, guided by a phrase (prompt) that describes what should be visible in the image.
Sometimes the positioning of objects in such AI videos seems a bit arbitrary. Beautiful as that may be (and it may be reminiscent of works by M. C. Escher), I wonder whether anybody has tried AI dreaming not with a single seed image, but combined with a semantic segmentation/labeling video tracker that runs before the actual AI processing:
I currently have no time to go deeper into this topic, but I think the following algorithm could work:
- Segment and label the video stream images.
- Find the pixel areas corresponding to certain objects or object parts in your images:
  - for example people, dogs, cars and parts thereof: heads, feet, tires, perhaps even eyes, noses …
- Possibly exchange or change/refine the labels of your tracked areas to your liking:
  - add moods etc.
- Let the AI create new content for (some of) the found objects (a sketch follows after this list):
  - using the new labels as prompts and taking into account
    - the original image,
    - previously created AI images,
    - the surrounding pixel areas,
    - object velocities/optical flow.
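A minimal per-frame sketch of the steps above, assuming a pretrained Mask R-CNN from torchvision for the segmentation/labeling and a Stable Diffusion inpainting pipeline from Hugging Face diffusers for the generation step. The model names, COCO class ids, prompt texts, frame size, and score threshold are illustrative assumptions, not part of the proposal:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_pil_image, to_tensor
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Steps 1-2: a pretrained Mask R-CNN finds pixel areas for COCO classes
# (person, dog, car, ...) and labels them.
segmenter = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
segmenter.eval()

# Step 3: refined labels plus a mood, keyed by COCO class id
# (1 = person, 18 = dog); purely illustrative choices.
PROMPT_MAP = {
    1: "a smiling person, dreamy watercolor style",
    18: "a friendly dog, dreamy watercolor style",
}

# Step 4: an inpainting model regenerates only the masked region, so the
# surrounding pixel areas automatically condition the new content.
inpainter = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
).to("cuda")

def dream_frame(frame: Image.Image) -> Image.Image:
    """Redream the tracked objects of a single video frame."""
    frame = frame.convert("RGB").resize((512, 512))  # keep image/mask sizes consistent
    with torch.no_grad():
        pred = segmenter([to_tensor(frame)])[0]
    out = frame
    for mask, label, score in zip(pred["masks"], pred["labels"], pred["scores"]):
        prompt = PROMPT_MAP.get(int(label))
        if prompt is None or float(score) < 0.8:
            continue  # only redream objects we explicitly relabeled
        mask_img = to_pil_image((mask[0] > 0.5).to(torch.uint8) * 255)
        out = inpainter(prompt=prompt, image=out, mask_image=mask_img).images[0]
    return out
```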
I think this way you could control much better where and how the AI dreams, especially since you enable it to know where a head is and where a leg or a foot is, and it could move the objects in lockstep with the video stream, too (a sketch of that motion step follows below).
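As a hedged sketch of the "object velocities/optical flow" point, one could use OpenCV's Farnebäck optical flow to warp the previously dreamed frame along the motion of the source video; the function name and flow parameters below are illustrative defaults:

```python
import cv2
import numpy as np

def warp_previous_dream(prev_src, cur_src, prev_dream):
    """Warp the previously dreamed frame so it follows the source motion.

    All arguments are H x W x 3 uint8 arrays (BGR, as read by OpenCV).
    """
    g_prev = cv2.cvtColor(prev_src, cv2.COLOR_BGR2GRAY)
    g_cur = cv2.cvtColor(cur_src, cv2.COLOR_BGR2GRAY)
    # Flow from the current frame back to the previous one: for each current
    # pixel it tells us where that pixel came from in the previous frame.
    flow = cv2.calcOpticalFlowFarneback(g_cur, g_prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g_cur.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Backward warping: sample the previous dream at those source positions.
    return cv2.remap(prev_dream, xs + flow[..., 0], ys + flow[..., 1],
                     cv2.INTER_LINEAR)
```

The warped result could then be fed back into the generator as the "previously created AI image", which should keep the dreamed content attached to the moving objects instead of flickering from frame to frame.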
Possibly useful links in this respect as of April 2022: