Meeting I: Brainstorming

Team Formation

We are a team of four (Erik, Claire, Lucky, and Wei) with a balance of sketching, programming, and other design-related skills. From the initial meeting, we quickly agreed that our expectation is not necessarily to be the best, but we definitely want a polished animation project by the end. We also laid down the principle that our project should be easy to implement while still letting us explore different techniques to improve the final result.

Low-poly Floating Building
An example render of the low-poly style. Image source: Behance, via https://www.pinterest.ca/pin/336855247109560667/

Quickly, we decided that low-poly is the style we want to go with. Our rationale is that a realistic scene may look good but lacks variety: the same scene done by team A will look similar to team B's, since both results should essentially be as close to real life as possible, which can be dull if it is a daily scenario. Meanwhile, low-poly offers a completely different look while still allowing us to explore conventional techniques.

Initial Ideas

We brainstormed a few directions we might take. One central guideline is to avoid complicated scenarios so we can focus on storytelling. One strategy is to make the main character a creature that floats or moves mechanically like a machine, so the movement animation can be simplified. The ideas we generated are as follows:

Intense Fishing

The story starts with an overview of a lake scene. Two characters are introduced, and the camera work leads the viewer to think this is a fighting ground. At the climax, instead of starting to fight, the two characters actually sit down and start fishing. The intensity of the scene drops dramatically toward the end to surprise the audience.

Alien Kidnap

The story starts with a fairly small setup inside an ordinary room. The main character is sleeping; then aliens move in through the window and kidnap the character. At the end, after the alien experiment is done, the character is sent back and put to sleep again.

Snowman

An alien visits Earth and finds a snowman near a small remote town. The alien starts exploring and interacting with the snowman. Eventually, the alien is startled by falling snow and accidentally shoots the snowman, smashing it.

Conclusion

We liked the fishing idea for the surprise it introduces. However, fishing requires rendering water and complicated movement, which could be challenging for us. We also liked the alien kidnap idea; its main advantage is that the story happens within one small room, and the alien will probably float rather than walk, which is easier for us to model. Eventually, we selected the snowman idea, since it keeps the advantages of the alien story while the snowman is easier to handle. We want to start the project with a relatively simple setup and gradually challenge ourselves as we move on.

Getting Started

Future System

Steve revisited our current development with a few samples and also ran a few of the tools we use to generate art and videos. Eventually, a clearer picture emerged: in the future, our program should generate paintings in real time from captured video while responding to the user's emotion. A few issues were raised:

  • Our current advantage in this field is generating art based on parameters, which means we should be able to alter the video results based on the input. This is not implemented yet, but it is possible.
  • For a real-time application, performance is a key issue. Our current videos are not generated in real time, but we do optimize by generating one result every three frames. Users gave feedback that art displayed at this rate is easier to take in than one result per frame, since they have more time to experience each work.
  • Our current results lack temporal coherence. Users experience flickering and discontinuity of texture on moving objects during the video. This issue might be resolved by generating based on optical flow; a rough sketch of the idea follows below. An interesting video can be found here:
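
The optical-flow idea above is not something I have implemented, but a minimal sketch of how it could work is below, assuming OpenCV and NumPy; the function and variable names are placeholders of my own, not part of our actual pipeline.

import cv2
import numpy as np

def warp_previous_result(prev_styled, prev_gray, curr_gray):
    # Dense flow from the current frame back to the previous one, so sampling
    # the previous stylized frame along it lands in current-frame coordinates.
    flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = curr_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_styled, map_x, map_y, cv2.INTER_LINEAR)

def temporally_blend(curr_styled, warped_prev, alpha=0.6):
    # Mix the fresh stylization with the warped previous result to suppress flicker.
    return cv2.addWeighted(curr_styled, alpha, warped_prev, 1.0 - alpha, 0)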

Work to Do

Compared to optical flow and temporal coherence, I am more familiar with the solution for making our current video-generating program respond to parameter changes. The task from now on is to add an API so the program can read parameters from a text file and output results accordingly.

Our current program generates frames from a video in a for-loop by calling a DeepDream routine:

DeepDream(Layer, Guide, Iteration, Octave);

The layer parameter controls which layer of the neural network is used to paint on the canvas, i.e., roughly what the final pattern looks like. The guide controls which portion of the nodes within that layer is prioritized. Iteration controls how heavily the output is altered: the more iterations, the more the final result deviates from the original input. The octave controls the size of the style patch; a larger octave generates a larger patch.
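
As a starting point for the API, here is a minimal sketch of how the parameters could be read from a text file; the file format (one line per keyframe: frame, layer, guide, iteration, octave) and all the names are my own assumptions rather than the final design.

def read_parameters(path):
    # Each non-empty, non-comment line is assumed to be:
    # frame_index layer guide iteration octave
    keyframes = {}
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue
            frame, layer, guide, iteration, octave = line.split()
            keyframes[int(frame)] = {
                "layer": layer,
                "guide": guide,
                "iteration": int(iteration),
                "octave": int(octave),
            }
    return keyframes

# Hypothetical use inside the existing frame loop:
# params = read_parameters("params.txt")
# p = params.get(frame_index, default_params)
# DeepDream(p["layer"], p["guide"], p["iteration"], p["octave"])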

First Step

Our program currently iterates through all frames with the same hardcoded set of parameters. The first step is updating iteration and octave based on keyframe interpolation. For example, if the user assigns iteration 10 at frame 1 and 20 at frame 5, the program must interpolate in between: at frame 3, the iteration will be 15 if the interpolation is linear. The interpolation can be linear, cubic, etc., based on the user's choice. One issue here is that the interpolation may produce non-integer iteration or octave values, which must be rounded. This may cause the final frames to jump from one value to another and lack smoothness.
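
A short sketch of the linear case, with keyframes given as a {frame: value} dictionary; the rounding at the end is exactly where the jumpiness mentioned above comes from.

def interpolate_linear(keyframes, frame):
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    for f0, f1 in zip(frames, frames[1:]):
        if f0 <= frame <= f1:
            # Linear blend between the two surrounding keyframe values.
            t = (frame - f0) / (f1 - f0)
            value = keyframes[f0] + t * (keyframes[f1] - keyframes[f0])
            return round(value)  # DeepDream expects an integer count

# The example from the text: iteration 10 at frame 1, 20 at frame 5
# interpolate_linear({1: 10, 5: 20}, 3)  ->  15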

Second Step

The layer and guide should be updated dynamically in a similar way. One key difference is that the generated results must be blended, since the change will be dramatic. Steve proposed a few simple alpha blending solutions, but I feel we can use more advanced blending based on the total pixel difference, which was covered in my previous computer graphics courses. This part is more challenging but also more rewarding. If it can be solved properly, it can also help with the non-integer parameter issue in step one, and perhaps even resolve the flickering issue mentioned at the beginning.
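
To make the idea concrete, here is a rough sketch of both options, assuming NumPy image arrays of the same shape; the difference-driven weight is only my reading of the total-pixel-difference approach, not something we have implemented.

import numpy as np

def alpha_blend(old_result, new_result, alpha):
    # Plain alpha blend: alpha = 0 keeps the old result, alpha = 1 the new one.
    return (1.0 - alpha) * old_result + alpha * new_result

def difference_driven_alpha(old_result, new_result, scale=0.5):
    # The larger the total pixel difference between the two results,
    # the more gradually the new result should be introduced.
    diff = np.mean(np.abs(new_result.astype(float) - old_result.astype(float))) / 255.0
    return float(np.clip(1.0 - scale * diff, 0.0, 1.0))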

Overall I am quite happy that I had a very concrete discussion with Steve today and sorted out the upcoming tasks. Steve not only conveyed his idea but also talked through some very detailed implementation. I think in the future, only the high-level requirement will be necessary, just to save some effort. Plus, the alpha blending is a real relief for me, since the expectation for our next implementation is not that high. Multimedia and image processing really laid a great foundation for the language we use here in computer graphics today.

First Meeting

Journey Start

My directed study was approved recently. The topic is teaching an A.I. to mimic classical painters such as Van Gogh, Monet, etc. We briefly discussed the current research of the SFU iVizLab on this topic and looked at some of the generated results, which are quite different from what I previously studied.

Deep Learning

Deep Learning

In our research, deep learning is applied extensively. It performs decision-making through multiple layers of nodes. One example based on the figure above could be "Identify Animals". The first set of four nodes on the left, which is the first layer, can ask some very primitive questions; each answer to a question travels along an edge between a first-layer node and a second-layer node. The last layer gives the final answer, i.e., which animal the image represents. The weight of each edge can vary to decide how much the answer to the previous question counts. For example, the questions can be as follows:

  • First Layer
    • The image has color?
    • The image has clear intensity change?
    • The image contains limited noise?
    • The image is a reasonable size?
  • Second Layer
    • The image contains liquid structure like water?
    • The image contains column structure like legs?
    • The image subject is small?
    • The image subject has fur texture?
  • Third Layer
    • The image contains many leg structures?
    • The fur texture in the image shows what pattern?
    • The image contains eye structure?
    • The image contains nose structure?
  • Fourth Layer
    • The image is a portrait of a fish
    • The image is a portrait of an elephant
    • The image is a portrait of a dog
    • The image is a portrait of an insect

In this case, the A.I. can answer these questions layer by layer. A positive answer that the image has no legs and the scene involves water may lead to the fish answer, while an image with a large, four-legged subject may lead to the elephant answer. Once the A.I. is trained, it can be asked to create a new creature: a fish-like new creature is much more likely to live underwater and have no legs, based on the training experience. Applying this analogy to painting, to paint like Van Gogh, the A.I. is much more likely to select the strokes and style that appear in Van Gogh's paintings. This is different from synthesizing one image from another, since the A.I. uses its trained knowledge to create new things rather than patching pieces from old ones.
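
A toy sketch of this layered decision-making is below, using NumPy; the four-node layers and animal labels mirror the example above, while the weights are random placeholders rather than trained values.

import numpy as np

rng = np.random.default_rng(0)

# Three weight matrices connect the four layers of four nodes each.
weights = [rng.standard_normal((4, 4)) for _ in range(3)]
labels = ["fish", "elephant", "dog", "insect"]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(first_layer_answers):
    # first_layer_answers: four numbers in [0, 1], e.g. "has colour?" = 0.9
    activation = np.asarray(first_layer_answers, dtype=float)
    for w in weights:
        activation = sigmoid(w @ activation)  # weighted edges into the next layer
    return labels[int(np.argmax(activation))]

# classify([0.9, 0.8, 0.7, 1.0]) returns one of the four animal labels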

Source Input Quality Matters

When a human artist paints a portrait, the painting mixes 1) the actual appearance of the subject and 2) the artist's interpretation of the subject. However, since the A.I. is less capable of understanding the subject, it generally treats everything equally. Thus, flaws in the scene, such as a lack of ideal lighting, will be captured by the A.I., whereas a human artist would automatically correct them when painting.

Segmentation of Subject from Background

Similar to the previous section, the A.I. tends to treat everything in the scene equally. Thus, the current A.I. can generate high-quality art, but the subject and the background are both painted artistically. In an ideal setup, the subject should receive more focus than the background. My reflection on this is that a depth map could be generated from the scene based on perspective projection, and the subject could then be separated using it. However, we currently use a different approach in iVizLab, which I will cover later.
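
To illustrate my reflection, here is a minimal sketch that splits a frame into subject and background using a depth map; it assumes a per-pixel depth map is already available (for example from a depth sensor or a monocular depth estimator), and the threshold is an arbitrary placeholder. This is my own note, not the approach used in iVizLab.

import numpy as np

def split_subject(image, depth, near_threshold=0.4):
    # Normalize depth to [0, 1] and treat near pixels as the subject.
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    mask = (depth < near_threshold)[..., None]
    subject = np.where(mask, image, 0)
    background = np.where(mask, 0, image)
    return subject, background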

Where the Human Is Looking

To resolve the difficulty the A.I. has in separating the subject from the background, one approach is to focus on the area where human attention usually falls. In this case, the A.I. is trained on special data that allows it to predict, if a scene is shown to a human, where that human will look most. In general, the place human attention falls on is usually the subject or the area the artist wants to stress. This experiment is done with the help of an eye-tracking system.

Not Enough Training Data

One special thing about this research field is that an artist usually leaves only a small number of works over his or her life, so an A.I. cannot be trained efficiently with such limited examples. One approach we introduce to resolve this problem is training the A.I. on patches from a painting rather than on the entire painting. In this way, one painting can generate hundreds of 256×256 patches, or more if randomness is introduced, and the A.I. can be sufficiently trained. However, one problem is that if the training is based on patches, the A.I. tends to paint the scene less in the overall style of the artist, which leads to other problems. My reflection is that since we are essentially teaching the A.I. how to mimic a painting style rather than create a new one, why not interview people who are good at faking other artists' paintings?
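
A quick sketch of the patch idea, assuming the painting is a NumPy array of shape (height, width, channels); the patch count and random seed here are arbitrary choices.

import numpy as np

def random_patches(painting, num_patches=300, size=256, seed=0):
    # Cut random size x size crops out of a single painting to enlarge the training set.
    rng = np.random.default_rng(seed)
    h, w = painting.shape[:2]
    patches = []
    for _ in range(num_patches):
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        patches.append(painting[y:y + size, x:x + size].copy())
    return patches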