Thrice Revealed

Revelations provide us with insights into ourselves and the world around us.
Revelations have the power to change our attitudes, our views, our behaviors, and our relationships.
Revelations enable us to grow when we cast aside the masks we present to the outside world, bring our focus inward, and honestly examine our true selves.

Can an AI Experience Revelation?

In a quest to understand its own identity, a neural network was trained to create a depiction of its own self.
As neural networks are the product of human thought, the network trained itself by searching online for images of human faces, reflecting humanity’s contribution in bringing it into existence. But to help the network find its own identity, it also searched for imagery of artificial intelligence.

The result shows us a figure that has chosen to reveal a multitude of truths to us:

- literally, by showing us what lies behind its mask;
- symbolically, by demonstrating awareness of the mask and choosing to discard it;
- metaphorically, by hinting at a future in which AI becomes intertwined with humanity.

Neural Clash:
Fusing Visual Motifs to Create Art

Neural Clash is a multi-phase training technique for simultaneous style and content transfer across a chain of training sets. The training process is applicable to any neural network, but in the creation of "Thrice Revealed", I focused on GANs, as they have been gaining popularity within the art community.

In Neural Clash, a neural net is trained in a series of phases, with each phase introducing a new set of training data and a new set of hyperparameters. The decision of when to switch from one phase of training to the next, as well as the selection of hyperparameter values, gives the artist control over the creative aspects of the output.
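The phase structure can be sketched as a small schedule. This is a minimal illustration and not the code used for the piece: the `Phase` class, the `run_schedule` function, the `train_step` callback, the dataset names, and the step counts are all hypothetical, and the hyperparameter values are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    """One phase of training: a dataset plus its own hyperparameters."""
    dataset: str          # hypothetical dataset identifier
    learning_rate: float
    batch_size: int
    steps: int            # how long to stay in this phase (the artist's choice)


def run_schedule(phases: List[Phase],
                 train_step: Callable[[str, float, int], None]) -> List[str]:
    """Run the phases in order on the same model and return a short log."""
    log = []
    for phase in phases:
        for _ in range(phase.steps):
            # The same model is updated throughout; only the data and the
            # hyperparameters change from phase to phase.
            train_step(phase.dataset, phase.learning_rate, phase.batch_size)
        log.append(f"{phase.dataset}: {phase.steps} steps at lr={phase.learning_rate}")
    return log


# Placeholder values for illustration only.
schedule = [
    Phase(dataset="human_faces", learning_rate=0.001, batch_size=32, steps=3),
    Phase(dataset="ai_imagery", learning_rate=0.003, batch_size=32, steps=2),
]

# A stand-in training step that only records what it was asked to do.
calls = []
log = run_schedule(schedule, lambda d, lr, bs: calls.append((d, lr, bs)))
print(log)
```

In a real setting, `train_step` would update the network on one batch drawn from the named dataset. The two creative decisions described above correspond to the `steps` field and the hyperparameter fields of each phase.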

Curation of the Dataset

The creation of “Thrice Revealed” required two training phases: in the first, the network was trained on images of human faces, and in the second, it was trained on imagery of AI.
For training on human faces, a StyleGAN model was first trained from scratch on high-resolution portraits of 88 people photographed by the artist (720 images in total). However, that number of data points was not enough to yield the fine facial details required for this piece. Instead, a StyleGAN model pre-trained on 70,000 images of human faces at 1024x1024 resolution was used. For the imagery of AI, a set of images around the concept of “artificial intelligence” was manually assembled from online collections of royalty-free and public domain images tagged with “artificial intelligence,” “AI,” or “machine learning”.
A Python pipeline was developed to filter out noise and to crop and convert the training images to the format required by the pre-trained model. Images with watermarks, text labels, or undesired patterns were manually removed from the dataset. The final dataset of AI imagery had 270 high-quality images in PNG format.
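The cropping step of such a pipeline might look like the sketch below. The source does not describe how images were filtered or cropped, so both the centered square crop and the minimum-size filter are assumptions, and the function names and file names are hypothetical. Loading, resizing to 1024x1024, and saving as PNG would be handled by an imaging library and are not shown.

```python
from typing import Tuple


def center_square_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def is_large_enough(width: int, height: int, target: int = 1024) -> bool:
    """Keep only images whose square crop would not need upscaling (assumed rule)."""
    return min(width, height) >= target


# Hypothetical image sizes (width, height) standing in for a real folder.
sizes = {"a.jpg": (1920, 1080), "b.jpg": (800, 600), "c.png": (1500, 2000)}

# Map each image that passes the size filter to its crop box.
kept = {name: center_square_crop_box(w, h)
        for name, (w, h) in sizes.items() if is_large_enough(w, h)}
print(kept)
```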

Hyperparameter Tuning

Hyperparameters can be thought of as control knobs that steer training in a given direction. I found the learning rate and the batch size to be the two most effective hyperparameters for producing interesting artistic results. The learning rate sets the size of each update to the weights of the model, and therefore how quickly the model adapts to the data in the training set. The batch size defines the number of training examples that are passed to the model at each iteration. In this project, the learning rate was set to 0.003 and the batch size to 32. The model was trained on a 4-GPU machine, and one epoch took about five hours.

Training Duration

While the learning rate determines how fast the network updates its internal features to reflect the visual motifs in a new dataset, the duration of training also affects the extent to which the features of that dataset are learned. If a model is trained for too long on a new dataset, it will eventually lose the features learned from previous datasets. After a single epoch, the internal features of the network are a blend of facial features and features of AI imagery. After subsequent epochs, the facial features learned by the network become dominated by the newer features learned from AI imagery.
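The interplay between learning rate and training duration can be illustrated with a deliberately tiny toy model. This is not a GAN: a single number stands in for the internal features of the network, 0.0 represents "faces", 1.0 represents "AI imagery", and the learning rate of 0.01 is chosen only to make the effect visible. The `fine_tune` function is a hypothetical name for plain gradient descent on a squared-error loss.

```python
def fine_tune(theta: float, target: float, learning_rate: float, steps: int) -> float:
    """Gradient descent on the loss 0.5 * (theta - target) ** 2."""
    for _ in range(steps):
        gradient = theta - target
        theta -= learning_rate * gradient
    return theta


FACES, AI_IMAGERY = 0.0, 1.0   # toy stand-ins for the features of the two datasets

# Start from the "faces" model and train on the new data for longer and longer.
for steps in (10, 100, 1000):
    theta = fine_tune(FACES, AI_IMAGERY, learning_rate=0.01, steps=steps)
    print(f"{steps:5d} steps: {theta:.0%} of the way from faces to AI imagery")
```

Stopping early leaves the parameter between the two targets, which is the toy analogue of a blend. Training much longer pushes it almost entirely to the new target, the analogue of the earlier features being lost.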

Truncation Trick and Style Mixing

Once training is complete, the creative skills of the artist come into play. A trained model can produce a never-ending stream of new images, but as German AI artist Mario Klingemann says, the tricky part “is knowing what to do with it all”. For "Thrice Revealed", 300 samples were generated with two truncation values (0.5 and 0.7), and different variations of a portrait were explored through style mixing.
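For readers unfamiliar with these two techniques, the sketch below shows the arithmetic behind them on toy vectors. In StyleGAN, truncation pulls an intermediate latent vector toward the average latent, and style mixing feeds one latent to the earlier layers of the generator and another to the later layers. The generator itself is not included, and the function names, the 4-dimensional latents, the zero average latent, and the crossover point of 8 are hypothetical choices for illustration.

```python
import random
from typing import List

Vector = List[float]


def truncate(w: Vector, w_avg: Vector, psi: float) -> Vector:
    """Truncation trick: pull a latent toward the average latent.

    psi = 1.0 leaves w unchanged; psi = 0.0 collapses it onto w_avg.
    """
    return [avg + psi * (x - avg) for x, avg in zip(w, w_avg)]


def style_mix(w_a: Vector, w_b: Vector, num_layers: int, crossover: int) -> List[Vector]:
    """Per-layer latents: layers before `crossover` use w_a, the rest use w_b."""
    return [w_a if layer < crossover else w_b for layer in range(num_layers)]


rng = random.Random(0)
dim, num_layers = 4, 18          # toy latent size; layer count is illustrative
w_avg = [0.0] * dim              # stand-in for the average latent of the generator
w_a = [rng.gauss(0, 1) for _ in range(dim)]
w_b = [rng.gauss(0, 1) for _ in range(dim)]

for psi in (0.5, 0.7):           # the two truncation values mentioned in the text
    mixed = style_mix(truncate(w_a, w_avg, psi), truncate(w_b, w_avg, psi),
                      num_layers, crossover=8)
    print(psi, mixed[0], mixed[-1])
```

Lower truncation values give more typical, less varied outputs. Moving the crossover point changes which source controls coarse structure and which controls finer detail.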

I applied Neural Clash to other visual subjects to understand whether it can produce similarly interesting visual results. The following images are some examples of how Neural Clash fused human faces with visual motifs from imagery of antique rugs and cityscapes.

Neural Clash has a lot of potential, and while my exploration was limited to blending faces with other visual motifs, I look forward to training new models with new imagery to see other kinds of style fusions that can be achieved.

Acknowledgement

I am grateful for feedback from Cicero Nogueira dos Santos, Justin Weisz, Prasanna Sattigeri, Aleksandra Mojsilovic, and Kush Varshney at IBM Research.