Deep Style Networks:
Famous Painters Make Concept Art for Quake

Wouldn’t it have been great to hire Pablo Picasso, Gustav Klimt, Suzuki Harunobu, Camille Corot or Jackson Pollock as art consultants for a game? Now, thanks to deep neural networks, you can borrow some of their skills to help with things like Concept Art.

Less than two weeks ago, a new research paper was published, entitled “A Neural Algorithm of Artistic Style”, which lets you transfer aspects of one image to another. Generally speaking, it works by using a neural network to find patterns in the images at multiple levels (e.g. grain, texture, strokes, elements, composition), then running an optimization process to generate a new image from scratch. Using this technique, it’s possible to combine “style” (as understood by the neural network) with “content” from another image. Neural networks have become very good at recognizing patterns in images, so the results are higher quality than you might expect.
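
For readers who want the gist of the paper in one formula: the generated image x is found by minimizing a weighted sum of two terms,

    L_total(x) = alpha * L_content(x, photo) + beta * L_style(x, painting)

where the content term compares deep-layer features of x against the source image (here a Quake screenshot), the style term compares feature correlations against the painting, and the ratio alpha/beta controls how strongly content wins over style.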

The rest of this article shows how screenshots from Quake I were combined with paintings from famous artists, along with an analysis and answers to common questions at the bottom. Why Quake? Because it’s (in)famous for its brownish colors, and maybe a little creative input could help improve that without going too far outside the color limits of the time… Let’s see how it turned out!

Frequently Asked Questions

  1. How did you pick the reference paintings?
  2. What’s the implementation you used to generate these images?
  3. Can this be made into a real-time effect via a shader?
  4. How did you adjust the parameters for the algorithm?
  5. This looks pretty great! Can I do it myself?
  6. These look like a terrible Photoshop filter. What’s going on?
  7. How is this going to affect the artistic process in the future?

Free Course In Autumn!

NOTE: Join our 10-week course about Artificial Intelligence and Machine Learning techniques applied interactively, in particular for games. It’s free if you sign up now! Click for details →

Table Of Contents

Concept #1

Original Quake

Jackson Pollock

Charles Bargue

Pablo Picasso

Camille Corot

Gustav Klimt

Suzuki Harunobu

Concept #2

Original Quake

Jackson Pollock

Charles Bargue

Pablo Picasso

Camille Corot

Gustav Klimt

Suzuki Harunobu

Concept #3

Original Quake

Jackson Pollock

Charles Bargue

Pablo Picasso

Camille Corot

Gustav Klimt

Suzuki Harunobu

Frequently Asked Questions

Q: How did you pick the reference paintings?

A: Given a specific painter name, their paintings were chosen automatically by a piece of code written for a bot called @DeepForger that applies a “style” to photos you submit by Twitter. The code searches through a large collection of paintings and matches them, based on similarity (using various metrics), against the submitted image. Thanks to this code, there was no iteration on the painting selection; the first hit was the one I chose!

Finding the right paintings is very important for getting good results from the original algorithm. If you browse the bot’s replies on Twitter, you’ll see that images with extreme mismatches don’t always work out without a few iterations and plenty of parameter tweaks.

Q: What’s the implementation you used to generate these images?

A: Even though the algorithm was only announced a few weeks ago, there are already multiple implementations of the algorithm: the first available, the most used, and one in Python (no doubt more). All of these are powered by CUDA for additional performance, though a CPU-based fallback is available when there’s not enough GPU memory (e.g. for high-resolution rendering).

Q: Can this be made into a real-time effect via a shader?

A: The images above are generated in two steps. First, 400 optimization steps run on the GPU, starting from a randomly initialized image, at a resolution of roughly 720×486. (The exact resolution depends on the size of the reference painting, because it all has to fit in memory.) The GPU is a GTX970 with 4GB of RAM, and renders those images in about 6 minutes. Second, for the benefit of this article, a post-process on the CPU at a resolution of around 1140×770 was also used to make the article look cool. (Again, the exact resolution depends on memory, this time a 32GB system memory limit.) Some images were rendered at the full 1280×720 while others were 1024×692 and scaled up. The CPU post-process is seeded with the results from the GPU and runs for only 200 iterations, but takes just over an hour!

As it is now, this technology is not ready for realtime, and it’s likely a new algorithm is needed for this to run at 60 FPS anytime soon. Alternatively, you can just wait for Moore’s law to catch up!

Q: How did you adjust the parameters for the algorithm?

A: The default values from the code in the repositories work well for their test images, but often require customizing to get specific results (portraits in particular take a couple of iterations). Watching the output from the bot for a week provides a good sense of what works and what doesn’t!

In this case, all the images were generated with the same parameters. The goal was to bias the generation to feature more content from Quake and de-emphasize the often extreme style of the chosen artists. If you’ve used the bot, this is equivalent to the parameter ratio=2/1, which emphasizes the content (from Quake) twice as much as the style (from the painter).

Q: This looks great! Can I do it myself?

A: You can either submit images to the bot and have it process them for you (NOTE: each image takes a while to compute, so there’s a queue), or you can set up the same Open Source projects and run them on your own GPU if you have at least 4GB of memory. The setup itself is a bit challenging, both because of the use of the GPU (and CUDA) and because of libraries in out-of-the-ordinary languages that require downloading and compiling.

The other challenges involve finding reference images and tuning parameters, as mentioned above. With a bit of practice and watching the bot in action, you’ll get there relatively quickly. Having a large library of reference art certainly helps, and that’s partly what the bot is there for!

Q: These look like a terrible Photoshop filter. What’s going on?

A: If you’ve seen a Photoshop filter that can output the images above with the exact same parameters each time, and provide results in 6 minutes or less, then we want to know! Of course it’s possible to improve each of these images, but they’re still useful as inspiration and as an insight into where the technology is going.

As for the algorithm itself, it has a few deficiencies. It’s the first known general “style transfer” algorithm and a novel application of neural networks, so it’s safe to expect many improvements in the future. In particular, the various layers of patterns found by the neural network (e.g. grain, strokes, elements, composition) are optimized separately from each other, when in fact the strokes should depend on the higher-level patterns too. In short, the semantic information already present in the neural network should be put to better use.

Q: How is this going to affect the artistic process in the future?

A: It’s still very early and it’s hard to say. However, the quality of the output, given the time taken to compute the results (minutes), already makes it extremely valuable as a source of ideas. In the near future these techniques may be able to re-purpose and restyle existing textures, so switching from a photo-realistic to a cartoon style could be a matter of spawning a few cloud instances and generating new textures.

The future powered by machine learning looks bright, and any tool that can improve the creativity and productivity of artists is (mostly) very welcome!

Free Course In Autumn!

NOTE: Join our 10-week course later this Autumn about Artificial Intelligence and Machine Learning techniques applied interactively, in particular for games. It’s free to participate if you sign up now and follow along with the course as it takes place for the first time! Click for details →

The Secret Manual to Creating Deep Forgeries

This guide will get you started with @DeepForger (a Twitter bot by @alexjc) that paints your photos using techniques from famous artists. It’s still in alpha and we’re aiming to make this process as smooth as possible, but for now you’ll need to supervise and sometimes help guide the process…

  1. First, look at the best forgeries so far by browsing through favorites (these are manually curated), and see what combinations work for you.
  2. To create a forgery, send a Tweet directly to @DeepForger and *attach* your photo. By default, the bot will try to surprise you with a semi-random style!
  3. There’s a queue, but you can get a priority boost by Following the bot, or by posting another Tweet linking to your original submission (a.k.a. “Quote RT”).

IMPORTANT: Keep in mind that the artistic process is often based on iteration and incremental improvement, so it may take you 2 or 3 attempts to get what you want!

1. Tags:   #

TL;DR: Add single-word tags with the hash prefix to your request, e.g. #portrait #landscape.

The bot is still learning to understand photos and paintings — let’s not even mention Art — so you can help get better results by tagging your submission. Here are some of the most useful and common tags:

  • #portrait — A human face makes up most of the photo.
  • #people — There are humans standing in the picture.
  • #landscape — The photo includes hills, mountains, trees, grass, villages.
  • #buildings — The picture includes close buildings.
  • #funny — If it’s a meme picture or something new/weird for entertainment.
  • #nsfw — Tasteful material that’s not safe for work. No porn allowed!

The tags are used as hints to help with automatic setting of the remaining parameters. As we integrate other forms of machine learning into the system, the need to use tags should disappear! (Also, if you don’t include the #nsfw tag when appropriate or submit pornography or close-up photos of genitals, you’ll get blocked.)

2. Re-Submitting…

TL;DR: Type the exact word resubmit (sent to @DeepForger, of course) as a reply to your original request Tweet to try again with new parameters.

There are two ways re-submission operates, depending what message you reply to:

  1. Randomize — If you reply resubmit to your first & original submission, then the bot will re-run the algorithm that matches paintings and come up with a completely new one.
  2. Experiment — If you reply to an existing forgery with resubmit, the bot will pick the exact same painting and try again with the new settings.

You will most likely need to use both types of re-submission to get the results you want.

3. Commands:   /

TL;DR: Use a slash prefix to send commands to the bot and request a /sketch or /abstract forgery.

Commands are simplified requests that will set all the other filters and parameters appropriately. Here are the ones currently available:

  • /sketch — Request the use of pencil, pen, ink, chalk or charcoal on almost any medium.
  • /abstract — Make something more abstract as an interpretation of the current photo!
  • ... — We’re still working on adding more to this list. Suggestions welcome.

4. Filters:   + or –

TL;DR: Use specific single-word ±keyword filters to request painters like +Picasso or media like +pencil.

  • +Lautrec — Only include matching artists like Toulouse-Lautrec.
  • +watercolor — Only include paintings whose medium matches watercolor.
  • -Gogh — Exclude artists whose name matches, in this case Vincent van Gogh.
  • -ink -pencil — Exclude all paintings whose medium matches either ink or pencil.

NOTE: While author names are usually correct in the database, the medium isn’t always accurate, so a small subset of paintings may occasionally match incorrectly. You can combine any number of filters, and the bot will tell you when it doesn’t find a match.

Lost Interview about Style Networks
and the Deep Forger Bot

How does Deep Forger generate its paintings?

Well, Deep Forger is made up of three parts, and only one of them is a deep neural network. There are many neural networks available for things like image recognition, and the one behind Deep Forger is cutting-edge but now widely used: it’s called VGG, has 19 layers, and you can even download it as a 574MB file! Sitting on top of this neural network is an optimization algorithm based on a research paper called A Neural Algorithm of Artistic Style — which allows you to combine different features from two images and generate a third output image. (This was published less than two weeks ago!)

The Deep Forger was the first bot to make this algorithm available to non-programmers. To make it more accessible, it has some extra components that make it particularly interesting and easy to use! One of these components is a library of thousands of famous paintings, serving as a reference when forging the user’s photos; Deep Forger has some understanding of the paintings and will try to find those that best match the content. It achieves this via an AI Curator that helps find the best painting and parameters, then a tongue-in-cheek AI Critic that gives its assessment of the quality before posting. Both of these AIs are learning from the feedback received on Twitter (Favs or Retweets); it’s still early days, but the results are getting visibly better.

What are Deep Forger’s capabilities?

The idea is to lower the barrier of entry for non-programmers and make it as simple as possible to commission Generative Art: you send the bot your photo along with a description of what you want, and it’ll return a painting that it generated! By default, it will randomly pick one of the famous paintings it knows from its database, but you can also use commands like “!sketch”, “!stylized” or “!abstract” to commission a particular type of painting.

We also let users pick which painters they want to see using keywords, so “+Cassatt” (to add Mary) or “-Picasso” (to remove Pablo) will filter the list accordingly. So far, users have created new avatars or background images, or simply posted cat pictures and internet memes—obviously!

How is the system trained to generate paintings?

The “teaching” can be broken down into three parts again, and was the combined effort of many researchers and developers in the field. First comes the training of a deep neural network (“VGG” as mentioned previously) on an image recognition problem. This takes a very long time and the best GPUs you can buy, but it’s now a relatively standard optimization process and you can even download pre-trained versions. The second part is the algorithm that generates an image from scratch (based on the Neural Style[2] paper): using features found by the neural network, it optimizes your photo (changing the pixel colors) to keep its content but also match the style of another image. Third, there’s a component that selects the right image, scale and parameters to match your photo and help generate good results. This is done by an algorithm that compares features of the paintings in the database with features of your photo—for example the color scheme. This is where the social network comes in: the code gathers statistics from Twitter that it uses to influence the algorithm and adapt over time.
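
As a toy illustration of that last matching step (this is not the bot’s actual Curator code, just one plausible metric written with NumPy and Pillow), comparing colour schemes could be as simple as comparing normalized colour histograms:

# Toy sketch of painting selection by colour-scheme similarity.
# This is NOT Deep Forger's actual Curator, just one plausible metric.
import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Normalized 3D RGB histogram of an image."""
    pixels = np.asarray(Image.open(path).convert('RGB')).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist / hist.sum()

def best_match(photo_path, painting_paths):
    """Return the painting whose colour histogram is closest to the photo's."""
    target = color_histogram(photo_path)
    distances = {path: np.abs(color_histogram(path) - target).sum()
                 for path in painting_paths}
    return min(distances, key=distances.get)

In practice the Curator combines several such metrics and adapts them using the feedback gathered from Twitter, as described above.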

What was the inception of the art project?

The whole machine learning (ML) community is booming right now and the field combining ML with generative art is just emerging! A few months ago it was Deep Dreaming, then applications of Recurrent Neural Networks, but with the recent publication of the “Neural Style” paper there’s more potential for the tools to be helpful for artists—and usable by general users too. As soon as the paper was published, I knew there’d be an army of hobbyists and professionals working to replicate the results. And indeed, within a week, there were almost half a dozen implementations, in particular the one by Kai Sheng Tai (first available) and the one by Justin Johnson (powering Deep Forger).

Even before the code was available, I speculated about what the missing components would be and how I could best contribute to the field. Putting up a bot on Twitter generating near-HD images seemed an obvious step, as it would help the community quickly explore the medium and figure out what the technology is capable of. The idea of gathering an encyclopedic database of art was a good starting point, and it is turning into a great platform for helping understand what “Style” actually is and how you can explain the style of various painters. For example, it’s possible to gather the common style (or average of all styles) used by Picasso in all his paintings, and apply that to your photos! Getting feedback on Twitter will also help the bot understand which styles are preferred by followers and how they combine with specific content. There’s really a lot to do here—and we’re only just getting started :-)

What are, to you, fascinating capabilities of current AI?

The most promising idea is that neural networks may help artists be more creative or productive, for example by helping fill in the gaps and perform repetitive work, or coming up with creative suggestions. For example, multiple game developers have been using Deep Forger to try out Art Style experiments and see possible styles/elements that they might not have thought of!

Another very interesting aspect is that machine learning can eventually capture, for example, the craft of Rembrandt or Mary Cassatt: the grains of their canvas, the strokes of the paint, their common elements, and aspects of the painting composition. Neural Networks cannot replicate the Artist, but we may be able to get an additional, unique understanding of Fine Arts thanks to these techniques. It’s fascinating for a variety of reasons — not least copyright — but that’s a discussion for another time!

Neural Doodles:
Workflows for the Next Generation of Artists

Recent advances in machine learning have opened Pandora’s box for creative applications! Find out about the technology behind #NeuralDoodle and how it could change the future of content creation. (This research was possible thanks to nucl.ai [1].)

Last year, with Deep Dream (June 2015) and Style Networks (August 2015), the idea that deep learning may become a tool for Art entered the public consciousness. Generative algorithms based on Neural Networks haven’t been the most predictable or easiest to understand so far, but when they work — by a combination of skill and luck — the quality of the output is second to none!

It took until 2016 for those techniques to be turned into tools that are useful for artists, starting with a paper we call Neural Patches (January 2016), which lets the algorithm process images in a context-sensitive manner. Now, when style transfer techniques are extended with controls and annotations, they can process images in a meaningful way: reducing glitches and increasing user control. This is our work on Semantic Style Transfer (March 2016), which handles the same applications as before, as well as generating images from rough annotations—a.k.a. doodles.

WORKFLOW MOCKUP

Iterating on a scene inspired by Claude Monet.

A significant amount of work in deep learning today is spent on command lines and editing text files. That’s how the tools operate and where we spend most of our time too! Now with algorithms like #NeuralDoodle becoming useful as tools, it’s time to start thinking more broadly to make them more widely accessible.

This prototype video shows how such algorithms could be integrated into common image editing tools, in this case GIMP. The tool doesn’t exist (yet); the video is there to help you visualize how these tools could evolve.

See for yourself in this hybrid system where the human makes doodles and the machine paints a high-quality version at regular intervals, on request.

This workflow mockup was created by a human working in GIMP using reference art from one of Renoir’s paintings, as you see in the video. However, the image was regularly saved and then processed with our implementation of semantic style transfer. The resulting output from the tool was then edited back into the screen capture.

It’s not real yet, but students are already looking into this integration! With machine learning, the future is always closer than you think ;-)

ITERATION THAT WORKS (SLOWLY)

As always with advanced tools, things may not work out as expected the first time. Thanks to the annotations in the semantic maps, however, it becomes possible to iterate to get the desired results. Currently it takes 3-5 minutes to generate a single HD 720p image from scratch, depending on the combination of images.

There’s certainly a lot of room for improvement in performance—after all, the underlying algorithm is a brute force matching of neural patches—but the basic workflow is in place. When reusing previous images, and only repairing select parts of the image, things may speed up to the point of being almost realtime. Also, expect advances in machine learning to speed up the computations and reduce the workload required over the next year or two.

As for the iteration that’s already possible (slowly), here’s an example based on a Monet painting from l’Étretat.

At each stage the algorithm is doing the same thing: transferring style from one image to another on demand using annotations. Only the inputs to the algorithm change, in this case the doodles. The result gets better as the human addresses problems with the previous iteration, and the synthesized image converges to something that can be painted successfully based on the reference material.

You may notice a few things from these prototype images, which reveal fascinating insights into the algorithm and how it operates:

  1. The first image (top left) was generated from incorrect annotations: sky patches incorrectly render above every patch of sand. The second image tries to remove the left sandy ledge.
  2. In the third image (top row), the top arch is blurred by sky texture. This can happen if the source painting doesn’t have any reference material showing how to paint this.
  3. The fifth image (bottom row) removes the arch for better results, but the left cliff looks rather bland with a repeating texture—similar to the original painting.
  4. The last image fixes this by adding some darker rock patterns, and also removes the sand at the base of the arch in the sea to increase the feeling of depth.

Here’s the final set of images for this particular synthesized image: (left) the annotations for the painting, (middle) Monet’s original painting, and (right) the doodle for the desired image.

QUALITY & CONSISTENCY

Beyond the workflow, it’s important to emphasize the benefits of having a tool that can consistently generate images at this level of quality from so little input. It may not match the original Renoir, but as placeholder art for many games, simulations and visualizations this is more than acceptable. It may even be good enough to ship ;-)

This is a 720p rendering generated by the implementation behind @DeepForger—the first online service to offer both Style Networks and now Neural Patches to end-users on social media. It’s quickly become one of our favorite renderings of all time!

Synthesis #1 · Doodle #1


Again, when errors occur, it’s no longer a problem: it’s possible to extend the source material so the Neural Patches algorithm can find a match while painting, or use some annotations and manually fix the results thanks to semantic style transfer.

CONCLUSION

Deep learning and neural networks are going to fundamentally change the way we create, and the tools we interact with will become smarter too. It’s a guessing game to predict exactly how, but here you saw a mockup of a tool that could be built within the next year!

In the meantime, here are some great places to continue learning about the topic:

  1. Find and star the neural-doodle repository on GitHub. The code is well commented ;-)
  2. Go and read our research paper on arXiv and dig into the technology in more depth.
  3. Visit us in Vienna for the nucl.ai Conference on July 18-20! First speakers will be announced soon.

You can find us as @nuclai on Twitter, via Facebook, or see our website. Thanks for reading!

[1] This research was funded out of the marketing budget for the nucl.ai Conference 2016, our event dedicated to Artificial Intelligence in Creative Industries. It’s better this way, right? ;-)

Minecraft, ENHANCE!
Neural Networks to Upscale & Stylize Pixel Art

How about taking pixelated graphics and using a neural network to increase their resolution, using example photos or textures? We attempted it for Minecraft with an open source project… (This research project was possible thanks to nucl.ai [1].)

Just over a month ago, we released Neural Doodle: a deep learning project to transfer the style from one image onto another. The script allows anyone to reuse existing Art and Photos to improve their two-bit doodles. There’s now a growing number of developers experimenting and posting their results — which inspired the work in this article!

Neural Doodle is in fact a very simple project with 550 lines of code, powered by deep neural network libraries like Lasagne and Theano. However, the algorithm can be used in a variety of different ways: texture synthesis, style transfer, image analogy, and now another: example-based upscaling.

MINECRAFT PIXEL ART

Let’s start with some examples! The following 512x512 textures were generated by running Neural Doodle on the GPU. Here’s the core command line:

python3 doodle.py --style Example_Stone.jpg --seed Minecraft_Stone.jpg \
                  --iterations=100 --phases=1 --variety=0.5

The input examples are a variety of textures collected by Image Search. They are not shown in full due to Copyright questions, but you can find them again yourself easily!


After about five to ten minutes — depending on the style and the speed of your GPU — the script will output the following images. (CLICK & HOLD THE THUMBNAILS TO COMPARE.)

Using the exact same code, you can also do the same for dirt textures. Here are the example photos which were also found from Image Search.

Again, after running during your lunch break or overnight, you’ll end up with synthesized textures like this. (CLICK & HOLD THE THUMBNAILS TO COMPARE.)

Note that these dirt textures are more organic than the stones, and it’s harder to see the original pixels in the final images. This is done on purpose; see the alternative images below, which have less variety but more visible pixels.

HOW IT WORKS

Under the hood, Neural Doodle is an iterative algorithm, which means it performs many incremental refinements to an image; we call these frames, and each one gets a step closer to the final desired output. At each step, the algorithm matches “neural patches” from the desired style image and nudges the current image in that direction (i.e. gradient descent, for those of you familiar with optimization). You can stop the process at any stage if you’re happy with the quality — but it usually requires 100 steps.

Depending on how you use the script, it starts with different types of seed images: random noise, the target image, or a hand-crafted seed. In the case of Minecraft, the pixelated art is taken as the seed at the target resolution (e.g. 512x512) and the optimization adjusts each pixel towards the target style—neural patch by neural patch.
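
To make that loop concrete, here is a deliberately simplified sketch of the match-and-nudge idea. It works directly on pixels with NumPy, whereas the real algorithm matches patches of deep VGG feature maps and takes proper gradient steps, so treat it as an illustration rather than the actual doodle.py code:

# Simplified illustration of the patch-match-and-nudge loop.
# The real algorithm matches patches of VGG feature maps; this toy version
# works directly on pixels so the idea is easy to see.
import numpy as np

def extract_patches(img, size=3):
    """All overlapping size x size patches, flattened into vectors."""
    h, w, c = img.shape
    patches = [img[y:y+size, x:x+size].ravel()
               for y in range(h - size + 1)
               for x in range(w - size + 1)]
    return np.array(patches)

def refine(seed, style, steps=100, rate=0.1, size=3):
    """Nudge `seed` toward the nearest style patches, one small step at a time."""
    current = seed.astype(np.float32).copy()
    style_patches = extract_patches(style.astype(np.float32), size)
    h, w, _ = current.shape
    for _ in range(steps):
        target = np.zeros_like(current)
        counts = np.zeros(current.shape[:2] + (1,))
        for y in range(h - size + 1):
            for x in range(w - size + 1):
                patch = current[y:y+size, x:x+size].ravel()
                # Brute-force nearest-neighbour match against the style patches.
                best = style_patches[np.argmin(
                    ((style_patches - patch) ** 2).sum(axis=1))]
                target[y:y+size, x:x+size] += best.reshape(size, size, -1)
                counts[y:y+size, x:x+size] += 1
        # Move a fraction of the way toward the blended matched patches.
        current += rate * (target / counts - current)
    return current

With the Minecraft texture passed in as seed and an example photo as style, each iteration pulls the pixel art a little closer to the nearest-looking patches of the example.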

Neural networks have the advantage of better understanding image patterns, gained from learning to classify images. In this case, we use a convolutional network called VGG from the University of Oxford. It was trained on millions of images and, thanks to this, can blend patches better than an algorithm operating on individual pixels.

» The code is open source and available on GitHub; the main script is around 550 lines. It’s well commented too to help you figure out what’s going on!

FAILURE CASES

The algorithm is not perfect and obviously has some failures too… Here are three of the main ones!

1) Reasonable Textures, Unsuitable Results

Some of the failures depend entirely on the input images: if the style is inappropriate for the problem at hand, nothing will fix it. These particular textures look good on their own, but don’t match the pixelated input texture or its structure very well. In this case, the only way to fix it is to go back and find better reference textures!


2) Repeated Patterns (Before Fix)

The problem of repeated patterns is often cited as a flaw of the original algorithm that we call Neural Patches. While rendering these images, in particular the grass, we fixed this in neural-doodle: you can now use the --variety parameter to encourage the code to use a wider diversity of patches.

All of the images rendered here had some additional variety, typically 0.5 or even 1.0. Usually the optimization is seeded with random noise, which helps encourage a wider variety of patches. When using pixelated Minecraft textures as the seed image there’s less randomness, so you need this extra parameter for the results to shine!

3) Physically Implausible Sections

The original patch-based image processing algorithms (see this overview) tend to exhibit glitches either when patches don’t match very well or patches are blended awkwardly. Using Neural Networks helps blend the patches in a more sensible fashion, but doesn’t help (yet) when patches are missing and the match quality is low.

An answer to this may be a pair of neural networks called generative adversarial networks: one network learns more about the image patches (trying to predict plausible image sections) while the other is used to detect whether those patches are plausible enough. See this very recent paper on the topic!

Visualizing Patch Variety

The top images are the originals, generated by matching only the nearest neural patches and then rendering the image based on those. The bottom images are generated by forcing the algorithm to pick a wider variety of patches, which means the results are more organic and creative—but at the cost of the image looking different.

The patch diversity code works by measuring how similar the style patches are to the current image, then giving the worst-matching patches a boost while the best-matching patches are punished. This levels the playing field so a bigger variety of patches is selected — depending on the user-specified parameter. We think it looks good!
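
Written out as code, that re-weighting might look roughly like the sketch below (not necessarily the exact formula in doodle.py; the scaling is an assumption):

# Sketch of the patch-variety re-weighting described above (not necessarily
# the exact formula used in doodle.py).
import numpy as np

def reweight_similarity(similarity, variety=0.5):
    """similarity: matrix of shape (style_patches, image_positions), where
    higher means a better match.  Patches that already match the image very
    well get penalized, poorly matching patches get a relative boost,
    scaled by `variety`."""
    # How well each style patch matches the current image overall.
    per_patch = similarity.max(axis=1, keepdims=True)
    # Normalize to [0, 1] so the adjustment is comparable across images.
    spread = per_patch.max() - per_patch.min() + 1e-8
    score = (per_patch - per_patch.min()) / spread
    # Subtract a penalty proportional to how dominant the patch already is.
    return similarity - variety * score * similarity.std()

# After re-weighting, each image position still just picks its best patch:
# chosen = reweight_similarity(sim, variety=0.5).argmax(axis=0)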

Conclusion

The applications for these techniques are already very promising! But as these algorithms improve, you’ll be able to apply upscaling and stylization to entire screens or world maps that mix a variety of different source textures. For this, the implementation needs a few changes to split up the image and patches into chunks—especially for those with 1GB or 2GB GPUs. This would also allow scaling up to larger textures efficiently without requiring top-of-the-range 12GB cards! (Watch this GitHub Issue for progress.)

I’m sure you’ll agree it’s an incredible time to be involved in Creative AI. The core algorithm that generated all these images was published in January 2016, our version has been open source and generally usable for over a month, and significant improvements we made to the output are dated just yesterday! There’s so much low-hanging fruit; things are moving incredibly fast and it’s inspiring.

» Want to learn more or join the community? See you in Vienna at the nucl.ai Conference, July 18-20. Also feel free to post a comment or question on CreativeAI.net!

Alex J. Champandard


[1] This research was funded out of the marketing budget for the nucl.ai Conference 2016, our event dedicated to Artificial Intelligence in Creative Industries. It’s more constructive this way, right? ;-)

Extreme Style Machines:
Using Random Neural Networks to Generate Textures

Wait, what! Generating high-quality images based on completely random neural networks? That’s the unreasonable effectiveness of deep representations…
(This research project was possible thanks to the nucl.ai Conference [1].)

Synthesizing high-quality images with deep learning currently relies on neural networks trained extensively for classification on millions of images. It takes days! Most neural networks used for classification are very big too, which makes the generative algorithms even slower. It’s a huge problem for adoption since it takes a lot of resources to train these models, so scaling quality up and down is difficult.

While trying to train a more efficient neural network for Neural Doodle using current best practices, I stumbled on a random discovery… I found I could use completely random neural networks as feature detectors and still get compelling results! Think of this as a form of reservoir computing similar to Extreme Learning Machines — which has known limitations, but could help out here.

After investigating this, I identified that two big architectural decisions were required for this to work at all. The details follow below, but here’s the punchline:

  1. Exponential Linear Units as activation functions.
  2. Strided Convolution as down-sampling strategy.

The underlying image generation algorithm is from a paper I call Neural Patches — which is basically brute force nearest neighbor matching of 3×3 patterns — but using the neural network’s post-activation outputs (conv3_1 and conv4_1) rather than doing operations in image space. This tends to improve the quality of the results significantly, and you’ll see both good and bad examples below.

WARNING: This research was done in less than 24h. I’m writing this blog post already because I Tweeted about early results and everyone is excited — so I can’t keep this contained ;-) If you’d like to collaborate on further research and writing a full paper, let me know!

Experiment Setup

The main focus of this report is on example-based texture generation. The neural network is given an input photograph (grass, dirt, brick) and must re-synthesize a new output image from random noise. This is a great application because it’s the easiest problem in image synthesis and only involves one loss component, so it’s nice and simple too! Other types of image synthesis don’t work quite as well with these Extreme Style Machines (yet?). The architecture of the random network is listed below:

48 units,  3x3 shape              # conv1_1
48 units,  3x3 shape              # conv1_2
80 units,  2x2 shape, stride 2x2  # conv2_1
80 units,  3x3 shape              # conv2_2
112 units, 2x2 shape, stride 2x2  # conv3_1
112 units, 3x3 shape              # conv3_2
112 units, 3x3 shape              # conv3_3
176 units, 2x2 shape, stride 2x2  # conv4_1
176 units, 3x3 shape              # conv4_2
176 units, 3x3 shape              # conv4_3
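
For reference, here is roughly how such a randomly initialized network can be assembled with Lasagne (the library neural-doodle builds on). The ELU nonlinearity is defined by hand in case your Lasagne version doesn’t ship one, and the weights are simply left at their defaults and never trained; treat it as a sketch, not the exact network definition from the random branch:

# Sketch of a randomly initialized ELU + strided-convolution network in
# Lasagne/Theano, mirroring the layer listing above.  Weights stay at their
# GlorotUniform defaults and are never trained.
import theano.tensor as T
from lasagne.layers import InputLayer, Conv2DLayer

def elu(x):
    # Exponential Linear Unit, defined by hand in case this Lasagne
    # version does not provide an ELU nonlinearity.
    return T.switch(x > 0, x, T.exp(x) - 1)

def build_random_network(channels=3):
    net = InputLayer((None, channels, None, None))
    # (units, filter size, stride) taken from the architecture listed above.
    spec = [(48, 3, 1), (48, 3, 1),
            (80, 2, 2), (80, 3, 1),
            (112, 2, 2), (112, 3, 1), (112, 3, 1),
            (176, 2, 2), (176, 3, 1), (176, 3, 1)]
    layers = []
    for units, size, stride in spec:
        net = Conv2DLayer(net, num_filters=units, filter_size=size,
                          stride=stride, pad=(1 if size == 3 else 0),
                          nonlinearity=elu)
        layers.append(net)
    return layers  # e.g. pick out the conv3_1 and conv4_1 outputs for matching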

The other parameters are as follows: a total of four scales were processed using --phases=4, each with ten optimization steps via --iterations=10. The style weight was set to one hundred, higher than usual, with --style-weight=100.0; the patch variety was set to a small amount for visual fidelity with --variety=0.1; and total variation smoothing was set to a very small value with --smoothness=0.1.

Weight Initialization

The weights are initialized to the default of the Lasagne library, which is GlorotUniform. See the source code for more details, or read the full paper on the subject.

No experiments have been performed on the type of weight initialization yet, though any approach that adds more diversity to the weight matrices should improve the overall quality. This points towards orthogonal initialization strategies as a great option!

Source Code, Images, Scripts

You can find the code in the random branch of the Neural Doodle repository. Here are the details for downloading the images and running the script.

python3 doodle.py --style GrassPhoto.jpg --output GrassTest.png \
        --iterations=10 --phases=4 \
        --style-weight=100.0 --variety=0.1 --smoothness=0.1

Activation Function

As each image is processed by the neural network, the values accumulated in each neuron are passed through a non-linear function called an activation. This section compares the effect of different activation functions: Rectified Linear (the standard approach that VGG uses), the Leaky Rectifier (used in recent adversarial architectures, here a very leaky version), and the most recent Exponential Linear Units. I was experimenting with ELU due to its beautiful function shape, which looks suited to image synthesis, and the fact that it could reduce the need for batch normalization in deeper networks.
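
For reference, here are the three activations being compared, written as plain NumPy functions (the leak factor and the ELU alpha below are common defaults, not necessarily the exact constants used in these runs):

# The three activation functions compared below, as plain NumPy functions.
# The alpha values are common defaults, not necessarily the ones used here.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def very_leaky_relu(x, alpha=1.0 / 3):
    # "Very leaky" rectifier: negative inputs are scaled rather than zeroed.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Exponential Linear Unit: smooth saturation toward -alpha for negatives.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))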

Rectified Linear

Average Pool · Maximum Pool · Strided Convolution

Very Leaky Rectifier

Average Pool · Maximum Pool · Strided Convolution

Exponential Linear

Average Pool · Maximum Pool · Strided Convolution

Speculative Explanation

It seems ReLU units discard too much information by setting the output to zero if the input is negative. LReLU — specifically the very leaky variant — does much better because no information is lost. However, ELU seems to do even better because the distribution of the output is more balanced by the time it reaches layers conv3_1 and conv4_1, as mentioned in the original paper.

This generative experiment is a good way to visualize just how good ELU is at doing its job! It’s possible batch normalization would work similarly; however, in the generative setting there are no batches, and it seems significantly easier to use the right activation function in the first place…

Down-Sampling Strategy

When images are processed by neural networks, the internal representation gets smaller and smaller as it propagates through the network. This section compares the different ways of down-sampling the activations at regular intervals in the deep network. The options are average pooling (often used in generative architectures), max pooling (often used in classification), and strided convolutions (a recent favorite that avoids pooling altogether).
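
To make the three options concrete, here are toy NumPy versions of each strategy reducing an 8×8 map to 4×4 (in the real network these operate per channel on the activations, and the strided convolution uses the random 2×2 filters from the layers listed earlier):

# Toy NumPy versions of the three down-sampling strategies (2x reduction).
import numpy as np

def average_pool(x):
    """2x2 average pooling: each output value is the mean of a 2x2 block."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))

def max_pool(x):
    """2x2 max pooling: each output value is the largest of a 2x2 block."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

def strided_convolution(x, kernel):
    """2x2 convolution with stride 2: a learned (here random) weighted sum
    of each 2x2 block."""
    h, w = x.shape
    blocks = x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).transpose(0, 2, 1, 3)
    return (blocks * kernel).sum(axis=(2, 3))

x = np.random.rand(8, 8)
kernel = np.random.randn(2, 2)   # random weights, as in the experiments above
print(average_pool(x).shape, max_pool(x).shape,
      strided_convolution(x, kernel).shape)   # all (4, 4)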

Average Pooling

Rectified Linear · Very Leaky Rectifier · Exponential Linear

Maximum Pooling

Rectified Linear · Very Leaky Rectifier · Exponential Linear

Strided Convolution

Rectified Linear · Very Leaky Rectifier · Exponential Linear

Speculative Explanation

Averaging the activations causes the output to become blurry, particularly as the network gets deeper. The combination of input and random weights likely causes the activation values to converge to a constant value (0.0) as more depth is added. Max pooling makes the output crisper, but the diversity in the activations is probably also lost, which reduces the quality of the patch matching. Strided convolutions work because their weights, and hence the connections they form, are also random.

Unit Numbers

As you can see from these trial runs, the number of neurons affects quality incrementally. With fewer neurons, the results are less predictable and certain patterns are not correctly “understood” by the algorithm and become patchy or noisy.

Full Units

Quarter Units

Half Units

Network Depth

The following experiments were run with half of the units to see whether depth improves the quality or degrades it. Overall, from this informal analysis, it seems that additional layers degrade the quality.

Double Layers (Half Units)

Triple Layers (Half Units)

Technical Summary

The idea of deep random feature detectors is appealing: it avoids a lot of intensive training and helps adapt existing models to new constraints quickly, as well as scale up and down in quality on demand. Taken further, it could certainly help apply example-based neural style transfer techniques to domains where classifier models are not as strong as in the image domain.

Of course, there are known theoretical limitations to the “Extreme Learning Machines” model (see this reply by Soumith). Adding depth doesn’t help much when the first layers already have high-frequency coverage thanks to randomness, unlike deep learning with gradient descent, which typically works better with depth. However, there’s likely a balance to be found between using trained models (time-consuming, a significant investment) and relying on random networks (fast setup, low cost).

Short term, here are specific things to take away from this experiment:

  • Strided Convolution has become more popular for a variety of reasons, and it helps significantly here.
  • Exponential Linear Units are well placed to become the default activation for generative models.
  • Neural Architectures used for image processing have a very strong and useful prior built-in!

Random neural networks don’t seem to work very well for more advanced image synthesis operations like style transfer or even high-quality Neural Doodles; more research is required. However, for operations like example-based super-resolution of images, this could also work very effectively.

Alex J. Champandard

Addendum

This particular project has been fascinating! It’s been less than 24h since I randomly stumbled on a surprising result, and after Tweeting about it (with follow up), I was almost arm-wrestled into doing more experiments and posting the results. It’s not yet clear if and how more complex style transfer operations could work reliably for images or other media, but it’s a fascinating line of research nonetheless.

The process itself has also been interesting for me personally: while I specifically set out to investigate Exponential Linear Units using best practices like strided convolution, I never expected this to come out of the process. It feels appropriate to finish with a quote from Louis Pasteur:

“In the field of observation, chance favours only the prepared mind.”
— Louis Pasteur