Do you know how popular machine learning is nowadays? Probably yes. The topic gets hotter every year. But are you aware that Silicon Valley is not the only place where amazing things are presented to the world? Some time ago, I had the pleasure of attending “Polish View in Machine Learning”, a conference held in Warsaw. Polish researchers have had many successes in machine learning, and the aim of the conference was to gather them in one place and share experiences, especially in the context of an emerging ML discipline: deep learning. I would like to describe and share some inspiring ideas and interesting research projects that were discussed during the conference.
Some of the lectures emphasized the math behind artificial intelligence methods, some drew a lot of attention (reinforcement learning used to create a bot playing Dota 2), and together they made up a great conference.
Generative Adversarial Networks – K. Kurach
Generative adversarial networks (GANs) have been gaining a lot of attention recently. They are used to generate artificial data, which is helpful, for example, in semi-supervised machine learning (when you don’t have enough labelled data to train your model). What are GANs in practice? Intuitively, we can think of them as a two-player game. The first player is the discriminator network, which aims to distinguish between real and fake examples. The second one, the generator, produces examples that should look like real ones. In the original paper, the authors compared this game to bill forgery: the generator (the forger) wants to create fake money, while the discriminator (the policeman) checks whether the money is valid. What is the effect of training such a pair of neural networks? We can look at a recent Nvidia project:
All of those faces were generated artificially by neural nets.
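To make the two-player game described above more concrete, here is a minimal sketch of the two losses that the players minimize. It assumes the discriminator outputs a probability that its input is real, and it uses the non-saturating generator loss suggested in the original GAN paper. The function names are my own and are not taken from the talk.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Loss minimized by the discriminator (the policeman).

    d_real: discriminator outputs (probability of "real") on real examples
    d_fake: discriminator outputs on generated examples
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    # The discriminator wants high outputs on real data and low on fakes.
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating loss minimized by the generator (the forger).

    The generator wants the discriminator to output high probabilities
    on generated examples.
    """
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A confused discriminator outputs 0.5 everywhere;
# a confident one separates real from fake.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
print(discriminator_loss([0.9, 0.9], [0.1, 0.1]))
print(generator_loss([0.1, 0.1]))
```

In real training, both losses would be computed on mini-batches and the two networks would be updated in alternation; the sketch only shows what each player is trying to achieve.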
The talk by Karol Kurach from Google Brain described efforts to create a large comparative study of generative adversarial networks. It is challenging to come up with a metric that objectively shows the capabilities of the architectures, because looking at generated samples and comparing their visual quality may be misleading. The Fréchet Inception Distance (FID) was used to compare GAN models: it compares the embeddings taken from one of the Inception network layers for two sets of images, real and generated. The problem of comparing GANs is multidimensional: it requires some assumptions (about architectures, hyperparameters, datasets, random seeds and computational budget) to be made upfront to make the comparison as fair as possible.
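As a rough illustration of the metric, FID fits a Gaussian to each set of embeddings and measures the Fréchet distance between the two Gaussians. The sketch below assumes the embeddings have already been extracted into NumPy arrays of shape (number of images, embedding size). Extracting them from an Inception network is not shown, and the function name is hypothetical.

```python
import numpy as np


def frechet_distance(real_emb, fake_emb):
    """Fréchet distance between Gaussians fitted to two embedding sets."""
    mu_r, mu_f = real_emb.mean(axis=0), fake_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_f = np.cov(fake_emb, rowvar=False)

    # Tr((cov_r cov_f)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative
    # for covariance matrices (up to numerical error).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_f) ** 2)
    cov_term = np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt
    return float(mean_term + cov_term)


# Toy embeddings standing in for Inception features (an assumption).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
print(frechet_distance(real, real))        # identical sets: close to zero
print(frechet_distance(real, real + 3.0))  # shifted set: clearly larger
```

Lower values mean the generated images are statistically closer to the real ones in the embedding space.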
More detailed insights here.
Understanding how deep neural networks learn – S. Jastrzębski
The title of the talk may suggest an introduction to how deep networks work, but the author chose to present recent research on the difference between memorization and generalization when training models with stochastic gradient descent (SGD).
One of the described ideas was the tradeoff between batch size and the type of minimum reached. Stanisław talked about the difference between sharp and flat minima of the objective function: networks trained with a large batch size tend to converge to sharp minima, which is undesirable because of the higher precision needed to store the model and the lower generalization capability, while smaller batch sizes tend to reach flat minima.
The author argued that it is not the batch size alone that leads to a particular type of minimum, but rather the ratio of learning rate to batch size.
One way to test the memorization behavior of a neural net is to analyze how well it fits the training data when some percentage of the labels is drawn at random. The results shown by the author indicated that low noise (understood as the ratio of learning rate to batch size) leads to lower generalization capability. (red – higher noise, blue – lower noise)
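The two ingredients of such an experiment can be sketched in a few lines. This is only an illustration under my own assumptions: the helper names are hypothetical, the noise level is taken to be simply the ratio of learning rate to batch size, and the actual training loop is left out.

```python
import random


def corrupt_labels(labels, fraction, num_classes, seed=0):
    """Replace a given fraction of labels with labels drawn at random."""
    rng = random.Random(seed)
    corrupted = list(labels)
    n_corrupt = int(round(fraction * len(labels)))
    for i in rng.sample(range(len(labels)), n_corrupt):
        # A random label may coincide with the original one by chance.
        corrupted[i] = rng.randrange(num_classes)
    return corrupted


def sgd_noise_ratio(learning_rate, batch_size):
    """Ratio of learning rate to batch size, used as the noise level."""
    return learning_rate / batch_size


labels = [0, 1, 2, 3, 4] * 20
noisy = corrupt_labels(labels, fraction=0.25, num_classes=5)
changed = sum(a != b for a, b in zip(labels, noisy))
print(changed, "of", len(labels), "labels changed")

# Doubling both the learning rate and the batch size keeps the ratio fixed.
print(sgd_noise_ratio(0.1, 32), sgd_noise_ratio(0.2, 64))
```

A network that still fits the corrupted training set perfectly must have memorized the random labels, so comparing training and validation accuracy at different noise ratios shows how much the setting favours memorization.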
Topics on reinforcement learning – S. Sidor
Szymon, representing OpenAI, delivered a lecture on reinforcement learning, a field of machine learning that does not rely on labelled data (you can read more on reinforcement learning in a blogpost from Peter). In reinforcement learning, we have an agent that learns from its experience: it operates through trial and error and adjusts its behavior when it reaches a state associated with a defined reward.
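The trial-and-error idea can be shown on a toy problem. The sketch below is a simple epsilon-greedy agent choosing between a few actions with unknown reward probabilities (a multi-armed bandit). It is my own illustration of learning from rewards, not the method OpenAI used for Dota 2, and all names and parameter values are assumptions.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from observed rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # estimated value of each action
    counts = [0] * len(reward_probs)       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])

        # The environment returns a reward of 1 with the action's probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Adjust the estimate towards the observed reward (running average).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
print("estimated values:", estimates)
print("times each action was chosen:", counts)
```

The agent is never told which action is better; it finds out by acting and observing rewards, which is the same principle that scales up, with far more machinery, to complex games.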
OpenAI created a bot that learned how to beat the best human players in Dota 2. It learned only by playing the game; no prior knowledge about the subject was given to the agent. The video below shows the behaviors that were learned.
The challenge was harder than most of the examples seen on the Internet, which are based mainly on old 8-bit games: the environment in Dota is much more complex than in any of those games. The speaker also emphasized the importance of taking care of bugs, which are really hard to track down in ML software.
TrueSkill of the bot over the course of development:
There were more interesting talks. One of them was “Architectures for big scale machine vision applications” by Zbigniew Wojna. The lecture ran into some technical trouble, so the author did not have enough time to cover all the planned topics. Zbigniew talked mostly about text transcription. The project was related to updating Google Maps based on Google Street View imagery, and the two main tasks were extracting street names and business names from images. He presented a neural net architecture that achieved very good results on the French Street Name Signs dataset; it is based on a CNN and an RNN with a novel attention mechanism.

Another talk, “Charming kernels, colorful Jacobians and Hadamard-minitaurs”, was prepared by Krzysztof Choromański. The author showed some inspiring mathematical ideas that stand behind new machine learning concepts. Unfortunately, I could not attend the talk by Jan Chorowski or the discussion panel.

Last but not least, big kudos to the conference organizers: the event was organized really well and was very insightful. I would like to attend the next edition 😉