
Index

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015)

Motivation

“Conventional machine-learning techniques were limited in their ability to process natural data in their raw form”, and therefore required expertise in feature engineering. This paper introduces the concept of deep learning, with example architectures such as the “Convolutional Neural Network” (CNN) and the “Recurrent Neural Network” (RNN), to discover intricate structure in high-dimensional data while requiring little engineering by hand.

Approach

CNN

One example architecture of deep neural networks, which usually consists of an “Input Layer”, “Convolutional Layers”, “Pooling Layers” and an “Output Layer”. While the convolutional and pooling layers can be repeated several times, the output is usually followed by a fully connected network for classification problems. In most cases, the ReLU function is used as the activation in the convolutional layers and max pooling is used in the pooling layers. During the learning process, filters are learnt and feature maps are then generated, which are the output of representation learning.
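A minimal sketch of this layer pattern, written in PyTorch (the library choice, layer sizes and input shape are illustrative assumptions, not taken from the paper): convolution plus ReLU, max pooling, repeated once, followed by a fully connected classifier.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # learn 8 filters -> 8 feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # repeat convolution + pooling
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)  # fully connected output

    def forward(self, x):
        x = self.features(x)                   # representation learning: feature maps
        return self.classifier(x.flatten(1))   # classification head

model = TinyCNN()
logits = model(torch.randn(4, 1, 28, 28))      # e.g. a batch of 28x28 grayscale images
print(logits.shape)                            # torch.Size([4, 10])
```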

CNNs have achieved strong performance in the computer vision area.

RNN

Another example architecture of deep neural networks, which handles sequential inputs such as speech and language. “RNNs process an input sequence one element at a time”, while maintaining a “state vector” that implicitly contains all the historical information. An unfolded RNN can be considered as a deep multi-layer network.
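A small NumPy sketch of this idea (the dimensions and weight initialisation are assumptions for illustration): the input sequence is processed one element at a time, and a state vector h is updated at each step to summarise the history seen so far.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 5, 4

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # recurrent hidden -> hidden weights
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                          # state vector, initially no history
sequence = rng.normal(size=(seq_len, input_dim))  # one input element per time step

for t, x_t in enumerate(sequence):
    # the same weights are reused at every step; unrolling these steps in time
    # gives the "deep multi-layer network" view mentioned above
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    print(f"step {t}: state vector = {np.round(h, 3)}")
```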

Since a plain RNN cannot store information for very long, the “Long Short-Term Memory” (LSTM) was proposed to solve this problem; it contains “a special unit called the memory cell” that acts “like an accumulator or a gated leaky neuron”. Meanwhile, there are other proposals to augment RNNs with a memory module, such as the “Neural Turing Machine” and “memory networks”. These models are being used for tasks that need reasoning and symbol manipulation.
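An illustrative NumPy sketch of a single LSTM step (the dimensions, initialisation and gate ordering are assumptions, not the paper's exact formulation). The cell state c behaves like the gated accumulator described above: the forget gate f controls how much of the old value leaks away, and the input gate i controls how much new information is added.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
input_dim, hidden_dim = 3, 5
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g          # memory cell: gated accumulation of information
    h = o * np.tanh(c)         # new state vector passed to the next step
    return h, c

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x in rng.normal(size=(4, input_dim)):   # process a short sequence
    h, c = lstm_step(x, h, c)
print("cell state after 4 steps:", np.round(c, 3))
```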

Limitation

  1. Training a deep learning model is very expensive and usually requires GPUs.
  2. Training a deep learning model needs a lot of annotations (labeled data), which means heavy labour.
  3. This paper mainly focuses on supervised learning; we need better solutions for unsupervised tasks.

Ideas

  1. For those expensive tasks, maybe we can select a subset of instances, or make use of distributed computing.
  2. For the lack of labeled data, we may look into transfer learning.
  3. For unsupervised learning, maybe we can somehow combine reinforcement learning with deep learning. We can also look into attention layers, which may serve as a replacement for reinforcement learning.