History, Current Status and Future Directions of Deep Learning

Abstract

Deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence (AI). The key difference between machine learning and deep learning is that, when training a model, machine learning generally requires a human-driven feature extraction step before learning, whereas deep learning skips this step and takes the raw data directly as input. The development of deep learning has closely followed the development of artificial neural networks (ANNs), to which many researchers have contributed over several decades. Five models represent the architectures most widely used in deep learning: the Deep Feedforward Neural Network (D-FFNN), the Convolutional Neural Network (CNN), the Deep Belief Network (DBN), the Autoencoder (AE), and the Long Short-Term Memory (LSTM) network. A CNN is a feedforward network composed of convolutional layers, ReLU activation functions, and pooling layers; its weight sharing and local connectivity make it well suited to processing high-dimensional data. In dental and medical applications, AI models need to be interpretable or explainable (XAI) so that their findings can be communicated convincingly to patients. In the future, explainable AI will become an indispensable and practical component for obtaining improved AI models that are transparent, secure, fair, and unbiased.

Keywords
Artificial Intelligence (AI), Deep learning, Artificial Neural Networks (ANN), Convolutional Neural Network (CNN), Explainable AI (XAI)
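
To make the CNN description in the abstract concrete, the following minimal sketch (not part of the original article) stacks the three components it names, convolutional layers, ReLU activations, and pooling layers, into a small image classifier. It assumes PyTorch is available; the channel counts, kernel sizes, and the 28×28 grayscale input are illustrative choices, not values taken from the paper.

```python
# Minimal sketch of a CNN: convolution + ReLU + pooling, assuming PyTorch.
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Each convolution shares one small kernel across all spatial positions
        # (weight sharing) and looks only at a local neighborhood
        # (local connectivity).
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),            # pooling halves the spatial resolution
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raw pixels are fed in directly; no hand-crafted feature extraction.
        x = self.features(x)
        return self.classifier(x.flatten(1))


if __name__ == "__main__":
    model = TinyCNN()
    dummy = torch.randn(4, 1, 28, 28)   # batch of 4 grayscale 28x28 images
    print(model(dummy).shape)           # torch.Size([4, 10])
```

Note that the raw pixel tensor goes into the network unchanged, which illustrates the abstract's point that deep learning does not require a separate human feature extraction step before learning.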

