End-To-End Neural Network Based Captcha Recognition


  • Jusin Jusin
  • Wilbert Harriman Universitas Pelita Harapan
  • Robin Robin Universitas Pelita Harapan




CAPTCHA, Deep Learning, Neural Network, Supervised Learning


Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is widely used as a security measure against spam and bot attacks on the Internet. CAPTCHA rests on the assumption that identifying objects or letters in a noisy image requires human sensory and cognitive skills that computers lack. In this work, we propose a deep learning approach to recognizing CAPTCHAs. Our model uses a Convolutional Neural Network (CNN) encoder to convert CAPTCHA images into vector representations, followed by a Recurrent Neural Network (RNN) decoder that converts those representations into text. The model achieves a validation accuracy of 90% after about an hour of training. Code is available at https://github.com/wilbertharriman/tf2-attention-captcha-recognizer.
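The encoder-decoder data flow described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The character set, layer sizes, single convolution layer, plain tanh recurrent cell, mean-pooled context vector, and random untrained weights are all assumptions chosen only to show how an image becomes a sequence of feature vectors and then a string.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed character set plus start/end markers (hypothetical, for illustration).
VOCAB = list("0123456789abcdefghijklmnopqrstuvwxyz") + ["<start>", "<end>"]
START, END = VOCAB.index("<start>"), VOCAB.index("<end>")
FEAT, EMB, HID, MAX_LEN = 16, 8, 32, 8  # illustrative sizes

def cnn_encode(image, kernels):
    """Encoder: one 3x3 convolution + ReLU, averaged over image height.

    Returns one feature vector per image column, shape (W - 2, FEAT).
    """
    # windows has shape (H - 2, W - 2, 3, 3)
    windows = np.lib.stride_tricks.sliding_window_view(image, (3, 3))
    fmap = np.maximum(np.einsum("hwij,ijf->hwf", windows, kernels), 0.0)
    return fmap.mean(axis=0)

def rnn_decode(features, params):
    """Decoder: greedy decoding with a plain tanh recurrent cell."""
    W_init, E, W_x, W_h, W_out = params
    context = features.mean(axis=0)      # (FEAT,) summary of the image
    h = np.tanh(context @ W_init)        # initial hidden state, (HID,)
    token, chars = START, []
    for _ in range(MAX_LEN):
        x = np.concatenate([E[token], context])
        h = np.tanh(x @ W_x + h @ W_h)
        logits = h @ W_out
        logits[START] = -np.inf          # never emit the start marker
        token = int(np.argmax(logits))
        if token == END:
            break
        chars.append(VOCAB[token])
    return "".join(chars)

# Random (untrained) weights: the output text is meaningless until trained.
kernels = rng.normal(scale=0.1, size=(3, 3, FEAT))
params = (
    rng.normal(scale=0.1, size=(FEAT, HID)),        # W_init
    rng.normal(scale=0.1, size=(len(VOCAB), EMB)),  # E, token embeddings
    rng.normal(scale=0.1, size=(EMB + FEAT, HID)),  # W_x
    rng.normal(scale=0.1, size=(HID, HID)),         # W_h
    rng.normal(scale=0.1, size=(HID, len(VOCAB))),  # W_out
)

image = rng.random((40, 100))            # stand-in for a grayscale CAPTCHA
features = cnn_encode(image, kernels)
text = rnn_decode(features, params)
print(features.shape, repr(text))
```

In a trained model, the weights would be fitted on labeled CAPTCHA images so that the decoded string matches the characters in the image; the sketch only demonstrates the shapes and the order of operations.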

