Significant Enhancements in Machine Translation by Various Deep Learning Approaches

Abstract
Deep learning is a new and transformative technology for natural language processing and speech processing. It offers several effective ways to train computer systems to learn, and it has brought significant advances. When an appropriate architecture is built with deep learning methods, a system can learn automatically from the data itself without being explicitly engineered for the task. This machine learning technique has considerably changed how natural language and speech technologies are approached. Deep learning was first introduced into machine translation within standard statistical systems. This paper reviews how deep learning has been introduced into machine translation. It covers integrating deep learning into statistical machine translation, developing end-to-end neural machine translation systems, and introducing deep learning into machine translation evaluation. Finally, several research directions are drawn regarding how deep learning can influence machine translation.


Introduction
Machine learning techniques have advanced considerably in recent years. One of the newer techniques is deep learning, which has been applied successfully to many areas such as image processing, speech processing and recognition, and natural language processing. Deep learning has a strong capability to learn from data. In machine translation, it was initially integrated into standard statistical systems [1], and it has shown its capability in language modeling and translation [2]. Deep learning has also been applied to quality estimation and machine translation evaluation [3].
A new machine translation paradigm then emerged in the form of neural machine translation (MT) [4]. It has produced outstanding results for several language pairs [5]. This paradigm uses an attention-based encoder-decoder model to build an end-to-end neural machine translation system [6]. The approach is new and opens up research directions in machine translation, with challenges such as coping with very large vocabularies, multimodal translation, and high computational cost, which raises new problems in large-scale optimization [7].
The scientific community is very interested in this emerging topic. This paper aims to give readers insight, understanding, and a global view of the limits of deep learning, its challenges, and its impact. It covers the application of deep learning in statistical MT, neural MT, and interactive neural MT, and how deep learning improves machine translation evaluation. It then describes the available alternatives for constructing a neural MT system. The remaining sections discuss prominent research aspects of applying deep learning to machine translation.

About Neural Machine Translation (Neural MT)
Almost all neural machine translation (MT) systems rely on the encoder-decoder approach. The input sequence in the source language is encoded into a low-dimensional representation, from which the output sequence in the target language is produced [4]. Many options are available for designing the encoder. One approach is to encode the input sequence with a simple recurrent neural network (RNN) [8]. However, compressing the whole sequence into a fixed-size vector is very detrimental to how well source-side information is represented. To solve this problem, a system was built that uses a bidirectional RNN. It encodes the source sequence as a set of annotations, each combining the two representations obtained by a forward RNN and a backward RNN. As a result, every annotation vector holds information about the complete source sequence while focusing on one specific word [6].
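The bidirectional encoding described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a plain tanh RNN cell (rather than GRU or LSTM cells) and random toy weights; the names `rnn_pass` and `bidirectional_annotations` are hypothetical. Each annotation is the concatenation of the forward state and the backward state at the same source position.

```python
import numpy as np

def rnn_pass(X, W_x, W_h, b):
    """Run a simple tanh RNN over the rows of X and return all hidden states."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

def bidirectional_annotations(X, fwd_params, bwd_params):
    """Annotation for word j = [forward state at j ; backward state at j]."""
    h_fwd = rnn_pass(X, *fwd_params)
    # Run the backward RNN on the reversed sequence, then flip its states
    # back so that row j of both arrays refers to the same source word.
    h_bwd = rnn_pass(X[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
emb_dim, hid_dim, src_len = 4, 3, 5
X = rng.normal(size=(src_len, emb_dim))  # toy source embeddings, one row per word

def make_params():
    return (rng.normal(scale=0.5, size=(hid_dim, emb_dim)),
            rng.normal(scale=0.5, size=(hid_dim, hid_dim)),
            np.zeros(hid_dim))

H = bidirectional_annotations(X, make_params(), make_params())
print(H.shape)  # one annotation of size 2 * hid_dim per source word
```

Because the backward half of each row has read the sentence from the end, every annotation carries information from the whole source sentence, unlike a single fixed-size summary vector.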
An attention mechanism, implemented as a feed-forward neural network (FFNN), is then used to focus on the relevant parts of the input and to establish an association between the input and output sequences. Stacked long short-term memory (LSTM) networks are an alternative to the bidirectional RNN [9]. The main problem in neural machine translation is the large softmax normalization on the output layer, whose cost depends on the size of the target vocabulary.
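The feed-forward attention step can be sketched as additive scoring followed by a softmax. This is a minimal sketch, assuming the common form score_j = v^T tanh(W s + U h_j), where s is the previous decoder state and h_j an encoder annotation; the weight names `W`, `U`, `v` and all dimensions are illustrative assumptions, not taken from the cited systems.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def additive_attention(s_prev, H, W, U, v):
    """Score each annotation with a one-hidden-layer feed-forward network,
    then return the alignment weights and the context vector."""
    scores = np.tanh(s_prev @ W.T + H @ U.T) @ v   # one score per source word
    alpha = softmax(scores)                          # alignment weights, sum to 1
    context = alpha @ H                              # weighted average of annotations
    return alpha, context

rng = np.random.default_rng(1)
src_len, ann_dim, dec_dim, att_dim = 5, 6, 4, 8
H = rng.normal(size=(src_len, ann_dim))   # toy encoder annotations
s_prev = rng.normal(size=dec_dim)          # toy previous decoder state
W = rng.normal(size=(att_dim, dec_dim))
U = rng.normal(size=(att_dim, ann_dim))
v = rng.normal(size=att_dim)

alpha, c = additive_attention(s_prev, H, W, U, v)
print(alpha.round(3), c.shape)
```

The same `softmax` function, applied over the whole target vocabulary instead of the source positions, is what makes the output layer expensive when the vocabulary is large.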
Many researchers have addressed this problem. Proposed solutions include using a structured output layer [10], computing the softmax over a subset of the output vocabulary [11], and self-normalization [12]. Figure 1 describes the conventional architecture of a neural MT system with softmax normalization on the output. Another way to handle large vocabularies is to translate at the subword level, which has the advantage of allowing the system to generate words that are out of vocabulary. Translation at the character level has also been proposed [13,14]. In addition, byte pair encoding (BPE) is a widely used subword segmentation method for many languages [15].
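The BPE idea can be illustrated with a toy merge learner. This is a minimal sketch, assuming words are pre-split into space-separated characters with an end-of-word marker `</w>`, and that ties between equally frequent pairs are broken by first occurrence; the function names and the toy word counts are hypothetical.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every standalone occurrence of the pair with its concatenation."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    """Greedily merge the most frequent pair, num_merges times."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, segmented = learn_bpe(vocab, 10)
print(merges[:3])  # the frequent suffix "est" is built up first
```

Frequent words end up as single symbols, while rare or unseen words are still representable as sequences of smaller units, which is what lets a subword system produce out-of-vocabulary words.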

Classification of Various Deep Learning Approaches
Deep learning is used in several categories of machine translation, described below.

Statistical machine translation by deep learning
The very first approach to incorporating deep learning, or neural networks, into machine translation was rescoring the n-best lists produced by statistical machine translation systems. Statistical machine translation already gave state-of-the-art results, and deep learning helps to find a better set of weights for the statistical models [16,17]. Several other improvements to statistical machine translation that use deep learning have been suggested.
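n-best rescoring can be pictured as adding a neural model's score as one more feature and re-sorting the list. This is a minimal sketch, assuming each hypothesis already carries an SMT decoder score and a neural log-probability, and that the two are combined linearly with hand-set weights (in practice such weights are tuned); the `Hypothesis` class, the weights, and the toy scores are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    smt_score: float   # log-linear score from the SMT decoder (assumed given)
    nn_logprob: float  # log-probability from a neural model (assumed given)

def rescore(nbest, weight_smt=1.0, weight_nn=0.5):
    """Re-rank an n-best list by a weighted sum of SMT and neural scores."""
    def combined(h):
        return weight_smt * h.smt_score + weight_nn * h.nn_logprob
    return sorted(nbest, key=combined, reverse=True)

# Toy n-best list with made-up scores.
nbest = [
    Hypothesis("the house is small", smt_score=-4.0, nn_logprob=-6.0),
    Hypothesis("the house is little", smt_score=-3.8, nn_logprob=-9.0),
    Hypothesis("small is the house", smt_score=-4.5, nn_logprob=-12.0),
]
best = rescore(nbest)[0]
print(best.text)
```

Here the SMT score alone would prefer "the house is little", but adding the neural feature moves "the house is small" to the top; the SMT decoder itself is left unchanged.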
Domain adaptation using joint neural network models: In general MT systems, domain adaptation remains a challenge, since parallel data is a scarce resource. Neural translation models such as joint models have shown better adaptation capability because of their continuous representations. There are various ways to adapt such models. Data selection and mixture modeling are the starting point, while better estimation of instance weighting and instance selection is an important factor in neural models. Pairwise modeling reduces the cross entropy by regularizing the loss function with respect to the selected model [17].
Figure 3 demonstrates the architecture of the source sequence simplification model. For the attention layer, a context vector c_t is computed as a weighted average of the annotations using the alignment weights a_t. The first step is to simplify the source sentence by removing every redundant word. At the decoder, the simplified and complete inputs are integrated through a two-step decoding method: the first step translates the simplified input and generates an n-best list, and the second step uses that n-best list to decode the complete input sequence.
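To make the data-selection starting point mentioned for domain adaptation concrete, the sketch below ranks candidate sentences by cross-entropy difference, a well-known selection criterion (often attributed to Moore and Lewis) that is swapped in here for illustration; it is not the joint-model adaptation method described above. It assumes add-one-smoothed unigram language models for brevity, and all function names and toy sentences are hypothetical.

```python
import math
from collections import Counter

def unigram_lm(sentences):
    """Add-one smoothed unigram model; one extra vocabulary slot for unseen words."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab_size)

def cross_entropy(sentence, lm):
    """Per-word cross entropy (bits) of a sentence under a language model."""
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / len(words)

def select(candidates, in_domain, general, k):
    """Keep the k candidates that look most in-domain relative to general text."""
    lm_in, lm_gen = unigram_lm(in_domain), unigram_lm(general)
    ranked = sorted(candidates,
                    key=lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_gen))
    return ranked[:k]

in_domain = ["the patient received a dose", "the dose was reduced"]
general = ["the market fell today", "the team won the match"]
candidates = ["the patient dose was reduced", "the team won today"]
print(select(candidates, in_domain, general, k=1))
```

A lower score means the sentence is better predicted by the in-domain model than by the general one, so the medical sentence is kept for adapting a medical-domain system.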

Neural machine translation (NMT)
The encoder-decoder architecture has emerged as a key solution for many translation requirements [4,18]. Although it gives excellent results when a single source language is involved, it does not fit multilingual input well, which remains a challenge. Various approaches have been proposed to address this; the following architectures extend the basic encoder-decoder model, including for multilingual input.

Context dependent word representation for NMT:
Contextualization and symbolization are used to solve the problem of rare words. Contextualization masks some dimensions of the target word embedding vector, based on a context vector computed as the average of non-linearly transformed source word embeddings. Symbolization introduces position-dependent special tokens to deal with digits, proper nouns, and acronyms (Figure 4).
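Both operations can be sketched in a toy form. This is a minimal sketch under stated assumptions: the context vector is the mean of tanh-transformed source embeddings, the mask is a sigmoid gate applied element-wise to the target embedding, and symbols are indexed by order of appearance (`<S0>`, `<S1>`, ...). The special-token detector `is_special` is a crude hypothetical heuristic, not the rule used in the cited work.

```python
import re
import numpy as np

def context_vector(src_emb, W, b):
    """Average of non-linearly (tanh) transformed source word embeddings."""
    return np.tanh(src_emb @ W.T + b).mean(axis=0)

def contextualize(tgt_emb, ctx, U):
    """Mask dimensions of a target word embedding with a gate computed from
    the source context. The sigmoid gate is an assumption of this sketch."""
    mask = 1.0 / (1.0 + np.exp(-(U @ ctx)))   # each entry lies in (0, 1)
    return tgt_emb * mask

def is_special(tok, pos):
    """Hypothetical heuristic for digits, acronyms and proper nouns."""
    if re.fullmatch(r"\d+(?:[.,]\d+)*", tok):
        return True                        # numbers
    if len(tok) > 1 and tok.isupper():
        return True                        # acronyms such as "NATO"
    return pos > 0 and tok[:1].isupper()   # capitalized, not sentence-initial

def symbolize(tokens):
    """Replace special tokens with indexed symbols and remember the originals."""
    out, table = [], {}
    for pos, tok in enumerate(tokens):
        if is_special(tok, pos):
            sym = f"<S{len(table)}>"
            table[sym] = tok
            out.append(sym)
        else:
            out.append(tok)
    return out, table

def desymbolize(tokens, table):
    """Restore the original strings after translation."""
    return [table.get(tok, tok) for tok in tokens]

rng = np.random.default_rng(2)
src_emb = rng.normal(size=(6, 4))           # toy source embeddings
W, b = rng.normal(size=(5, 4)), np.zeros(5)
U = rng.normal(size=(3, 5))
tgt = rng.normal(size=3)                    # toy target word embedding
out = contextualize(tgt, context_vector(src_emb, W, b), U)

sentence = "Talks between NATO and Putin began at 10 am".split()
tokens_out, table = symbolize(sentence)
print(tokens_out)
```

Because the rare strings never pass through the vocabulary, the model only has to learn where each symbol goes; `desymbolize` copies the originals back into the output.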
Multi-way multilingual NMT: For massively multilingual input, multi-way multilingual neural machine translation uses a single attention mechanism shared across all language pairs. To handle fully multilingual input, multiple encoders and decoders are used, which increases the complexity of the system: one encoder per source language and one decoder per target language. Each source-target pair is then handled by its corresponding encoder and decoder, so each instance of the multilingual setting is treated like a single language pair. The scores of the individual neural machine translation instances are finally combined.
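The parameter-sharing layout can be sketched structurally. This is a toy sketch, assuming each encoder and decoder is a single tanh layer and the shared attention is a bilinear score; the class `MultiWayNMT` and its methods are hypothetical and only show which components are per-language and which are shared.

```python
import numpy as np

class MultiWayNMT:
    """One encoder per source language, one decoder per target language,
    and a single attention module shared by every language pair."""

    def __init__(self, src_langs, tgt_langs, dim, rng):
        self.encoders = {l: rng.normal(scale=0.3, size=(dim, dim)) for l in src_langs}
        self.decoders = {l: rng.normal(scale=0.3, size=(dim, dim)) for l in tgt_langs}
        self.shared_attention = rng.normal(scale=0.3, size=(dim, dim))

    def encode(self, src_lang, X):
        """Language-specific encoder: one toy tanh layer per source language."""
        return np.tanh(X @ self.encoders[src_lang].T)

    def attend(self, state, H):
        """Shared attention: the same weights are used for every pair."""
        scores = H @ (self.shared_attention @ state)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ H

    def decode_step(self, tgt_lang, state, H):
        """Language-specific decoder step fed with the shared-attention context."""
        ctx = self.attend(state, H)
        return np.tanh(self.decoders[tgt_lang] @ (state + ctx))

rng = np.random.default_rng(3)
langs = ["en", "fr", "de"]
model = MultiWayNMT(langs, langs, dim=4, rng=rng)
X = rng.normal(size=(5, 4))                 # toy French source embeddings
H = model.encode("fr", X)
state = model.decode_step("en", np.zeros(4), H)
```

With N languages this layout needs N encoders, N decoders and one attention module, rather than separate attention weights for each of the N(N-1) directed pairs.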

Interactive neural machine translation (INMT)
Machine translation has shown remarkable results in the last few years. Even so, the technology still needs improvement for some applications and domains, especially when the quality of a fully automated system is not sufficient. Interactive neural machine translation (INMT) is an effective solution in such cases. It establishes a collaboration between a machine translation system and a human translator: the translator edits the machine-translated output while the system suggests completions for the translation. The translator either accepts a suggestion or types a new translation. The first step is to adapt the neural machine translation system to a prefix-based interactive protocol.
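The prefix-based protocol can be illustrated with greedy completion under a fixed prefix. This is a minimal sketch, assuming a toy bigram table of hypothetical scores in place of an NMT decoder; a real system would condition on the source sentence and the whole prefix, not just the previous token.

```python
def complete(prefix, next_token_scores, max_len=10, eos="</s>"):
    """Keep the translator's validated prefix fixed and let the model
    greedily generate the rest of the hypothesis."""
    hyp = list(prefix)
    while len(hyp) < max_len:
        prev = hyp[-1] if hyp else "<s>"
        candidates = next_token_scores.get(prev)
        if not candidates:
            break
        token = max(candidates, key=candidates.get)
        if token == eos:
            break
        hyp.append(token)
    return hyp

# Toy bigram "model" with hypothetical scores, standing in for an NMT decoder.
model = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"house": 0.7, "home": 0.3},
    "a": {"home": 0.8, "house": 0.2},
    "house": {"is": 0.9, "</s>": 0.1},
    "home": {"is": 0.9, "</s>": 0.1},
    "is": {"small": 0.6, "little": 0.4},
    "small": {"</s>": 1.0},
    "little": {"</s>": 1.0},
}
print(complete([], model))     # initial suggestion
print(complete(["a"], model))  # translator typed "a"; the system re-completes
```

Each time the translator corrects a word, the corrected prefix is passed back in and only the remainder of the sentence is regenerated.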
In this protocol, the user reads the translation hypothesis from left to right and corrects it; the corrected part forms a translation prefix, which the system completes with a new hypothesis. Tag vectors are maintained over the source input sequence to keep track of the attention history, so that the attention-based system gives more consideration to words that have not yet been translated. The system keeps track of the interaction history by representing the source input with both read and write operations. Figure 5 demonstrates the architecture of attention-based interactive neural machine translation. This mechanism helps the decoder repeatedly keep track of which parts have been translated and which have not. The main advantage of this interactive translation system is that it significantly reduces human effort.
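Tracking which source words have already been attended to can be sketched with a simple additive coverage vector. This is a simplified stand-in for the tag vectors described above, assuming that a word counts as translated once its accumulated attention reaches a hypothetical threshold of 0.5; the attention weights below are toy values.

```python
import numpy as np

def update_coverage(coverage, alpha):
    """Accumulate the attention weight given to each source word."""
    return coverage + alpha

def untranslated(coverage, src_tokens, threshold=0.5):
    """Source words whose accumulated attention is still below the threshold."""
    return [tok for tok, c in zip(src_tokens, coverage) if c < threshold]

src = ["das", "Haus", "ist", "klein"]
coverage = np.zeros(len(src))
# Toy attention weights for the first two target words ("the", "house").
for alpha in (np.array([0.8, 0.1, 0.05, 0.05]),
              np.array([0.1, 0.8, 0.05, 0.05])):
    coverage = update_coverage(coverage, alpha)
print(untranslated(coverage, src))  # ['ist', 'klein']
```

The decoder can use such a record to steer later attention toward the words that are still uncovered.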

Research Viewpoints
Deep learning is a recent technology and a major advance in machine learning that has been applied very successfully to many areas, such as speech recognition, image processing, and natural language processing. Deep learning techniques can learn complex tasks directly from data, and the technology has impressed both industry and academia. In machine translation, deep learning has shown strong capability in rescoring, tuning, phrase extraction, reordering, and language modeling, and most importantly in end-to-end translation itself. It can also be used for quality estimation and machine translation evaluation. Neural MT is a fairly new deep learning paradigm that has produced very promising results for various language pairs. It provides an end-to-end NMT architecture intended to deal with morphologically rich languages and very large vocabularies. This paradigm has attracted wide attention from the scientific community and has opened up a large range of research directions.
At present, almost all neural machine translation systems are based on the encoder-decoder architecture. Handling syntactic differences and long-range reordering requires different encoders or more powerful attention-based models. Current systems still have limitations in handling large output vocabularies, non-canonical text such as social media content, and morphologically rich languages. Solutions to these limitations are still under research.