Understanding the Technology

It helps to understand the history of translation technology and what the future holds for translation software.

Translation Engine Comparisons

Over the years, machine translation (also known as translation software) has been based on three different technologies: rule-based, statistical, and hybrid.

The software required massive amounts of data and extensive programming with thousands of instructions and variables to handle the complexities of each language. The systems were only as good as the amount of data input.

These technologies were effective in reducing translation costs but were expensive to develop and often required additional editing to improve the fluidity of the translation.

When Ai technology emerged, it was soon applied to machine translation with great success.

Translation Technology

Machine Translation (MT) Technology over the Years

Rule-Based MT

RBMT uses built-in linguistic rules for each language and looks at the context of words and phrases to determine translation results.

Statistical MT

SMT utilizes large amounts of existing human translations in both the source and target languages, then looks for matching patterns.

Hybrid MT

A combination of rule-based and statistical technologies, hybrid MT improved accuracy and produced slightly more fluid translations.

Neural MT

NMT uses artificial intelligence, which produces translations much closer to human quality in both accuracy and fluidity.

Artificial Intelligence - Neural MT

SYSTRAN Pure Neural MT (PNMT)

SYSTRAN has been the pioneer and recognized leader in language translation technology for over 50 years and continues to lead the way in the new era of artificial intelligence and natural language processing.

SYSTRAN released the world’s first Pure Neural MT engine in late 2016. Its accuracy and fluidity far surpassed the existing technologies, and it quickly became the focus of all development.
 

The accuracy continues to improve with each release. SYSTRAN has over five decades of experience in developing the previous technologies. Couple that with the vast amounts of existing data, and its ability to provide best-in-class Ai-powered translation technology is unsurpassed. No other company has the resources and technology that SYSTRAN does.

Machine Learning vs Deep Learning

You will often hear the terms machine learning and deep learning used interchangeably, but there is a difference. Artificial intelligence technically uses both machine learning and deep learning.

In machine learning, algorithms parse data, learn from that data, then apply what they have learned to make informed decisions. If a machine learning model makes a mistake, however, it requires guidance from humans.

Deep learning is a subset of machine learning that functions in a similar way, but its capabilities are more advanced.

Deep learning uses a layered structure of algorithms called an artificial neural network (ANN), which is loosely modeled on the neural networks of the human brain.

An artificial neural network with more than two layers is called a Deep Neural Network (DNN). SYSTRAN’s neural networks used for natural language processing have 8 to 20 layers.

These neural networks learn to make informed decisions, and if they make a mistake, they learn to correct it without human engineers providing input. Basically, the network learns on its own, very much like a human would. This is what makes deep learning so powerful and better than machine learning alone.

Deep Learning and Machine Learning

True Ai is self-learning. The neural networks automatically learn to correct themselves during the training phase, which usually lasts a few weeks.
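
Self-correction of this kind is commonly implemented as gradient descent: the network measures its error on training examples and nudges its weights to reduce that error. The sketch below shows the loop in miniature, assuming a model with a single weight and toy data. The names `train`, `learning_rate`, and `DATA` are illustrative only, and this is not SYSTRAN's training code.

```python
def train(examples, learning_rate=0.05, epochs=200):
    """Fit y = w * x by repeatedly correcting w from its own errors."""
    w = 0.0  # start with an uninformed guess
    for _ in range(epochs):
        for x, y in examples:
            prediction = w * x
            error = prediction - y          # how wrong the model was
            w -= learning_rate * error * x  # adjust w to shrink the error
    return w

# Toy data generated from the rule y = 3 * x (an assumption for the demo).
DATA = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned_w = train(DATA)
```

No human tells the loop what the right weight is. It only sees examples and its own error, and `learned_w` ends up very close to 3. A real translation network applies the same principle to millions of weights.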

How SYSTRAN's Ai Understands the Context

Training the SYSTRAN Pure Neural Machine Translation (PNMT) Engine Using 3 Components

Word Embeddings

Powerful Language Map

During its training, the PNMT engine creates multi-dimensional spaces where all the vocabulary is stored in the form of numeric values. Words are grouped by meaning, grammar (verbs, nouns, pronouns, etc.), or any other commonality they have (negative/positive, masculine/feminine, etc.).

Each of these word specificities is a dimension. When performing a translation, the first task of the engine is to encode all the words in a sentence to numeric values and search for them in this word embedding space.
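
As a rough illustration of such a word embedding space, the sketch below stores three words as hand-picked three-dimensional vectors and compares them with cosine similarity. The vectors, the meaning assigned to each dimension, and the helper names are all assumptions made for this example. A real engine learns hundreds of dimensions during training.

```python
import math

# Hypothetical 3-dimensional embeddings chosen by hand for illustration.
# Assumed dimensions: royalty, femininity, "is a fruit".
EMBEDDINGS = {
    "king":  [0.9, 0.1, 0.0],
    "queen": [0.9, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def encode(sentence):
    """Map each known word of a sentence to its numeric vector."""
    return [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]

def nearest(word):
    """Return the other vocabulary word closest to `word`."""
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]))
```

With these toy vectors, `nearest("king")` returns `"queen"`, because the two words share the royalty dimension while `"apple"` sits far away in the space.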

Recurrent Neural Networks

Contextual Knowledge

Recurrent Neural Networks (RNN) provide contextual knowledge and bring global consistency and fluency. Much as the human brain does, recurrence allows the PNMT engine to remember the other words of the sentence when deciding how to formulate the translation and how to resolve ambiguities.

Basically, it looks at the entire sentence to determine the context, and as the technology progresses it is learning to look at the entire document.
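
The idea of recurrence can be sketched in a few lines: a hidden state is carried from word to word, so the final state depends on everything read so far. The example below is a deliberately simplified assumption. It uses two fixed scalar weights (`w_hidden`, `w_input`) where a real recurrent network uses learned weight matrices, and the word vectors are made up.

```python
import math

def rnn_step(hidden, word_vector, w_hidden=0.5, w_input=1.0):
    """One recurrent step: mix the previous state with the current word."""
    return [math.tanh(w_hidden * h + w_input * x)
            for h, x in zip(hidden, word_vector)]

def encode_sentence(word_vectors):
    """Run the recurrence over a sentence and return the final state."""
    hidden = [0.0] * len(word_vectors[0])
    for vector in word_vectors:
        hidden = rnn_step(hidden, vector)
    return hidden

# Two toy "sentences" that end with the same word but start differently.
sentence_a = [[1.0, 0.0], [0.0, 1.0]]
sentence_b = [[-1.0, 0.0], [0.0, 1.0]]
state_a = encode_sentence(sentence_a)
state_b = encode_sentence(sentence_b)
```

Although both sentences end with the same word, `state_a` and `state_b` differ, because the state still carries a trace of the first word. That memory is what lets a recurrent model use earlier words to resolve ambiguities in later ones.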

Attention Model

Attention Capacity

When looking at an image or reading a sentence, a human first focuses on the key elements.

In the same way, the translation model focuses its attention on words which it considers to be the most important.

With attention models, the software can accurately translate very long, complex sentences.
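
A common way to implement this focus is dot-product attention: each source word is scored against the word currently being produced, and a softmax turns the scores into weights that sum to 1. The sketch below assumes toy two-dimensional vectors and hypothetical helper names. It shows the general technique, not the specific attention model used in the PNMT engine.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, source_vectors):
    """Weight each source word by its dot product with the query."""
    scores = [sum(q * s for q, s in zip(query, vec)) for vec in source_vectors]
    weights = softmax(scores)
    # The context vector is the weighted average of the source vectors.
    context = [sum(w * vec[i] for w, vec in zip(weights, source_vectors))
               for i in range(len(query))]
    return weights, context

# Toy vectors for three source words; the query resembles the second one.
SOURCE = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
weights, context = attend([0.0, 1.0], SOURCE)
```

Here the second source word receives the largest weight, so it contributes most to the context vector. Because the weights are recomputed for every output word, the model can reach back to the relevant part of a long sentence at each step.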

SYSTRAN Neural Network Learned Syntax

The engineers at SYSTRAN have announced another breakthrough in the ongoing improvement of the technology.

Building additional custom dictionaries has always been a powerful feature of SYSTRAN’s previous rule-based, statistical, and hybrid systems. With NMT, this capability was limited by the neural networks until recently.

SYSTRAN engineers have now trained the neural network to understand syntax, so when it encounters entries from the custom dictionaries, it knows what to do.

The SYSTRAN PNMT engine has learned morphology, part-of-speech tagging, and syntactic analysis. This innovation is delivering smooth, grammatically correct translations with less effort by the user than was previously required to build dictionaries.

Other systems do not have this capability; they are limited to a simple find-and-replace type of dictionary.
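
The limitation of a find-and-replace dictionary can be seen in a small sketch. Below, a hypothetical user dictionary holds one base-form entry, and a hand-written `MORPHOLOGY` table stands in for the inflection knowledge a trained network would have learned. Both tables and both function names are assumptions for illustration, and the output only demonstrates term handling, not a full translation.

```python
import re

# Hypothetical user dictionary entry (English -> French), base form only.
USER_DICTIONARY = {"invoice": "facture"}

# Hand-written stand-in for learned morphology:
# inflected form -> (base form, is_plural).
MORPHOLOGY = {"invoice": ("invoice", False), "invoices": ("invoice", True)}

def naive_replace(text):
    """Find-and-replace: only exact base forms are matched."""
    for source, target in USER_DICTIONARY.items():
        text = re.sub(rf"\b{re.escape(source)}\b", target, text)
    return text

def morphology_aware_replace(text):
    """Look up the base form, then re-inflect the target term."""
    def swap(match):
        word = match.group(0)
        base, plural = MORPHOLOGY.get(word.lower(), (word, False))
        if base not in USER_DICTIONARY:
            return word  # not a dictionary term, leave untouched
        target = USER_DICTIONARY[base]
        return target + "s" if plural else target
    return re.sub(r"[A-Za-z]+", swap, text)
```

Given the text `"two invoices"`, `naive_replace` leaves it unchanged because the plural form is not in the dictionary, while `morphology_aware_replace` recognizes the plural and produces `"two factures"`.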

 

Specialized Models for Higher Accuracy

A Neural Network that’s trained to understand your industry and knows the terminology

SYSTRAN’s PNMT engines are trained on generic data so that they meet the needs of a wide audience.

The neural network can also be trained to understand industry-specific information and terminology, much as a human translator may specialize in legal, banking, or medical translations.

SYSTRAN optimizes neural network performance in a post-training process called specialization. This method improves translation quality significantly and harnesses the true power of Ai technology.

We offer pre-built specialized models that can be applied to your software. Choose from a wide variety of industries, and ask one of our sales reps for more information.

Ask about our marketplace that provides access to existing models pre-trained on industry terminology.

SYSTRAN modelStudio

Data Preparation

Start from your Translation Memory and upload your bilingual or monolingual in-domain corpus (a Spanish-English legal corpus, for example) to make it ready for model training. Your data remains secure during the entire training process and is not used for any purpose other than training your own model.

Model Training

Building a translation model from scratch is an arduous task. Instead, select the model you want to bring to the next level from SYSTRAN’s large translation model catalog and start feeding it your domain-specific data. By specializing an already trained model, you will benefit from embedded UD Sampling, Augmentation, Filtering, Noising and Tokenization.

Model Evaluation

Evaluate your model’s evolution at each training iteration with the SYSTRAN model scoring module. With SYSTRAN modelStudio, compare the BLEU score evolution of your models on more than 50 test files selected by SYSTRAN and categorized by domain. You can also add your own test set to check the model’s progress on your very specific domain.
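
For readers unfamiliar with BLEU, the score measures how many word sequences (n-grams) in a machine translation also appear in a human reference, with a penalty for output that is too short. The sketch below is a simplified version for a single sentence using only unigrams and bigrams. Standard BLEU is usually computed over a whole test set with n-grams up to length four. The function names are hypothetical, and this is not the modelStudio scoring module.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: clipped n-gram precision times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1.0 and a translation that shares no words with the reference scores 0.0. Comparing `"the contract was signed"` against the reference `"the contract is signed"` gives about 0.5, since three of four words but only one of three bigrams match.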

Model Publication on SYSTRAN Marketplace

Once your in-domain model is ready, publish it to the SYSTRAN translation model catalog. Our worldwide community of professional end-users will be able to test your model and purchase it.
You can also choose to publish your model privately if you aim to build it for a client. Intellectual property of the model remains yours.

SYSTRAN OpenNMT Open Source Translation Technology

SYSTRAN has been a pure player in machine translation technology for 50 years. With the emergence of Ai and neural networks, the ability to share knowledge with others is essential to maximize the technology’s potential.

Using this knowledge, SYSTRAN worked with Harvard NLP to create OpenNMT, an open source translation technology. A large community of business users and academics is contributing to the development of the technology.

SYSTRAN continues to support the R&D and is currently building an advanced end-to-end solution adapted to the professional market.

Ready to learn more about SYSTRAN Products?

Learn how you can start saving money on translation costs and effectively translate the data you need, quickly and accurately.