I develop EdTech solutions, mainly mobile apps. One of them is “Competitive Exam Preparation”, available on both Android and iOS. It is an entrance exam preparation app for the AIIMS MBBS, CAT, CLAT-UG, CA CPT, GK, GEP, GATE, JEE-Main, MHT-CET, NDA & NA and NEET exams, built around important questions from previous years’ papers. It is a free, easy-to-use educational app with a collection of 13000+ MCQs with answers.
Many users rely on the app for government exam preparation. One of them asked for the app to support regional languages, primarily Hindi.
So I started researching tools that could help me translate the English MCQs into Hindi.
The first thing I looked at was Google Translate https://translate.google.co.in/
It was fast and accurate.
Next I looked at Amazon Translate https://aws.amazon.com/translate/
I looked at the batch translation job page and found it very easy to use.
All I needed to do was upload my data to S3 in one of the supported file formats and run an asynchronous job, selecting the source and target languages.
Because AWS batch translation was so easy, I decided to use it for the English->Hindi translation.
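I ran my jobs from the console, but the same batch job can also be started from code. Here is a minimal sketch using boto3's `start_text_translation_job`. The job name, bucket paths and IAM role ARN are placeholders I made up for illustration, not values from my account, and the role must already have access to both S3 locations.

```python
# MIME type Amazon Translate expects for .xlsx input in batch jobs.
XLSX_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
)


def build_job_request(job_name, input_s3_uri, output_s3_uri, role_arn, target_language):
    """Build the keyword arguments for one English -> target_language batch job."""
    return {
        "JobName": job_name,
        "InputDataConfig": {"S3Uri": input_s3_uri, "ContentType": XLSX_CONTENT_TYPE},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": "en",
        "TargetLanguageCodes": [target_language],
    }


def start_job(request):
    """Submit the job. Needs boto3 and AWS credentials, and it costs money."""
    import boto3  # imported here so the request can be built and inspected offline

    client = boto3.client("translate")
    response = client.start_text_translation_job(**request)
    return response["JobId"]


# Placeholder values, for illustration only.
request = build_job_request(
    job_name="mcq-en-hi",
    input_s3_uri="s3://example-bucket/input/",
    output_s3_uri="s3://example-bucket/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleTranslateRole",
    target_language="hi",
)
# job_id = start_job(request)  # uncomment only after checking the expected cost
```

Keeping the request builder separate from the call that submits it makes it easy to review exactly what will be sent before any billable work starts.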
My source data was 13000+ rows in JSON format, which I converted to .xlsx (I chose this format so that the translated output could be imported into a .sqlite database and used in the apps).
I put all 13000 questions with answers into a single .xlsx file (the total input file size was less than 1.5 MB).
I ran the asynchronous job on AWS Translate for English->Hindi and got the output in my S3 bucket (the translated file was around 1.5 MB).
The job finished very quickly. I thought, why not translate into the other regional languages of India too? So I searched for “Marathi” in the target language list.
I created another job with the same source .xlsx file and this time selected “Marathi” as the target language. Same result: the job completed successfully and I got my 13000 questions with answers in Marathi in .xlsx format.
I got excited, because the translation was so fast and easy. I looked at the other target language options and found that a total of 74 languages were available.
I didn’t really need all those translations for my app, but I thought there was no harm in having them ready in case we decided to release the app in other languages.
So over the next few hours I reran the job on the same S3 input file, changing only the target language each time. A few hours later I had 74 translated .xlsx files in my S3 bucket.
Why was the bill so high?
When I ran all these jobs, I made the mistake of looking only at the number of rows and the size of the Excel file (13000+ rows and less than 1.5 MB). What I forgot to look at was how Amazon Translate charges for translation, which was clearly stated on the service's home page: $15 per million input characters, on a pay-as-you-go basis.
The bill showed the following usage:
331,621,756 characters were processed within a few hours.
What I forgot to calculate was the number of characters per question multiplied by the number of translations. For simplicity, let's assume there were 345 characters per question.
Then 13000 questions * 74 languages * 345 characters per question comes to about 332 million characters, close to the billed figure!
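The arithmetic above is easy to put in a few lines of Python. This is only a rough estimate: the 345 characters per question is my assumed average, the $15 rate is the one quoted above, and it ignores any free tier or discounts. The function name is just something I picked for this sketch.

```python
PRICE_PER_MILLION_CHARS_USD = 15.0  # rate quoted above; check current pricing


def estimate_translation_cost(questions, chars_per_question, languages,
                              price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Return (total characters, estimated cost in USD) for a batch run."""
    total_chars = questions * chars_per_question * languages
    cost = total_chars / 1_000_000 * price_per_million
    return total_chars, cost


# One language versus all 74, using the assumed 345 characters per question.
one_chars, one_cost = estimate_translation_cost(13_000, 345, 1)
all_chars, all_cost = estimate_translation_cost(13_000, 345, 74)

print(f"1 language:   {one_chars:,} characters, about ${one_cost:,.2f}")
print(f"74 languages: {all_chars:,} characters, about ${all_cost:,.2f}")
```

With these assumptions a single language is about 4.5 million characters, while all 74 languages come to 331,890,000 characters, which at $15 per million is close to $5,000.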
It was amazing to see that AWS could process all of this in a matter of a few hours.
I had never used AWS Translate before, either in batch or in real-time mode.
If you are using a new cloud service, complete a POC with a small subset first. Look at the results, especially the billing, and only then run the full data set.
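One way to build that habit into the workflow is a small pre-flight check that counts the real characters in the data and refuses to continue if the estimate goes over a budget. This is a sketch with made-up sample rows and hypothetical function names. It uses Python's `len()` as an approximation of how the service counts characters, and the $15 per million rate mentioned earlier.

```python
def count_characters(rows):
    """Total number of characters across every text cell in the rows."""
    return sum(len(cell) for row in rows for cell in row)


def check_budget(rows, target_languages, budget_usd, price_per_million=15.0):
    """Estimate the cost of translating rows into each target language.

    Raises ValueError if the estimate is above budget_usd.
    """
    total_chars = count_characters(rows) * len(target_languages)
    cost = total_chars / 1_000_000 * price_per_million
    if cost > budget_usd:
        raise ValueError(
            f"Estimated ${cost:,.2f} for {total_chars:,} characters "
            f"exceeds the budget of ${budget_usd:,.2f}"
        )
    return cost


# Made-up sample rows (question, answer) standing in for a small POC subset.
sample_rows = [
    ["What is 2+2?", "4"],
    ["Capital of India?", "New Delhi"],
]

poc_cost = check_budget(sample_rows, ["hi", "mr"], budget_usd=1.0)
print(f"POC estimate: ${poc_cost:.6f}")
```

Running the same check on the full data set with all target languages, before creating any jobs, would have shown me the size of the bill in advance.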