'Translation of HTML content in Google Sheets

I am using the following Google Sheets formula to translate some fields containing HTML tags:

=GOOGLETRANSLATE(A2, "en", "de")

However, the translation results in a messed up HTML and extra spaces between tags opening or closing, especially if there are many nested tags.

For example:

<div> <p>paragraph text</p> </div>

will result in:

<div> <p> Absatztext </ P> </ Div>

Sometimes, the translator changes the tags opening and closing and put extra spaces between some attributes also the closing tags letters are in uppercase.

Issues like:

<p> Absatztext <P />

<a href = " # "> Link </ A>

Sometimes, text are added before the tag closing

<h2 Was> ist Pilates? </h2> 
it should be:
<h2> Was ist Pilates? </h2> 

enter image description here

Demo here: https://docs.google.com/spreadsheets/d/11MOZjTknFGdwuAp6g3VUa0o5OQaW44hxN2uEvqnL3jw/edit?usp=sharing

How can I fix those problems?



Solution 1:[1]

try simple fix like:

=LOWER(SUBSTITUTE(GOOGLETRANSLATE(A1, "en", "de"), "/ ", "/"))

0


UPDATE:

=SUBSTITUTE(A1,  TRIM(REGEXREPLACE(A1, "</?\S+[^<>]*>", )), 
 GOOGLETRANSLATE(TRIM(REGEXREPLACE(A1, "</?\S+[^<>]*>", )), "EN", "DE"))

enter image description here

Solution 2:[2]

If you don't mind doing it in a single formula and just want to solve the problem, you could try splitting it and only translating what's not an HTML tag.

Put all HTML tags and closing tags in a separate sheet, so you can check for them. I'll put mine in 'tags'!A1:B128.

Considering you have your original text in A1, you can split it up by < and >:

=SPLIT(A1,"<>")

then on a line below (or elsewhere, for me it'll be A2) you can check if the first word in each cell is found among the tags with:

NOT(COUNTIF(tags!$A$1:$B$128,INDEX(SPLIT(A2," "),1,1)))

translate it if it's true with GOOGLETRANSLATE(A2, "en", "de") or add the brackets back with "<"&A2&">" so the whole formula will look like

=IF(NOT(COUNTIF(tags!$A$1:$B$128,INDEX(SPLIT(A2," "),1,1))),GOOGLETRANSLATE(A2, "en", "de"), "<"&A2&">")

then on a line below, just join the whole row back to a single cell with

=JOIN("",A3:L3)

You can hide rows 2 and 3 for convenience, or even put them on a separate sheet along with the tags. You can also add a condition not to add < and > if it's empty, so you can join up the whole row without looking at how long it is.

If you'd like to do this in a single formula, you'd have to write a script for it, as some formulas act strangely with arrayformula and sometimes are barely usable.

Solution 3:[3]

I think the most most convenient solution nowerdays is to use a custom JS-Function (Extensions >> AppScripts):

var spanish = LanguageApp.translate('This is a <strong>test</strong>',
                                  'en', 'es', {contentType: 'html'});

// The code will generate "Esta es una <strong>prueba</strong>".

LanguageApp.translate (apidoc) accepts as fourth option a contentType, which can be text or html. For huge tables be aware that there are daily limits (quotas)!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 zoltankundi
Solution 3 fraank