Regex To Compare Strings With Umlaut And Non-umlaut Variations

Can anyone help me with a JavaScript regular expression that I can use to compare strings that are the same, taking into account their non-umlauted versions? For example, in Germ

Solution 1:

Something like:

tr = {"ä":"ae", "ü":"ue", "ö":"oe", "ß":"ss" }

replaceUmlauts = function(s) {
    return s.replace(/[äöüß]/g, function($0) { return tr[$0] })
}

compare = function(a, b) {
    return replaceUmlauts(a) == replaceUmlauts(b)
}

alert(compare("grüße", "gruesse"))

You can easily extend this by adding more entries to "tr" (and the matching characters to the regular expression).

Not quite elegant, but it works.

Solution 2:

In addition to stereofrog's answer:

tr = {"\u00e4":"ae", "\u00fc":"ue", "\u00f6":"oe", "\u00df":"ss" }

ersetzeUmlauts = function(s) {
    // No "|" inside the character class: there it would match a literal pipe.
    return s.replace(/[\u00e4\u00fc\u00f6\u00df]/g, function($0) { return tr[$0] })
}

I was dealing with umlauts in an Aptana/Eclipse script, and the literal characters ('ä' etc.) didn't do the trick for me, so I used Unicode escapes instead.

Solution 3:

I have another way (purpose: sorting arrays):

function umlaut(str) {
 return str
  .replace(/Â|À|Å|Ã/g, "A")
  .replace(/â|à|å|ã/g, "a")
  .replace(/Ä/g, "AE")
  .replace(/ä/g, "ae")
  .replace(/Ç/g, "C")
  .replace(/ç/g, "c")
  .replace(/É|Ê|È|Ë/g, "E")
  .replace(/é|ê|è|ë/g, "e")
  .replace(/Ó|Ô|Ò|Õ|Ø/g, "O")
  .replace(/ó|ô|ò|õ/g, "o")
  .replace(/Ö/g, "OE")
  .replace(/ö/g, "oe")
  .replace(/Š/g, "S")
  .replace(/š/g, "s")
  .replace(/ß/g, "ss")
  .replace(/Ú|Û|Ù/g, "U")
  .replace(/ú|û|ù/g, "u")
  .replace(/Ü/g, "UE")
  .replace(/ü/g, "ue")
  .replace(/Ý|Ÿ/g, "Y")
  .replace(/ý|ÿ/g, "y")
  .replace(/Ž/g, "Z")
  .replace(/ž/g, "z");
}
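
A minimal sketch of how such a function might be used for sorting. The names sortByTransliteration and demoTransliterate are hypothetical; demoTransliterate is only a shortened stand-in for the replacement chain above so that the example runs on its own.

```javascript
// Hypothetical helper: sort a copy of the array by transliterated, lowercased keys.
function sortByTransliteration(words, transliterate) {
  return words.slice().sort(function (a, b) {
    var ka = transliterate(a).toLowerCase();
    var kb = transliterate(b).toLowerCase();
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Shortened stand-in for the full umlaut() chain (German characters only).
function demoTransliterate(s) {
  return s.replace(/Ä/g, "AE").replace(/ä/g, "ae")
          .replace(/Ö/g, "OE").replace(/ö/g, "oe")
          .replace(/Ü/g, "UE").replace(/ü/g, "ue")
          .replace(/ß/g, "ss");
}

var sorted = sortByTransliteration(
  ["Zebra", "Äpfel", "Ofen", "Öl", "Apfel"],
  demoTransliterate
);
console.log(sorted); // [ 'Äpfel', 'Apfel', 'Öl', 'Ofen', 'Zebra' ]
```

A plain sort() without the key function would compare code units and place "Äpfel" and "Öl" after "Zebra".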

Solution 4:

Regular expressions aren't quite powerful enough to do this properly, though you could hack it into almost working with them.

What you want is called Unicode normalization. A normalized string is one converted to a common form so that it can be compared with others. You tagged your post "javascript"; however, older JavaScript engines have no built-in way to do this (ES2015 and later add String.prototype.normalize). Most server-side languages have one as well: for example, the Normalizer class in PHP. Python and Perl have equivalents, as do Microsoft's platforms, I'm sure.

Check out the Wikipedia article on Unicode equivalence for more information.
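
As a rough sketch, assuming an engine with ES2015's String.prototype.normalize: normalization makes precomposed and decomposed spellings of the same character compare equal, and stripping the combining marks afterwards removes the diacritics. Note that this yields "u" for "ü", not "ue", and that "ß" has no decomposition, so it is handled by hand here. The name stripDiacritics is hypothetical.

```javascript
// Hypothetical helper: decompose to NFD, then drop combining marks
// (U+0300 to U+036F). "ß" does not decompose, so it is replaced explicitly.
function stripDiacritics(s) {
  return s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/\u00df/g, "ss");
}

var precomposed = "gr\u00fc\u00dfe";  // "grüße" using U+00FC
var decomposed = "gru\u0308\u00dfe";  // "grüße" as "u" + combining diaeresis

console.log(precomposed === decomposed);                                   // false
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC")); // true
console.log(stripDiacritics(precomposed));                                 // "grusse"
```

Because the result is "grusse" rather than "gruesse", normalization alone does not cover the German "ae"/"oe"/"ue" convention; a replacement table like the ones in the other answers is still needed for that.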

Solution 5:

You can use a pipe as an "or" inside a group for each character that has two spellings, like this: (ä|ae).
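
One way this idea might be sketched: build a pattern from one of the strings in which every umlaut, or its two-letter spelling, becomes an alternation group, then test the other string against it. The names groups and umlautPattern are hypothetical. One caveat under this assumption: every "ae", "oe", "ue" or "ss" is treated as a possible umlaut, so the pattern can over-match.

```javascript
// Each spelling maps to a group that accepts both variants.
var groups = {
  "ä": "(?:ä|ae)", "ae": "(?:ä|ae)",
  "ö": "(?:ö|oe)", "oe": "(?:ö|oe)",
  "ü": "(?:ü|ue)", "ue": "(?:ü|ue)",
  "ß": "(?:ß|ss)", "ss": "(?:ß|ss)"
};

// Hypothetical helper: turn a word into an anchored pattern accepting either spelling.
function umlautPattern(word) {
  // Escape regex metacharacters first so the input is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Swap each umlaut or two-letter spelling for its alternation group.
  var body = escaped.replace(/ä|ae|ö|oe|ü|ue|ß|ss/g, function (m) {
    return groups[m];
  });
  return new RegExp("^" + body + "$");
}

console.log(umlautPattern("grüße").test("gruesse")); // true
console.log(umlautPattern("gruesse").test("grüße")); // true
console.log(umlautPattern("grüße").test("grusse"));  // false
```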
