Check If a Byte Sequence Is a Valid UTF-8 Sequence in JavaScript
Is there a simple way to check if a string is a valid UTF-8 sequence in JavaScript? I really do not want to end up with a regular expression like this: Regex to detect invalid UTF-8 strin
Solution 1:
UTF-8 is in fact a simple encoding, but still what you are asking can't be done with a one-liner. You have to:
- Override the Content-Type of the response so that your script receives a byte array and the browser/library does not interpret the response itself
- Loop over the bytes to build characters. Note that UTF-8 is a variable-length encoding, and that's why some sequences are invalid.
- If an invalid octet is found, skip it
- If needed, deserialize the JSON/XML/whatever string to a JavaScript object, possibly handling failures
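As a rough illustration of the last steps, here is a minimal sketch that uses a different technique from the hand-written loop described above: the built-in TextDecoder with its fatal option, which throws on malformed input instead of skipping the offending octets. The helper name parseJsonFromBytes and the shape of its return value are hypothetical, and the sketch assumes the payload is JSON and that the bytes are already available as a Uint8Array.

```javascript
// Hypothetical helper (not from the original answer): strictly decode a
// Uint8Array as UTF-8, then parse the resulting text as JSON.
function parseJsonFromBytes(bytes) {
  let text;
  try {
    // With { fatal: true } the decoder throws a TypeError on malformed
    // UTF-8 instead of substituting U+FFFD replacement characters.
    text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    return { ok: false, reason: 'invalid-utf8' };
  }
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // The bytes were valid UTF-8, but the text is not valid JSON.
    return { ok: false, reason: 'invalid-json' };
  }
}

// Usage: encode a string to bytes first, then run it through the helper.
const result = parseJsonFromBytes(new TextEncoder().encode('{"a": 1}'));
console.log(result.ok, result.value);
```

In a browser, one way to obtain the bytes without any text decoding is to read the response body as an ArrayBuffer (for example through fetch and response.arrayBuffer()) and wrap it in a Uint8Array; whether that fits depends on how the request is made.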
Deciding whether a given byte array is a valid UTF-8 sequence is quite a straightforward task (just a bunch of if statements and bit shifts), but again it's not a one-line thing.
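To make the "if statements and bit shifts" concrete, here is a minimal sketch of such a check. The function name isValidUtf8 is hypothetical, and the sketch assumes the input is a Uint8Array (or a plain array of integers in the range 0 to 255). It follows the well-formedness rules of RFC 3629: it rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates, and code points above U+10FFFF.

```javascript
// Hypothetical helper: returns true if `bytes` is a well-formed UTF-8 sequence.
// Assumes `bytes` is a Uint8Array or an array of integers in 0..255.
function isValidUtf8(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let needed; // number of continuation bytes that must follow
    let min;    // smallest code point allowed for this sequence length
    let cp;     // code point accumulated so far

    if (lead <= 0x7f) {                    // 0xxxxxxx: plain ASCII
      i += 1;
      continue;
    } else if ((lead & 0xe0) === 0xc0) {   // 110xxxxx: 2-byte sequence
      needed = 1; cp = lead & 0x1f; min = 0x80;
    } else if ((lead & 0xf0) === 0xe0) {   // 1110xxxx: 3-byte sequence
      needed = 2; cp = lead & 0x0f; min = 0x800;
    } else if ((lead & 0xf8) === 0xf0) {   // 11110xxx: 4-byte sequence
      needed = 3; cp = lead & 0x07; min = 0x10000;
    } else {
      return false; // stray continuation byte, or 0xF8..0xFF
    }

    if (i + needed >= bytes.length) return false; // sequence is truncated

    for (let k = 1; k <= needed; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) return false; // must look like 10xxxxxx
      cp = (cp << 6) | (b & 0x3f);
    }

    if (cp < min) return false;                       // overlong encoding
    if (cp > 0x10ffff) return false;                  // beyond Unicode range
    if (cp >= 0xd800 && cp <= 0xdfff) return false;   // UTF-16 surrogate

    i += needed + 1;
  }
  return true;
}

// Usage: the three bytes of the euro sign are valid; an overlong NUL is not.
console.log(isValidUtf8(new Uint8Array([0xe2, 0x82, 0xac])));
console.log(isValidUtf8(new Uint8Array([0xc0, 0x80])));
```

This sketch only reports whether the sequence is valid. Skipping invalid octets while building characters, as the list above suggests, would need the same checks but would advance past the bad byte instead of returning.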