Convert a byte array to a String when a byte-order mark is present

Java does a pretty good job of converting a byte array to a string with the String constructor new String(byteArray, charset). But if a byte-order mark (BOM) is present, the result may not be what you expect. A UTF-8 encoded file may start with a BOM (the bytes EF BB BF). The Unicode standard permits this but does not recommend it, and it is common because Windows applications such as Notepad have traditionally written one by default when saving UTF-8 text. Java's UTF-8 decoder does not strip the BOM, so the resulting String begins with the character U+FEFF (zero-width no-break space), which is invisible but can break comparisons and parsing.
Something similar happens with UTF-16. If the endianness is specified in the Charset name ('UTF-16BE' or 'UTF-16LE'), the decoder does not expect a BOM, so a leading BOM that matches the declared byte order (FE FF for big-endian, FF FE for little-endian) is decoded as U+FEFF and kept in the string. With the plain 'UTF-16' charset, the decoder uses the BOM to pick the byte order and consumes it, so nothing is left to strip.
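To make the behaviour described above concrete, here is a minimal sketch that decodes the same text with and without an explicit byte order and prints the resulting code points. The class name BomDemo and the helper describe are made up for this illustration; only the String constructor and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // "hi" preceded by a UTF-8 BOM (EF BB BF)
    public static final byte[] UTF8_WITH_BOM =
            {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
    // "hi" in big-endian UTF-16, preceded by a big-endian BOM (FE FF)
    public static final byte[] UTF16BE_WITH_BOM =
            {(byte) 0xFE, (byte) 0xFF, 0, 'h', 0, 'i'};

    // Hypothetical helper: renders each char of s as U+XXXX so that
    // invisible characters such as U+FEFF become visible.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-8 decoder keeps the BOM: prints U+FEFF U+0068 U+0069
        System.out.println(describe(new String(UTF8_WITH_BOM, StandardCharsets.UTF_8)));
        // Explicit byte order keeps the BOM: prints U+FEFF U+0068 U+0069
        System.out.println(describe(new String(UTF16BE_WITH_BOM, StandardCharsets.UTF_16BE)));
        // Plain UTF-16 consumes the BOM: prints U+0068 U+0069
        System.out.println(describe(new String(UTF16BE_WITH_BOM, StandardCharsets.UTF_16)));
    }
}
```

Note that the plain UTF-16 case comes back without the extra character, which is why blindly dropping the first character whenever the bytes start with a BOM is not safe.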

What follows is a short method that fixes this by stripping the leading U+FEFF from the decoded string.

	// Requires: import java.nio.charset.Charset;
	public static String convertToString(byte[] bytes, Charset charset) {
		String ret = new String(bytes, charset);
		// A BOM that the decoder did not consume shows up as a leading U+FEFF.
		// Checking the decoded string rather than bytes[0..2] avoids an
		// ArrayIndexOutOfBoundsException on short arrays, and avoids dropping
		// a real character when the decoder (e.g. "UTF-16") already ate the BOM.
		if (ret.startsWith("\uFEFF")) {
			ret = ret.substring(1);
		}
		return ret;
	}
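As an alternative to trimming the decoded string, the BOM bytes can be skipped before decoding. The following is only a sketch under the assumption that UTF-8, UTF-16BE and UTF-16LE are the charsets of interest; the names BomSkipper, bomLength, startsWith and decodeSkippingBom are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSkipper {
    // Returns how many leading bytes form a BOM for the given charset, or 0.
    public static int bomLength(byte[] bytes, Charset charset) {
        if (charset.equals(StandardCharsets.UTF_8)) {
            return startsWith(bytes, 0xEF, 0xBB, 0xBF) ? 3 : 0;
        }
        if (charset.equals(StandardCharsets.UTF_16BE)) {
            return startsWith(bytes, 0xFE, 0xFF) ? 2 : 0;
        }
        if (charset.equals(StandardCharsets.UTF_16LE)) {
            return startsWith(bytes, 0xFF, 0xFE) ? 2 : 0;
        }
        // Other charsets, including plain UTF-16, are left to the decoder.
        return 0;
    }

    // Length-checked prefix comparison. Masking with 0xFF turns the signed
    // Java byte into its unsigned value, so no "0xEF - 256" arithmetic is needed.
    public static boolean startsWith(byte[] bytes, int... prefix) {
        if (bytes.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((bytes[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Decodes bytes with the given charset, skipping a matching BOM if present.
    public static String decodeSkippingBom(byte[] bytes, Charset charset) {
        int skip = bomLength(bytes, charset);
        return new String(bytes, skip, bytes.length - skip, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decodeSkippingBom(utf8, StandardCharsets.UTF_8)); // prints hi
    }
}
```

The trade-off is that this version has to know which byte sequence belongs to which charset, whereas checking the decoded string for a leading U+FEFF works the same way for every Unicode encoding.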
