CharsetDecoder Class in Java
Last Updated : 27 May, 2024
The CharsetEncoder and CharsetDecoder classes in Java provide the methods needed for encoding and decoding text. The CharsetDecoder class converts bytes to characters: it accepts a sequence of bytes in a specific charset as input and produces a sequence of sixteen-bit Unicode characters as output. The input bytes must be valid in the charset that created the decoder; for example, a decoder obtained from the UTF-8 charset expects UTF-8 encoded bytes. In software development, decoders are used wherever raw bytes must be turned into text with control over buffering and error handling.
What is CharsetDecoder?
The CharsetDecoder class belongs to the "java.nio.charset" package and complements the CharsetEncoder class. The input byte sequence may be supplied in a single buffer or in a series of buffers. The decoded output is written to one or more character buffers, whose contents can be concatenated to form the resulting string.
This decoder is used by making the following sequence of method invocations:
- Reset the decoder with the reset method, unless it has not been used before.
- Invoke the decode method zero or more times with the endOfInput argument set to "false", as long as more input may become available, filling the input buffer and draining the output buffer between invocations. The false value tells the decoder that the input may be incomplete, so it decodes as many bytes as it can and leaves any trailing partial sequence in the input buffer.
- Invoke the decode method one final time with the endOfInput argument set to "true", and then invoke the flush method so that the decoder can write any internal state to the output buffer.
Together, these steps make up a single decoding operation. Each invocation of the decode method decodes as many bytes as possible from the input buffer and writes the resulting characters to the output buffer.
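The sequence above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the class name ChunkedDecodeDemo, the helpers decodeChunks and drain, the buffer sizes, and the choice to split the two-byte UTF-8 encoding of "é" across two chunks are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ChunkedDecodeDemo {

    // Hypothetical helper: decodes UTF-8 input that arrives in several chunks.
    // Assumes at least one chunk, and that leftover bytes plus the next chunk
    // fit into the 16-byte input buffer.
    static String decodeChunks(byte[][] chunks) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        decoder.reset(); // step 1 (not required for a brand-new decoder)

        ByteBuffer in = ByteBuffer.allocate(16);
        CharBuffer out = CharBuffer.allocate(4);
        StringBuilder text = new StringBuilder();

        for (int i = 0; i < chunks.length; i++) {
            boolean last = (i == chunks.length - 1);
            in.put(chunks[i]); // fill the input buffer
            in.flip();         // switch it to read mode

            CoderResult result;
            do {
                // steps 2 and 3: endOfInput is true only for the final chunk
                result = decoder.decode(in, out, last);
                drain(out, text);
            } while (result.isOverflow());

            if (result.isError()) {
                // The default action is REPORT, so bad input ends up here
                throw new IllegalArgumentException("Decoding error: " + result);
            }
            in.compact(); // keep any undecoded trailing bytes for the next call
        }

        // step 4: let the decoder write out any remaining internal state
        while (decoder.flush(out).isOverflow()) {
            drain(out, text);
        }
        drain(out, text);
        return text.toString();
    }

    // Moves the decoded characters from the buffer into the builder
    static void drain(CharBuffer out, StringBuilder text) {
        out.flip();
        text.append(out);
        out.clear();
    }

    public static void main(String[] args) {
        // "h", then the first byte of "é" (0xC3 0xA9), then the rest of "héllo"
        byte[][] chunks = {
            { 0x68, (byte) 0xC3 },
            { (byte) 0xA9, 0x6C, 0x6C, 0x6F }
        };
        System.out.println(decodeChunks(chunks));
    }
}
```

The first call sees only half of the two-byte sequence, so the decoder leaves that byte in the input buffer and reports underflow. The call to compact carries it over, and the second chunk completes the character.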
Syntax of CharsetDecoder
public abstract class CharsetDecoder extends Object
Constructor of CharsetDecoder
Constructor | Modifier | Description |
---|---|---|
CharsetDecoder(Charset cs, float averageCharsPerByte, float maxCharsPerByte) | protected | Initializes a new decoder. |
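Because the constructor is protected, it is only called from subclasses, which must also implement decodeLoop. The sketch below shows one way that might look. The class OneToOneDecoder is hypothetical: it maps every byte to the character with the same numeric value, and passing ISO_8859_1 as the owning charset is an assumption chosen because that mapping matches Latin-1.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder: each byte 0x00-0xFF becomes the char with that value
class OneToOneDecoder extends CharsetDecoder {

    OneToOneDecoder(Charset cs) {
        // Exactly one char per byte, both on average and at most
        super(cs, 1.0f, 1.0f);
    }

    @Override
    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW; // caller must drain the output
            }
            out.put((char) (in.get() & 0xFF));
        }
        return CoderResult.UNDERFLOW; // all available input was consumed
    }

    // Illustrative usage through the inherited convenience method
    static String demo() {
        byte[] bytes = { 0x4A, 0x61, 0x76, 0x61 }; // "Java"
        CharsetDecoder decoder = new OneToOneDecoder(StandardCharsets.ISO_8859_1);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In practice a decoder is normally obtained from Charset.newDecoder() rather than constructed directly; subclassing is only needed when implementing a new charset.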
Methods in CharsetDecoder
Modifier and type | Method | Description |
---|---|---|
final float | averageCharsPerByte() | Returns the average number of characters produced for each byte of input. |
final Charset | charset() | Returns the charset that created this decoder. |
final CharBuffer | decode(ByteBuffer in) | Convenience method that decodes the remaining content of a single input byte buffer into a newly-allocated character buffer. |
final CoderResult | decode(ByteBuffer in, CharBuffer out, boolean endOfInput) | Decodes as many bytes as possible from the input buffer and writes the results to the output buffer. |
protected abstract CoderResult | decodeLoop(ByteBuffer in, CharBuffer out) | Decodes one or more bytes into one or more characters. |
Charset | detectedCharset() | Retrieves the charset detected by this decoder (optional operation). |
final CoderResult | flush(CharBuffer out) | Flushes this decoder. |
protected CoderResult | implFlush(CharBuffer out) | Flushes this decoder. The default implementation does nothing and returns CoderResult.UNDERFLOW. |
protected void | implOnMalformedInput(CodingErrorAction newAction) | Reports a change to this decoder's malformed-input action. |
protected void | implOnUnmappableCharacter(CodingErrorAction newAction) | Reports a change to this decoder's unmappable-character action. |
protected void | implReplaceWith(String newReplacement) | Reports a change to this decoder's replacement value. |
protected void | implReset() | Resets this decoder, clearing any charset-specific internal state. |
boolean | isAutoDetecting() | Tells whether or not this decoder implements an auto-detecting charset. |
boolean | isCharsetDetected() | Tells whether or not this decoder has yet detected a charset (optional operation). |
CodingErrorAction | malformedInputAction() | Returns this decoder's current action for malformed-input errors. |
final float | maxCharsPerByte() | Returns the maximum number of characters produced for each byte of input. |
final CharsetDecoder | onMalformedInput(CodingErrorAction newAction) | Changes this decoder's action for malformed-input errors. |
final CharsetDecoder | onUnmappableCharacter(CodingErrorAction newAction) | Changes this decoder's action for unmappable-character errors. |
final String | replacement() | Returns this decoder's replacement value. |
final CharsetDecoder | replaceWith(String newReplacement) | Changes this decoder's replacement value. |
final CharsetDecoder | reset() | Resets this decoder, clearing any internal state. |
CodingErrorAction | unmappableCharacterAction() | Returns this decoder's current action for unmappable-character errors. |
The table above lists each method's modifier and type, signature, and description, following the Java SE 21 API documentation. The Java SE 21 APIs define the core Java platform for general-purpose computing.
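As a brief illustration of a few of these methods, the sketch below queries a UTF-8 decoder and then uses the single-argument decode convenience method. The class name DecoderInfoDemo and the helper decodeAll are hypothetical names introduced for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class DecoderInfoDemo {

    // Hypothetical helper: decodes a complete byte array in one call
    static String decodeAll(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        try {
            // decode(ByteBuffer) runs a whole decoding operation
            // and allocates the output buffer itself
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

        // Query methods from the table
        System.out.println("charset: " + decoder.charset());
        System.out.println("average chars per byte: " + decoder.averageCharsPerByte());
        System.out.println("max chars per byte: " + decoder.maxCharsPerByte());
        System.out.println("malformed-input action: " + decoder.malformedInputAction());
        System.out.println("unmappable-character action: " + decoder.unmappableCharacterAction());

        byte[] bytes = { 0x4A, 0x61, 0x76, 0x61 }; // "Java" in UTF-8
        System.out.println("decoded: " + decodeAll(bytes));
    }
}
```

The convenience method suits input that is already fully in memory. For streamed input, the three-argument decode method gives control over buffering.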
Error Handling during decoding
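Two kinds of error can occur while decoding: a malformed byte sequence (malformed input), or a byte sequence that is valid but cannot be mapped to a character (unmappable character). Each kind of error is handled by one of three actions defined in CodingErrorAction: IGNORE, REPORT, or REPLACE. The action for malformed input is set with the onMalformedInput method, and the action for unmappable characters with the onUnmappableCharacter method.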
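To make the difference between the three actions concrete, the sketch below decodes the same invalid input under each of them. The class ErrorActionDemo and the helper decodeWith are hypothetical names. The input contains the byte 0xFF, which can never appear in valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ErrorActionDemo {

    // Hypothetical helper: decodes bytes as UTF-8 using the given action
    // for both malformed input and unmappable characters
    static String decodeWith(CodingErrorAction action, byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // With REPORT, the convenience method throws instead of returning
            return "<" + e.getClass().getSimpleName() + ">";
        }
    }

    public static void main(String[] args) {
        byte[] bad = { 0x41, (byte) 0xFF, 0x42 }; // 'A', invalid byte, 'B'

        // REPORT: decoding stops with a MalformedInputException
        System.out.println(decodeWith(CodingErrorAction.REPORT, bad));
        // REPLACE: the invalid byte becomes the replacement value (U+FFFD)
        System.out.println(decodeWith(CodingErrorAction.REPLACE, bad));
        // IGNORE: the invalid byte is dropped
        System.out.println(decodeWith(CodingErrorAction.IGNORE, bad));
    }
}
```

REPORT is the default action for both kinds of error, so a decoder that has not been configured rejects invalid input rather than silently altering it.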
    // Java Program to Implement
    // CharsetDecoder Class
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;    // decoding operations
    import java.nio.charset.CoderResult;       // result of each decode call
    import java.nio.charset.CodingErrorAction; // error-handling actions

    class GFG {
        public static void main(String[] args) {
            // '@' (0x40) is passed twice, encoded in UTF-8
            byte[] bytes = { (byte) 0x40, (byte) 0x40 };
            CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();

            // Error handling actions for malformed and unmappable input
            decoder.onMalformedInput(CodingErrorAction.REPLACE);
            decoder.onUnmappableCharacter(CodingErrorAction.REPORT);

            // I/O buffer creation
            CharBuffer charStore = CharBuffer.allocate(bytes.length);
            ByteBuffer utfStore = ByteBuffer.wrap(bytes);

            // Output string instance to concatenate the decoded characters
            StringBuilder decodedText = new StringBuilder();

            CoderResult result;
            do {
                // All input is already in utfStore, so endOfInput is true
                result = decoder.decode(utfStore, charStore, true);
                charStore.flip();
                decodedText.append(charStore);
                charStore.clear();

                if (result.isError()) {
                    // Error handling logic
                    if (result.isMalformed()) {
                        System.err.println("Encountered malformed byte sequence!");
                    } else if (result.isUnmappable()) {
                        System.err.println("Encountered unmappable character!");
                    }
                    // Skip the offending bytes so the loop cannot stall on them
                    utfStore.position(utfStore.position() + result.length());
                }
            } while (!result.isUnderflow());

            // Let the decoder write out any remaining internal state
            decoder.flush(charStore);
            charStore.flip();
            decodedText.append(charStore);

            // Decoded text is shown as output
            System.out.println("Decoded Text: " + decodedText);
        }
    }
Explanation of the above Program:
The code above decodes '@' twice using the CharsetDecoder class with explicit error handling. The following steps are involved:
- bytes is a byte array that holds the UTF-8 encoding of '@' (0x40) twice.
- The error actions are configured so that malformed input is replaced and unmappable characters are reported.
- Two buffers, a CharBuffer and a ByteBuffer, are created to hold the output characters and the input byte sequence respectively.
- decodedText is a StringBuilder that concatenates the decoded characters for display.
- The do-while loop calls decode using these buffers, draining the character buffer after each call, until the decoder reports underflow, which means all input bytes have been consumed.
- After decoding completes, the program prints the result to the terminal: Decoded Text: @@