Package com.ibm.icu.text
Class CharsetRecognizer
java.lang.Object
com.ibm.icu.text.CharsetRecognizer
- Direct Known Subclasses:
CharsetRecog_2022, CharsetRecog_mbcs, CharsetRecog_sbcs, CharsetRecog_Unicode, CharsetRecog_UTF8
Abstract class for recognizing a single charset.
Part of the implementation of ICU's CharsetDetector.
Each specific charset that can be recognized will have an instance
of some subclass of this class. All interaction between the overall
CharsetDetector and the logic specific to an individual charset happens
via the interface provided here.
Instances of CharsetRecognizer DO NOT have or maintain
state pertaining to a specific match or detect operation.
They WILL be shared by multiple instances of CharsetDetector.
They encapsulate constant charset-specific information.
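The sharing rule above can be illustrated with a minimal sketch. It does not use ICU's actual classes (CharsetRecognizer is package private); `SharedRecognizerSketch`, `Recognizer`, `Detector`, and the `matches` method are hypothetical stand-ins invented for this example. The point it shows is that the recognizer holds no per-operation state, so one instance can safely serve every detector, while each detector owns its own input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SharedRecognizerSketch {

    /** Hypothetical stand-in for a recognizer: only constant, charset-specific logic. */
    static abstract class Recognizer {
        abstract String getName();

        /** Returns true if the detector's input looks like this charset. */
        abstract boolean matches(Detector det);
    }

    /** Hypothetical stand-in for a detector: owns all per-operation state (the input). */
    static final class Detector {
        // One immutable list of recognizers, shared by every Detector instance.
        static final List<Recognizer> RECOGNIZERS = Collections.unmodifiableList(Arrays.asList(
            new Recognizer() {
                @Override
                String getName() {
                    return "US-ASCII";
                }

                @Override
                boolean matches(Detector det) {
                    for (byte b : det.input) {
                        if (b < 0) {
                            return false; // high bit set, so not 7-bit ASCII
                        }
                    }
                    return true;
                }
            }));

        final byte[] input;

        Detector(byte[] input) {
            this.input = input;
        }

        /** Returns the name of the first matching recognizer, or null if none matches. */
        String detect() {
            for (Recognizer r : RECOGNIZERS) {
                if (r.matches(this)) {
                    return r.getName();
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Detector a = new Detector("hello".getBytes(StandardCharsets.US_ASCII));
        Detector b = new Detector(new byte[] {(byte) 0xC3, (byte) 0xA9});
        System.out.println(a.detect()); // prints "US-ASCII"
        System.out.println(b.detect()); // prints "null"
    }
}
```

Because the recognizer reads everything it needs from the detector passed to it, the two detectors above never interfere with each other even though they use the same recognizer object.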
-
Constructor Summary
Constructors
CharsetRecognizer()
Method Summary
Modifier and Type, Method, Description
String getLanguage(): Get the ISO language code for this charset.
(package private) abstract String getName(): Get the IANA name of this charset.
(package private) abstract CharsetMatch match(CharsetDetector det): Test the match of this charset with the input text data, which is obtained via the CharsetDetector object.
-
Constructor Details
-
CharsetRecognizer
CharsetRecognizer()
-
-
Method Details
-
getName
Get the IANA name of this charset.
- Returns:
- the charset name.
-
getLanguage
Get the ISO language code for this charset.
- Returns:
- the language code, or null if the language cannot be determined.
-
match
Test the match of this charset with the input text data, which is obtained via the CharsetDetector object.
- Parameters:
det - The CharsetDetector, which contains the input text to be checked for being in this charset.
- Returns:
- A CharsetMatch object containing details of the match with this charset, or null if there was no match.
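The match-or-null contract described above can be sketched as follows. This is an illustration under stated assumptions, not ICU's implementation: `MatchContractSketch`, `Match`, `Recognizer`, `detectAll`, the three sample recognizers, and their confidence values are all hypothetical, and the recognizers take the raw bytes directly rather than a CharsetDetector. Each recognizer returns a match object or null, and the caller keeps only the non-null results, ordered by descending confidence.

```java
import java.util.ArrayList;
import java.util.List;

public class MatchContractSketch {

    /** Hypothetical result type: a charset name plus a confidence from 0 to 100. */
    static final class Match {
        final String name;
        final int confidence;

        Match(String name, int confidence) {
            this.name = name;
            this.confidence = confidence;
        }
    }

    /** Hypothetical recognizer interface mirroring the match-or-null contract. */
    interface Recognizer {
        Match match(byte[] input);
    }

    /** Matches input that starts with the UTF-8 byte order mark EF BB BF. */
    static final Recognizer UTF8_BOM = input -> {
        boolean hasBom = input.length >= 3
                && (input[0] & 0xFF) == 0xEF
                && (input[1] & 0xFF) == 0xBB
                && (input[2] & 0xFF) == 0xBF;
        return hasBom ? new Match("UTF-8", 100) : null;
    };

    /** Matches input made only of 7-bit bytes; the confidence value is arbitrary. */
    static final Recognizer ASCII = input -> {
        for (byte b : input) {
            if (b < 0) {
                return null; // high bit set, so no match
            }
        }
        return new Match("US-ASCII", 50);
    };

    /** Catch-all: any byte sequence decodes as ISO-8859-1, so confidence is kept low. */
    static final Recognizer LATIN1 = input -> new Match("ISO-8859-1", 10);

    /** Collects the non-null matches and orders them by descending confidence. */
    static List<Match> detectAll(byte[] input, List<Recognizer> recognizers) {
        List<Match> matches = new ArrayList<>();
        for (Recognizer r : recognizers) {
            Match m = r.match(input);
            if (m != null) {
                matches.add(m);
            }
        }
        matches.sort((x, y) -> Integer.compare(y.confidence, x.confidence));
        return matches;
    }
}
```

Returning null rather than a zero-confidence match lets the caller skip non-matching charsets with a single check, which is the behavior the Returns clause above describes.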
-