wpseek.com
A WordPress-centric search engine for devs and theme authors



wp_is_valid_utf8 › WordPress Function

Since: 6.9.0
Deprecated: n/a
wp_is_valid_utf8 ( $bytes )
Parameters:
  • (string) $bytes String which might contain text encoded as UTF-8.
    Required: Yes
Returns:
  • (bool) Whether the provided bytes can decode as valid UTF-8.

Determines if a given byte string represents a valid UTF-8 encoding.

Note that it’s unlikely for non-UTF-8 data to validate as UTF-8, but it is still possible. Many texts are simultaneously valid UTF-8, valid US-ASCII, and valid ISO-8859-1 (latin1).

Example:

    true === wp_is_valid_utf8( '' );
    true === wp_is_valid_utf8( 'just a test' );
    true === wp_is_valid_utf8( "\xE2\x9C\x8F" );    // Pencil, U+270F.
    true === wp_is_valid_utf8( "\u{270F}" );        // Pencil, U+270F.
    true === wp_is_valid_utf8( '✏' );               // Pencil, U+270F.

    false === wp_is_valid_utf8( "just \xC0 test" ); // Invalid bytes.
    false === wp_is_valid_utf8( "\xE2\x9C" );       // Invalid/incomplete sequences.
    false === wp_is_valid_utf8( "\xC1\xBF" );       // Overlong sequences.
    false === wp_is_valid_utf8( "\xED\xB0\x80" );   // Surrogate halves.
    false === wp_is_valid_utf8( "B\xFCch" );        // ISO-8859-1 high-bytes.
                                                    // E.g. The “ü” in ISO-8859-1 is a single byte 0xFC,
                                                    // but in UTF-8 is the two-byte sequence 0xC3 0xBC.
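The caveat that non-UTF-8 data can still validate may be easier to see with concrete bytes. The following is a minimal sketch, not taken from WordPress: `example_is_valid_utf8()` is a hypothetical wrapper that calls `wp_is_valid_utf8()` when it is available and otherwise uses `mb_check_encoding()` so the snippet can run outside WordPress. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper for illustration only; not part of WordPress.
// Outside WordPress it relies on mb_check_encoding() (mbstring assumed).
function example_is_valid_utf8( string $bytes ): bool {
	return function_exists( 'wp_is_valid_utf8' )
		? wp_is_valid_utf8( $bytes )
		: mb_check_encoding( $bytes, 'UTF-8' );
}

// Plain ASCII is byte-for-byte identical in US-ASCII, ISO-8859-1, and UTF-8.
$ascii = 'just a test';

// The ISO-8859-1 text "Ã¼" is the bytes 0xC3 0xBC, which is also "ü" in UTF-8.
// Validation succeeds even if the author meant ISO-8859-1.
$ambiguous = "\xC3\xBC";

// "Büch" in ISO-8859-1: the lone 0xFC byte can never appear in UTF-8.
$latin1 = "B\xFCch";

$ascii_ok     = example_is_valid_utf8( $ascii );     // true
$ambiguous_ok = example_is_valid_utf8( $ambiguous ); // true
$latin1_ok    = example_is_valid_utf8( $latin1 );    // false

// When the source encoding is known to be ISO-8859-1, convert instead of rejecting.
$converted = mb_convert_encoding( $latin1, 'UTF-8', 'ISO-8859-1' );
```

A passing check therefore says only that the bytes are well-formed UTF-8, not that UTF-8 was the intended encoding. Where the source encoding is known, converting from it is safer than guessing from validity alone.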


Source

function wp_is_valid_utf8( string $bytes ): bool {
	/*
	 * Since PHP 8.3.0 the UTF-8 validity is cached internally
	 * on string objects, making this a direct property lookup.
	 *
	 * This is to be preferred exclusively once PHP 8.3.0 is
	 * the minimum supported version, because even when the
	 * status isn’t cached, it uses highly-optimized code to
	 * validate the byte stream.
	 */
	return function_exists( 'mb_check_encoding' )
		? mb_check_encoding( $bytes, 'UTF-8' )
		: _wp_is_valid_utf8_fallback( $bytes );
}
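
The body of `_wp_is_valid_utf8_fallback()` is not shown on this page. To give a sense of what any pure-PHP fallback has to check, here is a minimal sketch of a byte-level validator that follows the well-formed byte ranges of the UTF-8 definition. The function name `example_utf8_validate()` is hypothetical, and this is not WordPress's implementation.

```php
<?php
/**
 * Hypothetical pure-PHP UTF-8 validator, for illustration only.
 * This is NOT the body of _wp_is_valid_utf8_fallback().
 */
function example_utf8_validate( string $bytes ): bool {
	$length = strlen( $bytes );
	$i      = 0;

	while ( $i < $length ) {
		$lead = ord( $bytes[ $i ] );

		// US-ASCII bytes stand alone.
		if ( $lead <= 0x7F ) {
			$i++;
			continue;
		}

		// Pick the number of continuation bytes and the allowed range of the
		// first one. Narrowing that range rejects overlong forms, surrogate
		// halves, and code points above U+10FFFF.
		if ( $lead >= 0xC2 && $lead <= 0xDF ) {
			list( $need, $min, $max ) = array( 1, 0x80, 0xBF );
		} elseif ( 0xE0 === $lead ) {
			list( $need, $min, $max ) = array( 2, 0xA0, 0xBF ); // No overlong 3-byte forms.
		} elseif ( 0xED === $lead ) {
			list( $need, $min, $max ) = array( 2, 0x80, 0x9F ); // No surrogates U+D800-U+DFFF.
		} elseif ( $lead >= 0xE1 && $lead <= 0xEF ) {
			list( $need, $min, $max ) = array( 2, 0x80, 0xBF );
		} elseif ( 0xF0 === $lead ) {
			list( $need, $min, $max ) = array( 3, 0x90, 0xBF ); // No overlong 4-byte forms.
		} elseif ( $lead >= 0xF1 && $lead <= 0xF3 ) {
			list( $need, $min, $max ) = array( 3, 0x80, 0xBF );
		} elseif ( 0xF4 === $lead ) {
			list( $need, $min, $max ) = array( 3, 0x80, 0x8F ); // Nothing above U+10FFFF.
		} else {
			// 0x80-0xC1 and 0xF5-0xFF can never start a sequence.
			return false;
		}

		// The sequence must not run past the end of the string.
		if ( $i + $need >= $length ) {
			return false;
		}

		$second = ord( $bytes[ $i + 1 ] );
		if ( $second < $min || $second > $max ) {
			return false;
		}

		// Any remaining continuation bytes must be in 0x80-0xBF.
		for ( $k = 2; $k <= $need; $k++ ) {
			$next = ord( $bytes[ $i + $k ] );
			if ( $next < 0x80 || $next > 0xBF ) {
				return false;
			}
		}

		$i += $need + 1;
	}

	return true;
}
```

A byte-by-byte loop like this is much slower in PHP than `mb_check_encoding()`, which is presumably why the function prefers the mbstring path whenever it is available.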