Documentation

Confusables.php

Simple Machines Forum (SMF)

Tags

author

Simple Machines https://www.simplemachines.org

copyright

2025 Simple Machines and individual contributors

license

https://www.simplemachines.org/about/smf/license.php BSD

version

3.0 Alpha 2

Table of Contents

Functions

utf8_confusables()  : array<string|int, mixed>
Helper function for SMF\Unicode\SpoofDetector::getSkeletonString.
utf8_character_scripts()  : array<string|int, mixed>
Helper function for SpoofDetector::resolveScriptSet.
utf8_regex_identifier_status()  : array<string|int, mixed>
Helper function for SpoofDetector::checkHomographNames.

Functions

utf8_confusables()

Helper function for SMF\Unicode\SpoofDetector::getSkeletonString.

utf8_confusables() : array<string|int, mixed>

Returns an array of "confusables" maps that can be used for confusable string detection.

Data compiled from: https://www.unicode.org/Public/security/latest/confusables.txt

Developers: Do not update the data in this function manually. Instead, run "php -f other/update_unicode_data.php" on the command line.
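
As an illustration of how a confusables map supports confusable string detection, the sketch below replaces each confusable character with its prototype and compares the results. It is a minimal sketch under stated assumptions: the array `$sample_confusables` and the function `sketch_skeleton()` are hypothetical names invented for this example, the four entries are a tiny hand-picked sample, and the actual structure of the data returned by utf8_confusables() may differ. The normalization steps that a full skeleton algorithm performs are omitted.

```php
<?php

// Hypothetical sample data. The real data from utf8_confusables() is far
// larger and may be structured differently.
$sample_confusables = [
    "\u{0430}" => 'a', // CYRILLIC SMALL LETTER A
    "\u{0435}" => 'e', // CYRILLIC SMALL LETTER IE
    "\u{043E}" => 'o', // CYRILLIC SMALL LETTER O
    "\u{0440}" => 'p', // CYRILLIC SMALL LETTER ER
];

/**
 * Hypothetical helper: replaces every character found in $map with its
 * prototype. Unicode normalization is intentionally left out.
 */
function sketch_skeleton(string $string, array $map): string
{
    // strtr() with an array replaces each key with its value in one pass.
    return strtr($string, $map);
}

// "paypal" written with two Cyrillic letters in place of the Latin "a".
$spoof = "p\u{0430}yp\u{0430}l";

// Two strings are treated as confusable when their skeletons are identical.
var_dump(sketch_skeleton($spoof, $sample_confusables) === sketch_skeleton('paypal', $sample_confusables)); // bool(true)
```

Comparing skeletons rather than raw strings means that two names that look alike are treated as equal even when their underlying code points differ.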

Return values
array<string|int, mixed>

"Confusables" maps.

utf8_character_scripts()

Helper function for SpoofDetector::resolveScriptSet.

utf8_character_scripts() : array<string|int, mixed>

Each key in the returned array defines the END of a range of characters that all have the same script set. For example, the first key, "\x40", covers the range of characters from "\x0" to "\x40"; the second key, "\x5A", covers the range from "\x41" to "\x5A".

The first entry in each value array indicates the primary script (i.e. the value of the Script property) for that set of characters. If those characters can also occur in a limited number of other scripts (i.e. the Script_Extensions property for those characters is not empty), those additional scripts are listed after the first.

See https://www.unicode.org/reports/tr24/ for more info.

Developers: Do not update the data in this function manually. Instead, run "php -f other/update_unicode_data.php" on the command line.
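
The range-end layout described above can be read with a simple scan: the first key that is greater than or equal to the character identifies its range. The following is a minimal sketch under stated assumptions: `$sample_scripts` and `sketch_script_set()` are hypothetical names for this example, the sample covers only part of ASCII, and the script names shown as values are assumptions about how the real data labels scripts.

```php
<?php

// Hypothetical sample in the layout described above: each key is the END of
// a range, and each value lists the primary script first.
$sample_scripts = [
    "\x40" => ['Common'], // U+0000 to U+0040
    "\x5A" => ['Latin'],  // U+0041 to U+005A
    "\x60" => ['Common'], // U+005B to U+0060
    "\x7A" => ['Latin'],  // U+0061 to U+007A
];

/**
 * Hypothetical helper: returns the script set for a single UTF-8 character,
 * or null if the character lies beyond the last range in $ranges.
 */
function sketch_script_set(string $char, array $ranges): ?array
{
    foreach ($ranges as $range_end => $scripts) {
        // PHP converts numeric string keys such as "9" to integers, so cast
        // back to a string. Byte-wise comparison of UTF-8 strings preserves
        // code point order, so strcmp() is sufficient here.
        if (strcmp($char, (string) $range_end) <= 0) {
            return $scripts;
        }
    }

    return null;
}

echo implode(', ', sketch_script_set('A', $sample_scripts)), "\n"; // Latin
echo implode(', ', sketch_script_set('@', $sample_scripts)), "\n"; // Common
```

A linear scan keeps the sketch short. With the full data set, a binary search over the keys would be a natural alternative.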

Return values
array<string|int, mixed>

Script data for ranges of Unicode characters.

utf8_regex_identifier_status()

Helper function for SpoofDetector::checkHomographNames.

utf8_regex_identifier_status() : array<string|int, mixed>

Returns an array of regexes that can be used to check the "identifier status" of characters in a string.

Developers: Do not update the data in this function manually. Instead, run "php -f other/update_unicode_data.php" on the command line.
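
One way such character classes might be used is to test whether every character in a string has a given identifier status. The following is a minimal sketch under stated assumptions: the array key 'Allowed', the array `$sample_status`, and the function `sketch_all_allowed()` are hypothetical, and the character class shown is a small stand-in for illustration, not the data that utf8_regex_identifier_status() actually returns.

```php
<?php

// Hypothetical sample: a character class keyed by an identifier status.
// The real keys and classes may differ.
$sample_status = [
    'Allowed' => '[0-9A-Za-z_]',
];

/**
 * Hypothetical helper: returns true only if every character in $string
 * matches the 'Allowed' character class.
 */
function sketch_all_allowed(string $string, array $classes): bool
{
    // The "u" modifier treats pattern and subject as UTF-8. The "D" modifier
    // stops "$" from matching before a trailing newline.
    return preg_match('/^' . $classes['Allowed'] . '*$/Du', $string) === 1;
}

var_dump(sketch_all_allowed('user_name', $sample_status));      // bool(true)
var_dump(sketch_all_allowed("us\u{0435}r", $sample_status));    // bool(false)
```

The strict comparison with 1 also treats a preg_match() failure, such as invalid UTF-8 input, as "not allowed".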

Return values
array<string|int, mixed>

Character classes for identifier statuses.


        