PHP Tutorial - PHP htmlentities() Function






Definition

The htmlentities() function converts every character that has an HTML character entity equivalent into that entity. For example, the characters &, <, and ", which have special meaning in HTML, become &amp;, &lt;, and &quot;, respectively.

Syntax

The PHP htmlentities() function has the following syntax.

string htmlentities ( string html [, int options [, string charset]] )

Parameter

Parameter   Description
html        The HTML string to convert.
options     A bitmask of flags.
charset     Defines the encoding used in the conversion. The default is ISO-8859-1 prior to PHP 5.4.0 and UTF-8 from PHP 5.4.0 onwards; from PHP 5.6.0 the default is taken from the default_charset ini setting, which is UTF-8 unless changed. You are highly encouraged to specify the correct value for your code.
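
As a minimal sketch of the three parameters described above, the call below passes all of them explicitly so that the result does not depend on version-specific defaults. The input string and variable names are made up for illustration.

```php
<?php
// Illustrative input (an assumption, not from the tutorial).
// "\xC3\xA9" is the UTF-8 byte sequence for the letter e with an acute accent.
$html = "Tom & \"Jerry\" <b>caf\xC3\xA9</b>";

// html    = the string to convert
// options = ENT_COMPAT | ENT_HTML401 (double quotes converted, HTML 4.01 rules)
// charset = 'UTF-8', matching the encoding of $html
$escaped = htmlentities($html, ENT_COMPAT | ENT_HTML401, 'UTF-8');

echo $escaped, PHP_EOL;
// Tom &amp; &quot;Jerry&quot; &lt;b&gt;caf&eacute;&lt;/b&gt;
```

Note that the accented letter is converted as well, because it has a named entity in HTML 4.01.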




Options

options is a bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences, and the document type. The default is ENT_COMPAT | ENT_HTML401 before PHP 8.1.0, and ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 from PHP 8.1.0 onwards.

Available flag constants

Constant Name   Description
ENT_COMPAT      Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES      Will convert both double and single quotes.
ENT_NOQUOTES    Will leave both double and single quotes unconverted.
ENT_IGNORE      Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications.
ENT_SUBSTITUTE  Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_DISALLOWED  Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content.
ENT_HTML401     Handle code as HTML 4.01.
ENT_XML1        Handle code as XML 1.
ENT_XHTML       Handle code as XHTML.
ENT_HTML5       Handle code as HTML 5.
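
The three quote-handling flags in the table above can be compared side by side. The following is a small sketch with a made-up input string; the document type and charset are passed explicitly so the output does not depend on the PHP version.

```php
<?php
// Illustrative input containing one single quote and two double quotes.
$text = "It's \"quoted\"";

// ENT_COMPAT: only double quotes are converted.
$compat = htmlentities($text, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// ENT_QUOTES: both kinds of quotes are converted.
$quotes = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ENT_NOQUOTES: neither kind of quote is converted.
$noquotes = htmlentities($text, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $compat, PHP_EOL;   // It's &quot;quoted&quot;
echo $quotes, PHP_EOL;   // It&#039;s &quot;quoted&quot;
echo $noquotes, PHP_EOL; // It's "quoted"
```

ENT_QUOTES is the usual choice when the result may end up inside a single-quoted HTML attribute.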




Charset

The following character sets are supported:

Charset      Aliases                       Description
ISO-8859-1   ISO8859-1                     Western European, Latin-1.
ISO-8859-5   ISO8859-5                     Rarely used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15  ISO8859-15                    Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8        (none)                        ASCII-compatible multi-byte 8-bit Unicode.
cp866        ibm866, 866                   DOS-specific Cyrillic charset.
cp1251       Windows-1251, win-1251, 1251  Windows-specific Cyrillic charset.
cp1252       Windows-1252, 1252            Windows-specific charset for Western European.
KOI8-R       koi8-ru, koi8r                Russian.
BIG5         950                           Traditional Chinese, mainly used in Taiwan.
GB2312       936                           Simplified Chinese, national standard character set.
BIG5-HKSCS   (none)                        Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS    SJIS, SJIS-win, cp932, 932    Japanese.
EUC-JP       EUCJP, eucJP-win              Japanese.
MacRoman     (none)                        Charset that was used by Mac OS.
''           (none)                        An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.
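
To illustrate why the charset argument should match the actual encoding of the input, the sketch below feeds the same word to htmlentities() under different charset values. The byte strings are illustrative assumptions, written as escapes so the example does not depend on how the script file itself is saved.

```php
<?php
// The word "cafe" with an acute accent on the e, in two encodings.
$utf8   = "caf\xC3\xA9"; // UTF-8: the accented e is two bytes
$latin1 = "caf\xE9";     // ISO-8859-1: the accented e is one byte

// Correct charset for each input: both yield the same entity.
$fromUtf8   = htmlentities($utf8, ENT_COMPAT | ENT_HTML401, 'UTF-8');
$fromLatin1 = htmlentities($latin1, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

// Wrong charset: each UTF-8 byte is treated as a separate Latin-1 character.
$mismatch = htmlentities($utf8, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

echo $fromUtf8, PHP_EOL;   // caf&eacute;
echo $fromLatin1, PHP_EOL; // caf&eacute;
echo $mismatch, PHP_EOL;   // caf&Atilde;&copy;
```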

Return

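The htmlentities() function returns the escaped string. If the input contains an invalid code unit sequence for the given charset, an empty string is returned unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set.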
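
The flag table above says that invalid code unit sequences produce an empty string unless ENT_IGNORE or ENT_SUBSTITUTE is set. A minimal sketch of that behaviour, using a made-up input in which the byte 0xFF is never valid in UTF-8:

```php
<?php
// Illustrative input: "\xFF" cannot appear anywhere in well-formed UTF-8.
$invalid = "caf\xFF & co";

// Without ENT_SUBSTITUTE or ENT_IGNORE the whole result is an empty string.
$strict = htmlentities($invalid, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// With ENT_SUBSTITUTE the bad byte becomes U+FFFD (bytes EF BF BD in UTF-8)
// and the rest of the string is escaped as usual.
$substituted = htmlentities($invalid, ENT_COMPAT | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');

var_dump($strict === '');                            // bool(true)
var_dump($substituted === "caf\xEF\xBF\xBD &amp; co"); // bool(true)
```

Checking for an empty return value on non-empty input is one way to detect that the charset argument does not match the data.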

Note

We can reverse this conversion using the html_entity_decode() function.
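
A short round-trip sketch of this: the input string is an illustrative assumption, and the same flags and charset are passed to both functions so that decoding mirrors encoding.

```php
<?php
// Illustrative input; "\xC3\xA9" is a UTF-8 encoded e with an acute accent.
$original = "caf\xC3\xA9 & <b>\"bar\"</b>";

// Encode, then decode with matching flags and charset.
$encoded = htmlentities($original, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $encoded, PHP_EOL;
// caf&eacute; &amp; &lt;b&gt;&quot;bar&quot;&lt;/b&gt;

var_dump($decoded === $original); // bool(true)
```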

Example

Convert a string to its HTML-safe form.


<?php
// The ampersand is the only character in this string that needs escaping.
$title = "java2s.com & PHP";
$safe = htmlentities($title);
echo $safe;
?>

The code above generates the following result.

java2s.com &amp; PHP