
Sometimes, however, it is desirable to permit a limited dialect of HTML. To that end, it is necessary to sanitize the input by removing only the potentially malicious tags and attributes (or, because security is easier to achieve this way, by allowing only the tags and attributes that cannot be used maliciously).
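To make the allowlist idea concrete, here is a minimal sketch. The `$allowedAttributes` map (tag name to permitted attribute names) and the `isAllowedAttribute()` function are hypothetical, chosen only for illustration. Anything not explicitly listed is rejected, so an attribute nobody thought of (a new event handler, say) is refused by default instead of slipping through.

```php
<?php
// Hypothetical allowlist: tag name => attributes permitted on that tag.
// Any tag or attribute absent from this map is treated as unsafe.
$allowedAttributes = array(
    'a'   => array('href', 'title'),
    'img' => array('src', 'alt'),
    'b'   => array(),
    'i'   => array(),
);

// Returns true only if $attribute is explicitly allowed on $tag
// (comparison is case-insensitive, as HTML names are).
function isAllowedAttribute(array $allowed, $tag, $attribute)
{
    $tag = strtolower($tag);
    $attribute = strtolower($attribute);
    return isset($allowed[$tag]) && in_array($attribute, $allowed[$tag], true);
}

var_dump(isAllowedAttribute($allowedAttributes, 'a', 'href'));     // bool(true)
var_dump(isAllowedAttribute($allowedAttributes, 'a', 'onclick'));  // bool(false)
var_dump(isAllowedAttribute($allowedAttributes, 'script', 'src')); // bool(false)
```

Checking attribute names is only half the job: the values of URL-bearing attributes such as href and src also need checking, for example by accepting only http and https URLs.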
Some applications take the approach of using a proprietary markup language instead of HTML. A similar topic was discussed in Chapter 6 in the section “Using a Custom Markup Language to Generate SE-Friendly HTML,” but to a different end: enhancing on-page HTML optimization. A custom markup language can also be used to ensure that content is sanitized. In this case, you would run htmlspecialchars() over the input or strip the HTML from it, and then use a translation function and a limited set of proprietary tags, such as {link} and {/link} or {image} and {/image}, to permit only certain functionality. This is the approach of many forum web applications, such as vBulletin and phpBB. And indeed, for specific applications where users are constantly engaged in dialog and willing to learn the proprietary markup language, this makes sense. However, for something like a comment form or a guest book, HTML provides a common denominator that most users know, and allowing a restrictive dialect of it is probably more prudent with regard to usability. That is the solution discussed here.
As usual, in order to keep your code tidy, group the HTML sanitizing functionality into a separate file.
Go through the following quick exercise, where you create and use this new little library. The code is
discussed afterwards.
Sanitizing User Input
1. Create a new file named sanitize.inc.php in your seophp/include folder, and write this code:
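<?php
// sanitizes the HTML code in $inputHTML
function sanitizeHTML(
  $inputHTML,
  $allowed_tags = array('<h1>', '<b>', '<i>', '<a>',
                        '<ul>', '<li>', '<pre>', '<hr>',
                        '<blockquote>', '<img>'))
{
  // strip every tag that is not in the allowed list
  $_allowed_tags = implode('', $allowed_tags);
  $inputHTML = strip_tags($inputHTML, $_allowed_tags);

  // pass the contents of each remaining tag through removeBadAttributes();
  // the preg_replace() /e modifier was removed in PHP 7, so use a callback
  return preg_replace_callback('#<(.*?)>#is',
    function ($matches) {
      return '<' . removeBadAttributes($matches[1]) . '>';
    },
    $inputHTML);
}

// removes the unallowed attributes from $inputHTML
function removeBadAttributes($inputHTML)
{
  // define the list of unallowed event-handler attributes
  $bad_attributes = 'onerror|onmousemove|onmouseout|onmouseover|' .
                    'onkeypress|onkeydown|onkeyup';
  // replace each bad attribute name (when followed by "=") with SANITIZED,
  // and neutralize "javascript:" wherever it appears, such as in href values;
  // the callback passes the tag text unescaped, so no stripslashes() is needed
  return preg_replace("#(?:($bad_attributes)\s*(?==)|javascript:)#is",
                      'SANITIZED', $inputHTML);
}
?>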
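Both passes in sanitizeHTML() are needed because strip_tags() filters elements but never touches attributes: an allowed tag keeps every attribute it arrives with. The short check below uses only the built-in strip_tags() to show this. It also shows that strip_tags() removes the script tags themselves but leaves their content behind as plain text.

```php
<?php
$input = '<b onmouseover="alert(1)">hi</b><script>alert(2)</script>';

// Keep only <b>; every other tag is stripped, but the attributes on <b> survive.
$stripped = strip_tags($input, '<b>');

echo $stripped, "\n";
// <b onmouseover="alert(1)">hi</b>alert(2)
```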
Chapter 8: Black Hat SEO

