To sanitize user input, simply call the sanitizeHTML() function on the user-provided input. It
strips any tags that are not in the variable $allowed_tags, as well as common attributes that can
be cleverly used to execute JavaScript.
Without running sanitizeHTML() over the input HTML, the cleverly constructed HTML would
redirect the visitor to http://too.much.spam. The onerror event handler runs when an error occurs.
Because the image INVALID-IMAGE does not exist (which causes an error), the browser executes the
onerror handler, location.href='http://too.much.spam/', causing the redirection.
After executing sanitizeHTML(), onerror is replaced with SANITIZED, and nothing occurs.
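The behavior described above can be approximated with a short sketch. This is not the chapter's actual sanitizeHTML() implementation: the function name sketchSanitizeHTML(), the default tag allow-list, and the choice to neutralize every attribute whose name begins with "on" are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch only; the chapter's real sanitizeHTML() may differ.
function sketchSanitizeHTML(
    string $inHTML,
    string $allowedTags = '<p><a><b><i><em><strong><br><img>' // assumed allow-list
): string {
    // Step 1: strip every tag that is not in the allow-list.
    // strip_tags() removes the tags themselves but keeps the text between them.
    $stripped = strip_tags($inHTML, $allowedTags);

    // Step 2: inside each remaining tag, rename event-handler attributes
    // (onerror, onclick, onmouseover, ...) to SANITIZED so they never fire.
    return preg_replace_callback(
        '/<[^>]*>/',
        function (array $m): string {
            return preg_replace('/\bon[a-z]+(?=\s*=)/i', 'SANITIZED', $m[0]);
        },
        $stripped
    );
}

$inHTML = '<p>Sanitizing <img src="INVALID-IMAGE" ' .
          'onerror="location.href=\'http://too.much.spam/\'">!</p>';
echo sketchSanitizeHTML($inHTML);
```

Because the attribute is renamed rather than removed, the browser sees an unknown SANITIZED attribute and ignores it. The tag-matching pattern assumes no ">" character appears inside an attribute value, which is one reason a sketch like this should be treated as a stopgap rather than a complete filter.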
The sanitizeHTML() function does not typically return valid HTML. In practice, this does not matter,
because the function is designed as a stopgap method to prevent spam. The modified HTML code
is also unlikely to cause any problems in browsers or search engines. Eventually, the content would
be deleted or edited by the site owner anyway.
Having such “black hat” content within a web site can damage its reputation with both human visitors
and search engines. Embedding JavaScript-based redirects can raise red flags in search engine
algorithms and may result in penalties and web site bans. It is therefore of the utmost importance to
address and mitigate these concerns.
Note that the nofollow library was not used in this latest example, but you could combine nofollow with
sanitize to obtain a better result, like this:
// display third comment
$inHTML = '<p>Sanitizing <img src="INVALID-IMAGE" ' .
          'onerror="location.href=\'http://too.much.spam/\'">!</p>';
// sanitize first, then add nofollow to any links that remain
$sanitized = noFollowLinks(sanitizeHTML($inHTML));
echo $sanitized;
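To make the combination concrete, here is a minimal sketch of what a nofollow pass might look like. It is a hypothetical stand-in, not the chapter's noFollowLinks(): the name sketchNoFollowLinks() and the decision to leave tags that already carry a rel attribute untouched are assumptions.

```php
<?php
// Hypothetical stand-in for noFollowLinks(); the real function may differ.
function sketchNoFollowLinks(string $inHTML): string
{
    // Add rel="nofollow" to every <a ...> tag that has no rel attribute yet.
    return preg_replace_callback(
        '/<a\b([^>]*)>/i',
        function (array $m): string {
            if (preg_match('/\brel\s*=/i', $m[1])) {
                return $m[0]; // assumption: existing rel values are left as-is
            }
            return '<a' . $m[1] . ' rel="nofollow">';
        },
        $inHTML
    );
}

echo sketchNoFollowLinks('<p>Visit <a href="http://too.much.spam/">me</a></p>');
```

Running the sanitizing step first means the nofollow pass only has to deal with tags that survived the allow-list. A stricter version would probably overwrite any user-supplied rel value instead of trusting it.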
Lastly, your implementations of noFollowLinks() and sanitizeHTML() will not exhaustively
block every attack, nor will they allow the flexibility some programmers require. They do, however,
make a spammer’s life much more difficult, and he or she will likely move on to an easier target.
A project called safehtml by Pixel-Apes is a more robust solution. It is open-source and written in
PHP. You can find it at http://pixel-apes.com/safehtml/.
Requesting Human Input
One common problem webmasters and developers need to consider is automated spam robots,
which submit comments on unprotected blogs or other web sites that support comments.
The typical solution to this problem is to use what is called a “CAPTCHA” image, which requires the
visitor to read a graphical version of text that has been obfuscated in some way. A typical human can
read the image, but an automated script cannot. Unfortunately, this approach presents usability
problems, because blind users can no longer access the functionality it protects. For more
information on this type of CAPTCHA, visit http://freshmeat.net/projects/kcaptcha/. An improvement on this
Chapter 8: Black Hat SEO