Go4Expert

pradeep 24May2012 15:44

Clean User Input HTML using HTML::Scrubber
 
Most modern websites accept user input in the form of comments, reviews, private messages and so on. The HTML tags in that content need to be controlled to prevent XSS attacks, URL spam, embedded videos (which might attract copyright problems) and similar issues. Many sites publish a list of allowed HTML tags and either strip out everything else or show an error message to the user.

Stripping the disallowed tags is usually the better option, because many users will not know which tags their content contains or how to fix them. In this article we'll explore the Perl module HTML::Scrubber, which is highly configurable. We'll use it to strip unwanted HTML tags and to write validation rules that strip tags and attributes based on certain conditions.

Basic Usage



The following example shows the basic usage of HTML::Scrubber. We allow only the B, I, U and BR tags, so every other tag is stripped.

Code: Perl

use HTML::Scrubber;

my $basic_scrubber = HTML::Scrubber->new( allow => [qw/b i u br/] );
print $basic_scrubber->scrub('Hi,<br> Check out <a href="http://whatanindianrecipe.com">WhatAnIndianRecipe</a> for <b>delicious</b> dishes from <i>India</i>.');


Output:
Code:

Hi,<br> Check out WhatAnIndianRecipe for <b>delicious</b> dishes from <i>India</i>.

As you can see from the output, the A tag has been stripped while its link text is kept, leaving only the BR, B and I tags.
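
Since the same allow list usually applies to every submission, the scrubber can be built once and reused. The following is a minimal sketch of that idea, not code from the module's documentation; the helper name clean_comment is hypothetical, and only the new() and scrub() calls shown in the example above are assumed.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Build the scrubber once and reuse it for every submission.
my $comment_scrubber = HTML::Scrubber->new( allow => [qw/b i u br/] );

# clean_comment() is a hypothetical helper name, not part of HTML::Scrubber.
sub clean_comment {
    my ($html) = @_;
    return '' unless defined $html;    # treat missing input as empty
    return $comment_scrubber->scrub($html);
}

# Each input is cleaned with the same allow list.
for my $input (
    '<p>Nice <b>post</b>!</p>',
    '<u>Thanks</u> <a href="http://example.com/">spam</a>',
    )
{
    print clean_comment($input), "\n";
}
```

Tags outside the allow list (P and A here) should be removed while their text content is kept, as in the basic example.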

Advanced Usage



In more advanced use, we can control which attributes are allowed on particular tags, and set default rules such as never allowing the onmouseover attribute on any tag. Have a look at the example code below; it should help you understand the idea behind the package.

Code: Perl

#!/usr/bin/perl

use strict;
use warnings;
use HTML::Scrubber;

## allowed tags
my @allow = qw/br i a/;

## allow/disallow tags & attributes
my @rules = (
    script => 0,
    img    => {
        ## allow images only from a specific domain; the dots are escaped
        ## and the trailing slash stops look-alike hosts from matching
        src => qr{^http://www\.go4expert\.com/}i,
        alt => 1,    # alt attribute allowed
        '*' => 0,    # deny all other attributes
    },
);

## default rules
my @default = (
    0 => {
        ## allow all attributes
        '*' => 1,
        ## title attribute in all tags will be removed
        title => 0,
        ## set to disallow all JS event attributes
        'onblur'      => 0,
        'onchange'    => 0,
        'onclick'     => 0,
        'ondblclick'  => 0,
        'onerror'     => 0,
        'onfocus'     => 0,
        'onkeydown'   => 0,
        'onkeypress'  => 0,
        'onkeyup'     => 0,
        'onload'      => 0,
        'onmousedown' => 0,
        'onmousemove' => 0,
        'onmouseout'  => 0,
        'onmouseover' => 0,
        'onmouseup'   => 0,
        'onreset'     => 0,
        'onselect'    => 0,
        'onsubmit'    => 0,
        'onunload'    => 0
    }
);

my $advanced_scrubber = HTML::Scrubber->new(
    allow   => \@allow,
    rules   => \@rules,
    default => \@default
);

print $advanced_scrubber->scrub('Hi,<br> Check out <a href="http://www.google.com" title="Search">Google</a> for <b>delicious</b> dishes from <i>India</i>. More info at <a href="http://en.wikipedia.org/Recipes">Recipes at Wikipedia</a>.<br><img src="/images/avatar.jpg" alt="Avatar Image" onMouseOver="alert(window.location)"> <embed src="api.flv"></embed> <img src="http://www.go4expert.com/images/logo.png" alt="Avatar Image" onMouseOver="alert(window.location)">');
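
The default rules above allow every attribute that is not explicitly denied, which would appear to let an href such as javascript:alert(1) through on the A tag. One way to tighten this is a per-tag rule that validates href against a regular expression, in the same way the img rule validates src. The sketch below is an illustration under that assumption rather than a complete policy; the variable name $link_scrubber and the sample input are made up for the example.

```perl
use strict;
use warnings;
use HTML::Scrubber;

my $link_scrubber = HTML::Scrubber->new(
    allow => [qw/br i/],
    rules => [
        a => {
            ## keep href only when it starts with http:// or https://
            href => qr{^https?://}i,
            ## deny every other attribute on links
            '*' => 0,
        },
    ],
);

print $link_scrubber->scrub(
    '<a href="javascript:alert(1)" onclick="x()">bad</a> '
  . '<a href="http://www.go4expert.com/">good</a>'
), "\n";
```

With this rule the first link should lose both its href and its onclick attribute, while the second keeps its href. A whitelist pattern like this is generally easier to keep safe than listing every event attribute to deny.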


References



http://search.cpan.org/dist/HTML-Scrubber/

FredTighe 17Jun2012 20:54

Re: Clean User Input HTML using HTML::Scrubber
 
Great post! I like this blog very much. I learned a lot of important information from it.
Keep up the good work.

Scripting 18Jun2012 00:54

Re: Clean User Input HTML using HTML::Scrubber
 
Very interesting, I think I will learn Perl more!

