Clean User Input HTML using HTML::Scrubber

Discussion in 'Perl' started by pradeep, May 24, 2012.

  1. pradeep

    pradeep Team Leader

    Most modern websites accept user input in the form of comments, reviews, private messages and so on. The HTML tags in that content need to be controlled to prevent XSS attacks, URL spam, embedded videos (which might attract copyright problems) and similar issues. Many sites publish a list of allowed HTML tags and either strip out the rest or show an error message to the user.

    Stripping the disallowed tags is usually the better option, because many users may not be aware of the tags in their content or may not know how to fix them. In this article we'll explore the Perl module HTML::Scrubber, which is highly configurable. We'll use it to strip unwanted HTML tags and to write validation rules that strip tags and attributes based on certain conditions.

    Basic Usage



    The following example shows the basic usage of HTML::Scrubber. We allow only the B, I, U and BR tags, so every other tag is stripped.

    Code:
    use HTML::Scrubber;
    
    my $basic_scrubber = HTML::Scrubber->new( allow => [qw/b i u br/] );
    print $basic_scrubber->scrub('Hi,<br> Check out <a href="http://whatanindianrecipe.com">WhatAnIndianRecipe</a> for <b>delicious</b> dishes from <i>India</i>.');
    
    Output:
    Code:
    Hi,<br> Check out WhatAnIndianRecipe for <b>delicious</b> dishes from <i>India</i>.
    
    As you can see from the output, the A tag has been stripped while its text is kept, and the allowed BR, B and I tags are left intact.
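
    The allow list does not have to be fixed at construction time. The sketch below is a minimal illustration, assuming the module's allow() and deny() methods behave as described in its documentation; the input string and variable names are made up for this example. It scrubs the same text three times: with nothing allowed, with B and I allowed, and with I denied again.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# With no arguments, every tag is denied and only the text is kept.
my $scrubber = HTML::Scrubber->new;
my $input    = '<p>Some <b>bold</b> and <i>italic</i> text</p>';
my $none     = $scrubber->scrub($input);

# allow() adds tags to the permitted set after the object exists.
$scrubber->allow(qw/b i/);
my $both = $scrubber->scrub($input);

# deny() removes a tag from the permitted set again.
$scrubber->deny(qw/i/);
my $bold_only = $scrubber->scrub($input);

print "$none\n$both\n$bold_only\n";
```

    Starting from "deny everything" and adding tags one by one keeps the policy a whitelist, which is usually easier to reason about than listing what to forbid.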

    Advanced Usage



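    In more advanced use we can control which attributes of certain tags are allowed, or set default rules such as never allowing the onmouseover attribute. In the example below, allow names the permitted tags, rules holds per-tag settings (which take precedence over allow for the same tag), and default holds the attribute rules applied to the remaining allowed tags. Have a look at the code; it should help you understand the idea behind the package.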

    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use HTML::Scrubber;
    
    ## allowed tags
    my @allow = qw/br i a/;
    
    ## allow/disallow tags & attributes
    my @rules = (
        script => 0,
        img    => {
            ## allow images only from a specific domain; the dots are escaped and
            ## the trailing slash rejects look-alike hosts (www.go4expert.com.example.org)
            src => qr{^http://www\.go4expert\.com/}i,
            alt => 1,    # alt attribute allowed
            '*' => 0,    # deny all other attributes
        },
    );
    
    ## default rules
    my @default = (
        0 => {
            ## allow all attributes
            '*' => 1,
            ## title attribute in all tags will be removed
            title => 0,
            ## disallow the common JS event attributes (this list is not exhaustive)
            'onblur'      => 0,
            'onchange'    => 0,
            'onclick'     => 0,
            'ondblclick'  => 0,
            'onerror'     => 0,
            'onfocus'     => 0,
            'onkeydown'   => 0,
            'onkeypress'  => 0,
            'onkeyup'     => 0,
            'onload'      => 0,
            'onmousedown' => 0,
            'onmousemove' => 0,
            'onmouseout'  => 0,
            'onmouseover' => 0,
            'onmouseup'   => 0,
            'onreset'     => 0,
            'onselect'    => 0,
            'onsubmit'    => 0,
            'onunload'    => 0
        }
    );
    
    my $advanced_scrubber = HTML::Scrubber->new(
        allow   => \@allow,
        rules   => \@rules,
        default => \@default
    );
    
    print $advanced_scrubber->scrub('Hi,<br> Check out <a href="http://www.google.com" title="Search">Google</a> for <b>delicious</b> dishes from <i>India</i>. More info at <a href="http://en.wikipedia.org/Recipes">Recipes at Wikipedia</a>.<br><img src="/images/avatar.jpg" alt="Avatar Image" onMouseOver="alert(window.location)"> <embed src="api.flv"></embed> <img src="http://www.go4expert.com/images/logo.png" alt="Avatar Image" onMouseOver="alert(window.location)">');
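
    The same per-tag rule mechanism can be applied to links. The following is only a sketch under an assumed policy (not part of the example above): A tags are kept, but their href survives only when it starts with http:// or https://, and every other attribute of A is dropped. The URLs and the $link_scrubber name are placeholders chosen for illustration.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Hypothetical policy for illustration: links may only point to http(s) URLs.
my $link_scrubber = HTML::Scrubber->new(
    allow => [qw/b i br/],
    rules => [
        a => {
            href => qr{^https?://}i,    # keep href only when the value matches
            '*'  => 0,                  # drop every other attribute of A
        },
    ],
);

my $safe = $link_scrubber->scrub(
      '<a href="http://example.com/" onclick="evil()">ok</a> '
    . '<a href="javascript:alert(1)">bad</a>'
);

print "$safe\n";
```

    Whitelisting the attributes of a tag with '*' => 0, rather than blacklisting individual event handlers, means a newly introduced or forgotten on... attribute is dropped without any change to the rules.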
    

    References



    http://search.cpan.org/dist/HTML-Scrubber/
     
  2. FredTighe

    FredTighe New Member

    Great post! I like this blog very much. I learned a lot of useful information from it.
    Keep up the good work.
     
  3. Scripting

    Scripting John Hoder

    Very interesting, I think I will learn Perl more!
     
