pradeep 29Jul2013 19:14

Clean User Generated HTML In Ruby
Web applications always have to deal with user input, and these days that input is often HTML, which carries the risk of malicious markup such as XSS payloads. The best way to handle such input is to sanitize it, i.e. remove unwanted HTML tags and attributes. For example, we might not want links or scripts in the user's HTML, so we would remove the script and a tags. In another case we might want to allow anchor tags, but only with absolute URLs. The possible cases are numerous; in this article we'll try to get the basics right.

In this article we'll look at the Ruby gem Sanitize, a whitelist-based sanitizing module. This means you have to list the tags, attributes, etc. that are allowed, and everything else is removed. Conversely, you cannot specify a list of disallowed tags or attributes.
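To make the whitelist idea concrete, here's a minimal plain-Ruby sketch. It does not use the Sanitize gem, and the ALLOWED_ATTRIBUTES table and the filter_attributes helper are hypothetical names made up for this illustration. It only shows why "keep what is listed, drop the rest" is safe by default.

```ruby
require 'set'

# Hypothetical whitelist for illustration only; not part of the Sanitize gem.
ALLOWED_ATTRIBUTES = {
  'a'   => Set['href', 'title'],
  'div' => Set['class']
}.freeze

# Keep only the attributes that are explicitly allowed for the given tag.
# A tag that is not in the table gets an empty whitelist, so anything
# unexpected is dropped by default.
def filter_attributes(tag, attributes)
  allowed = ALLOWED_ATTRIBUTES.fetch(tag, Set.new)
  attributes.select { |name, _value| allowed.include?(name) }
end

link = { 'href' => 'http://www.go4expert.com', 'onclick' => 'steal()', 'id' => 'MyLink' }

p filter_attributes('a', link)                      # only "href" is kept
p filter_attributes('script', { 'src' => 'x.js' })  # nothing is kept
```

Note that the onclick attribute disappears even though nobody thought to forbid it, which is exactly what a list of disallowed attributes cannot guarantee.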

Installing Sanitize

It's pretty easy to install the Sanitize gem in Ruby. Just issue the following command and wait:


$ gem install sanitize

Understanding & Using Sanitize

Sanitize comes with a few built-in modes that cover the most common configurations without much effort. Here are the built-in modes:

Sanitize::Config::RESTRICTED - Only allows very simple inline formatting.
Sanitize::Config::BASIC - Allows all formatting tags, links & lists. Does not allow tables & images, and a rel="nofollow" attribute is added to all links.
Sanitize::Config::RELAXED - Like BASIC, but allows images & tables and does not add rel="nofollow" attribute to links.

Code: Ruby

require 'rubygems'
require 'sanitize'

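html = '<a href="http://www.go4expert.com">G4E</a>, <img src="http://www.go4expert.com/logo.jpg">'

## RESTRICTED allows neither links nor images, so only the text is left
## Note: Sanitize 3.0 and later name this method Sanitize.fragment
puts Sanitize.clean(html, Sanitize::Config::RESTRICTED)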

You can also customize the configuration to your needs. Here's how to go about it:

Code: Ruby

require 'rubygems'
require 'sanitize'

html = '<a href="http://www.go4expert.com" id="MyLink" title="Go4expert">G4E</a>, <img src="http://www.go4expert.com/logo.jpg"> <div id="myIdDiv" class="myClass">Test for div tag</div> Send email to <a href="mailto:info@go4expert.com">info@go4expert.com</a> or Download from <a href="ftp://www.go4expert.com">FTP Downloads</a>'

## Now I'll specify the list of allowed tags, tag specific attributes, and tag specific protocols
puts Sanitize.clean(html,
    :elements   => ['a', 'span', 'div'],
    :attributes => {'a' => ['href', 'title'], 'div' => ['class']},
    :protocols  => {'a' => {'href' => ['http']}}
)

## With this config the img tag and both id attributes are removed, and the
## mailto: and ftp: links lose their href because only http is allowed
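If the :protocols option is unclear, the following plain-Ruby sketch shows the kind of check it expresses. This is not how Sanitize implements it internally; ALLOWED_SCHEMES and allowed_url? are hypothetical names used only for this illustration, and the sketch relies on the standard uri library.

```ruby
require 'uri'

# Hypothetical whitelist for illustration; mirrors 'href' => ['http'] above.
ALLOWED_SCHEMES = ['http'].freeze

# Returns true only when the URL has a scheme and that scheme is whitelisted.
# Relative URLs have no scheme, so this sketch rejects them as well.
def allowed_url?(url, allowed_schemes = ALLOWED_SCHEMES)
  scheme = URI.parse(url).scheme
  !scheme.nil? && allowed_schemes.include?(scheme.downcase)
rescue URI::InvalidURIError
  # Anything that cannot be parsed is treated as not allowed.
  false
end

urls = [
  'http://www.go4expert.com',
  'mailto:info@go4expert.com',
  'ftp://www.go4expert.com',
  '/relative/path'
]

urls.each { |url| puts "#{url} -> #{allowed_url?(url)}" }
```

Only the first URL passes the check. Adding 'mailto' or 'ftp' to ALLOWED_SCHEMES would let the other absolute links through, in the same way that extending the :protocols list would in the Sanitize config.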

Try the code for yourself and tweak it to improve your understanding. I hope this was helpful.

jyotimiss123 5Dec2013 13:17

Re: Clean User Generated HTML In Ruby
It's really nice.
