Strip/sanitize HTML with Perl
Introduction
Sanitizing HTML means removing unwanted HTML elements from input; it does not validate the HTML. Many sites let you post comments using only a few elements such as <a>, <b> and <i>, and strip every other tag automatically. You may instead want to remove all HTML tags completely, or allow certain tags only under conditions: for example, an <img> tag whose src attribute must be a relative URL, or <span> tags that carry no style attribute.

Solution
Several modules on CPAN handle this, including HTML::Sanitizer, HTML::Strip and HTML::Scrubber. I personally prefer HTML::Scrubber: it is easy to use, supports complex conditions when you need them, and is fast.

The code
Example: We want to strip all HTML from a string or file
Code: Perl
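A minimal sketch of this case, assuming HTML::Scrubber is installed from CPAN. The sample string and the file name in the comment are hypothetical and only there for illustration. It relies on the module's default behaviour of denying every tag when no rules are given.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Hypothetical sample input, for illustration only.
my $html = '<p>Hello <b>world</b>, visit <a href="http://example.com/">this site</a>.</p>';

# With no arguments every tag is denied, so only the text content is kept.
my $scrubber = HTML::Scrubber->new;

my $text = $scrubber->scrub($html);
print "$text\n";

# For a file, scrub_file() reads and scrubs it in one call
# ("page.html" is a hypothetical path):
# my $file_text = $scrubber->scrub_file('page.html');
```

Note that entities such as &amp; in the input are passed through rather than decoded, so the result is still meant to be shown as HTML text.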
Wasn't that easy? Let's take a look at some more interesting examples.
Example: Strip <script> and <style> tags
Code: Perl
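One way to do this, sketched under the assumption that every other tag should pass through: switch the default to "allow" and deny the two unwanted elements explicitly. The sample markup is hypothetical.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Hypothetical sample input, for illustration only.
my $html = <<'HTML';
<style type="text/css">p { color: red }</style>
<script type="text/javascript">alert("evil");</script>
<p>Some <b>bold</b> text.</p>
HTML

my $scrubber = HTML::Scrubber->new(
    default => 1,                     # allow tags by default ...
    deny    => [qw(script style)],    # ... except these two
);

my $clean = $scrubber->scrub($html);
print $clean;
```

Because no default attribute rules are given here, attributes on the allowed tags are not kept. As far as I can tell from the module's behaviour, the text inside <script> and <style> is dropped along with the tags, but it is worth checking the output on your installed version.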
Example: Anchor tags allowed only if they contain relative URLs
Code: Perl
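A possible sketch using a per-tag rule, where the href attribute is kept only if it matches a regular expression. The regular expression is my own assumption of what counts as "relative" (no leading scheme such as http: or javascript:, and no leading //), and the sample markup is hypothetical.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Hypothetical sample input, for illustration only.
my $html = '<a href="/about.html">About</a> <a href="http://example.com/">External</a>';

my $scrubber = HTML::Scrubber->new(
    rules => [
        a => {
            # Keep href only when it does not start with a scheme
            # ("http:", "javascript:", ...) or with "//".
            href => qr{^(?!\s*(?:[a-z][a-z0-9+.\-]*:|//))}i,
            '*'  => 0,    # drop every other attribute on <a>
        },
    ],
);

my $clean = $scrubber->scrub($html);
print "$clean\n";
```

A rule like this filters the attribute, not the element: an href that fails the check is removed, while the link text stays in the output. If the whole anchor has to disappear, the result needs further processing.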
References
http://search.cpan.org
Re: Strip/sanitize HTML with Perl
Nominated this article for Article of the Month - May 2009