string search

abhisheksainiabhishe
I am writing a Perl program which should do the following...

For example, if I have an HTML file like..

<b>this is bold.</b><b>This is
bold too</b>

I have to write the program (without using any HTML parser function) so that it prints.....

<b>this is bold.This is bold too</b>

Basically, it would remove unnecessary tags.

I just have to use regular expressions for it.

My instructor advised me not to read the HTML file line by line, as that would not handle a tag whose opening tag is on one line and whose closing tag is on a later line (as in the file above). It was suggested that I put the whole HTML file into one scalar variable.
I have now made the program put the whole HTML file into one scalar variable. My question is: how do I search for several instances of <b> and </b> tags in that scalar variable? Should I read it character by character? I am very confused on this part. Please advise me. Thanks!
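
For what it's worth, reading character by character should not be necessary: a regex match with the /g modifier can walk through the whole scalar and find every occurrence. Below is a minimal sketch of that idea. The sample text and the names $sample, $fh, $html and @tags are made up for illustration, and an in-memory filehandle stands in for the real HTML file.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the real HTML file.
my $sample = "<b>this is bold.</b>\n<b>This is\nbold too</b>";

# Slurp everything into one scalar: with $/ undefined, <$fh> returns
# the whole input instead of a single line.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $html = do { local $/; <$fh> };
close $fh;

# In list context, a /g match returns every match at once.
my @tags = $html =~ m{(</?b>)}gi;
print scalar(@tags), " bold tags found\n";

# In scalar context, /g moves forward one match per iteration;
# pos() reports the offset just past the current match.
while ($html =~ m{(</?b>)}gi) {
    printf "%-4s ends at offset %d\n", $1, pos($html);
}
```

The list-context form is handy for counting, while the while loop is the one to use when the position of each tag matters.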

abhisheksainiabhishe

So far I am able to remove the bold tags as.....




by using...
$allHtmlDocument =~ s/$endBoldTag(\s*)$startBoldTag//gi;
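
To show what that substitution does, here is a small self-contained sketch. The post does not show how $startBoldTag and $endBoldTag were set, so the values below are assumptions, and the sample string is made up.

```perl
use strict;
use warnings;

# Assumed definitions; the post does not show the real ones.
my $startBoldTag = '<b>';
my $endBoldTag   = '</b>';

# Hypothetical input with the closing and opening tags on different lines.
my $allHtmlDocument = "<b>this is bold.</b>\n<b>This is bold too</b>";

# \s* also matches newlines, so a </b> ... <b> pair split across lines
# is removed together with the whitespace between the two tags.
$allHtmlDocument =~ s/$endBoldTag(\s*)$startBoldTag//gi;

print "$allHtmlDocument\n";   # <b>this is bold.This is bold too</b>
```

Because the whole file is in one scalar, the line break between the two tags is just another whitespace character to the regex. The parentheses around \s* capture the whitespace, but the replacement is empty, so it is discarded.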

Now the problem is...

If I have <b>abcd</b><i><b>efgh</i></b>

and I want to make it like


then I still need to remove the bold tags (as there are only tags between them), but I also need to keep the tags in between. How would I capture those tags? I am unable to figure out any way since I am not reading the document line by line.
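
One possible way to do this (a sketch only, not necessarily the solution that was eventually found) is to capture whatever sits between the closing and opening bold tags and put it back in the replacement. The code assumes the wanted result is <b>abcd<i>efgh</i></b> and that bold tags are written as plain <b> and </b> with no attributes. The name $html is made up for illustration.

```perl
use strict;
use warnings;

my $html = '<b>abcd</b><i><b>efgh</i></b>';

# Between </b> and the next <b>, allow only whitespace or tags other
# than <b> and </b>. The lookahead (?!/?b>) rejects bold tags, so the
# match cannot run across a different bold pair. The captured run of
# tags is kept by writing $1 back as the replacement.
$html =~ s{</b>((?:\s|<(?!/?b>)[^>]*>)*)<b>}{$1}gi;

print "$html\n";   # <b>abcd<i>efgh</i></b>
```

If there is any plain text between the two bold tags, the pattern does not match and both tags are left alone.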

abhisheksainiabhishe
I got it too.
Will post the solution sometime.

Thanks anyway!

This thread can be closed now.