I am writing a Perl program that should do the following.
For example, if I have an HTML file like this:
<b>this is bold.</b><b>This is
bold too</b>
I have to write the program (without using any HTML parser functions) so that it prints it like this:
<b>this is bold.This is bold too</b>
Basically, it should remove unnecessary tags, such as a closing </b> that is immediately followed by an opening <b>.
I am supposed to use only regular expressions for this.
My instructor advised me not to read the HTML file line by line, because that would not handle a tag that opens on one line and closes on the next (as in the file above). Instead, I was advised to put the whole HTML file into one scalar variable.
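A common Perl idiom for this is to undefine the input record separator `$/` locally, so that a single read returns the whole file. The sketch below is only an illustration: the helper name `slurp` is hypothetical, and it reads from an in-memory filehandle so it runs without a real file. Opening an actual file with `open my $fh, '<', $path` works the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from a filehandle into one scalar.
sub slurp {
    my ($fh) = @_;
    local $/;              # undef record separator: <$fh> returns the whole input
    return scalar <$fh>;   # $/ is restored automatically when the sub returns
}

# In-memory "file" standing in for the real HTML file (assumption for the demo).
my $source = "<b>this is bold.</b><b>This is\nbold too</b>\n";
open my $fh, '<', \$source or die "cannot open: $!";
my $allHtmlDocument = slurp($fh);
close $fh;

print $allHtmlDocument;
```

Because `local` restores `$/` when `slurp` returns, the rest of the program still reads line by line as usual.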
I have now written the program so that it reads the whole HTML file into one scalar variable. My question is: how do I find all the instances of <b> and </b> in that scalar? Should I read it character by character? I am very confused about this part. Please advise. Thanks!
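There is no need to go character by character. A match with the `/g` modifier in a `while` loop resumes where the previous match ended, so the loop visits every tag in the scalar, and `$-[0]` gives the offset where each match starts. In list context, the same `/g` match returns all matches at once. A minimal sketch, reusing the variable name `$allHtmlDocument` with assumed sample content:

```perl
use strict;
use warnings;

my $allHtmlDocument = "<b>this is bold.</b><b>This is\nbold too</b>";

# Visit every <b> or </b> tag; /g continues from the end of the previous match,
# /i makes it case-insensitive so <B> and </B> are found as well.
my @found;
while ($allHtmlDocument =~ m{(</?b>)}gi) {
    my $tag   = $1;
    my $start = $-[0];    # offset where this match began
    push @found, "$tag at $start";
    print "$tag at offset $start\n";
}

# Alternatively, collect all matching tags at once in list context.
my @tags = $allHtmlDocument =~ m{</?b>}gi;
print scalar(@tags), " bold tags found\n";
```

The newline inside the string is just another character here, so a tag that starts on one line and ends on the next needs no special handling.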
So far, I am able to remove the redundant bold tags like this:
$allHtmlDocument =~ s/$endBoldTag(\s*)$startBoldTag//gi;
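The substitution relies on `$endBoldTag` and `$startBoldTag` being defined elsewhere. Below is a runnable version, assuming they hold `qr//` patterns for `</b>` and `<b>`. Because `\s*` also matches newlines, the pair is removed even when the closing and opening tags sit on different lines, which is why having the whole file in one scalar helps. The unused capture `(\s*)` is written as plain `\s*`; the result is the same.

```perl
use strict;
use warnings;

# Assumed definitions: the post does not show how these variables are built.
my $endBoldTag   = qr{</b>}i;
my $startBoldTag = qr{<b>}i;

# The closing and opening tags are on different lines in this sample.
my $allHtmlDocument = "<b>this is bold.</b>\n<b>This is bold too</b>";

# Delete every "</b>...<b>" pair separated only by whitespace (newlines included).
$allHtmlDocument =~ s/$endBoldTag\s*$startBoldTag//gi;

print "$allHtmlDocument\n";
```

Note that the whitespace between the tags is deleted as well, so words on either side can end up joined.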
Now the problem is this.
If I have <b>abcd</b><i><b>efgh</i></b>
and I want to turn it into
<b>abcd<i>efgh</i></b>
then I still need to remove the inner </b> and <b> (because only other tags sit between them), but I also need to keep the tags between them. How would I capture those tags? I can't figure out a way to do it now that I am not reading the document line by line.
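One way to keep whatever sits between the two bold tags is to capture it and put it back in the replacement with `$1`. This is a sketch under one assumption: "only tags between them" means a run of tags other than `<b>`/`</b>`, optionally separated by whitespace. The negative lookahead `(?!/?b\b)` stops the inner group from consuming a bold tag while still allowing tags such as `<br>`. Like any regex-only treatment of HTML, it does not handle comments or attribute values that contain `>`.

```perl
use strict;
use warnings;

my $allHtmlDocument = "<b>abcd</b><i><b>efgh</i></b>";

# Match </b>, then capture any run of non-bold tags and whitespace, then <b>.
# (?!/?b\b) rejects <b> and </b> but still allows tags such as <br> or <body>.
# The two bold tags are dropped; the captured text in $1 is put back.
$allHtmlDocument =~ s{</b>((?:\s*<(?!/?b\b)[^>]*>)*\s*)<b>}{$1}gi;

print "$allHtmlDocument\n";
```

Unlike the plain `\s*` substitution, this version keeps any whitespace between the tags, because it is part of the captured text.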
I got this part working too.
I will post the solution later.
This thread can be closed now.