I am writing a Perl program that should do the following. For example, if I have an HTML file like <b>this is bold.</b><b>This is bold too</b>, I have to write a program (without using any HTML parser functions) that prints it as <b>this is bold.This is bold too</b>. Basically, it should remove unnecessary tags, and I may only use regular expressions for it.

My instructor advised me not to read the HTML file line by line, because that would not handle the case where a tag opens on one line and its closing tag is on the next line (as in the file above). It was suggested that I put the whole HTML file into one scalar variable, and my program now does that.

My question is: how do I search for several instances of the <b> and </b> tags in that scalar variable? Should I read it character by character? I am very confused about this part. Please advise. Thanks!
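Reading character by character should not be necessary. A minimal sketch, assuming the whole file is already in one scalar: a match with the /g modifier inside a while loop visits every <b> and </b> tag in turn, and pos() reports where the previous match ended. The variable names ($html, @found) and the sample string are hypothetical stand-ins, not taken from the original program.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slurped file. A real script might fill it
# with something like:  my $html = do { local $/; <$fh> };
my $html = "<b>this is bold.</b>\n<b>This is bold too</b>";

# In scalar context, m//g resumes where the previous match ended, so the
# loop body runs once per bold tag, even when tags sit on different lines.
my @found;
while ($html =~ m{(</?b\s*>)}gi) {
    my $tag   = $1;
    my $start = pos($html) - length $tag;   # offset where this tag begins
    push @found, [$tag, $start];
    print "$tag at offset $start\n";
}

print scalar(@found), " bold tags found\n";
```

The same /g modifier on a substitution (s///g) replaces every occurrence in one pass, so a loop like this is mostly useful for inspecting what is in the string.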
Hi, so far I am able to merge bold tags, turning <b>abcd</b><b>efgh</b><b>ijkl</b> into <b>abcdefghijkl</b>, by using $allHtmlDocument =~ s/$endBoldTag(\s*)$startBoldTag//gi; Now the problem is this: if I have <b>abcd</b><i><b>efgh</i></b> and I want to turn it into <b>abcd<i>efgh</i></b>, then I still need to remove the inner bold tags (as there are only tags between them), but I also need to keep the tags between them. How would I capture those tags? I am unable to figure out a way, since I am not reading the document line by line. Thanks!
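One possible approach, sketched under the assumption that only tags and whitespace (no text) may sit between the closing and the reopening bold tag: capture that run in a group and put it back in the replacement. The names $html, $between and $merged are hypothetical; the negative lookahead (?!/?b\b) keeps the run from swallowing another <b> or </b> while still allowing tags such as <br>.

```perl
use strict;
use warnings;

my $html = '<b>abcd</b><i><b>efgh</i></b>';

# Zero or more tags that are not <b> or </b>, each optionally preceded by
# whitespace, followed by optional trailing whitespace.
my $between = qr{(?:\s*<(?!/?b\b)[^>]*>)*\s*}i;

# Drop the </b> ... <b> pair but keep whatever was captured between them.
(my $merged = $html) =~ s{</b\s*>($between)<b\s*>}{$1}gi;

print "$merged\n";   # prints <b>abcd<i>efgh</i></b>
```

Because the captured run is reinserted as $1, whitespace between the tags is preserved rather than deleted. This sketch does not handle attribute values that contain a ">" character, and it leaves the tags alone whenever any text appears between </b> and <b>.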