Hi, I've spent 2 hours on a regex and I can't get it to work:
Code:
\<div\\sclass=\"Summary\">(.*?)\<\/div>
I'm using Boost.Regex in C++ with Visual Studio 2008. I can't seem to get the regex above to return the content between <div class="Summary"> and </div>. Below is an example of what I'm trying to do:
<div class=\"Summary\">
<b>heading</b> </br>desc
<div class="anotherDiv"> </div>
</div>
When I run the regex, I should get this back:
Code:
<b>heading</b> </br>desc <div class="anotherDiv"></div>
Thanks in advance.
You have an extra backslash in your regex: \<div\\sclass=\"Summary\">(.*?)\<\/div> Also, you shouldn't have the backslashes before the quotes in your HTML: <div class=\"Summary\">
That didn't work. Removing the \ before the " gave a compile error, and removing the \ before the \s didn't return anything. Thanks.
Sorry, I didn't realize the html was a string constant. I assumed you were reading from an actual html file, which of course shouldn't have backslashes before the quotes. But the extra backslash in your regex is definitely a problem. Two backslashes in a row give a literal backslash, so you have to remove one of them to give a proper \s. Your problem may be that the period doesn't usually match newlines. In Perl you make it do so with the /s option. Perhaps there is something similar in boost.
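A quick way to see the period/newline behaviour is a small sketch using the C++11 standard library's std::regex. This swaps in std::regex for Boost (std::regex was modelled on Boost.Regex via TR1), so it won't compile on VS2008 as-is. In std::regex's default ECMAScript grammar there is no equivalent of Perl's /s, and `.` never matches a line break; the class `[\s\S]` ("whitespace or non-whitespace") is a common stand-in that matches any character. The helper name `found` is made up for the illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`
// (std::regex, default ECMAScript grammar).
bool found(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern));
}

// found("a\nb", "a.b")         -> false: '.' does not match '\n'
// found("a\nb", "a[\\s\\S]b")  -> true:  [\s\S] matches any character
```

In Boost the same effect is normally obtained with a flag rather than by rewriting the pattern, which is what the mod_s suggestion in this thread is about.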
It just occurred to me that since this is all taking place in C strings you may have to double up all (or most) of the backslashes, like this: \\<div\\sclass=\"Summary\">(.*?)\\<\\/div>
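The backslash question comes down to two layers of escaping. The C++ compiler processes the string literal first, turning each `\\` into a single backslash character and each `\"` into a plain quote; the regex engine then sees whatever is left. So `"\\s"` in source is exactly what delivers the regex token `\s`. By contrast, `\<` and `\/` are not recognised C++ escape sequences (compilers typically warn about them), and doubling them to `\\<` hands Boost the token `\<`, which, according to the Boost.Regex Perl-syntax documentation, is a start-of-word assertion rather than a literal `<`. `<` and `/` need no escaping at all. This C++11 sketch, with a made-up `countBackslashes` helper, just checks what the literals actually contain.

```cpp
#include <cstddef>
#include <string>

// The pattern as written in C++ source. After the compiler processes the
// escapes, the string holds: <div\sclass="Summary">(.*?)</div>
const std::string kPattern = "<div\\sclass=\"Summary\">(.*?)</div>";

// Hypothetical helper: counts the backslash characters actually stored in `s`.
std::size_t countBackslashes(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if (c == '\\')
            ++n;
    return n;
}
```

Here `countBackslashes(kPattern)` returns 1 (only the one before `s`), while `countBackslashes("\\\\s")` returns 2, which the regex engine would read as an escaped literal backslash followed by the letter `s`.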
I tried that. I doubled up everything, and also just some of them; I tried every variation I could think of. Sometimes, though, it does return this part:
Code:
<div class="Summary">
but nothing else, which is really odd.
One last thought. Try this: boost::regex re ("<div\\sclass=\"Summary\">(.*?)</div>", boost::regex::mod_s); If that doesn't work, try double-backslashes before the forward slash. The mod_s switch ensures that the period can match newlines. You've spurred me into installing boost (for the regexes if nothing else), but I haven't done it yet, so I can't test it myself!
Hey thekev, I've had dinner, installed boost, and I think I've found the problem. Try this as your regex string: ".*<div\\sclass=\"Summary\">(.*?)</div>.*"
Hmm, that didn't work either; same result as before: it just shows a blank line. Here is some more code. The first part is a function that returns the string I need based on the regex. I have tested this function with about 8 other regexes, so it should work with this one. The first parameter is the actual regex, the second is the buffer string that contains our HTML page, and the third clears out extra tags I don't need, but I have omitted that, so it shouldn't be a problem. The second part is the actual call to the function with the regex we are trying to figure out.
Code:
string getListingData(string regexstring, string string1, string replaceregex)
{
    string content;
    boost::regex expression(regexstring, boost::regex::mod_s);
    boost::smatch match;
    expression.assign(regexstring, boost::regex_constants::icase);
    while (boost::regex_search(string1, match, expression, boost::match_not_dot_newline))
    {
        content = match[0];
        // used to clear extra chars on content
        //content = boost::regex_replace(content, boost::regex(replaceregex), "");
        string1 = match.suffix();
    }
    return content;
}
Code:
string tmp;
tmp = getListingData("<div\\sclass=\"Summary\">(.*?)</div>", string1, "");
cout << tmp << endl;
I have tried every variation of the regex you gave, but it's still the same problem.
Just in case, here are my includes as well:
Code:
#include <string>
#include <cstring>
#include <iostream>
#include "curl.h"
#include <sstream>
#include "boost/regex.hpp"
#include <iterator>
#include <boost/algorithm/string/regex.hpp>
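Reading the posted getListingData(), three things look likely to produce the blank line. First, `expression.assign(regexstring, boost::regex_constants::icase)` rebuilds the expression with icase only, so the mod_s passed to the constructor is lost. Second, boost::match_not_dot_newline is passed to regex_search; as I read the Boost.Regex documentation, that is exactly the flag that stops `.` from matching a newline (mod_s would override it, but it is gone). With the Summary div spanning several lines, nothing matches and content stays empty. Third, `match[0]` is the whole match including the tags; the text between them is capture group `match[1]`. In Boost terms the fix would presumably be to set every option once, something like `boost::regex expression(regexstring, boost::regex::perl | boost::regex::icase | boost::regex::mod_s);`, drop match_not_dot_newline, and read match[1]. The sketch below ports that idea to C++11 std::regex (not available on VS2008) so it runs standalone; because std::regex has no mod_s, the pattern uses `[\s\S]` instead of `.`, the replaceregex parameter is left out, and `kSummaryPattern` is a made-up name.

```cpp
#include <regex>
#include <string>

// Pattern in C++ source form; [\s\S] stands in for a '.' that may cross newlines,
// since std::regex's ECMAScript grammar has no mod_s / (?s) equivalent.
const std::string kSummaryPattern = "<div\\sclass=\"Summary\">([\\s\\S]*?)</div>";

// Hypothetical std::regex port of getListingData() from the thread (replaceregex omitted).
// Returns capture group 1 of the LAST match, like the original loop, or "" if none.
std::string getListingData(const std::string& pattern, const std::string& html)
{
    // Every syntax option is set here, once; nothing later overwrites it
    // (the Boost version lost mod_s when assign() was called with icase alone).
    const std::regex expression(pattern, std::regex::ECMAScript | std::regex::icase);

    std::string content;
    // sregex_iterator steps through successive non-overlapping matches, much like the
    // original regex_search + suffix() loop, but with default match flags.
    for (std::sregex_iterator it(html.begin(), html.end(), expression), end; it != end; ++it)
        content = (*it)[1].str();  // group 1: the text between the tags, not the tags themselves
    return content;
}
```

One caveat with the sample from the question: the lazy `*?` stops at the first `</div>`, which is the inner anotherDiv's closing tag, so this returns the text up to `<div class="anotherDiv"> ` rather than the full expected output. A regex can't count nesting levels, so reaching the matching close tag of nested divs needs more than a single pattern like this.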
Have you tried adding .* to the beginning and end of the regex string? That's what my last post was about (I should have emphasized that). Here's the regex string again: ".*<div\\sclass=\"Summary\">(.*?)</div>.*"
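The `.*` wrappers matter for one call but not the other. Both Boost and the C++11 standard library offer regex_match, which succeeds only if the pattern consumes the entire input, and regex_search, which succeeds if the pattern matches any substring. Since getListingData() calls regex_search, the bare pattern can already match without the wrappers, provided `.` is allowed to cross any newlines; with the wrappers, match[0] can end up spanning most or all of the input, so the captured group match[1] is the part worth reading. This sketch uses std::regex (C++11, not VS2008) with made-up helper names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: regex_match succeeds only if `pattern` matches ALL of `text`.
bool matchesWhole(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: regex_search succeeds if `pattern` matches ANY part of `text`.
bool matchesSomewhere(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern));
}
```

With a newline-free input such as `<p>x</p><div class="Summary">desc</div><p>y</p>`, matchesWhole() fails for the bare pattern but succeeds for the `.*`-wrapped one, while matchesSomewhere() succeeds for the bare pattern.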
It seems to be working now: I changed some stuff with the newline option and it works great. I never would have gotten it without your help. Thanks, man!