Go4Expert

thekevin07 17Sep2008 07:43

Boost Regex wont return value [c++]
 
Hi

I've spent two hours on a regex and I can't get it to work.

Code:

\<div\\sclass=\"Summary\">(.*?)\<\/div>
I'm using Boost.Regex in C++ with Visual Studio 2008. I can't seem to get the regex above to return the content between <div class="Summary"> and </div>. Below is an example of the input I'm trying to parse:

<div class=\"Summary\">
<b>heading</b>
</br>desc
<div class="anotherDiv">

</div>
</div>

When I run the regex, I should get this back:

Code:

<b>heading</b>
    </br>desc
    <div class="anotherDiv"></div>


Thanks in advance.

oogabooga 17Sep2008 20:45

Re: Boost Regex wont return value [c++]
 
You have an extra backslash in your regex:
\<div\\sclass=\"Summary\">(.*?)\<\/div>
And you shouldn't have the backslashes before the quotes in your html:
<div class=\"Summary\">

thekevin07 17Sep2008 23:35

Re: Boost Regex wont return value [c++]
 
That didn't work. Removing the \ before the " gave a compile error, and removing the \ before the \s didn't return anything.

Thanks.

oogabooga 18Sep2008 00:21

Re: Boost Regex wont return value [c++]
 
Sorry, I didn't realize the html was a string constant.
I assumed you were reading from an actual html file,
which of course shouldn't have backslashes before the quotes.

But the extra backslash in your regex is definitely a problem.
Two backslashes in a row give a literal backslash, so you
have to remove one of them to give a proper \s.

Your problem may be that the period doesn't usually match
newlines. In Perl you make it do so with the /s option.
Perhaps there is something similar in boost.

oogabooga 18Sep2008 00:59

Re: Boost Regex wont return value [c++]
 
It just occurred to me that since this is all taking place in C strings
you may have to double up all (or most) of the backslashes, like this:
\\<div\\sclass=\"Summary\">(.*?)\\<\\/div>

thekevin07 18Sep2008 23:12

Re: Boost Regex wont return value [c++]
 
I tried that. I doubled up everything, and then tried every variation I could think of. Sometimes, though, it does return this part:

Code:

<div class="Summary">
but nothing else, which is really odd.

oogabooga 19Sep2008 01:00

Re: Boost Regex wont return value [c++]
 
One last thought. Try this:

boost::regex re ("<div\\sclass=\"Summary\">(.*?)</div>", boost::regex::mod_s);

If that doesn't work, try double-backslashes before the forward slash.
The mod_s switch ensures that the period can match newlines.
You've spurred me into installing boost (for the regexes if nothing else),
but I haven't done it yet, so I can't test it myself!

oogabooga 19Sep2008 07:16

Re: Boost Regex wont return value [c++]
 
Hey thekev,
I've had dinner, installed boost, and I think I've found the problem.
Try this as your regex string:
".*<div\\sclass=\"Summary\">(.*?)</div>.*"

thekevin07 19Sep2008 07:59

Re: Boost Regex wont return value [c++]
 
Hmm, that didn't work either. Same result as before: it just shows a blank line. Here is some more code. The first part is a function that returns the string I need based on the regex. I have tested this function with about 8 other regexes, so it should work with this one. The first parameter is the actual regex, the second is the buffer string that contains our HTML page, and the third clears out extra tags I don't need, but I have omitted that step so it shouldn't be a problem. The second part is the actual call to the function with the regex we are trying to figure out.

Code:

string getListingData(string regexstring,string string1,string replaceregex)
{
        string content;
        boost::regex expression(regexstring, boost::regex::mod_s);
        boost::smatch match;
        expression.assign(regexstring, boost::regex_constants::icase);
        while(boost::regex_search(string1,match,expression,boost::match_not_dot_newline) )
        {
                content=match[0];
                //used to clear xtra chars on content
                //content=boost::regex_replace(content, boost::regex(replaceregex), "");
                string1 = match.suffix();
        }
        return content;
}

Code:

string tmp;
tmp=getListingData("<div\\sclass=\"Summary\">(.*?)</div>",string1,"");
cout<<tmp<<endl;

I have tried every variation of the regex you gave, but I still get the same problem.

thekevin07 19Sep2008 08:11

Re: Boost Regex wont return value [c++]
 
Just in case, here are my includes as well:

Code:

#include <string>
#include <cstring>
#include <iostream>
#include "curl.h"
#include <sstream>
#include "boost/regex.hpp"
#include <iterator>
#include <boost/algorithm/string/regex.hpp>



All times are GMT +5.5.