Boost Regex won't return value [C++]

Discussion in 'C++' started by thekevin07, Sep 17, 2008.

  1. thekevin07

    thekevin07 New Member

    Joined:
    Sep 13, 2008
    Messages:
    29
    Likes Received:
    0
    Trophy Points:
    0
    Hi

    I've spent two hours on a regex and I can't get it to work.

    Code:
    \<div\\sclass=\"Summary\">(.*?)\<\/div>
    I'm using Boost.Regex in C++ with Visual Studio 2008. I can't seem to get the regex above to return the content between <div class="Summary"> and </div>. Below is an example of what I'm trying to match:

    <div class=\"Summary\">
    <b>heading</b>
    </br>desc
    <div class="anotherDiv">

    </div>
    </div>

    When I run the regex, I should get this back:

    Code:
    <b>heading</b>
         </br>desc
         <div class="anotherDiv"></div>
    

    Thanks in advance.
     
  2. oogabooga

    oogabooga New Member

    Joined:
    Jan 9, 2008
    Messages:
    115
    Likes Received:
    11
    Trophy Points:
    0
    You have an extra backslash in your regex:
    \<div\\sclass=\"Summary\">(.*?)\<\/div>
    And you shouldn't have the backslashes before the quotes in your html:
    <div class=\"Summary\">
     
  3. thekevin07

    That didn't work. Removing the \ before the " gave a compile error, and removing the \ before the \s didn't return anything.

    Thanks.
     
  4. oogabooga

    Sorry, I didn't realize the html was a string constant.
    I assumed you were reading from an actual html file,
    which of course shouldn't have backslashes before the quotes.

    But the extra backslash in your regex is definitely a problem.
    Two backslashes in a row give a literal backslash, so you
    have to remove one of them to give a proper \s.

    Your problem may be that the period doesn't usually match
    newlines. In Perl you make it do so with the /s option.
    Perhaps there is something similar in boost.
     
  5. oogabooga

    It just occurred to me that since this is all taking place in C strings
    you may have to double up all (or most) of the backslashes, like this:
    \\<div\\sclass=\"Summary\">(.*?)\\<\\/div>
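To make the two layers of escaping concrete, here is a minimal sketch in standard C++ (no Boost needed). The function names are invented for illustration. It shows the characters the regex engine actually receives for a raw string literal (C++11 and later, so not an option in Visual Studio 2008), for an ordinary literal escaped once, and for a literal where the backslash before the s was doubled once too often.

```cpp
#include <string>

// The pattern exactly as the regex engine should see it. Inside a raw
// string literal R"(...)" the compiler applies no escape processing.
inline std::string pattern_raw() {
    return R"(<div\sclass="Summary">(.*?)</div>)";
}

// The same pattern as an ordinary string literal: the compiler turns
// "\\" into one backslash and "\"" into one double quote.
inline std::string pattern_escaped() {
    return "<div\\sclass=\"Summary\">(.*?)</div>";
}

// One doubling too many: the engine now receives \\s, which means
// "a literal backslash followed by the letter s", not whitespace.
inline std::string pattern_over_escaped() {
    return "<div\\\\sclass=\"Summary\">(.*?)</div>";
}
```

Note that < and / are ordinary characters in a regex and need no backslash at all. Boost's Perl syntax documents \< as a start-of-word assertion, so a pattern that begins with \<div would probably not match a literal <div.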
     
  6. thekevin07

    I tried that. I doubled everything, and I tried every variation I could think of. Sometimes it does return this part:

    Code:
    <div class="Summary">
    but nothing else, which is really odd.
     
  7. oogabooga

    One last thought. Try this:

    boost::regex re ("<div\\sclass=\"Summary\">(.*?)</div>", boost::regex::mod_s);

    If that doesn't work, try double-backslashes before the forward slash.
    The mod_s switch ensures that the period can match newlines.
    You've spurred me into installing boost (for the regexes if nothing else),
    but I haven't done it yet, so I can't test it myself!
     
  8. oogabooga

    Hey thekev,
    I've had dinner, installed boost, and I think I've found the problem.
    Try this as your regex string:
    ".*<div\\sclass=\"Summary\">(.*?)</div>.*"
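A likely reason the leading and trailing .* made a difference in this test: regex_match succeeds only if the pattern accounts for the entire input, while regex_search accepts a match anywhere inside it. The thread does not say which function was called here, so that part is a guess. The sketch below uses std::regex from the C++11 standard library in place of boost::regex to stay dependency-free; the two functions have the same names and the same whole-string versus substring behaviour in both libraries. The helper names are made up.

```cpp
#include <regex>
#include <string>

// True only if the pattern accounts for every character of the input.
inline bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// True if the pattern matches some substring of the input.
inline bool matches_anywhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}
```

With regex_search the padding is unnecessary, and the sub-match for the parentheses is the same either way.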
     
  9. thekevin07

    Hmm, that didn't work either. Same result as before: it just shows a blank line. Here is some more code. The first part is a function that returns the string I need based on the regex. I have tested this function with about 8 other regexes, so it should work with this one. The first parameter is the regex, the second is the buffer string that contains our HTML page, and the third is a regex for stripping extra tags I don't need; I have commented that step out, so it shouldn't be a problem. The second part is the actual call to the function with the regex we are trying to figure out.

    Code:
    string getListingData(string regexstring,string string1,string replaceregex)
    {
    	string content;
    	boost::regex expression(regexstring, boost::regex::mod_s);
    	boost::smatch match;
    	expression.assign(regexstring, boost::regex_constants::icase);
    	while(boost::regex_search(string1,match,expression,boost::match_not_dot_newline) )
    	{
    		content=match[0];
    		//used to clear xtra chars on content
    		//content=boost::regex_replace(content, boost::regex(replaceregex), "");
    		string1 = match.suffix();
    	}
    	return content;
    }
    
    Code:
    string tmp;
    tmp=getListingData("<div\\sclass=\"Summary\">(.*?)</div>",string1,"");
    cout<<tmp<<endl;
    
    I have tried every variation of the regex you gave, but I still get the same problem.
     
  10. thekevin07

    Just in case, here are my includes as well:

    Code:
    #include <string>
    #include <cstring>
    #include <iostream>
    #include "curl.h"
    #include <sstream>
    #include "boost/regex.hpp"
    #include <iterator>
    #include <boost/algorithm/string/regex.hpp>
     
  11. oogabooga

    Have you tried adding .* to the beginning and end of the regex string?
    That's what my last post was about (I should have emphasized that).
    Here's the regex string again:
    ".*<div\\sclass=\"Summary\">(.*?)</div>.*"
     
  12. thekevin07

    It seems to be working now. I changed some stuff with the newline operator and it works great. I never would have gotten it without your help. Thanks, man.
     
  13. oogabooga

    No problem.
    Thanks for introducing me to boost!
     