Need regex help

Discussion in 'PHP' started by Nextri, Jun 23, 2007.

  1. Nextri

    Nextri New Member

    Joined:
    Jun 23, 2007
    Messages:
    3
    Likes Received:
    0
    Trophy Points:
    0
    I'm writing a script that fetches a URL, scans the page for links to a specific site, and captures the keywords (the link text) used on those links.

    Regex is not my strong suit. Is anyone able to help me out here?

    First, find the links whose href points to 'domain.com'.

    Then, find the keyword(s) each of those links is anchored with,

    using preg_match_all.
     
  2. pradeep

    pradeep Team Leader

    Joined:
    Apr 4, 2005
    Messages:
    1,646
    Likes Received:
    86
    Trophy Points:
    0
    Occupation:
    Programmer
    Location:
    Kolkata, India
    Home Page:
    Keywords are found in a meta tag, like this one:

    HTML:
     <meta name="keywords" content="php,perl,javascript">
     
    You can extract its content with the following regex:

    PHP:
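     $content = 'the html page content';
     
    // Capture the content attribute of the keywords meta tag (either quote style, optional self-closing slash)
    preg_match('/<meta\s+name=["\']?keywords["\']?\s+content=["\']([^"\']*)["\']\s*\/?>/i', $content, $matches);
     
    // $matches[1] holds the keyword string, e.g. "php,perl,javascript"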
     
     
  3. Nextri

    That's not exactly what I had in mind.

    I don't want to find the meta tag keywords.
    I want to extract all <a> links on a page that point to a given URL.
    Then I want to know which keyword sits between the <a href="http://domain.com"> and the </a>,

    regardless of whether the link has other attributes such as target, class or id, and whether it uses " or ' around the attribute values.
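
    A minimal sketch of one way to approach this with preg_match_all, assuming PHP 7.4 or later. The function name linkTextsForDomain is hypothetical, and the pattern assumes absolute, quoted href values (http://, https:// or //, with an optional www. prefix). Regex-based HTML parsing is fragile, so treat this as a starting point rather than a complete solution:

```php
<?php
// Hypothetical helper: returns the link text of every <a> whose href points at $domain.
function linkTextsForDomain(string $html, string $domain): array
{
    // Escape the domain so the dot is matched literally.
    $host = preg_quote($domain, '~');

    // Group 1: the quote character used around href (" or ').
    // Group 2: everything between the opening <a ...> and the closing </a>.
    // The lookahead stops "domain.com" from matching "domain.com.evil.com".
    $pattern = '~<a\s+(?:[^>]*?\s+)?href\s*=\s*(["\'])\s*(?:https?:)?//(?:www\.)?' . $host
             . '(?=[/?#"\'\s:])[^"\']*\1[^>]*>(.*?)</a\s*>~is';

    preg_match_all($pattern, $html, $matches);

    // Drop nested tags such as <b> and trim whitespace from each link text.
    return array_map(
        static fn(string $text): string => trim(html_entity_decode(strip_tags($text))),
        $matches[2]
    );
}

// Usage: other attributes and either quote style are accepted.
$html = '<a class="ext" href="http://domain.com/page" target="_blank">cheap <b>widgets</b></a>
<a href=\'https://www.domain.com\'>blue gadgets</a>
<a href="http://other.com/">ignored</a>';

print_r(linkTextsForDomain($html, 'domain.com'));
// Array ( [0] => cheap widgets [1] => blue gadgets )
```

    The i modifier makes the match case-insensitive and s lets the link text span several lines. Unquoted or relative href values are not covered by this pattern; for those, PHP's DOMDocument is likely a more robust choice than a regex.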
     
  4. pradeep

