Regular Expressions in MySQL

Discussion in 'MySQL' started by pradeep, Dec 25, 2006.

  1. pradeep

    pradeep Team Leader

    Joined:
    Apr 4, 2005
    Messages:
    1,645
    Likes Received:
    87
    Trophy Points:
    0
    Occupation:
    Programmer
    Location:
    Kolkata, India
    Home Page:
    http://blog.pradeep.net.in
    A very interesting and useful capability of MySQL is to incorporate Regular Expressions (regex) in SQL queries. The regular expression support in MySQL is extensive. Let's take a look at using Regular Expressions in queries and the supported metacharacters.

    Using Regular Expressions in queries



    A simple example of using Regular Expressions in a SQL query would be to select all names from a table that start with 'A'.
    Code:
    SELECT name FROM employees WHERE name REGEXP '^A'
    A slight modification in the above example to look for names starting with 'A' or 'D' or 'F' will look like this.
    Code:
    SELECT name FROM employees WHERE name REGEXP '^(A|D|F)'
    If we want to select all names ending with 'P', then the SQL query goes like this
    Code:
    SELECT name FROM employees WHERE name REGEXP 'P$'
    We can use much complex patterns in our SQL queries, but first let's have a look at various MySQL Regular Expression metacharacters.

    Regular Expression Metacharacters



    *
    Matches zero or more instances of the string preceding it​

    +
    Matches one or more instances of the string preceding it​

    ?
    Matches zero or one instances of the string preceding it​

    .
    Matches any single character, except a newline​

    [xyz]
    Matches any of x, y, or z (match one of enclosed characters)​

    [^xyz]
    Matches any character not enclosed ​

    [A-Z]
    Matches any uppercase letter​

    [a-z]
    Matches any lowercase letter​

    [0-9]
    Matches any digit​

    ^
    Anchors the match from the beginning​

    $
    Anchors the match to the end​

    |
    Separates alternatives​

    {n,m}
    String must occur at least n times, but not more than m times​

    {n}
    String must occur exactly n times​

    {n,}
    String must occur at least n times​

    [[:<:]]
    Matches beginning of words​

    [[:>:]]
    Matches ending of words​

    [:class:]
    match a character class i.e.,

    [:alpha:] for letters
    [:space:] for whitespace
    [:punct:] for punctuation
    [:upper:] for upper case letters
    Extras

    MySQL interprets a backslash (\) character as an escape character. To use a backslash in a regular expression, you must escape it with another backslash (\\).

    Whether the Regular Expression match is case sensitive or otherwise is decided by the collation method of the table. If your collation method name ends with ci then the comparison/match is case-insensitive, else if it end in cs then the match is case sensitive.

    Examples



    Checking only for numbers
    Code:
    SELECT age FROM employees WHERE age REGEXP '^[0-9]+$'
    
    /* starts, ends and contains numbers */
    
    
    Contains a specific word, for example the skill PHP in skill sets
    Code:
    SELECT name FROM employees WHERE skill_sets REGEXP '[[:<:]]php[[:>:]]'
    Fetching records where employees have entered their 10-digit mobile number as the contact number.
    Code:
    SELECT name FROM employees WHERE contact_no REGEXP '^[0-9]{10}$'
     
  2. pradeep

    pradeep Team Leader

    Joined:
    Apr 4, 2005
    Messages:
    1,645
    Likes Received:
    87
    Trophy Points:
    0
    Occupation:
    Programmer
    Location:
    Kolkata, India
    Home Page:
    http://blog.pradeep.net.in
    We can also use RLIKE, it works just like REGEXP.
     
  3. ash0279

    ash0279 New Member

    Joined:
    Jan 24, 2008
    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    0
    Occupation:
    Database Administrator; Data Architect
    Location:
    Manila Philippines
    this is great!
    from my local test results, it seems the more complex the search pattern goes, the faster it performs..
    for the developers standpoint, i think this would really help them a lot since regexp are widely used in open source devt tools, esp perl.
    my 2 cents..
     
  4. malu

    malu New Member

    Joined:
    Feb 22, 2008
    Messages:
    5
    Likes Received:
    0
    Trophy Points:
    0
    Can someone tell me why would this be true?

    searching text: [[:<:]]word[[:>:]]

    DB: word-word2

    Thanks :)
     
  5. pradeep

    pradeep Team Leader

    Joined:
    Apr 4, 2005
    Messages:
    1,645
    Likes Received:
    87
    Trophy Points:
    0
    Occupation:
    Programmer
    Location:
    Kolkata, India
    Home Page:
    http://blog.pradeep.net.in
    [[:<:]]word[[:>:]] is a word boundary match!

    These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
     
  6. mrcosgrove

    mrcosgrove New Member

    Joined:
    Jun 1, 2009
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0
    I dont know if you can help me or not but what i am trying to do is search database for words containing certain letters eg. i fI entered TEST it would find words contianing those letters (or a subset of them)

    so how would I write that. I tried:
    regexp '[t{0,2}e{0,1},s{0,1}]' but that isnt working it says that its the wrong syntax.

    So Im wondering how I would right an expression to do this?

    Any help would be appreciated.
     
  7. pradeep

    pradeep Team Leader

    Joined:
    Apr 4, 2005
    Messages:
    1,645
    Likes Received:
    87
    Trophy Points:
    0
    Occupation:
    Programmer
    Location:
    Kolkata, India
    Home Page:
    http://blog.pradeep.net.in
    Try
    Code:
    regexp 't{0,2}e{0,1}s{0,1}'
     
  8. gcue

    gcue New Member

    Joined:
    Sep 23, 2009
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0
    i am trying to match string A to these scenarios (sql table values)

    string A = "G283H88620TS"

    matching sql table values
    MX0G2H883H7426288620TS (contains the string but has characters before, in the middle, and at the end)
    UIERG283H88620TSERU (contains the string but has characters before and at the end)
    G283H88620TSKDJF (contains the string but has characters at the end
    KEKEG283H88620TS (contains the string but has characters at the begining

    unfortunately there is no pattern as to where these interrupting characters reside

    (this only works for the last 3 scenarios but not the first)

    "SELECT * FROM aInventoryInfo WHERE SerialNumber LIKE '%" & $serial & "%'"
     
  9. gicprope

    gicprope New Member

    Joined:
    Oct 9, 2009
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0
    Relational Database Design is one of the most powerful ways to ensure data integrity and a great way to kick-off any project. Very often the first thing developers do when starting a new project, or stub-project, is to design the database. This way the structure of the application is already in place and we just have to fill in the pieces with some server-side code. I’ve found when adding relational constraints to your database design you add in a very powerful error reporting tool that will let you know during the development process that you have allowed something to happen that shouldn’t have. In this article, I go through, step by step, showing how to set up a simple relational database and discuss the benefits that are enjoyed.

    Let’s take a step back and describe what a relational database looks like. In any normal database design there are fields in one table that reference another table. For example, a books table might have a field labeled author_id which is meant to come from a table named authors. Creating hard-coded relations solidifies these associations and actually returns a MySQL error if violated.

    As I hinted in the opening I have found this to be invaluable during the development and testing process as MySQL will immediately let me know that I have made a glaring error that otherwise may not have been noticed until after the service has launched. At that point the data could be irreparably corrupt and forced to start from scratch.

    So let’s get right to it. For the purposes of this article, I’m going to pretend I’m creating a simple Books and Authors website with a simple 2-table setup. The first step is to create our tables.
    Code: SQL


    CREATE TABLE `library`.`books` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
    `name` VARCHAR( 150 ) NOT NULL ,
    `author_id` INT UNSIGNED NOT NULL ,
    PRIMARY KEY ( `id` ) ,
    INDEX ( `author_id` )
    ) ENGINE = InnoDB
     
  10. scvinodkumar

    scvinodkumar New Member

    Joined:
    Nov 4, 2009
    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    0
    Is this both query are similar? or which query is best to you use.

    SELECT name FROM employees WHERE skill_sets REGEXP '[[:<:]]php[[:>:]]'

    SELECT name FROM employees WHERE skill_sets LIKE '%php%'
     
  11. pradeep

    pradeep Team Leader

    Joined:
    Apr 4, 2005
    Messages:
    1,645
    Likes Received:
    87
    Trophy Points:
    0
    Occupation:
    Programmer
    Location:
    Kolkata, India
    Home Page:
    http://blog.pradeep.net.in
    They are similar but not the same, in your case as it seems you are looking for people having PHP skills, you should use the 1st query!
     
  12. scvinodkumar

    scvinodkumar New Member

    Joined:
    Nov 4, 2009
    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    0
    Thanks pradeep but I have checked the execution time for two queries,1st query takes much time than 2nd query.
     
  13. pradeep

    pradeep Team Leader

    Joined:
    Apr 4, 2005
    Messages:
    1,645
    Likes Received:
    87
    Trophy Points:
    0
    Occupation:
    Programmer
    Location:
    Kolkata, India
    Home Page:
    http://blog.pradeep.net.in
    Yes the regexp operator consumes more resources than a like!
     
  14. noky

    noky New Member

    Joined:
    Nov 8, 2011
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    0
    Hi!

    I have a table with one text field. This field can have urls I need get only de urls but no the rest of the field. I have gotten this with a query first and then a php function but I think will be more efficient get the urls with only one query. Also, If I use this solution I get repeated urls and I should create other function to delete that then is quite unefficient... My database is so big, around 3.5gb then I need be very efficient...

    For example:

    This is my solution with php:


    PHP:
    $sql " SELECT field_text FROM  table";

    function 
    researchUrl ($text_field){

        
    $pattern '`.*?((http|ftp)://[\w#$&+,\/:;=?@.-]+)[^\w#$&+,\/:;=?@.-]*?`i';
        if (
    preg_match_all($pattern,$text_field,$matches)) {
            return 
    $matches[1];
        }else{
            return 
    0;
        }
    }

    But with this solution I get "url1", "url2" and "url1"

    I would use a regular expression in a query to obtain them directly .. Something like that:


    PHP:
    SELECT text_field
    FROM table
    WHERE text_field
    REGEXP  
    '`.*?((http|ftp)://[\w#$&+,\/:;=?@.-]+)[^\w#$&+,\/:;=?@.-]*?`i'

    But with this query mysql return this error: "Got error 'repetition-operator operand invalid' from regexp" and I don't know why...

    I hope you can help me... Thank you!

    Regards!
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice