Create A Super Fast Search with Sphinx

Discussion in 'Web Development' started by pradeep, Apr 1, 2013.

    Sphinx (an acronym for SQL Phrase Index) is a full-text search engine. It runs as a daemon and serves requests from client applications. Clients can talk to the Sphinx daemon through the native SphinxAPI, for which libraries are available in most popular languages, such as PHP, Perl, Ruby, Java and C#. They can also reach the daemon over the MySQL network protocol using Sphinx's own SQL dialect, SphinxQL, or through a MySQL storage engine called SphinxSE.

    Sphinx can load data from various sources such as MySQL, PostgreSQL, ODBC and XML files, and builds its indexes from that data. You can also rebuild the indexes from the data source from time to time. In this article we'll see how to install and set up Sphinx and then use it from a PHP script.

    Installing Sphinx



    Get the latest release of Sphinx from http://sphinxsearch.com/downloads/release/, extract the tarball, then configure, build and install:

    Code:
    $ tar xvzf sphinx.tar.gz
    $ cd sphinx
    $ ./configure --prefix=/usr/local/sphinx
    $ make
    $ make install
    
    All should go well: configure will automatically figure out the location of the MySQL libraries. If you run into any trouble, visit http://sphinxsearch.com/ and get help from the docs.

    Configure & Setup Datasource, Indexes & Daemon



    I am going to use MySQL as my datasource. In the example I'll have only one table, containing articles, with the following fields:

    Code:
    article_id INT UNSIGNED
    article_subject TINYTEXT
    article_body TEXT
    article_dt_added DATETIME
    
    Now let's go through the config file I am using; the comments should be self-explanatory.

    Code:
    # General Settings
    indexer {
        ## max memory limit for the indexer, if you can afford more it'd be better
        mem_limit = 32M
    }
    
    # Settings for the Sphinx daemon
    searchd {
        # port where Sphinx daemon will listen on
        listen = 9312
        
        # logs & their paths
        log = /var/log/searchd.log
        query_log = /var/log/query.log
    
        # some timeouts
        read_timeout = 5
        client_timeout = 300
    
        # this decides the no of concurrent requests search
        # daemon will entertain, set to 0 for unlimited
        max_children = 30
    
        pid_file = /var/run/searchd.pid
    
        # set maximum no of results returned
        max_matches = 1000
    
        seamless_rotate = 1
        preopen_indexes = 0
        unlink_old = 1
    }
    
    # config for indices & sources
    
    source src_my_articles {
        type = mysql
        sql_host = localhost
        sql_user = pradeep
        sql_pass = articlesRead1
        sql_db = articles
    
        # query to fetch the data that needs to be indexed
        sql_query = \
        SELECT \
        article_id*2+1 as Id, \
        article_id, \
        article_subject, \
        article_body, \
        UNIX_TIMESTAMP(article_dt_added) AS article_dt_added \
        FROM \
        articles;
    
        sql_attr_uint = article_id
        sql_attr_str2ordinal = article_subject
        sql_attr_timestamp = article_dt_added
    
        sql_query_info = SELECT article_id, article_subject FROM articles WHERE article_id=($id - 1)/2
    }
    
    index idx_my_articles {
        source = src_my_articles
        path = /var/data/idx_my_articles
    
        docinfo = extern
        mlock = 0
        morphology = stem_en
        min_stemming_len = 4
        min_word_len = 1
    }
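
The sql_query in the config above derives each Sphinx document ID as article_id*2+1, and sql_query_info reverses it with ($id - 1)/2. Application code that receives document IDs from Sphinx needs the same arithmetic to get back to a row in the articles table. Below is a minimal sketch of that mapping; the function names are hypothetical and not part of Sphinx or its PHP client.

```php
<?php
// Hypothetical helpers mirroring the sql_query expression `article_id*2+1`.

/** Map an article_id to the Sphinx document ID used in the index. */
function docIdFromArticleId(int $articleId): int
{
    return $articleId * 2 + 1;
}

/** Recover the article_id from a Sphinx document ID. */
function articleIdFromDocId(int $docId): int
{
    // IDs produced by article_id*2+1 are always positive and odd.
    if ($docId < 1 || $docId % 2 === 0) {
        throw new InvalidArgumentException("Not a document ID produced by article_id*2+1: $docId");
    }
    return intdiv($docId - 1, 2);
}

echo docIdFromArticleId(7), "\n";   // prints 15
echo articleIdFromDocId(15), "\n";  // prints 7
```

Since the config also stores article_id as an attribute (sql_attr_uint), reading it from the match attributes is an alternative to computing it from the document ID.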
    
    Our config is ready, now let's create the indexes.

    Code:
    $ indexer --config /path/to/sphinx.conf --all
    
    If you'd like your index to be kept up to date, set up a cron job to rebuild it at an interval of your choice, like this:

    Code:
    */5 * * * * /usr/local/sphinx/bin/indexer --config /path/to/sphinx.conf --all --rotate
    
    Now, we'll start the Sphinx search daemon:

    Code:
    $ searchd --config /path/to/sphinx.conf
    
    With the search daemon running, let's try some code.

    Search with PHP



    You'll need the PECL extension for Sphinx installed for PHP. If you don't have it installed, see the instructions here: http://www.php.net/manual/en/sphinx.installation.php

    The code below is a simple example:

    PHP:

    <?php

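    // connect to the searchd daemon configured above
    $sph = new SphinxClient;
    $sph->setServer("localhost", 9312);

    // match documents containing any of the query words
    $sph->setMatchMode(SPH_MATCH_ANY);

    $result = $sph->query("perl");

    // on success, $result["matches"] is an associative array keyed by document ID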

    ?>
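
To turn a result into something you can look up in MySQL, you can walk $result["matches"] and read the article_id attribute stored with each match. The sketch below assumes the result layout described in the PHP manual (each match carries "weight" and "attrs") and that matches are keyed by document ID. The function name summarizeMatches and the sample data are hypothetical; the sample stands in for a real query() result so the snippet runs without a daemon.

```php
<?php
/**
 * Hypothetical helper: pull article IDs and weights out of a query() result.
 * Assumes matches are keyed by document ID and carry "weight" and "attrs".
 */
function summarizeMatches($result): array
{
    // query() returns false on failure; a query with no hits has no matches.
    if ($result === false || empty($result["matches"])) {
        return [];
    }

    $rows = [];
    foreach ($result["matches"] as $docId => $match) {
        $rows[] = [
            "doc_id"     => (int) $docId,
            "article_id" => (int) $match["attrs"]["article_id"],
            "weight"     => (int) $match["weight"],
        ];
    }
    return $rows;
}

// Sample data shaped like a query() result, for illustration only.
$sample = [
    "matches" => [
        15 => ["weight" => 2, "attrs" => ["article_id" => 7,  "article_dt_added" => 1364774400]],
        43 => ["weight" => 1, "attrs" => ["article_id" => 21, "article_dt_added" => 1364860800]],
    ],
    "total" => 2,
];

foreach (summarizeMatches($sample) as $row) {
    printf("article %d (doc %d), weight %d\n", $row["article_id"], $row["doc_id"], $row["weight"]);
}
```

In a real script you would pass the value returned by $sph->query() instead of $sample, then fetch the matching rows from the articles table by article_id.
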
    Happy building search engines :)
     
