Go4Expert

pradeep 1Apr2013 19:30

Create A Super Fast Search with Sphinx
 
Sphinx (an acronym for SQL Phrase Index) is a full-text search engine. It runs as a daemon and serves requests from client applications. Clients can talk to the daemon through the native SphinxAPI, for which libraries are available in most popular languages such as PHP, Perl, Ruby, Java and C#. Clients can also reach the search daemon through SphinxQL, Sphinx's own SQL dialect spoken over the MySQL network protocol, or through SphinxSE, a MySQL storage engine.
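
As a rough illustration of the SphinxQL route, here is a minimal sketch that queries the daemon from PHP through PDO's MySQL driver. It assumes several things this article does not set up: that searchd has an extra listener such as listen = 9306:mysql41, that an index named idx_my_articles exists (it is defined later in this article), and that emulated prepares are needed because searchd is assumed not to accept server-side prepared statements. build_sphinxql() and sphinxql_search() are hypothetical helpers, not part of any Sphinx library.

```php
<?php

// Hypothetical helper: builds a SphinxQL full-text query for one index.
// The index name cannot be sent as a bound parameter, so it is validated instead.
function build_sphinxql(string $index, int $limit): string
{
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $index)) {
        throw new InvalidArgumentException("Unexpected index name: $index");
    }
    return sprintf('SELECT id FROM %s WHERE MATCH(?) LIMIT %d', $index, max(1, $limit));
}

// Assumption: searchd also listens for the MySQL protocol on port 9306.
function sphinxql_search(string $words, string $host = '127.0.0.1', int $port = 9306): array
{
    $pdo = new PDO("mysql:host=$host;port=$port", '', '', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_EMULATE_PREPARES => true, // let PDO substitute the parameter client-side
    ]);
    $stmt = $pdo->prepare(build_sphinxql('idx_my_articles', 10));
    $stmt->execute([$words]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN); // list of matching document ids
}
```

With such a listener in place, sphinxql_search('perl') would return the ids of the matching documents. The rest of this article sticks to the native SphinxAPI.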

Sphinx can load data from various sources such as MySQL, PostgreSQL, ODBC and XML files, and it builds its indexes from that data. You can rebuild the indexes from the data source from time to time to keep them current. In this article we'll see how to install and set up Sphinx and then use it from a PHP script.

Installing Sphinx



Get the latest release of Sphinx from http://sphinxsearch.com/downloads/release/, extract the tarball, then configure, build and install:

Code:

$ tar xvzf sphinx.tar.gz
$ cd sphinx
$ ./configure --prefix=/usr/local/sphinx
$ make
$ make install


All should go well; configure will automatically figure out the location of the MySQL libraries. If you face any trouble, visit http://sphinxsearch.com/ and get help from the docs.

Configure & Setup Datasource, Indexes & Daemon



I am going to use MySQL as my datasource. In this example I'll have only one table, containing articles, with the following fields:

Code:

article_id INT UNSIGNED
article_subject TINYTEXT
article_body TEXT
article_dt_added DATETIME


Now, let's go through the config file I am using; the comments should be self-explanatory.

Code:

# General Settings
indexer {
    ## max memory limit for the indexer, if you can afford more it'd be better
    mem_limit = 32M
}

# Settings for the Sphinx daemon
searchd {
    # port where Sphinx daemon will listen on
    listen = 9312
   
    # logs & their paths
    log = /var/log/searchd.log
    query_log = /var/log/query.log

    # some timeouts
    read_timeout = 5
    client_timeout = 300

    # maximum number of concurrent searches the daemon
    # will serve; set to 0 for unlimited
    max_children = 30

    pid_file = /var/run/searchd.pid

    # set maximum no of results returned
    max_matches = 1000

    seamless_rotate = 1
    preopen_indexes = 0
    unlink_old = 1
}

# config for indices & sources

source src_my_articles {
    type = mysql
    sql_host = localhost
    sql_user = pradeep
    sql_pass = articlesRead1
    sql_db = articles

    # query to fetch the data that needs to be indexed
    sql_query = \
    SELECT \
    article_id*2+1 as Id, \
    article_id, \
    article_subject, \
    article_body, \
    UNIX_TIMESTAMP(article_dt_added) AS article_dt_added \
    FROM \
    articles;

    sql_attr_uint = article_id
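    # note: a column declared as an attribute is not full-text indexed,
    # so article_subject is usable for sorting here but not for matching
    sql_attr_str2ordinal = article_subject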
    sql_attr_timestamp = article_dt_added

    # $id is the document id, so reverse the article_id*2+1 mapping
    sql_query_info = SELECT article_id, article_subject FROM articles WHERE article_id=($id - 1)/2
}

index idx_my_articles {
    source = src_my_articles
    path = /var/data/idx_my_articles

    docinfo = extern
    mlock = 0
    morphology = stem_en
    min_stemming_len = 4
    min_word_len = 1
}
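
The document id in the config above is derived from article_id (article_id*2+1 in sql_query) and mapped back in sql_query_info (($id - 1)/2). Below is a small sketch of the same arithmetic in PHP, which may be handy when only the document id is at hand. The function names are hypothetical and not part of Sphinx.

```php
<?php

// Mirrors "article_id*2+1 as Id" from sql_query.
function doc_id_from_article_id(int $articleId): int
{
    return $articleId * 2 + 1;
}

// Mirrors "($id - 1)/2" from sql_query_info.
function article_id_from_doc_id(int $docId): int
{
    // ids produced by article_id*2+1 are always positive and odd
    if ($docId < 1 || $docId % 2 === 0) {
        throw new InvalidArgumentException("Not a document id produced by article_id*2+1: $docId");
    }
    return intdiv($docId - 1, 2);
}

var_dump(doc_id_from_article_id(21)); // int(43)
var_dump(article_id_from_doc_id(43)); // int(21)
```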


Our config is ready; now let's create the indexes.

Code:

$ indexer --config /path/to/sphinx.conf --all


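If you'd like your index to be kept up to date, set up a cron job that rebuilds it at an interval of your choice. The --rotate switch lets indexer rebuild the index while searchd is running and then signals the daemon to pick up the new files: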

Code:

*/5 * * * * /usr/local/sphinx/bin/indexer --config /path/to/sphinx.conf --all --rotate


Now, we'll start the Sphinx search daemon:

Code:

$ searchd --config /path/to/sphinx.conf


With the search daemon running, let's try some code.

Search with PHP



You'll need the PECL extension for Sphinx installed for PHP. If you don't have it installed, see how to do it here: http://www.php.net/manual/en/sphinx.installation.php
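
A quick way to check both prerequisites from PHP is sketched below. It assumes the extension exposes the SphinxClient class (the class used in this article) and that searchd listens on port 9312 as configured above. The TCP check only shows that the port is open, not that it is Sphinx answering. The function names are hypothetical.

```php
<?php

// True when the PECL Sphinx extension's client class is available.
function sphinx_client_available(): bool
{
    return class_exists('SphinxClient');
}

// True if a TCP connection to host:port can be opened within $timeout seconds.
function port_is_open(string $host, int $port, float $timeout = 1.0): bool
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning PHP raises when the connection is refused
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

echo 'SphinxClient class: ', sphinx_client_available() ? 'available' : 'missing', "\n";
echo 'searchd port 9312: ', port_is_open('localhost', 9312) ? 'open' : 'closed', "\n";
```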

The code below is a simple example:

Code: PHP

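<?php

$sph = new SphinxClient;
$sph->setServer("localhost", 9312);
// match documents containing any of the query words
$sph->setMatchMode(SPH_MATCH_ANY);

// search only the index defined in the config above
$result = $sph->query("perl", "idx_my_articles");

// query() returns false on failure
if ($result === false) {
    die("Search failed: " . $sph->getLastError());
}

// $result["matches"] contains an associative array keyed by document id

?>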
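
To go one step further, the attributes declared in the config (article_id and article_dt_added) come back with every match and can be used for filtering and sorting. The sketch below assumes the PECL SphinxClient class and the idx_my_articles index from this article. extract_article_ids() is a hypothetical helper, the 30-day window is an arbitrary example, and the live query is skipped when the extension is not loaded.

```php
<?php

// Hypothetical helper: pulls article_id out of each match's attributes.
function extract_article_ids(array $result): array
{
    $ids = [];
    foreach ($result['matches'] ?? [] as $docId => $match) {
        // article_id is returned because it is declared as sql_attr_uint;
        // if it is missing, fall back to reversing article_id*2+1.
        $ids[] = (int) ($match['attrs']['article_id'] ?? intdiv((int) $docId - 1, 2));
    }
    return $ids;
}

if (class_exists('SphinxClient')) {
    $sph = new SphinxClient;
    $sph->setServer('localhost', 9312);
    $sph->setMatchMode(SPH_MATCH_ANY);
    // only articles added in the last 30 days, newest first, first 20 results
    $sph->setFilterRange('article_dt_added', time() - 30 * 86400, time());
    $sph->setSortMode(SPH_SORT_ATTR_DESC, 'article_dt_added');
    $sph->setLimits(0, 20);

    $result = $sph->query('perl', 'idx_my_articles');
    if ($result === false) {
        echo 'Search failed: ' . $sph->getLastError() . "\n";
    } else {
        print_r(extract_article_ids($result));
    }
}
```

The returned article ids can then be used to fetch the full rows from MySQL for display.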


Happy building search engines :-)

