• Real-time indexing and searching with Sphinx 1.10.1-dev

    Real-time indexing and searching is one of the major goals of web development in 2010. As of version 1.10.1, Sphinx joins the list of search engines that deliver on the promise of real-time search.

    Remember, in this post I’m using the developers’ trunk, checked out from the repository. Get your copy here:

    $ svn checkout http://sphinxsearch.googlecode.com/svn/trunk sphinx
    $ cd sphinx
    $ ./configure
    $ make
    $ sudo make install

    This worked for me out of the box on Debian etch, after failing on Snow Leopard.

    First you need to set up your sphinx.conf file:

    index rt {
      type = rt
      path = /usr/local/sphinx/data/rt
      rt_field = message
      rt_attr_uint = message_id
    }
    
    searchd {
      log = /var/log/searchd.log
      query_log = /var/log/query.log
      pid_file = /var/run/searchd.pid
      workers = threads
      listen = 192.168.24.11:9312:mysql41
    }

    You need the listen setting in the searchd section, with the mysql41 protocol suffix, to set up a MySQL protocol based server. However, a couple of quick benchmarks show that it’s significantly slower than the regular searchd.
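
    To make the listen value less cryptic, here is a minimal Python sketch that splits it into its parts. It only assumes the host:port[:protocol] form used in the config above; parse_listen and the Listen tuple are hypothetical helpers for illustration, not part of Sphinx, and treating a missing protocol as "sphinx" (the native one) is my assumption.

```python
from typing import NamedTuple


class Listen(NamedTuple):
    host: str
    port: int
    protocol: str


def parse_listen(value: str) -> Listen:
    """Split a 'host:port[:protocol]' listen value into its parts.

    Assumptions: only the host:port form is handled (no UNIX socket
    paths, no bare port), and a missing protocol is labelled 'sphinx'.
    """
    parts = value.strip().split(":")
    if len(parts) == 2:
        host, port = parts
        protocol = "sphinx"
    elif len(parts) == 3:
        host, port, protocol = parts
    else:
        raise ValueError(f"unsupported listen value: {value!r}")
    return Listen(host, int(port), protocol)


print(parse_listen("192.168.24.11:9312:mysql41"))
```

    With the config above this gives host 192.168.24.11, port 9312 and protocol mysql41, which is what tells searchd to speak the MySQL wire protocol on that port.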

    After launching searchd I used the MySQL protocol to index my data. Given the structure (message, message_id) I can now connect to my server and start indexing:

    $ mysql -P 9312 -h 127.0.0.1
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 1
    Server version: 1.10.1-dev (r2309)
    
    Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
    
    mysql> INSERT INTO rt VALUES (1, 'this message has a body', 1);

    Remember to always wrap string values in single quotes.
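
    If you generate these statements from a script, the quoting is worth wrapping in a helper. Below is a rough Python sketch that builds the INSERT for the (id, message, message_id) layout used above. Both quote_string and build_insert are hypothetical names, and the backslash escaping is an assumption borrowed from MySQL conventions, so check it against your Sphinx build before relying on it.

```python
def quote_string(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and quotes.

    Assumption: MySQL-style backslash escapes are accepted inside
    single-quoted strings by the server you are talking to.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_insert(index: str, doc_id: int, message: str, message_id: int) -> str:
    """Build an INSERT statement for the rt index layout from this post."""
    return (
        f"INSERT INTO {index} VALUES "
        f"({int(doc_id)}, {quote_string(message)}, {int(message_id)})"
    )


print(build_insert("rt", 2, "it's a second message", 2))
```

    The index name is interpolated as-is, so it should come from your own config and never from user input.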

    Querying a real-time index is just as easy as typing:

    mysql> SELECT * FROM rt WHERE MATCH('message');
    +------+--------+------------+
    | id   | weight | message_id |
    +------+--------+------------+
    |    1 |   1500 |          1 |
    +------+--------+------------+
    1 row in set (0.00 sec)

    And that’s it, now you’re ready to start your real-time search!

    All the details can be found in the doc folder of the repository. This is just a brief write-up to get you started.

  • My Tools of the Trade

    Inspired by Mike Gunderloy I thought I’d share my setup used for everyday development.

    Hardware

    Plain & simple - a 15” MacBook Pro, mid-2009. I just switched from a 13” white MacBook and I love the speed boost it gave me. I find the Wireless Mighty Mouse very comfortable and useful, and it frees up a USB port. The Apple Keyboard (full size) is a clear win. A 500GB disk is used for Time Machine backups, plus an additional 320GB 2.5” drive for extra storage.

    At the office I use an extra 19” display - good enough to display logs.

    And yeah, iPhone, of course.

    Software

    • Snow Leopard - early adoption wasn't too painful after all, and it actually might be running a little faster than good ol' 10.5,
    • Safari - my primary browser, used both for development and everyday browsing, boosted with heavily customized Glims,
    • Quicksilver - as a heavy keyboard user I find this application a must, goes great combined with Dockables,
    • 1Password - can't imagine a better credentials manager, I feel completely lost without it,
    • Adium - I don't like IM that much, but we use it at work. I always run beta versions of Adium, they're stable and offer some nice features!
    • Firefox - used from time to time for development when Safari's debugger isn't enough, though with the decline of Firebug's quality and workflow Safari stands strong,
    • Fluid - standalone web applications generator, I use it for Google Reader, Instapaper and Blip with custom styles and userscripts,
    • ForkLift - did I mention I'm a keyboard guy, with a strong Norton Commander / Total Commander background?
    • FStream - I can't focus on work without music and this is a simple Internet radio streaming application,
    • GitX - still working out my git workflow and GitX helps a lot,
    • GrandPerspective - shows a graph of the files and folders hogging the hard drive, really useful,
    • HTTPScoop - used for local debugging, for online testing I prefer Hurl,
    • Paperless - my document repository, invoices, agreements and such,
    • Parallels - with Windows XP, Ubuntu and Ubuntu Server for cross-browser tweaks and server setup testing,
    • Pixelmator - I'm the last person who should do anything with graphics but sometimes I really have to. Pixelmator's learning curve is acceptable and it does everything I need it to,
    • Sequel Pro - Sequel Pro's nightlies are stable and robust, with some features that the current stable release is missing,
    • SizeUp - again, keyboard lovers, resize / move your windows with 4-finger combos,
    • Skitch - great for screenshots with annotations,
    • TextMate - enough said,
    • Things - great task manager with phone sync (I really need a phone sync for offline usage),
    • tunnelblick - for my company's VPN,
    • Tweetie - I'm more a reader than writer and Tweetie's great for that.

    There are also a few other applications that I use, but these are the essential ones - I keep them in my Dock.

    Hosting

    For my pet projects and fun stuff I use IntoVPS with Server Density monitoring. For production we have an in-house server farm. I’m looking into SliceHost / Linode for production of some smaller private stuff.