Zend framework lucene UTF-8 problem

I had issues with the zend framework and its implementation of lucene. It saved the values from my UTF-8 database in the lucene files with characters like UTF-8 in ISO 8859-1 like on the search result page. And I wasn’t able to search case insensitive.

I noticed that the apache header (zend server CE) wasn’t sending UTF-8. So I added AddDefaultCharset utf-8 to my httpd.conf. Didn’t help.

What helped: In the Bootstrap.php adding to the init of the search

Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());

In the model it is needed to decode it to ISO 8859-1 and than save it as UTF-8. Sounds insane, but it was the only thing that works for me.

$doc->addField(Zend_Search_Lucene_Field::Text('lucene_DB_CLOUMN_NAME',utf8_decode($db_apater_result['DB_CLOUMN_NAME']),'UTF-8'));

WTF Zend Lucene!

2 thoughts on “Zend framework lucene UTF-8 problem

  1. Your right. Sounds insane but it works. Thanks for the hint.
    Saved me a lot of trouble
    *thumbs up*

Leave a Reply

Your email address will not be published. Required fields are marked *