Lucene vs Solr … Exact Search Problems
Lucene and Solr are two different Apache projects:
1) Lucene was not designed around Solr. The dependency runs one way: Solr uses Lucene under the hood, and Lucene knows nothing about the Solr API.
2) Lucene is a powerful search library that lets us add search capability to our application. It exposes an easy-to-use API while hiding the complex search-related operations. Any application can use this library, not just Solr.
3) Solr is built around Lucene. It is not just an HTTP wrapper around Lucene; it adds features of its own. Solr is ready to use out of the box: it is a web application that offers infrastructure-related features and a lot more in addition to what Lucene offers.
4) Lucene doesn’t just create the index for consumption by Solr. Lucene handles all the search-related operations, and any application can use the Lucene library.
Examples are Solr, Elasticsearch, LinkedIn (yes, under the hood), etc.
Lucene is a low level Java library (with ports to .NET, etc.) which implements indexing, analyzing, searching, etc.
Solr is a standalone, pre-configured product/webapp which uses Lucene. If you prefer dealing with an HTTP API instead of a Java API, Solr is for you. Solr also has some extra features on top (e.g. grouping).
A simple way to conceptualize the relationship between Solr and Lucene is that of a car and its engine. You can’t drive an engine, but you can drive a car. Similarly, Lucene is a programmatic library which you can’t use as-is, whereas Solr is a complete application which you can use out of the box.
At the heart of Lucene is an Index. You pump your data into the Index, then do searches on the Index to get results out. Document objects are stored in the Index, and it is your job to “convert” your data into Document objects and store them to the Index.
Adding a Document/object to the Index
Now you need to index your documents or business objects. To index an object, you use the Lucene Document class, to which you add the fields that you want indexed. As we briefly mentioned before, a Lucene Document is basically a container for a set of indexed fields. This is best illustrated by an example:
Document doc = new Document();
doc.add(new StringField("id", "Hotel-1345", Field.Store.YES));
doc.add(new TextField("description", "A beautiful hotel", Field.Store.YES));
In the above example, we add two fields, “id” and “description”, with the respective values “Hotel-1345” and “A beautiful hotel” to the document.
More precisely, to add a field to a document, you create an instance of one of the Field subclasses, here either a StringField or a TextField (the difference between the two will be explained shortly). As used above, a field object takes the following three parameters:
- Field name: This is the name of the field. In the above example, they are “id” and “description”.
- Field value: This is the value of the field. In the above example, they are “Hotel-1345” and “A beautiful hotel”. A value can be a String, as in our example, or, for a TextField, a Reader if the content to be indexed comes from a file. A value supplied through a Reader is indexed but not stored.
- Storage flag: The third parameter specifies whether the actual value of the field is stored in the Lucene index or discarded after it is indexed. Storing the value is useful if you need it later, for example to display it in the search result list or to look up a row in a database table. If the value must be stored, use Field.Store.YES. If you don’t need to store the value, use Field.Store.NO. (Older Lucene releases also offered Field.Store.COMPRESS for large documents or binary values; it was removed in Lucene 3.0, so it is not available alongside StringField and TextField.)
StringField vs TextField: In the above example, the “id” field contains the ID of the hotel, which is a single atomic value. In contrast, the “description” field contains English text, which should be parsed (or “tokenized”) into a set of words for indexing. Use StringField for a field with an atomic value that should not be tokenized. Use TextField for a field that needs to be tokenized into a set of words. (Source: lucene-tutorial)
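The difference matters most for exact searches. The following is a minimal sketch in plain Java, not Lucene code: ToyIndex, addAtomic, addTokenized and search are hypothetical names, and the tokenizer is a simple lowercase word split that only approximates what a real Lucene analyzer does. It shows why an atomic field matches only its full, case-sensitive value, while a tokenized field matches individual words but not the whole phrase as a single term.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of an inverted index (an illustration, not Lucene's implementation).
public class ToyIndex {
    // Key "field:term" -> ids of the documents that contain that term.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index the whole value as one term (the StringField-like case).
    void addAtomic(String docId, String field, String value) {
        post(field, value, docId);
    }

    // Split the value into lowercase words and index each word (the TextField-like case).
    void addTokenized(String docId, String field, String value) {
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                post(field, token, docId);
            }
        }
    }

    private void post(String field, String term, String docId) {
        postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
    }

    // Exact term lookup: the query term is not analyzed in any way.
    Set<String> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Set.of());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addAtomic("doc1", "id", "Hotel-1345");
        index.addTokenized("doc1", "description", "A beautiful hotel");

        System.out.println(index.search("id", "Hotel-1345"));                 // [doc1]
        System.out.println(index.search("id", "hotel"));                      // []
        System.out.println(index.search("description", "beautiful"));         // [doc1]
        System.out.println(index.search("description", "A beautiful hotel")); // []
    }
}
```

In this model, searching the “id” field for “hotel” finds nothing because the only indexed term is the full string “Hotel-1345”, and searching “description” for the complete sentence finds nothing because only the individual words were indexed.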
When should I use Lucene then?
If you need to embed search functionality into a desktop application for example, Lucene is the more appropriate choice.
For situations where you have very customized requirements that need low-level access to the Lucene API classes, Solr may be more of a hindrance than a help, since it is an extra layer of indirection.
What is Solr?
Apache Solr is a web application built around Lucene with all kinds of goodies.
It adds functionality like
- XML/HTTP and JSON APIs
- Hit highlighting
- Faceted Search and Filtering
- Geospatial Search
- Fast Incremental Updates and Index Replication
- Web administration interface, etc.
Unlike Lucene, Solr is a web application. Up to Solr 4.x it was shipped as a WAR which can be deployed in any servlet container, e.g. Jetty, Tomcat, Resin, etc.; since Solr 5 it runs as a standalone server that bundles Jetty.
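Because Solr is reached over HTTP, a search is essentially a URL. The sketch below builds a request for Solr’s standard /select handler using only the JDK. The host, port, the core name “hotels” and the helper name selectUrl are assumptions chosen for illustration, so adjust them to your own installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrQueryUrl {
    // Builds a URL for the /select request handler of one Solr core.
    // baseUrl and core are placeholders for your own installation.
    static String selectUrl(String baseUrl, String core, String query, int rows) {
        // The query string must be URL-encoded (":" becomes "%3A", spaces become "+").
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + q + "&rows=" + rows + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical local Solr instance with a core named "hotels".
        System.out.println(
            selectUrl("http://localhost:8983/solr", "hotels", "description:beautiful", 10));
        // prints http://localhost:8983/solr/hotels/select?q=description%3Abeautiful&rows=10&wt=json
    }
}
```

The resulting URL can be opened in a browser or passed to any HTTP client; q, rows and wt are standard Solr query parameters.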
Solr can be installed and used by non-programmers. Lucene cannot.
Some extra links on search differences and search problems related to Razuna, which uses Lucene in its digital asset management:
Lucene 4 Essentials for Text Search and Indexing