Mentor(s)
Student
Estimated workload
2-3 months
Details
The objective of this project is to use the Apache SOLR search engine as the indexing and search engine for XWiki.

XWiki is a very flexible wiki, in use on sites both massive and small, with content ranging from highly structured to very textual. The SOLR component should offer the same flexibility:

  • based on SOLR's schema and complementary information, the indexing process should be customizable so that it indexes and stores only as much information as needed.
  • through code customizability (exploiting the possibility of Groovy code in pages), the transformation of a user query into a SOLR query should be adjustable, far beyond simple text parsing (enabling, for example, the exclusion of some spaces, or the expansion to multiple fields based on input parameters).
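To illustrate the second point, here is a minimal sketch of such a transformation in plain Java. The class `QueryTranslator` and the field names `title`, `doccontent` and `space` are assumptions made for this example and are not taken from the project's schema; the sketch only builds the `q` and `fq` request parameters as strings and does not contact a SOLR server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns raw user input into SOLR request parameters. */
final class QueryTranslator {
    // Characters that carry a meaning in the Lucene/SOLR query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    /** Backslash-escapes query syntax characters so user input is matched literally. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Expands the user text over several fields (parameter "q") and
     * excludes the given spaces with a negative filter query (parameter "fq").
     */
    static Map<String, String> toSolrParams(String userText, List<String> fields,
                                            List<String> excludedSpaces) {
        String escaped = escape(userText.trim());
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add(field + ":(" + escaped + ")");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", String.join(" OR ", clauses));
        if (!excludedSpaces.isEmpty()) {
            List<String> filters = new ArrayList<>();
            for (String space : excludedSpaces) {
                filters.add("-space:\"" + escape(space) + "\"");
            }
            params.put("fq", String.join(" ", filters));
        }
        return params;
    }
}
```

In a wiki, the lists of fields and excluded spaces could come from a page's Groovy code or from request parameters, which is where the customizability described above would plug in.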

Finally, this component should support calibrating the search engine's parameters (such as the DisMax parser's coefficients, the usefulness of the analyzers, ...) by classical quantitative methods such as precision and recall, which any wiki master or collaborator should be able to use and report on.
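As a reminder of what such a measurement involves, the sketch below computes precision and recall for one query from the list of returned document identifiers and a hand-made set of relevant documents. The class name `RelevanceMetrics` is hypothetical, and how the relevance judgments are collected is left open.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper computing precision and recall for a single query. */
final class RelevanceMetrics {
    /** Counts the retrieved documents that are also judged relevant. */
    private static int hits(List<String> retrieved, Set<String> relevant) {
        int count = 0;
        for (String id : retrieved) {
            if (relevant.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** Precision: fraction of the retrieved documents that are relevant. */
    static double precision(List<String> retrieved, Set<String> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / retrieved.size();
    }

    /** Recall: fraction of the relevant documents that were retrieved. */
    static double recall(List<String> retrieved, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / relevant.size();
    }
}
```

For example, if a query returns four documents of which two are relevant, and three documents are relevant in total, precision is 0.5 and recall is 2/3. Re-running the same judged queries after changing a parameter shows whether the change helped.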

A first impulse for this work exists:

Active
Yes
Year

2012

Developer profile
  • Java programming
  • Understanding of Information Retrieval principles
  • experience with Apache SOLR or Apache Lucene a plus.
Status

Selected

Progress

Description

The idea of the project is to use Apache SOLR as the search engine for XWiki. XWiki currently uses Lucene as the core component for wiki search. Lucene is a little hard to configure and does not support features such as facet search, hit highlighting, or customizing search relevancy through index boosts out of the box. Solr stands out for the minimal configuration it needs: a few libraries and a couple of XML configuration files are sufficient to set up a capable search engine. Configuring multiple languages is easier in SOLR than in Lucene. Using SOLR, one can customize the indexing process by applying the required analyzers, with selected tokenizers and a set of filters, to the dataset in order to generate a highly customizable relevancy index. Through the front end, the user can select or configure the fields to be searched and the weight each field contributes to the document score. The link to the Design Page is given below:
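As a rough illustration of per-field weights, the sketch below assembles request parameters for SOLR's DisMax query parser, where the `qf` parameter lists the fields to search with a boost each, and `hl` and `facet` switch on highlighting and faceting. The class `DismaxParams` and the field names `title`, `doccontent` and `space` are assumptions for this example only; the parameters are built as plain strings and nothing is sent to a server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds DisMax request parameters from field weights. */
final class DismaxParams {
    static Map<String, String> build(String userText, Map<String, Double> fieldWeights) {
        // qf has the form "field1^boost1 field2^boost2 ...".
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : fieldWeights.entrySet()) {
            qf.add(entry.getKey() + "^" + entry.getValue());
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "dismax");   // select the DisMax query parser
        params.put("q", userText);         // raw user input
        params.put("qf", qf.toString());   // searched fields and their weights
        params.put("hl", "true");          // hit highlighting
        params.put("facet", "true");       // facet search
        params.put("facet.field", "space"); // assumed facet field
        return params;
    }
}
```

With weights of 3.0 for `title` and 1.0 for `doccontent`, the resulting `qf` value is `title^3.0 doccontent^1.0`, so a match in the title counts three times as much as a match in the body.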

http://dev.xwiki.org/xwiki/bin/view/Design/SOLRSearchIntegration

Milestones

Week           Days                 Description
               25 April - 20 May    Community bonding period. Get familiar with the XWiki platform and coding practices, come up with a good design proposal for the project and fix some JIRA issues.
Week 1         21 May - 27 May      Work on the API by speaking to mentors and the community.
Week 2         28 May - 03 June     Work on the Solr embedded server component.
Week 3 & 4     04 Jun - 17 Jun      Complete the Solr embedded server component, the basic front-end search GUI and the facet search implementation at the back end.
Week 5 & 6     18 Jun - 28 Jun      Customizing fields using the index, hit highlighting and partial indexing of attachments done.
Milestone 1    June 29              Share the basic Solr search component and get feedback.
Week 6 & 7     29 Jun - 08 Jul      Documentation, refactoring and code optimization. Improve the Solr component based on the feedback. Implement facet search (GUI), implement indexing of comments.
Week 8         09 Jul - 15 Jul      Complete the search component with customizable search fields. Integrate analyzers for different languages.
Week 9-10      July 17 - 28         Work on the Admin part.
Week 11        30 Jul - 08 Aug      Work on search filter, debug mode, sorting based on relevancy, auto-suggest and quick search bar.
Milestone 2    09 Aug               Share the Admin part and advanced search.
Week 12        10 Aug - 12 Aug      Documentation, setup file, user guide.
Week 13        13 Aug - 18 Aug      Test with some real-time data and calibrate the indexes. Test the quality of the search engine by creating a test suite creator and evaluator. Documentation on calibrating.
Milestone 3    19 Aug               Share the work with the community.
Basic Implementation Steps

* Initialize the component and load the Solr configuration on server start.

* Do an incremental indexing of the wiki content and make it ready for querying. (In the existing setup, indexing is done when the first search is made. This can be made configurable: indexing either at startup or on the first search, the latter being good for small wikis.)

* Register for page events using xwiki-platform-observation, to reindex documents on add/delete/edit operations.

* Allow the user to customize the fields queried, to search only the title, body, comments, attachments and other metadata.

* Allow the administrator to tweak the weights to boost relevancy on particular fields.

* Write more JUnit tests and follow test-driven development.
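The event-driven reindexing step can be pictured with the following minimal sketch. It is deliberately independent of XWiki and Solr: the `PageEvent` enum and the `IncrementalIndexer` class are hypothetical stand-ins, and an in-memory map takes the place of the real index so that the update logic can be read and tested in isolation. In the actual component, the events would be delivered by xwiki-platform-observation and the map operations would become add and delete requests to the Solr server.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical event types, standing in for the wiki's add/edit/delete events. */
enum PageEvent { CREATED, UPDATED, DELETED }

/** Hypothetical listener that keeps an index in sync with page events. */
final class IncrementalIndexer {
    // Stand-in for the search index: document identifier -> indexed content.
    private final Map<String, String> index = new HashMap<>();

    /** Applies one page event to the index. */
    void onEvent(PageEvent event, String documentId, String content) {
        switch (event) {
            case CREATED:
            case UPDATED:
                // Re-adding a document under the same identifier replaces the old entry.
                index.put(documentId, content);
                break;
            case DELETED:
                index.remove(documentId);
                break;
        }
    }

    boolean isIndexed(String documentId) {
        return index.containsKey(documentId);
    }

    String indexedContent(String documentId) {
        return index.get(documentId);
    }

    int size() {
        return index.size();
    }
}
```

Treating creation and update the same way keeps the listener simple: only the changed document is touched, so a full reindex is never needed for ordinary edits.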

Recent code

Below is the link to the recent source code:

https://github.com/xwiki-contrib/xwiki-platform-solr

Running Instance

I have configured an XWiki server in the Amazon EC2 cloud to play around with the Solr search component. I have included a few documents in the following spaces: Programming, Places, Flora, Fauna. Below is the link to the running instance.

http://savitha.hoplahup.net/xwiki/bin/view/Main/AdvancedSearch

Detailed Progress

The detailed progress of the Solr project can be found here:

https://docs.google.com/spreadsheet/pub?key=0AkC67pvTmc3zdHNaRldQdTVFaTJ1SkhpbVd2UnhOX0E&output=html

API

The API is given as part of the Design page. The link to the Design Page is given below:

Design Page 

Features of Solr and Set up Documentation

The current features of Solr are explained in the link below:

http://dev.xwiki.org/xwiki/bin/view/GoogleSummerOfCode/SolrSearchApplication

Created by Paul Libbrecht on 2012/03/08 21:29