XWiki is a very flexible wiki, used on both massive and small sites, with content ranging from highly structured to largely textual. The Solr component should reflect this flexibility.
This component should also support calibrating the search engine's parameters (such as the DisMax query parser's coefficients, the usefulness of each analyzer, ...) using classical quantitative measures such as precision and recall, which any wiki administrator or collaborator should be able to use and report on.
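To make the calibration idea concrete, here is a minimal Java sketch of precision and recall for a single test query. It assumes that a wiki editor has hand-judged which pages are relevant for each test query, and that the result list contains distinct page references; the class, method names and page names are hypothetical and not part of the existing component.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Illustrative sketch (hypothetical, not the XWiki Solr component): computes
 * precision, recall and precision@k for one query from the ranked result list
 * and a hand-judged set of relevant page references.
 */
public class SearchQuality {

    /** Fraction of retrieved documents that are relevant. */
    static double precision(List<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        long hits = retrieved.stream().filter(relevant::contains).count();
        return (double) hits / retrieved.size();
    }

    /** Fraction of relevant documents that were retrieved. */
    static double recall(List<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        long hits = retrieved.stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** Precision computed over the first k results only. */
    static double precisionAtK(List<String> retrieved, Set<String> relevant, int k) {
        return precision(retrieved.subList(0, Math.min(k, retrieved.size())), relevant);
    }

    public static void main(String[] args) {
        // Hypothetical judgement for the query "tiger": pages an editor marked as relevant.
        Set<String> relevant = Set.of("Fauna.Tiger", "Fauna.BigCats", "Places.India");
        // Hypothetical ranked result list returned by the engine for that query.
        List<String> retrieved =
                List.of("Fauna.Tiger", "Flora.TigerLily", "Fauna.BigCats", "Programming.Tiger");
        System.out.printf(Locale.ROOT, "precision=%.2f recall=%.2f p@2=%.2f%n",
                precision(retrieved, relevant),
                recall(retrieved, relevant),
                precisionAtK(retrieved, relevant, 2));
    }
}
```

With the sample data, `main` prints `precision=0.50 recall=0.67 p@2=0.50`. Averaging such values over a fixed set of test queries gives a single number that can be compared before and after changing a coefficient or an analyzer.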
Initial work in this direction already exists:
The idea of the project is to use Apache Solr as the search engine for XWiki. XWiki currently uses Lucene as the core component for wiki search. Lucene is somewhat hard to configure and does not support features such as faceted search, hit highlighting, or customizing search relevancy with index boosts out of the box. Solr stands out for the minimal configuration needed to implement a search engine: a few libraries and a couple of XML configuration files are sufficient to set up a capable engine. Configuring multiple languages is also easier in Solr than in Lucene. With Solr, one can customize the indexing process by applying the required analyzers, each with a selected tokenizer and set of filters, to generate a highly customizable relevancy index. Through the front end, the user can select or configure the fields to be searched and their weights, which contribute to the document score. The link to the Design Page is given below:
http://dev.xwiki.org/xwiki/bin/view/Design/SOLRSearchIntegration
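The analyzer idea above can be illustrated without any Solr dependency. The following plain-Java sketch mimics the shape of a Solr `<analyzer>` definition: one tokenizer followed by an ordered list of filters, a simplified stand-in for what `solr.StandardTokenizerFactory`, `solr.LowerCaseFilterFactory` and `solr.StopFilterFactory` do in a schema. The class, the method names and the stop word list are hypothetical. The real component would declare this chain in Solr's XML configuration rather than in Java code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Plain-Java illustration of an analysis chain (hypothetical names): a tokenizer
 * followed by token filters applied in order. It does not use the Lucene/Solr API.
 */
public class AnalysisChainSketch {

    /** Tokenizer: splits text on any run of characters that are not letters or digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) { // a leading separator produces an empty first element
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Filter: lowercases every token using locale-neutral rules. */
    static final UnaryOperator<List<String>> LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    /** Filter factory: drops tokens contained in the given stop word set. */
    static UnaryOperator<List<String>> stopWords(Set<String> stop) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stop.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Runs the tokenizer, then each filter in order, like an index-time analyzer. */
    static List<String> analyze(String text, List<UnaryOperator<List<String>>> filters) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical English chain; a field in another language would get its own stop list.
        List<UnaryOperator<List<String>>> english =
                List.of(LOWERCASE, stopWords(Set.of("the", "a", "of", "in")));
        System.out.println(analyze("The Flora of India", english));
    }
}
```

Running `main` prints `[flora, india]`. Filter order matters: with the stop-word filter placed before the lowercase filter, the capitalized "The" would not match the stop list and would survive as "the".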
| Week | Dates | Description |
|---|---|---|
| Community Bonding | 25 Apr - 20 May | Community bonding period: get familiar with the XWiki platform and its coding practices, come up with a good design proposal for the project, and fix some JIRA issues. |
| Week 1 | 21 May - 27 May | Work on the API by talking to mentors and the community. |
| Week 2 | 28 May - 03 Jun | Work on the Solr embedded server component. |
| Week 3 & 4 | 04 Jun - 17 Jun | Complete the Solr embedded server component, a basic front-end search GUI, and the back-end faceted search implementation. |
| Week 5 & 6 | 18 Jun - 28 Jun | Customize the indexed fields; implement hit highlighting and partial indexing of attachments. |
| Milestone 1 | 29 Jun | Share the basic Solr search component and get feedback. |
| Week 6 & 7 | 29 Jun - 08 Jul | Documentation, refactoring, and code optimization. Improve the Solr component based on feedback. Implement the faceted search GUI and indexing of comments. |
| Week 8 | 09 Jul - 15 Jul | Complete the search component with customizable search fields. Integrate analyzers for different languages. |
| Week 9 & 10 | 17 Jul - 28 Jul | Work on the administration part. |
| Week 11 | 30 Jul - 08 Aug | Work on search filters, debug mode, relevancy-based sorting, auto-suggest, and the quick search bar. |
| Milestone 2 | 09 Aug | Share the administration part and advanced search. |
| Week 12 | 10 Aug - 12 Aug | Documentation, setup file, user guide. |
| Week 13 | 13 Aug - 18 Aug | Test with real-time data and calibrate the indexes. Test the quality of the search engine by creating a test suite generator and evaluator. Document the calibration process. |
| Milestone 3 | 19 Aug | Share the work with the community. |
* Initialize the component and load the Solr configuration on server start.
* Perform incremental indexing of the wiki content and make it ready for querying. (In the existing setup, indexing is done when the first search is made. This can be made configurable, choosing between indexing at startup and indexing on the first search, which suits small wikis.)
* Register for page events using xwiki-platform-observation to reindex documents on add/delete/edit operations.
* Allow the user to customize the queried fields, for example to search only titles, body, comments, attachments, or other metadata.
* Allow the administrator to tweak field weights to boost relevancy on particular fields.
* Write more JUnit tests and follow test-driven development.
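The first three actions (initialization, initial indexing at startup or on the first search, and incremental reindexing on page events) could be wired together roughly as in the following hypothetical Java sketch. All class, enum and method names are illustrative assumptions. In the real component, `onDocumentEvent` would be called from an event listener registered through xwiki-platform-observation, and each log entry would correspond to a Solr add or delete request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the indexing lifecycle: the initial full index runs
 * either at server start or lazily before the first search (configurable), and
 * page events are queued as incremental index/delete operations.
 * Not thread-safe; a server implementation would need synchronization.
 */
public class IndexSchedulerSketch {

    enum Trigger { ON_STARTUP, ON_FIRST_SEARCH }

    enum DocEvent { CREATED, UPDATED, DELETED }

    private final Trigger trigger;
    private boolean initialIndexDone = false;
    /** Pending operations keyed by document reference; the latest event for a page wins. */
    private final Map<String, DocEvent> pending = new LinkedHashMap<>();
    /** Record of the operations that would be sent to Solr. */
    final List<String> log = new ArrayList<>();

    IndexSchedulerSketch(Trigger trigger) {
        this.trigger = trigger;
    }

    void onServerStart() {
        if (trigger == Trigger.ON_STARTUP) {
            fullIndex();
        }
    }

    void beforeSearch() {
        if (!initialIndexDone) {
            fullIndex(); // lazy initial indexing
        }
        flush(); // apply queued incremental changes before querying
    }

    /** Would be invoked by an observation listener on page add/edit/delete. */
    void onDocumentEvent(String docRef, DocEvent event) {
        pending.remove(docRef); // re-insert so iteration order follows the latest event
        pending.put(docRef, event);
    }

    private void fullIndex() {
        log.add("full-index");
        initialIndexDone = true;
    }

    void flush() {
        for (Map.Entry<String, DocEvent> e : pending.entrySet()) {
            String op = e.getValue() == DocEvent.DELETED ? "delete " : "index ";
            log.add(op + e.getKey());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        IndexSchedulerSketch s = new IndexSchedulerSketch(Trigger.ON_FIRST_SEARCH);
        s.onServerStart(); // nothing is indexed yet with this trigger
        s.onDocumentEvent("Fauna.Tiger", DocEvent.CREATED);
        s.onDocumentEvent("Fauna.Tiger", DocEvent.UPDATED); // coalesced with the create
        s.onDocumentEvent("Flora.Rose", DocEvent.DELETED);
        s.beforeSearch();
        System.out.println(s.log);
    }
}
```

Running `main` prints `[full-index, index Fauna.Tiger, delete Flora.Rose]`. Keying the queue by document reference means that several edits of the same page between two searches produce a single index operation.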
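For field selection and weighting, Solr's DisMax query parser already provides the needed hook: its `qf` parameter lists the fields to search, each with an optional `^boost`. The sketch below builds the corresponding request parameters as a plain query string. It assumes hypothetical field names (`title`, `content`, `comment`, `attachment`) and a weight map filled from the user's and administrator's settings. `defType`, `q` and `qf` are standard Solr parameters; everything else is illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Sketch: turns user-selected fields and administrator-defined weights into
 * DisMax request parameters. Field names are hypothetical.
 */
public class DismaxQuerySketch {

    /** Builds the qf value, e.g. "title^4.0 content^1.0"; fields with weight <= 0 are omitted. */
    static String buildQf(Map<String, Double> weights) {
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            if (e.getValue() > 0) {
                qf.add(e.getKey() + "^" + e.getValue());
            }
        }
        return qf.toString();
    }

    /** Builds a URL-encoded query string that could be sent to a Solr select handler. */
    static String buildParams(String userQuery, Map<String, Double> weights) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "dismax");
        params.put("q", userQuery);
        params.put("qf", buildQf(weights));
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Hypothetical settings: the admin boosted titles, the user excluded attachments.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("title", 4.0);
        weights.put("content", 1.0);
        weights.put("comment", 0.5);
        weights.put("attachment", 0.0);
        System.out.println(buildParams("tiger", weights));
    }
}
```

With these weights, `main` prints `defType=dismax&q=tiger&qf=title%5E4.0+content%5E1.0+comment%5E0.5`. A zero weight simply removes the field from the search. This is one way to let users restrict a search to, say, titles only, while administrators tune the non-zero boosts.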
Below is the link to the latest source code:
https://github.com/xwiki-contrib/xwiki-platform-solr
I have configured an XWiki server in the Amazon EC2 cloud to experiment with the Solr Search Component, and added a few documents in the following spaces: Programming, Places, Flora, and Fauna. Below is the link to the running instance:
http://savitha.hoplahup.net/xwiki/bin/view/Main/AdvancedSearch
Detailed Progress
The detailed progress of the Solr project can be found here:
https://docs.google.com/spreadsheet/pub?key=0AkC67pvTmc3zdHNaRldQdTVFaTJ1SkhpbVd2UnhOX0E&output=html
The API is described as part of the Design Page, linked below:
http://dev.xwiki.org/xwiki/bin/view/Design/SOLRSearchIntegration
The current features of the Solr search application are explained at the link below:
http://dev.xwiki.org/xwiki/bin/view/GoogleSummerOfCode/SolrSearchApplication