Content-based Tag Suggestions

Last modified by Michael Hamann on 2023/02/03 17:18

Estimated workload

175 hours (Medium size project)




In XWiki, tags can be applied to any page. At the moment, the suggested tag values are simply all existing tags. Since good tags help with organizing and finding documents, it would be useful to suggest suitable tags automatically. Suggested tags could, for example, be displayed after the tags already applied to the page, so that users with edit rights can add each of them with a single click. Alternatively, they could be displayed as part of the existing tag suggestions.


To get suggestions, the idea for this project is to rely on tf-idf (term frequency-inverse document frequency). This score weighs how often a term occurs in the current document against the number of documents in which the term occurs, so it favors terms that are frequent in the current document but occur in few other documents. Solr, which is used for XWiki's search feature, already provides an idf function and could thus be used to compute the values.
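To illustrate the scoring idea, here is a minimal sketch in plain Python, independent of Solr and XWiki. The function names, the sample pages, and the weighting (raw term count times log(N / document frequency)) are assumptions chosen for illustration. Solr's similarity implementations normalize these values differently, so real scores will not match these numbers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def suggest_tags(documents, doc_id, limit=3):
    """Return up to `limit` terms of documents[doc_id], ranked by tf-idf."""
    tokens = {name: tokenize(text) for name, text in documents.items()}

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for terms in tokens.values():
        doc_freq.update(set(terms))

    n_docs = len(documents)
    term_freq = Counter(tokens[doc_id])

    # tf-idf: terms that occur in every document get log(1) = 0.
    scores = {
        term: count * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }

    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]


# Hypothetical sample pages, not real XWiki content.
pages = {
    "Install": "install the solr server and start the solr search index",
    "Macros": "write a velocity macro and call the macro from the page",
    "Rights": "set the edit rights on the page and the view rights",
}
print(suggest_tags(pages, "Install", limit=1))  # prints ['solr']
```

Words such as "the" and "and" occur in every sample page and therefore score zero, which is the property that makes tf-idf attractive for tag suggestions. A real implementation would likely also need stop-word handling and stemming per language.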

To get started, you should create a prototype that demonstrates the general feasibility of using Solr for this. For example, you could write Velocity code that executes a Solr query, as shown in the documentation of the Solr Search Application, to compute some tf-idf scores. You might need to modify the Solr configuration for this. If this turns out to be infeasible, please discuss with us why and suggest a different approach, such as indexing term frequencies in a database table.
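As a rough sketch of what such a query could look like, the following Python snippet only builds the request parameters and does not contact a server. It relies on Solr's `tf`, `idf` and `product` function queries and on pseudo-field aliases in `fl`. The helper name `tfidf_query_params`, the field name `doccontent` and the document id are hypothetical and would need to be checked against the actual XWiki Solr schema. Note that `tf()` and `idf()` appear to require a TF-IDF based similarity (such as ClassicSimilarity) on the field, which may be one reason the Solr configuration needs adjusting. The prototype itself would pass equivalent parameters from Velocity.

```python
from urllib.parse import urlencode


def tfidf_query_params(field, terms, page_id, rows=1):
    """Build Solr select parameters returning one tf-idf pseudo-field per term.

    Hypothetical helper: the field and id values are assumptions, not the
    confirmed XWiki schema.
    """
    fields = ["id"]
    for term in terms:
        # Keep terms simple so they are safe inside the alias and the quotes.
        if not term.isalnum():
            raise ValueError(f"unsupported term: {term!r}")
        # product(tf(...), idf(...)) is evaluated for each matching document.
        fields.append(
            f"tfidf_{term}:product(tf({field},'{term}'),idf({field},'{term}'))"
        )
    return {
        "q": f'id:"{page_id}"',  # restrict the query to a single page
        "fl": ",".join(fields),
        "rows": rows,
    }


params = tfidf_query_params("doccontent", ["solr", "tag"], "xwiki:Main.WebHome")
# Query string that could be appended to a Solr /select URL.
print(urlencode(params))
```

This approach needs the candidate terms up front, so the prototype would also have to show how to obtain the terms of a page, for instance from term vectors or by tokenizing the page content.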

For your proposal, you should provide details on how tag suggestions will be displayed and how they will be computed. Your proposal should be realizable as an extension, but you may also propose changes to XWiki itself that are necessary to realize this as an extension (like Solr configuration adjustments).

If you like the idea of automatic tagging but want to propose another framework for getting tags or tag suggestions, have a look at the related project about AI-based tagging of pages.

Developer profile

Web developer with some knowledge of Java or Velocity. Prior experience with Solr and/or natural language processing is a plus.





Get Connected