XWiki @ Google Summer of Code 2023

Last modified by Michael Hamann on 2024/03/11 13:47

This page hosts information and project ideas for the open source project XWiki related to the Google Summer of Code 2023 mentorship program.

About GSoC

You can learn a lot about the program by reading the GSoC FAQ. The timeline of this year's edition is given here.

Getting Started

As a new GSoC Contributor candidate, you're probably looking for this section! Please read it carefully (and the linked resources) before asking for guidance or where to start. It applies to all proposed projects and it covers many topics asked by newcomers.

  1. Being part of the XWiki community means knowing our rules and practices.
  2. As a GSOC Contributor or candidate, you need to make sure you read and apply our guidelines. They also contain suggested ways of getting to know the XWiki project and its community. This is where you find out what you need to get up to speed with developing extensions or making changes to the XWiki product.

Please do not skip this part. Don't be that person who joins the community and asks "where do I start?". This is the first test you need to pass :)

Application template

When applying for one of our projects, please provide this information about yourself and the project you choose in the application which you submit to Google.

Selected Projects for GSoC 2023 (1)

The projects below, out of all the proposed projects, have been selected to participate in GSoC 2023.

Interactive Link Visualization by Rajat Khanduri

Example Visualization using Sigma.js

The idea of this project is to create an interactive visualization of the structure of a wiki or a part of it. This should be done in the form of a graph where nodes represent documents in the wiki and edges represent links. Such visualizations have been created for DokuWiki, and more recently developed tools like Obsidian or Roam also have graph view features. The main tasks would be to:

  • Develop a data source that provides link information combined with some document information. A major challenge here will be performance and rights as the data must not include any documents inaccessible to the current user. Therefore, in particular for non-admins, the graph size needs to be limited. Further, filtering options should be provided to, e.g., just consider a space or documents that have a relationship with the current document.
  • Develop a client-side visualization (could be a macro) to display such a graph. The visualization should provide some interactivity and in particular the possibility to visit the displayed documents.
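
The data source described in the first task could be sketched as follows. This is only an illustration in Python, not XWiki code: `Link`, `build_graph`, `can_view` and `max_nodes` are hypothetical names, and `can_view` stands in for whatever rights check the real implementation would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str  # reference of the document that contains the link
    target: str  # reference of the linked document


def build_graph(links, can_view, max_nodes):
    """Return (nodes, edges) limited to viewable documents and to max_nodes."""
    nodes = []
    seen = set()
    edges = []
    for link in links:
        # Drop every link that touches a document the current user may not see.
        if not (can_view(link.source) and can_view(link.target)):
            continue
        # Documents of this link that are not part of the graph yet.
        new = [d for d in dict.fromkeys((link.source, link.target)) if d not in seen]
        # Skip links that would push the graph over the size limit.
        if len(nodes) + len(new) > max_nodes:
            continue
        for document in new:
            seen.add(document)
            nodes.append(document)
        edges.append((link.source, link.target))
    return nodes, edges
```

Filtering before counting means that inaccessible documents never influence the size limit, so the limit itself does not leak their existence.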

A good choice for the client-side visualization is probably a force-directed layout as available in D3. As there may be documents without links, special features for disjoint graphs should be considered. Another very interesting choice for the visualization is Sigma.js. Their main demo visualizes links in Wikipedia, so this is very fitting and the performance seems to be much better than that of D3. They also have a demo with mouse manipulation, so feature-wise it seems quite similar to D3 for this project. It should be investigated how Sigma.js handles disconnected nodes and graphs with several connected components, as this is not clear from the provided demos.
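
To reason about disconnected nodes and graphs with several connected components independently of the chosen library, the components can be computed up front. The sketch below uses a plain breadth-first search; the function name and the node/edge representation are assumptions made for this example.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into connected components, treating links as undirected."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components
```

A document without any links ends up in a component of its own, which makes it easy to lay such documents out separately or to hide them.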

There are countless possibilities to further extend this project, here are just some ideas:

  • Provide a Panel to display a visualization centered around the current document.
  • Provide a background job to produce a dump of the relations of the whole wiki. Depending on a configuration option (which should be an option for admins), this dump could then be visualized just by admins, by all users of a group or even by any user without further delay. Possibly, this could also be something like a daily job to produce a new graph. It would possibly also make sense to offer an export option, e.g., in a file format for Gephi. This feature probably makes more sense for Sigma.js than for D3, as D3 is only suitable for small graphs, but with D3 it could still be interesting with the export option.
  • Display further data in the graph, like some metadata or even a preview of the content when hovering (or clicking on) nodes.
  • Provide a way to cache the layout of the graph on the server such that the layout doesn't need to be re-created every time. Alternatively/additionally, the layout could also be calculated on the server.
  • Provide ways to modify the color and/or shape of the nodes based on some properties like space, tags or also generically some XObject properties (could be configurable).
  • Adjust the layout algorithm by adding attracting forces based on shared spaces such that documents of the same space are grouped together even if there are no links (this might be more difficult with Sigma.js).
  • Make the layout algorithm configurable (adjust some of the forces), possibly even in view-mode with a live preview.
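
As an illustration of the export idea above, the following sketch writes an edge list as CSV with `Source` and `Target` columns, a layout that Gephi's spreadsheet import can read as an edge table. The function name is hypothetical and CSV is only one of several formats Gephi accepts.

```python
import csv
import io


def edges_to_gephi_csv(edges):
    """Serialize (source, target) pairs as a CSV edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row naming the two endpoints of each edge.
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()
```

Using the `csv` module rather than string concatenation ensures that document references containing commas or quotes are escaped correctly.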

The idea is not to implement all of these ideas but rather to select and provide more details for a few of them. Of course, new ideas are also very welcome.

To get started, please complete the following tasks:

  • Write a Velocity macro that displays a list of all links of the current document (hint: there is a method to get them in the Document API). You should add some (internal) links such that they are displayed in the list.
  • Use the visualization library you chose to visualize these links (just the links from the current document to all linked documents, no links in-between them).

This will look boring but it can give you a first idea of how to collect data and display a simple visualization. For the real project you should develop a proper API between the server and the client; for this prototype you can just generate JavaScript code in Velocity to add the links to the graph. Feel free to ask if you need any help or pointers to further documentation.

Related Resources

XWiki Extensions

  • XWiki Interactive Maps Application: some code and features can be interesting for introducing facet filtering on the graph
  • Page Relations: contains further Solr query examples, and it could be useful to display such relations as well, even when backlinks are not present.

Other Knowledge Graphs Visualization Tools / UX

Coordinated by


Contributor
Estimated workload: 350 hours (Large size project).
Difficulty: Hard.

Proposed Projects (10)

You can contribute to this list! We encourage would-be mentors to propose ideas or even to revive old ideas from the previous years, but, if you do, make sure to assign yourselves as mentors for the proposed project. Proposals without mentors will be discarded, as we have learned from previous years that they are counter-productive.

Candidate Contributors are also encouraged to propose projects, but they should first present and discuss their idea on the forum (dev category) and find a mentor interested in the idea, who should then be assigned to the project proposal.

We also recommend that candidates interested in joining XWiki for GSoC 2023 make themselves known as early as possible so that we can start working with them on this project list. Communication is the key to success at GSoC!

AI based tagging of pages

The goal of this project is to allow pages to be automatically tagged using a machine learning framework of your choice, based on the content of the page and its metadata.

There are two similar project proposals in the area of automatic tagging that could give you inspiration for how this could be implemented with your chosen framework: content-based tag suggestions (using TF-IDF to suggest tags) and Organizing Knowledge Using Topic Models (using LDA for a global analysis).

Technical requirements

  • An offline solution is preferable. Sending wiki content to an external service should be avoided. Self-hosted third-party services can be considered, but it's better if they can be avoided, as they can lead to additional complexity in deployment and maintenance.
  • It is preferable to propose a solution that does not require excessive resources to deploy (e.g., solutions that require several gigabytes of memory and/or disk space should preferably be avoided).

Coordinated by


Estimated workload: 175 hours (Medium size project).
Difficulty: Medium.

Automatic Screenshots in Documentation

At the moment, screenshots in XWiki's documentation are manually created and thus rarely updated even when the actual design or layout of the application changes. The goal of this project is to change this by automating the production of screenshots for the documentation. This consists of several parts:

  • Actually take the screenshots. This should be done as part of our existing Docker-based UI tests that automatically test the UI using Selenium. New UI tests should be added for at least some parts of the documentation, the idea being to cover some representative cases so that important issues can be discovered and fixed during the project.
  • Store the screenshots. We need to decide if the screenshots should be stored in XWiki (the instance on xwiki.org) or in some separate place. Further, this requires deciding for which versions screenshots should be stored (e.g., the current LTS and stable versions, the release candidate when it is more recent than the stable version, maybe the most recent build on each branch). We probably also need to ensure that we have CI jobs executed on the release itself. We will also need some way to clean up old screenshots so that we don't collect screenshots forever.
  • Display the screenshots. Depending on how the screenshots are stored, we probably need some macro to display the automatically taken screenshots. For this, we also need to decide for which version the screenshot should be displayed by default and how the user can see screenshots for other versions (e.g., some kind of version selector).

Please develop more detailed ideas for these parts as part of your proposal.

We're currently considering implementing screenshot-based regression testing (aka visual testing). While this doesn't need to bother you too much, there are some areas of intersection you should be aware of and work together with us if necessary:

  • For regression testing, we will have some way to determine if two screenshots match. This could also be used in this project to determine if a new screenshot should be stored.
  • Regression testing also needs a way to store known-good screenshots for different branches. Maybe this could be integrated with the screenshot storage of this project.
  • The screenshots that are created by this project should be used for regression testing.
  • This could be used to improve XWiki release notes: when an improvement or new feature changes an existing UI of XWiki, the release notes can show the before/after difference using an image diff.
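
The decision whether a new screenshot should be stored can be illustrated with the strictest possible notion of a match, byte-for-byte equality. This is a minimal sketch with hypothetical function names; a real visual testing setup would more likely tolerate small rendering differences, which this example does not attempt.

```python
import hashlib


def digest(image_bytes):
    """Fingerprint of the raw image file content."""
    return hashlib.sha256(image_bytes).hexdigest()


def should_store(new_image, stored_image):
    """Store a screenshot if none exists yet or if its content changed."""
    if stored_image is None:
        return True
    return digest(new_image) != digest(stored_image)
```

Comparing digests rather than full files also means only the digest of the known-good screenshot needs to be kept at hand during a test run.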

To get a rough idea of this project please complete the following tasks:

  • Adapt an existing integration test to take a meaningful screenshot of some UI element. Interesting tests could, e.g., be the tests for AppWithinMinutes.
  • Let the test automatically upload the screenshot to an existing XWiki instance using the REST API for attachments (as an attachment to an existing document). You may put the URL, the name of the attachment and the credentials for this instance in the code for this prototype (extra credit for getting the URL and credentials from environment variables). In the end, the attachment name could be something like {name}-{version}, but for the prototype you can just use a fixed name (or use two made-up versions to get more input for the next task).
  • Write a wiki macro that lists all attachments with a certain name prefix, displays the most recent version and displays a select element for the versions.
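
The selection logic behind the last task can be sketched independently of XWiki. The example assumes attachment names of the form {name}-{version} followed by a file extension, with purely numeric dotted versions; both function names and the sample file names are made up for illustration.

```python
def parse_attachment(name, prefix):
    """Return the version tuple of '<prefix>-<version>.<ext>', or None."""
    stem, dot, _extension = name.rpartition(".")
    if not dot or not stem.startswith(prefix + "-"):
        return None
    version = stem[len(prefix) + 1:]
    try:
        # Compare versions numerically so that 15.10 sorts after 15.2.
        return tuple(int(part) for part in version.split("."))
    except ValueError:
        return None


def versions_for(names, prefix):
    """Attachment names matching the prefix, most recent version first."""
    found = {}
    for name in names:
        version = parse_attachment(name, prefix)
        if version is not None:
            found[version] = name
    return [found[version] for version in sorted(found, reverse=True)]
```

The first entry of the returned list would be the screenshot displayed by default, and the full list would populate the version selector.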

This is not necessarily how the final project will be developed, but these tasks should help you to familiarize yourself with the tools that you'll use for the project itself and demonstrate to us that you know the technologies required for this project. Further, you might already identify potential issues that you can then address in the project itself. Feel free to ask for clarification and help regarding these tasks.

Coordinated by


Estimated workload: 350 hours (Large size project).
Difficulty: Medium.

Block editor for wiki pages

Today the content of wiki pages is stored in the XWiki 2.1 syntax and can be entered by users either in a text editor in which they type XWiki syntax or through a WYSIWYG editor based on CKEditor. When the WYSIWYG editor is used, the content is converted back to XWiki 2.1 syntax and stored in the database.

The WYSIWYG editor used today (CKEditor) edits a whole page at once, with menus and input for the entire page.

The objective of this project is to create an editor for XWiki pages that allows editing the page as a sequence of "blocks", where each block has a type and its own formatting options, and which offers advanced layout options for arranging the blocks in the final result. The result produced by this editor would still need to be transformed into XWiki 2.1 syntax and stored as the wiki page's content.

The objective is to use an existing open source front end component/library for the editor itself, not to write a new one from scratch, unless there is a clear benefit, which should be detailed in the proposal.

A solution where the block editor supports only a subset of the elements currently supported by wiki pages is acceptable, if a path to the full implementation is possible and proposed. A subset of the XWiki syntax may need to be defined for the elements supported by the block editor.
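
The transformation from blocks to XWiki 2.1 syntax for such a subset could look roughly like the sketch below. The block structure (dictionaries with `type`, `text`, `level`, `items` and `ordered` keys) is invented for this example and is not the output format of any particular editor library; only headings, paragraphs and flat lists are covered.

```python
def blocks_to_xwiki(blocks):
    """Render a list of simple blocks as XWiki 2.1 syntax."""
    parts = []
    for block in blocks:
        kind = block["type"]
        if kind == "heading":
            # XWiki headings are wrapped in one '=' per level.
            marker = "=" * block["level"]
            parts.append(f"{marker} {block['text']} {marker}")
        elif kind == "paragraph":
            parts.append(block["text"])
        elif kind == "list":
            marker = "1." if block.get("ordered") else "*"
            parts.append("\n".join(f"{marker} {item}" for item in block["items"]))
        else:
            # Anything outside the supported subset is rejected explicitly.
            raise ValueError(f"unsupported block type: {kind}")
    # Blank lines separate top-level elements.
    return "\n\n".join(parts)
```

Rejecting unknown block types instead of silently dropping them makes the limits of the supported subset visible while the path to a full implementation is worked out.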

Resources:

XWiki Syntax, for evaluating the supported elements: https://www.xwiki.org/xwiki/bin/view/Documentation/UserGuide/Features/XWikiSyntax/

Open source block editors:

Very basic demo of the editorjs integration: EditorJS.WebHome-working.xar. Import it in your wiki instance, then go to Administration -> Editing -> Wysiwyg Editor and set "Blocks" as the default editor. Headings don't work correctly because of a bug in editorjs.

Coordinated by


Estimated workload: 350 hours (Large size project).
Difficulty: Hard.

ChatGPT Integration to generate/update page content

Goals:

Coordinated by


Estimated workload: 350 hours (Large size project).
Difficulty: Medium.

Content-based Tag Suggestions

In XWiki, Tags can be applied to any page. At the moment, suggested tag values are just all tags. As good tags can help organize and find documents, it could be interesting to suggest suitable tags automatically. Suggested tags could, e.g., be displayed after the actually used tags and users with edit rights could easily add them with one click (per tag). Alternatively, they could be displayed as part of the suggestions:

SuggestTag.PNG

To get suggestions, the idea for this project would be to rely on tf-idf. This basically compares how often a term occurs in the document with the number of documents in which the term occurs, so it favors terms that are frequent in the current document but don't occur in many other documents. Solr, which is used for XWiki's search feature, already provides an idf function and could thus be used to compute the values.
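
A small worked example may make the scoring concrete. The sketch below uses the textbook variant tf × log(N / df) on pre-tokenized toy documents; it is not Solr's exact formula, and the function name and data are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents, doc_id):
    """Score each term of one document; documents maps id -> list of tokens."""
    total_documents = len(documents)
    # Number of documents each term occurs in.
    document_frequency = Counter()
    for tokens in documents.values():
        document_frequency.update(set(tokens))

    tokens = documents[doc_id]
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(total_documents / document_frequency[term])
        for term, count in counts.items()
    }


documents = {
    "a": ["solr", "search", "solr", "wiki"],
    "b": ["wiki", "page"],
    "c": ["wiki", "macro"],
}
scores = tf_idf(documents, "a")
```

Here "wiki" occurs in every document and therefore scores zero, while "solr" is both frequent in document "a" and absent elsewhere, so it would be the top tag suggestion.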

To get started, you should create a prototype that demonstrates the general feasibility of using Solr for this by, e.g., just writing Velocity code that executes the Solr query as shown in the documentation of the Solr Search Application to compute some tf-idf scores. You might need to modify the Solr configuration for this. If this turns out to be infeasible, please discuss with us why and suggest a different approach like indexing term frequencies in a database table.

For your proposal, you should provide details on how to display tag suggestions and how they will be computed. Your proposal should be realizable as an extension but you may also propose changes to XWiki itself that are necessary to realize this as an extension (like Solr configuration adjustments).

If you like this idea of automatic tagging but want to propose another framework for getting tags/tag suggestions, have a look at the related project about AI based tagging of pages.

Coordinated by


Estimated workload: 175 hours (Medium size project).
Difficulty: Medium.

Create an integrable Two-factor authentication (2FA) solution for XWiki

2FA requires you to enter an extra code when you log in or perform some account-sensitive action (e.g. changing your password).

XWiki allows writing custom authenticators and there are many of those, see http://www.xwiki.org/xwiki/bin/view/Documentation/AdminGuide/Authentication/#HCustomAuthentication but they are written one by one and they don’t share much with each other.
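
The project description does not prescribe how the extra code is produced. One widely used scheme is the time-based one-time password (TOTP) of RFC 6238, sketched here with the standard library only; choosing TOTP, the function name and the default parameters are assumptions of this example, not requirements of the project.

```python
import hashlib
import hmac
import struct


def totp(secret, timestamp, step=30, digits=6):
    """Time-based one-time password (RFC 6238) using HMAC-SHA1."""
    # Number of time steps elapsed since the Unix epoch.
    counter = int(timestamp) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation as defined for HOTP in RFC 4226.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The server and the user's authenticator app share `secret`, so both can compute the same code for the current time step without any further communication.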

Coordinated by


Estimated workload: 350 hours (Large size project).
Difficulty: Medium.

Interactive Link Visualization


Coordinated by


Contributor
Estimated workload: 350 hours (Large size project).
Difficulty: Hard.

Macros for displaying and editing properties

When displaying or editing a document that has structured data with properties like this Google Summer of Code project document, a so-called sheet document is used that defines how these properties are displayed. For example, the sheet for Google Summer of Code projects defines that the mentor(s) are displayed first. At the moment, such sheets use Velocity code for displaying both the names and the values of the properties. An example of such a sheet is provided in the FAQ application tutorial. Not all users in a wiki are allowed to create documents with Velocity code. The idea of this project is to create macros for displaying and editing properties such that sheets can be created by users without script right. For this, common needs for displaying properties should be identified. The macros should also be integrated in Apps Within Minutes, which automatically creates such sheet documents, as a step towards making it possible to use it without script right.

The idea of this project is not to develop an independent extension but to contribute new core features to XWiki. As such, close collaboration with the mentor(s) and the rest of the XWiki development team is required to make sure that the contributions match the best practices of XWiki, and any API changes (e.g., new macro parameters) need to be agreed on by the other developers. Also, extensive unit and integration tests need to be added. At least parts of the features should be added to the display macro, which currently only allows displaying a whole page, so that it can also display individual properties. It needs to be researched and discussed if displaying a label and possibly a hint for the property should be done in an extra macro or in the same macro (e.g., controlled by a parameter). Further, there might be the need for other features like hiding empty properties in view mode or displaying some properties like a title only in edit mode.

This project is quite technical and won't produce anything flashy but it will be an important and valued contribution to the XWiki project. Through this project, you can gain a deep understanding of core parts of XWiki but also software development practices like testing.

To get started, it is suggested to follow the FAQ application tutorial in order to get a basic understanding of how sheets work. Further, you should understand the vertical form layout and in-place editing, as these are the HTML structures that need to be created by the macro(s). If necessary, it could also be possible to modify in-place editing to support whatever is needed for the macro(s); this is something to take into account when developing the macros. For example, it could be interesting to move attributes from the definition list to some inner element to make it possible to put the macro for the label and hint inside a regular definition list syntax.

Coordinated by


Estimated workload: 350 hours (Large size project).
Difficulty: Medium.

Organizing Knowledge Using Topic Models

A topic model can be used to discover the topics in a collection of documents. The idea here is that a document consists of different topics, or rather, that its words are drawn from different topics. A topic in turn is a collection of words with a frequency (probability) distribution, such that different words can have different importance. Analyzing the topics that are covered in different documents of a wiki can help organize the wiki by, e.g., assigning tags to documents of a topic or grouping documents of a topic in a space (a concept in XWiki that is similar to a directory in a filesystem). You can read more about how topic discovery can be applied for knowledge management in this article.
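
The relationship between documents, topics and words can be illustrated with a few lines of Python. The topics, weights and function names below are made up; a real topic model such as LDA would estimate these distributions from the wiki's content.

```python
def word_probability(word, doc_topics, topic_words):
    """Probability of a word in a document that is a mixture of topics."""
    return sum(
        weight * topic_words[topic].get(word, 0.0)
        for topic, weight in doc_topics.items()
    )


def dominant_topic(doc_topics):
    """Topic with the largest share in the document, e.g. to derive a tag."""
    return max(doc_topics, key=doc_topics.get)


# Each topic is a probability distribution over words.
topic_words = {
    "search": {"solr": 0.6, "query": 0.4},
    "editing": {"editor": 0.7, "syntax": 0.3},
}
# Each document is a probability distribution over topics.
doc_topics = {"search": 0.75, "editing": 0.25}
```

In this toy setup the document is mostly about "search", so "solr" is a likelier word than "editor", and `dominant_topic(doc_topics)` gives the topic one might turn into a tag.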

The idea of this project is to integrate an existing Java library that implements a topic model like LDA in XWiki. As, in particular for large wikis, the analysis will take quite some time, the idea would be to have a background job to run the analysis that writes the result (e.g., to some document, details to be discussed) and then to have a UI for inspecting the result. This UI should offer various options like assigning a tag to all documents of a topic or moving all documents of a topic to a space.

As a task to get started, you should find a library that implements LDA or a similar topic model and create a prototype with a macro in Java that takes a list of documents, gets their text content (find out how to render to plain text or just take the text without rendering) and then feeds these documents into the library. It should then display the result of the analysis in textual form. This doesn't need to be clean and nice, but it should show us that you're able to work with XWiki's Java API and that the library you chose is working as expected.

In your proposal, you should detail how you want to transform this prototype into a background job for the analysis and a UI for presenting the result. Depending on the details, the implementation could become a lot of work; one suggestion would be to keep the UI parts as simple as possible for now, as advanced users could always write their own scripts to further process the results.

If you like this idea of using machine learning to organize the content of the wiki but don't want to use topic models, have a look at the related project about AI based tagging of pages.

Coordinated by


Estimated workload: 350 hours (Large size project).
Difficulty: Medium.

Package and deploy XWiki in a well-known repository of applications

The idea is to create a package of XWiki and deploy it where people can find it.

Here are some examples:

The first step is to actually create a clean and working package for a version of XWiki.

Then the generation and publication of a package need to be automated as much as possible so that they can be integrated into the XWiki release process.

The candidate should focus the proposal on a specific package/repository, as implementing several is probably not going to fit in the GSOC timeframe (promising too much is really not a good way to be selected).

Coordinated by


Estimated workload: 175 hours (Medium size project).
Difficulty: Medium.

Mentors (6)

The following community members are assigned to mentor the proposed projects:

Contact us

You can ask for more information about each project proposal and interact with the community and mentors through the usual communication channels: forum or the Matrix channel.

What's next after GSOC?

First and foremost: Thank you for having participated in XWiki!

We want to keep you in the community for as long as possible. We understand that you may have school projects to carry on and won't have the time to continue participating much immediately after GSOC. However, we're really keen to see you coming back to this community when things settle a bit more and you get some time again.

Here are some ideas of what's next and the opportunities you may have after you've completed a GSOC project:

  • You could be voted as Committer
  • You could get hired by one of the companies doing some business on top of the XWiki project
  • You could become a Google Code-in mentor
  • You could propose a school project, PhD, etc about XWiki to your teachers!
  • You'll be able to add a nice line to your CV about having participated in an open source project :)
  • You can ask for recommendations on LinkedIn, Facebook, etc about your work as a GSOC Contributor
  • (Future, doesn't exist ATM) Your name on the Hall of Fame
  • (Future, doesn't exist ATM) Receive an XWiki GSOC t-shirt
  • (Future, doesn't exist ATM) Be sponsored to talk about XWiki at conferences
  • (Future, doesn't exist ATM) Be able to implement some bounties for XWiki and get paid for it
  • (Future, doesn't exist ATM) Be invited to physically participate in an XWiki conference

Org Admin Resources

If you are one of this year's XWiki Organization Administrators, make sure to check out the Organization Admin Guide.

Previous GSoC editions

Tags: gsoc
   
