DokuWiki importer

Last modified by Thomas Mortagne on 2022/02/25 09:25

Mentor(s)
Student
Details

The Filter Stream framework allows converting from one wiki representation to another. The idea here is to provide a Filter input module for DokuWiki so that it is possible to convert from DokuWiki to any Filter output format.

You can get inspiration from http://extensions.xwiki.org/xwiki/bin/view/Extension/Import+DokuWiki+into+XWiki+Application and http://extensions.xwiki.org/xwiki/bin/view/Extension/Dokuwiki+To+XWiki2+Extension.

Developer profile
  • Java
Active
No
Year

2017

Status

Successfully terminated

Progress

Google Summer of Code Submission Report


Description

The goal of the project is to import the data of DokuWiki instances into XWiki instances using XWiki's Filter framework. Unlike many wiki engines, DokuWiki doesn't use a database and instead uses a file-based storage scheme. The project therefore takes the DokuWiki directory as input and imports it into XWiki. It also converts DokuWiki syntax to XWiki/2.1 syntax.

Milestones

Milestone 1 [1 June - 29 June]: Import the Document structure

I imported the document structure from DokuWiki into XWiki by calling the relevant methods of the Filter API. This also included setting up the project and getting used to the Filter API.

Challenges:

  • Correcting my misconceptions about the Filter API.
  • Getting used to the Component architecture of XWiki.
  • Discovering the DokuWiki engine's approach to storing documents, document revisions and namespaces.
  • Finding the ideal way to export DokuWiki instance data. There are many ways of exporting data from a DokuWiki instance, including copying the directory from the file system and exporting as XML, XHTML, or text.
  • Discovering Filter API events and best practices.
  • Processing streams without saving content to memory or, if possible, to disk.
  • Maintaining the created/modified timestamps of files.
  • Understanding the code and approach of the MediaWiki importer.

What I did:

  • Experimented with the Confluence importer and the MediaWiki importer as references for how an importer should behave.
  • Used the MediaWiki importer as a template and built on top of it. I also tried to understand the functionality of the MediaWiki importer by comparing the importer logic with the MediaWiki documentation.
  • Since the MediaWiki importer had a lot of MediaWiki-specific logic and hacks, I was advised to get rid of all the MediaWiki code and start from scratch.
  • I chose to take the whole DokuWiki directory as input, since it contains all the data related to the DokuWiki instance.
  • I then implemented a tree structure to maintain the namespaces and document structure of the input, and to call the correct Filter API events.
  • Tests were written and, after testing, the extension was released on the XWiki nexus repository.
  • Finally, documentation was created on the XWiki extension page.
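
The tree structure mentioned above can be sketched roughly as follows. This is only an illustration, not the importer's actual code: `NamespaceTree` and its `Listener` interface are hypothetical names standing in for the real Filter API events, and the sketch assumes the usual DokuWiki layout in which each directory under `data/pages` is a namespace and each `.txt` file is a page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory tree of DokuWiki namespaces and pages. */
final class NamespaceTree {
    /** Hypothetical stand-in for the begin/end events sent to the Filter API. */
    interface Listener {
        void beginSpace(String name);
        void document(String name);
        void endSpace(String name);
    }

    // Child namespaces, kept sorted by name so the walk order is deterministic.
    private final Map<String, NamespaceTree> children = new TreeMap<>();
    // Pages that live directly in this namespace, in insertion order.
    private final List<String> pages = new ArrayList<>();

    /** Adds a path relative to data/pages, for example "wiki/syntax.txt". */
    void add(String relativePath) {
        String[] parts = relativePath.split("/");
        NamespaceTree node = this;
        for (int i = 0; i < parts.length - 1; i++) {
            node = node.children.computeIfAbsent(parts[i], k -> new NamespaceTree());
        }
        String file = parts[parts.length - 1];
        node.pages.add(file.endsWith(".txt") ? file.substring(0, file.length() - 4) : file);
    }

    /** Depth-first walk: each namespace is opened before its content and closed after it. */
    void walk(Listener listener) {
        for (String page : pages) {
            listener.document(page);
        }
        for (Map.Entry<String, NamespaceTree> child : children.entrySet()) {
            listener.beginSpace(child.getKey());
            child.getValue().walk(listener);
            listener.endSpace(child.getKey());
        }
    }
}
```

The point of the tree is that every "begin" event is paired with its matching "end" event, with nested namespaces and documents emitted in between.
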
Milestone 2 [1 July - 28 July]: Parse DokuWiki syntax and convert to XWiki syntax

The next step was to parse the DokuWiki syntax and call the relevant methods of the Rendering API, so that the output is XWiki syntax. The idea was to support most of the basic syntax available in DokuWiki. 

Challenges:

  • There wasn't any syntax parser available for DokuWiki in Java.
  • Deciding the best approach to build a syntax parser.
  • Gaining a thorough understanding of the DokuWiki syntax.
  • Critical differences between XWiki and DokuWiki syntax and macros.
  • Understanding the XWiki Rendering API.
  • Apache Commons Compress behaving differently for file input and streaming input.
  • Occasional differences between the XML input and the output string from the xwiki/2.1 syntax renderer.
  • Since the parser became pretty big (~1000 lines), making changes became difficult over time. 

What I did:

  • Used the Apache Commons Compress library and added support for any archive format it supports. However, this only worked for streaming input; file input had problems with detecting the string encoding.
  • Created an iterative parser that parses the content character by character.
  • Wrote and passed all the Simple CTS tests, and also wrote specific tests and tested the output.
  • Removed the tree structure used earlier to manage document namespaces, since the output is not required to be in namespace order.
  • The parser extension was designed to be separate from the previous project, so it was released as a separate extension on the XWiki nexus repository.
  • Finally, documentation work was done.
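
To give an idea of what parsing character by character looks like, here is a minimal sketch. It is not the released parser: the real one drives the Rendering API, whereas this hypothetical `TinyDokuWikiConverter` simply writes XWiki 2.1 text, and it only handles two constructs where the syntaxes differ (headings, where DokuWiki uses six `=` for level 1 and XWiki uses one, and `''monospace''`, which becomes `##monospace##`).

```java
/** Minimal, hypothetical sketch of a character-by-character DokuWiki to XWiki 2.1 converter. */
final class TinyDokuWikiConverter {
    /** Converts one line of DokuWiki markup; only headings and ''monospace'' are handled. */
    static String convertLine(String line) {
        String trimmed = line.trim();
        int count = leadingEquals(trimmed);
        // DokuWiki: "======" is level 1 down to "==" for level 5; XWiki: "=" is level 1.
        if (count >= 2 && count <= 6 && trimmed.length() > 2 * count
                && trimmed.endsWith(repeat('=', count))) {
            String title = trimmed.substring(count, trimmed.length() - count).trim();
            String marker = repeat('=', 7 - count);
            return marker + " " + title + " " + marker;
        }
        return convertInline(line);
    }

    /** Scans the text one character at a time and rewrites '' to ##. */
    private static String convertInline(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\'' && i + 1 < text.length() && text.charAt(i + 1) == '\'') {
                out.append("##");
                i += 2; // consume both quote characters
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Counts the '=' characters at the start of the text. */
    private static int leadingEquals(String text) {
        int n = 0;
        while (n < text.length() && text.charAt(n) == '=') {
            n++;
        }
        return n;
    }

    /** Builds a string made of the same character repeated n times. */
    private static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, `TinyDokuWikiConverter.convertLine("====== Title ======")` returns `= Title =`, while bold text such as `**bold**` passes through unchanged because both syntaxes use the same marker.
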
Milestone 3 [28 July - 21 August]: Import users, revisions and attachments

The final milestone includes importing users, revisions of documents, and attachments. The hope is to produce an importer that can import most basic data from DokuWiki to XWiki. 

Challenges:

  • Files frequently had to be accessed multiple times, so extracting the archive to disk became a priority.
  • Document revision metadata and change-logs are stored separately.
  • Linking change-logs with registered users.
  • Maintaining the order of revision file parsing.
  • String encoding problem with file input.
  • Critical differences, such as an attachment belonging to a namespace in DokuWiki whereas in XWiki it belongs to a document.
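
Linking change-logs with registered users can be illustrated with a small sketch. It assumes the tab-separated layout DokuWiki uses for its `.changes` files (timestamp, IP address, change type, page id, user login, summary). That column order is an assumption stated from the DokuWiki format as I understand it, not taken from the importer's code, and `ChangeLogEntry` and `registeredAuthor` are hypothetical names.

```java
import java.time.Instant;
import java.util.Set;

/** Hypothetical holder for one line of a DokuWiki change-log. */
final class ChangeLogEntry {
    final Instant date;
    final String pageId;
    final String user;
    final String summary;

    private ChangeLogEntry(Instant date, String pageId, String user, String summary) {
        this.date = date;
        this.pageId = pageId;
        this.user = user;
        this.summary = summary;
    }

    /** Parses one tab-separated line; the column order is the assumption described above. */
    static ChangeLogEntry parse(String line) {
        // The -1 limit keeps trailing empty columns instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length < 6) {
            throw new IllegalArgumentException("Unexpected change-log line: " + line);
        }
        Instant date = Instant.ofEpochSecond(Long.parseLong(fields[0].trim()));
        return new ChangeLogEntry(date, fields[3], fields[4], fields[5]);
    }

    /** Returns the author only if it is a known login, otherwise null (for example anonymous edits). */
    String registeredAuthor(Set<String> knownLogins) {
        return knownLogins.contains(user) ? user : null;
    }
}
```

The set of known logins would come from wherever the users were read; here it is simply passed in, so the lookup stays independent of how users are stored.
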

What I did:

  • Extracted the input archive to a temporary directory on disk.
  • Fixed the problem of the encoding not being readable when the archive is given as file input.
  • Implemented support for reading users, attachments, and document revisions.
  • Tests were updated with the latest additions.
  • Revision filenames contain a timestamp, so sorting them by name ensured they are processed in the correct order.
  • Finally, an updated version of the extension was released on the XWiki nexus repository.
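
The ordering of revisions can be sketched as below. The importer sorted the revision files by name, as described above. This sketch shows a slightly more defensive variant that extracts the timestamp and compares it numerically, which also works if the timestamps do not all have the same number of digits. The file name layout (`page.timestamp.txt.gz`) and the `RevisionOrder` class are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper that orders revision files from oldest to newest. */
final class RevisionOrder {
    /** Extracts the timestamp from a name such as "start.1500000000.txt.gz" (assumed layout). */
    static long timestampOf(String fileName) {
        String[] parts = fileName.split("\\.");
        // Scan from the right so that dots inside the page name do not matter.
        for (int i = parts.length - 1; i >= 0; i--) {
            if (!parts[i].isEmpty() && parts[i].chars().allMatch(Character::isDigit)) {
                return Long.parseLong(parts[i]);
            }
        }
        throw new IllegalArgumentException("No timestamp in file name: " + fileName);
    }

    /** Returns a sorted copy of the names, oldest revision first; the input list is not modified. */
    static List<String> oldestFirst(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingLong(RevisionOrder::timestampOf));
        return sorted;
    }
}
```

Processing revisions oldest first means each revision is sent to the output in the same order in which it was originally written.
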

Deliverables

D1 [M1]: First milestone release, latest version here
D2 [M2]: The DokuWiki importer now supports a wide variety of compressed stream inputs. The syntax parser can be found here
D3 [M3]: The importer now supports both file input and stream input archives. Most stock data is imported. The latest version of the importer can be found here
 
