Everything In Between

If your project so much as pretends to have a profit motive, I will tell you to go fuck yourself and your project.

Extract list of all Apple WikiServer wiki titles into CSV format

An interesting request came in today from a coworker. She wanted a spreadsheet listing every wiki page on our intranet, which runs on the Apple WikiServer, presumably because Apple doesn’t provide an easy way to “list all pages” in the wiki itself. Along with each page’s title, she also wanted its internal ID, its URL, and the times the page was created and last modified.

I spent about an hour looking into this in the afternoon, and it turns out that much of this information is readily available on the filesystem, in the Apple WikiServer’s data store. I whipped up the following shell script to extract this information in CSV format, exactly as requested.

I’m posting this script here in case someone else wants similar “export a list of WikiServer pages to a comma-separated values (CSV) file” functionality but isn’t sure how to go about getting it. To use this, just edit the line that reads http://my-server.example.com/groups/wiki/ so that it refers to the wiki base URI of your own server.
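To make the role of that base URI concrete, here is a minimal sketch of how each link gets assembled from the prefix, a page ID, and a page title, mirroring the substitutions the script applies. The page ID and title below are made up for illustration; only the prefix and the substitution rules come from the script.

```shell
#!/bin/sh
# Base URI of the wiki, as configured in the script.
WS_URI_PREFIX=http://my-server.example.com/groups/wiki/

# Hypothetical values: a real ID and title come from the wiki's data store.
id=1a2b3
title='Q&amp;A for new staff?'   # titles are stored XML-escaped

# Same substitutions as the script: spaces and escaped &, >, < become
# underscores, and question marks are dropped.
slug=$(printf '%s\n' "$title" | \
  sed -e 's/ /_/g' -e 's/&amp;/_/g' -e 's/&gt;/_/g' -e 's/&lt;/_/g' -e 's/[?]//g')

link="$WS_URI_PREFIX$id/$slug.html"
echo "$link"
```

If the prefix is missing its trailing slash, the ID will be glued directly onto the last path segment, so it is worth double-checking that when you edit it.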

Update: The latest version of this script is now available at its GitHub-hosted repository. You should probably use that instead of the script below.

#!/bin/sh -
#
# Script to extract data from an Apple WikiServer's data store by querying the
# filesystem itself. Creates a 'wikipages.csv' file that's readable by any
# spreadsheeting application, such as Numbers.app or Microsoft Excel.app.
#
# USAGE:   To use this script, change to the WikiServer's pages directory, then
#          just run this script. A file named wikipages.csv will be created in
#          your current directory. For instance:
#
#              cd /Library/Collaboration/Groups/mygroup/wiki  # dir to work in
#              wikipages2csv.sh                               # run the script
#              cp wikipages.csv ~/Desktop                     # save output
#
# WARNING: Since the WikiServer's files are only accessible as root, this script
#          must be run as root to function. Additionally, this is not extremely
#          well tested, so use at your own risk.
#
# Author:  Meitar Moscovitz
# Date:    Mon Sep 22 15:03:54 EST 2008

##### CONFIGURE HERE ########

# The prefix to prepend to generated links. NO SPACES!
WS_URI_PREFIX=http://my-server.example.com/groups/wiki/

##### END CONFIGURATION #####
# DO NOT EDIT PAST THIS LINE
#############################

WS_CSV_OUTFILE=wikipages.csv
WS_PAGE_IDS_FILE=`mktemp ws-ids.tmp.XXXXXX`

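# Print the text of the element on the line after <key>$1</key> in page.plist.
# Assumes each key and its value sit on consecutive lines; only the first
# matching key is used.
extractPlistValueByKey () {
    head -n \
      "$(expr 1 + "$(grep -n "<key>$1</key>" page.plist | head -n 1 | cut -d ':' -f 1)")" page.plist | \
        tail -n 1 | cut -d '>' -f 2 | cut -d '<' -f 1
}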

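# Turn a page title into the form this script uses in page URLs: spaces and
# escaped &, >, < become underscores, and question marks are dropped.
linkifyWikiServerTitle () {
    printf '%s\n' "$1" | \
      sed -e 's/ /_/g' -e 's/&amp;/_/g' -e 's/&gt;/_/g' -e 's/&lt;/_/g' -e 's/[?]//g'
}

# Convert an ISO 8601 timestamp such as 2008-09-22T15:03:54Z into
# "2008-09-22 15:03:54".
formatISO8601date () {
    printf '%s\n' "$1" | sed -e 's/T/ /' -e 's/Z$//'
}

# Quote a CSV field if it contains a comma or a double quote, doubling any
# embedded double quotes.
csvQuote () {
    case $1 in
        *[,\"]*) printf '"%s"\n' "$(printf '%s\n' "$1" | sed -e 's/"/""/g')" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}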

ls -d [^w]*.page | \
  sed -e 's/^\([a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]\)\.page$/\1/' > $WS_PAGE_IDS_FILE

echo "Title,ID,Date Created,Last Modified,URI" > $WS_CSV_OUTFILE
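while read -r id; do
    # Skip entries we cannot enter, so the later 'cd ..' never leaves the wiki dir.
    cd "$id.page" || continue
    title=$(extractPlistValueByKey title)
    created_date="$(formatISO8601date "$(extractPlistValueByKey createdDate)")"
    modified_date="$(formatISO8601date "$(extractPlistValueByKey modifiedDate)")"
    link="$WS_URI_PREFIX$id/$(linkifyWikiServerTitle "$title").html"
    cd ..
    # printf with quoted arguments preserves whitespace inside titles.
    printf '%s,%s,%s,%s,%s\n' "$(csvQuote "$title")" "$id" "$created_date" \
      "$modified_date" "$(csvQuote "$link")" >> "$WS_CSV_OUTFILE"
done < "$WS_PAGE_IDS_FILE"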
rm $WS_PAGE_IDS_FILE
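
If you want to try the plist lookup in isolation before running the whole thing as root, the following is a minimal sketch of the same idea: find a `<key>` line and print the text of the element on the next line. It uses awk rather than the script's head/tail pipeline, and both the sample page.plist contents and the `plist_value` helper name are hypothetical; a real page.plist will contain more keys and may be laid out differently.

```shell
#!/bin/sh
# Hypothetical sample data standing in for a real <id>.page/page.plist.
demo_dir=$(mktemp -d)
cat > "$demo_dir/page.plist" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>createdDate</key>
    <string>2008-09-22T05:03:54Z</string>
    <key>title</key>
    <string>Meeting notes, week 1</string>
</dict>
</plist>
EOF

# plist_value KEY FILE: print the text content of the line after <key>KEY</key>.
plist_value () {
    awk -v key="<key>$1</key>" '
        found { gsub(/^[^>]*>|<[^<]*$/, ""); print; exit }  # strip the tags
        index($0, key) { found = 1 }                         # key line seen
    ' "$2"
}

title=$(plist_value title "$demo_dir/page.plist")
created=$(plist_value createdDate "$demo_dir/page.plist" | sed -e 's/T/ /' -e 's/Z$//')

echo "$title"
echo "$created"

rm -r "$demo_dir"
```

Like the script itself, this treats the plist as plain text and relies on each key and its value sitting on consecutive lines, so it is a quick check rather than a real plist parser.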

For those new to the WikiServer, this introduction to the Apple WikiServer for web developers may be of interest.

Written by Meitar

September 22nd, 2008 at 12:35 am

7 Responses to 'Extract list of all Apple WikiServer wiki titles into CSV format'

  1. Thanks Meitar, also for your other applewiki analysis. I have to convert an Apple wiki to another format and your work has been of great help.

    Axel Roest

    4 Mar 09 at 7:39 AM

  2. Amazing how relevant this is still today… I’m looking to convert AppleWiki to MediaWiki (tired of holding my breath every time I restart the server).
    The previous comment mentioned “other applewiki analysis”; I’d love to tap into some of that knowledge if possible.
    Do you have posted articles besides this one?
    Thanks!

    Neil R

    8 Mar 12 at 8:53 PM

  3. When I run your script, I get this error

    ls: [^w]*.page: No such file or directory

    I am running Apple Wiki 10.6.8 any help or guidance is welcomed!

    Jon Brown

    10 May 12 at 1:52 AM

  4. Nevermind it worked, I did not put it in the wiki folder of the group, duh. I was hoping it would export the actual content though, any way to get the content of the wiki?

    Jon Brown

    10 May 12 at 1:58 AM

  5. I figured it out, I added a script to your script and I added a line and an entry in the CSV for content. It’s not perfect but it is better than going through each page manually to get the content out.

    https://github.com/jonbrown21/OSX-Wiki-2-CSV

    my run.sh script loops through all of the groups and user folders and extracts the content from all the page.html files.

    Jon Brown

    12 May 12 at 8:22 AM

  6. Thank you very much. you make my day. I have to export the structure of our bear-sized wiki: done in no time with your script.

    Decio

    22 Jun 12 at 1:21 AM

  7. For your information, I wrote a little more thorough and powerful tool based on yours.
    You can have a look to https://github.com/yvangodard/OSX-Wiki-2-CSV

    Yvan GODARD

    7 Sep 14 at 3:13 AM
