Everything In Between

If your project so much as pretends to have a profit motive, I will tell you to go fuck yourself and your project.

seed this page on BitTorrent

Extract list of all Apple WikiServer wiki titles into CSV format

7 comments

An interesting request came in today from a coworker. She wanted to create a spreadsheet that contained all of our intranet’s wiki pages (which uses the Apple WikiServer), presumably because Apple doesn’t provide an easy way to “list all pages” in the wiki itself. Along with the page title, she also wanted to extract its internal ID, its URL, and the time the page was created as well as the time it was last modified.

I spent about an hour looking into this this afternoon and it turns out that much of this information is readily available on the filesystem in the Apple WikiServer’s data store. I whipped up the following shell script to extract this information in CSV format, exactly as requested.

I’m posting this script here in case someone else wants similar “export a list of WikiServer pages to a comma-separated values (CSV) file” functionality but isn’t sure how to go about getting it. To use this, just edit the line that reads http://my-server.example.com/groups/wiki/ so that it refers to the wiki base URI of your own server.

Update: The latest version of this script is now available at its Github-hosted repository. You should probably use that instead of the script below.

#!/bin/sh -
#
# Script to extract data from an Apple WikiServer's data store by querying the
# filesystem itself. Creates a 'wikipages.csv' file that's readable by any
# spreadsheeting application, such as Numbers.app or Microsoft Excel.app.
#
# USAGE:   To use this script, change to the WikiServer's pages directory, then
#          just run this script. A file named wikipages.csv will be created in
#          your current directory. For instance:
#
#              cd /Library/Collaboration/Groups/mygroup/wiki  # dir to work in
#              wikipages2csv.sh                               # run the script
#              cp wikipages.csv ~/Desktop                     # save output
#
# WARNING: Since the WikiServer's files are only accessible as root, this script
#          must be run as root to function. Additionally, this is not extremely
#          well tested, so use at your own risk.
#
# Author:  Meitar Moscovitz
# Date:    Mon Sep 22 15:03:54 EST 2008

##### CONFIGURE HERE ########

# The prefix to append to generated links. NO SPACES!
WS_URI_PREFIX=http://my-server.example.com/groups/wiki/

##### END CONFIGURATION #####
# DO NOT EDIT PAST THIS LINE
#############################

WS_CSV_OUTFILE=wikipages.csv
WS_PAGE_IDS_FILE=`mktemp ws-ids.tmp.XXXXXX`

function extractPlistValueByKey () {
    head -n \
      $(expr 1 + `grep -n "<key>$1</key>" page.plist | cut -d ':' -f 1`) page.plist | \
        tail -n 1 | cut -d '>' -f 2 | cut -d '<' -f 1
}

function linkifyWikiServerTitle () {
    echo $1 | sed -e 's/ /_/g' -e 's/&amp;/_/g' -e 's/&gt;/_/g' -e 's/&lt;/_/g' -e 's/\?//g'
}

function formatISO8601date () {
    echo $1 | sed -e 's/T/ /' -e 's/Z$//'
}

function csvQuote () {
    echo $1 | grep -q ',' >/dev/null
    if [ $? -eq 0 ]; then
        echo '"'$1'"'
    else
        echo $1
    fi
}

ls -d [^w]*.page | \
  sed -e 's/^\([a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]\)\.page$/\1/' > $WS_PAGE_IDS_FILE

echo "Title,ID,Date Created,Last Modified,URI" > $WS_CSV_OUTFILE
while read id; do
    cd $id.page
    title=$(extractPlistValueByKey title)
    created_date="$(formatISO8601date $(extractPlistValueByKey createdDate))"
    modified_date="$(formatISO8601date $(extractPlistValueByKey modifiedDate))"
    link=$WS_URI_PREFIX"$id"/`linkifyWikiServerTitle "$title"`.html
    cd ..
    echo `csvQuote "$title"`,$id,$created_date,$modified_date,`csvQuote "$link"` >> $WS_CSV_OUTFILE
done < $WS_PAGE_IDS_FILE
rm $WS_PAGE_IDS_FILE

For those new to the Wiki Server, this introduction to the Apple WikiServer for web developers may be of interest.

Written by Meitar

September 22nd, 2008 at 12:35 am