From Geeky

HowTo: Make an archival copy of every page, image, video, and audio file on an entire website using wget

I recently announced that my blog archives will no longer be publicly available for long:

Let me repeat that: while I am still “on Tumblr” and so on for now, my archives will not remain available for very long. If you find something of mine useful, you will need to make a copy of it and host it yourself.

[…]

The errors you see when you just punch in my web address in your browser or follow a link from Google are not happening because my blogs “broke.” The errors are intentional; my blogs have simply become invisible to some while still being easily accessible to others. […] Think of my web presence like Harry Potter’s Diagon Alley; so hidden from Muggles that they don’t even know what they’re missing, but if you know which brick to tap, a whole world of exciting new things awaits you….

As a result, a number of you have already asked the logical question: “Is there some easy way to automatically download your archives, instead of manually copy-and-pasting almost a decade of your posts? That would take forever!”

The answer, of course, is yes. This post is a short tutorial that I hope gives you the knowledge you need to download an entire website for offline viewing. This will work for any simple website like most blogs and personal sites, including mine. Archival geeks, this one’s for you. ;)

Preparation

A sculptor must understand stone: Know thy materials

A website is just a bunch of files. On a server, it usually looks exactly like your own computer’s desktop. A page is a file. A slash (/) indicates a folder.

Let’s say you have a website called “my-blog.com.” When you go to this website in a Web browser, the address bar says: http://my-blog.com/ What that address bar is saying, in oversimplified English, is something like, “Hey, Web browser, connect to the computer at my-blog.com and open the first file in the first folder you find for me.” That file is usually the home page. On a blog, this is usually the list of recent posts.

Then, to continue the example, let’s say you click on a blog post’s title, which is a link to a page that only contains that one blog post. This is often called a “permalink.” When the page loads, the address bar changes to something like http://my-blog.com/posts/123456. Again, in oversimplified English, what the address bar is saying is something like, “Hey, Web browser, make another connection to the computer at my-blog.com and open up the file called 123456 inside that computer’s posts folder.”
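
To make that "computer, then folder, then file" reading of an address concrete, here is a small shell sketch that pulls a URL apart using nothing but parameter expansion. The function name split_url is made up for this illustration, and it only understands simple addresses like the ones in this post (no ports, usernames, or query strings).

```shell
# split_url is a hypothetical helper, not part of wget or any standard tool.
# It prints the host ("the computer") and the path ("folder/file") of a URL.
split_url() {
  local url="$1"
  local rest="${url#*://}"   # drop the scheme, e.g. "http://"
  local host="${rest%%/*}"   # everything before the first slash
  local path="/"             # default when the URL has no path at all
  if [ "$rest" != "$host" ]; then
    path="/${rest#*/}"       # everything after the first slash
  fi
  printf 'host: %s\npath: %s\n' "$host" "$path"
}

split_url "http://my-blog.com/posts/123456"
```

The host line is the computer your browser connects to, and the path line is the folder and file it asks that computer for.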

And that’s how Web browsing works, in a nutshell. Since websites are just files inside folders, the same basic rules apply to webpages as the ones that apply to files and folders on your own laptop. To save a file, you give it a name, and put it in a folder. When you move a file from one folder to another, it stops being available at the old location and becomes available at the new location. You can copy a file from one folder as a new file in another folder, and now you have two copies of that file.

In the case of the web, a “file” is just a “page,” so “copying webpages” is the exact same thing as “copying files.”

Now, as many of you already surmised, you could manually go to a website, open the File menu in your Web browser, choose the Save option, give the file a name, put it in a folder, then click the link to the first entry on the web page to load that post, open the File menu in your Web browser, choose the Save option, give the file another name, put it in a folder, and so on and so on until your eyes bled and you went insane from treating yourself in the same dehumanizing way your bosses already treat you at work. Or you could realize that doing the same basic operation many times in quick succession is what computers were invented to do, and you could automate the process of downloading websites like this by using a software program (a tool) designed to do exactly that.

It just so happens that this kind of task is so common that there are dozens of software programs that do exactly this thing.

A sculptor must understand a chisel: Know thy toolbox

I’m not going to go through the many dozens if not hundreds of tools available to automatically download things from the Web. There is almost certainly an “auto-downloader” plugin available for your favorite Web browser. Feel free to find one and give it a try. Instead, I’m going to walk you through how to use simply the best, most efficient, and most powerful of these tools. It’s called wget. It stands for “Web get” and, as the name implies, it “gets stuff from the Web.”

If you’re on Windows, the easiest way to use wget is by using a program called WinWGet, which is actually two programs: it’s the wget program itself, and a point-and-click graphical user interface that gives you a way to use it with your mouse instead of only your keyboard. There’s a good article on Lifehacker about how to use WinWGet to copy an entire website (an act commonly called “mirroring”). If you’re intimidated by a command line, go get WinWGet, because the wget program itself doesn’t have a point-and-click user interface so you’ll want the extra window dressing WinWGet provides.

If you’re not on Windows, or if you just want to learn how to use wget to copy a website directly, then read on. You may also want to read on to learn more about the relevant options you can enable in wget so it works even under the most hostile conditions (like a flaky Wi-Fi connection).

Relevant wget options

While there are dozens upon dozens of wget options to the point that I know of no one who has read the entire wget manual from front to back, there are only three options that really matter for our purposes. These are:

-m or --mirror
This option turns on the options suitable for mirroring. In other words, with this option enabled, wget will look at the URL you gave it, copy the page at that URL, and then keep following links to other pages on the same site (wget does not wander off to other hosts by default), copying each one until there are no more links to follow. How handy! ;)
-k or --convert-links
The manual describes this option better than I could. It reads:

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

So in other words, after the download finishes, all links that originally pointed to “the computer at my-blog.com” will now point to the archived copy of the file wget downloaded for you, so you can click links in your archived copy and they will work just as they did on the original site. Woot!

--retry-connrefused
This option isn’t strictly necessary, but it helps if you’re on a flaky Wi-Fi network or if the server hosting the website you’re trying to download is itself kind of flaky (that is, maybe it goes down every once in a while and you don’t always know when that will be). Normally, when a connection is refused, wget assumes the server isn’t running at all and gives up on that URL. With this option, wget treats a refused connection as a temporary problem and tries again. Basically, this option makes wget totally trust you when you tell it to go download some stuff, even if its first attempt fails. I strongly suggest using this option to get archives of my sites.
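
To show how the three options fit together, here is a sketch that only prints three spellings of the same command, so it is safe to run without downloading anything. The expanded form relies on the wget manual's statement that --mirror is currently equivalent to -r -N -l inf --no-remove-listing. Treat that as true for the version documented there, not as a promise about every release.

```shell
# Nothing here contacts the network; the commands are only printed.
url="http://my-blog.com/"

# Long option names, as used in this post.
long_form="wget --mirror --convert-links --retry-connrefused $url"

# The same options, using the short names where they exist.
short_form="wget -m -k --retry-connrefused $url"

# --mirror spelled out, per the manual: recursive (-r), timestamping (-N),
# unlimited depth (-l inf), and keep FTP directory listings.
expanded_form="wget -r -N -l inf --no-remove-listing -k --retry-connrefused $url"

printf '%s\n' "$long_form" "$short_form" "$expanded_form"
```

All three lines should behave the same way when you actually run one of them; pick whichever you find easiest to read.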

Okay, with that necessary background explained, let’s move on to actually using wget to copy whole websites.

Preparation: Get wget if you don’t already have it

If you don’t already have wget, download and install it. For Mac OS X users, the simplest wget installation option is to use the installer packages made available by the folks at Rudix. For Windows users, again, you probably want WinWGet. Linux users probably already have wget installed. ;)

Step 1: Make a new folder to keep all the stuff you’re about to download

This is easy. Just make a new folder to keep all the pages you’re going to copy. Yup, that’s it. :)

Step 2: Run wget with its mirroring options enabled

Now that we have a place to keep all the stuff we’re about to download, we need to let wget do its work for us. So, first, go to the folder you made. If you’ve made a folder called “Mirror of my-blog.com” on your Desktop, then you can go into that folder by typing cd ~/"Desktop/Mirror of my-blog.com" at a command prompt. (The ~ stays outside the quotation marks because the shell only expands it to your home folder when it is unquoted.)
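
If you would rather make the folder and step into it from the command line in one go, something like the following sketch should do it. The function name make_mirror_dir is invented for this example, and the Desktop path is just the one used above.

```shell
# make_mirror_dir is a hypothetical helper name, not a standard command.
# It creates the folder if needed, moves into it, and prints where you are.
make_mirror_dir() {
  mkdir -p "$1" && cd "$1" && pwd
}

# Example call (uncomment to use). "$HOME" is used instead of "~" because
# the shell does not expand "~" inside double quotes.
# make_mirror_dir "$HOME/Desktop/Mirror of my-blog.com"
```

Quoting the whole path matters here because the folder name contains spaces.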

Next, run wget:

wget --mirror --convert-links --retry-connrefused http://my-blog.com/

Windows users will have to dig around the WinWGet options panes and make sure the “mirror” and “convert-links” checkboxes are enabled, rather than just typing those options out on the command line. Obviously, replace http://my-blog.com/ with whatever website you want to copy. For instance, replace it with http://days.maybemaimed.com/ to download everything I’ve ever posted to my Tumblr blog. You’ll immediately see a lot of output from your terminal that looks like this:

wget --mirror --convert-links --retry-connrefused http://days.maybemaimed.com/

--2015-02-27 15:08:06--  http://days.maybemaimed.com/
Resolving days.maybemaimed.com (days.maybemaimed.com)... 66.6.42.22, 66.6.43.22
Connecting to days.maybemaimed.com (days.maybemaimed.com)|66.6.42.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘days.maybemaimed.com/index.html’

    [ <=>                                                       ] 188,514     --.-K/s   in 0.1s    

Last-modified header missing -- time-stamps turned off.
2015-02-27 15:08:08 (1.47 MB/s) - ‘days.maybemaimed.com/index.html’ saved [188514]

Now just sit back, relax, let wget work for as long as it needs to (which could take hours, depending on the quality of your Internet connection). Meanwhile, rejoice in the knowledge that you never need to treat yourself like a piece of dehumanized machinery ever again because, y’know, we actually have machines for that.

Even before wget finishes its work, though, you’ll see files start appearing inside the folder you made. You can now drag-and-drop one of those files into your Web browser window to open that file. It will look exactly like the blog web page from which it was downloaded. Voila! Archive successfully made!
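
If you are curious how far along the download is, one rough way to check is to count the files that have landed so far. This is only a sketch: count_mirrored_files is a made-up name, and the count says nothing about how many files are still to come.

```shell
# count_mirrored_files is a hypothetical helper name.
# It prints how many regular files exist under the given mirror folder.
count_mirrored_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Example call, run from inside the folder you made in Step 1:
# count_mirrored_files days.maybemaimed.com
```

Run it a few times while wget is working and the number should keep climbing.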

Special secret bonuses

The above easily works on any publicly accessible website. These are websites that you don’t need to log into to see. But you can also do the same thing on websites that do require you to log into them, though I’ll leave that as an exercise for the reader. All you have to do is learn a few different wget options, which are all explained in the wget manual. (Hint: The option you want to read up on is the --load-cookies option.)
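
If you want a head start on that exercise, here is one possible shape for it. This sketch assumes you have already logged in with your browser and exported your cookies to a cookies.txt file in the classic Netscape format, which is the format --load-cookies reads. The function name, the file name, and the DRY_RUN switch are all inventions for this example, and some sites will need more than cookies to let you in.

```shell
# mirror_with_cookies is a hypothetical helper name.
# Usage: mirror_with_cookies cookies.txt http://my-blog.com/
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_with_cookies() {
  local cookie_file="$1"
  local url="$2"
  if [ ! -f "$cookie_file" ]; then
    echo "cookie file not found: $cookie_file" >&2
    return 1
  fi
  # Reuse the function's own argument list to hold the command.
  set -- wget --load-cookies "$cookie_file" \
    --mirror --convert-links --retry-connrefused "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Try it with DRY_RUN=1 first so you can read the command before letting it loose on a real site.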

What I do want to explain, however, is that the above procedure won’t currently work on some of my other blogs because of additional techno-trickery I’m doing to keep the Muggles out, as I mentioned at the start of this post. However, I’ve already created an archive copy of my other (non-Tumblr) sites, so you don’t have to.1 Still, though, if you can figure out which bricks to tap, you can still create your own archive of my proverbial Diagon Alley.

Anyway, I’m making that other archive available on BitTorrent. Here’s the torrent metafile for an archive of maybemaimed.com. If you don’t already know how to use BitTorrent, this might be a good time to read through my BitTorrent howto guide.

Finally, if data archival and preservation is something that really spins your propeller and you don’t already know about it, consider browsing on over to The Internet Archive at Archive.org. If you live in San Francisco, they offer free lunches to the public every Friday (which are FUCKING CATERED AND DELICIOUS, I’VE BEEN), and they always have need of volunteers.

  1. If you’re just curious, the archive contains every conference presentation I’ve ever given, including video recordings, presentation slides, and so on, as well as audio files of some podcasts and interviews I’ve given, transcripts of every one of these, all pictures uploaded to my site, etc., and weighs in at approximately 1 gigabyte, uncompressed.

The mystery of the disappearing horizontal scrollbar

A classic exchange from the WordPress Support Forum for one of my plugins:

Them:

Hi,

When I first installed this plugin, there was an automatic horizontal scrollbar so that users could move to see all of the columns. However, it has now disappeared which means one of the columns is not fully readable.

Can you help?

Thanks.

Me:

Right above the button you clicked to post this question there is a line of text that reads:

Did you include a link to your site, so that others can see the problem?

Given that you didn’t notice this, I am going to suggest that you slow down and think about what was different on your site from when you installed the plugin (and experienced it working as expected) and now (when it’s not). If you still need help after that, I suggest you first think more about the answer to the question quoted above before you post again.

Them:

I apologise, it was an oversight on my behalf, as you have pointed out. Put it down to Friday ‘end of the week’ fuzzy head, if you like.

The pages where we are currently using the plugin are [here and here].

To clarify, if you hover around the rows and columns, it appears you can swipe and move it around, but the visible arrows and scrollbar is not visible. We have a lot of not very IT-literate people who use our website for support so it would be handy to make it like a visible scrollbar to click on again, if possible.

Thanks for your patience.

Me:

A clarifying question: you want a scrollbar to appear but one does not exist even when the browser window is too narrow to fit the whole table?

That is to say, I am confused by the statement “the visible arrows and scrollbar is not visible.” :/

Them:

Yes, we want a scrollbar and the arrows that go either end of said scrollbar to display because the browser window is too narrow to fit the whole table.

We feel that people would prefer to have the arrows on the scrollbar visible to encourage them to click them so that they can see the columns that go off of the screen.

Me:

So, when I go to your pages they both show scroll bars just as you say you want them. :\ I’m afraid whatever you’re seeing is specific to your combination of computer and browser.

It’s very likely that users who browse to your site are seeing the scroll bars show automatically. At least, that’s what happens when I load the site.

The beauty of the Web is that users are able to define their experience so it suits them best. Users whose browsers and operating systems are set to show scroll bars are showing scroll bars. This is good news, because it means you do not need to worry about your site malfunctioning: it is not malfunctioning and, as you say, nothing about it has changed.

Turn your Android phone into a full fledged programming environment

These days, mobile phones are basically computers. And not just any computer. If you have a smartphone, then it's the same kind of computer as a regular ol' laptop. Sure, the two look different, but once you get "under the hood" they look and feel remarkably similar.

My mission, which I chose to accept, was to see if I could turn my Android phone into a fully fledged web development console. Lo and behold, I could. And it's not even that hard, but I did have to do some digging.

That's because searching the 'net for phrases like "web development on Android" mostly returns information on how to code and debug websites for mobile browsers, rather than how to use mobile phones as your environment for developing websites. Once I figured out which tools were suited for the task (and my personal tastes), though, everything else fell into place.

Read the full post.

Easy template injection in JavaScript for userscript authors, plugin devs, and other people who want to fuck with Web page content

The Predator Alert Tool for Twitter is coming along nicely, but it's frustratingly slow going. It's extra frustrating for me because ever since telling corporate America and its project managers to go kill themselves, I've grown accustomed to an utterly absurd speed of project development. I know I've only been writing and rewriting code for just under two weeks (I think—I honestly don't even know or care what day it is), but still.

I think it also feels extra slow because I'm learning a lot of new stuff along the way. That in and of itself is great, but I'm notoriously impatient. Which I totally consider a virtue because fuck waiting. Still, all this relearning and slow going has given me the opportunity to refine a few techniques I used in previous Predator Alert Tool scripts.

Here's one technique I think is especially nifty: template injection.

Read more

On Being a Social Cyborg: How iCalendar Helps Me Fight Loneliness

Here’s a topic I’ve been meaning to write about ever since I was deeply depressed last Fall and Winter. Back then, I was incredibly lonely, and despite my best efforts I simply found it damn near impossible to do anything to improve my situation. That’s because my “best efforts” consistently led me to dead-end resources that sounded good but that had no practical or immediately useful information; resources like WikiHow.com’s “How to Deal With Loneliness” article.

In their article, WikiHow contributors write:

Get involved in anything where you will meet people. If you are very shy, find a group for social anxiety, even if it has to be online (obviously it’s better if it’s not). Look on places like Craig’s List, or local news websites for your town for activities in your area. Volunteering can help. But don’t attend functions with the idea of making friends or meeting people. Being too demanding is a sign of loneliness. Try to go with no expectations whatsoever, and to enjoy yourself regardless of what happens. Look for activities that interest you, that also involve groups of people, like intramural sports, book clubs, church groups, political campaigns, concerts, art exhibitions, etc.

While it all “makes sense,” the WikiHow article reads like an elaborate horoscope. It’s incredibly annoying because it contains no meaningful, discrete, actionable items. Where, exactly, can I find “activities in my area”? And once I find them, how do I make sure I know about them when they are happening? And as if that wasn’t hard enough, how do I make the process workable under the extreme energy constraints that being depressed and lonely put me under? (See also: without using up too many “spoons”.)

Ironically, when I finally concocted a solution to this problem, I no longer had the time to write the blog post about solving the problem because I was so busy doing things and being social. I proceeded to pull myself out of my depression, have a pretty good (if still difficult at times) Spring and Summer, and even Fall in 2011. But now that the days are getting shorter and I’m increasingly feeling like my moods are walking on a tightrope of “happy” above a pit of bleakness, I figure it’s about time to document my process. That, and it seems people I know are running into the same problem, so hopefully sharing my own solution can really make a positive impact on others’ lives.

Creating a Cyborg’s Social Calendar

The basic problem was two-fold. First, I needed an easy way to discover local goings-on. Second, I needed a way to remember to actually attend events that I was interested in.

It turns out this is far more difficult to accomplish than one may at first believe, since the set of events that I both want to attend and have the capability (energy, time, money, motivation, physical accessibility, etc.) to attend is actually relatively limited. Moreover, I also need to align the set of events that match both of those criteria with the knowledge that said event is occurring when it is occurring. It’s a bit like playing temporal Tetris.

In a nutshell, the solution I implemented was similarly two-fold. First, I cast an incredibly wide but low-cost sensor net, integrated directly into the process I already used for keeping track of my daily appointments. (See also the “no extra time” concept and its wide applicability). Second, I classified the “activities in my area” into two distinct groups: “engagements” (stuff I’ve said “yes” or “maybe” to) and “opportunities” (stuff I haven’t yet said “no” to).

Here’s what my calendar looks like after all the pieces of the system are in place:

As you can see, I have an enormous selection of activities I could participate in at any given time. Better yet, they all show up on my calendar without my ever needing to repeatedly go “look[ing] on places like Craig’s List” to find them, the events on my calendar update themselves, and I can show or hide sets of events on a whim.

The prerequisite tool for doing this is the iCalendar feed, which, in the words of Stanford University, is a popular calendar data exchange format which allows you to subscribe to a calendar and receive updates as calendar data changes. Each of those calendars under the “Subscriptions” heading in the screenshot of my iCal is actually an iCalendar feed from a remote website. iCalendar feeds are to calendars as RSS feeds are to blogs.
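
If you have never looked inside one, an iCalendar feed is just plain text. The sketch below prints a tiny hand-written feed and then pulls the start time and title out of each event. The two events, the example.invalid identifiers, and the function names are all made up for illustration. Real feeds contain many more fields and can wrap long lines, which this sketch does not handle.

```shell
# sample_feed prints a minimal, hand-written iCalendar feed (hypothetical data).
sample_feed() {
  cat <<'EOF'
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//example//sample feed//EN
BEGIN:VEVENT
UID:1@example.invalid
DTSTART:20111105T190000Z
SUMMARY:Open mic night
END:VEVENT
BEGIN:VEVENT
UID:2@example.invalid
DTSTART:20111112T180000Z
SUMMARY:Book club: first meeting
END:VEVENT
END:VCALENDAR
EOF
}

# list_events reads a feed on standard input and prints one line per event:
# the start time, two spaces, then the event title.
list_events() {
  tr -d '\r' | awk -F: '
    /^DTSTART/    { start = $2 }
    /^SUMMARY/    { sub(/^SUMMARY[^:]*:/, ""); summary = $0 }
    /^END:VEVENT/ { print start "  " summary; start = ""; summary = "" }
  '
}

sample_feed | list_events
```

A calendar program does the same kind of reading every time it refreshes a subscription, just far more thoroughly.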

The first thing I did was add the event subscription feed from my Facebook. Do this:

  1. Log into your Facebook account and go to the “Events” page.
  2. Scroll to the very bottom of the page and click on the small “Export” link. This will reveal a personalized web address (URL) listing all upcoming Facebook events you’ve been invited to or have RSVP’ed either “Yes” or “Maybe” to, in iCalendar feed (.ics) format. Copy that URL.
  3. Back in iCal (or your calendaring application of choice), choose “Subscribe…” from the menu and paste in the URL you got from Facebook.
  4. Give this calendar subscription a meaningful name. I called it “Facebook Events” (see above screenshot).
  5. Set the “Refresh” interval to something that makes sense; I set it to once “every 15 minutes,” since the Facebook feed is one I check often because it changes so frequently. (For feeds from calendars that I check or that update less often, such as those of community groups, or calendars listing events that are far from home, I set the refresh rate much, much slower, such as once “every week.”)

Okay! Now, whenever a friend invites you to an event on Facebook, your calendar will be updated to reflect that event at the appropriate date and time. If you RSVP “No” to the event, it will disappear from your calendar when iCal next checks your Facebook iCalendar feed.

Repeat the same steps for any other event-management website that you use and that offers iCalendar feeds. Some services I use, such as Plancast.com and Meetup.com, actually offer two distinct iCalendar feeds, one for all of the events visible to you on the service, and one for events that you have RSVP’ed “Yes” to. Subscribe to both; in the screenshot of my iCal window, above, you’ll note the existence of a “‘meitar’ on Plancast” calendar as well as a “Plancast Subscriptions” calendar, and similarly a “My ‘Yes/Maybe’ Meetups” calendar as well as a “My Meetups” calendar.

Now that you’ve got a bunch of subscriptions, it behooves you to organize them in a way that makes sense to you. How you can do this will depend a little bit on the tools you have at your disposal. I found Apple iCal the best choice because of its Calendar Group feature, while I found Google Calendar an incredibly frustrating tool to use.

In iCal, I first created two calendar groups. The first one was called “Social Engagements,” into which I placed all the iCalendar feeds that showed me events to which I’ve RSVP’ed “Yes” on the remote site. This included the Facebook, “‘meitar’ on Plancast”, and “My ‘Yes/Maybe’ Meetups” feeds. The second group was called “Social Opportunities,” into which I placed all the other calendars.

Every time I learned about a new local venue, such as a nightclub, or a café, or a bookstore that had an open mic, I would scour its website to see if it offered an iCalendar feed. If it did, or if it used a tool that did, such as embedding a Google Calendar on their website,1 I’d add their feed to my “Social Opportunities” calendar group, too. I’d do the same every time I learned of a new event aggregating website, such as the IndyBay.org calendar or the Calagator Portland Tech Community calendar, which both offer feeds.

In very short order, I became one of the go-to people to ask about what was happening ’round town—including some towns I didn’t even live in!

However, as I travelled across the country speaking at conferences, I realized that my “Social Opportunities” group was getting cluttered with events that I could not actually attend because I was literally thousands of miles away from them. To solve that problem, I created distinct “Social Opportunities” calendar groups based on geographic region, and moved the individual subscriptions to the group with which they were geographically associated; the Occupy DC calendar feed is in the “Social Opportunities – DC” calendar group, and so on. I also created an “A-geographic” group to house feeds that listed events from all over the place.2

Some event-management services let you filter by geography, making this even easier. For instance, Yahoo!’s “Upcoming” event listing website shows you events by “place,” and you can subscribe to an iCalendar feed of just those events. For instance, here are the Upcoming events in Seattle, and here is the same information in iCalendar feed format. I added the feed of each Upcoming Place to which I regularly travel to its appropriate regional calendar group.

The benefits of this setup are obvious:

  • Visually overlay social opportunities on top of social engagements to ensure few conflicts, and help make the most informed choice about which events I want to go to when there are conflicts, to mitigate my social opportunity cost.
  • Toggle calendars on/off to find nearby activities. Ordinarily, I simply leave all the “opportunities” calendars deselected, so I’m just looking at my personal calendars and the “Engagements” group, since this view shows me “stuff I have to do today.” When I’m bored or I’m looking for new things to do in the upcoming week, however, I simply turn on the “opportunities” calendars. Voila! In 1 click, I’m browsing a wealth of stuff to do!3
  • Quickly orient oneself within the social space of a new city. If I’m taking a trip to Washington DC for a few days, all I have to do is deselect/uncheck the “Social Opportunities – SF/Bay Area” calendar group to hide all of my calendar subscriptions in that group, then select/check the “Social Opportunities – DC” calendar group and, voila, my calendar view has instantly shifted to showing me events that I can attend in Washington, DC.
  • Make RSVP’s meaningful: if I RSVP “Yes” to an event on Meetup, the event is automatically removed from my “Social Opportunities – A-geographic” calendar group and added to my “Social Engagements” calendar group.
  • Easily move event information from a calendar feed to a personal calendar using copy-and-paste without ever leaving the calendaring tool of your choice.

Of course, none of this matters with regards to feeling lonely if I don’t also show up at events in physical space. Admittedly, actually mustering the physical and social energy to get up and go is by far the hardest part of this whole process. Typing on a keyboard is all fine and well (rest assured I do more than enough of it!), but there is no substitute for actually being around other human beings face-to-face. Physically vibrating the air using one’s mouth and having those vibrations move another’s ear drum (or physically moving one’s hands and letting the photons bounce off those movements and onto the retina of another’s eyes, in the case of sign language) is a vital part of the experience of being social.

This system isn’t perfect, but the imperfections are mostly due to the way sites like Facebook handle RSVP information. For my purposes, though, this workflow gets me well over 80% of the way towards my goal, and since I’m actually a human (not a machine), I can deal with a little data pollution here and there. There’s also plenty more I could write about with regards to “being a social cyborg,” such as how I use my calendar in conjunction with my contact management application (my digital rolodex) to maintain “loose” or “weak” interpersonal ties with over 1,000 people spread across the world—again, using “no extra time.” But I’ll save that for another post.

For now, hopefully this gave you a better understanding of why my most frequent response to being informed of a party is something along the lines of, “Can you send me a link (to Facebook/Meetup/Google Calendar)?” and also why I’m so, so, so critical of important websites like FetLife that seem to prioritize everything but user security and interoperability.

  1. Every public Google Calendar also publishes its information in an iCalendar feed. For example, rather than view the Occupy SF calendar on their website, just subscribe to the iCalendar feed provided by Google. Also, while you can create an aggregate view of multiple Google Calendars to embed on a Web page, it seems to me like this isn’t a feature offered for iCalendar feeds, so if you come across such a calendar, you’ll likely need to add the individual calendars’ feeds one by one.
  2. Currently that’s just Meetup and Plancast, for me, since I’ve joined Meetup groups all over the country and I’ve subscribed to people on Plancast who live in dozens of cities.
  3. Frustratingly, although Facebook also offers you a page listing events that you were not invited to but that your friends were, there seems to be no iCalendar feed of that list, forcing me to periodically check that page for events that would be “Social Opportunities” if I knew of them. Thankfully, to add them to my own calendar, I just RSVP “Yes” or (more likely) “Maybe.”

Using Calendars from the Command Line

If you’re anything like me, you always have a terminal window open. One of the reasons I do this, of course, is because it’s fast. If I want to know anything at all about my computer, all I need do is type the question. The answer, because it’s always text-based, comes back immediately. I don’t have to wait for a window to open or for a pane to scroll. Everything comes at me from a single visual direction, the bottom of my terminal window.

However, there are some occasions when a text-based response to a complicated question isn’t very helpful because it requires so much extra work to understand. For me, the most common example of this sort of issue has always been in looking at time-based information, and more specifically, calendars. Whenever I’m on my machine, I almost always need to look at a calendar.

In the past, I used to go all the way over to iCal. Sure, I can do this using keyboard shortcuts only, but sometimes all I want is a quick answer to “what date is this upcoming Friday?” In situations like that, I’ve lately begun using the cal command, and my oh my, what a timesaver.

cal is kind of like man for dates. Of course, you can get more info by saying man cal to your prompt. The cal program, installed by default on almost all UNIX-based systems (including Mac OS X), has a ton of useful options. However, most of the time, I don’t need more than a few.

For instance, let’s say I just want a calendar of the current month. I can get a compact, simple month view instead of going to iCal by saying just cal at the command line:

Perseus:~ maymay$ cal
     April 2008
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30

Other options let me ask other questions of cal. Easy, simple, fast. I like it.
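
As a sketch of what those other questions can look like, here are two forms I would expect to work on most systems that have cal. The check at the top is only there because a few minimal systems ship without it.

```shell
if command -v cal >/dev/null 2>&1; then
  cal 4 2008   # one specific month: month number first, then the year
  cal 2008     # every month of a given year
else
  echo "cal is not installed on this system" >&2
fi
```

To answer “what date is this upcoming Friday?”, just run plain cal and read down the Fr column.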

Sharing your Windows XP Virtual Machine’s Internet connection with your Mac OS X host operating system using VMware Fusion

In some situations, like the odd one I now find myself in, the only way to get Internet connectivity is to use a solution that requires a fair bit of maneuvering. In my situation, I have temporarily obtained a Vodafone 3G mobile card. Unfortunately, the Vodafone Mobile Connect software for Mac OS X as of this writing is obscenely poor. Of course, Vodafone’s software for Windows works without a hitch.

The only way I could get my Vodafone 3G card to work was to fire up a Windows XP guest inside of my MacBook Pro, using VMware Fusion. Connecting to the Internet with the 3G card using the Windows guest was smooth sailing, but that only provided the Internet connection to the Windows virtual machine. I wanted my Mac to be directly connected.

The solution is obvious, but a few gotchas really bit me hard. To get the Windows guest to share its Internet connection from the 3G card to my Mac, I would need to bridge VMware’s virtual ethernet adapter from the Windows guest to the Mac OS X host. Once bridged, both the Windows guest and the Mac OS X host would logically be on the same ethernet network segment. At this point, I can enable Windows XP’s built-in Internet Connection Sharing (stupidly dubbed “ICS” because everything needs a TLA) on the 3G connection so that Windows NATs it through to the bridged virtual ethernet card. Finally, I can connect to Vodafone’s 3G network, and all should be well.

Here are the gotchas.

First, in order for VMware to actually initiate the network bridge when it starts up, it must detect that a physical link is active on your Mac. In other words, Mac OS X’s Network System Preferences pane must show you a yellow dot next to at least one physical networking device (probably either your “Built-in Ethernet” or your “AirPort” ports). VMware Fusion will give you no errors or warnings that a bridge is unavailable until you try to connect your virtual machine’s network while set to bridge, in which case VMware Fusion will complain with an error that reads: “The device on /dev/vmnet0 is not running.”

Obviously, if you have no other devices to connect to, you need to fake one. The easiest way to do this is to set up a Computer-to-Computer network using AirPort. Just go to your AirPort menu bar item and select “Create Network…” and create the network (preferably encrypted). If you check System Preferences now, you should see that AirPort has a yellow dot next to it and reads as having a “Self-Assigned IP Address.” Now that you have a physical link on your AirPort card, you should be able to start the VMware Fusion virtual machine with bridged networking mode without incident.
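
If you’d rather confirm the physical link from Terminal than from System Preferences, something like the following should do it. This is a minimal sketch: link_is_active is a helper name I made up for illustration, and en1 is assumed to be the AirPort interface (it is on my machine, but yours may differ). It relies on ifconfig printing a “status: active” line for an interface that has a link.

```shell
# Hypothetical helper: succeeds if ifconfig-style text on stdin
# reports an active link ("status: active").
link_is_active() {
    grep -q 'status: active'
}

# Example usage (en1 is an assumption; substitute your own interface):
#   ifconfig en1 | link_is_active && echo "en1 has a physical link"
```

An interface without a link reports “status: inactive” instead, which the helper treats as a failure.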

However, if you do encounter the above error anyway, you need to restart the VMware network bridge. You can do this either by shutting down VMware completely (turn off your guest operating systems, and quit the VMware Fusion application), or you can run the following commands as an administrator in Terminal, which will stop any bridge currently running (or do nothing if no bridge is running) and then restart it, providing the output as shown:

sudo killall vmnet-bridge
sudo "/Library/Application Support/VMware Fusion/vmnet-bridge" -D vmnet0 ''
Entering event loop...
Examining network configuration...
Turning on bridge with host network interface en1...

Obviously, you may be asked for your password as you perform this procedure. Note that the trailing two apostrophes are single quotes with no space. This is (almost) how the VMware Fusion boot.sh script starts and stops the network bridge. Specifically, you’re telling the vmnet-bridge application to run in Debug mode and to bridge vmnet0 to whatever is the current primary networking interface. In the example output shown above, this is en1, or my AirPort card connected to the computer-to-computer network I created in the previous step.

Hopefully you won’t have to mess with the vmnet-bridge application, as this should happen on its own when you start up VMware Fusion if you have any physical link on a network device. Nevertheless, I’ve found this is sometimes unreliable, so just in case it doesn’t, now you know how to bring up the bridge on your own. (Tip: once it’s up, you can CTRL-Z to pause it, resume it in the background with bg %1, and then quit Terminal if you like. The bridge will still be up.)

Now that the AirPort card has a physical link, and the VMware network bridge is running, the next step is to configure your virtual machine to use bridged networking. Just go to Virtual Machine → Network → Bridged as normal. Make sure Connected is also selected. Now start up your Windows guest.

Once Windows boots, go to the Network Connections window by selecting Start → Connect To → Show all connections. At this point, your “Local Area Connection” in Windows probably has a warning sign on it and reads as having “Little or no connectivity.” It probably has a self-assigned IP address just like your AirPort card. That’s fine: as long as it’s not “unplugged,” we’re in good shape.

Next, select whatever other connection you want to share the Internet from (in my case, the 3G modem, but it could also just be any other connection in the window), right-click it and select Properties. Go to the Advanced tab and make sure “Allow other network users to connect through this computer’s Internet connection” is checked. The other boxes won’t matter.

What this does is turn on Windows’ own NAT service, which configures the one connection (the one you’re sharing) as the WAN side of (yet another) virtual networking device and the Local Area Connection (the one we’ve bridged to our AirPort or Built-in Ethernet card on our Mac) as the LAN side. Hit OK as many times as is necessary to close the network connection properties windows and wait a few moments. Sometimes this can take up to 30 seconds or so, but eventually you’ll see Windows announce that “Local Area Connection is now connected.” If you inspect it, you’ll see that the IP address configuration has been automatically assigned as a “Manual Configuration” with the address of 192.168.0.1, a subnet mask of 255.255.255.0, and no default gateway.

As a last step, now we can actually connect to the Internet using whatever service we have. In my case, this is when I hit the “connect” button on my Vodafone Mobile Connect software. Once the connection is established and the Windows XP virtual machine can see the Internet, it takes up to another minute or two (or three) for the Mac’s connection to get an IP address from the Windows guest, but it invariably works.
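
Rather than staring at System Preferences for those few minutes, you could let Terminal do the waiting. The following is only a sketch under a few assumptions: the function names are made up for illustration, en1 is assumed to be the bridged interface, and the address check assumes the Windows guest hands out leases on the same 192.168.0.x subnet as its own 192.168.0.1 address. It uses Mac OS X’s ipconfig getifaddr to read the interface’s current IPv4 address.

```shell
# Hypothetical helper: succeeds if the given IPv4 address is on the
# 192.168.0.x subnet used by the Windows guest (192.168.0.1/255.255.255.0).
is_ics_address() {
    case "$1" in
        192.168.0.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Hypothetical helper: poll an interface (default en1, an assumption) every
# 5 seconds for up to about 3 minutes, until it holds a 192.168.0.x address.
wait_for_ics_lease() {
    iface="${1:-en1}"
    tries=0
    while [ "$tries" -lt 36 ]; do
        # ipconfig getifaddr prints the interface's IPv4 address, if it has one.
        addr="$(ipconfig getifaddr "$iface" 2>/dev/null)"
        if is_ics_address "$addr"; then
            echo "$iface has address $addr"
            return 0
        fi
        tries=$((tries + 1))
        sleep 5
    done
    echo "$iface still has no 192.168.0.x address (last seen: ${addr:-none})" >&2
    return 1
}
```

With those two functions pasted into a shell, saying wait_for_ics_lease en1 returns as soon as the Mac has picked up its address, or gives up after about three minutes. A self-assigned address (the 169.254.x.x kind) doesn’t count, so the loop keeps waiting through it.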

If the Windows side of things is giving you any trouble, the most reliable solution I’ve found is to simply disable, then re-enable whatever connection isn’t behaving as desired. If after all of this your Mac still doesn’t get an IP address from the Windows XP guest, disconnect and then re-connect the virtual machine’s ethernet card (by toggling the “Connected” menu item in the Virtual Machine → Network menu). Also, of course, be doubly sure that your AirPort is set to “Use DHCP.”

Phew! So simple…and yet so much harder than it had to be. I found the following two PDF documents very helpful in understanding all of this. You might too:

  1. VMware Fusion Network Settings — a super-brief, but excellent introduction to VMware’s network setting internals. It’s also a PDF download attached to the linked forum thread.
  2. Share Windows XP Guest Internet Connection with OS X Host HOWTO — This basically describes the same thing this post does, but it does so using absolute step-by-step instructions. It’s also a PDF download attached to the linked forum thread.

Steven Pinker’s ‘The Stuff of Thought’

This video, which is one of the recent TED Talk videos, is of Steven Pinker’s talk called The Stuff of Thought. This is simply brilliant. So brilliant, in fact, that those who know me well are about to be utterly astounded by what I am going to say:

I now understand the value of indirect communication. And it is immense.

I also understand why I never saw it before: the benefits are reaped solely through language’s social applications, not its analytical ones. See for yourself by watching the video.

An incredible interview with this Harvard professor is available on Google Video.

We should re-instate that old USENET warning

From the everything-you-say-can-and-will-be-used-against-you department:

I’ve been doing this for years, and my solution is pretty simple: no regrets.

As an aside, these days when you punch “privacy concern” into Googlepedia, you get the Wikipedia entry for Facebook. I was kind of expecting the entry for “US Government,” but whatever.