Importing HTML Files

I created the HTML Import plugin because the most common scenario, both in my day job and my freelance work, is moving a site from Dreamweaver templates into WordPress. I got very tired of copying and pasting!

The plugin works by reading in HTML as XML and copying the specified tags' contents into various WordPress fields. It therefore works best on well-formed HTML. Your files don't necessarily have to validate according to the W3C specification, but they should at least contain tags that are properly nested. They should also reside on the same server as your WordPress installation.
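If you'd like to find out which files are likely to cause trouble before you import them, you can run them through a strict XML parser yourself. The following is a minimal sketch in Python, not the plugin's own code; the function name `is_well_formed` and the sample markup are made up for illustration. A strict parser will also reject habits such as an unclosed `<br>`, so treat a failure as a hint about where to look rather than proof that the import will fail.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (every tag closed and properly nested)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample pages: the second one closes </p> before </em>.
good = "<html><body><p>Hello <em>world</em></p></body></html>"
bad = "<html><body><p>Hello <em>world</p></em></body></html>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```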

To begin, download the plugin from the repository and activate it. You'll find the import options page under the Settings menu. The first thing you'll be asked to fill in is the path to the directory of files you want to import. Find the absolute path to this directory, not a site- or file-relative one. On a Windows machine, that path will begin with a drive letter (e.g. C:\sites\import). On a UNIX-based server (including Macs), the path will begin with a slash (e.g. /users/username/home/public_html or /Library/WebServer/mysite). Enter the path into the first field on the importer's options page, as shown in Figure 5-14.
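If you aren't sure whether the path you have is absolute, the two rules above (a drive letter on Windows, a leading slash on UNIX) are easy to check. Here is a small Python sketch; `looks_absolute` is a hypothetical helper name, and it only inspects the text of the path without checking that the directory exists.

```python
from pathlib import PurePosixPath, PureWindowsPath


def looks_absolute(path: str) -> bool:
    """True if the path is absolute by Windows rules (drive letter) or UNIX rules (leading slash)."""
    return PureWindowsPath(path).is_absolute() or PurePosixPath(path).is_absolute()


print(looks_absolute(r"C:\sites\import"))            # True
print(looks_absolute("/Library/WebServer/mysite"))   # True
print(looks_absolute("public_html/import"))          # False
```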

Then, identify the types of files you want to import and list the file extensions, separated by commas. If there are any directories the importer should skip, like image or script directories, specify those as well.
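To preview which files those settings would pick up, you can walk the directory yourself. This Python sketch assumes the same inputs the options page asks for (a comma-separated list of extensions and a comma-separated list of directories to skip); the names `find_files` and `split_commas` are my own, and the plugin's matching rules may differ in detail.

```python
import os


def split_commas(value):
    """Turn 'html, htm' into ['html', 'htm'], ignoring empty items."""
    return [item.strip().lower() for item in value.split(",") if item.strip()]


def find_files(root, extensions, skip_dirs):
    """List files under root with a wanted extension, ignoring skipped directories."""
    wanted = {ext.lstrip(".") for ext in split_commas(extensions)}
    skipped = set(split_commas(skip_dirs))
    matches = []
    for current, dirs, files in os.walk(root):
        # Pruning dirs in place stops os.walk from descending into skipped directories.
        dirs[:] = [d for d in dirs if d.lower() not in skipped]
        for name in files:
            extension = os.path.splitext(name)[1].lower().lstrip(".")
            if extension in wanted:
                matches.append(os.path.join(current, name))
    return sorted(matches)
```

Calling `find_files("/Library/WebServer/mysite", "html, htm", "images, scripts")` would return the .html and .htm files outside the images and scripts directories.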

Figure 5-14. HTML Import: specifying directories, file types, and the content area

To select the part of the file that contains the main content (what will become the post or page content in WordPress), you can specify an HTML tag or a Dreamweaver template region. If your pages are based on Dreamweaver templates, select the Dreamweaver option and enter the name of the content area (e.g. "Main Content") into the template region field. If you're using a tag without attributes, or where the attributes don't matter, simply enter the tag (without brackets) in the tag field, and leave the attribute and value fields blank. If your tag does have an attribute that makes it unique, enter the attribute name (like class or id) in the attribute field and the value in the value field. For example, if your content is contained in the <td id="main-content"> tag, your import setting would look like Figure 5-14.
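The two ways of selecting content can be sketched in a few lines of Python. This is an illustration of the idea, not the plugin's implementation: `content_by_tag` and `content_by_region` are hypothetical names, the first assumes well-formed markup, and the second assumes the template region is marked with Dreamweaver's `InstanceBeginEditable` and `InstanceEndEditable` comments.

```python
import re
import xml.etree.ElementTree as ET


def inner_markup(element):
    """Return everything between an element's opening and closing tags."""
    children = "".join(ET.tostring(child, encoding="unicode") for child in element)
    return (element.text or "") + children


def content_by_tag(markup, tag, attribute=None, value=None):
    """Find the first matching tag; attribute and value are optional, as on the options page."""
    root = ET.fromstring(markup)
    for element in root.iter(tag):
        if attribute is None or element.get(attribute) == value:
            return inner_markup(element)
    return None


def content_by_region(markup, region_name):
    """Find a Dreamweaver editable region by name (assumed comment syntax)."""
    pattern = (
        r'<!--\s*InstanceBeginEditable\s+name="' + re.escape(region_name) + r'"\s*-->'
        r'(.*?)<!--\s*InstanceEndEditable\s*-->'
    )
    match = re.search(pattern, markup, re.DOTALL)
    return match.group(1) if match else None


page = (
    '<html><body><table><tr><td id="nav">Menu</td>'
    '<td id="main-content"><p>Hello</p></td></tr></table></body></html>'
)
print(content_by_tag(page, "td", "id", "main-content"))  # <p>Hello</p>
```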

You can also have the importer clean up any unneeded HTML. For example, if your files came from Microsoft Word or FrontPage, they're probably littered with extraneous div tags, smart tags, and class attributes. To clean them up, choose Yes under the Clean up bad (Word, Frontpage) HTML heading, then specify the HTML tags and attributes that should be allowed. Any tags and attributes not in these lists will be removed. A list of suggested tags and attributes is provided, along with an extra set that you should include if your content contains data tables.
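An allow list like this is simple to picture in code. The Python sketch below is my own illustration, not the plugin's cleanup routine: `AllowListCleaner` and `clean_html` are made-up names, and the tag and attribute lists in the example are arbitrary rather than the plugin's suggested set. Note one assumption in particular: it drops a disallowed tag but keeps the text inside it, and the plugin may behave differently.

```python
from html import escape
from html.parser import HTMLParser


class AllowListCleaner(HTMLParser):
    """Re-emit markup, keeping only the allowed tags and attributes."""

    def __init__(self, allowed_tags, allowed_attrs):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        self.allowed_attrs = set(allowed_attrs)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_tags:
            return  # drop the tag itself; its text still reaches handle_data
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in self.allowed_attrs
        )
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so escape it again on the way out.
        self.parts.append(escape(data, quote=False))


def clean_html(markup, allowed_tags, allowed_attrs):
    cleaner = AllowListCleaner(allowed_tags, allowed_attrs)
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.parts)


messy = '<div class="MsoNormal"><p class="x" id="a">Hi <span>there</span></p></div>'
print(clean_html(messy, ["p"], ["id"]))  # <p id="a">Hi there</p>
```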

Figure 5-15. HTML Import: choosing the title and metadata

You can select the title tag the same way you chose your content area, as shown in Figure 5-15. You can have the importer remove common words or phrases from your titles. Remember that your site title will be added automatically to your WordPress posts and pages (depending on your theme; see Chapter 7). If it's part of your HTML files' <title> tags, for example, you'll need to remove it now to avoid duplication on your WordPress site.
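Removing a repeated phrase from a title amounts to a search and replace followed by a little whitespace cleanup. A minimal Python sketch, assuming a hypothetical site called "Example College" and a helper name, `clean_title`, of my own choosing:

```python
def clean_title(title, phrases):
    """Remove each unwanted phrase, then collapse any leftover whitespace."""
    for phrase in phrases:
        title = title.replace(phrase, "")
    return " ".join(title.split())


print(clean_title("About Us | Example College", ["| Example College"]))  # About Us
```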

The metadata section (also shown in Figure 5-15) is where you can specify all the little details: whether you want to import the files as posts or pages, which user should be listed as the author, and what the categories and tags (for posts) or page parent (for pages) should be. You can also choose whether to use the meta description tag's contents as excerpts.
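To see what the excerpt option would pull from a file, you can read the meta description tag directly. This Python sketch uses the standard library's HTML parser; `MetaDescriptionFinder` and `meta_description` are hypothetical names, and the sketch only shows where the text comes from, not how the plugin stores it.

```python
from html.parser import HTMLParser


class MetaDescriptionFinder(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def meta_description(markup):
    finder = MetaDescriptionFinder()
    finder.feed(markup)
    finder.close()
    return finder.description


head = '<head><title>About</title><meta name="Description" content="A short summary." /></head>'
print(meta_description(head))  # A short summary.
```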

If you have created custom taxonomies for your site (which I'll go over in Chapter 12), you'll see fields for those as well.

Once you've filled in all that information, press the Import button at the bottom of the page and sit back! If you have many files, this might take a minute or two. When the importer has finished, it will display a list of the imported files (Figure 5-16) with any errors noted. It will also give you a set of rewrite rules that, with some slight modifications, you can use in your .htaccess file to redirect visitors from your old files to your new WordPress posts or pages. The original paths won't be exact, especially if you moved the files into a temporary directory while importing them, but you should be able to correct them with a simple search and replace.
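The search and replace on the old paths might look something like the following Python sketch. The plugin's rule format isn't reproduced here, so I'm assuming simple `Redirect 301 old-path new-url` lines (Apache's mod_alias syntax) purely for illustration; the temporary directory `/import-temp/`, the example.com address, and the name `fix_old_paths` are all hypothetical.

```python
def fix_old_paths(rules, temp_prefix, real_prefix):
    """Swap the temporary import prefix for the real one in each rule's old path."""
    fixed = []
    for line in rules.splitlines():
        parts = line.split(" ")
        # Only touch the old path (third field), never the new WordPress URL.
        if len(parts) == 4 and parts[0] == "Redirect" and parts[2].startswith(temp_prefix):
            parts[2] = real_prefix + parts[2][len(temp_prefix):]
        fixed.append(" ".join(parts))
    return "\n".join(fixed)


rules = "Redirect 301 /import-temp/about.html https://example.com/about/"
print(fix_old_paths(rules, "/import-temp/", "/"))
# Redirect 301 /about.html https://example.com/about/
```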

Figure 5-16. The imported files and .htaccess rewrite rules

If the site you're importing has a news section, keep in mind that you could import those files as posts, then remove them from your import directory, and import the rest of the files as pages.
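If you want to plan the two passes ahead of time, you can sort your file list into posts and pages before you start. A minimal Python sketch, assuming the news files live in a directory named news (a made-up layout) and using a hypothetical helper called `split_news`:

```python
from pathlib import PurePosixPath


def split_news(paths, news_dir="news"):
    """Separate files inside the news directory (posts) from everything else (pages)."""
    posts, pages = [], []
    for path in paths:
        # Look only at the directory parts, so a file named news.html stays a page.
        directories = PurePosixPath(path).parts[:-1]
        (posts if news_dir in directories else pages).append(path)
    return posts, pages


posts, pages = split_news(["news/2009/launch.html", "about/index.html", "news.html"])
print(posts)  # ['news/2009/launch.html']
print(pages)  # ['about/index.html', 'news.html']
```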
