Brian Button - One Agile Coder

Blogging on all things .Net, C#, and Agile

PowerShell script to convert RTF docs to plain text using MS-Word

On our current project, we had a bunch of RTF files containing text that we wanted to yank out and store in a database. Instead of laboriously opening each file and resaving it as plain text, I decided to write what I hoped would be a simple PowerShell script to do that for me. What follows is my best try at that script.

I am very open to any questions, criticisms, and improvements to the script, as I'm still very much learning the language. My fundamental question, really, is whether PowerShell was the right tool for this job.

Enjoy!

-- bab

=============== Reformat.ps1 ================

function translate_from_rtf_to_text([System.Io.FileInfo] $source_file)
{
    $source_file_name = $source_file.FullName
    # PowerShell functions take space-separated arguments, without parentheses.
    $dest_file_name = create_destination_file_name $source_file

    write-host "Copying from $source_file_name to $dest_file_name"

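    # Open the RTF file in Word and save a copy in the requested text format.
    $rtf = $word.Documents.Open($source_file_name)
    $rtf.SaveAs([ref]$dest_file_name, [ref]$saveFormat)

    # Close the document without saving again, so Word does not keep every converted file open.
    $rtf.Close([ref]$false)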
}

function create_destination_file_name([System.Io.FileInfo] $source_file)
{
    $dest_file_name = $source_file.Name.Remove($source_file.Name.Length - $source_file.Extension.Length) + ".txt"
    $dest_directory = $source_file.DirectoryName
    $dest_file = join-path $dest_directory $dest_file_name

    return $dest_file
}

if ($args.Length -ne 1)
{
   write-host "Usage: Reformat.ps1 <path to root directory of rtf files>"
   exit 1
}

$path = $args[0]

write-host "Converting all rtf files under directory $path..."

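# Start a single Word instance and reuse it for every file.
$word = new-object -com Word.Application

# wdFormatTextLineBreaks saves plain text with line breaks preserved.
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatTextLineBreaks")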

get-childitem -path $path -include *.rtf -recurse | foreach-object -process { translate_from_rtf_to_text $_ }

$word.Quit([ref]$true)

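As one possible simplification of the destination-name logic in create_destination_file_name, the .NET method [System.IO.Path]::ChangeExtension can swap the extension in a single call. The sketch below is only an illustration: the function name Get-TextFileName and the sample file name are made up for this example and are not part of Reformat.ps1.

```powershell
# Hypothetical helper (not part of Reformat.ps1): returns the .txt path
# that corresponds to a given .rtf path.
function Get-TextFileName([string] $sourcePath)
{
    # ChangeExtension replaces everything after the final dot with the new extension,
    # so dots earlier in the file name are left alone.
    return [System.IO.Path]::ChangeExtension($sourcePath, ".txt")
}

# Sample call with a made-up file name.
Get-TextFileName "report.v2.rtf"    # → report.v2.txt
```

If this approach were used in the script, passing $source_file.FullName should yield the destination path in the same directory as the source file, which is what the join-path version produces.
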
posted on Tuesday, March 13, 2007 2:06 PM

Feedback

# re: Powershell script to convert RTF docs to plain text using MS-Word 3/31/2008 8:17 PM crystal

Very interesting script. Nice post
