
Obtaining and Caching a Lot of Words


It is surprising how often I wish I had a lot of words handy. This week it has been because I've wanted to play with the AutoCompleteBox (you just set the list of words as the control's ItemsSource and voilà!).

In previous posts I demonstrated how I obtained these words from a book through Project Gutenberg and how I used a background worker thread to keep the UI up to date. Today I'll show how to use Isolated Storage to stash the words locally, which dramatically improves performance. Then, after I show this nifty trick at DevConnections, I'll write up how to obtain the words on one page and use them in an AutoCompleteBox on a second page (OK, it's not that hard).

Isolated Storage

Isolated storage works really well here: once you go to the bother of getting and sorting these words, it is rather silly to fetch them again the next time you run the program. The trick, of course, is to check whether you've already saved them in isolated storage and, if so, simply reconstitute them. If not, then when you are done using them, stash them away in isolated storage for next time.
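Stripped of Silverlight specifics, that look-aside idea fits in a few lines. This is a minimal sketch, not the post's code: IWordStore, InMemoryWordStore and WordCache.GetOrCreate are hypothetical names, and the in-memory store stands in for Isolated Storage so the logic can run anywhere. One difference from the post: here the freshly built words are saved immediately, whereas the post saves them after the worker thread finishes.

```csharp
using System;

// Hypothetical stand-in for Isolated Storage so the pattern can run anywhere.
public interface IWordStore
{
    bool TryLoad(out string contents);
    void Save(string contents);
}

// Hypothetical in-memory store, for illustration only.
public class InMemoryWordStore : IWordStore
{
    private string saved;   // null until something has been stored

    public bool TryLoad(out string contents)
    {
        contents = saved;
        return saved != null;
    }

    public void Save(string contents)
    {
        saved = contents;
    }
}

public static class WordCache
{
    // Look aside first; only run the expensive producer on a cache miss.
    public static string GetOrCreate(IWordStore store, Func<string> produce)
    {
        string cached;
        if (store.TryLoad(out cached))
            return cached;              // hit: skip the slow path entirely

        string fresh = produce();       // miss: do the real work once
        store.Save(fresh);              // stash it for next time
        return fresh;
    }
}
```

Calling GetOrCreate twice against the same store runs the producer only on the first call; the second call returns the stored string.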

You can get all sorts of fancy saving away complex data structures and saving different lists, but to keep things simple, let's just… well, keep things simple.

When we're about to ask the user which file to grab words from, we first do a quick "look aside" to see whether we already have words saved:

void Page_Loaded( object sender, RoutedEventArgs e )
{
  // The worker is wired up either way; it always rebuilds the word list from sb.
  worker.WorkerReportsProgress = true;
  worker.DoWork += new DoWorkEventHandler( worker_DoWork );
  worker.ProgressChanged += new ProgressChangedEventHandler( worker_ProgressChanged );
  worker.RunWorkerCompleted += new RunWorkerCompletedEventHandler( worker_RunWorkerCompleted );
  if ( TestIsoStorage() )
  {
    // Words were found in isolated storage: skip the file dialog
    // and start the worker with a null argument.
    FilePicker.IsEnabled = false;
    if ( worker.IsBusy != true )
      worker.RunWorkerAsync( null );
  }
  else
  {
    // Nothing cached yet: let the user pick a file to read words from.
    FilePicker.Click += new RoutedEventHandler( FilePicker_Click );
  }
}

 

This takes a bit of explanation. I still set up my worker thread, because I'm going to use it whether or not I already have the words. It is the worker thread that takes the single string of words and rebuilds the list of strings the application expects. And why not? That part is already working. The only change I wanted was to decide whether to get the file and parse it, or not.

Let's look at TestIsoStorage().

The logic here is that I call GetUserStoreForApplication, which returns an application-level IsolatedStorageFile (since this is a resource I want released as quickly as possible, I take advantage of C#'s using statement). With that, I can test whether my isolated storage file exists. If it does, I open a StreamReader on it and, in one line, pull the entire contents out as a single string, which I place into a StringBuilder.

NB: I'm of two minds about my ambivalence about having a single return point. One argument is that it is less confusing if you use a flag (retVal) and always exit at the end; the other side responds with a word I'm not allowed to write here. Most of the time I would rewrite this as:

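private bool TestIsoStorage()
{
  bool retVal = false;
  // The store is released as soon as we leave the using block.
  using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )
  {
    if ( store.FileExists( "SortedWords" ) )
    {
      using ( StreamReader reader =
        new StreamReader( store.OpenFile( "SortedWords", FileMode.Open ) ) )
      {
        // Pull the whole cached file into sb in one call;
        // worker_DoWork splits it back into individual words.
        sb = new StringBuilder();
        sb.Append( reader.ReadToEnd() );
        retVal = true;
      }
    }
  }
  return retVal;   // the single exit point
}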

but I don't get too excited about it.
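For comparison, here are the two shapes side by side: the early-return style the note alludes to and the single-exit style of the retVal version. This is a sketch under one assumption: an ordinary file path read with System.IO.File stands in for the IsolatedStorageFile store so it runs outside Silverlight. CacheReader and both method names are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper; a plain file path stands in for isolated storage.
public static class CacheReader
{
    // Early-return style: leave as soon as the answer is known.
    public static bool TryReadEarlyReturn(string path, out StringBuilder sb)
    {
        sb = null;
        if (!File.Exists(path))
            return false;

        using (var reader = new StreamReader(File.OpenRead(path)))
        {
            sb = new StringBuilder(reader.ReadToEnd());
        }
        return true;
    }

    // Single-exit style: a flag records the outcome, one return at the end.
    public static bool TryReadSingleExit(string path, out StringBuilder sb)
    {
        bool retVal = false;
        sb = null;
        if (File.Exists(path))
        {
            using (var reader = new StreamReader(File.OpenRead(path)))
            {
                sb = new StringBuilder(reader.ReadToEnd());
                retVal = true;
            }
        }
        return retVal;
    }
}
```

Both methods behave identically; the choice really is a matter of taste.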

The key thing to note (and I admit it is almost a hack) is that if we get the words from isolated storage, we never show the open-file dialog (in fact, we disable the open-file button); we just kick off the background thread with a null argument:

if ( TestIsoStorage() )
{   
  FilePicker.IsEnabled = false;   
  if ( worker.IsBusy != true )      
    worker.RunWorkerAsync( null );
}

 

The first half of DoWork is encased in a big if statement that turns that half into a no-op if we have obtained the words from isolated storage. I kinda hate this because the connection is not obvious, but it works, it's late, and I swear I'll come back and fix it… really.

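void worker_DoWork( object sender, DoWorkEventArgs e )
{
  const long MAXBYTES = 200000;   // cap on characters read from the source file
  BackgroundWorker workerRef = sender as BackgroundWorker;
  if ( workerRef != null )
  {    // begin massive ugly hack: a null argument means sb was already filled from isolated storage
    if ( e.Argument != null )
    {
      System.IO.FileInfo file = e.Argument as System.IO.FileInfo;
      if ( file != null )
      {
        System.IO.Stream fileStream = file.OpenRead();
        using ( System.IO.StreamReader reader = new System.IO.StreamReader( fileStream ) )
        {
          string temp = string.Empty;
          try
          {
            do
            {
              temp = reader.ReadLine();
              // ReadLine strips the line break, so put one back; otherwise the
              // last word of one line fuses with the first word of the next.
              if ( temp != null )
                sb.AppendLine( temp );
            } while ( temp != null && sb.Length < MAXBYTES );
          }
          catch { }   // read errors are swallowed; whatever was read so far is kept
        }     // end using (disposing the reader also closes fileStream)
        fileStream.Close();
      }        // end if file != null
    }           // end if argument != null
    // Splitting on word boundaries yields the words plus the spaces and punctuation
    // between them; the loop below drops empty pieces and anything IsJunk rejects.
    string pattern = "\\b";
    allWords = System.Text.RegularExpressions.Regex.Split( sb.ToString(), pattern );
    long total = allWords.Length;
    long soFar = 0;
    int newPctg = 0;
    int pctg = 0;
    foreach ( string word in allWords )
    {
      // soFar * 100 / total stays within 0..100 and, unlike dividing by
      // allWords.Length / 100, never divides by zero on short inputs.
      newPctg = (int) ( ( ++soFar ) * 100 / total );
      if ( newPctg != pctg )
      {
        pctg = newPctg;
        workerRef.ReportProgress( pctg );
      }
      if ( words.Contains( word ) == false )
      {
        if ( word.Length > 0 && !IsJunk( word ) )
        {
          words.Add( word );
        }
      }
    }
  }
}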
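One possible way to untangle the "massive ugly hack" is to pull the parsing into a method that both paths call: the file path reads the text and hands it over, and the cache path hands over the stored string. This is a hypothetical refactoring, not the post's code. WordListBuilder.Build is an invented name, isJunk is a caller-supplied predicate standing in for the post's IsJunk(), and a HashSet is swapped in for the List.Contains check so each duplicate test is a constant-time lookup rather than a scan of the list.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordListBuilder
{
    // Splits text on word boundaries, drops empty and junk tokens, and removes
    // duplicates while keeping first-seen order. reportProgress may be null.
    public static List<string> Build(string text, Func<string, bool> isJunk,
                                     Action<int> reportProgress)
    {
        string[] tokens = Regex.Split(text, "\\b");
        var seen = new HashSet<string>();   // O(1) "have we kept this already?"
        var words = new List<string>();
        long soFar = 0;
        int pctg = 0;

        foreach (string token in tokens)
        {
            // Inside the loop tokens.Length >= 1, so this never divides by zero
            // and the result stays within 0..100.
            int newPctg = (int)(++soFar * 100 / tokens.Length);
            if (newPctg != pctg)
            {
                pctg = newPctg;
                if (reportProgress != null)
                    reportProgress(pctg);
            }

            // Length check first, so isJunk never sees an empty string.
            if (token.Length > 0 && !isJunk(token) && seen.Add(token))
                words.Add(token);
        }
        return words;
    }
}
```

With the parsing isolated like this, DoWork no longer needs the null-argument convention: the caller decides where the text comes from and passes it in, and workerRef.ReportProgress can be supplied as the progress callback.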

 

Finally, when the thread ends, we save the words for next time if we haven't done so already:

private void StoreWords()
{
  Message.Text = "Storing Words in Isolated Storage...";
  using ( var store = IsolatedStorageFile.GetUserStoreForApplication() )
  {
    // Write the cache only once; later runs read it back in TestIsoStorage().
    if ( ! store.FileExists( "SortedWords" ) )
    {
      // Flatten the word list into a single space-delimited string.
      StringBuilder buffer = new StringBuilder();
      foreach ( string s in words )
      {
        buffer.Append( s + " " );
      }
      using ( StreamWriter writer =
        new StreamWriter( store.OpenFile( "SortedWords", FileMode.Create ) ) )
      {
        writer.Write( buffer.ToString() );
      }
    }
  }
}
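The cache format is simply the words separated by spaces, which is safe because the filtered word list contains no spaces. Here is a minimal sketch of writing and reading that format, assuming a .NET version where string.Join accepts an IEnumerable&lt;string&gt;; WordSerializer is a hypothetical name, and TextWriter/TextReader stand in for the isolated-storage streams. The post itself reconstitutes the list by running the stored string back through the worker's Regex split; a plain split on spaces is shown here as a simpler alternative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordSerializer
{
    // Writes the words as one space-delimited string, the same shape
    // StoreWords produces (minus its trailing space).
    public static void Write(TextWriter writer, IEnumerable<string> words)
    {
        writer.Write(string.Join(" ", words));
    }

    // Reads the string back and splits on spaces; RemoveEmptyEntries also
    // tolerates a trailing space like the one StoreWords leaves behind.
    public static List<string> Read(TextReader reader)
    {
        string all = reader.ReadToEnd();
        return new List<string>(all.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

Writing the list to a StringWriter yields "apple banana cherry", and reading "apple banana cherry " (with a trailing space) gives back the three words.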

 

The result, not surprisingly, is a much faster program start-up. I do worry just a bit about the detritus of long-forgotten isolated storage files cluttering up the disk. Could we put in a self-destruct timer? I'll have to look into that.
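One possible answer to the self-destruct question, sketched under stated assumptions: write the save time (UTC, round-trip "o" format) as the first line of the cached file, and treat an entry older than some maximum age as a cache miss. In that case the caller would delete the file (for example with IsolatedStorageFile.DeleteFile) and rebuild the list. The file layout and the ExpiringCache names are hypothetical; nothing in the post defines them. Note that this only cleans up when the application runs again, so it does nothing for a store the user never revisits.

```csharp
using System;
using System.Globalization;

// Hypothetical cache layout: line 1 is the UTC save time, the rest is the words.
public static class ExpiringCache
{
    public static string Wrap(string words, DateTime savedUtc)
    {
        return savedUtc.ToString("o", CultureInfo.InvariantCulture) + "\n" + words;
    }

    // Returns the words if the entry is younger than maxAge, otherwise null
    // (the caller would then delete the cached file and rebuild).
    public static string UnwrapIfFresh(string contents, DateTime nowUtc, TimeSpan maxAge)
    {
        int newline = contents.IndexOf('\n');
        if (newline < 0)
            return null;                                // not in the expected layout

        DateTime savedUtc;
        if (!DateTime.TryParse(contents.Substring(0, newline), CultureInfo.InvariantCulture,
                               DateTimeStyles.RoundtripKind, out savedUtc))
            return null;                                // unreadable timestamp

        if (nowUtc - savedUtc > maxAge)
            return null;                                // too old: treat as a miss

        return contents.Substring(newline + 1);
    }
}
```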

 

-j

Published 10 November 2008 07:00 AM by jesseliberty

Comments

# party42 said on 10 November, 2008 09:04 AM

Got a url to download the project? How does the autocomplete scale though? Is it still fast when using 10,000 words? Or how would you recommend doing a google like suggest algorithm (www.google.com/webhp) where the list of results is context sensitive?

# AutoCompleteBox: Caching von Wörtern said on 10 November, 2008 03:43 PM

Pingback from AutoCompleteBox: Caching von Wörtern

# jesseliberty said on 10 November, 2008 09:07 PM

I will post this project when I get back from DevConnections, and while I'm at it, I'll post one that stashes the words in isolated storage and compares the performance.  

It's tempting to set the minimum number of letters to 3 (or more) and then do the search -- or otherwise try to optimize (pare down) the set of words, but I can't believe that the user experience would be tolerable.  It would be interesting though to try this not with a few thousand words but with a few hundred thousand. I'll try that as well.

# unruledboy2 said on 10 November, 2008 11:54 PM

is the default(initial) size of Isolated Storage in 2.0 RTM still 1.0M or changed to 100K(0.1M)?

# Silverlight News for November 11, 2008 said on 11 November, 2008 02:53 AM

Pingback from  Silverlight News for November 11, 2008

# 2008 November 11 - Links for today « My (almost) Daily Links said on 11 November, 2008 03:41 AM

Pingback from 2008 November 11 - Links for today « My (almost) Daily Links

# Dew Drop - November 11, 2008 | Alvin Ashcraft's Morning Dew said on 11 November, 2008 08:44 AM

Pingback from  Dew Drop - November 11, 2008 | Alvin Ashcraft's Morning Dew

# Community Blogs said on 11 November, 2008 12:25 PM

In this issue: Ian Griffiths, Matthew Casperson, Chris Anderson, IDV Solutions, Nikhil Kothari, Dave

# » Silverlight Cream for November 11, 2008 — #424 said on 11 November, 2008 01:29 PM

Pingback from » Silverlight Cream for November 11, 2008 — #424

# obsid said on 11 November, 2008 08:06 PM

By the way, I find that a lot of times when I think isolated storage is the answer, there is another way: we can use the browser cache. This only works until they clear the browser cache (but when they want to do that, shouldn't you be clearing your stuff too?). For instance, if you're getting the words from a URI hosted on your site, it would be a bad idea to use isolated storage, as the browser handles caching that for next time anyway (as long as your list is presorted on the website).

# dgearey said on 13 November, 2008 07:51 PM

I wonder if you could throw a little sample code my way to help me with the following scenario.

I have an auto-complete control for searching hierarchical data in a treeview. I want the search to be context aware. If a particular category has been selected, then I want the entries in that category to appear first.

In the above scenario I need my treeview to be filled dynamically because there are more than 7000 total items (which is manageable if I dynamically create/delete sub-items during expand/collapse). I retrieve datasets with WCF service calls.

Most of all I'm hoping for C# code-behind examples, because XAML, as powerful as it is, can't be stepped through and gets so messy after Blend gets its hands on it. I shiver when I see so many functional relationships defined in a super-powered markup language.

Any help would be greatly appreciated!  

Thanks, Donald

p.s. I agree isolated storage has good uses but I prefer not to use it unless I can be sure I won't end up leaving garbage behind on people's not-so-isolated storage.  Is there any "delete on exit" functionality I can use?

# Jesse Liberty - Silverlight Geek said on 03 December, 2008 12:22 PM

The Silverlight Toolkit includes a wrap panel that allows you to add elements to it and will automatically

# Microsoft Weblogs said on 03 December, 2008 12:43 PM

The Silverlight Toolkit includes a wrap panel that allows you to add elements to it and will automatically