Creating an Application from Scratch, Part 3
by Jesse Liberty
In my previous column I began writing my application PeopleLikeMe in earnest.
Quick summary: PeopleLikeMe lets you ask the following question: please find folks who reviewed the books I've reviewed and whose taste matches mine, and then show me the books they rated highly that I have not yet reviewed. That is, show me books I'm likely to enjoy.
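To make the question concrete, here is a minimal Python sketch of its final step. It is an illustration under stated assumptions, not PeopleLikeMe's code (which is a .NET application): reviews are plain dictionaries keyed by ASIN, every other reviewer already has a taste-match score between 0 and 1, and the `recommend` function, the 0.5 match cutoff, and the four-star "rated highly" threshold are all hypothetical.

```python
from collections import defaultdict


def recommend(my_reviews, others_reviews, match_scores, min_match=0.5, min_rating=4):
    """Return ASINs I have not reviewed that well-matched reviewers rated highly.

    my_reviews:     {ASIN: my rating (1-5)}
    others_reviews: {reviewer ID: {ASIN: rating (1-5)}}
    match_scores:   {reviewer ID: taste-match score in [0, 1]} (hypothetical)
    """
    votes = defaultdict(float)
    for reviewer, reviews in others_reviews.items():
        score = match_scores.get(reviewer, 0.0)
        if score < min_match:
            continue  # this reviewer's taste does not match mine closely enough
        for asin, rating in reviews.items():
            if asin not in my_reviews and rating >= min_rating:
                votes[asin] += score  # weight each vote by how well we match
    # Most strongly recommended first; ties broken by ASIN for a stable order
    return sorted(votes, key=lambda asin: (-votes[asin], asin))
```

Weighting each vote by the match score means a book praised by one close match can outrank a book praised by several weaker matches; a simple vote count would be an equally reasonable choice.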
Since the last time I wrote, I have implemented all of the code to obtain records from Amazon (described below) and to save that data in a SQL Server 2005 database (this example should work equally well with SQL Server 2005 Express and SQL Server 2000).
In the original design, the user would log in to an ASP.NET application and then either ask to update his or her records or, if the records were already up to date, ask to see books highly rated by other readers who matched the user's taste in books.
Unfortunately, updating the records turns out to take a very long time. For example, when I update my own records, Amazon reports that I have reviewed 46 books. These 46 books were also reviewed by a total of nearly 2,100 other people, and those 2,100 people have reviewed nearly 17,000 books in total! Also, the time it takes to gather this information is gated by my license to query the Amazon web service, which allows no more than one query per second (in my un-optimized version I lose 1,100 seconds, or more than 18 minutes, just waiting my turn).
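The one-query-per-second rule amounts to a throttle in front of every Amazon request. The following Python sketch enforces a minimum interval between calls and adds up the time spent waiting, the quantity reported above as lost seconds. The `Throttle` class is hypothetical, not part of PeopleLikeMe or of any Amazon SDK; the clock and sleep functions are injectable so the logic can be tested without real delays.

```python
import time


class Throttle:
    """Space calls at least `interval` seconds apart and total the waiting time."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last_call = None       # time of the previous permitted call
        self.seconds_gated = 0.0    # running total of time spent waiting

    def wait(self):
        """Block until it is safe to make the next request."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                self.seconds_gated += remaining
                now = self.clock()
        self.last_call = now


# Usage (the Amazon request itself is not shown):
# throttle = Throttle(1.0)
# throttle.wait()
# ...make one request...
```

Because the throttle only sleeps for whatever part of the second has not already elapsed, time spent parsing a response or writing to the database counts toward the next request's wait.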
In any case, expecting the user to wait for all of this while staring at an inert browser is clearly untenable. There are a number of design alternatives; I've settled on two.
In the long run, I'll create a web application that lets the user choose either to update or to search for recommendations (using Forms authentication, personalization, and so on). If the user asks to update, the application will report that this is a lengthy process and that an email will be sent when it is completed. A second (Windows) application running on the server will take the user's ID off a queue, run the update, and then fire off the email.
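The queue-based design can be sketched in a few lines of Python. This is only an illustration: an in-process `queue.Queue` stands in for whatever shared queue (for example, a database table) the web application and the Windows application would really use, and `run_update` and `send_email` are hypothetical callables supplied by the caller.

```python
import queue
import threading


def worker(requests, run_update, send_email):
    """Take customer IDs off the queue until a None sentinel arrives."""
    while True:
        customer_id = requests.get()
        try:
            if customer_id is None:
                return                  # sentinel: shut the worker down
            run_update(customer_id)     # the lengthy Amazon/database update
            send_email(customer_id)     # tell the user the update is finished
        finally:
            requests.task_done()        # keep Queue.join() accurate even on errors


# Usage: the web application only enqueues; the worker does the slow part.
# requests = queue.Queue()
# threading.Thread(target=worker, args=(requests, update_fn, email_fn)).start()
# requests.put("CUSTOMER-ID")
```

Note that an exception in `run_update` ends this worker thread; a production version would log the failure and move on to the next ID.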
Now, one could certainly argue that what is needed is not an ASP.NET application at all, but rather a stand-alone desktop application that works only for one user. Each person who wants recommendations runs his or her own copy. We'll finesse this decision for now and come back to it in later articles.
In either case, what is needed right now is a Windows application that does the work of:
- Finding all the books I have reviewed.
- Finding everyone else who has reviewed these same books and giving them a score based on how closely we match in our rating of the book.
- Finding all the other books those reviewers have reviewed, and how they rated them.
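The article does not specify how closely matched ratings translate into a score, so the following Python sketch uses a hypothetical formula: for each book both of us reviewed, award 4 minus the absolute difference in star ratings (4 points for identical ratings, 0 for 1 star versus 5 stars), then divide by the maximum possible total to get a score between 0 and 1. The `match_score` name is likewise an illustration.

```python
def match_score(my_reviews, their_reviews):
    """Hypothetical taste-match score in [0, 1] from two {ASIN: rating} dicts."""
    shared = [asin for asin in my_reviews if asin in their_reviews]
    if not shared:
        return 0.0  # no books in common, so no evidence of matching taste
    # 4 points for identical ratings, down to 0 for the largest gap (1 vs. 5)
    points = sum(4 - abs(my_reviews[a] - their_reviews[a]) for a in shared)
    return points / (4 * len(shared))
```

This formula ignores how many books two reviewers share, so one identical rating scores as well as forty; a refinement might require a minimum overlap before trusting the score.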
That work is unchanged no matter which design we end up with, so that is what I've implemented for this article. To keep this example simple, for now I enter my Amazon CustomerID in a text box and click Run, rather than taking it off a queue (or from a database table). The program runs only once, for me, but modifying it to take the next CustomerID off a table and run again would not be difficult.
The new Windows application (PeopleLikeMeEngine) updates my table of reviews and finds all the other reviewers, scores their records, and finds all their reviews. Because this is a Windows application, it is now able to provide far more extensive "real-time" feedback, updating the display as it finds reviewers and inserts records into the database, as shown in Figure 1.
To begin, I enter my Amazon CustomerID (see the previous article for how to find your ID) into the text box at the top of the form and click Run. The program finds all the books I've reviewed, then all the other reviewers of those books, and in turn, all the books they've reviewed.
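The three-level search can be sketched as follows. `reviews_by_customer` and `reviewers_of_book` are hypothetical callables standing in for the Amazon web service queries (they are not real Amazon API names); in practice each call is a request subject to the one-query-per-second limit.

```python
def crawl(my_id, reviews_by_customer, reviewers_of_book):
    """Gather my reviews, the other reviewers of those books, and all their reviews.

    reviews_by_customer(customer_id) -> {ASIN: rating}       (hypothetical)
    reviewers_of_book(asin)          -> {customer_id: rating} (hypothetical)
    """
    # Step 1: every book I have reviewed
    my_reviews = reviews_by_customer(my_id)

    # Step 2: everyone else who reviewed any of those books
    others = set()
    for asin in my_reviews:
        others.update(cid for cid in reviewers_of_book(asin) if cid != my_id)

    # Step 3: every book each of those reviewers has reviewed, with ratings
    others_reviews = {cid: reviews_by_customer(cid) for cid in sorted(others)}
    return my_reviews, others_reviews
```

Collecting reviewers into a set before step 3 means someone who reviewed several of my books is fetched only once, which matters when every request costs a second.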
(It should be noted that what Amazon calls the CustomerID I call the ReviewerID. We can argue all day about whether or not that is a good idea.)
The topmost listbox shows the book's ASIN (ISBN), the reviewer's CustomerID, and that reviewer's rating for the book (1-5 stars). As database errors occur, they are added to the second listbox. These errors are expected, because the database is set up to reject duplicate entries. The complete text of the current error is shown in the (disabled) text box (Figure 1, marked 1). Double-clicking any error in the list displays its full text in the text box.
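Letting the database reject duplicates, and logging each rejection rather than aborting, might look like this. The sketch uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the `Reviews` table, its columns, and its composite primary key on (ASIN, ReviewerID) are assumptions, not the article's actual schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create a hypothetical Reviews table whose primary key forbids duplicates."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Reviews ("
        " ASIN TEXT NOT NULL,"
        " ReviewerID TEXT NOT NULL,"
        " Rating INTEGER NOT NULL,"
        " PRIMARY KEY (ASIN, ReviewerID))"
    )
    return conn


def save_reviews(conn, reviews, errors):
    """Insert (asin, reviewer_id, rating) rows; append duplicates to `errors`.

    Returns the number of rows actually inserted.
    """
    inserted = 0
    for asin, reviewer_id, rating in reviews:
        try:
            with conn:  # commits on success, rolls back this row on error
                conn.execute(
                    "INSERT INTO Reviews (ASIN, ReviewerID, Rating) VALUES (?, ?, ?)",
                    (asin, reviewer_id, rating),
                )
            inserted += 1
        except sqlite3.IntegrityError as exc:
            # Expected for duplicates; record it like the error listbox does
            errors.append(f"{asin}/{reviewer_id}: {exc}")
    return inserted
```

Relying on the primary key keeps the duplicate check in one place, at the cost of one failed insert per duplicate; checking for an existing row first would avoid the errors but add a query per review.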
As the program runs, I display three running totals: the number of reviewers who have reviewed the same books as I have (Figure 1, marked 2), the number of reviews by all these reviewers (Figure 1, marked 3), and the total number of seconds we've lost waiting to ensure that we don't make a request to Amazon more than once per second (Figure 1, "Seconds gated," marked 4).
The gated figures are so high (up to 20 minutes when I run the program) that this is an obvious place to look for optimizations (to be covered in future articles).