Summary
In the paper “Visualizing Email Content: Portraying Relationships from Conversational Histories”, the authors document the ideation, algorithm design, and content analysis of their application, Themail. The purpose of the program is to give users a photo-album-style visualization of their relationship with another person, built from the user’s archived e-mails with that person. Themail takes the words most frequently exchanged by e-mail between the pair, checks their uniqueness (whether they are also frequently exchanged with many other people), and uses an algorithm to arrange the words in one of two meaningful formats.
- Needle Mode: In the foreground, vertical stacks of words are placed on a timeline by month. Each stack consists of the most frequent and unique words exchanged during that specific month. This provides a more detail-oriented exploration of one’s e-mail archives. Despite its capability for searching these details, only 20% of users took greater interest in this mode.
- Haystack Mode: In the background, larger words float faintly but visibly. These words represent the most frequent and unique words exchanged over the course of the year being represented. This provides a more “big picture” exploration of trends and themes. 80% of users used this mode the most, indicating an interest in the broader relationship they had with people and in the aesthetic quality of the application.
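The two layers described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `build_layers` and the `top_n` parameter are hypothetical, and raw word counts stand in for whatever scores Themail actually uses to pick words.

```python
from collections import Counter, defaultdict
from datetime import date


def build_layers(messages, top_n=3):
    """Group words into monthly stacks (needle) and yearly words (haystack).

    messages: iterable of (date, text) pairs exchanged with one person.
    Raw counts are an assumed stand-in for the real word scores.
    """
    by_month = defaultdict(Counter)
    by_year = defaultdict(Counter)
    for sent_on, text in messages:
        words = text.lower().split()
        by_month[(sent_on.year, sent_on.month)].update(words)
        by_year[sent_on.year].update(words)

    # Keep only the top-ranked words for each month and each year.
    monthly = {
        key: [word for word, _ in counts.most_common(top_n)]
        for key, counts in sorted(by_month.items())
    }
    yearly = {
        key: [word for word, _ in counts.most_common(top_n)]
        for key, counts in sorted(by_year.items())
    }
    return monthly, yearly


example_messages = [
    (date(2005, 1, 3), "ski trip ski"),
    (date(2005, 1, 20), "ski lodge"),
    (date(2005, 2, 2), "thesis draft thesis"),
    (date(2005, 2, 9), "thesis ski"),
]
monthly_stacks, yearly_words = build_layers(example_messages, top_n=1)
```

In this toy archive, each month gets its own stack on the timeline, while the yearly layer picks up the word that dominates the whole year.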
The algorithm used to generate word values and populate the screen is based on an earlier algorithm that scores words by their relative frequency in one document out of a collection. The team extended this concept by comparing subsets of e-mails against supersets, so that the scores take into account not only relative frequency but also relative uniqueness. The equations for yearly and monthly word values are nearly identical; the key difference is that yearly word frequencies are cubed to increase the resulting weights.
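As a rough illustration of the subset-versus-superset idea, here is a minimal sketch. It does not reproduce the paper's equations: the ratio used here, the function name `word_scores`, and the `exponent` parameter are all assumptions made for the example. Only the cubing of yearly frequencies comes from the summary above.

```python
from collections import Counter


def count_words(docs):
    """Count lowercase whitespace-separated words across a list of texts."""
    return Counter(word for doc in docs for word in doc.lower().split())


def word_scores(subset_docs, archive_docs, exponent=1):
    """Score each word in the subset against the whole archive.

    Assumed formula (not the paper's): (frequency in subset) ** exponent
    divided by frequency in the archive. Words common everywhere score low;
    words concentrated in the subset score high.
    """
    subset_counts = count_words(subset_docs)
    archive_counts = count_words(archive_docs)
    scores = {}
    for word, freq in subset_counts.items():
        # Guard in case the subset is not actually contained in the archive.
        total = max(archive_counts[word], freq)
        scores[word] = (freq ** exponent) / total
    return scores


pair_emails = ["dinner friday dinner", "dinner plans"]
archive = pair_emails + ["meeting friday", "meeting plans agenda", "friday meeting"]

monthly = word_scores(pair_emails, archive)             # plain frequency
yearly = word_scores(pair_emails, archive, exponent=3)  # cubed frequency
```

Here "dinner" appears only in the pair's e-mails, so it outranks "friday", which is spread across the whole archive. Cubing the frequency widens that gap further.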
Reflection
There were two aspects of this paper that interested me. First, the aesthetic appeal that drew users to the haystack mode rather than the needle mode is a reminder that, despite the technical nature of our field, computer science involves an appreciable amount of graphic design. Second, the algorithm design brought to mind my team’s project for the semester. We too must come up with an algorithm that uses word frequency to detect importance in a certain context. While our topic is not based on time, the paper nonetheless gives us a unique perspective and possibly a stronger foundation from which to build our equation. Namely, using inverse frequency to measure relative rather than raw frequency may show us whether we are focusing on the wrong keywords (we currently measure by raw frequency).
Questions
- What other visual structures and organizational patterns were considered for this application? What benefits did this horizontal timeline with vertical stacks afford over the other options? What drawbacks did you have to accept with it?
- The paper mentions a limitation of the content analysis: the application cannot detect the personal weight of e-mails. For example, an e-mail from a mother wishing her son a happy birthday might be taken as having greater word value than one in which she reminds him of a dentist appointment. Is any progress being made on that front? What ideas are there for identifying overall message value?
- What factors went into deciding to have the haystack and needle modes occupy the same screen rather than providing a way to switch between the individual modes? Can you see any benefit to having a screen for each mode rather than a fused screen? Perhaps it would allow more features?