An Update on NLTK — Web Based, GUI NLTK via WebNLP! | CS 6724: Investigative Technologies in Society

Tool: WebNLP at http://dh.mi.ur.de/

My demonstration tomorrow is essentially a re-do of my earlier NLTK demo. For this round, I’ve been looking at the tool WebNLP. Unfortunately, the website on which it is hosted hasn’t been loading all evening. The white paper that accompanies this tool is very interesting, and reflects some of my experiences trying to work with NLTK as a humanities/social sciences researcher. Hopefully the site will be up again soon, but for everyone’s edification, I thought that a blog post about the WebNLP white paper would be productive.

Here’s the paper: https://www.researchgate.net/publication/266394311_WebNLP_-_An_Integrated_Web-Interface_for_Python_NLTK_and_Voyant

The paper describes WebNLP’s functionality, but it also lays out a rationale for developing a web-based GUI for NLTK. The authors write:

Most of these [NLP] tools can be characterized as having a fairly high entry barrier, confronting non-linguists or non-computer scientists with a steep learning curve, due to the fact that available tools are far from offering a smooth user experience (UX)…

The goal of this work [the development of WebNLP] is to provide an easy-to-use interface for the import and processing of natural language data that, at the same time, allows the user to visualize the results in different ways. We suggest that NLP and data analysis should be combined in a single interface, as this enables the user to experiment with different NLP parameters while being able to preview the outcome directly in the visualization component of the tool. (235-236)

They go on to describe how WebNLP works, visualized in this graphic:

Visualization of WebNLP functionality

As we can see, WebNLP joins Python NLTK with the program Voyant to create a user-friendly (i.e., no coding or command-line requirements) tool for NLP that is sophisticated enough for scholarly research. The fact that it’s web-based seems to be a benefit, too; I’d imagine that a local application would require the user to install Python, which could be problematic.

WebNLP is based on JavaScript and the front-end framework Bootstrap. I don’t know if it’s open source: I couldn’t find it on GitHub, and the paper doesn’t mention it. As far as I can tell, the only place it is hosted is at the link shared above. It doesn’t seem extremely difficult to implement, and given its potential usefulness, I (of course!) think that it should be hosted at a stable site, or that the code should be opened up to allow others to host WebNLP applications and iterate on it. Right now, it is limited to sentence tokenization, part-of-speech tagging, stop-word filtering, and lemmatization, and I think there is more that NLTK can do. The visualization output, meanwhile, can produce the following: word clouds, bubblelines, type frequency lists, scatter plots, relationships, and type frequency charts. Even just as an experiment or learning tool, it might be useful to think about how else these data might be visualized.
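For anyone curious what those four steps look like when done by hand, here is a minimal sketch in plain Python NLTK. It is my own illustration, not WebNLP’s actual code: the function names (process, wordnet_pos, type_frequencies) are made up for this post, and I’m assuming that NLTK is installed and that its tokenizer, tagger, stop-word, and WordNet data have already been fetched with nltk.download (the exact package names vary by NLTK version).

```python
from collections import Counter

# Penn Treebank tag prefixes mapped to the WordNet POS codes
# that NLTK's WordNetLemmatizer expects.
_PENN_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def wordnet_pos(penn_tag):
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet POS code; default to noun."""
    return _PENN_TO_WORDNET.get(penn_tag[:1], "n")


def type_frequencies(lemmas):
    """Build a type frequency list, most frequent first."""
    return Counter(lemmas).most_common()


def process(text):
    """Run the four steps on raw text and return a flat list of lemmas."""
    # Imported here so the two helpers above work even without NLTK installed.
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    stop_words = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    lemmas = []
    # Step 1: sentence tokenization.
    for sentence in nltk.sent_tokenize(text):
        # Step 2: part-of-speech tagging (on word tokens).
        for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
            # Step 3: stop-word filter (this sketch also drops punctuation and numbers).
            if not word.isalpha() or word.lower() in stop_words:
                continue
            # Step 4: lemmatization, guided by the POS tag.
            lemmas.append(lemmatizer.lemmatize(word.lower(), wordnet_pos(tag)))
    return lemmas
```

Calling type_frequencies(process(some_text)) gives the kind of (type, count) list that could feed a type frequency chart or word cloud. Even this short version assumes a working Python install and some comfort with a code editor, which is exactly the entry barrier the authors are talking about.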

All this being said, I really do hope http://dh.mi.ur.de/ returns soon. In any case, it’s encouraging to see that something I thought should already exist… already does. And the paper has been useful for my semester-long project.