Topic Modelling Experiment

Ryan Nichols and Ted Slingerland, both longtime readers of this blog, write with an invitation to blog readers to help them out by participating in an experiment. Read on!

Dear Warp, Weft, and Way users,

As affiliates of the University of British Columbia’s Cultural Evolution of Religion Consortium (CERC), we write to invite Warp, Weft, & Way users with some training in classical Chinese to participate in an experiment. Several years ago we embarked on a project to use quantitative methods of analysis, including statistical testing and unsupervised data mining, in order to gain new insights into classical Chinese texts. Our corpus, drawn from Donald Sturgeon’s, a resource we all know and enjoy, contains over 5 million characters from texts that date from pre-Warring States through the Tang.

After many pilot studies and preliminary testing, we generated a topic model for our classical Chinese corpus. In other words, we have compiled the corpus using an algorithm that allocates statistical probabilities to the associations between characters in the texts. One of several results of this process is the generation of 100 lists of characters that appear statistically closely related to one another. (Other results include data on how a given list of characters, or ‘topic’, is represented in each text in our corpus, for example.)
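For readers unfamiliar with topic models, here is a minimal sketch of the general idea. It is not the model used in this project: the letter does not name the algorithm, so the sketch assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling, treats every character as one token, and runs on four invented toy strings rather than the real corpus. The function names (`fit_lda`, `top_characters`, `topic_shares`) and the hyperparameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; each character is one token."""
    rng = random.Random(seed)
    vocab_size = len({ch for doc in docs for ch in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (doc, topic)
    topic_char = [Counter() for _ in range(n_topics)]   # tokens per (topic, char)
    topic_total = [0] * n_topics                        # tokens per topic
    assign = []                                         # topic of every token

    # Start from a random topic assignment for every character.
    for d, doc in enumerate(docs):
        zs = []
        for ch in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_char[k][ch] += 1
            topic_total[k] += 1
        assign.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, ch in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_char[k][ch] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_char[t][ch] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_char[k][ch] += 1
                topic_total[k] += 1
    return topic_char, doc_topic


def top_characters(topic_char, n=10):
    """Keep at most n of the most heavily weighted characters per topic."""
    return [[ch for ch, c in counts.most_common(n) if c > 0]
            for counts in topic_char]


def topic_shares(doc_topic, alpha=0.1):
    """Smoothed share of each topic within each document."""
    shares = []
    for counts in doc_topic:
        total = sum(counts) + alpha * len(counts)
        shares.append([(c + alpha) / total for c in counts])
    return shares


# Invented toy texts (assumption): not drawn from the project's corpus.
docs = ["仁義禮智仁義禮", "禮義仁智禮仁", "天地陰陽天地氣", "陰陽氣天地陰陽"]
topic_char, doc_topic = fit_lda(docs, n_topics=2)
top = top_characters(topic_char)
shares = topic_shares(doc_topic)
```

Here `top` plays the role of the character lists described above (one list per topic), and `shares` corresponds to the other kind of output mentioned: how strongly each topic is represented in each text. On a real corpus one would more likely use an established library than a hand-written sampler.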

In contrast to much computer modeling and text mining on humanistic sources, we do not want to stop with our algorithm’s outputs. Rather, we want to seek the opinions of experts to help us fix the meanings of these lists of terms. For this reason we have converted each of the 100 topics produced by our model into a word cloud containing the ten characters in that topic with the highest statistical loading. We hope Warp, Weft, and Way users will find the potential of such techniques intriguing enough to participate in our experiment by giving about 15-20 minutes of your time as experts on Chinese thought. After clicking the link below, you will be taken to a consent form and will then be asked to view a random assortment of 15 word clouds. After each word cloud, you will be asked a handful of brief questions, almost all multiple choice, about its content. At the end of the survey, we will collect some standard demographic information from you that is necessary for our statistical analyses. Here is the link to the survey:

Thank you for your participation. This would not be possible without you.


Ryan & Ted


3 replies on “Topic Modelling Experiment”

  1. I meant to add that we will be sharing all the results of our research with the public, including participants of course, when it is complete.

  2. I’ve now done this, and it was kind of like playing a game. I was actually a bit disappointed when I got to the end and couldn’t play anymore. (Maybe I’ll do it again!)
