Crowdsourcing and Engagement

We’re five weeks into Digital Cultural Heritage now and I’m starting to notice that certain words are becoming regular parts of the DCH vocab. Notable amongst these are metadata and crowdsourcing. I’ve already talked about metadata in a previous post. So now it is the turn of crowdsourcing to be understood.


I personally like the idea of crowdsourcing. It bridges that gap between the ivory towers of academia and the person on the street (or in front of the screen). Sometimes, as with so many good things, it can be misused. You end up with the equivalent of sweatshops, where volunteers are used to do the simple but time-consuming jobs like tagging. Other times, it works fantastically as a collaboration where everyone learns from everyone else. A great example is Measuring the Anzacs, which encourages citizen historians to tag and transcribe records of Anzac soldiers. While academics benefit from the additional help in tagging and transcription, the so-called ‘crowd’ have the opportunity to see behind the statistics to who the soldiers really were.

The thing about crowdsourcing is that it needs people to engage. That is when it works at its best.

Let’s be honest. No one is going to spend their time tagging or transcribing medieval manuscripts unless they are interested in them. The nature of crowdsourcing is that it is never a random crowd who engage with the project but individuals who are interested in the topic. For example, a marine biologist will most likely not engage with a Shakespearean project but a member of an amateur dramatics group might. Thus, the need for engagement in crowdsourcing means that most projects end up being group sourced by default.

These groups may not be structured or chosen on purpose but they are united by a common purpose. So you end up with a group sourced project whose contributors are all engaged with it.

And if you’re honest with yourself, you would rather have a small group of actively engaged people than a large group of vaguely interested people who don’t really care.


Textual Analytics and Corpus Linguistics: Beyond the Hypothesis

One of my modules this semester is looking at “computer-assisted approaches to text analysis”. Or at least that is what the module overview says. In reality, we are looking at research questions in the humanities and how digital methods for text analysis and corpus linguistics can help us answer them.

From the first two seminars, the idea that has stuck with me is that most humanities research does not go beyond forming a hypothesis. If we take a scientific approach (as digital humanities borrows a lot from computer science, this isn’t difficult), we can read a history or English research paper as an exploratory discussion of the topic. However, the conclusion often does not go beyond what we have discovered through the course of this discussion.

Why is textual analytics different? The main reason, and the one I am interested in, is that we can look at a larger sample of data by using computers. We can compare different elements of multiple texts, allowing us to understand them quantitatively as well as qualitatively.

For example, my chosen thesis topic is on how the presentation of Sir Gawain changes in Arthurian literature and film. I could just use a close reading approach but this would be largely reliant on my own interpretation of the texts and small samples of data. In other words, qualitative research. If I use digital methods, I can compare the frequencies of adjectives related to Sir Gawain within each text. This will allow me to demonstrate how Sir Gawain’s identity has changed through time, making my research quantitative.
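
To make the adjective comparison a bit more concrete, here is a minimal sketch of how it could look in Python. It is only an illustration under some loud assumptions: the adjective list is a hand-picked placeholder (a real study would more likely use a part-of-speech tagger), "related to Sir Gawain" is approximated as "within a few words of his name", and the function names, window size and sample sentences are all invented for the example.

```python
import re
from collections import Counter

# Assumption: a small hand-picked adjective list stands in for a real
# part-of-speech tagger. These words are placeholders, not research data.
ADJECTIVES = {"courteous", "brave", "noble", "rash", "false", "gentle"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def gawain_adjective_counts(text, name="gawain", window=5, adjectives=ADJECTIVES):
    """Count listed adjectives that occur within `window` tokens of `name`.

    Token positions are collected in a set first, so an adjective sitting
    between two nearby mentions of the name is only counted once.
    """
    tokens = tokenize(text)
    near = set()
    for i, token in enumerate(tokens):
        if token == name:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            near.update(range(start, end))
    return Counter(tokens[j] for j in near if tokens[j] in adjectives)


def per_thousand(counts, text):
    """Convert raw counts to a rate per 1,000 tokens so that texts of
    different lengths can be compared."""
    total = len(tokenize(text)) or 1
    return {word: 1000 * n / total for word, n in counts.items()}


if __name__ == "__main__":
    # Invented sentences, purely to show the shape of the comparison.
    texts = {
        "text_a": "The courteous Gawain rode out. Brave Gawain was courteous and noble.",
        "text_b": "Gawain was rash, and some called Gawain false.",
    }
    for title, text in texts.items():
        counts = gawain_adjective_counts(text, window=3)
        print(title, per_thousand(counts, text))
```

Normalising to a rate per 1,000 words matters because a long romance and a short film script would otherwise not be comparable. The fixed word window is a crude stand-in for "describes Gawain", so the numbers would still need a close reading to check what they actually capture.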

Why add this quantitative element to humanities research? By adding an analytical aspect to research, humanities researchers can prove their hypotheses instead of relying on their own intuition and interpretation.