How Linguistics Can Shed Light on Citizen Science Communities| Webinars

This edition of the ��'s Interdisciplinary Webinar Series sees Leïla Choukroune, Professor of International Law and Director of the �� Thematic Area in Democratic Citizenship, host a presentation by Claudia Viggiano, Research Assistant, Democratic Citizenship Research Theme.

Citizen science is a rising form of crowdsourced research that harvests the contributions of volunteers for the advancement of science, especially with regards to collecting or classifying large bodies of data. While extensive, research on citizen science has mainly focused on quantitative metrics such as success and motivations, rather than on the language used on the virtual platforms that host CS projects.

This presentation uses corpus linguistics to analyse the CS community, exploring the linguistic devices and communication strategies used by members to signal their participation, expertise, and membership within the community. By analysing linguistic interactions, this work argues in favour of integrating approaches from linguistics into the wider study of CS communities, which can help gain better understanding of issues such as motivation, engagement, success, learning and access to science in CS, while also informing the design and maintenance of CS platforms in general.

Speaker’s Bio

Claudia Viggiano has just passed her PhD viva in Corpus Linguistics, is a part-time Lecturer in Italian, and a Research Assistant with the Democratic Citizenship research theme. Her research focuses on online citizen science communities and specifically on the ways users create and employ language to self-identify as members of a community.

Research Futures: How Linguistics can shed light on citizen science communities

A very good afternoon, everyone, a very warm welcome to yet another fascinating edition of our Research Futures series.

Delighted today to welcome Claudia Viggiano.

I was going to say almost Dr. Claudia Viggiano, because Claudia has recently defended her thesis and I know it's been quite successful.

I'm particularly happy because Claudia has been my research assistant.

As you may know, I'm Leila Choukroune.

I'm Professor of International Law and Director of the Democratic Citizenship Theme.

And for the past almost three years, actually, Claudia has been working with me.

Great support.

And you may know as well that in the context of our research futures webinars, we like to give the floor to what I call young scholarship.

So that's exactly what we're going to do today to reflect upon this question of how linguistics can shed light on citizen science communities.

Really interesting, because I always find that beyond language, there's some sort of jargon.

You have to adopt a particular type of language, if not a jargon, to become a member of a tribe, to belong.

So I don't know whether Claudia is going to talk about that, but I'm quite excited and really delighted to give you the floor, Claudia.

Thank you very much, Leila.

I am going to start by sharing my screen, so I hope you can see it.

Yes.

OK.

All right, so the title of my talk is "Democratising science: how linguistics can shed light on citizen science communities".

So this talk focuses on my PhD work, but mainly it is just sort of an extension of my PhD work.

And some of it comes from some of the reflections that eventually came out of my work specifically.

This is something I struggled with in my PhD, because a lot of research on citizen science communities does not include any linguistic knowledge or insight, which is very curious either way.

And so this presentation is sort of about that and about how linguistics can actually help other disciplines study citizen science communities.

So I will start by talking about citizen science.

I'm not sure if everyone is aware but citizen science is defined as the online collaboration between scientists and members of the public who volunteer to take part in research.

In a way, this is a sort of crowdsourced research taking place on online platforms that usually host a number of different projects where volunteers, people who are interested in science or people who want to learn about science or people who have a lot of free time, volunteer to take part in specific scientific projects that usually they can choose from.

So usually websites that carry citizen science projects will have a wide range of subjects.

So they will include astronomy projects, zoology projects, annotation projects, or even looking at old documents or analysing fossils.

So there are many, many out there.

What usually happens on the citizen science platforms is that volunteers will go through a training phase, obviously online, and after that they will take part in a task.

So tasks will usually entail some data collection or, more likely and more often, classification or annotation of data, sometimes even transcription of data.

But most of the time, and especially with the community that I'm looking at, it is more about classifying data, so they learn how to classify them and then they do the tasks, basically, which helps researchers complete their own projects quickly and at a larger scale.

Each citizen science platform and each citizen science project will use a discussion forum where users can ask for help, discuss findings or chat and get to know each other.

So usually on these platforms, you'll see both volunteers and staff members, team members and researchers themselves.

Zooniverse is the case study that I'm talking about today.

So it was the citizen science community and platform that I focussed on for my research.

It is an umbrella website, just like many other citizen science websites.

It is just a container of other projects.

Researchers have to apply to them to have their projects on the platform.

Zooniverse currently has over 80 active projects in different scientific domains, and Zooniverse also has its own online discussion forums divided into their own single projects, which is what they call the talk section.

This image is from Zooniverse, which they call people powered research.

So it's important to acknowledge the goals of citizen science platforms before I go into, I guess, my problem and my issue that I want to sort of attempt to solve today.

There are two main goals that are identified in my research.

The primary goal is obviously maximising the number of classifications.

So obviously it's a pragmatic goal.

Team members from each of the projects, researchers from the projects, will obviously want volunteers to complete as many classifications as possible.

So they will want to have them engaged, and they will want them to complete accurate classifications as well.

So making sure that the training is successful.

Then the secondary goal of citizen science platforms is that of learning and eventually democratisation of science.

This obviously comes into a wider movement, which is open science and the belief in open science and having science accessible to many and papers accessible to as many people as possible.

So learning obviously eventually leads to the primary goal.

So the secondary goal leads to the primary goal.

It feeds into it because obviously the more knowledgeable users will be, the more classifications and the more accurate classifications they will be likely to complete.

Users and moderators on these communities understand that creating an informed and engaged, goal oriented community is key to achieving both the primary and secondary goal.

So this is where my issue comes in, and what I mostly want to be talking about today.

Research on citizen science has mainly focussed on quantitative metrics, but little effort has been put into language and how the study of language can help explore these quantitative metrics in other disciplines and how it can be integrated into other research.

Most of what I found in the literature, most research in my literature has focussed on these four points.

The first is success of citizen science, communities and projects, motivations behind participating in these communities, engagement and retention and participation into these communities, and also learning outcomes of participation.

So I will be going through each of these four points with insights from my study and from my data into how we can sort of integrate the study of these four, I suppose, research interests, and how linguistics can help with them.

So first I will be focussing.

I will be talking about my project.

My project is a PhD project that uses corpus methods to define and describe the language of a citizen science community, specifically the Zooniverse community.

So obviously, corpus methods: I have to explain that the plural of corpus is corpora.

So if that is confusing, that's why.

A corpus is a large collection of naturally occurring texts that are stored electronically and are searchable through software, software that is made specifically for corpus linguistics.

And obviously, in this case, I collected this data and I made it into a corpus and I stored it and searched it with the software.

So it is derived from that.

Corpus linguistics is a quantitative, sort of big data approach to analysing language. It does not only need to be quantitative, but it gives us the opportunity to look at patterns in large bodies of text that you wouldn't be able to analyse manually.

And after that, you can obviously close up and zoom into your findings to look at actual qualitative data, we may say, and actually analyse the single instances.

But corpus linguistics helps us with that.

It helps us with finding patterns that we wouldn't think maybe would be feasible if we were looking at six million words of text.

So my data set, the Zooniverse dataset is a six million word corpus that includes metadata.

That metadata is the data that comes besides the linguistic data.

So it will be information about who is the user that posted the message, when it was posted, where it was posted, because Zooniverse has many projects inside it, et cetera, et cetera.

So that dataset is made up of 43 Zooniverse projects and it was collected between December of 2010 and April 2016.

Sorry it wasn't collected then.

It collects linguistic data that started in December of 2010.

So the first messages on these online boards were in 2010 and the last ones that we collected were in 2016.

And the tools that I'm using: Sketch Engine is the name of the software that I use.

There are many for corpus analysis, but this is the one that I use mostly.

And most of the tools that I will be talking about today are keyword and term analysis and concordance lines.

But I use the other ones in my data.

So to explain what keywords are, and terms as well.

A keyword is defined as a word that is more frequent in a text or corpus under study than it is in some larger reference corpus, where the difference in frequency is statistically significant.

Now this means that what your software does is that it compares two corpora, usually your corpus and a more general, very large one.

And it sort of highlights in your corpus what's more unique about it.

So usually this will give you insights into what your corpus is about.
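As an illustration of the comparison just described, here is a minimal sketch of a keyness calculation. It uses the log-likelihood statistic, which is one common choice in corpus linguistics and is an assumption here: the talk does not say which statistic was used, and Sketch Engine has its own scoring. The function names `log_likelihood` and `keywords` are hypothetical, and the threshold of 3.84 is the usual critical value for p < 0.05 with one degree of freedom.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word that occurs a times in a focus
    corpus of c tokens and b times in a reference corpus of d tokens."""
    expected_focus = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_focus)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def keywords(focus_tokens, reference_tokens, threshold=3.84):
    """Return (word, score) pairs for words that are relatively more
    frequent in the focus corpus, sorted from most to least key."""
    focus = Counter(focus_tokens)
    reference = Counter(reference_tokens)
    c, d = len(focus_tokens), len(reference_tokens)
    results = []
    for word, a in focus.items():
        b = reference.get(word, 0)
        # Skip words that are not over-represented in the focus corpus.
        if a / c <= b / d:
            continue
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            results.append((word, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Keeping only the top of the sorted list, for example `keywords(focus, reference)[:250]`, would correspond to working with a fixed number of top items.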

Of course, you will see on the right hand side that some of these are more scientific leaning lexical items.

Some are related to animals, species; some are new to this community and quite unique to it, like Zooite, and also I suppose Zooniverse is one of them.

There are some community specific acronyms like GZ, RGZ, et cetera, et cetera.

And keywords is the name for single word items, and terms is the name for the more than one word, basically multi word units.

This example on the right hand side is just for keywords.

So as I said, keywords and terms I used to extract key lexical items and semantic areas.

And I kept the first two hundred and fifty items of the list.

Then the other thing that I will show you today, concordance lines.

Concordance lines we can consider as the more qualitative side of corpus analysis, because you can actually zoom into a word and see a bunch of text where that word or those words occur and see what the environment is like.

Yeah.

So they are useful to see the word in context, in the context in which it occurs in this specific corpus.

So this is an example; this is more of a random example from Zooniverse.
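The slide itself is not reproduced in this transcript, so here is a minimal sketch of how concordance lines (keyword in context) can be produced. The function name `concordance` and the whitespace tokenisation are assumptions for illustration; dedicated corpus software does considerably more.

```python
def concordance(tokens, node, window=5):
    """Return (left context, node word, right context) for every
    occurrence of `node` in `tokens`, matching case-insensitively."""
    node = node.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    # Toy text built from phrases quoted in the talk.
    text = ("Hi and welcome to the Zoo . The blue object to the left "
            "is a foreground star from our galaxy . Happy hunting")
    for left, node, right in concordance(text.split(), "zoo", window=4):
        print(f"{left:>30} | {node} | {right}")
```

Aligning the node word in a central column, as the print statement does, is what makes recurring patterns to the left and right easy to spot.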

OK, so I'm not going to focus too much on this, but when I was going through keywords, it was very helpful to look at the list of keywords, the top two hundred and fifty, and the top two hundred and fifty terms.

So multi word units, and sort of tag them, label them by semantic area.

I did this manually because this corpus is so specialised in a way that this needed to be done manually, as I think.

And the concordance lines that I just showed you actually helped in tagging this data so I could actually ascertain what semantic area each keyword corresponded to.

So I had seven resulting categories that helped me with the next stage of my research.

So Zoon is the first category, which stands for Zooniverse.

So these are the keywords and terms, also multi word units, that were more unique and more peculiar to this specific community.

So it includes expressions that are unique, coined by users, that represent the identity of the community, but also specific references to specific projects.

So, for example, people were talking about Galaxy Zoo, which is one of the projects, and that's how I tagged it.

The others are a bit more straightforward.

So the others the other category is science.

So this is more scientific terminology.

The other is tech, which is more about the community boards and the technical functioning of these boards.

The others are animals.

So animals are obviously very, very frequent in zoology projects.

The others are acronyms.

So obviously acronyms.

Most of them are created by community members or already existing in science.

Then users, so users are actually calling each other or referencing each other.

And the last one was hashtags, which was an interesting finding because users sort of adopted hashtags so that other users could find similar findings in the community.

So that was interesting as well.

And these categories helped me analyse my data more thoroughly and more in depth and to find patterns through them.

So going back to those four points, those four areas that have been studied in citizen science communities, I'm going to go through the first one, which is success.

So many studies have focussed on the success of citizen science projects and communities.

And success is often measured in a number of publications that come out of specific projects, which obviously leads to more visibility for the projects in the media, but also by a number of volunteer contributions.

So these are very quantitative metrics.

The most extensive study on the success of citizen science communities is actually from Cox et al., �� University.

But it's with other people across many universities and in the UK and with Zooniverse researchers as well.

They devised a success matrix for Zooniverse projects that included measures of contribution to science and public engagement with relevant sub criteria.

Their main findings were that scientific impact and public engagement positively correlated, and also that astronomy projects consistently scored the highest in the matrix.

This second point sort of makes sense because Zooniverse was born out of one astronomy project and it became an umbrella website after that.

So it was born as Galaxy Zoo, which is looking at pictures of galaxies and classifying pictures of galaxies, and then after that they saw how successful it was and they turned it into an umbrella website that hosted more astronomy projects, but also other projects.

So it sort of makes sense that people who are interested in astronomy will go straight to this one, to this specific project and then maybe to this specific website, sorry, and then maybe later get into some of the other projects.

So how can we explore success with linguistic analysis? So the first thing that we can do, obviously, from a more superficial point of view and a more quantitative point of view, is that we can look at success in terms of how big each project's subcorpus is.

So how much do people talk in each of the projects? So you can see on the right hand side that I have corpora, subcorpora really, for each of the major projects in Zooniverse.

So Galaxy Zoo is the most active one with 900,000 words basically.

And you can imagine how many interactions I saw.

And then the second one is Planet Four.

And we have also Radio Galaxy Zoo. These first three, sorry, Galaxy Zoo, Planet Four and Radio Galaxy Zoo, are astronomy projects.

The other ones are Chimp & See and WildCam Gorongosa, zoology projects, and Fossil Finder is about fossils.

So in a way these sort of align, at least those three, Galaxy Zoo, Planet Four and Radio Galaxy Zoo: in terms of how much people talk and interact in these communities, yes, they are more successful.

It suggests that they can be successful, at least if we want to look at success from an engagement in the community point of view.

These are amongst the most successful.

But we also see some other ones are, you know, quite close to them, though I think Galaxy Zoo is, you know, a lot higher than the rest.

You can see it has almost five hundred thousand words more than the second most active one.
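The size comparison described here can be sketched as a simple word count per project. This is a rough illustration only: the record layout (a `project` field and a `text` field per post) is a hypothetical stand-in for the corpus metadata, and counting whitespace-separated words is a simplification of how corpus software counts tokens.

```python
from collections import Counter


def subcorpus_sizes(posts):
    """Count words per project and return (project, word count) pairs,
    largest subcorpus first."""
    sizes = Counter()
    for post in posts:
        sizes[post["project"]] += len(post["text"].split())
    return sizes.most_common()


if __name__ == "__main__":
    # Toy posts built from phrases quoted in the talk, not real corpus data.
    toy_posts = [
        {"project": "Galaxy Zoo", "text": "Hi and welcome to the Zoo"},
        {"project": "Galaxy Zoo", "text": "Happy hunting"},
        {"project": "Planet Four", "text": "Nice catch"},
    ]
    for project, words in subcorpus_sizes(toy_posts):
        print(project, words)
```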

So linguistic analysis can also analyse success in top projects in other ways.

So through my analysis, I'm not going to go too deep into this.

But I found that there was a correlation or at least a link between top projects and how tight their communities were.

So how much people interacted and how much people were friendly to each other, how much unique, how unique sorry their project specific lexicon was.

So how tight was their linguistic community basically.

How strong the ties between users were.

And how high the collaboration patterns and knowledge exchange were.

So basically this sort of relates to how supportive the community is and how collaborative it is.

So this suggests from my data that the more collaborative, the more supportive, the more question and answer exchanges there are in the community, the more successful it will be.

So in a way, it does align with some of the findings in the Cox et al. study.

You can see some of the concordances from Galaxy Zoo.

So these, again, are taken straight out of exchanges.

People will say one of the really great things about being involved in Galaxy Zoo is a community of amazing citizen science experts we get to meet.

So people are valuing this opportunity.

Hi and welcome to the Zoo.

"The Zoo" is a word for the Galaxy Zoo.

You can see how tight the lexicon is here.

The orange smudge in the bottom right corner is an artefact.

So this is an exchange after obviously a question, there is an answer explaining what it is.

So there is a collaboration and knowledge exchange pattern here.

And the last one.

Hi, and welcome to the zoo again.

So this is sort of an established word for the Galaxy Zoo, for their own community.

The blue object to the left is a foreground star from our galaxy.

Happy hunting.

"Happy hunting" is something that I found in my data as project specific lexicon, a project specific expression, where one specific user, but sometimes others adopt it as well, will sort of close the interaction with "happy hunting". It is a way of saying goodbye, but also of encouraging the other person to keep hunting for galaxies, basically, in the data that they're classifying in the tasks.

So it's a metaphor, really, for hunting in the zoo, hunting in the Galaxy Zoo.

So, yeah, you can see some of these other elements and how linguistic, well, obviously, this is still in theory in a way, because I have not worked with other researchers.

But potentially linguistic analysis can be integrated into these kinds of studies on success, and it can help to evaluate and assess success from a linguistic point of view, from a community exchange and interaction point of view.

OK, the second point and the second area that is explored very often in citizen science is motivation.

So why do users participate in citizen science? So these reasons often include contribution to science, interest in science, learning, teaching resources (so there might be some teachers on the platforms who want to find some resources for their pupils), having fun, and community and getting to know other people.

However, these motivations are often elicited through surveys.

But linguistic analysis can also help explore these patterns in a more natural way, in the context of real exchanges, without having to elicit them.

So we can observe this in how people just talk to each other on these forums.

Yeah, so this is often found.

These patterns and motivations are often found in introductory threads where people have just joined and they want to say hi to the community and what they're interested in.

"Hi, I'm [Redacted] [I redacted all the names] with a huge interest in genealogy and the First World War, so this is absolutely brilliant.

I get to read fascinating war diaries and input into a really worthwhile project.

I love it already." So you can see they're basically explaining their motivations for joining this project specifically.

This is a project called Operation War Diaries, and it's about basically transcribing and interpreting old copies of war diaries and digitising them.

The second example, "I've been interested in astronomy since I was a very young child.

These projects allow me to reconnect with astronomy and also let me participate in a small way in the pursuit of knowledge." So pursuit of knowledge is a motivation.

Or three, "Thank you for letting us help you with important research like this and others in the Zooniverse. In a world today, it's nice to look into Chimp & See [one of the zoology projects, sorry] and beautiful moments untouched by the horrors that we have to live with." So this one is a bit more dramatic.

But, yes, obviously this person is saying that it's a distraction for them.

It's for fun. And four, "I've had a lifelong interest in science, in particular astronomy.

I hope my contributions can help advance our knowledge of our amazing solar system." So this person, of course, is participating to contribute to science.

And for interest in science as well.

So that was how linguistics can look into motivations without actually eliciting answers from people, without having to use surveys or interviews, we can just do that by looking at the data.

The third point is a point that I developed quite a bit more in my PhD, and it's engagement and retention in citizen science communities.

So I will start with the research that's been done on it.

So research on engagement has mainly focussed on task completion and occasionally on interaction patterns.

For example, Luczak-Roesch et al. and Tinati et al. explored participation and found a correlation between the number of tasks completed and presence in the community parts of the project, concluding in one case that most of the tasks are completed by a small number of users who constitute what they call core participants.

This is similar to what I found as well, and I will highlight it later. Tinati et al. also argue that communicating with other users, whether for social support or general discussion, is important for user retention.

So while my focus was not on task completion, their findings on core participants align with mine, and I focus a bit more on interaction patterns.

So how can linguistic analysis help us explore engagement and retention in citizen science communities?

So to explore engagement and communication patterns in Zooniverse, it's important to understand the nature of Zooniverse as a task based and goal oriented community.

So this is not a community like Facebook or Twitter.

People will join because they have a goal in mind.

And because they have that goal in mind, they will usually take on specific roles in the community, and obviously their communication will be dictated by their goals and their roles as well.

So Zooniverse and similar types of citizen science communities, there are some other ones as well, more like games.

But these ones are examples.

I argue in my thesis that these are examples of communities of enquiry.

So communities of enquiry is a framework that's used for e-learning communities.

So communities for virtual learning, basically, where users ask for help and basically interact because they have a specific goal, which is to learn.

Yeah.

Or to pass their exams.

The COI, the Community of Enquiry Framework, explores the relation between social interaction excuse me, and engagement, cognitive development and instruction.

Communities of enquiry can collaborate to construct meaningful and worthwhile knowledge.

And this framework, the COI framework, is divided into what they call three dimensions that account for engagement, which is the first dimension, social presence; for learning, which is cognitive presence; and for teaching and moderation, which is teaching presence.

So out of the three dimensions, I specifically focussed on social presence, because the dimension of social presence obviously is more interesting to test engagement, to account for engagement, but also because, as you will see, the social presence framework has a linguistic grounding.

So you can see the social presence framework. Social presence is defined, or I define it, as the continued engagement and participation of users who wish to be seen as integral to the community, but with their own distinct personalities.

So.

The social presence framework helps us assess engagement, community and identity signals through language use.

I will go a bit deeper into these three sub dimensions.

So social presence is a dimension of itself, but there are three types of responses in this framework.

So these obviously will depend on what types of interactions people have.

But usually these types of responses are meant to build engagement and build community and identity as part of that community.

So we have affective responses, then cohesive responses and then interactive responses.

So I will skip to the next slide because you will see them there as well, with examples.

So, affective responses in this framework.

You can see that affective responses include the show of emotions, affiliation, self disclosure, as well as the use of humour and emoticons in the community. So you can see how we can actually carry out linguistic analysis and test whether social presence is present in the community, whether users are socially present.

And you can see how these responses are affective: "Hooray, awesome, well done" will also show emotions and use of emoticons.

"Hi, thanks for sharing.

It is a beauty.

Happy hunting" again, which is again basically closing the interaction and encouraging the other user to continue to get involved.

"Thanks for sharing" and "It is a beauty" are specifically a show of emotions and appreciation for what the other person has done.

And then "Wow you got yourself a gorgeous merger"; merger there is a technical term for a galaxy.

"Nice catch", "Happy hunting": so you can see the show of emotions is used here quite tactically, to encourage other users to get involved and to continue to get involved and to show appreciation for what they've done.

Then we have cohesive responses. These are meant to build and maintain commitment.

Cohesive descriptors include salutations, vocatives, inclusive pronouns and phatics, which are defined as social pleasantries that enforce politeness and friendliness.

So like welcome, etc.

So we have "The sooner we zooites tag, such images as poor quality image (or similar) the better." So this person is using the collective, the inclusive "we" to say, OK, we should all move and sort of collaborate by doing this so that we can actually improve the community and how the community works.

And then in the second one, "Hi.

Great to have you on board" So you have a social pleasantry there and also a vocative.

"Hi, REDACTED" will be the name of the person.

So this is another way of engaging directly with the person they're answering to.

Yeah.

Have fun and happy hunting.

So that's also within that same realm.

And again, "Hi and welcome to Asteroid Zoo".

"Hi" obviously is a salutation and a vocative, because I redacted the name, but also "welcome" is a social pleasantry.

So you can see again how these exchanges help to build a sense of community and help to make newcomers feel welcome and engaged.

So the last one was interactive responses.

We're still within the social presence framework.

These include asking questions, continuing threads, referring to or directly quoting other users' comments, and forms of reinforcement, including praising, agreeing with and acknowledging other people's ideas and contributions.

So you see this.

The first one is a question.

"What is this, a fuzzy galaxy?" The second one, "Thanks for sharing, it is a beauty", is also reinforcement and praising.

And the third one includes, I agree in this case, so acknowledging and agreeing with other people's contributions.

So you can see how these align.

But there are other engagement patterns in Zooniverse as well.

So you will see evaluative and supportive language as well, which is sort of linked to what I've been saying.

But others are, and I sort of mentioned this as well, creative language use or lexical innovations and repurposed lexicon that become very unique and very tight with the community.

So we saw happy hunting and welcome to the zoo.

But there is also selfie here, in example number two, which is basically a repurposed word; obviously it doesn't mean selfie in this case.

It's just the way that people use in that community to refer to animals approaching the research cameras in the wild.

So obviously, the cameras take a selfie.

There are meta discussions with people suggesting how the community can be improved and how the platform can be improved.

Other engagement patterns include specific signatures, like this person saying happy hunting, for example, or people signing with their name at the end, and acronym and hashtag use, because obviously with acronyms, you need to be an insider to the community to understand what it means.

And users also identifying as nonexperts is a way of engaging with other users, because nobody wants to come off as, like, a big expert.

They will always try to hedge their responses to obviously not lose face, but also to try and sort of all be on the same level and appear on the same level.

So these patterns help to build an engaged and informed community and are consistently used by core participants.

And this really ties into that study that I mentioned earlier, where they found that there's a small group of core participants who are actually the ones who complete the most tasks and are the most active.

So the last point, I know that I've been talking for a while, was about learning.

So learning is another thing that has been studied in citizen science communities.

And it's been often measured through surveys, interviews and quizzes.

However, this can be done in other ways, including with linguistic analysis. The first and, I guess, more basic one is just looking at how much of a question-and-answer dynamic there is.

There is so much knowledge exchange and creation in these communities just because people ask questions and other people answer them.

So, "What is this, a fuzzy galaxy?" We saw that earlier.

"I was wondering if this looks like a good candidate."

"Well spotted.

It looks like a possible candidate to me, too."

And then it follows with a further explanation, which is really helpful, because you can see that people are collaborating in these communities and they're trying to inform others and exchange knowledge.

However, this is not the only way we can explore learning with corpus linguistic tools.

In my work, I actually carried out a diachronic analysis, which means that I looked at those keywords, if you remember, the top 250 from earlier, and I analysed them over time.

I did this by splitting my corpus into four-month chunks and then tracking usage over those four-month chunks.

I tagged the usage, the meaning and trends across projects. I found that keywords and key terms in the Zooniverse community, especially project-specific terminology or expressions, often peak when the specific project that they belong to is first introduced, but then the normalised frequencies of these occurrences drop in the subsequent quarters, or four-month chunks.

Since the use of this terminology often occurs in the context of questions and answers (again, knowledge exchange), this drop in frequency suggests that the answers received managed to successfully produce learning, which explains why fewer and fewer users ask questions such as "what is this bright object?" or "what are these green lines?" over time.

So you will see that there are, again, some words, some expressions that belong to a specific project that will tend to peak when the project starts, when the project is born, because obviously people are less knowledgeable about it and more people will ask, "What is this?" Then they receive the answer.

And over time you will see that that thing is not questioned anymore.

It is questioned very few times, and that's possibly because there are new people joining constantly.

But it tends to drop, and that sort of suggests learning.

I haven't looked too much into this, but it would be interesting to potentially work with another person from another field who can help me track this learning achievement.
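
As a minimal sketch of the diachronic tracking described above, the following Python splits dated posts into four-month chunks and computes a term's normalised frequency per 10,000 tokens in each chunk. The function names, the toy posts, the regex tokeniser and the per-10,000 base are all illustrative assumptions, not the pipeline or data used in the study.

```python
import re
from collections import defaultdict
from datetime import date


def chunk_index(posted_on, start, months_per_chunk=4):
    """Return which four-month chunk (0, 1, 2, ...) a date falls into."""
    months = (posted_on.year - start.year) * 12 + (posted_on.month - start.month)
    return months // months_per_chunk


def normalised_frequencies(posts, term, start, per=10_000, months_per_chunk=4):
    """Occurrences of `term` per `per` tokens, for each chunk of the corpus."""
    hits = defaultdict(int)
    tokens = defaultdict(int)
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    for posted_on, text in posts:
        idx = chunk_index(posted_on, start, months_per_chunk)
        lowered = text.lower()
        tokens[idx] += len(re.findall(r"\w+", lowered))  # crude tokeniser (assumption)
        hits[idx] += len(pattern.findall(lowered))
    return {idx: hits[idx] / tokens[idx] * per for idx in sorted(tokens) if tokens[idx]}


# Hypothetical toy posts, invented for illustration only.
posts = [
    (date(2015, 1, 10), "What are these green lines in the image?"),
    (date(2015, 3, 2), "The green lines are a camera artefact."),
    (date(2015, 6, 20), "Nice galaxy, well spotted."),
    (date(2015, 7, 1), "Green lines again, just an artefact."),
]

freqs = normalised_frequencies(posts, "green lines", start=date(2015, 1, 1))
for idx, value in freqs.items():
    print(f"chunk {idx}: {value:.1f} per 10,000 tokens")
```

On this toy data the normalised frequency is lower in the second chunk than in the first, which is the kind of drop interpreted above as a possible sign of learning.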

So, to conclude on how linguistics can contribute to research in online citizen science communities: linguistics can be incorporated into other methods and disciplines, providing insights from natural language use and interactions.

It can do so by incorporating both quantitative and qualitative approaches. As we saw today, we looked at more quantitative data and we looked at concordance lines.

So, going through the four points that I focussed on, we have success: this can be looked at in terms of corpus sizes, interaction, knowledge exchange, lexical usage and the uniqueness of the lexicon and the vocabulary from a project.

Motivations can be looked at through concordance lines specifically and especially introductions.

Engagement: there are so many linguistic tools that can do that.

And I showed you a few, but specifically I focussed a bit more on the social presence framework today.

And finally, learning can be studied through the question-and-answer dynamic on a more basic level, but also through diachronic trends in lexical usage and technical terminology.
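
Concordance lines, mentioned above for studying motivations, show each occurrence of a search word with a few words of context on either side. A minimal keyword-in-context sketch, assuming a simple regex tokeniser and two made-up introduction posts (neither taken from the Zooniverse corpus), could look like this:

```python
import re


def concordance(texts, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of `keyword`."""
    keyword = keyword.lower()
    results = []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())  # crude tokeniser (assumption)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                results.append((left, token, right))
    return results


# Hypothetical introduction posts, invented for illustration only.
introductions = [
    "Hi all, I joined because I love astronomy.",
    "I joined to help with real science.",
]

lines = concordance(introductions, "joined", width=2)
for left, key, right in lines:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} | {key} | {right}")
```

Reading the right-hand context of a word such as "joined" is one simple way to surface the reasons people give for taking part.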

So finally, interdisciplinary research on citizen science communities needs further exploration and can inform practice, such as the design and maintenance of citizen science communities. Also, linguistic analysis shows how the two goals of citizen science are achieved through language and interactions.

So we saw them at the beginning.

The secondary goal is learning.

So the idea of being able to democratise science and make it accessible to everyone, but also, of course, the primary goal, completing as many tasks as possible, possibly through engagement as well.

Thank you.

Fantastic, thank you very much, Claudia. That was really interesting. I'm quite impressed by your methodology and the rigour, and I see that your supervisor, Dr. Glenn Atkins, is applauding.

So I think everybody found that it was a very interesting presentation, certainly very rigorous, very scientific, and with a lot of interdisciplinarity.

You know, that's something that we love here in Research Futures.

I know that we'll have a number of questions from the audience, but I have a first, sort of very candid question.

You know, I'm not a specialist at all, but I was wondering about these communities: have you observed whether they produce language? Because you've referred to a number of terms that they use.

Do they invent terms? Do they produce a language of their own, with expressions that they're going to use all the time and that are coded somehow?

Yes.

So thank you for the great question, because this was one of my research questions as well.

This was one of my focuses for this work.

I didn't go too deep into this today because obviously I even went over time.

But yes, definitely, this was one of the aspects that I looked at, because my understanding of the Zooniverse from the get-go was that people tended to group around some ideas and concepts, and they tended to want to use words that were created or repurposed in the community, because they felt that mentioning them would be an expression of participation, basically.

So what happens is that you'll have a word like, for example, "Zooite", which is the word that they created for a participant of Zooniverse, so a Zooniverse volunteer.

Obviously, somebody coined that word at some point and people started sort of grouping around it to say, "Oh, I'm going to use this because I identify myself with that."

So it's a way of showing how much of an insider they are.

And this is, of course, really, really interesting, because with this you can easily identify the words that are most unique to this community and basically don't exist outside of it, and see how people use them, how people embrace them, how people reject them and say, "Oh, I actually want to use another one", or how people will sort of change the way they use them and their meaning over time as well.

Interesting. Claudia, we have a question from Olga.

What made you interested in this particular project? Did you take part in Zooniverse before your PhD? Were you interested in science? Because, in a way, it's a bit far from linguistics.

So I was interested in science, but the reason why I worked with this data was because I was actually hired as a research assistant before I started my PhD.

And in the work that we did with this research project, the Language of Citizen Science research project, which Glenn was on as well, we collected this data because each of us had our own sort of angle on what we wanted to study about citizen science.

And so we collected this data and obviously I wanted to do my PhD.

So, you know, I came at a good time as well.

I had the data and I wrote my proposal and it went well.

Yeah, I was interested in science, but I did not know very much about citizen science until I joined, to be fair.

That's super interesting. It says a lot about the merits of becoming a research assistant and building up your career, opening up your mind as well to these new disciplines.

A new question from Joanne.

How do you approach the groups you studied, or citizen science in general, to engage and enhance their interaction with the volunteers? Do you have a future project plan with them?

Thank you, Joanne. That's a really interesting question.

So, in short, no, I have not approached the groups that I studied, but I have spoken to the PI of Zooniverse and the people who have basically worked together with the main Zooniverse team from the ��.

So we met when we were working together on the Language of Citizen Science project.

We met with Jo Cox, who was the main author behind the success study that I mentioned.

And we also met with the PI of Zooniverse, who is based at Oxford University.

So it was good, first of all, to get their approval for using this data.

But also, yeah, it was good to get a better idea and a better perspective on the community that I was looking at.

In terms of future plans, I suppose...

Yes, the PI is Chris Lintott from Oxford University.

Thank you, Glenn.

I kind of want to bring this idea forward of integrating linguistics into other types of studies on citizen science, and I have been put in touch with a professor at the University of Exeter who wants to organise a one-day seminar or a one-day workshop bringing together both language experts, or linguists, and scientists working with citizen science.

So that will be interesting, and I think it will take place in October.

But I am thinking of turning this specific presentation basically into a potential paper.

So I'll also need to figure out how to go on with this.

Surely it's a pretty radical idea.

But a question from me, another one, if you don't mind: can you expand your methodology to other communities? I mean, citizen science is your first database, but do you think that you could expand that to, I don't know, other socially significant movements?

Yes. So this is specifically looking at that social presence framework and the engagement side of things, which, as you probably saw from my presentation, is the one that I developed more, because it was one of my main focuses in the research.

I definitely think that other similar types of communities can be analysed with this method, and also other citizen science communities.

Now, I mentioned before briefly, there are different types of citizen science communities and some of them are like games, and for those ones the context will be a little bit different because there is a lot of competition as well.

Obviously, they're games. I'm not sure exactly how they work because I didn't take part in any of them.

But obviously they carry out classifications and complete tasks in the form of games.

So there is a lot of competition.

People don't tend to collaborate on those communities because obviously they're competing against each other in the game.

So maybe not those, but other citizen science communities, definitely.

I think I could apply the same methodology, pretty much.

And definitely, one of my goals was also to devise this methodology so as to make it applicable to other online communities that are similar in scope to this one.

So this can be applied to other e-learning communities, as the Community of Inquiry framework was, but also to others. I can think of, for example, Reddit forums or other forums where people have a specific task in mind and a specific goal for participating in the community.

So this framework would work for that, but it wouldn't work for Facebook or Twitter, where engagement and interactions are often one-offs and don't really occur with a specific goal in mind. So it would be interesting to look at, for example, I don't know, maybe a Reddit forum for people who want to learn how to do woodworking or something.

I'm just thinking of random examples, but it has to have a task-based and goal-oriented component to it.

That's really interesting.

Maybe the last question, unless someone else has a question: I suppose that you worked on English, the English language.

So what do you think about the possible divide, if any, between native speakers and non-native speakers? Does it play a role? Do you have many foreigners or non-native speakers in this community?

Yes. So I think, by looking at some of the demographic information, most of them are first-language or native speakers of English.

However, most of them are fantastic at speaking English anyway.

However, I think the person who would be best fit to answer this question would be my colleague Mario Saraceni, who is also based at �� University and who actually looks at how people introduce themselves in these communities by saying, "I'm sorry for my English", when their English is perfect.

And that's such an interesting thing because, again, they kind of don't want to lose face.

And in a way, I think it's also connected to my point about users wanting to come across as nonexperts because they don't want to lose face.

And they also kind of don't want to.

Yeah, they just want to appear as equals to others.

Whereas the people for whom English is a second language will often introduce themselves and say, "Sorry for my English", when there's no need, absolutely no need, because then, looking at the answers, everyone's like, "Your English is perfect."

There's no need to say that.

But yes, it's a very interesting phenomenon.

Yeah, he's not here today, but he's written a paper on it, I think.

OK, so another thing to explore, and maybe you can send us the reference for the paper.

We have a last question.

Interesting question from Richard.

Fascinating.

Thank you very much.

Are there thought to be generalisable observations about what promotes success in learning communities, for your undergraduate students who may not fully feel part of an academic community in current times, etc.? What do you think?

I think it was a bit hard for me to sort of identify the type of community that Zooniverse is, because not many other people have done this linguistic analysis on similar communities.

The closest I came to was e-learning communities, which are learning communities.

And my community is kind of aligned with those as well.

By using this framework, I also found similar insights and similar results and findings.

So I would say that generally most e-learning communities recognise the role of central users and how they can increase engagement, and the roles of teachers and moderators as well, because they can definitely push and facilitate communication, and meaningful communication as well.

Obviously, it's a little bit different because moderators don't have exactly the same roles as in these learning communities.

Very often in learning communities, moderators will kind of have to gatekeep conversation in some ways as well, like they have to take the conversation back to the main topic, which is something that's a bit more free in Zooniverse: people can go off topic, basically.

But going back to the question, my answer is yeah.

I think other research on e-learning communities has found similar results: basically, engagement is super, super important in making sure that users have easy access to learning resources, know who they can ask for help and how they can catch up if they've had issues or fallen behind, and know that it's an open and collaborative community where they shouldn't be afraid to ask questions, basically.

Well, thank you so much, Claudia.

Yeah, that was fascinating.

It's a really interesting topic.

It seems extremely technical, but you've managed to present it in a really accessible manner.

So you know what I'm going to say? It requires a lot of work and it flowed very nicely.

So thank you so much, Claudia. That was really very interesting.

So as you know, everybody, the presentation is recorded and is going to be online under the banner of Research Futures.

So you can see that again.

And I'm sure Claudia will be happy to send you some links and references if you need.

I'd like to conclude by thanking the team, Olga, Gloria, Barnaby, and naturally He and Claudia, for their support, and I'll see you next week for another webinar.

Thanks very much.

Thank you so, so much for joining us and for your questions.

Thank you.

Bye bye.
