CSE 703 - Applied Natural Language Processing and Computational Social Science

Details

Overview

This seminar course will focus on giving students a broad understanding of state-of-the-art methods in NLP and how they can be applied to address questions in the social sciences and/or humanities. Topics will also include other relevant areas of computational social science, broadly construed, including research on ethics, fairness, and power in applications of machine learning.

Schedule

Week Date Theme Papers Slack Channel
1 August 30 Introduction - NLP&CSS   #intro
2 September 6 Examples 1 - Rajadesingan et al. (2019). Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles
- Antoniak et al. (2019). Narrative Paths and Negotiation of Power in Birth Stories
#css_examples_1
3 September 13 Examples 2 - Green et al. (2022). Online engagement with 2020 election misinformation and turnout in the 2021 Georgia runoff election
- Card et al. (2022). Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration
#css_examples_2
4 September 20 Data Feminism - D’Ignazio and Klein (2020) Data Feminism: Intro
- Suresh et al. (2022) Towards Intersectional Feminist and Participatory ML: A Case Study in Supporting Feminicide Counterdata Collection
#data_feminism
5 September 27 Data Collection - Ernala et al. (2019) Methodological gaps in predicting mental health states from social media: Triangulating diagnostic signals
- Bozarth and Budak (2022) Keyword expansion techniques for mining social movement data on social media
#data_collection
6 October 4 Class Cancelled    
7 October 11 Data Quality - Geiger et al. (2020). Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?
- Bradley et al. (2022). Unrepresentative big surveys significantly overestimated US vaccine uptake
#data_quality
8 October 18 Modern NLP 1 - Smith. (2019). Contextual Word Representations: A Contextual Introduction
- Levy et al. (2015). Improving distributional similarity with lessons learned from word embeddings
#modernnlp_1
9 October 25 Modern NLP 2 - Devlin et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Reimers and Gurevych (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
#modernnlp_2
10 November 1 Stance - Sen et al. (2020). On the Reliability and Validity of Detecting Approval of Political Actors in Tweets
- Rashed et al. (2021). Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey
#stance
11 November 8 Bias in NLP 2 - Blodgett et al. (2020). Language (Technology) is Power: A Critical Survey of “Bias” in NLP
- De-Arteaga et al. (2019). Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
#bias2
12 November 15 Bias - Obermeyer et al. (2019). Dissecting racial bias in an algorithm used to manage the health of populations
- Stapleton et al. (2022). Imagining new futures beyond predictive systems in child welfare: A qualitative study with impacted stakeholders
#big_bias
13 November 22 Syncing w/ Qualitative Research - Eads et al. (2020). Separating the wheat from the chaff: A topic- and keyword-based procedure for identifying research-relevant text.
- Patton et al. (2020). Contextual Analysis of Social Media: The Promise and Challenge of Eliciting Context in Social Media Posts with Natural Language Processing
#qualitative
14 November 29 TBD (class decision)    
15 December 6 Class Recap and Activity    

Prerequisites

It is assumed that students have some background in machine learning. Also, although not required, this course will likely be most interesting to those with an interest in the social sciences and/or humanities.

Course Credits

You can take this seminar for 1, 2, or 3 credits. For one credit, your responsibility is to read papers and write critiques on the course Slack channel. For two credits, you will also need to present a paper to the class. For three credits, you will also organize an activity for the class, constructing a Jupyter notebook that you lead us through using an interesting method relevant to the course.

Grading

The class will be (and is required to be) graded pass/fail. There are three course components: readings, presentations, and activities.

All students must adhere to the attendance policy as well.

Attendance Policy

Missing more than three scheduled course meetings will result in a failing grade. A student who misses more than three scheduled meetings can still pass only if all but one absence is documented with the instructor and is due to one of the following reasons (from UB’s official policy):

Consistently being more than 5 minutes late to class will result in me counting you as absent. Specifically, every three times that you are more than 5 minutes late will be counted as a missed class. There are no exceptions to this policy.

Please note that I do not need a reason for missing class if you miss three or fewer. That is, as long as you don’t miss more than three classes, you can be absent for any reason. Note, however, that you will not receive credit for attendance on those days unless documentation is provided for one of the reasons above.
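The absence and lateness arithmetic in this policy can be sketched in a few lines of Python. This is only an informal illustration, not an official grading tool: the function and constant names are hypothetical, and the documented-absence exception is deliberately not modeled.

```python
# Informal sketch of the attendance arithmetic; all names are hypothetical.
MAX_ABSENCES = 3        # missing more than this many meetings fails the course
LATES_PER_ABSENCE = 3   # every three late arrivals (>5 minutes) count as one missed class


def effective_absences(missed_meetings: int, late_arrivals: int) -> int:
    """Missed meetings plus the absences accrued from late arrivals."""
    return missed_meetings + late_arrivals // LATES_PER_ABSENCE


def exceeds_limit(missed_meetings: int, late_arrivals: int) -> bool:
    """True if the total is over the limit (documented-absence exception not modeled)."""
    return effective_absences(missed_meetings, late_arrivals) > MAX_ABSENCES


if __name__ == "__main__":
    # Two missed meetings and four late arrivals count as three absences.
    print(effective_absences(2, 4))
    print(exceeds_limit(2, 4))
    # Three missed meetings and three late arrivals count as four absences.
    print(exceeds_limit(3, 3))
```

For example, a student with two missed meetings and four late arrivals is still within the limit, while three missed meetings plus three late arrivals is over it.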

Readings

Each week, readings are graded on a pass/fail basis. You must complete all of the following to pass:

There are 13 weeks of reading assignments. You must have a passing grade for at least 10 weeks. Note then that if you don’t plan on attending, you shouldn’t feel obligated to do the readings for a given week, and vice versa.

In addition, if you take the course for more than one credit, you will present a paper to the class. These presentations will be graded on a pass/fail basis, which requires that you satisfactorily complete the following tasks:

For students taking the course for three credits, a final pass/fail component of the course is the partner-planned activity; see the Activity section.

Reading Responses on Slack

All responses to readings must be posted to Slack by Sunday at 11:59PM, in the Slack channel for that week’s readings. Responses submitted after this time or to a different channel will not be accepted.

Responses to readings should have two paragraphs. Before these two paragraphs, state the paper you are responding to in bold. See the #intro channel for example responses. The first paragraph might address some of the following questions, but you should not feel obligated to answer all of them, or only them:

The second paragraph might consider some of the following questions:

This second paragraph will serve as the basis for discussion during the class.

Comments on Responses on Slack

All comments must be posted to Slack by Monday at 11:59PM, as a reply to someone else’s response or comment in the Slack channel for that week’s readings. Comments submitted after this time or to a different channel will not be accepted.

Further, all comments should be civil. Verbal abuse or intimidation, attempts to make others “feel dumb”, or comments that are offensive in any way will not be tolerated. If you believe your comment might fall into this space, please send me an email before posting it so we can discuss this. Any violation of this practice will be treated as a violation of the university’s policy for Academic Integrity.

Class participation

Students are expected to actively participate in discussion of all papers for which they give a response. Students should expect to be called upon to discuss the responses they give on Slack during class and to address other potential discussion topics.

Paper presentations

Paper presentations are expected to be 12 minutes long (give or take a minute or two). Slides are not due beforehand, but I am happy to provide feedback if the slides are sent before Tuesday. I will also ask that slides are sent to me after the presentation in PDF format in order to post them to the website.

The presentation should cover the following, although you may choose to spend more or less time on particular parts (e.g., you may want to spend a lot of time on the methodology if it’s a methods paper, or on the interesting results if it is an application paper):

Additionally, the first slide should contain:

Activity

Students taking the course for three credits will, in teams of two, organize an activity for the rest of the class at some point during the semester. The activity should take roughly 10 minutes to explain, and engage your classmates for roughly 20-30 minutes as they work through your activity.

Your idea for an activity must be approved by Kenny before you conduct it. The activity will likely take the form of your team developing a notebook (either in Colab or distributed to the class) that leverages a technique relevant to the seminar and shows how it can be applied in some interesting fashion. Potential tools that might be demoed during an activity include:

However, I am wide open to ideas, so we can discuss!

In-progress list of miscellaneous related resources

Books:

Resource Lists: