UT Austin LIN 313: Language and Computers
Fall 2023
Instructor: Prof. Kyle Mahowald (he/him)
Meeting times: Tues./Thurs. 12:30pm-2:00pm
Location: JGB 2.202
Office hours: Thurs. 2pm-3pm
TA: Will Sheffield
Will’s office hours: Wed. 3-4pm; Thurs. 10am-12pm
Summary
In the past decades, the widening use of computers has had a profound influence on the way we communicate, search and store information. For the overwhelming majority of people and situations, the natural vehicle for such information is natural language. Whenever you use spellcheck or Grammarly, talk to Siri, do a Google search, interact with a robot voice on the phone, or are fed targeted ads, you are interacting with natural language technology.
Remarkably, even the language technology of 5 years ago is now far out of date, when you consider modern models like ChatGPT that process language in amazing ways. A major reason for this is the machine learning and big data revolutions, which have together made it possible for machines to learn using huge amounts of data.
Our goal will be to gain enough of an understanding of language technology to understand what these models are doing, what their capabilities are, how they work, and what their potential pitfalls and dangers are. In this course, you will be given insight into the fundamentals of how computers are used to represent, process and organize textual and spoken information, as well as tips on how to effectively integrate this knowledge into working practice. We will cover the theory and practice of human language technology. Topics include text encoding, search technology, tools for writing support, machine translation, dialog systems, computer aided language learning and the social context of language technology.
This course uses natural language systems to motivate students to exercise and develop a range of basic skills in formal and computational analysis. The course philosophy is to ground abstract concepts in real-world examples. We introduce strings and regular expressions, algorithms defined over these structures, and techniques for probing and evaluating systems that rely on these algorithms. The course goes beyond merely subjective evaluation of systems, emphasizing analysis and reasoning to draw and argue for valid conclusions about the design, capabilities and behavior of natural language systems.
Evaluation will be based on the midterm, homeworks, participation, discussion questions, and the essay.
Quantitative Reasoning
This course carries the Quantitative Reasoning flag. Quantitative Reasoning courses are designed to equip you with skills that are necessary for understanding the types of quantitative arguments you will regularly encounter in your adult and professional life. You should therefore expect a substantial portion of your grade to come from your use of quantitative skills to analyze real-world problems.
Learning outcomes:
1) You will be able to explain what kinds of problems have to be solved in order for computers to do useful things with language. In order to train a computer to perform a human language task, we first have to understand something about how human language works and what kinds of problems need to be solved.
2) You will learn how to approach tasks from a computational perspective. This is not a programming class, but through class exercises and homework, you will gain experience thinking like a computer scientist and about how a computer scientist would go about solving problems in human language.
3) You will gain insight into the technology that underlies a wide variety of human language technologies, from spell checkers to search engines to dialogue systems.
4) You will gain experience thinking about data. Data science is one of the fastest growing fields for graduating college students. Throughout this class, we will think about the nature of experiments and data.
5) You will gain a better understanding of the ethical considerations around work in computational linguistics. We will spend time not just on the primary research, but on thinking about the higher-order ethical and practical considerations that impact computational linguistics as a field.
Readings
Readings will be drawn from:
- Markus Dickinson, Chris Brew, and Detmar Meurers: Language and Computers. Wiley-Blackwell (2013). DBM below.
- Sedivy’s Language in Mind
- Jurafsky & Martin
- Other sources
Some material requires computational or statistical expertise to understand fully and may be difficult. We don’t expect you to have that expertise. So don’t be daunted: part of the point of this course is to talk through and get more familiar with these ideas and techniques.
Discussion posts
There will be discussion posts. Please respond on Canvas at least ONE HOUR before the beginning of class, on the date the post is due. I will read those and they will help guide discussion.
Grading
Activity | Weight |
---|---|
4 Homeworks (10% each) | 40% |
Midterm | 20% |
Discussion questions on Canvas | 10% |
Class participation (participation in sections, polls, office hours) | 10% |
Final project | 20% |
Late policy: 10% off for each day late, up to a maximum of 50% off. After that, you can still hand it in for 50% credit. 50% credit is better than nothing!
Schedule
What problem do we need to solve?
Date | Topic | Assignments |
---|---|---|
22-Aug | What is human language and what makes human language unique? (in class: Transformer exercise) | Chaser |
24-Aug | What is language for? (Communication Game for HW 1!) | Optional: Sedivy 12.1 - 12.2 (through p. 528) |
29-Aug | Structure of language I | Discussion Question Due; Sedivy 2.1-2.2 |
31-Aug | Structure of language II | Sedivy 2.3 |
Encoding language in machines
Date | Topic | Assignments |
---|---|---|
5-Sep | Encoding Language for machines (demo: design an efficient code) | HW 1: Communication Game Analysis Due |
7-Sep | Encoding language: Probability review, ngrams and wordpiece tokenization | Slides from Dickinson; Readings: DBM Sections 1.1-1.3; Link to the interactive IPA chart; A tutorial on how to convert decimal numbers to binary; https://www.wired.com/story/why-unicode-keeps-adding-boring-emoji/ |
12-Sep | How do human languages encode information? Part I: Languages are efficient (in class: Shannon guessing game) | Wordpiece tokenization |
14-Sep | Languages have words that are useful to their speakers (in class: color experiment) | Discussion Q, Information Entropy Tutorial |
Language models: Intro
Date | Topic | Assignments |
---|---|---|
19-Sep | From Encoding to Machine Learning + Probability Review; Optional video: Review conditional probability; Optional video: Review Bayes Rule | |
21-Sep | Language Models | HW 2 Encoding Due |
26-Sep | Language Models Continued | Reading from Jurafsky & Martin: pp. 1-10; Optional: Finish Jurafsky Chapter. Note that it gets technical! |
Machine Learning: Intro
Date | Topic | Assignments |
---|---|---|
28-Sep | Intro to Machine Learning: Text Classification + Naive Bayes Sentiment Analysis | Optional video: Naive Bayes and Machine Learning; Optional video: Naive Bayes classifier |
3-Oct | Naive Bayes in Action | DBM 5.5 (Including Under the hood 9) |
5-Oct | Introducing Final Project; Perceptron Sentiment Analysis | Discussion Q on Reading: DBM 5.1-5.2 |
10-Oct | Midterm discussed, Perceptron continued, leading into word embeddings | HW 3 LMs + Machine Learning Due; Optional: Slides from Dickinson pages 1-43; Readings: DBM Section 2.3-2.4; Slides from Dickinson |
12-Oct | More midterm practice; Word embeddings Part II | Jurafsky & Martin: Vector Semantics pp. 1-11; Jurafsky & Martin: Vector Semantics pp. 14-27. NOTE: this gets a bit technical in the word2vec section; don’t worry about getting bogged down on that part |
17-Oct | Midterm Review | |
Midterm
Date | Topic | Assignments |
---|---|---|
19-Oct | Midterm | |
Modern Language Models
Date | Topic | Assignments |
---|---|---|
24-Oct | Discuss Projects, Pre-trained language models | Word2Vec |
26-Oct | Evaluating Pre-trained language models: GLUE and SuperGLUE | Discussion Q Due |
31-Oct | Bias and language models | |
2-Nov | What’s happening now with models: RLHF, Chain of Thought | |
7-Nov | Brainstorming project ideas, Experimental Design | HW 4 Due |
9-Nov | Society and Language Models 2 (with Zoom visitor Kevan Choset) | Discussion Q Due |
14-Nov | Machine translation | Discussion Q; Jurafsky & Martin on MT pp.1-10 |
16-Nov | Summation, Supervised group work on Project | |
21-Nov | BREAK | |
23-Nov | BREAK | |
28-Nov | Final project presentations | |
30-Nov | Final project presentations | Final Project Due on 9-Dec |
Notice about missed work due to religious holy days
A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.
FERPA and Class Recordings
Class recordings are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction by a student could lead to Student Misconduct proceedings.
Academic dishonesty policy
You are encouraged to discuss assignments with classmates. But all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.
Sharing of Course Materials is Prohibited
No materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (quizzes, exams, papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have my explicit, written permission. Unauthorized sharing of materials promotes cheating. It is a violation of the University’s Student Honor Code and an act of academic dishonesty. I am well aware of the sites used for sharing materials, and any materials found online that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.
Student Support Services
Services for Students with Disabilities
The university is committed to creating an accessible and inclusive learning environment consistent with university policy and federal and state law. Please let me know if you experience any barriers to learning so I can work with you to ensure you have equal opportunity to participate fully in this course. If you are a student with a disability, or think you may have a disability, and need accommodations please contact Services for Students with Disabilities (SSD). Please refer to SSD’s website for contact and more information: http://diversity.utexas.edu/disability/. If you are already registered with SSD, please deliver your Accommodation Letter to me as early as possible in the semester so we can discuss your approved accommodations and needs in this course.
Counseling and Mental Health Center
The Counseling and Mental Health Center serves UT’s diverse campus community by providing high quality, innovative and culturally informed mental health programs and services that enhance and support students’ well-being, academic and life goals. To learn more about your counseling and mental health options, call CMHC at (512) 471-3515.
If you are experiencing a mental health crisis, call the CMHC Crisis Line 24/7 at (512) 471-2255.
The Sanger Learning Center
Did you know that more than one-third of UT undergraduate students use the Sanger Learning Center each year to improve their academic performance? All students are welcome to take advantage of Sanger Center’s classes and workshops, private learning specialist appointments, peer academic coaching, and tutoring for more than 70 courses in 15 different subject areas. For more information, please visit http://www.utexas.edu/ugs/slc or call 512-471-3614 (JES A332).
Undergraduate Writing Center: http://uwc.utexas.edu/
Libraries: http://www.lib.utexas.edu/
ITS: http://www.utexas.edu/its/Student
Emergency Services: http://deanofstudents.utexas.edu/emergency/
University Policies
Academic Integrity
Each student in the course is expected to abide by the University of Texas Honor Code: “As a student of The University of Texas at Austin, I shall abide by the core values of the University and uphold academic integrity.” Plagiarism is taken very seriously at UT. Therefore, if you use words or ideas that are not your own (or that you have used in a previous class), you must cite your sources. Otherwise you will be guilty of plagiarism and subject to academic disciplinary action, including failure of the course. You are responsible for understanding UT’s Academic Honesty and the University Honor Code, which can be found at the following web address: https://deanofstudents.utexas.edu/conduct/standardsofconduct.php