LIN313

UT Austin LIN 313: Language and Computers

Fall 2023

Instructor: Prof. Kyle Mahowald (he/him)

Meeting times: Tues./Thurs. 12:30pm-2:00pm

Location: JGB 2.202

Office hours: Thurs. 2pm-3pm

TA: Will Sheffield

Will’s office hours: Wed. 3-4pm; Thurs. 10-12pm

Summary

In the past decades, the widening use of computers has had a profound influence on the way we communicate, search and store information. For the overwhelming majority of people and situations, the natural vehicle for such information is natural language. Whenever you use spellcheck or Grammarly, talk to Siri, do a Google search, interact with a robot voice on the phone, or are fed targeted ads, you are interacting with natural language technology.

Remarkably, even the language technology of 5 years ago is now far out of date when you consider modern models like ChatGPT that process language in amazing ways. A major reason for this is the combination of the machine learning and big data revolutions, which together have made it possible for machines to learn using huge amounts of data.

Our goal will be to gain enough of an understanding of language technology to understand what these models are doing, what their capabilities are, how they work, and what their potential pitfalls and dangers are. In this course, you will be given insight into the fundamentals of how computers are used to represent, process and organize textual and spoken information, as well as tips on how to effectively integrate this knowledge into working practice. We will cover the theory and practice of human language technology. Topics include text encoding, search technology, tools for writing support, machine translation, dialog systems, computer aided language learning and the social context of language technology.

This course uses natural language systems to motivate students to exercise and develop a range of basic skills in formal and computational analysis. The course philosophy is to ground abstract concepts in real-world examples. We introduce strings and regular expressions, as well as algorithms defined over these structures and techniques for probing and evaluating systems that rely on these algorithms. The course goes beyond merely subjective evaluation of systems, emphasizing analysis and reasoning to draw and argue for valid conclusions about the design, capabilities and behavior of natural language systems.

Evaluation will be based on the midterm, homeworks, participation, discussion questions, and the essay.

Quantitative Reasoning

This course carries the Quantitative Reasoning flag. Quantitative Reasoning courses are designed to equip you with skills that are necessary for understanding the types of quantitative arguments you will regularly encounter in your adult and professional life. You should therefore expect a substantial portion of your grade to come from your use of quantitative skills to analyze real-world problems.

Learning outcomes:

1) You will be able to explain what kinds of problems have to be solved in order for computers to do useful things with language. In order to train a computer to perform a human language task, we first have to understand something about how human language works and what kinds of problems need to be solved.

2) You will learn how to approach tasks from a computational perspective. This is not a programming class, but through class exercises and homework, you will gain experience thinking like a computer scientist and about how a computer scientist would go about solving problems in human language.

3) You will gain insight into the technology that underlies a wide variety of human language technologies, from spell checkers to search engines to dialogue systems.

4) You will gain experience thinking about data. Data science is one of the fastest growing fields for graduating college students. Throughout this class, we will think about the nature of experiments and data.

5) You will gain a better understanding of the ethical considerations around work in computational linguistics. We will spend time not just on the primary research, but on thinking about the higher-order ethical and practical considerations that impact computational linguistics as a field.

Readings

Readings will be drawn from:

Some material will draw on computational or statistical expertise to understand fully and may be difficult. We don’t expect you to have that expertise. So don’t be daunted: part of the point of this course is to talk through and get more familiar with these ideas and techniques.

Discussion posts

There will be discussion posts. Please respond on Canvas at least ONE HOUR before the beginning of class, on the date the post is due. I will read those and they will help guide discussion.

Grading

Activity | Weight
4 Homeworks (10% each) | 40%
Midterm | 20%
Discussion questions on Canvas | 10%
Class participation (participation in sections, polls, office hours) | 10%
Final project | 20%

Late policy: 10% off for each day late, up to a maximum of 50% off. After that, you can still hand the work in for 50% credit. 50% credit is better than nothing!
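
To make the arithmetic concrete, here is a minimal Python sketch of how the late policy and the grading weights could be applied. It rests on assumptions not stated in this syllabus: that the penalty is counted in whole days, and that it scales the score you would otherwise have earned. The names `late_multiplier` and `course_grade` are hypothetical and are not part of any course tooling.

```python
# Category weights in percent, copied from the grading table.
WEIGHTS = {
    "homeworks": 40,        # 4 homeworks at 10% each
    "midterm": 20,
    "discussion": 10,
    "participation": 10,
    "final_project": 20,
}


def late_multiplier(days_late: int) -> float:
    """Fraction of credit kept after `days_late` whole days (assumed reading of the policy)."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # 10 percentage points off per day late, with the penalty capped at 50.
    penalty_points = min(10 * days_late, 50)
    return (100 - penalty_points) / 100


def course_grade(scores: dict[str, float]) -> float:
    """Weighted course grade from per-category scores on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

Under these assumptions, `late_multiplier(3)` keeps 70% of the credit, and anything five or more days late keeps 50%. Multiply a homework score by the multiplier before averaging it into the `"homeworks"` category passed to `course_grade`.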

Schedule

What problem do we need to solve?

Date Topic Assignments
22-Aug What is human language and what makes human language unique? (in class: Transformer exercise) Chaser
24-Aug What is language for? (Communication Game for HW 1!) Optional: Sedivy 12.1 - 12.2 (through p. 528)
29-Aug Structure of language I Discussion Question Due; Sedivy 2.1-2.2
31-Aug Structure of language II Sedivy 2.3

Encoding language in machines

Date Topic Assignments
5-Sep Encoding Language for machines (demo: design an efficient code) HW 1: Communication Game Analysis Due
7-Sep Encoding language: Probability review, ngrams and wordpiece tokenization Slides from Dickinson; Readings: DBM Sections 1.1-1.3; Link to the interactive IPA chart; A tutorial on how to convert decimal numbers to binary, https://www.wired.com/story/why-unicode-keeps-adding-boring-emoji/
12-Sep How do human languages encode information? Part I: Languages are efficient (in class: Shannon guessing game); Word piece tokenization
14-Sep Languages have words that are useful to their speakers (in class: color experiment) Discussion Q, Information Entropy Tutorial

Language models: Intro

Date Topic Assignments
19-Sep From Encoding to Machine Learning + Probability Review; Optional video: Review conditional probability; Optional video: Review Bayes Rule
21-Sep Language Models HW 2 Encoding Due
26-Sep Language Models Continued Reading from Jurafsky & Martin: pp. 1-10; Optional: Finish Jurafsky Chapter. Note that it gets technical!

Machine Learning: Intro

Date Topic Assignments  
28-Sep Intro to Machine Learning: Text Classification + Naive Bayes Sentiment Analysis Optional video: Naive Bayes and Machine Learning; Optional video: Naive Bayes classifier
3-Oct Naive Bayes in Action DBM 5.5 (Including Under the hood 9)  
5-Oct Introducing Final Project; Perceptron Sentiment Analysis Discussion Q on Reading: DBM 5.1-5.2  
10-Oct Midterm discussed, Perceptron continued, leading into word embeddings HW 3 LMs + Machine Learning Due; Optional: Slides from Dickinson pages 1-43; Readings: DBM Sections 2.3-2.4; Slides from Dickinson
12-Oct More midterm practice; Word embeddings Part II Jurafsky & Martin: Vector Semantics pp. 1-11; Jurafsky & Martin: Vector Semantics pp. 14-27. NOTE: this gets a bit technical in the word2vec section; don’t worry about getting bogged down on that part
17-Oct Midterm Review    

Midterm

Date Topic Assignments
19-Oct Midterm  

Modern Language Models

Date Topic Assignments
24-Oct Discuss Projects, Pre-trained language models Word2Vec
26-Oct Evaluating Pre-trained language models: GLUE and SuperGLUE Discussion Q Due
31-Oct Bias and language models  
2-Nov What’s happening now with models: RLHF, Chain of Thought  
7-Nov Brainstorming project ideas, Experimental Design HW 4 Due
9-Nov Society and Language Models 2 (with Zoom visitor Kevan Choset) Discussion Q Due
14-Nov Machine translation Discussion Q; Jurafsky & Martin on MT pp.1-10
16-Nov Summation, Supervised group work on Project  
21-Nov BREAK  
23-Nov BREAK  
28-Nov Final project presentations  
30-Nov Final project presentations Final Project Due on 9-Dec

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

FERPA and Class Recordings

Class recordings are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction by a student could lead to Student Misconduct proceedings.

Academic dishonesty policy

You are encouraged to discuss assignments with classmates. But all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.

Sharing of Course Materials is Prohibited

No materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (quizzes, exams, papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have my explicit, written permission. Unauthorized sharing of materials promotes cheating. It is a violation of the University’s Student Honor Code and an act of academic dishonesty. I am well aware of the sites used for sharing materials, and any materials found online that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.

Student Support Services

Services for Students with Disabilities

The university is committed to creating an accessible and inclusive learning environment consistent with university policy and federal and state law. Please let me know if you experience any barriers to learning so I can work with you to ensure you have equal opportunity to participate fully in this course. If you are a student with a disability, or think you may have a disability, and need accommodations, please contact Services for Students with Disabilities (SSD). Please refer to SSD’s website for contact and more information: http://diversity.utexas.edu/disability/. If you are already registered with SSD, please deliver your Accommodation Letter to me as early as possible in the semester so we can discuss your approved accommodations and needs in this course.

Counseling and Mental Health Center

The Counseling and Mental Health Center serves UT’s diverse campus community by providing high-quality, innovative and culturally informed mental health programs and services that enhance and support students’ well-being, academic and life goals. To learn more about your counseling and mental health options, call CMHC at (512) 471-3515.

If you are experiencing a mental health crisis, call the CMHC Crisis Line 24/7 at (512) 471-2255.

The Sanger Learning Center

Did you know that more than one-third of UT undergraduate students use the Sanger Learning Center each year to improve their academic performance? All students are welcome to take advantage of Sanger Center’s classes and workshops, private learning specialist appointments, peer academic coaching, and tutoring for more than 70 courses in 15 different subject areas. For more information, please visit http://www.utexas.edu/ugs/slc or call 512-471-3614 (JES A332).

Undergraduate Writing Center: http://uwc.utexas.edu/

Libraries: http://www.lib.utexas.edu/

ITS: http://www.utexas.edu/its/Student

Emergency Services: http://deanofstudents.utexas.edu/emergency/

University Policies

Academic Integrity

Each student in the course is expected to abide by the University of Texas Honor Code: “As a student of The University of Texas at Austin, I shall abide by the core values of the University and uphold academic integrity.” Plagiarism is taken very seriously at UT. Therefore, if you use words or ideas that are not your own (or that you have used in a previous class), you must cite your sources. Otherwise you will be guilty of plagiarism and subject to academic disciplinary action, including failure of the course. You are responsible for understanding UT’s Academic Honesty and the University Honor Code, which can be found at the following web address: https://deanofstudents.utexas.edu/conduct/standardsofconduct.php