The Peanuts Comics Transcription Project

Comic strips are rarely analyzed computationally because hand-lettered images are difficult to convert into digital text. Perhaps one of the most difficult comic strips to analyze is Peanuts, an American comic strip written and illustrated by Charles M. Schulz from 1950 to 2000. It comprises about 18,000 individual strips, making it one of the longest-running comic strips in history. BYU’s Office of Digital Humanities (ODH) has worked with Amy Johnson (a linguistics graduate student and the daughter of Charles Schulz) to create a corpus of transcribed Peanuts comic strips.

Although many kinds of information could be transcribed in such a corpus, the project focused on the speech bubbles, thought bubbles, and character actions within each panel of a comic strip. Thirty ODH students transcribed the comics over about six months.

[Figure: Sample Peanuts transcription, an example of a transcribed comic strip.]

In a transcription, each line represents a single panel, and a forward slash separates each instance of speech, thought, or action. After the data for each strip were compiled and cleaned up, the transcriptions were converted into several formats, including CSV, Excel, JSON, and WordCruncher, for further research.
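
The line-and-slash layout described above can be sketched in Python as follows. This is only a minimal illustration, not the project's actual conversion code: the function names, the column names, and the sample text are all invented for the example, and the sketch does not try to tell speech, thought, and action apart because the markup for that distinction is not described here.

```python
import csv
import io
import json


def parse_strip(text):
    """Split a strip transcription into panels, and each panel into events."""
    panels = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # A forward slash separates each instance of speech, thought, or action.
        events = [part.strip() for part in line.split("/") if part.strip()]
        panels.append(events)
    return panels


def to_rows(strip_id, panels):
    """Flatten panels into one row per event (hypothetical column names)."""
    return [
        {"strip": strip_id, "panel": p, "position": e, "text": event}
        for p, events in enumerate(panels, start=1)
        for e, event in enumerate(events, start=1)
    ]


def to_csv(rows):
    """Serialize the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["strip", "panel", "position", "text"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented placeholder text, not taken from the corpus.
sample = """first event in panel one / second event in panel one
only event in panel two
"""

panels = parse_strip(sample)
rows = to_rows("sample-001", panels)
print(json.dumps(panels, indent=2))  # JSON view: a list of panels
print(to_csv(rows))                  # CSV view: one row per event
```

Keeping one row per event, with the panel number and the position inside the panel, means the same data can be written to CSV, JSON, or a spreadsheet without losing the order in which things happen in the strip.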

The corpus has also been added to WordCruncher, the text analysis software developed within BYU’s Office of Digital Humanities. These are a few areas of interest:

  1. Trends in character usage over time (e.g., When is a character introduced or removed?)
  2. Common interactions between characters (e.g., How much time does Snoopy spend with Charlie Brown?)
  3. Keywords and key phrases for each character (e.g., What makes Charlie Brown’s speech distinct from that of other characters?)
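
As a rough illustration of how the first two questions might be approached once the corpus is in a structured form, the sketch below computes the first year each character appears and how often pairs of characters share a strip. The record layout (`year`, `characters`), the function names, and the toy data are assumptions made for the example; they are not the corpus schema or real counts.

```python
from collections import Counter
from itertools import combinations

# Toy placeholder records, one per strip. Not real corpus data.
strips = [
    {"year": 1950, "characters": ["Charlie Brown", "Patty"]},
    {"year": 1950, "characters": ["Charlie Brown", "Snoopy"]},
    {"year": 1952, "characters": ["Charlie Brown", "Lucy", "Snoopy"]},
]


def first_appearance(records):
    """Return the earliest year in which each character appears."""
    first = {}
    for record in records:
        for name in record["characters"]:
            if name not in first or record["year"] < first[name]:
                first[name] = record["year"]
    return first


def co_occurrence(records):
    """Count how many strips each pair of characters shares."""
    pairs = Counter()
    for record in records:
        # Sort the names so that each pair is always counted under one key.
        names = sorted(set(record["characters"]))
        pairs.update(combinations(names, 2))
    return pairs


print(first_appearance(strips))
print(co_occurrence(strips).most_common(3))
```

The third question, keywords per character, would need the transcribed text attributed to each speaker and a comparison against the rest of the corpus, so it is left out of this sketch.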

As with any crowd-sourced project, we recognize that the data are neither perfect nor complete. We wrote Python scripts to detect common errors and then corrected them manually. Other errors, such as mislabeled characters or actions tagged as speech, are likely still present in the corpus and will require further cleanup. The corpus is also limited in how much it describes each panel; it would be useful to add descriptions of the weather, the surroundings, and the characters’ clothing. For now, we are satisfied with the current results of the project and look forward to continuing our research on the Peanuts comics.
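
The kind of automated check mentioned above might look something like the following. This is a hedged sketch rather than the project's own script: it assumes, purely for illustration, that each event is written as `Speaker: text`, and the character list, function name, and messages are hypothetical. It flags empty events, missing speaker labels, unrecognized speaker names (with a spelling suggestion from the standard-library `difflib` module), and empty text, leaving the actual correction to a human.

```python
import difflib

# Hypothetical, partial list of valid speaker labels.
KNOWN_CHARACTERS = {"Charlie Brown", "Snoopy", "Lucy", "Linus"}


def check_line(line, line_no):
    """Return (line number, event position, message) for each suspected error."""
    problems = []
    for position, event in enumerate(line.split("/"), start=1):
        event = event.strip()
        if not event:
            problems.append((line_no, position, "empty event (stray slash?)"))
            continue
        # Assumed layout: "Speaker: text".
        speaker, separator, text = event.partition(":")
        if not separator:
            problems.append((line_no, position, "no speaker label"))
            continue
        speaker = speaker.strip()
        if speaker not in KNOWN_CHARACTERS:
            # Suggest the closest known name, if any is similar enough.
            guess = difflib.get_close_matches(
                speaker, sorted(KNOWN_CHARACTERS), n=1
            )
            hint = f"; did you mean {guess[0]!r}?" if guess else ""
            problems.append(
                (line_no, position, f"unknown speaker {speaker!r}{hint}")
            )
        if not text.strip():
            problems.append((line_no, position, "speaker label with no text"))
    return problems


# Invented line containing three deliberate mistakes.
for problem in check_line("Charlie Brwon: Hello / / Snoopy: ", line_no=1):
    print(problem)
```

Checks like these only catch errors of form. Mistakes of judgment, such as an action tagged as speech, still look well formed to a script and need manual review.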

An example search for the word “rats,” a common utterance in Peanuts.