How to Write Up a Ph.D. Dissertation
(for computer scientists and the like)

by Jason Eisner (2006)

This page is about how to turn your research (once it's done) into a readable multi-chapter document. You need to figure out what to include, how to organize it, and how to present it.

Following this advice will make me happier about reading your submitted or draft dissertation. You may find it useful even if I'm not going to read your dissertation.

Many other people have written usefully on this subject, including someone in the Annals of Improbable Research. This page focuses on what a finished dissertation should look like. You could also skim good dissertations on the web.
_________________________________________________________________
* What to include:
  + What Goes Into a Dissertation?
  + Know Your Audience
  + Planning Your Dissertation
* How to organize it:
  + High-Level Organization
* How to present it:
  + Make Things Easy on Your Poor Readers
  + Mechanics
_________________________________________________________________
What Goes Into a Dissertation?

A typical thesis will motivate why a new idea is needed, present the cool new idea, convince the reader that it's cool and new and might apply to the reader's own problems, and evaluate how well it worked. Just like a paper!

The result must be a substantial, original contribution to scientific knowledge. It signals your official entrance into the community of scholars. Treat it as a chance to make a mark, not as a 900-page-tall memorial to your graduate student life.

Beyond stapling

The cynical view is that if you've written several related papers, you staple them together to get a dissertation. That's a good first-order approximation -- you should incorporate ideas and text from your papers. But what is it missing?

First, a thesis should cohere -- ideally, it should feel like one long paper. Second, it should provide added value: there should be people who would prefer reading it to simply reading your papers.
Otherwise writing it would be a meaningless exercise.

Here's what to do after stapling:
* integrate the pieces
  + craft a substantial introductory chapter that ties the work together somehow and highlights the novel contributions
  + reorganize the remaining presentation as supporting and developing your story from the introduction
  + write a (brief) concluding chapter that recapitulates your story and summarizes what was learned
  + make the notation, terminology, and style consistent throughout
  + do keep good ideas, text, and results from your previous papers (giving credit to any co-authors)
* expand the text
  + make the text clearer, more tutorial, and more thoughtful
  + add more examples and intuitions to help the reader
  + add new experiments/theorems/significance tests to leave no stone unturned
  + consider counterarguments, variations, and alternative explanations
  + give enough details to allow a reader to replicate the work or apply it in new settings
* contextualize the ideas
  + mention all obviously related work and explain how it relates to yours
  + discuss alternative solutions that you rejected or are leaving to future work
  + point out connections to other areas, including other possible applications of your ideas
  + describe possible generalizations (and try them if possible)
  + lay out future work for yourself or others
* acknowledge help (usually in a preface)
  + acknowledge any collaborators on this work, such as your advisor
  + acknowledge financial support on this work, and perhaps also other financial support you've received as a grad student
  + thank other people who have helped you technically, administratively, socially, or emotionally over your grad student career
  + state which parts of the thesis text (if any) have appeared in your previous publications; get permission to republish if you are no longer the copyright holder of those works, or if you had co-authors

Taking Responsibility

Don't expect your advisor to be your co-author.
It's your Ph.D.: you are sole author this time and the responsibility is on your shoulders. If your prose is turgid or thoughtless, misspelled or ungrammatical, oblivious or rude to related research, you're the one who looks bad. You can do it! Your advisor and committee are basically on your side -- they're probably willing to make suggestions about content and style -- but they are not obligated to fix problems for you. They may send your dissertation back and tell you to fix it. In the following sections, I'll start with advice about the thesis as a whole, and work downward, eventually reaching small details such as typography and citations. _________________________________________________________________ Know Your Audience First, choose your target audience. That crucial early decision will tell you what to explain, what to emphasize, and how to phrase and organize it. Checking it with your advisor might be wise. Pretty much everything in your thesis should be relevant to your chosen audience. Think about them as you write. Ask yourself: What does your audience already know? A computer science thesis can freely invoke basic ideas like hash tables and computational complexity without defining or even citing them. (After all, do biologists read a computer science thesis? Not unless they are pretty comfortable with computer science.) You can also safely assume that your readers have some prior familiarity with your research area. Just how much familiarity, and with which topics, is a judgment call -- again, you have to decide who your intended audience is. In practice, your audience will be somewhat mixed. Up to a point, it is possible to please both beginners and experts -- by covering background material crisply and in the service of your own story. How does that work? As you lay out the motivation for your own work, and provide notation, you'll naturally have to discuss background concepts and related work. But don't give a generic discussion! 
Make it an integral part of your motivation and argument. Present your detailed perspective on the intellectual landscape and where your own work sits in it -- a fresh (even opinionated) take that keeps tying back to your main themes and will be useful for both experts and beginners. Thus, be as considerate as you can to beginners without interrupting the flow of your main argument to your established colleagues. A good rule of thumb is to write at the level of the most accessible papers in the journals or conference proceedings that you read. What do you want your audience to learn from the thesis? You should set clear goals here. Just like a paper or a talk, your dissertation needs a point: it should tell a story. Writing the abstract and introductory chapter first will help you work out what that story is. You may find that you have to do further work to really support your chosen story: more experiments, more theorems, reading more literature, etc. What does your audience hope to get out of the thesis? Why does anyone crack open a dissertation, anyway? I sometimes do. Especially for areas that I know less well, a dissertation is often more accessible than shorter, denser papers. It takes a more leisurely pace, provides more explicit motivation and background, and answers more of the questions that I might have. There are other reasons I might look at your dissertation: * To better understand your cool idea: your grand vision, how you think about it, and what you did. * To look up details, clarification, or further results after reading a shorter version of your work. * To get a sense of what your field or area is all about. * To read a thorough summary of work in your area, via your literature review. * To describe your work accurately in a paper I'm writing. * To check whether a paper I'm reviewing should have cited you. * To decide whether to give you a Ph.D. :-) * To help me write a recommendation or promotion letter. 
Readers with different motivations may read your thesis in different ways. The strong convention is that it's a single document that must read well from start to finish -- your committee will read it that way. But it's worth keeping other readers in mind, too. Some will skim from start to finish. Some will read only the introduction and conclusion. Some will read a single chapter in the middle, going back for definitions as needed. Some will scan for what they need: a definition, example, table of results, or literature review. Some will flip through to get a general sense of your work or of how you think, reading whatever catches their eye. _________________________________________________________________ High-Level Organization Once you've chosen your target audience, you should outline the structure of the thesis. Again, the convention is that the document must read well from start to finish. The "canonical organization" is sketched by Douglas Comer near the end of his advice. You'll probably want something like it. A few further tips: Keep your focus Keep your focus. Length is not a virtue unless the content is actually interesting. You do have as much space as you need, but the reader doesn't have unlimited time and neither do you. Use space as needed for clarity and to flesh out and support your story. If you feel like your thesis is too short, it may need more ideas or thoughtful discussion or experiments (talk to your advisor), but it doesn't need more padding. Get to the good stuff A newspaper, like a dissertation, is a hefty chunk of reading. So it puts the most important news on page one, and leads each article with the most important part. You should try to do the same when reasonable. Get to the interesting ideas as soon as possible. A good strategy is to make Chapter 1 an overview of your main arguments and findings. Tell your story there in a compelling way, including a taste of your results. 
Refer the reader to specific sections in later chapters for the pesky details. Chapter 1 should be especially accessible (use examples): make it the one chapter that everyone should read. The same strategy works within a chapter. Start by telling your readers what the chapter is about and why they should read it. Then unfold your ideas and results. The order of your presentation should be natural and logical (e.g., motivation before experimental design before results), but try to keep the reader turning pages; seek reasonable ways to move the boring bits to later sections or later chapters. Include a road map Chapter 1 traditionally ends with a "road map" to the rest of the thesis, which rapidly summarizes what the remaining chapters or sections will contain. That's useful guidance for readers who are looking for something specific and also for those who will read the whole thesis. It also exhibits in one place how much work you've done. Here's a detailed example. Where to put the literature review I recommend against writing "Chapter 2: Literature Review." Such chapters are usually boring: they're plonked down like the author's obligatory list of what he or she was "supposed" to cite. They block the reader from getting to the new ideas, and can't even be contrasted with the new ideas because those haven't been presented yet. A better plan is to discuss related literature in conjunction with your own ideas. As you motivate and present your ideas, you'll want to refer to some related work anyway. Related work that didn't meld naturally into that presentation can be acknowledged soon afterwards in its own section -- where you should still focus on how it relates to your ideas and fits into your framework, which you have already presented. Each chapter might have its own related work section or sections, covering work that connects to yours in different ways. 
Where to define terminology and notation Basic terminology, concepts, and notation have to be defined somewhere. But where? You can mix the following strategies: Retail. You can define some terms or notation individually, when the reader first needs them. Then they will be well-motivated and fresh in the reader's mind. If you use them again later, you can refer back to the section where you first defined them. Wholesale. On the other hand, there are advantages to aggregating some of your fundamental definitions into a "Definitions" section near the start of the chapter, or a chapter near the start of the dissertation: * Sits readers down and gets them oriented all at once. * Makes the definitions harder to overlook. * Highlights how the definitions are related to one another. * Gets the definitions out of the way, so they don't have to interrupt the flow of your argument later. * Gives readers a place to check if they forget what you meant by hairy_variable_name or the "bumptiousness" of a model. (An alternative is to include a summary of notation and a glossary at the back of the dissertation, and advertise their existence.) The downside is that such sections or chapters can seem boring and full of not-yet-motivated concepts. Unless your definitions are novel and interesting in themselves, they block the reader from getting to the new and interesting ideas. So if you write something like "Chapter 2: Preliminaries," keep it relatively concise -- the point is to get the reader oriented. Thrift shop. Use well-known notation and terminology whenever you can, either with or without a formal definition in your thesis. The point of your thesis is not to re-invent notation or to re-present well-known material, although sometimes you may find it helpful to do so. _________________________________________________________________ Make Things Easy on Your Poor Readers Now we get down to the actual writing. A dissertation is a lot to write. 
But it's also an awful lot to read and digest at once! You can keep us readers turning pages and following your argument. But it's a bigger and more complicated argument than usual, so you have to be more disciplined than usual.

Break it down

Long swaths of text are like quicksand for readers (and writers!). To keep us moving without sinking, use all the devices at your disposal to break the text down into short chunks. Ironically, short chunks are more helpful in a longer document, both to keep your argument tightly organized and to keep the reader focused and oriented.

If a section or subsection is longer than 1 double-spaced page, consider whether you could break it down further. I'm not joking! This 1-page threshold may seem surprisingly short, but it really makes writing and reading easier.

Some devices you can use:

subsectioning
    Split your section into subsections (or subsubsections) with meaningful titles that keep the reader oriented.

lists
    If you're writing a paragraph and feel like you're listing anything (e.g., advantages or disadvantages of some approach), use an explicit bulleted list. Sometimes this might yield a list with only 2 or 3 rather long bullet points, but that's fine -- it breaks things down.
    (Note: To replace the bullets with short labels, roughly as in the list you're now reading, LaTeX's itemize environment lets you write \item[my label].)

labeled paragraphs
    Label a series of paragraphs within the section, as a kind of lightweight subsectioning. Your experimental design section might look like this (using the LaTeX \paragraph command):

        Participants. The participants were 32 undergraduates enrolled in ...
        Apparatus. Each participant wore a Star Trek suit equipped with a Hasbro-brand Galactic Translator, belt model 3A ...
        Procedure. The subjects were seated in pairs throughout the laboratory and subjected to Vogon poetry broadcast at 3-minute intervals ...
        Dataset. The Vogon poetry corpus (available on request) was obtained by passing the later works of T. S. Eliot through the Systran translation system ...

footnotes
    Move inessential points to footnotes. If they're too long for that, you could move them into appendices or chapters near the end of the thesis.

captions
    Move some discussion of figures and tables into their captions. A helpful caption provides guidance on how to interpret the figure or table and what interesting conclusions to draw from it.
    (In LaTeX, you can write \caption[short version]{long version}. The optional short version argument will be used for the "List of Tables" or "List of Figures" at the start of the thesis.)

theorems
    Even simple formal results can be stated as a theorem or lemma. The theorem (and proof, if included) form a nice little chunk, using the LaTeX theorem environment.

Breaking down equations

Long blocks of equations are even more intimidating than long swaths of text. You can break those apart, too:
* Intersperse short bits of text for guidance. You might introduce line 3 of your formula with
      A change of variable from x to log x now allows us to integrate by parts:
* Distinguish conceptually important steps from finicky steps that just push symbols around. You can even move finicky steps to a footnote, like this:
      Some algebraic manipulation^5 allows us to simplify to the following:
* Use visual devices like boldface, underlining, or \underbrace to call attention to significant parts of a formula:
      [hairyformula.png]
* Simplify the formulas in the first place by defining intermediate quantities or adopting notational conventions (e.g., "the t subscript will be dropped when it is clear from context").

Now tie it back together

Now that you've chopped your prose into bite-sized chunks, what binds it together?

Coherent and explicit structure

Your paragraphs and chunks have to tie together into a coherent argument. Do everything you can to highlight the structure of this argument.
The structure should jump out at the reader, making it possible to read straight through your text, or skim it. Else the reader will get stuck puzzling out what you meant and lose momentum. Make sure your readers are never perplexed about the point of the paragraph they're reading. Make them want to keep turning the page because you've set up questions to which they want to know the answers. Don't make them rub their eyes in frustration or boredom and wander off to the fridge or the web browser. So how exactly do you "highlight the structure" and "set up questions"? * Ask questions explicitly and then answer them, as I just did. This is a great device for breaking up boring prose, communicating your rhetorical goals, and making the reader think. * Explicitly refer back to previous text, as when I wrote, "So how exactly do you 'highlight the structure' and 'set up questions'?" * Use lots of transitional phrases (discourse connectives). Note that it's fine to use these across chunk boundaries; that is, feel free to start a subsection with "For this reason, ...", picking up where the previous subsection left off. * Provide guidance through subsection titles. * As you start a section, explicitly state how it will be organized, or how it fits into the larger organization. (Note: If a section is skippable, or chapters can be read out of order, do say so. But don't use this as an excuse for poor organization or long distractions. Some readers tend to read straight through, and your advisor or committee may feel that they must read everything.) Lots of internal cross-references A thesis deals with a lot of ideas at once. Readers can easily lose track. Help them out: * Use plenty of references to your equations, sections, figures and tables. This is really helpful to a reader who might be getting confused, or who is skimming the thesis or reading it piecemeal or out of order. + Don't just say "as defined earlier" or "we will see below"; give the section number. 
  + Don't say "divide by Z"; say "divide equation (3.22) by Z from (3.19)."
  + Don't say "footnote 22"; say "footnote 22 on p. 99" (if it's far away), by using both the \ref and \pageref commands in LaTeX.
  You'll probably want to define some LaTeX macros for frequent reference styles.
* Each figure or table should be mentioned in the main text, so that the reader knows to go look at it. Conversely, the figure's caption may point the reader back to details in the main text (stating the section number). A caption may also refer to other figures or tables that the reader should be sure to compare.
* Boldface terms that you are defining, as a textbook would. This makes the definitions easy to spot when needed. You may also want to generate an index of boldfaced terms.
* Be very consistent in your terminology. Never use two terms for the same idea; never reuse one term or variable for two ideas.
* Be cautious about using pronouns like "it," or other anaphors such as "this" or "this technique." With all the ideas flying around, it won't always be obvious to everyone what you're referring to. Use longer, unambiguous phrases instead, when appropriate.
* Try saying "the time t" instead of just "t" or just "the time." Similarly, "the image transformation T," "the training example x_i," etc. This style reminds the reader of which variables are connected to which concepts. You can further do this for expressions: "the total probability Σ_i p_i" instead of just "the total probability" or "the sum."
* Give the reader some clue about the type of each variable. This makes it easier to interpret formulas. State the type (range) when you introduce the variable: "let x ∈ [1,N] be an index." The name of the variable should also be a clue to its type. You may want to adopt naming conventions and state them explicitly, e.g.,
  + i, j, k ... for integers in the range [1,N]
  + a, b, c ... for characters
  + A, B, C ... for sets of characters
  + α, β, γ ... for strings
  + X, Y, Z ... for random variables
  + script X, Y, Z, ... for the ranges of random variables
  + x (or x with an arrow above it) for a vector
  + x^(n) for an n-tuple
  + SMALL CAPS for the name of a model or system
* Feel free to lavish space where it confers extra understanding. Don't hesitate to give an example or a caveat, or repeat an earlier equation, or crisply summarize earlier work that the reader needs to understand.

Be concrete

As I read a thesis, or a long argument or construction within a thesis, I often start worrying whether I am keeping the pieces together correctly in my head. Something that has become deeply familiar and natural to you (the world expert) may be rougher going for me. If I can see some concrete demonstration of how your idea works, it helps me check and deepen my understanding.
* Examples keep the reader, and you, from getting lost in a morass of abstractions. Example cases figured in your thinking; they can help the reader, too. Invented examples are okay, but using "real" examples will also show off what your methods should or can do.
* Running examples greet the reader like old friends. The reader will grasp a point more quickly and completely, and remember it better, when it is applied to a familiar example rather than a new one. So if possible, devise one or two especially nice examples that you can keep revisiting to make a series of points.
* Pictures serve much the same role as examples: they're concrete and they share how the ideas really look inside your head. A picture is worth at least a thousand words (= 2.5 double-spaced thesis pages).
* Pseudocode is a concrete way to convey an algorithm. It is often more concise and precise than a prose description, and may be closer to your own thinking. Of course, you can comment your pseudocode!
* Theorems, too, are concise and precise. They are also self-contained chunks, because they formally state all their assumptions.
  A reader sloshing through a long, complicated, contextual argument can always grab onto a theorem as an island of certainty.
_________________________________________________________________
Mechanics

Sentences. The previous section dealt with sections and paragraphs, but how about sentences? Yours should read well. The best advice in The Elements of Style: "Omit needless words. Vigorous writing is concise."

Typography. It's nice to get the typography right. This might be a good time to read a LaTeX tutorial or book, if you don't know
* the differences among -, --, and ---, and whether to put spaces around them
* the differences among spacing commands like \@, ~, \ , \;, \!, \hspace and \hspace*
* why $diff$, $p(x|y)$, and $argmax_x$ look ugly and how to fix them
* how to use environments like eqnarray and theorem
* how to make complicated tabular environments
* how to use symbols and commands from the AMS-LaTeX packages
* how to include graphics (hint: the \includegraphics command from the graphicx package)
* how to define macros to make your life easier
* that pdflatex (in place of latex) directly produces a PDF file with nice fonts

Margins, spacing, title page, etc. JHU provides these style files for LaTeX.

Citations. BibTeX is definitely worth using to manage your bibliographic database. Then I recommend formatting your citations with

    \usepackage[colon,longnamesfirst]{natbib}

(accompanied by \bibliographystyle{plainnat} to format the actual bibliography). The natbib package ordinarily produces reader-friendly citations such as

    Computers are getting exponentially faster (Moore, 1965). However, Biddle (1971) showed ...

and is blessedly flexible enough to handle more complex forms that you'll probably need somewhere in your thesis:

    Bandura's (1977) theory ...
    ... (e.g., Butcher, 1954; Baker, 1955; Candlestick-Maker, 1957, and others).
    The work of Minor (2001, pp. 50-75; but see also Adams, 1999; Storandt, 1997) ...
    According to Manning and Schütze, 1999 (henceforth M&S), ...
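As a rough sketch, the citation forms above could be typeset with natbib commands along the following lines. The citation keys (moore65, biddle71, and so on) and the database file thesis.bib are hypothetical placeholders, not part of the original advice; substitute the keys from your own BibTeX database. The M&S shorthand is handled here with natbib's \defcitealias and \citetalias, which is one possible way to do it.

```latex
% Sketch only: the citation keys and the file thesis.bib are hypothetical.
\documentclass{report}
\usepackage[colon,longnamesfirst]{natbib}
\bibliographystyle{plainnat}
\defcitealias{manning99}{M\&S}  % declare a shorthand for this entry

\begin{document}

% Parenthetical citation, then a textual citation.
Computers are getting exponentially faster \citep{moore65}.
However, \citet{biddle71} showed \ldots

% Possessive form: author name and parenthesized year as separate pieces.
\citeauthor{bandura77}'s \citeyearpar{bandura77} theory \ldots

% Pre-note and post-note around several keys.
\ldots\ \citep[e.g.,][and others]{butcher54,baker55,candlestick57}.

% Hand-built parentheses; \citealp omits natbib's own parentheses.
The work of \citeauthor{minor01} (\citeyear{minor01}, pp.~50--75;
but see also \citealp{adams99,storandt97}) \ldots

% Author and year without parentheses, then the declared shorthand.
According to \citealp{manning99} (henceforth \citetalias{manning99}), \ldots

\bibliography{thesis}  % assumes a thesis.bib containing the keys above
\end{document}
```

Building parentheses by hand, as in the Minor example, is useful whenever the parenthetical mixes page ranges, commentary, and further citations that a single \citep call cannot express.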
(It can also switch to numerical citations like [34] if you really want.)

(Another option is the apacite package, which precisely follows the style manual of the American Psychological Association. It is nearly as flexible in its citation format, but APA style has some oddities, including lowercasing the titles of proceedings volumes. One nice thing about APA style is that if you have multiple Smiths in your bibliography, it will distinguish them where necessary, using first and middle initials. Another nice thing is the use of "&" rather than "and" in author lists; however, you can easily hack plainnat.bst to mimic this behavior.)

Hyperlinks within your PDF file. I recommend including this in the LaTeX preamble:

    \usepackage[colorlinks]{hyperref}
    \usepackage{url}

Notes to yourself. I like to use !!! to mark something that I have to come back and finish or fill in.

For longer "to do" notes to yourself, try using this \ToDo macro so your note appears in the document, in blue:

    \usepackage{color}
    \newcommand{\ToDo}[1]{\textbf{\large\textcolor{blue}{[#1]}}}

    \ToDo{Either prove this or back away from the claim. I think Fermat's Last Theorem might be the key ...}

To suppress all notes, change the definition to

    \newcommand{\ToDo}[1]{}

Not all notes to yourself are to-do items that should jump out at you. You may also want to include TeX comments as documentation for your own use:

    ... only 58 words in the dictionary have this property.
    % to get that count:
    % perl -ne 'print if blah blah' /usr/share/dict/words | wc

Version control. It's probably wise to use Subversion (or CVS or RCS) to keep the revision history of your dissertation files. This lets you roll back to an earlier version in case of disaster. Furthermore, if you host the repository on your cs.jhu.edu account, it will be backed up by the department.

Sharing your thesis.
When you're willing to open up for comments from fellow students, your advisor, or your committee, give them a secret URL from which they can always download the latest, up-to-date release of your thesis, as well as earlier versions. (This is probably friendlier than just pointing them to your Subversion repository.) Keep this URL up to date with your changes. Each distinct version should bear a visible date or version number, to avoid confusion. For each new version (or on request), you should probably also supply a PDF that marks up the differences from an appropriate earlier version, using the wonderful latexdiff program or a similar technique. (Note: If you use a makefile to build your document by running latex, gnuplot, etc., then you can also make it run latexdiff and update the URL for you.) _________________________________________________________________ Planning Your Dissertation Every dissertation is a little different. Talk to your advisor to draft a specific, written plan for what the thesis will contain, how it will be organized, and whom it will address. Discuss the plan with each of your committee members, who may suggest changes. They might disagree with advice on this page; find out. As the dissertation takes shape, your plan may need some revision. Your advisor and committee may be willing to provide early feedback. But no one will want to slog through more than a version or two in detail. So ask them each how many drafts of each chapter they're willing to read, and in what state and on what schedule. Some of them may prefer to influence your writeup while it's still in an early, outline form. Others may prefer to wait until your prose is fairly polished and easy to read. 
In addition to your advisor's goals and your committee's goals, you may have some goals of your own, e.g., * settle some open questions that are bugging you * reach out to a related field * present ideas so that you can cite them in future work * provide useful reference material for your own future students * make it easy to turn the thesis into a job talk or a book * make it easy to turn individual chapters into journal articles * establish a particular identity in the research community * convince certain senior researchers to read your thesis * graduate by a particular date GOOD LUCK!!! Here's how to get started ... _________________________________________________________________ This page online: http://cs.jhu.edu/~jason/advice/how-to-write-a-thesis.html Jason Eisner - jason@cs.jhu.edu (suggestions welcome) Last Mod $Date: 2008/09/29 15:37:25 $