How to Write Up a Ph.D. Dissertation
                    (for computer scientists and the like)

    by Jason Eisner (2006)

   This page is about how to turn your research (once it's done) into a
   readable multi-chapter document. You need to figure out what to
   include, how to organize it, and how to present it.

   Following this advice will make me happier about reading your
   submitted or draft dissertation. You may find it useful even if I'm
   not going to read your dissertation.

   Many other people have written usefully on this subject, including
   someone in the Annals of Improbable Research. This page focuses on
   what a finished dissertation should look like. You could also skim
   good dissertations on the web.
     _________________________________________________________________

     * What to include:
          + What Goes Into a Dissertation?
          + Know Your Audience
          + Planning Your Dissertation
     * How to organize it:
          + High-Level Organization
     * How to present it:
          + Make Things Easy on Your Poor Readers
          + Mechanics
     _________________________________________________________________

What Goes Into a Dissertation?

   A typical thesis will motivate why a new idea is needed, present the
   cool new idea, convince the reader that it's cool and new and might
   apply to the reader's own problems, and evaluate how well it worked.
   Just like a paper!

   The result must be a substantial, original contribution to scientific
   knowledge. It signals your official entrance into the community of
   scholars. Treat it as a chance to make a mark, not as a 900-page-tall
   memorial to your graduate student life.

  Beyond stapling

   The cynical view is that if you've written several related papers, you
   staple them together to get a dissertation. That's a good first-order
   approximation -- you should incorporate ideas and text from your
   papers. But what is it missing?

   First, a thesis should cohere -- ideally, it should feel like one long
   paper. Second, it should provide added value: there should be people
   who would prefer reading it to simply reading your papers. Otherwise
   writing it would be a meaningless exercise.

   Here's what to do after stapling:
     * integrate the pieces
          + craft a substantial introductory chapter that ties the work
            together somehow and highlights the novel contributions
          + reorganize the remaining presentation as supporting and
            developing your story from the introduction
          + write a (brief) concluding chapter that recapitulates your
            story and summarizes what was learned
          + make the notation, terminology, and style consistent
            throughout
          + do keep good ideas, text, and results from your previous
            papers (giving credit to any co-authors)
     * expand the text
          + make the text clearer, more tutorial, and more thoughtful
          + add more examples and intuitions to help the reader
          + add new experiments/theorems/significance tests to leave no
            stone unturned
          + consider counterarguments, variations, and alternative
            explanations
          + give enough details to allow a reader to replicate the work
            or apply it in new settings
     * contextualize the ideas
          + mention all obviously related work and explain how it relates
            to yours
          + discuss alternative solutions that you rejected or are
            leaving to future work
          + point out connections to other areas, including other
            possible applications of your ideas
          + describe possible generalizations (and try them if possible)
          + lay out future work for yourself or others
     * acknowledge help (usually in a preface)
          + acknowledge any collaborators on this work, such as your
            advisor
          + acknowledge financial support on this work, and perhaps also
            other financial support you've received as a grad student
          + thank other people who have helped you technically,
            administratively, socially, or emotionally over your grad
            student career
          + state which parts of the thesis text (if any) have appeared
            in your previous publications; get permission to republish if
            you are no longer the copyright holder of those works, or if
            you had co-authors

  Taking Responsibility

   Don't expect your advisor to be your co-author. It's your Ph.D.: you
   are the sole author this time and the responsibility is on your shoulders.
   If your prose is turgid or thoughtless, misspelled or ungrammatical,
   oblivious or rude to related research, you're the one who looks bad.

   You can do it! Your advisor and committee are basically on your side
   -- they're probably willing to make suggestions about content and
   style -- but they are not obligated to fix problems for you. They may
   send your dissertation back and tell you to fix it.

   In the following sections, I'll start with advice about the thesis as
   a whole, and work downward, eventually reaching small details such as
   typography and citations.
     _________________________________________________________________

Know Your Audience

   First, choose your target audience. That crucial early decision will
   tell you what to explain, what to emphasize, and how to phrase and
   organize it. Checking it with your advisor might be wise.

   Pretty much everything in your thesis should be relevant to your
   chosen audience. Think about them as you write. Ask yourself:

  What does your audience already know?

   A computer science thesis can freely invoke basic ideas like hash
   tables and computational complexity without defining or even citing
   them. (After all, do biologists read a computer science thesis? Not
   unless they are pretty comfortable with computer science.)

   You can also safely assume that your readers have some prior
   familiarity with your research area. Just how much familiarity, and
   with which topics, is a judgment call -- again, you have to decide who
   your intended audience is.

   In practice, your audience will be somewhat mixed. Up to a point, it
   is possible to please both beginners and experts -- by covering
   background material crisply and in the service of your own story. How
   does that work? As you lay out the motivation for your own work, and
   provide notation, you'll naturally have to discuss background concepts
   and related work. But don't give a generic discussion! Make it an
   integral part of your motivation and argument. Present your detailed
   perspective on the intellectual landscape and where your own work sits
   in it -- a fresh (even opinionated) take that keeps tying back to your
   main themes and will be useful for both experts and beginners.

   Thus, be as considerate as you can to beginners without interrupting
   the flow of your main argument to your established colleagues. A good
   rule of thumb is to write at the level of the most accessible papers
   in the journals or conference proceedings that you read.

  What do you want your audience to learn from the thesis?

   You should set clear goals here. Just like a paper or a talk, your
   dissertation needs a point: it should tell a story. Writing the
   abstract and introductory chapter first will help you work out what
   that story is.

   You may find that you have to do further work to really support your
   chosen story: more experiments, more theorems, reading more
   literature, etc.

  What does your audience hope to get out of the thesis?

   Why does anyone crack open a dissertation, anyway? I sometimes do.
   Especially for areas that I know less well, a dissertation is often
   more accessible than shorter, denser papers. It takes a more leisurely
   pace, provides more explicit motivation and background, and answers
   more of the questions that I might have.

   There are other reasons I might look at your dissertation:
     * To better understand your cool idea: your grand vision, how you
       think about it, and what you did.
     * To look up details, clarification, or further results after
       reading a shorter version of your work.
     * To get a sense of what your field or area is all about.
     * To read a thorough summary of work in your area, via your
       literature review.
     * To describe your work accurately in a paper I'm writing.
     * To check whether a paper I'm reviewing should have cited you.
     * To decide whether to give you a Ph.D. :-)
     * To help me write a recommendation or promotion letter.

   Readers with different motivations may read your thesis in different
   ways. The strong convention is that it's a single document that must
   read well from start to finish -- your committee will read it that
   way. But it's worth keeping other readers in mind, too. Some will skim
   from start to finish. Some will read only the introduction and
   conclusion. Some will read a single chapter in the middle, going back
   for definitions as needed. Some will scan for what they need: a
   definition, example, table of results, or literature review. Some will
   flip through to get a general sense of your work or of how you think,
   reading whatever catches their eye.
     _________________________________________________________________

High-Level Organization

   Once you've chosen your target audience, you should outline the
   structure of the thesis. Again, the convention is that the document
   must read well from start to finish.

   The "canonical organization" is sketched by Douglas Comer near the end
   of his advice. You'll probably want something like it. A few further
   tips:

  Keep your focus

   Keep your focus. Length is not a virtue unless the content is actually
   interesting. You do have as much space as you need, but the reader
   doesn't have unlimited time and neither do you.

   Use space as needed for clarity and to flesh out and support your
   story. If you feel like your thesis is too short, it may need more
   ideas or thoughtful discussion or experiments (talk to your advisor),
   but it doesn't need more padding.

  Get to the good stuff

   A newspaper, like a dissertation, is a hefty chunk of reading. So it
   puts the most important news on page one, and leads each article with
   the most important part. You should try to do the same when
   reasonable.

   Get to the interesting ideas as soon as possible. A good strategy is
   to make Chapter 1 an overview of your main arguments and findings.
   Tell your story there in a compelling way, including a taste of your
   results. Refer the reader to specific sections in later chapters for
   the pesky details. Chapter 1 should be especially accessible (use
   examples): make it the one chapter that everyone should read.

   The same strategy works within a chapter. Start by telling your
   readers what the chapter is about and why they should read it. Then
   unfold your ideas and results. The order of your presentation should
   be natural and logical (e.g., motivation before experimental design
   before results), but try to keep the reader turning pages; seek
   reasonable ways to move the boring bits to later sections or later
   chapters.

  Include a road map

   Chapter 1 traditionally ends with a "road map" to the rest of the
   thesis, which rapidly summarizes what the remaining chapters or
   sections will contain. That's useful guidance for readers who are
   looking for something specific and also for those who will read the
   whole thesis. It also exhibits in one place how much work you've done.
   Here's a detailed example.

  Where to put the literature review

   I recommend against writing "Chapter 2: Literature Review." Such
   chapters are usually boring: they're plonked down like the author's
   obligatory list of what he or she was "supposed" to cite. They block
   the reader from getting to the new ideas, and can't even be contrasted
   with the new ideas because those haven't been presented yet.

   A better plan is to discuss related literature in conjunction with
   your own ideas. As you motivate and present your ideas, you'll want to
   refer to some related work anyway.
   Related work that didn't meld naturally into that presentation can be
   acknowledged soon afterwards in its own section -- where you should
   still focus on how it relates to your ideas and fits into your
   framework, which you have already presented.

   Each chapter might have its own related work section or sections,
   covering work that connects to yours in different ways.

  Where to define terminology and notation

   Basic terminology, concepts, and notation have to be defined
   somewhere. But where? You can mix the following strategies:

   Retail. You can define some terms or notation individually, when the
   reader first needs them. Then they will be well-motivated and fresh in
   the reader's mind. If you use them again later, you can refer back to
   the section where you first defined them.

   Wholesale. On the other hand, there are advantages to aggregating some
   of your fundamental definitions into a "Definitions" section near the
   start of the chapter, or a chapter near the start of the dissertation:
     * Sits readers down and gets them oriented all at once.
     * Makes the definitions harder to overlook.
     * Highlights how the definitions are related to one another.
     * Gets the definitions out of the way, so they don't have to
       interrupt the flow of your argument later.
     * Gives readers a place to check if they forget what you meant by
       hairy_variable_name or the "bumptiousness" of a model. (An
       alternative is to include a summary of notation and a glossary at
       the back of the dissertation, and advertise their existence.)

   The downside is that such sections or chapters can seem boring and
   full of not-yet-motivated concepts. Unless your definitions are novel
   and interesting in themselves, they block the reader from getting to
   the new and interesting ideas. So if you write something like "Chapter
   2: Preliminaries," keep it relatively concise -- the point is to get
   the reader oriented.

   Thrift shop. Use well-known notation and terminology whenever you can,
   either with or without a formal definition in your thesis. The point
   of your thesis is not to re-invent notation or to re-present
   well-known material, although sometimes you may find it helpful to do
   so.
     _________________________________________________________________

Make Things Easy on Your Poor Readers

   Now we get down to the actual writing. A dissertation is a lot to
   write. But it's also an awful lot to read and digest at once! You can
   keep us readers turning pages and following your argument. But it's a
   bigger and more complicated argument than usual, so you have to be
   more disciplined than usual.

  Break it down

   Long swaths of text are like quicksand for readers (and writers!). To
   keep us moving without sinking, use all the devices at your disposal
   to break the text down into short chunks. Ironically, short chunks are
   more helpful in a longer document, both to keep your argument tightly
   organized and to keep the reader focused and oriented.

   If a section or subsection is longer than 1 double-spaced page,
   consider whether you could break it down further. I'm not joking! This
   1-page threshold may seem surprisingly short, but it really makes
   writing and reading easier. Some devices you can use:

   subsectioning
          Split your section into subsections (or subsubsections) with
          meaningful titles that keep the reader oriented.

   lists
          If you're writing a paragraph and feel like you're listing
          anything (e.g., advantages or disadvantages of some approach),
          use an explicit bulleted list. Sometimes this might yield a
          list with only 2 or 3 rather long bullet points, but that's
          fine -- it breaks things down. (Note: To replace the bullets
          with short labels, roughly as in the list you're now reading,
          LaTeX's itemize environment lets you write \item[my label].)

   labeled paragraphs
          Label a series of paragraphs within the section, as a kind of
          lightweight subsectioning. Your experimental design section
          might look like this (using the LaTeX \paragraph command):

     Participants. The participants were 32 undergraduates enrolled in
     ...
     Apparatus. Each participant wore a Star Trek suit equipped with a
     Hasbro-brand Galactic Translator, belt model 3A ...
     Procedure. The subjects were seated in pairs throughout the
     laboratory and subjected to Vogon poetry broadcast at 3-minute
     intervals ...
     Dataset. The Vogon poetry corpus (available on request) was
     obtained by passing the later works of T. S. Eliot through the
     Systran translation system ...

   footnotes
          Move inessential points to footnotes. If they're too long for
          that, you could move them into appendices or chapters near the
          end of the thesis.

   captions
          Move some discussion of figures and tables into their captions.
          A helpful caption provides guidance on how to interpret the
          figure or table and what interesting conclusions to draw from
          it.

          (In LaTeX, you can write \caption[short version]{long version}.
          The optional short version argument will be used for the "List
          of Tables" or "List of Figures" at the start of the thesis.)

   theorems
          Even simple formal results can be stated as a theorem or lemma.
          The theorem (and proof, if included) form a nice little chunk,
          using the LaTeX theorem environment.
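
   As a rough sketch of how several of these devices might look in
   LaTeX source, consider the minimal document below. Everything in it
   (the chapter and section titles, the label names, the table contents,
   and the lemma) is a placeholder invented for illustration, and the
   report class and amsthm package are assumptions; your thesis style
   files may differ.

```latex
\documentclass{report}
\usepackage{amsthm}                 % provides the proof environment

% Declare theorem-like environments, numbered within each chapter.
\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}[theorem]{Lemma}  % lemmas share the theorem counter

\begin{document}
\chapter{Experiments}
\section{Experimental Design}\label{sec:design}

% Labeled paragraphs: a lightweight form of subsectioning.
\paragraph{Participants.} The participants were ...
\paragraph{Apparatus.} Each participant wore ...

% A list whose bullets are replaced by short labels.
\begin{itemize}
  \item[Pro:] The approach needs no labeled data ...
  \item[Con:] Training is slow because ...
\end{itemize}

% The short caption goes to the List of Tables; the long one appears
% with the table itself.
\begin{table}
  \centering
  \begin{tabular}{lr}
    System   & Accuracy \\ \hline
    Baseline & ...      \\
    Ours     & ...      \\
  \end{tabular}
  \caption[Accuracy of each system]{Accuracy of each system on the test
    set.  The setup is described in Section~\ref{sec:design}.}
  \label{tab:accuracy}
\end{table}

% Even a simple formal result can be set off as a lemma.
\begin{lemma}\label{lem:even}
  The sum of two even integers is even.
\end{lemma}
\begin{proof}
  Write the integers as $2a$ and $2b$; their sum is $2(a+b)$.
\end{proof}
\end{document}
```

   Long labels in \item[...] stick out into the left margin under
   itemize; if that bothers you, the description environment accepts the
   same \item[...] syntax and typesets the label inline.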

    Breaking down equations

   Long blocks of equations are even more intimidating than long swaths
   of text. You can break those apart, too:
     * Intersperse short bits of text for guidance. You might introduce
       line 3 of your formula with

     A change of variable from x to log x now allows us to integrate by
     parts:
     * Distinguish conceptually important steps from finicky steps that
       just push symbols around. You can even move finicky steps to a
       footnote, like this:

     Some algebraic manipulation^5 allows us to simplify to the
     following:
     * Use visual devices like boldface, underlining, or \underbrace to
       call attention to significant parts of a formula:

     [hairyformula.png]
     * Simplify the formulas in the first place by defining intermediate
       quantities or adopting notational conventions (e.g., "the t
       subscript will be dropped when it is clear from context").
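
   Here is one way those devices could be combined, sketched under the
   assumption that you use the amsmath package. The integral is an
   invented example, chosen only because it is short; the label name is
   likewise a placeholder.

```latex
\documentclass{article}
\usepackage{amsmath}   % align, \intertext, \text

\begin{document}
% The finicky choice of u and dv is pushed into a footnote.
Integrating by parts\footnote{Take $u = \ln x$ and $dv = dx$.} gives
\begin{align}
  \int_1^e \ln x \, dx
    &= \bigl[\, x \ln x \,\bigr]_1^e
       - \int_1^e x \cdot \frac{1}{x} \, dx
       \label{eq:byparts} \\
  % \intertext inserts guidance without losing the alignment.
  \intertext{The remaining integrand is just $1$, so both pieces can be
    evaluated directly:}
    &= \underbrace{e}_{\text{boundary term}}
       - \underbrace{(e - 1)}_{\text{integral of } 1} \notag \\
    &= 1 .
\end{align}
\end{document}
```

   The \intertext command must come immediately after a line break (\\)
   inside the aligned environment.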

  Now tie it back together

   Now that you've chopped your prose into bite-sized chunks, what binds
   it together?

    Coherent and explicit structure

   Your paragraphs and chunks have to tie together into a coherent
   argument. Do everything you can to highlight the structure of this
   argument. The structure should jump out at the reader, making it
   possible to read straight through your text, or skim it. Else the
   reader will get stuck puzzling out what you meant and lose momentum.

   Make sure your readers are never perplexed about the point of the
   paragraph they're reading. Make them want to keep turning the page
   because you've set up questions to which they want to know the
   answers. Don't make them rub their eyes in frustration or boredom and
   wander off to the fridge or the web browser.

   So how exactly do you "highlight the structure" and "set up
   questions"?
     * Ask questions explicitly and then answer them, as I just did. This
       is a great device for breaking up boring prose, communicating your
       rhetorical goals, and making the reader think.
     * Explicitly refer back to previous text, as when I wrote, "So how
       exactly do you 'highlight the structure' and 'set up questions'?"
     * Use lots of transitional phrases (discourse connectives). Note
       that it's fine to use these across chunk boundaries; that is, feel
       free to start a subsection with "For this reason, ...", picking up
       where the previous subsection left off.
     * Provide guidance through subsection titles.
     * As you start a section, explicitly state how it will be organized,
       or how it fits into the larger organization.
       (Note: If a section is skippable, or chapters can be read out of
       order, do say so. But don't use this as an excuse for poor
       organization or long distractions. Some readers tend to read
       straight through, and your advisor or committee may feel that they
       must read everything.)

    Lots of internal cross-references

   A thesis deals with a lot of ideas at once. Readers can easily lose
   track. Help them out:
     * Use plenty of references to your equations, sections, figures and
       tables. This is really helpful to a reader who might be getting
       confused, or who is skimming the thesis or reading it piecemeal or
       out of order.
          + Don't just say "as defined earlier" or "we will see below";
            give the section number.
          + Don't say "divide by Z"; say "divide equation (3.22) by Z
            from (3.19)."
          + Don't say "footnote 22"; say "footnote 22 on p. 99" (if it's
            far away), by using both the \ref and \pageref commands in
            LaTeX.
       You'll probably want to define some LaTeX macros for frequent
       reference styles.
     * Each figure or table should be mentioned in the main text, so that
       the reader knows to go look at it. Conversely, the figure's
       caption may point the reader back to details in the main text
       (stating the section number). A caption may also refer to other
       figures or tables that the reader should be sure to compare.
     * Boldface terms that you are defining, as a textbook would. This
       makes the definitions easy to spot when needed. You may also want
       to generate an index of boldfaced terms.
     * Be very consistent in your terminology. Never use two terms for
       the same idea; never reuse one term or variable for two ideas.
     * Be cautious about using pronouns like "it," or other anaphors such
       as "this" or "this technique." With all the ideas flying around,
       it won't always be obvious to everyone what you're referring to.
       Use longer, unambiguous phrases instead, when appropriate.
     * Try saying "the time t" instead of just "t" or just "the time."
       Similarly, "the image transformation T," "the training example
       x_i," etc. This style reminds the reader of which variables are
       connected to which concepts. You can further do this for
       expressions: "the total probability Σ_i p_i" instead of just
       "the total probability" or "the sum."
     * Give the reader some clue about the type of each variable. This
       makes it easier to interpret formulas. State the type (range) when
       you introduce the variable: "let x ∈ [1,N] be an index." The name
       of the variable should also be a clue to its type. You may want to
       adopt naming conventions and state them explicitly, e.g.,
          + i, j, k ... for integers in the range [1,N]
          + a, b, c ... for characters
          + A, B, C ... for sets of characters
          + α, β, γ ... for strings
          + X, Y, Z ... for random variables
          + script X, Y, Z, ... for the ranges of random variables
          + boldface x (or x with an arrow above it) for a vector
          + x^(n) for an n-tuple
          + SMALL CAPS for the name of a model or system
     * Feel free to lavish space where it confers extra understanding.
       Don't hesitate to give an example or a caveat, or repeat an
       earlier equation, or crisply summarize earlier work that the
       reader needs to understand.
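
   The macros suggested above might look something like the following
   sketch. The macro names (\secref, \eqnref, \fnref, \defn, \vect,
   \model), the label names, and the toy definitions in the body are all
   hypothetical; pick whatever names and wording suit your thesis.

```latex
\documentclass{report}
\usepackage{amsmath}   % \eqref
\usepackage{makeidx}   % \index, \printindex
\makeindex

% Reference styles: change the wording once, here, not throughout the text.
\newcommand{\secref}[1]{Section~\ref{#1}}
\newcommand{\eqnref}[1]{equation~\eqref{#1}}  % \eqref adds the parentheses
% Footnote number plus the page it is on, for far-away footnotes.
\newcommand{\fnref}[1]{footnote~\ref{#1} on p.~\pageref{#1}}

% Boldface a term being defined and add it to the index.
\newcommand{\defn}[1]{\textbf{#1}\index{#1}}

% Notation macros keep the typography of each variable type consistent.
\newcommand{\vect}[1]{\mathbf{#1}}    % vectors in boldface
\newcommand{\model}[1]{\textsc{#1}}   % model or system names in small caps

\begin{document}
\chapter{Model}
\section{Definitions}\label{sec:defs}
The \defn{partition function} $Z$ is\footnote{Also called the
normalizer.\label{fn:normalizer}}
\begin{equation}\label{eq:Z}
  Z = \sum_{i=1}^{N} u_i ,
\end{equation}
where $\vect{u} = (u_1, \ldots, u_N)$ is the vector of unnormalized
scores and $i \in [1,N]$ is an index.

\section{Inference}
Dividing the score $u_i$ by the partition function $Z$ from
\eqnref{eq:Z} in \secref{sec:defs} gives the probability $p_i$ used by
the \model{Example} model (see \fnref{fn:normalizer}).

\printindex
\end{document}
```

   A \label placed inside the footnote text refers to that footnote's
   number, which is what lets \ref and \pageref point at it.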

  Be concrete

   As I read a thesis, or a long argument or construction within a
   thesis, I often start worrying whether I am keeping the pieces
   together correctly in my head. Something that has become deeply
   familiar and natural to you (the world expert) may be rougher going
   for me. If I can see some concrete demonstration of how your idea
   works, it helps me check and deepen my understanding.
     * Examples keep the reader, and you, from getting lost in a morass
       of abstractions. Example cases figured in your thinking; they can
       help the reader, too. Invented examples are okay, but using "real"
       examples will also show off what your methods should or can do.
     * Running examples greet the reader like old friends. The reader
       will grasp a point more quickly and completely, and remember it
       better, when it is applied to a familiar example rather than a new
       one. So if possible, devise one or two especially nice examples
       that you can keep revisiting to make a series of points.
     * Pictures serve much the same role as examples: they're concrete
       and they share how the ideas really look inside your head. A
       picture is worth at least a thousand words (= 2.5 double-spaced
       thesis pages).
     * Pseudocode is a concrete way to convey an algorithm. It is often
       more concise and precise than a prose description, and may be
       closer to your own thinking. Of course, you can comment your
       pseudocode!
     * Theorems, too, are concise and precise. They are also
       self-contained chunks, because they formally state all their
       assumptions. A reader sloshing through a long, complicated,
       contextual argument can always grab onto a theorem as an island of
       certainty.
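
   For instance, commented pseudocode could be typeset roughly as
   follows. This sketch assumes the algorithm and algpseudocode packages
   (other pseudocode packages exist and work differently), and binary
   search is just a stand-in for your own algorithm; the macro and label
   names are placeholders.

```latex
\documentclass{article}
\usepackage{algorithm}       % floating environment with a caption
\usepackage{algpseudocode}   % \State, \While, \If, \Comment, ...

% Multi-letter variable names, set as words rather than products.
\newcommand{\lo}{\mathit{lo}}
\newcommand{\hi}{\mathit{hi}}
\newcommand{\midpt}{\mathit{mid}}

\begin{document}
\begin{algorithm}
  \caption{Binary search for $x$ in a sorted array $A[1..n]$}
  \label{alg:bsearch}
  \begin{algorithmic}
    \Require $A$ is sorted in nondecreasing order
    \Ensure returns an index $i$ with $A[i] = x$, or $0$ if none exists
    \State $\lo \gets 1$; $\hi \gets n$
    \While{$\lo \le \hi$}
      \State $\midpt \gets \lfloor (\lo + \hi)/2 \rfloor$
      \If{$A[\midpt] = x$}
        \State \Return $\midpt$
      \ElsIf{$A[\midpt] < x$}
        \State $\lo \gets \midpt + 1$ \Comment{$x$ can only be to the right}
      \Else
        \State $\hi \gets \midpt - 1$ \Comment{$x$ can only be to the left}
      \EndIf
    \EndWhile
    \State \Return $0$ \Comment{the search interval is empty}
  \end{algorithmic}
\end{algorithm}
\end{document}
```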
     _________________________________________________________________

Mechanics

   Sentences. The previous section dealt with sections and paragraphs,
   but how about sentences? Yours should read well. The best advice in
   The Elements of Style: "Omit needless words. Vigorous writing is
   concise."

   Typography. It's nice to get the typography right. This might be a
   good time to read a LaTeX tutorial or book, if you don't know
     * the differences among -, --, and ---, and whether to put spaces
       around them
     * the differences among spacing commands like \@, ~, \ , \;, \!,
       \hspace and \hspace*
     * why $diff$, $p(x|y)$, and $argmax_x$ look ugly and how to fix them
     * how to use environments like eqnarray and theorem
     * how to make complicated tabular environments
     * how to use symbols and commands from the AMS-LaTeX packages
     * how to include graphics (hint: \includegraphics from the graphicx
       package)
     * how to define macros to make your life easier
     * that pdflatex (in place of latex) directly produces a PDF file
       with nice fonts
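
   To make a few of these points concrete, here is a small sketch. It
   assumes the amsmath and graphicx packages; the sentences, label names,
   and the file name "architecture" are placeholders, so the document
   will only compile once a real image file by that name exists.

```latex
\documentclass{article}
\usepackage{amsmath}    % \DeclareMathOperator
\usepackage{graphicx}   % \includegraphics

% An upright operator name; the starred form places subscripts
% underneath the operator in displayed formulas.
\DeclareMathOperator*{\argmax}{argmax}

\begin{document}
\section{Introduction}\label{sec:intro}

% Hyphen within a word, en dash for ranges, em dash as punctuation.
The parser is state-of-the-art (pp.~50--75) --- or so we claim.

% $diff$ is typeset as the product of d, i, f, and f.
Ugly:   $diff$, $p(x|y)$, $argmax_x f(x)$.
Better: $\mathit{diff}$, $p(x \mid y)$, $\argmax_x f(x)$.

% ~ is a non-breaking space.  "\ " after an abbreviation's period
% avoids end-of-sentence spacing.  \@ before a period that follows a
% capital letter restores end-of-sentence spacing.
See Fig.~\ref{fig:arch} (cf.\ Section~\ref{sec:intro}).  We used NLP\@.
The next sentence starts here.

\begin{figure}
  \centering
  \includegraphics[width=0.8\textwidth]{architecture}
  \caption{System architecture.}\label{fig:arch}
\end{figure}
\end{document}
```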

   Margins, spacing, title page, etc. JHU provides these style files for
   LaTeX.

   Citations. BibTeX is definitely worth using to manage your
   bibliographic database. Then I recommend formatting your citations
   with \usepackage[colon,longnamesfirst]{natbib} (accompanied by
   \bibliographystyle{plainnat} to format the actual bibliography). The
   natbib package ordinarily produces reader-friendly citations such as

     Computers are getting exponentially faster (Moore, 1965). However,
     Biddle (1971) showed ...

   and is blessedly flexible enough to handle more complex forms that
   you'll probably need somewhere in your thesis:

     Bandura's (1977) theory ...
     ... (e.g., Butcher, 1954; Baker, 1955; Candlestick-Maker, 1957, and
     others).
     The work of Minor (2001, pp. 50-75; but see also Adams, 1999;
     Storandt, 1997) ...
     According to Manning and Schütze, 1999 (henceforth M&S), ...

   (It can also switch to numerical citations like [34] if you really
   want.)
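
   One plausible way to produce the citation forms shown above is
   sketched here, using standard natbib commands. The citation keys
   (moore65, biddle71, and so on) and the database name thesis.bib are
   hypothetical; substitute the keys from your own BibTeX file.

```latex
\documentclass{article}
\usepackage[colon,longnamesfirst]{natbib}
\bibliographystyle{plainnat}

% An alias for a frequently cited work.
\defcitealias{manning99}{M\&S}

\begin{document}
% Parenthetical versus textual citations.
Computers are getting exponentially faster \citep{moore65}.  However,
\citet{biddle71} showed ...

% Possessive: author and parenthesized year are produced separately.
\citeauthor{bandura77}'s \citeyearpar{bandura77} theory ...

% Optional arguments give text before and after the citations.
... \citep[e.g.,][and others]{butcher54,baker55,candlestick57}.

% \citetext supplies the parentheses; \citealp omits them.
The work of \citeauthor{minor01} \citetext{\citeyear{minor01},
pp.~50--75; but see also \citealp{adams99,storandt97}} ...

According to \citeauthor{manning99}, \citeyear{manning99}
(henceforth \citetalias{manning99}), ...

\bibliography{thesis}
\end{document}
```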

   (Another option is the apacite package, which precisely follows the
   style manual of the American Psychological Association. It is nearly
   as flexible in its citation format, but APA style has some oddities,
   including lowercasing the titles of proceedings volumes. One nice
   thing about APA style is that if you have multiple Smiths in your
   bibliography, it will distinguish them where necessary, using first
   and middle initials. Another nice thing is the use of "&" rather than
   "and" in author lists; however, you can easily hack plainnat.bst to
   mimic this behavior.)

   Hyperlinks within your PDF file. I recommend including this in the
   LaTeX preamble:

\usepackage[colorlinks]{hyperref}
\usepackage{url}

   Notes to yourself. I like to use !!! to mark something that I have to
   come back and finish or fill in. For longer "to do" notes to yourself,
   try using this \ToDo macro so your note appears in the document, in
   blue:

\usepackage{color}
\newcommand{\ToDo}[1]{\textbf{\large\textcolor{blue}{[#1]}}}

\ToDo{Either prove this or back away from the claim.  I think
   Fermat's Last Theorem might be the key ...}

   To suppress all notes, change the definition to

\newcommand{\ToDo}[1]{}

   Not all notes to yourself are to-do items that should jump out at you.
   You may also want to include TeX comments as documentation for your
   own use:

... only 58 words in the dictionary have this property.
% to get that count:
%    perl -ne 'print if blah blah' /usr/share/dict/words | wc

   Version control. It's probably wise to use Subversion (or CVS or RCS)
   to keep the revision history of your dissertation files. This lets you
   roll back to an earlier version in case of disaster. Furthermore, if
   you host the repository on your cs.jhu.edu account, it will be backed
   up by the department.

   Sharing your thesis. When you're willing to open up for comments from
   fellow students, your advisor, or your committee, give them a secret
   URL from which they can always download the latest, up-to-date release
   of your thesis, as well as earlier versions. (This is probably
   friendlier than just pointing them to your Subversion repository.)

   Keep this URL up to date with your changes. Each distinct version
   should bear a visible date or version number, to avoid confusion. For
   each new version (or on request), you should probably also supply a
   PDF that marks up the differences from an appropriate earlier version,
   using the wonderful latexdiff program or a similar technique. (Note:
   If you use a makefile to build your document by running latex,
   gnuplot, etc., then you can also make it run latexdiff and update the
   URL for you.)
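
   One possible way to make each version bear a visible date is to stamp
   the build date in the page footer. The sketch below assumes the
   report class and the fancyhdr package, and the \draftstamp macro is a
   hypothetical name; your university's style files may set up headers
   and footers differently.

```latex
\documentclass{report}
\usepackage{fancyhdr}

% Stamp every page with the build date so readers can tell drafts apart.
\newcommand{\draftstamp}{Draft of \today}
\pagestyle{fancy}
\fancyhf{}                          % clear the default header and footer
\fancyfoot[L]{\small\draftstamp}
\fancyfoot[R]{\thepage}
% Chapter-opening pages use the "plain" style, so stamp those too.
\fancypagestyle{plain}{%
  \fancyhf{}%
  \fancyfoot[L]{\small\draftstamp}%
  \fancyfoot[R]{\thepage}%
  \renewcommand{\headrulewidth}{0pt}}

\begin{document}
\chapter{Introduction}
...
\end{document}
```

   For the final submitted version, redefine \draftstamp to be empty.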
     _________________________________________________________________

Planning Your Dissertation

   Every dissertation is a little different. Talk to your advisor to
   draft a specific, written plan for what the thesis will contain, how
   it will be organized, and whom it will address. Discuss the plan with
   each of your committee members, who may suggest changes. They might
   disagree with advice on this page; find out.

   As the dissertation takes shape, your plan may need some revision.
   Your advisor and committee may be willing to provide early feedback.
   But no one will want to slog through more than a version or two in
   detail. So ask them each how many drafts of each chapter they're
   willing to read, and in what state and on what schedule. Some of them
   may prefer to influence your writeup while it's still in an early,
   outline form. Others may prefer to wait until your prose is fairly
   polished and easy to read.

   In addition to your advisor's goals and your committee's goals, you
   may have some goals of your own, e.g.,
     * settle some open questions that are bugging you
     * reach out to a related field
     * present ideas so that you can cite them in future work
     * provide useful reference material for your own future students
     * make it easy to turn the thesis into a job talk or a book
     * make it easy to turn individual chapters into journal articles
     * establish a particular identity in the research community
     * convince certain senior researchers to read your thesis
     * graduate by a particular date

   GOOD LUCK!!! Here's how to get started ...
     _________________________________________________________________

   This page online:
   http://cs.jhu.edu/~jason/advice/how-to-write-a-thesis.html

   Jason Eisner - jason@cs.jhu.edu (suggestions welcome). Last modified:
   2008/09/29 15:37:25