Unicode defines numerous dash characters, some of which are language-specific. It can be confusing to know which character to use and for what purpose. Here are brief guidelines for using dash characters properly, particularly in LaTeX.
The common dash characters are:
- a hyphen (-), U+002D
- an en-dash (–), U+2013
- an em-dash (—), U+2014
- a minus sign (−), U+2212
A hyphen is used in compound words, for example ‘son-in-law’, and in hyphenation (which LaTeX handles automatically). In LaTeX source, just type the hyphen character, which is found on all Latin keyboards:
son-in-law
The en-dash is used in number ranges, for example ‘pages 9–13’, and in other numeric contexts such as ‘exercise 2.5–8’. To typeset an en-dash in LaTeX, type two hyphens (‘--’):
Exercise 2.5--8:
The em-dash—which is longer than the en-dash—is used for punctuation in sentences and for in-line comments. It is what is usually called simply a dash. A hyphen is not a dash—it is a character for specific purposes. To typeset an em-dash in LaTeX, type three hyphens (‘---’):
A comment---like this---uses an em-dash, not a hyphen.
A minus sign (−) is used in mathematics, where it represents subtraction or a negative value. In LaTeX, it is typeset by typing a hyphen in math mode, i.e., by enclosing it between dollar signs ($...$):
$-$
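The four cases above can be collected into one small test file. This is only a minimal sketch, assuming the standard article class and default fonts; the sample sentences are illustrative.

```latex
\documentclass{article}
\begin{document}
% Hyphen: a single keystroke, used inside compound words.
My son-in-law called.

% En-dash: two hyphens, used for number ranges.
See pages 9--13 and exercise 2.5--8.

% Em-dash: three hyphens, used as sentence punctuation.
A comment---like this---uses an em-dash, not a hyphen.

% Minus sign: a hyphen typed inside math mode.
The result is $-3$, because $2 - 5 = -3$.
\end{document}
```

Compiling this and comparing the four lines side by side makes the difference in length between the characters easy to see.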
If you enter more than three hyphens, LaTeX combines them in groups of three and then handles the remainder: two leftover hyphens become an en-dash, and a single leftover hyphen stays a hyphen. For example, five hyphens become an em-dash followed by an en-dash. If you really want two hyphens and not an en-dash, separate them with an empty group, {}:
-{}-
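The grouping behaviour can be checked with a short test file. The sketch below assumes pdfLaTeX with the default Computer Modern fonts, where the dashes are built as font ligatures; other engines or fonts may treat the empty group differently, so the comments describe the expected result, not a guaranteed one.

```latex
\documentclass{article}
\begin{document}
% Four hyphens: expected to give an em-dash followed by a hyphen.
a----b

% Five hyphens: expected to give an em-dash followed by an en-dash.
a-----b

% Empty groups interrupt the ligature, so each hyphen should stay a hyphen.
a-{}-b \quad a-{}-{}-b
\end{document}
```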
If you want to use a hyphen to form a compound word but keep the whole word on one line (normally LaTeX may break the word after the hyphen when it does not fit on the line), you can use the Unicode non-breaking hyphen, U+2011 (‑). However, this depends heavily on the fonts you are using, since not all fonts define a glyph for this character.
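When the font lacks a glyph for U+2011, a different technique can give the same effect: wrapping the compound in \mbox, which typesets its argument as a single unbreakable box. This is an alternative to the Unicode character, shown here as a minimal sketch with an illustrative sentence.

```latex
\documentclass{article}
\begin{document}
% \mbox keeps its argument on one line, so no line break can occur
% after either hyphen of the compound word.
He introduced his \mbox{son-in-law} to the guests.
\end{document}
```

The trade-off is that a box which cannot be broken may stick out into the margin if the line is already nearly full.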
Another useful hyphen is the soft hyphen (U+00AD), also called SHY (Wikipedia). It indicates where a word may be broken when it is divided between lines, and it is the manual way of specifying hyphenation points. It does not appear when the full word fits on a single line, but a regular hyphen is shown at the indicated position when the word is split across two lines. In LaTeX, it is typeset with:
\-
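As a minimal sketch of \- in running text (the sentence and the chosen break points are illustrative):

```latex
\documentclass{article}
\begin{document}
% \- marks a permitted break point; a hyphen is printed only if the
% line actually breaks at that position.
The museum catalogued every manu\-script in the col\-lec\-tion.
\end{document}
```

Note that once a word contains \-, LaTeX hyphenates that word only at the marked positions, so it is worth marking every break point you are willing to accept.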
References
- The TeXbook, by Donald E. Knuth
- Wikipedia
- Unicode specification version 8.0
For more details, see commonly confused characters.