ASCII and Unicode quotation marks (cl.cam.ac.uk)

176 points by anschwa 280 days ago

jimmies 279 days ago

I hate the "" -> “” thing with a passion. I don't know how much productivity the world has lost to that “” shit.

It doesn't look that much better, and it always fucks with me at random times. That shit is on the list of annoying problems that shouldn't exist in the first place, along with the \nl\cr thing, and the txt saved as rtf thing, and the UTF-8 encoding-character-at-the-beginning-of-the-file or whatever it is called.

Someone complains your program gives them an error when they open a csv file you sent them. You tested your program, it works. You go on the phone with them for 30 minutes, try to figure out what the fuck was going on. There it is, it was opened in a program that meddles with that "" and replaces it with the “” shit.

Also, there has to be at least one time you're fucked by the "" -> “” snobbiness when you go to a random Wordpress site, paste the command they tell you to run into the command line, and realize it doesn't work. You pull your hair for a couple of minutes, and there is that sneaky ” thing. Wordpress does that for anything it doesn't think is code (inb4 ”good programmers don't paste commands from wordpress to GNU+bash“).
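A minimal Python sketch of the usual workaround: map the curly quotes back to their ASCII counterparts before the pasted text reaches a shell or a parser. The names `SMART_TO_ASCII` and `straighten_quotes` are made up for illustration, and the table covers only the quote characters discussed here.

```python
# Translation table from typographic quotes to plain ASCII quotes.
SMART_TO_ASCII = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ASCII quotes."""
    return text.translate(SMART_TO_ASCII)


pasted = "grep \u201cfoo bar\u201d notes.txt"
print(straighten_quotes(pasted))
```

This only undoes the substitution; it cannot tell whether a curly quote was typed on purpose.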

One of the first things that I do when I set up a new Mac computer is to turn that damn "" -> “” ““““feature”””” off.

hunter2_ 279 days ago

You forgot to mention leading apostrophes getting autocorrected to left_single_quote instead of right_single_quote, as in

John Doe ‘42

An abomination ‘cause it's a damn apostrophe, not opening a quotation.

peterburkimsher 279 days ago

For anyone wondering, this is how to turn that feature off.

System Preferences -> Keyboard -> Text -> Use smart quotes and dashes (uncheck).

athenot 279 days ago

On a US-English keyboard on the mac, you can always type the proper glyphs directly on the keyboard. That's always been harder on windows where one would have to type some numerical character code each time they wanted some non-ascii character. That's where the auto-replace originated.

On the mac:

Option [          “      (English open quote)
Option Shift [    ”      (English close quote)

A lesser-known feature is that some other quotations are also possible from that same US-English keyboard:

Option \          «      (French open quote)
Option Shift \    »      (French close quote)
Option Shift W    „      (German open quote)

hunter2_ 279 days ago

> at-the-beginning-of-the-file

That thing's the BOM.

jiggunjer 279 days ago

Which you don't really need with UTF-8, it only has a purpose for UTF-16+.

It's useful to tell a text editor "This is UTF-8, not Windows-1252 or ISO-8859-1 or whatever you might be used to".
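For reference, a small Python sketch of what the UTF-8 BOM looks like to a program and two ways to get rid of it. `strip_utf8_bom` is a hypothetical helper; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


data = codecs.BOM_UTF8 + "name,city\n".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as U+FEFF at the start of the text.
with_bom = data.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM when there is one.
without_bom = data.decode("utf-8-sig")
```

A CSV reader that uses the plain codec will see the first header as `\ufeffname`, which is one way the "works on my machine" bug shows up.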

mjevans 279 days ago

No, just do 8-bit clean, don't SCREW with the encoding if you weren't asked to.

gnud 279 days ago

An editor can't "just do 8-bit clean", it has to display the characters. The same bytes will sometimes be displayed differently in utf-8 and (e.g.) ISO-8859-1.

I'm not sure if a BOM is a good way to handle it, but saying 'just do 8-bit clean' doesn't work when you're displaying or printing the characters for humans to understand.
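To make the point concrete, here is a short Python sketch that decodes the same bytes under different assumed encodings. The sample string is arbitrary.

```python
# Curly double quotes around a word; each quote is three bytes in UTF-8.
raw = "\u201cquoted\u201d".encode("utf-8")

as_utf8 = raw.decode("utf-8")          # the intended text
as_latin1 = raw.decode("iso-8859-1")   # never fails: one character per byte

# Windows-1252 turns the opening quote's three bytes into the familiar
# mojibake sequence (a-circumflex, euro sign, oe ligature).
opening_as_cp1252 = raw[:3].decode("cp1252")

print(as_utf8)
print(opening_as_cp1252)
```

The bytes are identical in every case; only the editor's guess about the encoding changes what a human sees.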

LeoNatan25 279 days ago

Must not be a fan of typography then. Sadly, the tech world is full of the likes of you, so even if people care, in many cases—such as dashes—they are not even aware of the alternatives. Gladly Apple has decided not to follow your ““““opinion”””” by default.

jimmies 279 days ago

I never said I hate the 69 quotation thing. I said I hate the automatic conversion of the straight quote to the 69 quote behind my back.

Imagine they automatically replace the "fi" with the styled "ﬁ" behind your back and break text searches and normal text files.

You can be snobby all you want, as long as you initialed it and know you don't break programs' assumptions. No one says you can't have nice shit with LaTEX and Pagemaker when you need it.

Here people who just open the file and save it, or copy the text and paste it, don't mean it, yet it happens without them ever knowing.

LeoNatan25 279 days ago

Regarding ﬁ, it just shows how broken most search engines are.

As a reader, what is the difference between ﬁ and fi? Or between a and а?

If the search engine is distinguishing those because of Unicode characters, it has failed completely.

TeMPOraL 279 days ago

Unicode turns trivial string comparison problems into pretty much AI-complete ones. Same visual characters, different meaning. Different characters, but same visual meaning for the person who initiated the query. Up/down-case not well defined, and even if it is, it's not always reversible (i.e. tolower(toupper(tolower(string))) is not equal to tolower(string)). There's lots of that.
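A small Python sketch of both sides of this, using only the standard `unicodedata` module. `loose_key` is a hypothetical helper name, and NFKC plus case folding is just one possible choice of "forgiving" comparison.

```python
import unicodedata


def loose_key(s: str) -> str:
    """Compatibility-normalize and case-fold s for a forgiving comparison."""
    return unicodedata.normalize("NFKC", s).casefold()


# The "fi" ligature (U+FB01) folds to plain "fi" under NFKC.
ligature_matches = loose_key("\ufb01le") == loose_key("file")

# Latin "a" and Cyrillic "а" (U+0430) stay distinct: normalization does
# not merge look-alikes from different scripts.
homoglyph_matches = loose_key("a") == loose_key("\u0430")

# Case round-trips are lossy: German sharp s upper-cases to "SS".
s = "stra\u00dfe"
round_trip = s.lower().upper().lower()

print(ligature_matches, homoglyph_matches, round_trip)
```

So the ligature case is solvable with a library call, the look-alike case is not, and `lower(upper(lower(s)))` differs from `lower(s)` for at least this word.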

The longer I live, the bigger fan of simplification and standardization I become. Like with dates and times. Timezones and DST are a fucking nightmare, because politics. And then doubly so, because people enjoy themselves with their favourite regional writing formats. At this point I'm all for enforcing ISO 8601 on every communication involving dates and times.

Some people object that this is turning humans into machines, etc. So be it. Nature isn't perfect, and clear communication doesn't come to us naturally. Yet it is absolutely vital in a technological society.

LeoNatan25 279 days ago

That’s closed-mindedness bordering on the silly. ISO 8601, for instance, only represents Gregorian calendar dates and times. But what about Japanese, Hebrew and Buddhist calendars, for example?

Looking at the world as only “what will my Emacs store on the drive when I paste some text into it” is ridiculous. As computers become more advanced—even mobile phones are super computers now¹—we should use that computer power to make the technology work for us, not bend over to some 1980s concepts of computing and standards. It may be more difficult for you to build such a system, I understand, but as a used, I really don’t give a damn.

jimmies 279 days ago

>I understand, but as a used, I really don’t give a damn.

Yeah, Because we already changed " to 69 behind their back, let's double down on correcting the 69 so it processes like "!

Let's engineer our dumb, close-minded O(n) string search and CSV parser to do some AI image recognition shit in O(2^n) to figure out when the fuck the "used" had the " involuntarily changed to “ because MacOS or Wordpress decided it looks better.

That's brilliant engineering right there.

Satire aside, you realize that there is a cost of security and maintenance to over-engineering shit, it's not just convenience, right?

LeoNatan25 279 days ago

Your “used” “joke” is so … satirical.

Your specific bug with CSV is a developer bug. The place where you copied the incorrect CSV format should not have had such substitution enabled. On the Mac, a developer can specify for each text view and text field what substitutions are allowed by default. Likewise on the web, it is possible to specify which substitutions should be allowed for text areas.

So instead of blaming incompetent developers for their incorrect use of system features, thus ruining some very narrow cases, let's hold back any kind of text input and processing advancements, because you are unable to input some CSV properly.

That's brilliant engineering right there.

When my mother types on her computer, she just wants things to work. When she searches for something, she doesn't care if she typed the wrong Unicode character. Those are the cases that need to be solved for users.

And it is only O(2ⁿ) algorithm if you naively look at text as an array of bytes. Time to, perhaps, broaden some horizons.

jimmies 279 days ago

When you use your mom as a baseline, then do you think she would give a single fuck about how the " should look like? Does she zoom in the text with a magnifier glass to complain oh no they didn't change my " to 69, this doesn't look right, my os is shit?

Or she cares more when the shit she copy-and-pasted from some random website or your note to her to do something doesn't work because the site or the copy-paste process meddled with it?

PS: The joke is on the O(2^n) with AI Tensorflow, not on the "used." The "used" thing was 101% serious.

daemin 279 days ago

It would be fine if it just changed visually depending on the context, without actually changing the underlying character data.

Zarel 279 days ago

The problem is that you can't really detect which single quotation mark to use unambiguously.

'n'

can either be an abbreviation of "and", or a single-quoted letter "n".

as an abbreviation of "and", it should be rendered as [right single quote]n[right single quote]

’n’

and as a single-quoted letter "n", it should be rendered as [left single quote]n[right single quote]

‘n’
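The ambiguity is easy to reproduce with a deliberately naive converter. The Python sketch below uses one simple rule (a quote at the start of the text, after whitespace, or after an opening bracket is an opening quote; anything else is a closing quote). `naive_smart_quotes` is a made-up name and the rule is an assumption for illustration, not how any particular product works.

```python
def naive_smart_quotes(text: str) -> str:
    """Curl straight quotes using a simple position-based rule."""
    out = []
    for i, ch in enumerate(text):
        # Treat the quote as "opening" if nothing word-like precedes it.
        opening = i == 0 or text[i - 1].isspace() or text[i - 1] in "([{"
        if ch == "'":
            out.append("\u2018" if opening else "\u2019")
        elif ch == '"':
            out.append("\u201c" if opening else "\u201d")
        else:
            out.append(ch)
    return "".join(out)


print(naive_smart_quotes("it's"))          # apostrophe inside a word: fine
print(naive_smart_quotes("'n'"))           # always read as a quoted letter
print(naive_smart_quotes("John Doe '42"))  # leading apostrophe goes wrong
```

The rule has no way to pick the abbreviation reading of 'n', and it turns the apostrophe in '42 into a left single quote.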

jimmies 279 days ago

If you can't determine it automatically, then why is that "feature" there in the first place?

TeMPOraL 279 days ago

And here it would be creating a very insidious problem - representation (external format) disconnected from underlying data (internal format).

What I mean is, for example, this: when I see " in a comment, I know the browser is actually storing the " character somewhere in memory. I know that when I send this comment form, HN will receive " character. If I copy and paste the comment into my Emacs, and save it, I know my hard drive now stores the " character.

When external format gets completely disconnected from internal format, understanding anything about what happens with the data gets much more difficult.

daemin 279 days ago

But we already have that with things such as style sheets, where you can specify font styles which transform the text to all upper case, all lower case, small caps, etc. And as another comment pointed out there's even ligatures, which interpret a sequence of characters and render them together with a different singular glyph.

LeoNatan25 279 days ago

What about ligatures and other font character substitution features? What will your Emacs do then and why does it matter? As we advance further, we should strive to disconnect from the technology—which should just work, despite how complex it is.

LeoNatan25 279 days ago

Then it would have to be a feature of the font, and I am not familiar with any font that does that automatically. Technically, it might be possible, with advanced scripting available in OpenType, but considering how even the most basic of ligatures are unsupported under most Linux and Windows fonts, I would not have my hopes up to have this any time soon.

aoloe 279 days ago

As far as I know it's (very) hard to do correctly.

There are (small or big) differences among languages and it is not always obvious to detect if the quote should be converted at all, if it's a left one, or a right one.

As a reference, this is a Python Script trying to do the conversion in Scribus:

https://wiki.scribus.net/canvas/Convert_Typewriter_Quotes_to...

There is also an article by the author of the script, explaining his work and pointing to at least one common case it cannot handle:

https://opensource.com/article/17/3/python-scribus-smart-quo...

Putting all this logic in a font might or might not be something you really want...

But I 100% agree: this should be done at the font level... and hard replacing characters is not a good solution.

lisper 280 days ago

The fact that ASCII does not have balanced quotes is one of the great catastrophes of computing. It makes everything more complicated than it needs to be, from embedding code in strings to parsing CSV files, to regexps. For example, if I want to embed a quoted string in another quoted string, I have to escape the inner quotes like so:

"This is string containing an embedded \"quoted\" string"

Then I have to think about whether or not the system I'm going to send that string to is going to "helpfully" remove the backslashes, in which case I need to write:

"This is a string containing an embedded \\"quoted\\" string"

All this horrible complexity could have been avoided if we could just write:

«This is a string containing an «embedded» quoted string»

Alas.

eesmith 280 days ago

The complexity might be minimized, but not avoided. You would still need an escape mechanism for something like «She said «The \» key on the server doesn't work.»»

ASCII did add <>, [], and {}, any of which could have been used for quoted strings, had the programming language designers chosen that option.

https://en.wikipedia.org/wiki/String_literal#Paired_delimite... points out that PostScript and Tcl have a string literal which allows matched quotes.

PostScript: (The quick (brown fox))
Tcl: {The quick {brown fox}}

stormbrew 279 days ago

Ruby lets you use arbitrary tokens for string literals with %q{} (where the braces can be a bunch of things). I wish more languages would adopt this tbh.

vorg 279 days ago

Apache Groovy also had that in its early 1.0 betas but they were removed before their official 1.0 release party.

__david__ 279 days ago

Ruby lifted that from Perl.

say qq<I can do '" in here>;

petters 279 days ago

C++ has something similar.

lisper 280 days ago

> You would still need an escape mechanism for something like...

Yes, but that's a pretty rare case, much more so than embedded strings.

Even that case could be solved by having two different quotes, like Python which allows both 'string' and "string". So you could do:

«This is a string that mentions the ” character without escaping it»

“This is a string that mentions the « character without escaping it”

Yes, there are still some edge cases, like embedding both “ and « in the same string. But that's really rare.

PickNChoose 280 days ago

You don't want any 'rare' cases at all. That's the point.

Stop using "punctuation" when you are attempting to "delimit" text. Use a character that is not punctuation, specifically designed for "field delimiter" purposes.

Trying to do two things at once is ridiculous.

panic 279 days ago

If your text is always valid UTF-8, there are various illegal UTF-8 octets available for this purpose: 0xff, 0xfe, and so on. Unlike null terminators or record separator characters, these characters are guaranteed not to exist in your string by the UTF-8 validation code you're already running.
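A minimal Python sketch of that idea, assuming every field really is valid UTF-8. `pack` and `unpack` are hypothetical names; the byte 0xFF is used as the delimiter because it never occurs in well-formed UTF-8.

```python
DELIM = b"\xff"  # cannot appear in well-formed UTF-8


def pack(fields: list[str]) -> bytes:
    """Join UTF-8 encoded fields with the 0xFF delimiter byte."""
    return DELIM.join(f.encode("utf-8") for f in fields)


def unpack(blob: bytes) -> list[str]:
    """Split on 0xFF and decode; invalid UTF-8 raises UnicodeDecodeError."""
    return [part.decode("utf-8") for part in blob.split(DELIM)]


fields = ["a,b", "tab\there", "line\nbreak", "\u00abquoted\u00bb"]
assert unpack(pack(fields)) == fields
```

The result is a byte stream rather than text, so it has to be split before it is decoded.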

diamondo25 280 days ago

I've been trying to push TSV (tab separated values) as a standard response/implementation when they ask for CSV. "Yes but it's comma separated!", sure is, but text can contain commas... I have seen issues with Google Spreadsheets not recognizing the tabs however... Excel doesn't know what to do with a TSV either. But both have a complete wizard for parsing CSV...

ComputerGuru 279 days ago

> but text can contain commas

Erm.. text can contain tabs, too. This problem was solved so, so long ago when all the various ANSI/ASCII/whatever encodings were compiled by specifically reserving not one but two characters precisely to serve as field and record delimiters.

0x30 and 0x31 solved not only the problem of having commas or tabs in your text preventing you from treating them as field delimiters, but also allowing you to include new lines and carriage returns in your fields, too!

0x30 is the record separator (aka, well, the record separator) and 0x31 is the unit separator (aka field delimiter).

I _believe_ there was a record key on some standardized keyboard layout back in the day, too.

Edit: sorry, they are decimal, not hex. Thanks @jrochkind1

jrochkind1 279 days ago

I find no matter what you do you will _sometimes_ need escaping. There will, eventually, come a time when you want to embed an ASCII 30 (0x usually means hex, it's actually decimal code 30, but hex 0x1E) RS Record Separator in some record delimited by 30 RS. So you'll need some method of escaping anyway. Or it'll be annoying.

I have spent some time working with MARC 21 binary encoding (used for library cataloging records) which uses ASCII 0x1D, 0x1E, and 0X1F as delimiters. I would def not call it appreciably more _convenient_ than a more modern 'text' record format. If it has benefits, convenience isn't really one of them.

nitrogen 279 days ago

I think it's common to use ESC (0x1b) and then set the high bit on the next byte, so ESC itself would be sent as 0x1b, 0x9b.
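A Python sketch of that escaping scheme, under the assumption that the only bytes needing protection are ESC and the two separators. Setting the high bit of 0x1b gives 0x9b, so an escaped ESC travels as 0x1b 0x9b. The function names are made up for illustration.

```python
ESC, RS, US = 0x1B, 0x1E, 0x1F
SPECIAL = {ESC, RS, US}


def escape(field: bytes) -> bytes:
    """Replace each special byte with ESC followed by that byte with bit 7 set."""
    out = bytearray()
    for b in field:
        if b in SPECIAL:
            out += bytes([ESC, b | 0x80])
        else:
            out.append(b)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Undo escape(): after each ESC, take the next byte and clear bit 7."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            out.append(next(it) & 0x7F)
        else:
            out.append(b)
    return bytes(out)


payload = b"a\x1eb\x1fc\x1bd"
assert unescape(escape(payload)) == payload
```

After escaping, the separator bytes no longer appear inside a field, so a reader can split on them first and unescape afterwards.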

Pxtl 279 days ago

Yes, but at that point the text file is basically binary - it contains exotic characters that confuse most text editors and can't be typed.

I know XML et al are frustrating, but I'd rather see them than a "creative" solution. It seems like 60% of the reason we still have to deal with archaic flat formats is support for Excel.

alanh 279 days ago

I have to suspect the fatal flaw is that these code points don’t look like anything and can’t be found on the keyboard.

Granted, that’s the whole point, but it also makes authoring and instruction harder. (And we all know how many programmers are really just competent copy-pasters.)

taftster 279 days ago

Right. And if you add 28 (file separator) and 29 (group separator) to the mix, then you have a whole set of nice options to concatenate multiple data files into a single stream, etc.

coldtea 279 days ago

There is actually an ASCII character that doesn't appear in strings, and is meant to be used as such a separator. Actually two of them, record (30) and unit separator (31).
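For illustration, a minimal Python sketch of a file format built on those two control characters (record separator, decimal 30, hex 0x1E; unit separator, decimal 31, hex 0x1F). `dump` and `load` are hypothetical names, and the sketch does no escaping, so it assumes neither character occurs inside a field.

```python
US = "\x1f"  # ASCII 31, unit separator: goes between fields
RS = "\x1e"  # ASCII 30, record separator: goes between records


def dump(rows):
    """Serialize a list of rows (lists of strings) into one string."""
    return RS.join(US.join(fields) for fields in rows)


def load(text):
    """Parse a string produced by dump() back into rows."""
    return [record.split(US) for record in text.split(RS)]


rows = [
    ["Homer Simpson", "742 Evergreen Terrace,\nSpringfield"],
    ['Bart "El Barto" Simpson', "742 Evergreen Terrace,\nSpringfield"],
]
assert load(dump(rows)) == rows
```

Commas, quotes, tabs and newlines all pass through untouched, which is the appeal; the price is a file most editors cannot display or type.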

slavik81 279 days ago

Sadly, eventually someone will want to enter one document as a field in another document and then you end up needing escaping anyways. Using a rare symbol for the delimiter would still be nice for typing documents by hand, but it would have to be available on modern keyboards to be convenient.

coldtea 279 days ago

>Sadly, eventually someone will want to enter one document as a field in another document and then you end up needing escaping anyways.

Yes, but CSV files are record collections, they are not in 99% of cases recursive like that.

If a column contains escaped secondary documents, there's something wrong.

laumars 279 days ago

Excel will convert any tabulated text file into a spreadsheet regardless of the delimiters, or even lack of, as you can set which character(s) to delimit by or even just go by column numbers for tables of fixed widths. This is actually one of the few things Excel gets right with regards to CSV files as I've found it a horrid tool if you need to save any changes and preserve the original formatting of the CSV file (even the data itself gets altered!!)

Also most CSV parsers support quotation marks and escaping to get around the comma and new line et al problems. eg:

"Homer Simpson", "742 Evergreen Terrace,\nSpringfield"
"Bart \"El Barto\" Simpson", "742 Evergreen Terrace,\nSpringfield"

Granted it's not the prettiest and some spreadsheets really love to break the formatting upon save (cough Microsoft Excel cough) but it does work.

As a side note, the best spreadsheet I've found for manipulating CSV data without breaking the formatting upon saving was OpenOffice Calc. This was a few years back before the LibreOffice fork was created as I've thankfully not needed to deal with CSV files large enough to warrant a full blown spreadsheet editor, but I would assume LibreOffice Calc would behave the same.

ygra 279 days ago

Homer Simpson,"742 Evergreen Terrace,
Springfield"
"Bart ""El Barto"" Simpson","742 Evergreen Terrace,
Springfield"

(Omitted optional quotes for fields that don't need them). Quotes are escaped with "", and line breaks don't need escaping, they just have to be in a quoted field. And there is no space after a comma, unless you want that space to be part of the field's value.
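Python's standard `csv` module follows these same conventions with its default dialect, so it can serve as a quick check. The sketch below writes the two example rows and reads them back; the variable names are arbitrary.

```python
import csv
import io

rows = [
    ["Homer Simpson", "742 Evergreen Terrace,\nSpringfield"],
    ['Bart "El Barto" Simpson', "742 Evergreen Terrace,\nSpringfield"],
]

# newline="" keeps the csv module in charge of line endings.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)  # default dialect quotes only when needed
text = buf.getvalue()

# Reading it back recovers the embedded newline and the inner quotes.
parsed = list(csv.reader(io.StringIO(text, newline="")))
assert parsed == rows
print(text)
```

The writer doubles the inner quotes, leaves the line break inside the quoted field, and ends each record with CRLF.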

laumars 279 days ago

Thanks for the correction regarding escaping, but I think you went a little overboard on the other alterations:

> Omitted optional quotes for fields that don't need them

I think it's good policy to always wrap your contents in quotes regardless of whether you have a delimiter that needs quoting. And in fact many CSV marshallers will do just this.

> And there is no space after a comma

That was added purely for readability on HN. I agree it's not how you'd normally marshal the contents.

emmelaich 279 days ago

If it's for my own programs, I use pipe (|) separated values. They're visually appropriate and even less likely than tabs.

spc476 279 days ago

ASCII does contain control characters set aside for record and unit separators (codes 30 and 31 respectively). Sometimes I wish they got more use than they do.

Pxtl 279 days ago

Except pipe is really easy to get as a typo. It's right next to the enter key. And then you're dealing with escaping characters and before you know it you've rolled your own file format.

Been there. Use a lib that implements a documented standard, even a bad one. Only problem is Excel, which basically standardizes on CSV and occasionally mangles your data into malformed dates because reasons anyways.

jl6 279 days ago

What's the problem you're having with Excel reading TSV? Works fine here.

Avshalom 279 days ago

Most *SV importers will actually accept any character as the delimiter, it's just people insist on believing CSV is utterly trivial and thus not worth using a real library for it.
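As one example of such a library, Python's `csv.reader` takes the delimiter as a parameter, and the quoting rules keep working for whatever character is chosen. `read_delimited` is a hypothetical wrapper and the sample data is invented.

```python
import csv
import io


def read_delimited(text: str, delimiter: str):
    """Parse delimiter-separated text with the standard csv reader."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Tab-separated, with a tab inside a quoted field.
tsv_rows = read_delimited('name\tnote\nHomer\t"donuts,\tbeer"\n', "\t")

# Pipe-separated, with a pipe inside a quoted field.
psv_rows = read_delimited('name|note\nHomer|"a|b"\n', "|")

print(tsv_rows)
print(psv_rows)
```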

xelxebar 279 days ago

> You would still need an escape mechanism for ...

I think this is actually desirable, since in your case the escape denotes different semantics. The unescaped pairs act like quotation operators while the escaped version is a character literal.

dragonwriter 280 days ago

Also Ruby:

%q{This is a string with an %q{embedded quote}.}

stephengillie 280 days ago

Powershell: "This is a string with an 'embedded quote'."

It's helpful to remember that quotes will interpret the variables inside, while apostrophes will not. Very useful for scripting the creation of scripts. Example:

"It is $time" > It is 15:22
'It is $time' > It is $time
"'$time' is $time" > '$time' is 15:22

dragonwriter 279 days ago

That's not a string where the same thing used for quoting is used inside the string without escaping, nor is it an example of the distinct begin-vs-end quote pairs approach under discussion.

But, yes, having single and double quoted strings is another way to avoid escaping (which Ruby and a number of other languages discussed as supporting the approach being discussed also support.)

alanh 279 days ago

not sure why you were downvoted. Your comment is relevant and the convention can be useful. (PHP worked exactly the same way as your first two examples, although the third would have produced <'15:22' is 15:22>.)

zaxomi 280 days ago

Actually, ASCII has a mechanism for solving the problem that you describe, with control codes FS, GS, RS and US.

jiggunjer 279 days ago

I disagree. Sure it might regex better, but my typing speed and typo rate would be much worse if I had to type separate open and close quotes for all my strings.

coldtea 279 days ago

>All this horrible complexity could have been avoided if we could just write

Only if there was no chance of unbalanced quotes to need to be in the string.

mort96 279 days ago

You'd still have escape sequences for those cases.

tome 280 days ago

If ASCII had balanced quotes then they would be used by programming languages to delimit strings and we would be back to square one with regards to escaping them!

hk__2 280 days ago

You don’t need escaping in «This is a string containing an «embedded» quoted string».

cassowary 279 days ago

«Hi. How do I open a quote?»

«Oh, you just use the « character.»

Parse error. Unexpected EOF.

hk__2 278 days ago

That wouldn’t be a good idea but you could adapt your parser to support that case. You can write /* /* */ in C, for instance.

To clarify: we’d still need escaping but in fewer cases.

tedunangst 280 days ago

Debatable whether that would actually work in practice.

/* nested /* comments */ don't work */

int_19h 279 days ago

They don't work in C, but that's an arbitrary decision made by its designers. There are many languages that have balanced comments that can be nested. In OCaml, for example, this is legal:

(* nested (* comments *) work *)

drdeca 280 days ago

I think it would make the parsing a tiny bit more inconvenient though

Pxtl 279 days ago

Yup, suddenly your parser needs to keep a count of how many nested strings deep it is.

munificent 279 days ago

It's not hard. Obviously the parser has to do that for things that aren't a single token, like parenthesized expressions.

Even for things where the nesting does happen during lexical analysis, it's pretty trivial to keep a count in your lexer. Lots of languages support nesting comment syntax or string interpolation, which both have equivalent difficulty.
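A rough Python sketch of what "keep a count" means for the «nested» string idea from this thread. `read_nested_string` is a made-up name, and the sketch has no escape mechanism, so an unbalanced quote character inside the string is an error.

```python
OPEN, CLOSE = "\u00ab", "\u00bb"  # the « and » characters


def read_nested_string(src: str, start: int = 0):
    """Read a literal opening at src[start]; return (contents, end index)."""
    if not src.startswith(OPEN, start):
        raise ValueError("expected opening quote")
    depth = 0
    for i in range(start, len(src)):
        if src[i] == OPEN:
            depth += 1
        elif src[i] == CLOSE:
            depth -= 1
            if depth == 0:
                # Matching close for the outermost open: literal is complete.
                return src[start + 1:i], i + 1
    raise ValueError("unexpected EOF inside string")


src = "\u00abThis is a string containing an \u00abembedded\u00bb quoted string\u00bb"
contents, end = read_nested_string(src)
print(contents)
```

Fed the unbalanced example from earlier in the thread («Oh, you just use the « character.»), the same function runs off the end of the input and raises the "unexpected EOF" error.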

userbinator 279 days ago

Don't forget that "parser" also includes human brains, which tend to not be that great at parsing nested things.

To use formal language theory, strings containing escape characters are regular, i.e. parseable with a finite-state machine. Allowing nesting means you need a stack to find the matching pair.

hk__2 278 days ago

It’s trivial to do; parsers handle nested things quite easily.

jrochkind1 279 days ago

ruby does that. computers are pretty fast now.

Someone 279 days ago

So, how would you encode this in a string:

To end a string, use the » character.

?

hk__2 278 days ago

You escape it. What I was saying was nested strings wouldn’t need escaping.

vvanders 280 days ago

Yeah, seen a ton of tools that auto-format to left/right quote automatically but then output ASCII and mangle the conversion.

agumonkey 280 days ago

let's rewrite social idioms to use < > as quotes.

Symbiote 280 days ago

«These characters» are the usual way of quoting in several languages.

Pxtl 279 days ago

Well then it's a good thing c used its ASCII equivalent that's accessible on anglophone keyboards for bit shifting and so if any programming language tried to use <<strings>> you'd have c grognards screaming about lshift and rshift.

cgtyoder 279 days ago

> The fact that ASCII does not have balanced quotes is one of the great catastrophes of computing.

Okay.

peapicker 280 days ago

I'm pretty sure text like:

``quoted''

is how you're supposed to write short quotes in the TeX/LaTeX typesetting system.

[edit: My point being that the author seems to think this type of quoting originated with X11... which is actually newer than TeX (X11 was first released in 1984), and that the prevalence of this type of quoting likely originated with TeX when it was released in 1978... which isn't mentioned at all in the article. In fact, since TeX/LaTeX is what all the CS, Physics, and Math types were using for journal articles, it is likely the X11 font bitmap glyphs were intentionally shaped like curly quotes to make editing your TeX source files prettier.

At least, that's how I remember it...]

leephillips 280 days ago

Interesting historical note. Of course that's TeX input, and the author using it knows that it will be interpreted in TeX's special way and the correct characters used in the typeset output. Also, with the current Unicode-aware TeX engines, you can just input the normal Unicode quotation marks. That makes your source easier to read.

ams6110 280 days ago

Yeah but that gets rendered as the proper quote glyph in the final document.

gbacon 280 days ago

Another giveaway of a TeX-savvy writer out of water is when you see --- for em-dash, i.e., ‘—’.

gbacon 280 days ago

Does HN markdown understand &mdash; or &lsquo;?

EDIT: Nope.

koolba 280 days ago

It doesn't need to. You can put the mdash directly in comments: —

As opposed to regular dash: -

PeterisP 280 days ago

Well, since my keyboard doesn't have an mdash key, if there's no support for something like &mdash; or --- then I can't use mdashes.

At least on the US keyboard, Mac OS lets you input all three dash types.

Hitting the button to the right of [0] outputs a hyphen ("-"). Holding alt/option when hitting it will output an en dash ("–"), holding both alt/option and shift will output an em dash ("—").

- = -

⌥- = –

⇧⌥- = —

pluma 279 days ago

If you're on Linux (maybe macOS supports this too?) you can open the keyboard settings and turn one of several keys into the Compose/Multi key (I picked CapsLock because I rarely hit it on accident and don't use it). Then you can type all kinds of weird combinations:

https://www.x.org/releases/X11R7.7/doc/libX11/i18n/compose/e...

For example, em dash is Compose+minus+minus+minus, en dash is Compose+minus+minus+period.

It's also useful for foreign glyphs like Compose+a+a for the Nordic å, Compose+s+s for German ß (with Shift it becomes ẞ, which the new German orthography rules officially recognise!), Compose+quote+<vowel> for the various umlauts and Compose+i+period for the dotless ı (with Shift it becomes the dotted İ).

yorwba 279 days ago

The best way to enter arbitrary Unicode characters on Linux I have found is fcitx's https://fcitx-im.org/Unicode

Just press Ctrl-Alt-Shift-U, type (part of) the name of the character and select from the list of results.

— (em dash), ︱ (presentation form for vertical em dash), ⤐ (rightwards two-headed triple-dash arrow), ﷽ (arabic ligature bismillah ar-rahman ar-raheem) are all easy to enter.
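The same lookup by name is available from Python's standard `unicodedata` module, which can be handy in scripts. `find_chars` is a hypothetical helper that scans only a small code point range to keep the sketch quick.

```python
import unicodedata

# Exact lookups in both directions.
em_dash = unicodedata.lookup("EM DASH")
quote_name = unicodedata.name("\u201c")


def find_chars(fragment: str, limit: int = 5):
    """Return up to `limit` (char, name) pairs whose name contains fragment."""
    fragment = fragment.upper()
    hits = []
    for cp in range(0x3000):  # small range only, to keep the scan fast
        name = unicodedata.name(chr(cp), "")
        if fragment in name:
            hits.append((chr(cp), name))
            if len(hits) == limit:
                break
    return hits


print(em_dash, quote_name)
print(find_chars("em dash"))
```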

Tomte 279 days ago

I'm using AutoHotKey on Windows. Not only for this, but for other things, as well.

For example (<^>! means the AltGr key):

<^>!4::Send „
<^>!5::Send “
<^>!2::Send ‚
<^>!3::Send ‘
<^>!+6::Send “
<^>!+7::Send ”
<^>!+8::Send ‘
<^>!+9::Send ’
<^>!-::Send –
<^>!.::Send …

Also to use Caps Lock as another Control key:

Capslock::Ctrl

Or make windows stay on top of others, even if they lose focus:

<^>!t::Winset, Alwaysontop, TOGGLE, A

dghf 280 days ago

Fire up vim, use a digraph (^K followed by the two relevant characters -- I believe em dash is -M -- in insert mode), copy & paste.

leephillips 280 days ago

Nobody's keyboard had an em-dash key (well, maybe some compositor keyboards...). It's well worth the time to figure out how to enter at least the most useful Unicode characters. See, here's an em-dash: —.

Symbiote 280 days ago

Linux users should enable the compose key, it's very useful.

Test it with:

Then press the menu/compose key (next to right control), then C, then =. You get €.

Try compose, 1, 2 for ½.

Compose, ^, 3 for ³.

Compose, A, : for Ä.

It's pretty intuitive for the most useful characters, and easily the fastest way I have of typing the ö, ñ and å in various colleagues' names.

jgtrosh 280 days ago

Actually, A: is not valid with the default compose bindings (it is however valid with vim digraphs). In general umlauts or trémas are inserted with the double quote (A").

YSFEJ4SWJUVU6 280 days ago

Worth noting that international layouts (with dead keys) support many of those without the use of compose key (for obvious reasons not your first example – but the € symbol has its own key combination at least where it's commonly used). I admit it might be hard to adjust to caret being a dead key, though.

Symbiote 279 days ago

Dead keys support accents, and a few symbols printed on the keyboard are discoverable, but I find the Compose key is much more intuitive for occasional use.

I don't yet speak the language of my adopted country, so it's better for me to keep []{} etc where I like them in the British layout, and use three keypresses for typing the ø in a (place) name like København.

If I do end up typing lots of Danish, I'll probably map AltGr+A,E,O to Å, Æ, Ø. É is rare, so I'll still use the Compose key for that and German / Swedish names.

pluma 279 days ago

I use the default German keyboard layout. Even with dead keys I wouldn't have Nordic characters or the Turkish dotless i and dotted I. I love the Compose key. It's actually making me consider using the US layout because the lack of German glyphs was what was keeping me back.

ComputerGuru 279 days ago

I use the code behind this web app: http://latex2unicode.herokuapp.com/

It converts LaTeX to Unicode where possible. It's pretty impressive how much of LaTeX can be replaced with Unicode today.

This program translates LaTeX markup to human-readable Unicode when possible.

Here's the default text from that webapp:

Basic math notations: ∵ A͡B + B͡C ≠ A͡C ∴ ∬∜x̅ ξᶿ⁺¹ - ⅜ ≤ Σ ζᵢ ∴ ∃x∀y x ∈ Â

Easily type in hundreds of other symbols and special characters: ℵ, Œ, ⇊, etc.

Font styles support: 𝔹𝕝𝕒𝕔𝕜 𝔹𝕠𝕒𝕣𝕕 𝔹𝕠𝕝𝕕, 𝔉𝔯𝔞𝔨𝔱𝔲𝔯, 𝐁𝐨𝐥𝐝 𝐅𝐚𝐜𝐞, 𝓒𝓪𝓵𝓵𝓲𝓰𝓻𝓪𝓹𝓱𝓲𝓬, 𝐼𝑡𝑎𝑙𝑖𝑐, 𝙼𝚘𝚗𝚘𝚜𝚙𝚊𝚌𝚎.

Now type in this box and try it yourself. ⌣̈

alanh 279 days ago

Careful — some of the Unicode pseudoalphabets won't render on mobile.

ComputerGuru 278 days ago

No problem on iOS; I see the same thing that I see on my Windows desktop.

mark-r 280 days ago

If I need a special character, I find a web page or document that has it and use copy/paste.

jiggunjer 279 days ago

I have an A4 pinned next to me with all the Windows Alt codes. Old-school methods are fastest and I subconsciously learn them by heart over the years.

mark-r 279 days ago

Nice idea. I don't need them often enough to bother. Do you have a source for the chart?

devindotcom 280 days ago

Yeah. Can't count how many times I've just googled "yen" or "e with accent" because I can't remember the shortcut.

fish_fan 280 days ago

Even then the character palette built into macOS is much more convenient for searching, saving, and inputting.

mark-r 280 days ago

The character palette built into Windows uses too small a font, I find it almost useless. And the size is not adjustable. I've been tempted on more than one occasion to write my own.

jiggunjer 279 days ago

Keyboards and typing aids can enter any character if you configure them properly.

alanh 279 days ago

Not that HN needs more pedantry, but HN’s lightweight markup format is not in any sense a Markdown. I believe that literally the only thing they have in common is that a single set of asterisks yields italics. Bolding, headings, code, lists, quotes, links, etc. don’t transfer from one to the other.

fish_fan 280 days ago

That's because single quotes used to be rendered as a right single quote (as you might have in a contraction), and the backtick was angled much less aggressively. That is, it looked much more natural at the time.

emmelaich 279 days ago

Yeah, I think the motivation for ' is for markup too. I'm pretty sure they've been recommended in GNU info and groff for that reason.

ISL 280 days ago

It is, and denotes opening and closing quotes.

dheera 280 days ago

I always hated this horrible inconsistency.

\left( \right)
\left{ \right}
`` ''

Why not

\left" \right"
\left' \right'

Better yet, make it completely DRY:


\{ \}
\`` \''
\` \'

dungle6 280 days ago

The author (Markus Kuhn) almost certainly knows about this (I assume so based on how prolifically he was publishing tech documents nearly 20 years ago). But the use of the grave accent is sort of a different problem, since it is merely input to the TeX language that is then interpreted and typeset correctly. I do wonder about your story of X11, though, since when X11 was released in 1984 LaTeX was not a thing and TeX did not have quite the widespread usage it had through the 90s. troff also uses grave accents, for what it is worth.

darkengine 280 days ago

MS PGothic, a very common font in Japan, still uses this type of quote. "Quoting like this'' (double quote, then two single quotes) looks the most natural in this font. "Using two double quotes" looks quite odd (see screenshot) [1]

If you've ever seen an English-language page on a Japanese website that used weird quotes, this is probably why.

boondaburrah 280 days ago

Ah, the old dead giveaway "this game was translated from Japanese and we CBA to handle localisation properly" fonts.

zeratax 275 days ago

So I just saw parentheses like this "(）" on Japanese Twitter

it starts with a regular parenthesis "(" but ends with a fullwidth parenthesis "）"(U+FF09)

is this a similar thing?

ttepasse 279 days ago

The usage of an accent as syntax in markup and programming languages annoys me to no end. And it is still being introduced to this day; the latest example is template strings in JavaScript.

• It is semantically idiotic because it's an accent, not a character.

• It is visually annoying because you almost can't see the thing.

• It is bad for usability, because on non-US keyboards the accents are implemented as dead keys. Yes, accent + space gives you the character but that's really unintuitive for people who grew up expecting accents only over letters.

evincarofautumn 279 days ago

Same, I’ve never cared for it. For these reasons I’ve decided to take a stand and avoid using the grave accent for anything in a programming language I’m working on. Same goes for the dollar sign, because it’s somewhat Americentric, and as a currency character it doesn’t have any great semantic or mnemonic value except for, well, currency units. I guess you could argue for $trings (BASIC) or $calars (Perl) if you have $igils, but I don’t.

Sacrificing these bits of ASCII is fine by me, because the language is small enough, and I also allow Unicode. For example, curved quotes are allowed and can be nested or contain ASCII quotes without escaping:

// Character literals
‘'’
=
'\''

// Text literals
“Some "text" with “curved quotes”.”
=
"Some \"text\" with “curved quotes”."

For the sake of usability, of course, everything in the core language & standard library has an ASCII spelling, like in Perl6. I’d like for other languages to adopt this view as well. If new languages allow proper Unicode notation in some sensible places, then programming editors’ input methods will catch up, e.g., automatically replacing “->” with “→” or “\theta” with “θ” (like Emacs’ TeX input mode).

Also, does anyone know of a reference for keyboard layouts from around the world that includes estimates of the number of people using them? I’ve tried to keep things relatively easy to type on all the major layouts I know of, but I don’t want to alienate anyone if I can help it.

int_19h 279 days ago

> Same goes for the dollar sign, because it’s somewhat Americentric, and as a currency character it doesn’t have any great semantic or mnemonic value except for, well, currency units.

By that metric, wouldn't & be too Anglo-centric, and # be too Euro-centric? There are layouts out there on which neither is readily available.

evincarofautumn 279 days ago

I suppose so. It’s just one of the many small judgement calls you make when designing a language, and definitely falls into the category of “design” more than engineering or science. At some point I decided that grave and dollar were out, while ampersand and octothorpe are in. And you can still define a dollar-sign operator if you want, it’s just not in the core language or standard library.

English is the lingua franca of programming, so it’s hard to avoid some Anglicisms (like ampersand meaning “and”, dot instead of comma for decimals, and English-language keywords) without going against strong precedents set by other languages. If I really wanted to be pedantic, I might use /\ and \/ for logical “and” and “or”—those spellings are the major reason that the backslash even exists in ASCII.

sjy 279 days ago

If anything, & is Latin rather than English :-)

garou 280 days ago

It's very odd for me to see the grave accent (`) as a quoting mark in bash and other programming languages. I understand that the accent on its own loses its function in human language. But it is still uncomfortable to see an accent used as a delimiter for a string.

unkown-unknowns 280 days ago

Well, we also use the dollar sign to signify a variable in bash and PHP even though we aren't talking about an amount of USD. Likewise we use single and double quotes to mean special things.

HTML tags have nothing to do with less than or greater than, yet here we are.

In conclusion, it's simply convenient to use the standard keys we have and to use the symbols that are on it to mean something different from their original meaning in order to be able to express ourselves succinctly so that we don't have to spend so much time typing as we'd otherwise have to.

Of course you could always buy yourself an APL keyboard and write your programs in APL and use an APL REPL as your command line instead of using bash ;)

sp332 280 days ago

I'm not even sure why ASCII has a grave accent. There are no combining marks so you could never write it over another letter.

Edit: I forgot HTAB was actually part of ASCII. Oh well!

electroly 280 days ago

On a teletype, ALL characters are combining marks because you can backspace (another ASCII character derived directly from teletype codes) and type another character overtop it.

reaperducer 279 days ago

Are you old enough to remember when you printed your code on a teletype machine that blank spaces were represented by a "b" with a slash through it? I hated that.

Even worse, I remember one shop where the teletypes didn't have question marks, so people used capital P's instead.

saint_fiasco 280 days ago

In ASCII instead of combining marks what you have to do is write three characters:

* The unaccented letter

* The backspace character

* The accent character

If this makes no sense to you, try to imagine a literal, physical typewriter. Windows line terminators also work with a similar principle.
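A rough Python sketch of that letter, backspace, accent sequence. The accent table and the function name `render_overstrike` are made up for illustration; a real teletype just printed one glyph on top of the other, whereas this maps the ASCII accent to a Unicode combining mark and composes the result.

```python
import unicodedata

# Hypothetical mapping from ASCII spacing accents to Unicode combining marks.
OVERSTRIKE = {
    "`": "\u0300",  # combining grave accent
    "'": "\u0301",  # combining acute accent
    "^": "\u0302",  # combining circumflex accent
    "~": "\u0303",  # combining tilde
    '"': "\u0308",  # combining diaeresis
}


def render_overstrike(text: str) -> str:
    """Turn 'letter, backspace, accent' runs into accented letters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        is_overstrike = (
            ch == "\b"
            and out
            and i + 1 < len(text)
            and text[i + 1] in OVERSTRIKE
        )
        if is_overstrike:
            # Attach the combining mark to the previously emitted letter.
            out[-1] += OVERSTRIKE[text[i + 1]]
            i += 2
            continue
        out.append(ch)
        i += 1
    # NFC folds "e" + combining acute into the single character "é".
    return unicodedata.normalize("NFC", "".join(out))


print(render_overstrike("cafe\b'"))  # café
```

Any backspace that is not followed by one of the listed accents is passed through untouched.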

cesarb 280 days ago

From what I recall from my childhood, physical typewriters worked slightly differently: the accent keys were non-advancing ("dead") keys. You pressed the "acute" key followed by the "e" key for an é, for instance. If you wanted a bare accent, you pressed the accent key followed by the space bar.

(The typewriters I recall also didn't have a 0 or 1 key, you used uppercase O or I for these numbers.)

kps 280 days ago

Yes. I consider it a mistake of Unicode that combining characters follow rather than precede the base character. If they preceded, most dead keys could simply generate the appropriate combining character, rather than requiring complicated input method support. (And finding the end of a sequence of multiple combining characters wouldn't require lookahead.)
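To make the ordering concrete, here is a small Python sketch using the standard `unicodedata` module. The function name `clusters` is mine, and this is not a full grapheme-cluster algorithm; it only groups each base character with the combining marks that follow it after NFD decomposition.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    result = []
    for ch in unicodedata.normalize("NFD", text):
        # combining() returns a non-zero class for combining marks.
        if unicodedata.combining(ch) and result:
            result[-1] += ch
        else:
            result.append(ch)
    return result


for cluster in clusters("\u00e4b"):  # "äb"
    print([f"U+{ord(c):04X}" for c in cluster])
# ['U+0061', 'U+0308']
# ['U+0062']
```

Note that the loop only knows a cluster is finished once it has seen the next base character (or the end of the input), which is the lookahead being complained about.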

moron4hire 279 days ago

The Unicode way makes sorting easier. Your way would require special knowledge about the characters to know that ä should sort directly after a, rather than directly before ë.

jrochkind1 279 days ago

In fact this requires special language-specific knowledge anyway (which Unicode actually provides in some tables and algorithms). In some languages ä should sort exactly as if it were 'a': "aa", "äb", "ac". In others it should sort as a distinct letter (but not necessarily between 'a' and 'b'). Different Latin-alphabet languages sort differently, and I'm not sure that exact UTF-8 (or UTF-16 or UTF-32) byte ordering is actually appropriate collation in any Latin-alphabet language.

But I do suspect it had something to do with ASCII compatibility, I don't recall what. Very little of Unicode is accidental, there's usually some reason for whatever is in it.

cassowary 279 days ago

some languages even sort

aa ah az ba bh bz ca cz ch

treating "ch" as a single letter that comes between c and d.

or

ab ah az b c .... z aa

treating aa the same as a separate letter at the end of the alphabet.

Then there's other rules for sorting that aren't directly alphabetic, like that names beginning with "Mc" should be treated as "Mac" or "St " as "Saint ".

"10 cats" should sort after "2 cats", not before it.

Anyone who tries to sort by just numeric ordering is doing it wrong.
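As a small illustration of the "10 cats" case only, here is a Python sketch of a natural-sort key. The name `natural_key` is mine, and this deliberately ignores the language-specific rules above ("ch", "aa", "Mc"), which need real collation tables rather than a few lines of code.

```python
import re


def natural_key(s: str) -> list:
    """Sort key that compares runs of ASCII digits as numbers."""
    # re.split with a capture group alternates: text, digits, text, digits, ...
    parts = re.split(r"([0-9]+)", s)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


items = ["10 cats", "2 cats", "1 cat"]
print(sorted(items))                   # ['1 cat', '10 cats', '2 cats']
print(sorted(items, key=natural_key))  # ['1 cat', '2 cats', '10 cats']

# Plain code point order also puts "äb" after "ac", which is wrong
# for languages that sort ä together with a.
print(sorted(["ac", "\u00e4b", "aa"]))  # ['aa', 'ac', 'äb']
```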

kalleboo 279 days ago

Except not really, since in Swedish, it's sorted xyzåäö, not aåäbc. Also, it used to be that w and v were equivalent sorting-wise and you'd mix them together.

eesmith 280 days ago

I believe that depended on the manufacturer and country convention. Most US keyboards didn't have an accent character. For example, here's one from the 1950s:

For acute or umlaut you could use a + backspace + ' or u + backspace + " (or the opposite order). For grave or circumflex, I don't think there was a solution. Write it in by hand?

cassowary 279 days ago

When I was in school back in the 1990s, that was certainly the approach taken for the Vietnamese edition of the school newsletter.

mxfh 280 days ago

′ PRIME (U+2032)

″ DOUBLE PRIME aka inch mark (U+2033)

have their own codepoints

http://practicaltypography.com/foot-and-inch-marks.html

which describes implications for typesetting coordinates and other things:

118° 19′ 43.5″

118° 19’ 43.5” wrong (curly quotes, although it renders identical in some fonts)

118° 19' 43.5" right
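If you generate coordinates programmatically, a sketch like the following emits the prime characters directly instead of relying on whatever the keyboard produces. The helper name `format_dms` is hypothetical; the code points and character names are the ones listed above.

```python
import unicodedata

DEGREE = "\u00b0"        # DEGREE SIGN
PRIME = "\u2032"         # PRIME, used for arcminutes and feet
DOUBLE_PRIME = "\u2033"  # DOUBLE PRIME, used for arcseconds and inches


def format_dms(degrees: int, minutes: int, seconds: float) -> str:
    """Format an angle as degrees, minutes and seconds with prime marks."""
    return f"{degrees}{DEGREE} {minutes}{PRIME} {seconds}{DOUBLE_PRIME}"


print(format_dms(118, 19, 43.5))  # 118° 19′ 43.5″

# The look-alikes are different characters as far as Unicode is concerned.
for ch in (PRIME, "\u2019", "'"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+2032 PRIME
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+0027 APOSTROPHE
```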

treve 280 days ago

It just occurred to me how much easier certain text operations (like syntax highlighting, regular expressions and other parsers) would be if we consistently used the right Unicode symbols for quotes and apostrophes

mbrock 280 days ago

The only languages I know off the top of my head that use balanced delimiters for strings are M4 and Perl 6.

Hey, imagine being able to nest strings without escaping! What a concept!

chaosfox 280 days ago

Perl does that as well, and you can even choose the delimiters you wanna use:

> For the constructs except here-docs, single characters are used as starting and ending delimiters. If the starting delimiter is an opening punctuation (that is (, [, {, or < ), the ending delimiter is the corresponding closing punctuation (that is ), ], }, or >). If the starting delimiter is an unpaired character like / or a closing punctuation, the ending delimiter is the same as the starting delimiter. Therefore a / terminates a qq// construct, while a ] terminates both qq[] and qq]] constructs.

kazinator 279 days ago

PostScript! It uses (...) for strings.

Nesting string literals without escaping is a somewhat poor concept, though. Firstly, what does that even mean? Given `abc `def' ghi', what is the string here? Is it abc def ghi or is it abc `def' ghi? Secondly, what if I want to just have an unbalanced ` character in the string data?

And every time I get in an argument with a poorly-escaped CSV file, I wish we had just used ASCII 28-31 as delimiters. (File, Group, Record and Unit Separator)
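For what it's worth, a minimal Python sketch of the separator idea, using only Record Separator (ASCII 30) and Unit Separator (ASCII 31). The function names `dump` and `load` are made up here, and this is not any standard file format; it only shows that commas, quotes and newlines need no escaping once they stop being delimiters.

```python
RS = "\x1e"  # Record Separator, ASCII 30
US = "\x1f"  # Unit Separator, ASCII 31


def dump(rows: list[list[str]]) -> str:
    """Join fields with US and records with RS; no quoting or escaping."""
    for fields in rows:
        for field in fields:
            if RS in field or US in field:
                # The one thing a field may not contain is a separator itself.
                raise ValueError("field contains a separator character")
    return RS.join(US.join(fields) for fields in rows)


def load(data: str) -> list[list[str]]:
    """Inverse of dump(); an empty string is read as 'no records'."""
    if not data:
        return []
    return [record.split(US) for record in data.split(RS)]


rows = [["name", "note"], ["D'arcy", 'said "hi", then left\nearly']]
assert load(dump(rows)) == rows
```

The obvious catch is that these control characters are hard to type and invisible in most editors, which is probably a large part of why CSV won.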

ratmice 280 days ago

FYI For a long time GNU coding standards prescribed using the grave accent, but this changed some years ago now

https://www.gnu.org/prep/standards/html_node/Quote-Character...

josteink 280 days ago

> Although GNU programs traditionally used 0x60 (‘`’) for opening and 0x27 (‘'’) for closing quotes, nowadays quotes ‘`like this'’ are typically rendered asymmetrically, so quoting ‘"like this"’ or ‘'like this'’ typically looks better.

Is this link saying I can quit using `QUOTES' in my Emacs documentation? That style always struck me as odd :)

13of40 279 days ago

CSB: Years ago I was working on a team that developed a scripting language and we had this recurring problem where someone would write up a code sample in a Word document and it would break if you cut and pasted it because all of the single and double quotes would be Unicode. My boss was this tough guy who tried to snap the whole team to a standard of strictly disabling that behavior in all of our Office applications, but I piped up and said maybe we should just make the language treat all of those characters like apostrophes and quotes.... I think around version 5 they finally made an API for doing proper anti-injection escaping because you pretty much needed a PhD to get it right due to all of the variations introduced by the extended characters.

Or ... use a text editor?

13of40 279 days ago

You know a lot of PMs who write specs in notepad?

jiggunjer 279 days ago

What bothers me about Unicode isn't that apostrophe (U+0027) is overloaded by having two semantic meanings ("apostrophe" or "single straight quote"), but that they exacerbate the confusion by recommending to overload "right single quote" (U+2019) to also mean apostrophe.

We now have two characters for apostrophe and extra ambiguity for processing correct right single quotes. Great job not breaking historical documents Unicode.

Loic 279 days ago

And now, imagine that your own name has an apostrophe in it. Like my family name. I can tell you, I have crashed many databases, and in 90% of the cases where people need to find my name in a database again, they end up asking for my address, because the clerk doing the data entry types a different character each time and they cannot match my name. Even state-level authorities are bad, really bad, at it.

kazinator 280 days ago

> Please do not use the ASCII grave accent (0x60) as a left quotation mark together with the ASCII apostrophe (0x27) as the corresponding right quotation mark (as in `quote').

Tell that to GCC:

/usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
Looks good to me, by the way.

> Where ``quoting like this'' comes from

I did it for a while out of a habit acquired from working with TeX. In TeX, it is the source code syntax for encoding quotes. Of course, it is lexically analyzed and converted to proper typesetting.

> If you can use only ASCII’s typewriter characters, then use the apostrophe character (0x27) as both the left and right quotation mark (as in 'quote').

It looks like shit in any font in which the apostrophe is a little nine, which is historically correct. What you want is a little "six" on one side and a "nine" on the other, or at least some approximation thereof. Even if the apostrophe is crappily rendered as a little vertical notch, it still pairs with a backwards-slanted `.

(The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)

> If you can use Unicode characters ...

then you should still stick to ASCII unless you have other good reasons to. ``Can'' is not the same thing as ``should'', let alone ``must''.

> For example, 0x60 and 0x27 look under Windows NT 4.0 with the TrueType font Lucida Console (size 14) like this:

The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.

Freak_NL 280 days ago

> then you should still stick to ASCII unless you have other good reasons to.

Why? Using non-ASCII Unicode characters acts like a nice canary for detecting character encoding issues. Besides, why would I purposely limit my text to ASCII? It doesn't even suffice for English, let alone almost any other language I use, including my native language Dutch, German, and Japanese.

kazinator 280 days ago

All sorts of reasons. Diagnostic printf message in some embedded firmware. Do you need to drag Unicode into it? Git log message. Ditto.

comex 279 days ago

> Diagnostic printf message in some embedded firmware. Do you need to drag Unicode into it?

Why not? The firmware itself would usually have no reason to care about the details of a diagnostic message's encoding, whether that be ASCII or UTF-8 - it can mostly just treat strings as bags of bytes. There might be some byte values that are special (nul terminator, % for printf, etc.), but UTF-8 is a superset of ASCII and represents extended characters using only bytes with the highest bit set, so there will never be 'false positives' of the special byte values. Other than that, the bytes can stay uninterpreted as they go over whatever serial port or diagnostic protocol the device is using, until they eventually show up on - most likely - some sort of terminal application on a modern computer, which probably supports UTF-8 already. So in most cases it should 'just work'.

Of course, there are situations where it won't just work, such as if the firmware needs to display the diagnostic message on a screen (by itself), but from what I've seen those are the minority.

edit: As for Git, what's wrong with people writing log messages in their language of choice? (Other than the social issue of it making it harder for English speakers to use the codebase.)
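The high-bit property mentioned above is easy to check for yourself. A small Python sketch (the function name `high_bit_only` and the sample message are mine): every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a byte-oriented scanner looking for `%`, NUL or newline never gets a false positive inside one.

```python
def high_bit_only(text: str) -> bool:
    """True if every non-ASCII character encodes to bytes >= 0x80 only."""
    for ch in text:
        if ord(ch) >= 0x80:
            if any(byte < 0x80 for byte in ch.encode("utf-8")):
                return False
    return True


# Hypothetical diagnostic message with a degree sign and curly quotes.
message = "temp=%d\u00b0C \u201cok\u201d\n"
encoded = message.encode("utf-8")

assert high_bit_only(message)
# The special bytes occur exactly as often as the special characters do.
assert encoded.count(b"%") == message.count("%")
assert encoded.count(b"\n") == message.count("\n")
assert b"\x00" not in encoded
```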

XaspR8d 280 days ago

Not specifically trying to weigh in on the overall conversation, but aren't git commands generally UTF-8?

> git commit and git commit-tree issues a warning if the commit log message given to it does not look like a valid UTF-8 string, unless you explicitly say your project uses a legacy encoding.

> git log, git show, git blame and friends look at the encoding header of a commit object, and try to re-code the log message into UTF-8 unless otherwise specified.

Zarel 279 days ago

> It looks like shit in any font in which the apostrophe is a little nine, which is historically correct. What you want is a little "six" on one side and a "nine" on the other, or at least some approximation thereof. Even if the apostrophe is crappily rendered as a little vertical notch, it still pairs with a backwards-slanted `.

> (The representation of apostrophe as a little vertical notch, I suspect, caters to literals in programming languages.)

"Historically", U+0027 has been used as all of an opening quote, a closing quote, an apostrophe, a prime symbol, an ʻokina, a modifier, etc.

So the historically correct thing is to render it as a vertical notch so it looks non-horrible in all these uses, and render U+2018 and U+2019 as the "little six" and "little nine" symbols.

You don't need to speculate what the representation caters to; the Unicode spec actually does explain this (see Unicode 9.0 Chapter 6 Section 2)...

> The idea that people should change their behavior because of which font is default on the Windows cmd.exe console is laughable.

So your alternative is to change behavior because of which font is default on a system from 1984 which no one uses anymore?

kazinator 273 days ago

Open any random book in the English language printed in the last 200 years.

All the apostrophes look like a little nine: in contractions like it's, and the possessive 's.

That's the character that was included in the American Standard Code for Information Interchange.

The glyph appearing in the standard looks like a little 9. It is denoted as "APOS" in parentheses. A reference to it is made in A6.8, calling it "apostrophe".

Wikipedia's (https://en.wikipedia.org/wiki/Apostrophe) page refers to a vertical notch glyph as a "typewriter apostrophe". The normal non-typewriter apostrophe looks like a comma.

Okina? That indicates a glottal stop in some languages none of which are English, and so which were understandably not represented in the American Standard Code.

Zarel 271 days ago

> That's the character that was included in the American Standard Code for Information Interchange.

Yes, and that's the character that was immediately overloaded to mean a whole bunch of other things, because ASCII only included 95 printable characters, and did not include a prime symbol, an 'okina, or a left single quotation mark.

For that reason, U+0027 is not an apostrophe anymore. As the only ASCII character that can be used for a long list of uses, it's been massively overloaded, which is why Unicode currently defines U+0027 as a typewriter apostrophe and U+2019 as a real apostrophe.

tedunangst 280 days ago

gcc will output fancy Unicode quotes if you set locale. This is of course even more fun if LANG is set incorrectly and you still have an 8 bit xterm; then the entire quoted string just disappears!

sengork 279 days ago

Things become really fun when you're trying to figure out why that command fails when you've copy/pasted it from another application window.

Often it's the quotes which have been silently (automatically) converted to a visually similar (but functionally incompatible) character variant.
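One way to find the culprit quickly is to list every non-ASCII character in the pasted text together with its Unicode name. A minimal Python sketch; the function name `find_non_ascii` and the sample command are just for illustration.

```python
import unicodedata


def find_non_ascii(command: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 0x7F
    ]


pasted = "echo \u201chi\u201d"  # what arrives after an editor "fixed" the quotes
for index, ch, name in find_non_ascii(pasted):
    print(index, ch, name)
# 5 “ LEFT DOUBLE QUOTATION MARK
# 8 ” RIGHT DOUBLE QUOTATION MARK
```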

Tepix 279 days ago

It seems that half of the people in this company use the wrong acute sign as an apostrophe instead of ' or ’. Unfortunately it's the half that creates presentations and talks to customers.

It looks terrible and to me it's a disgrace!

Example: it´s versus it's or it’s (the first one is wrong).

ygra 279 days ago

I've seen a café which had its name written in large, lit letters on the façade and it included the following gem: Cafe`. Yes, the wrong accent, and not even combined. Easy access to DTP tools (or even a word processor) for the typographically uneducated masses ends up with quite painful results sometimes.

mirimir 279 days ago

In another life, I analyzed enterprise data. Variation in quotation marks was a common problem. I mean, is it "D'arcy" or "D’arcy"? Sometimes, I think, people would mangle data in spreadsheets, with auto-correct on.
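A common workaround for this kind of data is to compare on a normalised key while leaving the stored value alone. A Python sketch, with the caveat that the list of "apostrophe-like" characters is my own guess at what turns up in practice, and `match_key` is a hypothetical helper name. The ʻokina (U+02BB) is left out on purpose, since it is a letter rather than an apostrophe.

```python
# Hypothetical list of characters that data-entry tools substitute for '.
APOSTROPHE_LIKE = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe
    "\u0060": "'",  # grave accent
    "\u00b4": "'",  # acute accent
    "\u2032": "'",  # prime
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def match_key(name: str) -> str:
    """Key for comparing names; not meant for display or storage."""
    return name.translate(_TABLE).casefold()


print(match_key("D\u2019arcy") == match_key("D'arcy"))  # True
print(match_key("D\u00b4Arcy"))                         # d'arcy
```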

alanh 279 days ago

While I can’t expect many to follow suit, I myself often type educated quotes and nice apostrophes. The macOS keyboard combinations (nearly-intuitive combinations of Option-(Shift)-[ and -] for “”‘’) have long been committed to muscle memory. And since nearly all (web) file formats seem to be UTF-8, the days of manually typing &ldquo; and friends are long, long gone.

Benefits of typing and using typographer’s quotes directly in your JS/JSON/HTML/source:

1. No backslashes or other escape sequences needed!

2. WYSIWYG

3. Retina screens and gorgeous modern fonts mean that your sloppy quotes will look extra bad if you just use ASCII quotes

tempodox 280 days ago

I would fain use the curly quotes if only Darwin's groff(1) wouldn't barf on them. For the time being, man pages for one still need to quote ``like this''.

anjbe 279 days ago

In troff you can escape “ ” ‘ ’ as \(lq, \(rq, \(oq, \(cq respectively.

If you’re writing manpages, though, you should be using the -mdoc macros (https://manpages.bsd.lv/mdoc.html), which have “Dq” and “Sq” macros that wrap the arguments in double and single quotes respectively.

robin_reala 280 days ago

brew install groff? That’ll get you 1.22.3 instead of the default 1.19.2.

gumby 280 days ago

I find it interesting that the article includes a German keyboard that doesn't include the proper ,,'' (or ,') quotation glyphs. However it does include grave and acute accents as well as French primary quotations (<< and >>) though not the secondary guillemets (quotation characters < and >) none of which are used in German text.

And of course I used ascii analogues to type these into HN :-(

YSFEJ4SWJUVU6 280 days ago

>And of course I used ascii analogues to type these into HN :-(

But why, though? To the best of my knowledge, HN supports unicode quite well, including the following quotes: »«›‹„“‚‘ (available with the help of AltGr and sometimes shift from keys y, x, v, b when selecting the German keyboard layout on my computer).

gumby 279 days ago

I'm using a travel laptop on a plane and it came with a US keyboard

rdtsc 280 days ago

> The Unix m4 macro processor is probably the only widely used tool that uses the `quote' combination as part of its input syntax; however, even that could be modified via changequote.

I remember staring for a long time at the file when I first saw an m4 macro. My brain was telling me, surely this has got to be a typo, but then everything worked as expected. Then I learned that's a proper way of quoting there.

timb07 279 days ago

It's a little bit off-topic since the article was primarily about quotation marks and coding, but it would have been good if it mentioned that an ʻokina (as found in "Hawaiʻi") is neither an apostrophe nor a left quotation mark.

https://en.wikipedia.org/wiki/%CA%BBOkina

dmitriid 280 days ago

It's worse for other languages. Russian quotation marks are « and ». Thanks to early computers being predominantly from/designed in the US, they are now hijacked by American quotes.

Same probably goes for French and other languages with their own sets of quotation marks.

ansgri 280 days ago

«Russian» quotation marks are actually the « French » ones with different spacing. There's another, less used set of quotes in Russian, so called „German“ ones (used as inner quotes and in handwriting). English quotes are widely accepted though.

contingencies 279 days ago

Modern Chinese usage includes all of 《》〈〉「」『』【】“” and probably others, roughly in that order. Modern typographic convention is perhaps 《title》「quote」 but that's surely opinion and debatable. Hong Kong and Taiwan have their own typesetting conventions, distinct from mainland China, and in the latter case no doubt influenced by Japanese occupation and cultural inflow (manga, etc.). Historically for most of Chinese history written language had no punctuation, and sentence endings were merely inferred from context, which was historically clearer 也. See https://en.wikipedia.org/wiki/Chinese_punctuation and https://en.wiktionary.org/wiki/%E4%B9%9F#Definitions (definition #4)

jiggunjer 279 days ago

So why isn't there a straight single quotation, but there is a straight double quotation? I get it probably arose from compatibility reasons, but nowadays Unicode should be able to offer something?

P.S. Major coincidence: I was googling this very question yesterday.

exikyut 279 days ago

For reference, the BIOS text-mode font included with some IBM PCs (I've observed this on NetVistas and ThinkPads myself, at least) renders ` as a nice-looking opening quote, and ' looks like a nice closing quote.

audiodude 279 days ago

Honestly I've been seeing `quote' in bash and other CLIs for my entire career and always thought they were just funny or strange, but carried no meaning.

jrochkind1 279 days ago

MRI ruby still does this in some error messages. I hate it. Always messing up my copy-and-paste into  markdown too.