Entities in alt and title text

Helmut Richter

Since: Jun 12, 2007
Posts: 18

(Msg. 46) Posted: Thu Aug 02, 2007 5:09 pm
Post subject: Re: Entities in alt and title text
Archived from groups: comp>infosystems>www>authoring>html

On Thu, 2 Aug 2007, Andreas Prilop wrote:

> On Wed, 1 Aug 2007, Helmut Richter wrote:
>
> > It does not matter whether a right single quote (rsq) *looks like* an
> > apostrophe, it *is not* an apostrophe!
>
> That's your personal taste. Unicode does not distinguish between
> a single quotation mark 9 and an apostrophe. There is only U+2019.
> http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html
>
> Your personal taste might also call for different characters for
> umlaut and for diaeresis. Nevertheless, Unicode has only one ¨
> and only one ü .

The problem is explained at length in the Unicode standard
<http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf> on page 21 under
"Unification", and it is clear by this description that much of it is indeed
a matter of personal taste.

In this particular case, it is my personal taste that it is an extremely
unfortunate decision to unify a single character (the apostrophe) with one
of a pair of characters (the rs quotes) which always come as a pair.

> Unicode likewise has only one character for dagesh/mappiq/shuruq,
> U+05BC - even if you think they are different.

I think this is the right decision: they are distinguished by context, and
even appear in mutually exclusive contexts. On the other hand, Qamats U+05B8
and Qamats Qatan U+05C7 are distinguished for no identifiable reason. That
is along the lines of distinguishing umlauts from diaereses and is even less
reasonable than would be to distinguish apostrophes according to whether
they denote omission (en "they're"), root/ending separation (en "Peter's")
or phonemic distinction (sw "ng'ombe").

Something that is more parallel to the apostrophe case would be to
distinguish a Paseq U+05C0 whether it appears alone as punctuation or as the
second of two marks which are *together* a cantillation symbol, a so-called
Legarmeh. If I followed my own logic, I would have to advocate this distinction.
That would, however, lead to situations where there is no obvious criterion
for the distinction, and encoding text would amount to starting its
interpretation. My personal taste favours Unicode's decision to use only one
code in this case, but I have to admit that I cannot say why.
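Which of these distinctions Unicode actually encodes can be checked against the character names in the Unicode Character Database. A minimal sketch, assuming Python 3's standard `unicodedata` module; the `HEBREW_MARKS` table and the `hebrew_label` helper are illustrative names, not anything from the thread:

```python
import unicodedata

# Hebrew code points mentioned in the post; the notes paraphrase the discussion.
HEBREW_MARKS = {
    0x05BC: "one code point for dagesh, mappiq and shuruq",
    0x05B8: "qamats",
    0x05C7: "qamats qatan, encoded separately from qamats",
    0x05C0: "paseq, alone or as part of a legarmeh",
}


def hebrew_label(cp: int) -> str:
    """Return 'U+XXXX NAME' using the Unicode Character Database bundled with Python."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp, note in HEBREW_MARKS.items():
        print(f"{hebrew_label(cp):40}  {note}")
```

The single name "HEBREW POINT DAGESH OR MAPIQ" for U+05BC shows the unification directly, while qamats and qamats qatan come out as two separately named characters.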

--
Helmut Richter

Helmut Richter

Since: Jun 12, 2007
Posts: 18

(Msg. 47) Posted: Thu Aug 02, 2007 5:15 pm

On Thu, 2 Aug 2007, Andy Dingley wrote:

> On 2 Aug, 14:59, Andreas Prilop <Prilop2... DeleteThis @trashmail.net> wrote:
>
> > That's your personal taste. Unicode does not distinguish between
> > a single quotation mark 9 and an apostrophe. There is only U+2019.
>
> So what's the Unicode 0x0027 apostrophe?

An analogous thing to U+002D, which is neither hyphen U+2010 nor minus
U+2212 nor any kind of dash U+2012 to U+2014.
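The analogy is easy to see by listing the official names of these code points. A small sketch, assuming Python 3's `unicodedata` module; `DASH_LIKE` and `dash_table` are hypothetical names chosen for the example:

```python
import unicodedata

# ASCII's catch-all U+002D next to the more specific characters Unicode added.
DASH_LIKE = [0x002D, 0x2010, 0x2212, 0x2012, 0x2013, 0x2014]


def dash_table() -> list[tuple[str, str]]:
    """Pair each code point (written as U+XXXX) with its Unicode character name."""
    return [(f"U+{cp:04X}", unicodedata.name(chr(cp))) for cp in DASH_LIKE]


if __name__ == "__main__":
    for code, name in dash_table():
        print(code, name)
```

The ASCII character's own name, HYPHEN-MINUS, already admits that it stands in for several different things.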

--
Helmut Richter

Andreas Prilop

Since: Jul 04, 2007
Posts: 23

(Msg. 48) Posted: Thu Aug 02, 2007 5:18 pm

On Thu, 2 Aug 2007, Andy Dingley wrote:

>> That's your personal taste. Unicode does not distinguish between
>> a single quotation mark 9 and an apostrophe. There is only U+2019.
>
> So what's the Unicode 0x0027 apostrophe?

You snipped away my reference to
http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html

U+0027 is the "ASCII apostrophe". All characters from ASCII (and
ISO-8859-1) remain unchanged in Unicode. You need ASCII ' "
in programming languages, in HTML, and the like. But both are
supposed to look straight, not curly like the number 9.

Therefore ' " are not suited for typographical use where
quotation marks and apostrophe look like numbers 6 and 9.
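Both claims can be checked mechanically: every ISO-8859-1 byte keeps its number as a Unicode code point, and the straight ASCII apostrophe and the curly U+2019 fall into different Unicode general categories. A sketch assuming Python 3's built-in `latin-1` codec and `unicodedata`; the helper names are made up for illustration:

```python
import unicodedata


def latin1_is_identity() -> bool:
    """True if every ISO-8859-1 byte value decodes to the code point with the same number."""
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))


def category(ch: str) -> str:
    """General category from the Unicode Character Database, e.g. 'Po' or 'Pf'."""
    return unicodedata.category(ch)


if __name__ == "__main__":
    print(latin1_is_identity())              # ASCII and ISO-8859-1 kept their positions
    print(category("'"), category("\u2019"))  # straight U+0027 vs. curly U+2019
```

U+0027 is classed as other punctuation (`Po`), while U+2019 is final punctuation (`Pf`), i.e. a closing quotation mark that Unicode also recommends for the apostrophe.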

There is a similar issue with U+002D hyphen-minus.
You must use this (ASCII) character in programming languages,
spreadsheets, etc. for a minus sign. However, there is a real,
typographical minus sign at U+2212.

TheBicyclingGuitarist

Since: Aug 05, 2007
Posts: 1

(Msg. 49) Posted: Sun Aug 05, 2007 10:15 pm

On Aug 5, 9:46 pm, "Chris F.A. Johnson" <cfajohn... DeleteThis @gmail.com> wrote:
> On 2007-08-06, The Bicycling Guitarist wrote:
>
>
>
> > "Jukka K. Korpela" <jkorp... DeleteThis @cs.tut.fi> wrote
>
> >> Actually, ' isn't an entity but a character
>
> > Aarrghh...so I am to change all the ' to ’ ?
>
> Why not just use an apostrophe (')?
>
> --
Well Chris, I would just use an apostrophe except nobody can agree on
what one is. The one on the keyboard is a vertical one, not a "smart"
one, and apparently there are a lot of people who aren't happy that
the right single quote IS the preferred way to display an apostrophe
using HTML in English.

Jukka K. Korpela

Since: Feb 13, 2004
Posts: 3794

(Msg. 50) Posted: Mon Aug 06, 2007 12:00 am

Scripsit Andreas Prilop:

> On Wed, 1 Aug 2007, Helmut Richter wrote:
>
>> It does not matter whether a right single quote (rsq) *looks like* an
>> apostrophe, it *is not* an apostrophe!
>
> That's your personal taste. Unicode does not distinguish between
> a single quotation mark 9 and an apostrophe. There is only U+2019.
> http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html

That's indeed what the Unicode Standard says, though this is based on a
change that caused (and still causes) much dispute. Pragmatically, it's
something that we need to live with, at the character code level. People may
still treat the two as distinct concepts, even though Unicode has just a
single code point, a single character.

> Your personal taste might also call for different characters for
> umlaut and for diaeresis. Nevertheless, Unicode has only one ¨
> and only one ü .

Well, it's not _quite_ so. You can, if you like, make a distinction with u
umlaut and u diaeresis, even at the character code level, e.g. by
representing the former as U+00FC (in HTML, as ü in a suitable encoding, or
as ü) and the latter as U+0075 U+0308, i.e. as normal "u" followed by
combining diaeresis (in HTML, e.g. as u&#x308;). Although these are
canonically equivalent in Unicode, they are not the same, i.e. a program
(even a web browser) _may_ treat them as different, though you cannot rely
on such treatment in any program that is beyond your control. In practice,
different renderings _may_ result, depending on how a program implements
combining diacritic marks.
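The difference between "not the same" and "canonically equivalent" shows up as soon as a program compares the two spellings with and without Unicode normalization. A minimal sketch using Python's `unicodedata.normalize`; the constant names and `equal_after` helper are illustrative:

```python
import unicodedata

PRECOMPOSED = "\u00fc"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
DECOMPOSED = "u\u0308"   # U+0075 followed by U+0308 COMBINING DIAERESIS


def equal_after(form: str, a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the given form (e.g. NFC or NFD)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(PRECOMPOSED == DECOMPOSED)                    # raw code point comparison
    print(equal_after("NFC", PRECOMPOSED, DECOMPOSED))  # comparison up to canonical equivalence
    print(len(PRECOMPOSED), len(DECOMPOSED))            # one code point vs. two
```

A program that compares raw strings keeps the "umlaut" and "diaeresis" spellings apart; one that normalizes first (as many do) silently merges them, which is why the distinction cannot be relied on outside one's own software.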

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jukka K. Korpela

Since: Feb 13, 2004
Posts: 3794

(Msg. 51) Posted: Mon Aug 06, 2007 12:08 am

Scripsit The Bicycling Guitarist:

> Well there is an entity called apostrophe that I will use there, and
> ’ will only be used for right single quote. I want to be
> technically correct no matter how it looks.

But the entity called apostrophe isn't technically correct. The name is
misleading. It's a historical holdover, and a really long story, but the
bottom line is: Some Unicode names of characters are outright misleading.
They even contain known typos, which will never be fixed, since Unicode
names have been engraved in stone, guaranteed to be stable (even though some
of them are just mad and maddening).
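The naming problem is visible in the character database itself: the straight ASCII character is literally named APOSTROPHE, the character recommended for the apostrophe is named as a quotation mark, and U+FE18 keeps its misspelled name because names are never changed once published. A sketch assuming Python 3's `unicodedata`; `SAMPLES` and `labelled` are hypothetical names:

```python
import unicodedata

# Character names are frozen once published, so misleading or misspelled ones stay.
SAMPLES = ["'", "\u2019", "\uFE18"]


def labelled(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in SAMPLES:
        print(labelled(ch))
```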

> example from my web site:
>
> “How could you ‘bust’ me, I'm
> irresistible!”

Actually, ' isn't an entity but a character reference. The entity
reference is &apos; (and it has a sad history of its own). Neither of them
is ever needed in textual content*) of HTML documents, and they denote the
Ascii apostrophe ('), which can be written as such and is _not_ the
recommended (by Unicode) character for punctuation apostrophe.

*) As opposed to attribute values, where they might be needed in the rare
case where the Ascii apostrophe is used as a string delimiter for a value
that contains the Ascii apostrophe.
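The rare case in the footnote looks like this in practice: a value containing ' inside an attribute delimited by ' must be escaped, while the same text as element content needs nothing. A minimal sketch, assuming Python's standard `html` module (whose `escape` happens to emit the hexadecimal form &#x27;); `single_quoted_attr` is an illustrative helper, not an existing API:

```python
import html


def single_quoted_attr(name: str, value: str) -> str:
    """Build name='value' for an attribute delimited by ASCII apostrophes.

    html.escape(..., quote=True) rewrites & < > " as entity references and the
    ASCII apostrophe as the character reference &#x27;, so the delimiter survives.
    """
    return f"{name}='{html.escape(value, quote=True)}'"


if __name__ == "__main__":
    print(f"<img src='bike.jpg' {single_quoted_attr('alt', 'Peter' + chr(39) + 's bike')}>")
    # As element content, the apostrophe can simply be written as itself:
    print("<p>Peter's bike</p>")
```

Using double quotes as the attribute delimiter avoids the issue altogether, which is why the references are almost never needed.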

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

The Bicycling Guitarist

Since: Nov 03, 2004
Posts: 121

(Msg. 52) Posted: Mon Aug 06, 2007 12:08 am

"Jukka K. Korpela" <jkorpela.RemoveThis@cs.tut.fi> wrote in message
news:RWqti.202910$SZ.114939@reader1.news.saunalahti.fi...
> Scripsit The Bicycling Guitarist:
>
>> Well there is an entity called apostrophe that I will use there, and
>> ’ will only be used for right single quote. I want to be
>> technically correct no matter how it looks.
>
> But the entity called apostrophe isn't technically correct. The name is
> misleading. It's a historical holdover, and a really long story, but the
> bottom line is: Some Unicode names of characters are outright misleading.
> They even contain known typos, which will never be fixed, since Unicode
> names have been engraved on stone, guaranteed to be stable (even though
> some of them are just mad and maddening).
>
>> example from my web site:
>>
>> “How could you ‘bust’ me, I'm
>> irresistible!”
>
> Actually, ' isn't an entity but a character

Aarrghh...so I am to change all the ' to ’ ?
Again? I had changed them all, then changed them back after discussion in
this forum...so ’ the right single quote IS the "preferred" apostrophe
after all regardless of its name or those who don't like it that way?

Please give me credit for TRYING to code things properly even if I don't
have the expertise of others in this newsgroup.
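If one does decide to follow the Unicode recommendation, the change need not be made by hand. A rough sketch under stated assumptions: the input is plain text (not HTML source, where ' may delimit attribute values), and only apostrophes between two word characters are converted. The pattern and the `curl_apostrophes` name are illustrative choices, not an established tool:

```python
import re

# An ASCII apostrophe with a word character (letter, digit or underscore) on both sides,
# as in "I'm", "Peter's" or "ng'ombe".
INNER_APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")


def curl_apostrophes(text: str) -> str:
    """Replace word-internal ASCII apostrophes with U+2019 RIGHT SINGLE QUOTATION MARK.

    Deliberately conservative: apostrophes at word edges ("'90s", "dogs'") are left
    alone, because telling them apart from single quotation marks needs context.
    """
    return INNER_APOSTROPHE.sub("\u2019", text)


if __name__ == "__main__":
    print(curl_apostrophes("How could you 'bust' me, I'm irresistible!"))
```

Note that the quoted 'bust' is untouched; choosing the left and right single quotation marks there is a separate decision.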

Chris F.A. Johnson

Since: Jul 09, 2006
Posts: 320

(Msg. 53) Posted: Mon Aug 06, 2007 12:46 am

On 2007-08-06, The Bicycling Guitarist wrote:
>
> "Jukka K. Korpela" <jkorpela.TakeThisOut@cs.tut.fi> wrote
>
>> Actually, ' isn't an entity but a character
>
> Aarrghh...so I am to change all the ' to ’ ?

Why not just use an apostrophe (')?

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)

Chris F.A. Johnson

Since: Jul 09, 2006
Posts: 320

(Msg. 54) Posted: Mon Aug 06, 2007 1:23 pm

On 2007-08-06, TheBicyclingGuitarist wrote:
> On Aug 5, 9:46 pm, "Chris F.A. Johnson" <cfajohn....TakeThisOut@gmail.com> wrote:
>> On 2007-08-06, The Bicycling Guitarist wrote:
>>
>>
>>
>> > "Jukka K. Korpela" <jkorp....TakeThisOut@cs.tut.fi> wrote
>>
>> >> Actually, ' isn't an entity but a character
>>
>> > Aarrghh...so I am to change all the ' to ’ ?
>>
>> Why not just use an apostrophe (')?
>>
> Well Chris, I would just use an apostrophe except nobody can agree on
> what one is. The one on the keyboard is a vertical one, not a "smart"
> one, and apparently there are a lot of people who aren't happy that
> the right single quote IS the preferred way to display an apostrophe
> using HTML in English.

An apostrophe is ' -- it's the same as '. It's not as
sophisticated, typographically, as other characters, but it is
accepted and universally legible, as are the neutral double quotes
("). You are worrying far too much about a very minor point.

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)

John Hosking

Since: Jan 07, 2007
Posts: 318

(Msg. 55) Posted: Mon Aug 06, 2007 2:40 pm

The Bicycling Guitarist wrote:
>
> Please give me credit for TRYING to code things properly even if I don't
> have the expertise of others in this newsgroup.

The credit has been applied to your account. If you do not see the
credit on your next statement please notify our office within 5 business
days.

--
John
Pondering the value of the UIP: http://blinkynet.net/comp/uip5.html

Harlan Messinger

Since: Apr 25, 2004
Posts: 1190

(Msg. 56) Posted: Tue Aug 07, 2007 11:36 pm

Andy Dingley wrote:
>
> So 8217 is a code point (right single quote). So is 27 a code point
> (apostrophe). As a purely typographic question, we should discuss
> whether one is better than the other for representing apostrophes
> with. I think the answer is fairly clear to that.

As a purely typographic question, codepoints and encodings are of no
interest at all because typographical principles and conventions have
nothing to do with code points and encodings. In typography, at least in
the US English language, a single right quote is a curved symbol printed
exactly the same way as an apostrophe. Do you think in days of
individual pieces of movable type, that the tray held separate bins for
apostrophes and right single quotes and were filled with identical
pieces of type?
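One detail in the quoted comparison: "8217" is a decimal number (the form used in &#8217;), whereas "27" is the hexadecimal number from U+0027; in decimal the ASCII apostrophe is 39. A sketch assuming Python's `html.unescape`, which resolves both numeric and named references; `code_point_of_reference` is a hypothetical helper:

```python
import html


def code_point_of_reference(ref: str) -> int:
    """Resolve an HTML character or entity reference and return its code point."""
    ch = html.unescape(ref)
    if len(ch) != 1:
        raise ValueError(f"{ref!r} did not resolve to a single character")
    return ord(ch)


if __name__ == "__main__":
    for ref in ("&#8217;", "&#x2019;", "&rsquo;", "&#39;", "&#x27;"):
        print(ref, f"U+{code_point_of_reference(ref):04X}")
```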

Andy Dingley

Since: Jun 01, 2007
Posts: 134

(Msg. 57) Posted: Wed Aug 08, 2007 4:57 am

On 8 Aug, 04:36, Harlan Messinger <hmessinger.removet... DeleteThis @comcast.net>
wrote:
> As a purely typographic question, codepoints and encodings are of no
> interest at all because typographical principles and conventions have
> nothing to do with code points and encodings.

Typography has nothing to do with encodings, but a lot to do with code
points. The code point is the ordinal and defining identifier for the
"thing" that you will turn into a glyph. It is, if you like, the empty
boxes in your typecase. The shape of the type defines the glyph, but
how you choose it and whether you can distinguish them depends on the
code point. There is no "glyph identifier" other than this.

> Do you think in days of
> individual pieces of movable type, that the tray held separate bins for
> apostrophes and right single quotes and were filled with identical
> pieces of type?

Linotype did (for a typeface where the glyphs were the same).

Harlan Messinger

Since: Apr 25, 2004
Posts: 1190

(Msg. 58) Posted: Wed Aug 08, 2007 11:55 am

Andy Dingley wrote:
> On 8 Aug, 04:36, Harlan Messinger <hmessinger.removet... RemoveThis @comcast.net>
> wrote:
>> As a purely typographic question, codepoints and encodings are of no
>> interest at all because typographical principles and conventions have
>> nothing to do with code points and encodings.
>
> Typography has nothing to do with encodings, but a lot to do with code
> points. The code point is the ordinal and defining identifier for the
> "thing" that you will turn into a glyph.

So--typography didn't pre-exist codepoints? Gutenberg had codepoints?

> It is, if you like, the empty
> boxes in your typecase. The shape of the type defines the glyph, but
> how you choose it and whether you can distinguish them depends on the
> code point. There is no "glyph identifier" other than this.
>
>> Do you think in days of
>> individual pieces of movable type, that the tray held separate bins for
>> apostrophes and right single quotes and were filled with identical
>> pieces of type?
>
> Linotype did (for a typeface where the glyphs were the same)

When someone would drop his tray and have to put the type back into the
box, how did he know which of the pieces of type were supposed to go
into the apostrophe bin and which into the right single quote bin?

Ben Bacarisse

Since: Feb 05, 2006
Posts: 84

(Msg. 59) Posted: Wed Aug 08, 2007 5:46 pm

Andy Dingley <dingbat.RemoveThis@codesmiths.com> writes:

> On 8 Aug, 04:36, Harlan Messinger <hmessinger.removet....RemoveThis@comcast.net>
> wrote:
>> As a purely typographic question, codepoints and encodings are of no
>> interest at all because typographical principles and conventions have
>> nothing to do with code points and encodings.
>
> Typography has nothing to do with encodings, but a lot to do with code
> points. The code point is the ordinal and defining identifier for the
> "thing" that you will turn into a glyph. It is, if you like, the empty
> boxes in your typecase. The shape of the type defines the glyph, but
> how you choose it and whether you can distinguish them depends on the
> code point. There is no "glyph identifier" other than this.
>
>> Do you think in days of
>> individual pieces of movable type, that the tray held separate bins for
>> apostrophes and right single quotes and were filled with identical
>> pieces of type?
>
> Linotype did (for a typeface where the glyphs were the same)

Is there a typo here? You are saying that Linotype machines had
separate "mats" for apostrophe and close quote? If so, how did the
user select them? The keyboard did not have separate keys, and it
seems unlikely that such a common piece of type had to be selected by
hand.

--
Ben.

Jukka K. Korpela

Since: Feb 13, 2004
Posts: 3794

(Msg. 60) Posted: Wed Aug 08, 2007 9:08 pm

Scripsit Andy Dingley:

> The code point is the ordinal and defining identifier for the
> "thing" that you will turn into a glyph.

No, the code point identifies a location in the coding space, and that
location may be occupied by a _character_. A character may be represented
with a glyph, or it may be invisible, or several characters may be rendered
using a single glyph, for example.
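The character side of this distinction can be inspected with the Unicode Character Database: some code points hold ordinary letters, some hold marks drawn onto a neighbouring glyph, some hold invisible format characters, and some hold nothing at all. A sketch assuming Python 3's `unicodedata`; glyph selection itself (ligatures, font-internal glyph IDs) happens in fonts and rendering engines and is outside what this shows. `SAMPLES` and `describe_code_point` are illustrative names:

```python
import unicodedata

# A code point is a position in the coding space; what occupies it varies.
SAMPLES = {
    0x0041: "ordinary letter with a glyph of its own",
    0x0308: "combining mark, drawn on the preceding base letter",
    0x200D: "format character, normally invisible",
    0x0378: "unassigned position: no character at all",
}


def describe_code_point(cp: int) -> str:
    """Return 'U+XXXX <general category> <name>' for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch, '<unassigned>')}"


if __name__ == "__main__":
    for cp, note in SAMPLES.items():
        print(f"{describe_code_point(cp):40}  {note}")
```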

> It is, if you like, the empty
> boxes in your typecase. The shape of the type defines the glyph, but
> how you choose it and whether you can distinguish them depends on the
> code point. There is no "glyph identifier" other than this.

You are simply confusing glyphs with characters.

There _are_ glyph identifiers, and they are conceptually and practically
different from code points.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/