Entities in alt and title text

 
Andy Dingley

External


Since: Jun 01, 2007
Posts: 134



(Msg. 31) Posted: Wed Aug 01, 2007 4:48 am
Post subject: Re: Entities in alt and title text
Archived from groups: comp.infosystems.www.authoring.html

On 1 Aug, 09:15, Andreas Prilop <Prilop2....DeleteThis@trashmail.net> wrote:
> On Tue, 31 Jul 2007, Andy Dingley wrote:
> > Agreed, but that's by their definition as _characters_, not codepoints.
>
> You still struggle with basic terms.

So go on then, please enlighten us as to my error here. Anyone can
point to huge great references from the W3C or even Plan 9, but unless
you cite a specific piece of text here, and a specific statement in
the reference, then all you're doing is a feeble bit of proof by
authority.

> if you want to discuss further on this topic and if you want
> to be taken seriously.

I presume your unhelpful patronising attitude is holiday cover for
Jukka.


As to the issue here, then a "character" is "the smallest component of
written language that has semantic value; refers to the abstract
meaning and/or shape" (from Unicode 4.0). A "code point" is the
integer that refers to a location within the space defined by Unicode
(Unicode themselves are inconsistent as to what this space is called).
If you're being that picky, then it's "code point", not "codepoint".
The term code point isn't used outside the Unicode world, but the
concept could usefully be extended to describe other character sets,
within a bounded world of discourse where you're careful to define the
term beforehand.

So 8217 is a code point (right single quote). So is 39 (0x27) a code point
(apostrophe). As a purely typographic question, we should discuss
whether one is better than the other for representing apostrophes
with. I think the answer is fairly clear to that.

As to which is prettier, then if you want a pretty apostrophe glyph:
choose a pretty typeface to render it with.

It's not unheard of for characters to be deliberately mis-encoded, so
as to gain a prettier glyph than the one intended for that character.
We've got four choices here: apostrophe, quote, prime and even an
acute accent! They all look much the same as glyphs, why not choose
the one that's prettiest and hang the accuracy of the markup used to
obtain it? Again, I think the answer is fairly clear to that.
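
For anyone who wants to check the numbers, here is a minimal Python sketch (standard library only; describe() is just an illustrative helper name, not anything from the thread). It lists the four lookalike candidates with their code points in both hex and decimal, which also makes plain that 8217 is the decimal form of 0x2019 while the apostrophe is 0x27, i.e. 39 in decimal.

```python
import unicodedata


def describe(code_point):
    """Return (U+XXXX label, decimal value, Unicode character name)."""
    return (f"U+{code_point:04X}", code_point, unicodedata.name(chr(code_point)))


# Apostrophe, right single quote, prime, acute accent.
for cp in (0x0027, 0x2019, 0x2032, 0x00B4):
    label, decimal, name = describe(cp)
    print(f"{label}  {decimal:>5}  {name}")
```

The glyphs may look alike in a given typeface, but each has its own code point and its own name.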


Now, as to the mapping of windows-1252. We're talking about 0x92 which
is a "right single quote" and not an "apostrophe". As such, the
_character_ of "right single quote" is an exact match for the Unicode
character "right single quote" found at code point 0x2019 / &#8217;

This does not mean that 0x92 = 0x2019, that the integers 146 =
8217 ! As far as we can attach the same definition of code point
("an integer that defines a location in codespace") to windows-1252,
then it's clear that these are quite different code points.

Their characters are the same. Their code points are mappable to each
other.
That's not the same thing as saying that the code points are the
same.
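
The distinction can be checked mechanically. The following is only a sketch using Python's built-in "cp1252" codec as a stand-in for windows-1252 transcoding; the variable names are made up for illustration.

```python
import unicodedata

CP1252_BYTE = 0x92   # position of "right single quote" in windows-1252
UNICODE_CP = 0x2019  # position of "right single quote" in the Unicode codespace

# Transcoding maps the windows-1252 byte to the Unicode character.
decoded = bytes([CP1252_BYTE]).decode("cp1252")
same_character = decoded == chr(UNICODE_CP)

# The integers themselves are of course not equal (146 vs. 8217).
same_integer = CP1252_BYTE == UNICODE_CP

# Read as a Unicode code point, 0x92 is a C1 control character (category "Cc"),
# not punctuation at all.
category_of_0x92 = unicodedata.category(chr(CP1252_BYTE))

print(same_character, same_integer, category_of_0x92)
```

Same character, mappable positions, different integers.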

In particular, the web lets us deliver text drawn from the Unicode
codespace through a number of different encodings, including
ISO-8859-* encodings that can't even represent those characters
directly. By using numeric character references we can work around
this. Note though that all of these web pages (by definition) refer to
code points in the Unicode codespace, no matter what their encoding.
If you use a literal character from ISO-8859-* then it will be
transcoded to the appropriate Unicode code point (and just to show you
that I read the damn thing years ago, here's the reference that
describes the process: http://www.w3.org/TR/charmod/#sec-Transcoding ).

So if you're going to use &#8217; as a numeric character reference,
then use it - it'll work from any encoding (caveat the problem if it
_stops_ being a numeric character reference, owing to some part of a
supposedly transparent CMS changing it into the literal)
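
As a rough illustration of "it'll work from any encoding", here is a Python sketch. It uses the standard "xmlcharrefreplace" error handler and html.unescape() as stand-ins for what an authoring tool and a browser would do; that pairing is an assumption made for the demo, not a description of any particular CMS.

```python
import html

text = "It\u2019s"

# ISO-8859-1 has no byte for U+2019, so the literal cannot be encoded.
try:
    text.encode("iso-8859-1")
    literal_ok = True
except UnicodeEncodeError:
    literal_ok = False

# Falling back to a numeric character reference keeps the bytes pure ISO-8859-1.
as_ncr = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# A consumer decodes the bytes, then resolves the reference to U+2019.
round_trip = html.unescape(as_ncr.decode("iso-8859-1"))

print(literal_ok, as_ncr, round_trip == text)
```

The reference always names a Unicode code point, whatever the encoding of the bytes that carry it.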

If you're going to use a literal character, then use it. It's probably
simplest, you just have to track that you've labelled it with an
appropriate encoding.

Personally my strong recommendation is to do this, and additionally to
_only_ use UTF-8 for _all_ of your content. It's easier to manage than
allowing variation. Expunge the ISO-8859-* encodings and the Windows
encodings.

Although 0x92 is a fine character to use as a literal in a
windows-1252 encoding (transcoding will map it for you) it's generally
a bad idea to use the arcane and obsolete windows encodings anyway.
Using 0x92 as a numeric character reference (i.e. &#146;) in a
windows-1252 encoding is ugly. It's wrong according to the standard
(in Unicode, 0x92 is a C1 control character, not a quote) and you're
relying on browser fix-ups to make it work. It probably will work, but
why do it? If you _want_ a numeric character reference for "right
single quote", &#8217; is the
correct one to use.
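
To see the fix-up in action without opening a browser, Python's html.unescape() can serve as a proxy, since it is documented to follow the HTML 5 rules for both valid and invalid character references. Treat this as a sketch of the behaviour, not as proof of what any given browser does.

```python
import html

# The windows-1252 position misused as a character reference...
fixed_up = html.unescape("&#146;")

# ...and the reference that actually names the Unicode code point.
correct = html.unescape("&#8217;")

print(fixed_up == correct == "\u2019")
```

Both end up as U+2019, but only the second gets there without an error-recovery rule.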
Andy Dingley

External


Since: Jun 01, 2007
Posts: 134



(Msg. 32) Posted: Wed Aug 01, 2007 4:50 am
Post subject: Re: Entities in alt and title text

On 1 Aug, 12:22, Stan Brown <the_stan_br....DeleteThis@fastmail.fm> wrote:

> Maybe because Usenet isn't supposed to be a UTF medium?

RFC 977 wasn't, RFC 3977 is
Andy Dingley

External


Since: Jun 01, 2007
Posts: 134



(Msg. 33) Posted: Wed Aug 01, 2007 9:25 am
Post subject: Re: Entities in alt and title text

On 1 Aug, 16:02, Ben Bacarisse <ben.use... DeleteThis @bsb.me.uk> wrote:

> Yes, but I feel we will disagree! Unicode is clear: about U+0027 they
> say:
>
> neutral (vertical) glyph with mixed usage
> U+2019 is preferred for apostrophe
>
> http://www.fileformat.info/info/unicode/char/0027/index.htm

That's not Unicode Inc. though, that's just J. Random Blogger with no
more visible contact details than a gmail mailbox.

> Wikipedia is clear:

If we accept Wikipedia as a source, then:
http://en.wikipedia.org/wiki/Apostrophe

"The apostrophe should not be confused with the closing single
quotation mark (usually rendered identically but serving a quite
different purpose), or with the similar-looking prime (which is used
to indicate measurement in feet or arcminutes, or for various
mathematical purposes)."

Real typographers tend to be on paper references rather than the web,
and those require me to have time in the evening to hunt down
references in the books of Tschichold etc., something I've just not
had this week.

There's certainly a convention that "right single quote is the same as
apostrophe", but that's far from universal and is frequently (and
loudly) decried. It's a British convention to do so; I'm not sure
whether it's the convention in America or the rest of the
English-speaking world too. In Europe
though it's seen more widely to be wrong than right, and as a
peculiarly British idiosyncrasy. As the typographic art has been
upheld rather better over the last century by mainland Europe than by
Britain (a handful of exceptions like Eric Gill apart), then my own
preference is certainly to keep them apart.

Really it's up to the OP, it's his site after all. But he should be
aware that it's certainly _not_ an obvious and indisputable
equivalence.
Andreas Prilop

External


Since: Jul 04, 2007
Posts: 23



(Msg. 34) Posted: Wed Aug 01, 2007 10:09 am
Post subject: Re: Entities in alt and title text

On Tue, 31 Jul 2007, Stan Brown wrote:

> User-Agent: MicroPlanet-Gravity/2.70.2067
> Content-Type: text/plain; charset="iso-8859-1"
>
>> â??How could you â??bustâ?? me; Iâ??m irresistible!â??
>
> Res ipsa loquitur.

No, MicroPlanet-Gravity speaks - and it says:
I still don't support UTF-8 in the year 2007.
I'm a lousy, obsolete newsreader.

But remember: We were talking about browsers, not newsreaders.
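
The garbled quotation above is what one would expect when UTF-8 bytes are read under the wrong charset label. A small Python sketch reproduces it; the "?" substitution for control characters is a guess at how the newsreader displayed them, and mask_controls() is a made-up helper for the demo.

```python
import unicodedata

original = "I\u2019m irresistible!"
utf8_bytes = original.encode("utf-8")  # U+2019 becomes the bytes E2 80 99

# The same bytes read under the wrong labels.
as_latin1 = utf8_bytes.decode("iso-8859-1")
as_cp1252 = utf8_bytes.decode("cp1252")


def mask_controls(s):
    """Replace control characters with '?', as a display layer might."""
    return "".join("?" if unicodedata.category(ch) == "Cc" else ch for ch in s)


print(utf8_bytes)
print(mask_controls(as_latin1))
print(as_cp1252)
```

Under ISO-8859-1 the two trailing bytes land on C1 controls, which would explain the pair of question marks after each "â".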
Stan Brown

External


Since: Jul 13, 2004
Posts: 1233



(Msg. 35) Posted: Wed Aug 01, 2007 10:09 am
Post subject: Re: Entities in alt and title text

Wed, 1 Aug 2007 10:09:38 +0200 from Andreas Prilop <Prilop2007
@trashmail.net>:
> No, MicroPlanet-Gravity speaks - and it says:
> I still don't support UTF-8 in the year 2007.

Maybe because Usenet isn't supposed to be a UTF medium?

Or did the standard change while I wasn't noticing?

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/2003/05/05/why_we_wont_help_you
Andreas Prilop

External


Since: Jul 04, 2007
Posts: 23



(Msg. 36) Posted: Wed Aug 01, 2007 10:15 am
Post subject: Re: Entities in alt and title text

On Tue, 31 Jul 2007, Andy Dingley wrote:

> Agreed, but that's by their definition as _characters_, not codepoints.

You still struggle with basic terms. I'm sorry to say this,
but you constantly confuse characters, character references,
codepoints, encodings, etc.

Please read carefully
http://www.w3.org/TR/charmod/
if you want to discuss further on this topic and if you want
to be taken seriously.
David Trimboli

External


Since: Apr 06, 2006
Posts: 14



(Msg. 37) Posted: Wed Aug 01, 2007 11:22 am
Post subject: Re: Entities in alt and title text

Stan Brown <the_stan_brown RemoveThis @fastmail.fm> wrote:
> Tue, 31 Jul 2007 16:12:12 -0400 from David Trimboli
> <david RemoveThis @trimboli.name>:
>> One day I realized this was a big waste of time. Why bother with
>> entities when you can simply use an encoding that supports those
>> characters? Use UTF-8 and you can use those characters directly.
>>
>> â??How could you â??bustâ?? me; Iâ??m irresistible!â?
>
> Res ipsa loquitur.

How clever of you to intentionally repost this in the wrong encoding!

David
Stardate 7582.8
Ben Bacarisse

External


Since: Feb 05, 2006
Posts: 84



(Msg. 38) Posted: Wed Aug 01, 2007 4:02 pm
Post subject: Re: Entities in alt and title text

Andy Dingley <dingbat.RemoveThis@codesmiths.com> writes:

> So 8217 is a code point (right single quote). So is 39 (0x27) a code point
> (apostrophe). As a purely typographic question, we should discuss
> whether one is better than the other for representing apostrophes
> with. I think the answer is fairly clear to that.

Yes, but I feel we will disagree! Unicode is clear: about U+0027 they
say:

neutral (vertical) glyph with mixed usage
U+2019 is preferred for apostrophe

http://www.fileformat.info/info/unicode/char/0027/index.htm

Wikipedia is clear:

Older 8-bit character encodings, like Windows CP1252, Mac Roman or
ISO-8859-1, universally support the typewriter quote in the same
position, 39, inherited from ASCII (as does Unicode; see
below). However, most of them place the typographic apostrophe in
different positions. ISO-8859-1, the most common encoding used for
web pages, omits the typographic apostrophe altogether.

http://en.wikipedia.org/wiki/Apostrophe#Typographic_form
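
The positions mentioned in that passage can be checked with Python's bundled codecs. This is a sketch only: position() is an illustrative helper, and "cp1252", "mac_roman" and "iso-8859-1" are Python's codec names for the three encodings.

```python
def position(ch, encoding):
    """Return the byte value of ch in a single-byte encoding, or None if absent."""
    try:
        return ch.encode(encoding)[0]
    except UnicodeEncodeError:
        return None


ENCODINGS = ("cp1252", "mac_roman", "iso-8859-1")

typewriter = {enc: position("'", enc) for enc in ENCODINGS}
typographic = {enc: position("\u2019", enc) for enc in ENCODINGS}

print(typewriter)
print(typographic)
```

The typewriter quote sits at 39 everywhere, while U+2019 maps to 146 in windows-1252, 213 in Mac Roman, and nothing at all in ISO-8859-1.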

I.e. unless you are deliberately simulating a typewriter (for example
in computer code) U+2019 is the preferred apostrophe for typographic
use.

I would not rule out using U+0027 as an apostrophe and it is hard to
make a case that it is "wrong" (Unicode says it has multiple uses,
and presumably one of those matches its name), but the matter about
which is best "as a purely typographic question" is clear: U+2019.

(Of course if that is what you meant, I apologise for missing your
point. You did not answer your own question, but it seemed from
previous posts that your "clear" answer would be the other way round.)

--
Ben.
Helmut Richter

External


Since: Jun 12, 2007
Posts: 18



(Msg. 39) Posted: Wed Aug 01, 2007 7:28 pm
Post subject: Re: Entities in alt and title text

On Wed, 1 Aug 2007, Andy Dingley wrote:

> There's certainly a convention that "right single quote is the same as
> apostrophe", but that's far from universal and is frequently (and
> loudly) decried. It's a British convention to do so; I'm not sure
> whether it's the convention in America or the rest of the
> English-speaking world too. In Europe
> though it's seen more widely to be wrong than right, and as a
> peculiarly British idiosyncrasy. As the typographic art has been
> upheld rather better over the last century by mainland Europe than by
> Britain (a handful of exceptions like Eric Gill apart), then my own
> preference is certainly to keep them apart.

It does not matter whether a right single quote (rsq) *looks like* an
apostrophe, it *is not* an apostrophe! In the next example, I'll use
`single quotes´ and an apostroph':

He said `you're using the wrong apostrophe´

and it is clear that the quote does not end after 3 characters.

The first letter in the Russian word for "Russian" *looks like* the first
letter in the Polish word for "Polish", but they are different letters
(U+0440 vs. U+0070). For other characters, such distinctions have been
made, e.g. the difference between a dash (U+2013) and a minus (U+2212),
even if they often are rendered the same. For an apostrophe, however,
there is only U+0027, so its rendering should be improved, rather than
replacing apostrophes by rsqs. In the same spirit, I find fake ,,German´´
quotes nothing but disgusting.
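
The lookalike pairs mentioned above are easy to tell apart programmatically. A short Python sketch, where compare() is a hypothetical helper:

```python
import unicodedata


def compare(a, b):
    """Return (are the characters equal?, name of a, name of b)."""
    return (a == b, unicodedata.name(a), unicodedata.name(b))


# Cyrillic "r" vs. Latin "p", then dash vs. minus.
letters = compare("\u0440", "\u0070")
dashes = compare("\u2013", "\u2212")

print(letters)
print(dashes)
```

Whatever the glyphs look like, the comparison is made on code points, so neither pair compares equal.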

--
Helmut Richter
Ben Bacarisse

External


Since: Feb 05, 2006
Posts: 84



(Msg. 40) Posted: Wed Aug 01, 2007 10:25 pm
Post subject: Re: Entities in alt and title text

Andy Dingley <dingbat DeleteThis @codesmiths.com> writes:

> On 1 Aug, 16:02, Ben Bacarisse <ben.use... DeleteThis @bsb.me.uk> wrote:
>
>> Yes, but I feel we will disagree! Unicode is clear: about U+0027 they
>> say:
>>
>> neutral (vertical) glyph with mixed usage
>> U+2019 is preferred for apostrophe
>>
>> http://www.fileformat.info/info/unicode/char/0027/index.htm
>
> That's not Unicode Inc. though, that's just J. Random Blogger with no
> more visible contact details than a gmail mailbox.

Oh please! Why do people do this? Did you not recognise the format?
That site takes the Unicode DB and formats it more conveniently than
the Unicode site does. But since you obviously did not want to check
first, here is the same text from the horse's mouth (but you will see
why I preferred the other link).

http://www.unicode.org/charts/PDF/U0000.pdf

>> Wikipedia is clear:
>
> If we accept Wikipedia as a source, then:
> http://en.wikipedia.org/wiki/Apostrophe
>
> "The apostrophe should not be confused with the closing single
> quotation mark (usually rendered identically but serving a quite
> different purpose), or with the similar-looking prime (which is used
> to indicate measurement in feet or arcminutes, or for various
> mathematical purposes)."

Very clear. Why does that mean that the multipurpose U+0027 should be
preferred on a web page? The passage I quoted (and you snipped) from
the same page *does* address that question and comes down (mildly)
against using U+0027 in "typographical" use -- the specific case you
were referring to.

> Real typographers tend to be on paper references rather than the web,
> and those require me to have time in the evening to hunt down
> references in the books of Tschichold etc., something I've just not
> had this week.

I doubt they will add anything very helpful. The "real" typographic
solution is likely to be a glyph designed for the purpose. They may
help in giving an opinion about what that glyph should look like in
various faces, but how will that help? One does not author web pages
by choosing characters for their appearance. Currently, the best
source of guidance about what character to use for a particular
purpose (on a web page) is Unicode.

> Really it's up to the OP, it's his site after all. But he should be
> aware that it's certainly _not_ an obvious and indisputable
> equivalence.

Ah, then we agree. You seemed to say the choice was "clear" (in
favour of U+0027) and it is what prompted my reply.

It would be good to have a code point for a typographic apostrophe, but
we don't. It can't be U+0027 because that must produce (on a
web-page) a multipurpose glyph as it does now. Because that code
point is multipurpose, using it as an apostrophe is not wrong, but
equally, I can't see how it can be preferred for that use from a
typographic point of view.

--
Ben.
Ben Bacarisse

External


Since: Feb 05, 2006
Posts: 84



(Msg. 41) Posted: Wed Aug 01, 2007 10:35 pm
Post subject: Re: Entities in alt and title text

Helmut Richter <hhr-m RemoveThis @web.de> writes:

> For an apostrophe, however,
> there is only U+0027, so its rendering should be improved,

I don't think that is possible. All of the low-numbered old ASCII
characters have become too widely used for too many purposes to have
that code point "re-designed" for one use.

> rather than
> replacing apostrophes by rsqs. In the same spirit, I find fake ,,German´´
> quotes nothing but disgusting.

All languages have their own typographical rules and traditions. The
suggestion (in Unicode) that U+2019 be used for apostrophe is
explicitly stated as being for English text only. Of course, you are
at liberty to find it disgusting none the less, but I think those
brought up on English-language typography will find it less so.

--
Ben.
Andy Dingley

External


Since: Jun 01, 2007
Posts: 134



(Msg. 42) Posted: Thu Aug 02, 2007 7:58 am
Post subject: Re: Entities in alt and title text

On 2 Aug, 14:59, Andreas Prilop <Prilop2....RemoveThis@trashmail.net> wrote:

> That's your personal taste. Unicode does not distinguish between
> a single quotation mark 9 and an apostrophe. There is only U+2019.

So what's the Unicode 0x0027 apostrophe?
Harlan Messinger

External


Since: Apr 25, 2004
Posts: 1190



(Msg. 43) Posted: Thu Aug 02, 2007 12:18 pm
Post subject: Re: Entities in alt and title text

Andy Dingley wrote:
> On 2 Aug, 14:59, Andreas Prilop <Prilop2... DeleteThis @trashmail.net> wrote:
>
>> That's your personal taste. Unicode does not distinguish between
>> a single quotation mark 9 and an apostrophe. There is only U+2019.
>
> So what's the Unicode 0x0027 apostrophe?
>
It's the same as the Unicode 0x0027 closing single quote.
Andreas Prilop

External


Since: Jul 04, 2007
Posts: 23



(Msg. 44) Posted: Thu Aug 02, 2007 3:59 pm
Post subject: Re: Entities in alt and title text

On Wed, 1 Aug 2007, Helmut Richter wrote:

> It does not matter whether a right single quote (rsq) *looks like* an
> apostrophe, it *is not* an apostrophe!

That's your personal taste. Unicode does not distinguish between
a single quotation mark 9 and an apostrophe. There is only U+2019.
http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html

Your personal taste might also call for different characters for
umlaut and for diaeresis. Nevertheless, Unicode has only one ¨
and only one ü .

Unicode likewise has only one character for dagesh/mappiq/shuruq,
U+05BC - even if you think they are different.
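
The character names in the Unicode database point the same way. A Python sketch using the standard unicodedata module (variable names are illustrative only):

```python
import unicodedata

# The spacing mark and the precomposed letter are both named "DIAERESIS".
spacing_name = unicodedata.name("\u00a8")
letter_name = unicodedata.name("\u00fc")

# Canonical decomposition of the letter: base letter plus one combining mark.
decomposed = unicodedata.normalize("NFD", "\u00fc")
decomposed_names = [unicodedata.name(ch) for ch in decomposed]

# A single Hebrew point covers dagesh and mappiq by name.
hebrew_name = unicodedata.name("\u05bc")

print(spacing_name, letter_name, decomposed_names, hebrew_name, sep="\n")
```

None of the names mentions "umlaut" as a separate thing, which fits the point that the distinction is not encoded.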
Ben Bacarisse

External


Since: Feb 05, 2006
Posts: 84



(Msg. 45) Posted: Thu Aug 02, 2007 4:36 pm
Post subject: Re: Entities in alt and title text

Andy Dingley <dingbat.DeleteThis@codesmiths.com> writes:

> On 2 Aug, 14:59, Andreas Prilop <Prilop2....DeleteThis@trashmail.net> wrote:
>
>> That's your personal taste. Unicode does not distinguish between
>> a single quotation mark 9 and an apostrophe. There is only U+2019.
>
> So what's the Unicode 0x0027 apostrophe?

Just in case that question was not rhetorical, Unicode describes it
thus:

"neutral (vertical) glyph with mixed usage"

--
Ben.