Entities in alt and title text

 
The Bicycling Guitarist

Since: Nov 03, 2004
Posts: 121



(Msg. 1) Posted: Tue Jul 31, 2007 1:07 am
Post subject: Entities in alt and title text
Archived from groups: comp>infosystems>www>authoring>html

Hello. I am replacing straight vertical apostrophes with &#8217; throughout
my web site. Should I do this in alt text and title attributes as well as in
the regular content? What about in page titles?
--
The Bicycling Guitarist
www.TheBicyclingGuitarist.net/

Andy Dingley
Since: Jun 01, 2007
Posts: 134
(Msg. 2) Posted: Tue Jul 31, 2007 2:44 am
Post subject: Re: Entities in alt and title text

On 31 Jul, 09:07, "The Bicycling Guitarist"
<Ch....RemoveThis@TheBicyclingGuitarist.net> wrote:
> I am replacing straight vertical apostrophes with &#8217; throughout
> my web site.

Why?

Are you certain that your management of encoding is accurate and
reliable enough to deliver these correctly in all cases?

> Should I do this in alt text and title attributes

Look in the DTD. What is the type of the attribute in question?
http://www.w3.org/TR/html4/sgml/dtd.html#Text

What is the definition of this type?
http://www.w3.org/TR/html4/types.html#type-cdata

The Bicycling Guitarist
(Msg. 3) Posted: Tue Jul 31, 2007 3:06 am
Post subject: Re: Entities in alt and title text

"Andy Dingley" <dingbat RemoveThis @codesmiths.com> wrote in message
news:1185875083.997505.29670@k79g2000hse.googlegroups.com...
> On 31 Jul, 09:07, "The Bicycling Guitarist"
> <Ch... RemoveThis @TheBicyclingGuitarist.net> wrote:
>> I am replacing straight vertical apostrophes with &#8217; throughout
>> my web site.
>
> Why?
>
> Are you certain that your management of encoding is accurate and
> reliable enough to deliver these correctly in all cases?
>

Hello and thanks for replying. I recently replaced " with “ and
” and I wanted to make similar improvement on the apostrophes of
possessives and contractions. I am being careful about what I replace.

If I read the links you provided correctly, then I *think* the character
entities will be converted to characters in the alt text, title attributes
and even the page titles in the <head>.
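
One rough way to check that reading is to run some sample markup through an HTML parser and look at what comes out of the attributes and the title element. The following is only a sketch using Python's standard `html.parser`; the sample markup and the `TextCollector` class are made up for illustration, and it shows how this one parser treats references, not how any particular browser draws a window caption.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect decoded attribute values and the text of the <title> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.attrs = {}          # (tag, attribute) -> decoded value
        self.title_parts = []    # text chunks seen inside <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        for name, value in attrs:
            self.attrs[(tag, name)] = value

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


# Hypothetical page fragment, not taken from the poster's site.
SAMPLE = (
    "<title>Chris&#8217;s page</title>"
    '<img src="bike.jpg" alt="Chris&#8217;s bike" title="The rider&#8217;s bike">'
)

collector = TextCollector()
collector.feed(SAMPLE)
collector.close()

title_text = "".join(collector.title_parts)
alt_text = collector.attrs[("img", "alt")]
title_attr = collector.attrs[("img", "title")]

# Each value now holds U+2019 itself rather than the seven-character reference.
print(title_text, alt_text, title_attr, sep="\n")
```

In all three places the parser hands back the character U+2019, which matches the CDATA definition linked earlier in the thread.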

Andy Dingley
(Msg. 4) Posted: Tue Jul 31, 2007 6:29 am
Post subject: Re: Entities in alt and title text

On 31 Jul, 12:22, Harlan Messinger <hmessinger.removet....DeleteThis@comcast.net>
wrote:

> Unless the encoding is something like EBCDIC (in which case all bets are
> off), the treatment of the characters &, #, 8, 2, 1, 7, and ; will be
> independent of the encoding.

Yes, but they're only _meaningful_ if the target is expecting Unicode.
It doesn't have to be _encoded_ as UTF-8 (or similar) if you use the
numeric character references like this, but it still depends on
characters from outside the Basic Latin script as being recognised.

For "modern" web browsers, Unicode will be supported and this will
work.

For many still-current tools or editors commonly used in content-
management and web publishing though, including most XSLT that wasn't
written by Unicode wonks, then it can (and frequently will) fail.
We've all seen this - the primary culprit is Word (which just _loves_
these apostrophe and quote characters). Who can say they've not seen
&#8217; turn up mangled in web content? Either unrecognised into a
blob character, unexpectedly literalised as "8217" or most commonly
of all, correctly rendered into a UTF-8 character which _then_ becomes
garbled when it's incorrectly served as ISO-8859. Here's an example of
a well-known blog doing it, just a few days ago: http://www.badscience.net/?p=398
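
The last failure mode in that list (UTF-8 bytes read back under a single-byte Latin encoding) is easy to reproduce. A small sketch, assuming the receiving end decodes as Windows-1252, which is how browsers commonly treat an ISO-8859-1 label; the sample word is arbitrary.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the character that &#8217; refers to.
text = "it\u2019s"

# What a UTF-8 serializer writes: the quote becomes three bytes.
utf8_bytes = text.encode("utf-8")

# A reader that assumes Windows-1252 turns each of those bytes
# into a separate character.
garbled = utf8_bytes.decode("cp1252")

# Decoding with the encoding that was actually used restores the text.
restored = utf8_bytes.decode("utf-8")

print(utf8_bytes)
print(garbled)
print(restored)
```

The three-character debris in `garbled` is the familiar signature of this mismatch.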

&#8217; is taken from the convoluted far-reaches of the Unicode
punctuation script, not the Basic Latin. As there are perfectly good
apostrophes to be had _from_ Basic Latin (might I suggest the good old
' ?) then why complicate things?

There's also the typographic issue that &#8217; isn't even an
apostrophe, it's a single right quote character. These aren't even the
same thing and it's a pointless affectation to use one for the other,
for no more reason than, "Look at me, I know how to use fluffy quotes"

Harlan Messinger
Since: Apr 25, 2004
Posts: 1190
(Msg. 5) Posted: Tue Jul 31, 2007 7:22 am
Post subject: Re: Entities in alt and title text

Andy Dingley wrote:
> On 31 Jul, 09:07, "The Bicycling Guitarist"
> <Ch....TakeThisOut@TheBicyclingGuitarist.net> wrote:
>> I am replacing straight vertical apostrophes with &#8217; throughout
>> my web site.
>
> Why?
>
> Are you certain that your management of encoding is accurate and
> reliable enough to deliver these correctly in all cases?

Unless the encoding is something like EBCDIC (in which case all bets are
off), the treatment of the characters &, #, 8, 2, 1, 7, and ; will be
independent of the encoding. Which is actually the point in using
numeric codes to represent these characters.
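
To make the encoding-independence claim concrete, here is a small sketch that encodes the seven-character reference under several encodings. The list of encodings is my own choice for illustration, and `cp037` is just one EBCDIC code page that happens to ship with Python.

```python
reference = "&#8217;"

# Encodings that share ASCII's byte values for these seven characters.
ascii_compatible = ["ascii", "iso-8859-1", "cp1252", "utf-8"]
encoded = {name: reference.encode(name) for name in ascii_compatible}
same_bytes = len(set(encoded.values())) == 1

# Under an EBCDIC code page the octets are different,
# but the seven characters still round-trip.
ebcdic = reference.encode("cp037")
round_trip = ebcdic.decode("cp037")

print(same_bytes)
print(ebcdic != encoded["ascii"])
print(round_trip == reference)
```

So the reference reaches the parser as the same seven characters whichever of these encodings the file is saved and labelled in, provided the label matches the bytes.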

The Bicycling Guitarist
(Msg. 6) Posted: Tue Jul 31, 2007 7:22 am
Post subject: Re: Entities in alt and title text

"Harlan Messinger" <hmessinger.removethis.TakeThisOut@comcast.net> wrote in message
news:5h8kcgF3firuaU1@mid.individual.net...
> Andy Dingley wrote:
>> On 31 Jul, 09:07, "The Bicycling Guitarist"
>> <Ch....TakeThisOut@TheBicyclingGuitarist.net> wrote:
>>> I am replacing straight vertical apostrophes with &#8217; throughout
>>> my web site.
>>
>> Why?
>>
>> Are you certain that your management of encoding is accurate and
>> reliable enough to deliver these correctly in all cases?
>
> Unless the encoding is something like EBCDIC (in which case all bets are
> off), the treatment of the characters &, #, 8, 2, 1, 7, and ; will be
> independent of the encoding. Which is actually the point in using numeric
> codes to represent these characters.
Hi Harlan and thanks for posting.
I don't know what you mean by what you just said (perhaps because it is four
a.m. and I am tired). Won't those characters in that order be recognized as
a character entity and be rendered as a curly apostrophe? What about in meta
tags such as description? Do I dare replace the straight apostrophes in the
description with the character entity for a curly apostrophe?

Andy Dingley
(Msg. 7) Posted: Tue Jul 31, 2007 8:19 am
Post subject: Re: Entities in alt and title text

On 31 Jul, 14:53, Andreas Prilop <Prilop2... RemoveThis @trashmail.net> wrote:

> >http://www.badscience.net/?p=398
>
> Good example! Such things can happen only with UTF-8-encoded
> characters but never with character references like &#8217; .

Of course they can happen with &#8217;, just pass it through an XML
tool that works on the entirely correct and specification-conformant
basis that &#8217; can be converted transparently to and from the
literal character "’" at the serializer's whim. Then you've entered
the domain where incorrect encodings will break the content.

&#8217; isn't _wrong_, but it's fragile.
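
The serializer behaviour described here can be seen with any conforming XML library. A sketch with Python's `xml.etree.ElementTree`, on a made-up fragment; the Windows-1252 misreading at the end stands in for a wrongly labelled response and is an assumption about the receiving side.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment containing the numeric character reference.
source = "<p>It&#8217;s fragile</p>"
element = ET.fromstring(source)

# The parser has already replaced the reference with the character itself.
parsed_text = element.text

# Serialising as UTF-8 writes the character as three raw bytes.
as_utf8 = ET.tostring(element, encoding="utf-8")

# An ASCII-only output encoding forces a numeric reference again.
as_ascii = ET.tostring(element, encoding="us-ascii")

# Read back under the wrong charset label, the UTF-8 bytes are garbled.
misread = as_utf8.decode("cp1252")

print(as_utf8)
print(as_ascii)
print(misread)
```

Neither output is wrong as XML; whether the reference survives depends purely on the output encoding the tool was asked for.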


> Your example shows that &#8217; is indeed safer.

&#8217; is _not_ safer than '

Andy Dingley
(Msg. 8) Posted: Tue Jul 31, 2007 9:01 am
Post subject: Re: Entities in alt and title text

On 31 Jul, 15:56, Harlan Messinger <hmessinger.removet....RemoveThis@comcast.net>
wrote:
> Andy Dingley wrote:
> > On 31 Jul, 12:22, Harlan Messinger <hmessinger.removet....RemoveThis@comcast.net>
> > wrote:
>
> >> Unless the encoding is something like EBCDIC (in which case all bets are
> >> off), the treatment of the characters &, #, 8, 2, 1, 7, and ; will be
> >> independent of the encoding.
>
> > Yes, but they're only _meaningful_ if the target is expecting Unicode.
>
> That isn't true at all.

Of course it is. What's wrong with that statement? &#8217; only makes
any sense if some stage understands SGML / HTML / XML numeric
references, and the final rendering stage understands the Unicode
codepoint it represents.

This is an issue of character sets, not encoding. Even in EBCDIC it's
supported; I've no idea what octets EBCDIC will use for &#8217;, but
the seven characters will survive.

OTOH, I don't think it will work for Baudot as there's no "#"



> > It doesn't have to be _encoded_ as UTF-8 (or similar) if you use the
> > numeric character references like this, but it still depends on
> > characters from outside the Basic Latin script as being recognised.
>
> Which has nothing to do with expectations of Unicode.

Of course it does.

> You're confusing
> support for one or another character set with support for one or another
> encoding.

In what way? My point isn't about encodings, it's about the fact that
this codepoint is from a relatively obscure part of Unicode that
_requires_ Unicode support, rather than a simpler solution needing no
more than ASCII characters.

> As for character sets, *all* HTML user agents "expect" Unicode
> because Unicode is *the* document character set for HTML documents.

As Andreas pointed out, that's practically all user agents since the
NS4 era. Web browsers have been widespread in their adoption of
Unicode standards. HTML (since 2 at least) has required it,
implementations haven't been far behind.

However that's _not_ my point. My point is that HTML flows through a
lot of things other than user agents, including editors. Many of
_these_ still don't understand Unicode fully or correctly. As my
trivial XSLT example demonstrates, you can break a document containing
&#8217; with the aid of a correct XSLT identity copy transform, a
correct XSLT implementation, a correctly functioning Apache httpd and
a trivial error in mis-configuring the content-type with which you
serve the results. &#8217; might make a file robust against encoding
errors, but it doesn't guarantee incorruptibility for the lifetime of
the document, through the commonly-expected processes it may be
subject to. It's not the Book of Mormon!


> The problem there is that the character *was* entered directly into the
> text, using an encoding different from the encoding communicated to the
> browser. Using numeric codes *avoids* that problem.

No, because the use of the numeric reference isn't guaranteed to be
_preserved_. If the OP uses one editor and one static web server, then
maybe their process is demonstrably safe and they can (not should!)
use them. If they're Ben Goldacre (the badscience blog) then clearly
there's something else risky happening. If they're the journos at a
major local magazine publisher (to name just one CMS I worked on not
too long ago) then it will certainly munch an &#8217;.


> For the sake of a polished appearance.

Quote marks _aren't_ apostrophes (and primes are yet a third group).
Confusing them is a sign of typographic naivety, not polish.
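
For reference, the three groups mentioned here do have separate code points with distinct formal names. A small sketch that prints the names from Python's bundled Unicode database; the choice of these three code points as representatives is mine.

```python
import unicodedata

# ASCII apostrophe, right single quotation mark, and prime.
candidates = ["\u0027", "\u2019", "\u2032"]

# Map a "U+XXXX" label to the formal Unicode character name.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in candidates}

for code_point, name in names.items():
    print(code_point, name)
```

The names only show how the characters are catalogued; they do not by themselves settle which one should be used for a possessive or contraction.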

If you want them rendered prettily, choose a pretty typeface. You
don't kern an O into a 0, just because it's rendered more narrowly.

> The ASCII apostrophe isn't even a quote character, yet in the quoting
> style that uses single quotation marks, that's the character that people
> use to delineate quotations. Is that wrong as well?

No, because that's a technical restriction that's imposed externally.
It would be equally wrong to typeset an XML textbook "prettily" by re-
jigging the code examples to print with curly single quotes. It's a
technical medium, it has inherent technical requirements and reversing
those would be just as bad as mis-using the quote as an apostrophe.

> And everyone should just wear the old, dull Mao-style uniforms because,
> if anyone dresses any more nicely than that, they're saying, "Look at
> me, I know how to dress nicely."

What happens if the "Island of The Andy Warhol Clones" has a cargo of
parasols and spats wash up on its shores? Should the inhabitants then
don those along with their baseball boots and Mao suits and start
dancing a cakewalk? Would they be better dressed than they were
beforehand, or merely a ridiculous parody?

Andy Dingley
(Msg. 9) Posted: Tue Jul 31, 2007 9:11 am
Post subject: Re: Entities in alt and title text

On 31 Jul, 16:47, Harlan Messinger <hmessinger.removet....TakeThisOut@comcast.net>
wrote:

> > Of course they can happen with &#8217;, just pass it through an XML
> > tool that works on the entirely correct and specification-conformant
> > basis that &#8217; can be converted transparently to and from the
> > literal character "’" at the serializer's whim. Then you've entered
> > the domain where incorrect encodings will break the content.
>
> *Anything* that works fine to begin with won't work if you first pass it
> through something that breaks it.

XML doesn't "break" it. It does something entirely legal.

The risk here is that the world, and certainly the web world,
isn't simple. Even if the OP thinks they're using a simple process,
how simple is it really? What happens when they post that code into a
blog engine? Through something that's collected by RSS and re-
distributed? Now there's an XML-based process that certainly does
hammer on numeric entities.


> My TV is fragile. So I don't use a sledgehammer within striking distance.

A shame, but the Evil of TV is another matter 8-)

Harlan Messinger
(Msg. 10) Posted: Tue Jul 31, 2007 9:13 am
Post subject: Re: Entities in alt and title text

The Bicycling Guitarist wrote:
> "Harlan Messinger" <hmessinger.removethis.RemoveThis@comcast.net> wrote in message
> news:5h8kcgF3firuaU1@mid.individual.net...
>> Andy Dingley wrote:
>>> On 31 Jul, 09:07, "The Bicycling Guitarist"
>>> <Ch....RemoveThis@TheBicyclingGuitarist.net> wrote:
>>>> I am replacing straight vertical apostrophes with &#8217; throughout
>>>> my web site.
>>> Why?
>>>
>>> Are you certain that your management of encoding is accurate and
>>> reliable enough to deliver these correctly in all cases?
>> Unless the encoding is something like EBCDIC (in which case all bets are
>> off), the treatment of the characters &, #, 8, 2, 1, 7, and ; will be
>> independent of the encoding. Which is actually the point in using numeric
>> codes to represent these characters.
> Hi Harlan and thanks for posting.
> I don't know what you mean by what you just said (perhaps because it is four
> a.m. and I am tired). Won't those characters in that order be recognized as
> a character entity and be rendered as a curly apostrophe?

Yes.

> What about in meta
> tags such as description? Do I dare replace the straight apostrophes in the
> description with the character entity for a curly apostrophe?

Yes, and for the same reason. The title is different--since the title is
typically rendered by the operating system (for example, in the Internet
Explorer window's caption bar) rather than by the browser's
HTML-rendering engine, I think there can be problems.

The Bicycling Guitarist
(Msg. 11) Posted: Tue Jul 31, 2007 10:05 am
Post subject: Re: Entities in alt and title text

"Andy Dingley" <dingbat.TakeThisOut@codesmiths.com> wrote in message
news:1185897678.614824.184980@k79g2000hse.googlegroups.com...
> On 31 Jul, 15:56, Harlan Messinger <hmessinger.removet....TakeThisOut@comcast.net>
> wrote:
>> Andy Dingley wrote:
>> > On 31 Jul, 12:22, Harlan Messinger <hmessinger.removet....TakeThisOut@comcast.net>
>> > wrote:
>
> Quote marks _aren't_ apostrophes (and primes are yet a third group).
> Confusing them is a sign of typographic naivety, not polish.

Dang. I found more than one reputable-looking resource that said &#8217; did
double duty as right single quote AND as apostrophe. Some of the regulars
here also seem to think it's okay to use the right single quote as an
apostrophe. Obviously there is disagreement. Is there an "official" position
on this matter?

The Bicycling Guitarist
(Msg. 12) Posted: Tue Jul 31, 2007 10:15 am
Post subject: Re: Entities in alt and title text

"The Bicycling Guitarist" <Chris.TakeThisOut@TheBicyclingGuitarist.net> wrote in message
news:p5Cri.1257$xq2.413@newsfe02.lga...
> Hello. I am replacing straight vertical apostrophes with &#8217;
> throughout


Well there is an entity called apostrophe that I will use there, and &#8217;
will only be used for right single quote. I want to be technically correct
no matter how it looks.

example from my web site:

“How could you ‘bust’ me, I'm irresistible!”

Andy Dingley
(Msg. 13) Posted: Tue Jul 31, 2007 10:48 am
Post subject: Re: Entities in alt and title text

On 31 Jul, 17:10, Andreas Prilop <Prilop2....DeleteThis@trashmail.net> wrote:
> On Tue, 31 Jul 2007, Andy Dingley wrote:
> > &#8217;
>
> > My point isn't about encodings, it's about the fact that
> > this codepoint is from a relatively obscure part of Unicode that
> > _requires_ Unicode support, rather than a simpler solution needing no
> > more than ASCII characters.
>
> This character (’) is included in Windows code page 1252

The character might be (as 0x92), but the codepoint (0x2019 / &#8217;)
certainly isn't.

That's a statement of equivalence between characters from different
worlds: Windows and Unicode. They might be equivalent (right single
quotes are right single quotes), or they might just be a close mapping
(right single quotes aren't quite apostrophes). In this case they
happen to be an exact match.

So what are you suggesting here? Use the numeric entity &#8217; and
label it as the Windows 1252 code page? That's just plain wrong --
&#8217; is _not_ in 1252, even if an equivalent character is.

Or use the Windows 1252 character ’ ? That's certainly valid,
and it represents a right single quote (assuming that's what we want).
However it also requires the use of a Windows codepage to label a web
resource (ugly, albeit workable) and this will _certainly_ break if
the encoding is unreliable (the whole reason we're talking about using
numeric references).
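
The distinction between the Unicode code point and the code-page position can be checked directly. A sketch using Python's codecs and `html.unescape`; note that strictly a reference like `&#146;` names U+0092, a control character, and the remapping shown at the end is the compatibility behaviour of HTML5-style parsers, which `html.unescape` follows.

```python
import html

right_quote = "\u2019"  # U+2019, decimal 8217

# In Windows-1252 the same character is stored as the single byte 0x92.
cp1252_byte = right_quote.encode("cp1252")

# ISO-8859-1 has no such character at all.
try:
    right_quote.encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# Numeric references name Unicode code points, not code-page positions.
from_8217 = html.unescape("&#8217;")

# HTML5-style parsers remap the Windows-1252 number for compatibility.
from_146 = html.unescape("&#146;")

print(cp1252_byte, in_latin1)
print(from_8217 == right_quote, from_146 == right_quote)
```

A raw 0x92 byte therefore depends on the page really being labelled Windows-1252, while the numeric reference depends only on the parser.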


> It is *not* from a "relatively obscure part of Unicode".

I don't like to see 4 digit numeric entities in everyday web pages
(and I work with Poles and Czechs!). If there's a reason to write that
character in everyday usage, your keyboard ought to have a simple key
for you to type it with. If you're up in the character stratosphere
you're either typing maths or you need a very good excuse. Everyday
work just shouldn't need this level of complexity.

The prevalence of &#8217; is a M$oft aberration. It's almost always
an indication that HTML content has been pasted in from Word. While
much of the world used simple ASCII and coped with its limits (merging
apostrophe and single quote into one character), M$oft invented their
non-ISO Windows-1252 codepage and sprinkled it with a handful of the
more useful characters. The use of right single quote (as 0x92) became
more common than on the non-Windows platforms. When M$oft adopted
Unicode, they correctly mapped this to &#8217;. This much is correct.

When Word's over-eager autotype wizard starts flipping apostrophes
to single quotes though, that's wrong. Not badly wrong, but still not
right.

> By your arguments, we should not even write in German with
> German special letters (ä ö ü).

My argument is twofold:

* &#8217; is more fragile than it appears at first.

* An apostrophe is not the same thing as a right single quote.

Now by which of these do you think that German shouldn't be written
correctly?

Harlan Messinger
(Msg. 14) Posted: Tue Jul 31, 2007 10:56 am
Post subject: Re: Entities in alt and title text

Andy Dingley wrote:
> On 31 Jul, 12:22, Harlan Messinger <hmessinger.removet....RemoveThis@comcast.net>
> wrote:
>
>> Unless the encoding is something like EBCDIC (in which case all bets are
>> off), the treatment of the characters &, #, 8, 2, 1, 7, and ; will be
>> independent of the encoding.
>
> Yes, but they're only _meaningful_ if the target is expecting Unicode.

That isn't true at all.

> It doesn't have to be _encoded_ as UTF-8 (or similar) if you use the
> numeric character references like this, but it still depends on
> characters from outside the Basic Latin script as being recognised.

Which has nothing to do with expectations of Unicode. You're confusing
support for one or another character set with support for one or another
encoding. As for character sets, *all* HTML user agents "expect" Unicode
because Unicode is *the* document character set for HTML documents.

>
> For "modern" web browsers, Unicode will be supported and this will
> work.
>
> For many still-current tools or editors commonly used in content-
> management and web publishing though, including most XSLT that wasn't
> written by Unicode wonks, then it can (and frequently will) fail.
> We've all seen this - the primary culprit is Word (which just _loves_
> these apostrophe and quote characters). Who can say they've not seen
> &#8217; turn up mangled in web content? Either unrecognised into a
> blob character, unexpectedly literalised as "8217" or most commonly
> of all, correctly rendered into a UTF-8 character which _then_ becomes
> garbled when it's incorrectly served as ISO-8859. Here's an example of
> a well-known blog doing it, just a few days ago: http://www.badscience.net/?p=398

The problem there is that the character *was* entered directly into the
text, using an encoding different from the encoding communicated to the
browser. Using numeric codes *avoids* that problem.

>
> &#8217; is taken from the convoluted far-reaches of the Unicode
> punctuation script, not the Basic Latin. As there are perfectly good
> apostrophes to be had _from_ Basic Latin (might I suggest the good old
> ' ?) then why complicate things?

For the same reason one does in book publishing. The same reason it's
nice that computers now display, and printers print, proportional
typefaces instead of just System font 8-pixels-wide on the screen and
Courier 10pt on the printer. For the sake of a polished appearance.

> There's also the typographic issue that &#8217; isn't even an
> apostrophe, it's a single right quote character. These aren't even the
> same thing

The ASCII apostrophe isn't even a quote character, yet in the quoting
style that uses single quotation marks, that's the character that people
use to delineate quotations. Is that wrong as well?

> and it's a pointless affectation to use one for the other,
> for no more reason than, "Look at me, I know how to use fluffy quotes"

And everyone should just wear the old, dull Mao-style uniforms because,
if anyone dresses any more nicely than that, they're saying, "Look at
me, I know how to dress nicely."

Harlan Messinger
(Msg. 15) Posted: Tue Jul 31, 2007 11:47 am
Post subject: Re: Entities in alt and title text

Andy Dingley wrote:
> On 31 Jul, 14:53, Andreas Prilop <Prilop2....DeleteThis@trashmail.net> wrote:
>
>>> http://www.badscience.net/?p=398
>> Good example! Such things can happen only with UTF-8-encoded
>> characters but never with character references like &#8217; .
>
> Of course they can happen with &#8217;, just pass it through an XML
> tool that works on the entirely correct and specification-conformant
> basis that &#8217; can be converted transparently to and from the
> literal character "’" at the serializer's whim. Then you've entered
> the domain where incorrect encodings will break the content.

*Anything* that works fine to begin with won't work if you first pass it
through something that breaks it. Why would you invoke a scenario that
probably isn't part of the OP's arrangement (the OP has a small personal
website; he is probably not passing anything through an XML-based
processor) as a reason not to do something?

As it happens, the CMS we use leaves numeric codes intact. Any HTML
content management system *should* do so regardless of what XML
processors in general are *allowed* to do, given the very fact that
their purpose is to process, store, and deploy HTML, and that it's
critical that it be done correctly.

> &#8217; isn't _wrong_, but it's fragile.

My TV is fragile. So I don't use a sledgehammer within striking distance.