Ticket #2452 (new defect)

Opened 6 months ago

Last modified 2 days ago

URL generation for macrons, umlauts and other special characters

Reported by: sminnee Assigned to: ischommer
Type: defect Priority: medium
Milestone: 2.3.0 Component: i18n
Version: 2.2.2-rc2 Severity: medium effort / impact
Keywords: Cc: everplays@gmail.com
Due date: Hours:

Description

If you have a page title that contains a macron, acute, grave, etc., then the URL should contain the unmarked character. For example, é should be transformed to e. Currently such characters are stripped out entirely.

There is a server-side solution to this, which Ingo will be posting shortly.

It would also be worth making use of Ajax to generate the URLs on the server-side as soon as people enter the page title, rather than relying on a separate JavaScript implementation of this.

Attachments

patch.diff (1.1 kB) - added by ischommer 2 days ago.

Change History

Changed 6 months ago by ischommer

this is a fix i've successfully applied in a project branch:

$str = preg_replace('~[^\\pL0-9_]+~u', '-', $str); // substitutes anything but letters, numbers and '_' with separator
$str = trim($str, "-");
$str = iconv("utf-8", "us-ascii//TRANSLIT", $str); // TRANSLIT does the whole job
$str = strtolower($str);
$str = preg_replace('~[^-a-z0-9_]+~', '', $str); // keep only letters, numbers, '_' and separator

this heavily borrows from a drupal-patch, see http://drupal.org/node/63924

it requires the iconv php library, which is fairly standard in PHP5 distributions - does anybody know in which scenarios iconv might not be present on the platform or php-distribution?

Changed 6 months ago by ischommer

note: minimum requirement for iconv is php 5.1, which conflicts with http://doc.silverstripe.com/doku.php?id=server-requirements: "PHP 5.2.0+ recommended, PHP as low as 5.0.4 have been known to work, but for best results 5.2.0+ is recommended."

any reason we shouldn't raise this to php 5.2.0 as a minimum?

Changed 6 months ago by sminnee

Because there are a heap of people out there who are running on PHP 5.1. I'm less certain about PHP 5.0 but I imagine there are still a lot of people out there.

I think that raising the requirement to PHP 5.1 *could* be considered. However, this particular issue doesn't warrant it.

A better solution is to only use this strategy if the iconv() method exists. Graceful degradation isn't just for javascript! :-P
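
A minimal sketch of that graceful degradation, reusing the steps from the fix posted above. The function name generate_url_segment() is hypothetical and not part of the SilverStripe API, and the exact transliteration output depends on the platform's iconv implementation and locale. When iconv is missing or fails, the code simply falls back to stripping the characters, which matches the current behaviour.

```php
<?php
// Hypothetical helper name, used for illustration only.
function generate_url_segment($title) {
    // Replace every run of characters that are not letters, digits or '_' with the separator.
    $str = preg_replace('~[^\\pL0-9_]+~u', '-', $title);
    $str = trim($str, '-');

    // Only transliterate when the iconv extension is available.
    if (function_exists('iconv')) {
        // e.g. "é" may become "e"; the result varies between iconv implementations.
        $converted = @iconv('utf-8', 'us-ascii//TRANSLIT', $str);
        if ($converted !== false) {
            $str = $converted;
        }
    }

    $str = strtolower($str);

    // Keep only ASCII letters, digits, '_' and the separator.
    // Without iconv, accented characters are stripped at this point.
    return preg_replace('~[^-a-z0-9_]+~', '', $str);
}
```

Plain ASCII titles give the same result on either path, so only titles with special characters degrade.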

Changed 6 months ago by sharvey

  • milestone set to 2.2.3 feature-lock

Changed 2 weeks ago by ischommer

  • milestone changed from 2.2.3 to 2.3.0

Changed 2 weeks ago by ischommer

  • summary changed from URL generation for macronised content doesn't work to URL generation for macrons, umlauts and other special characters

Some more discussion at #2278

Changed 2 weeks ago by ischommer

From sminnee on #2278: "I don't think that this change should affect our handling of old URLs. By default, this code will only affect the URL generation on new or edited pages.

A function for doing a global update of all URLs to fit a new rule is a completely separate system.

One of the things that we were working on is making use of the SiteTree_versions table to search for legacy URLs before redirecting to a 404."

Changed 2 days ago by ischommer

  • cc set to everplays@gmail.com

I've attached another take at the problem from Behrooz (he sent it in via email, everplays at gmail dot com). It's not trying to convert characters it doesn't know, but rather just strips characters which are explicitly disallowed in URL strings. Not sure how well Unicode characters are supported across browsers and in our handling of URLs through PHP5 (Director?). Needs some further investigation.
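
For comparison, a rough sketch of the strip-only idea. This is not the attached patch.diff, whose contents are not reproduced here. The function name strip_disallowed_url_chars() is hypothetical, and the blacklist is an assumption (the RFC 3986 reserved characters plus a few characters commonly treated as unsafe in URLs). Unicode letters pass through untouched, which is exactly the part that still needs investigation.

```php
<?php
// Hypothetical helper name, used for illustration only.
function strip_disallowed_url_chars($title) {
    // Assumed blacklist: RFC 3986 reserved characters plus a few unsafe ones.
    $disallowed = str_split(':/?#[]@!$&\'()*+,;=%<>"{}|\\^`');

    // Remove blacklisted characters; everything else (including non-ASCII letters) is kept.
    $str = str_replace($disallowed, '', $title);

    // Collapse runs of whitespace into the separator.
    return preg_replace('~\s+~u', '-', trim($str));
}
```

With this approach a title such as "Über uns" keeps its umlaut in the URL segment instead of being converted or stripped.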

Changed 2 days ago by ischommer
