Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

internationalization - Slugify and Character Transliteration in C#

I'm trying to translate the following slugify method from PHP to C#: http://snipplr.com/view/22741/slugify-a-string-in-php/

Edit: For the sake of convenience, here is the code from the link above:

/**
 * Modifies a string to remove all non-ASCII characters and spaces.
 */
static public function slugify($text)
{
    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // transliterate
    if (function_exists('iconv'))
    {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);

    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);

    if (empty($text))
    {
        return 'n-a';
    }

    return $text;
}

I had no problem coding the rest, but I cannot find the C# equivalent of the following line of PHP code:

$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

Edit: The purpose of this is to transliterate non-ASCII characters, for example turning Reformáció Genfi Emlékműve Előtt into reformacio-genfi-emlekmuve-elott

1 Answer

I would also like to add that //TRANSLIT removes the apostrophes, which @jxac's solution doesn't address. I'm not sure why, but encoding the string to Cyrillic first and then decoding the bytes as ASCII gives behavior similar to //TRANSLIT.

var str = "éåäöíØ";
var noApostrophes = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(str)); 

=> "eaaoiO"
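
For reference, here is a rough sketch of how the whole PHP function from the question might look in C#. Note that it does not use the Cyrillic round-trip above for the transliteration step. Instead it swaps in Unicode decomposition (Normalize(NormalizationForm.FormD)) followed by dropping the combining marks. SlugHelper and RemoveDiacritics are hypothetical names chosen for illustration, and the final regex assumes only ASCII letters, digits, underscores and hyphens should survive.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugHelper
{
    // Hypothetical helper: decompose accented letters into base letter +
    // combining mark (FormD), then drop the combining marks.
    public static string RemoveDiacritics(string text)
    {
        var decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (var c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static string Slugify(string text)
    {
        // replace runs of anything that is not a letter or ASCII digit with "-"
        text = Regex.Replace(text, @"[^\p{L}0-9]+", "-");

        // trim leading and trailing "-"
        text = text.Trim('-');

        // transliterate (stand-in for iconv //TRANSLIT, see assumptions above)
        text = RemoveDiacritics(text);

        // lowercase
        text = text.ToLowerInvariant();

        // remove anything that is not an ASCII letter, digit, "_" or "-"
        text = Regex.Replace(text, @"[^a-z0-9_-]+", "");

        return text.Length == 0 ? "n-a" : text;
    }
}
```

With this sketch, SlugHelper.Slugify("Reformáció Genfi Emlékműve Előtt") returns "reformacio-genfi-emlekmuve-elott", and an empty or punctuation-only input returns "n-a". Letters that have no Unicode decomposition, such as Ø, are not transliterated by this approach and are removed by the last step, which is where the code-page trick above behaves differently. If you use that trick on .NET Core or later, Encoding.GetEncoding("Cyrillic") should only work after registering CodePagesEncodingProvider.Instance via Encoding.RegisterProvider.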
