regex - Remove non-ASCII characters from a string using python / django

1 Answer

深蓝 · Answer 1 · 2021-10-17T02:58:01+0000

ASCII characters are the first 128 code points (0 to 127), so you can look up each character's code point with ord and drop the character if it falls outside that range.

# -*- coding: utf-8 -*-

def strip_non_ascii(string):
    ''' Returns the string without non-ASCII characters'''
    # ASCII covers code points 0 through 127 inclusive
    stripped = (c for c in string if ord(c) < 128)
    return ''.join(stripped)


test = u'éáé123456tgreáé@€'
print(test)
print(strip_non_ascii(test))

Result

éáé123456tgreáé@€
123456tgre@
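
Since the question is tagged regex, the same filter can also be written as a regular expression. This is only a sketch of an alternative, not the approach used in the answer above; the names NON_ASCII and strip_non_ascii_re are made up for illustration.

```python
import re

# Matches any single character outside the ASCII range U+0000..U+007F.
NON_ASCII = re.compile(r'[^\x00-\x7F]')


def strip_non_ascii_re(text):
    """Return text with every non-ASCII character removed."""
    return NON_ASCII.sub('', text)


print(strip_non_ascii_re(u'éáé123456tgreáé@€'))  # prints 123456tgre@
```

The character class is negated, so everything that is not in the 0 to 127 range is replaced with the empty string.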

Note that @ is kept because it is an ASCII character. If you want to keep only a particular subset (for example just digits and uppercase and lowercase letters), you can narrow the range by checking an ASCII table.
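
As a rough sketch of that narrower filter, assuming "letters and digits only" is the subset you want, you could check each character against an explicit whitelist instead of a numeric range. ALLOWED and keep_alphanumeric are hypothetical names, not part of the answer's code.

```python
import string

# Hypothetical whitelist: ASCII letters (both cases) and digits only.
ALLOWED = set(string.ascii_letters + string.digits)


def keep_alphanumeric(text):
    """Return text with everything except ASCII letters and digits removed."""
    return ''.join(c for c in text if c in ALLOWED)


print(keep_alphanumeric(u'éáé123456tgreáé@€'))  # prints 123456tgre
```

Here @ is dropped as well, because it is ASCII but not in the whitelist.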

EDITED: After reading your question again, maybe you need to escape your HTML instead, so that all those characters appear correctly once rendered. You can use the escape filter in your templates.
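
To give an idea of what escaping does, here is a small sketch that uses Python 3's standard html.escape as a stand-in. It is an assumption that this is close enough to Django's escape filter (written as {{ value|escape }} in a template) for illustration; the sample string is made up.

```python
import html  # standard library, Python 3

# Made-up sample containing markup, an ampersand and non-ASCII letters.
text = '<b>éáé & 123</b>'

# Replaces the HTML-significant characters &, <, > (and quotes) with entities.
escaped = html.escape(text)

print(escaped)  # prints &lt;b&gt;éáé &amp; 123&lt;/b&gt;
```

Escaping only replaces the characters that have a special meaning in HTML. The accented letters pass through unchanged, so this addresses broken markup rather than removing non-ASCII text.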
