regex - Does this set of regular expressions FULLY protect against cross site scripting?

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

What's an example of something dangerous that would not be caught by the code below?

EDIT: After some of the comments I added another line, commented below. See Vinko's comment in David Grant's answer. So far only Vinko has answered the question, which asks for specific examples that would slip through this function. Vinko provided one, but I've edited the code to close that hole. If another of you can think of another specific example, you'll have my vote!

public static string strip_dangerous_tags(string text_with_tags)
{
    string s = Regex.Replace(text_with_tags, @"<script", "<scrSAFEipt", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"</script", "</scrSAFEipt", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"<object", "<objSAFEect", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"</object", "</objSAFEect", RegexOptions.IgnoreCase);
    // ADDED AFTER THIS QUESTION WAS POSTED
    s = Regex.Replace(s, @"javascript", "javaSAFEscript", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onabort", "onSAFEabort", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onblur", "onSAFEblur", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onchange", "onSAFEchange", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onclick", "onSAFEclick", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"ondblclick", "onSAFEdblclick", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onerror", "onSAFEerror", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onfocus", "onSAFEfocus", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onkeydown", "onSAFEkeydown", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onkeypress", "onSAFEkeypress", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onkeyup", "onSAFEkeyup", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onload", "onSAFEload", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onmousedown", "onSAFEmousedown", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmousemove", "onSAFEmousemove", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmouseout", "onSAFEmouseout", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmouseover", "onSAFEmouseover", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmouseup", "onSAFEmouseup", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onreset", "onSAFEreset", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onresize", "onSAFEresize", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onselect", "onSAFEselect", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onsubmit", "onSAFEsubmit", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onunload", "onSAFEunload", RegexOptions.IgnoreCase);

    return s;
}

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:30:38+0000

It's never enough – whitelist, don't blacklist

For example, a javascript: pseudo-URL can be obfuscated with HTML entities, you've forgotten about <embed>, and there are dangerous CSS properties like behavior and expression in IE.
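
To make the entity trick concrete, here is a minimal sketch. BlacklistFilter is a trimmed stand-in for the question's function (only the "javascript" rule), and the class and method names are made up for illustration. It relies on WebUtility.HtmlDecode to approximate what a browser does to an attribute value before interpreting it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class EntityBypassDemo
{
    // Trimmed stand-in for the question's filter: only the "javascript" rule.
    public static string BlacklistFilter(string html)
    {
        return Regex.Replace(html, @"javascript", "javaSAFEscript", RegexOptions.IgnoreCase);
    }

    // Approximates what a browser does to an attribute value:
    // character references are decoded before the URL is interpreted.
    public static string DecodeAttribute(string rawAttributeValue)
    {
        return WebUtility.HtmlDecode(rawAttributeValue);
    }
}
```

Passing `<a href="jav&#x61;script:alert(1)">click</a>` to BlacklistFilter returns the markup unchanged, because the literal text "javascript" never appears in it. DecodeAttribute("jav&#x61;script:alert(1)") returns javascript:alert(1), which is what the browser ends up acting on.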

There are countless ways to evade filters, and such an approach is bound to fail. Even if you find and block every exploit possible today, new unsafe elements and attributes may be added in the future.

There are only two good ways to secure HTML:

1. Convert it to text by replacing every < with &lt;.
   If you want to let users enter formatted text, you can use your own markup (e.g. Markdown, like SO does).
2. Parse the HTML into a DOM, check every element and attribute, and remove everything that is not whitelisted.
   You will also need to check the contents of allowed attributes like href (make sure that URLs use a safe protocol, and block all unknown protocols).
   Once you've cleaned up the DOM, generate new, valid HTML from it. Never work on HTML as if it were text, because invalid markup, comments, entities, etc. can easily fool your filter.
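
Both approaches can be sketched in a few lines of C#. This is an illustration under loud assumptions, not a production sanitizer: the class name, the two allowlists, and the decision to reject relative URLs are all invented for the example. XElement stands in for a DOM produced by a real HTML parser. Real-world HTML is usually not well-formed XML, so an actual implementation would need a proper HTML parsing library.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Xml.Linq;

public static class HtmlSafety
{
    // Hypothetical allowlists; a real policy is chosen per application.
    private static readonly HashSet<string> AllowedElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "p", "b", "i", "em", "strong", "a" };

    private static readonly HashSet<string> AllowedSchemes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "http", "https", "mailto" };

    // Approach 1: treat the input as text, so no markup survives.
    public static string EncodeAsText(string input)
    {
        return WebUtility.HtmlEncode(input);
    }

    // Accept only absolute URLs whose scheme is allowlisted.
    // Relative URLs are rejected in this sketch for simplicity.
    public static bool IsSafeUrl(string url)
    {
        Uri parsed;
        return Uri.TryCreate(url.Trim(), UriKind.Absolute, out parsed)
            && AllowedSchemes.Contains(parsed.Scheme);
    }

    // Approach 2: build a new tree, copying only allowlisted content.
    // Assumes the caller passes a trusted wrapper element as the root.
    public static XElement Sanitize(XElement source)
    {
        var clean = new XElement(source.Name.LocalName);
        foreach (XAttribute attr in source.Attributes())
        {
            // The only attribute kept is href on <a>, and only with a safe URL.
            if (source.Name.LocalName == "a" && attr.Name.LocalName == "href" && IsSafeUrl(attr.Value))
                clean.SetAttributeValue("href", attr.Value);
        }
        foreach (XNode child in source.Nodes())
        {
            if (child is XText text)
                clean.Add(new XText(text.Value));
            else if (child is XElement el && AllowedElements.Contains(el.Name.LocalName))
                clean.Add(Sanitize(el));
            // Anything else (comments, unlisted elements and their children) is dropped.
        }
        return clean;
    }
}
```

Because Sanitize copies approved nodes into a fresh tree instead of deleting bad ones from the input, anything the policy does not name is dropped by default. New elements or attributes that browsers add later are therefore excluded without any change to the code.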

Also make sure your page declares its encoding, because some exploits take advantage of browsers auto-detecting the wrong encoding.
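
As a rough illustration of declaring the encoding, the sketch below builds a Content-Type header value with an explicit charset and pairs it with the equivalent meta tag. The class and member names are hypothetical, UTF-8 is an assumed choice, and how the header is actually attached to a response depends on your web framework.

```csharp
using System.Net.Http.Headers;

public static class EncodingDeclaration
{
    // Markup-level declaration; it should appear early in <head>.
    public const string MetaTag = "<meta charset=\"utf-8\">";

    // Builds a Content-Type header value that names the charset explicitly,
    // so the browser does not have to guess it.
    public static string ContentTypeHeader()
    {
        var header = new MediaTypeHeaderValue("text/html") { CharSet = "utf-8" };
        return header.ToString();
    }
}
```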
