Although Darin's answer is perfectly feasible, I wouldn't recommend using Scott Hanselman's technique of turning all this validation off step by step. You will sooner or later end up in deep...
Second suggestion of using IDs along with dummy strings (which are great for SEO and people) is a way to go, but sometimes they're not feasible either. Imagine this request URL:
/111/Electronics/222/Computers/333/Apple
Although we'd have these IDs we can rely on and human/SEO friendly category names as well, this is definitely not desired. ID + Dummy string is feasible when we need to represent one single item. In other cases it's not. And since you have to display category and subcategory, this is a problem.
So what can you do?
Two possible solutions:
Cleanup category names to only have valid characters - this can be done but if these are not static and editable by privileged users, you're out of luck here, because even if you've cleaned them up now, someone will enter something invalid later
Cleanup your string on the go - When you use category name, clean it up and when reading it and using it (to get the actual category ID) you can compare provided (previously cleaned) category name with value in DB that you clean up on the fly either:
- now while filtering categories
- before when generating category names
I'd suggest you take the 2.2 approach. Extend your DB table to have two columns:
- Category display name
- Category URL friendly name
you can also set a unique constraint on the second column, so it won't happen that two of your categories (even though they'd have different display names) would have same URL friendly names.
How to clean
The first thing that comes to mind is to strip out invalid characters, but that's very tedious and you'll most probably leave something out. It's much easier and wiser to get valid characters from your category display name. I've done the same when generating dummy URL category names. Just take out what is valid and bin the rest. It usually works just fine. Two examples of such regular expressions:
(w{2,})
- only use letters, digits and underscores and at least two of them (so we leave out a or single numbers and similar that doesn't add any meaning and unnecessarily lengthens our URL
([a-zA-Z0-9]{2,})
- only letters and digits (also 2+)
Get all matches in your category display name and join them with a space/dash and save along with original display name. A different question of mine was exactly about this.
Why an additional column? Because you can't run regular expression in SQL server. If you're using MySql you can use one column and use regular expression on DB as well.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…