Short Version — You can do some of this in pure SPARQL
You can use a query like this to find cities whose names are like "Londn", and order them by (one measure of) similarity. The rest of the answer explains how this works:
select ?city ?percent where {
?city a dbpedia-owl:City ;
rdfs:label ?label .
filter langMatches( lang(?label), 'en' )
bind( replace( concat( 'x', str(?label) ), "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$", '$1$2$3$4$5' ) as ?match )
bind( xsd:float(strlen(?match))/strlen(str(?label)) as ?percent )
}
order by desc(?percent)
limit 100
SPARQL results
city percent
----------------------------------------------
http://dbpedia.org/resource/London 0.833333
http://dbpedia.org/resource/Bonn 0.75
http://dbpedia.org/resource/Loudi 0.6
http://dbpedia.org/resource/Ladnu 0.6
http://dbpedia.org/resource/Lonar 0.6
http://dbpedia.org/resource/Longnan 0.571429
http://dbpedia.org/resource/Longyan 0.571429
http://dbpedia.org/resource/Luoding 0.571429
http://dbpedia.org/resource/Lodhran 0.571429
http://dbpedia.org/resource/Lom%C3%A9 0.5
http://dbpedia.org/resource/Andong 0.5
Computing string similarity metrics
Note: the code in this part of this answer works in Apache Jena. There's actually an edge case that causes this to (correctly) fail in Virtuoso. The update at the end addresses this issue.
There's nothing built into SPARQL for computing string matching distances, but you can do some of it using the regular expression replacement mechanism in SPARQL. Suppose you want to match the sequence "cat" in some strings. Then you can use a query like this to figure out how much of a given string is present in the sequence "cat":
select ?string ?match where {
values ?string { "cart" "concatenate" "hat" "pot" "hop" }
bind( replace( ?string, "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$", "$1$2$3" ) as ?match )
}
-------------------------
| string | match |
=========================
| "cart" | "cat" |
| "concatenate" | "cat" |
| "hat" | "at" |
| "pot" | "t" |
| "hop" | "" |
-------------------------
By examining the length of string and match, you should be able to compute some various similarity metrics. As a more complicated example that uses the "Londn" input you mentioned. The percent column is the percent of the string that matched the input.
select ?input
?string
(strlen(?match)/strlen(?string) as ?percent)
where {
values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
"concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }
values (?input ?pattern ?replacement) {
("cat" "^[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$" "$1$2$3")
("Londn" "^[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
}
bind( replace( ?string, ?pattern, ?replacement) as ?match )
}
order by ?pattern desc(?percent)
--------------------------------------------------------
| input | string | percent |
========================================================
| "Londn" | "Londn" | 1.0 |
| "Londn" | "London" | 0.833333333333333333333333 |
| "Londn" | "Lando" | 0.6 |
| "Londn" | "London Fog" | 0.5 |
| "Londn" | "Land Ho!" | 0.375 |
| "Londn" | "concatenate" | 0.272727272727272727272727 |
| "Londn" | "port" | 0.25 |
| "Londn" | "catnap" | 0.166666666666666666666666 |
| "Londn" | "cat" | 0.0 |
| "Londn" | "chart" | 0.0 |
| "Londn" | "chat" | 0.0 |
| "Londn" | "hat" | 0.0 |
| "Londn" | "part" | 0.0 |
| "cat" | "cat" | 1.0 |
| "cat" | "chat" | 0.75 |
| "cat" | "hat" | 0.666666666666666666666666 |
| "cat" | "chart" | 0.6 |
| "cat" | "part" | 0.5 |
| "cat" | "catnap" | 0.5 |
| "cat" | "concatenate" | 0.272727272727272727272727 |
| "cat" | "port" | 0.25 |
| "cat" | "Lando" | 0.2 |
| "cat" | "Land Ho!" | 0.125 |
| "cat" | "Londn" | 0.0 |
| "cat" | "London" | 0.0 |
| "cat" | "London Fog" | 0.0 |
--------------------------------------------------------
Update
The code above works in Apache Jena, but fails in Virtuoso, due to the fact that the pattern can match the empty string. E.g., if you try the following query on DBpedia's endpoint (which is powered by Virtuoso), you'll get the following error:
select (replace( "foo", ".*", "x" ) as ?bar) where {}
Virtuoso 22023 Error The regex-based XPATH/XQuery/SPARQL replace()
function can not search for a pattern that can be found even in an
empty string
This surprised me, but the spec for replace says that it's based on XPath fn:replace. The docs for fn:replace say:
An error is raised [err:FORX0003] if the pattern matches a zero-length
string, that is, if the expression fn:matches("", $pattern, $flags)
returns true. It is not an error, however, if a captured substring is
zero-length.
We can get around this problem, though, by adding a character to the beginning of both the pattern and the string:
select ?input
?string
(strlen(?match)/strlen(?string) as ?percent)
where {
values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
"concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }
values (?input ?pattern ?replacement) {
("cat" "^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$" "$1$2$3")
("Londn" "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
}
bind( replace( concat('x',?string), ?pattern, ?replacement) as ?match )
}
order by ?pattern desc(?percent)
--------------------------------------------------------
| input | string | percent |
========================================================
| "Londn" | "Londn" | 1.0 |
| "Londn" | "London" | 0.833333333333333333333333 |
| "Londn" | "Lando" | 0.6 |
| "Londn" | "London Fog" | 0.5 |
| "Londn" | "Land Ho!" | 0.375 |
| "Londn" | "concatenate" | 0.272727272727272727272727 |
| "Londn" | "port" | 0.25 |
| "Londn" | "catnap" | 0.166666666666666666666666 |
| "Londn" | "cat" | 0.0 |
| "Londn" | "chart" | 0.0 |
| "Londn" | "chat" | 0.0 |
| "Londn" | "hat" | 0.0 |
| "Londn" | "part" | 0.0 |
| "cat" | "cat" | 1.0 |
| "cat" | "chat" | 0.75 |
| "cat" | "hat" | 0.666666666666666666666666 |
| "cat" | "chart" | 0.6 |
| "cat" | "part" | 0.5 |
| "cat" | "catnap" | 0.5 |
| "cat" | "concatenate" | 0.272727272727272727272727 |
| "cat" | "port" | 0.25 |
| "cat" | "Lando" | 0.2 |
| "cat" | "Land Ho!" | 0.125 |
| "cat" | "Londn" | 0.0 |
| "cat" | "London" | 0.0 |
| "cat" | "London Fog" | 0.0 |
--------------------------------------------------------