sql - PostgreSQL - Replace HTML Entities

Question

Welcome To Ask or Share your Answers For Others

sql - PostgreSQL - Replace HTML Entities

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

sql - PostgreSQL - Replace HTML Entities

I have just set about the task of stripping out HTML entities from our database, as we do a lot of crawling and some of the crawlers didn't do this at input time :(

So I started writing a bunch of queries that look like;

UPDATE nodes SET name=regexp_replace(name, '&#xe0;', 'à', 'g') WHERE name LIKE '%#xe0%';
UPDATE nodes SET name=regexp_replace(name, '&#xe1;', 'á', 'g') WHERE name LIKE '%#xe1%';
UPDATE nodes SET name=regexp_replace(name, '&#xe2;', 'a', 'g') WHERE name LIKE '%#xe2%';

Which is clearly a pretty naive approach. I've been trying to figure out if there is something clever I can do with the decode function; maybe grabbing the html entity by regex like /&#x(..);/, then passing just the %1 part to the ascii decoder, and reconstructing the string...or something...

Shall I just press on with the queries? There will probably only be 40 or so of them.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:59:18+0000

Write a function using pl/perlu and use this module https://metacpan.org/pod/HTML::Entities

Of course you need to have perl installed and pl/perl available.

1) First of all create the procedural language pl/perlu:

CREATE EXTENSION plperlu;

2) Then create a function like this:

CREATE FUNCTION decode_html_entities(text) RETURNS TEXT AS $$
    use HTML::Entities;
    return decode_entities($_[0]);
$$ LANGUAGE plperlu;

3) Then you can use it like this:

select decode_html_entities('aaabbb&amp;.... asasdasdasd &hellip;');
   decode_html_entities    
---------------------------
 aaabbb&.... asasdasdasd …
(1 row)

Categories

sql - PostgreSQL - Replace HTML Entities

sql - PostgreSQL - Replace HTML Entities

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags