urlencode with only built-in functions
php urlencode (4)
Here is pretty short version, and it's even "pure SQL" function, not plpgsql. Multibyte chars (including 3- and 4-bytes emoji) are supported.
create or replace function urlencode(in_str text, OUT _result text) returns text as $$ select string_agg( case when ol>1 or ch !~ '[0-9a-za-z:/@._?#-]+' then regexp_replace(upper(substring(ch::bytea::text, 3)), '(..)', E'%\\1', 'g') else ch end, '' ) from ( select ch, octet_length(ch) as ol from regexp_split_to_table($1, '') as ch ) as s; $$ language sql immutable strict;
Without using plpgsql, I'm trying to urlencode a given text within a pgsql SELECT statement.
The problem with this approach:
select regexp_replace('héllo there','([^A-Za-z0-9])','%' || encode(E'\\1','hex'),'g')
...is that the encode function is not passed the regexp parameter, unless there's another way to call functions from within the replacement expression that actually works. So I'm wondering if there's a replacement expression that, by itself, can encode matches into hex values.
There may be other combinations of functions. I thought there would be a clever regex (and that may still be the answer) out there, but I'm having trouble finding it.
select regexp_replace(encode('héllo there','hex'),'(..)',E'%\\1','g');
This doesn't leave the alphanumeric characters human-readable, though.
decodeURIComponent for postgres
You're not making yourself too clear, but maybe this will be of interest for you.
Here's a function I wrote that handles encoding using built in functions while preserving the readability of the URL.
Regex matches to capture pairs of (optional) safe characters and (at most one) non-safe character. Nested selects allow those pairs to be encoded and re-combined returning a fully encoded string.
I've run through a test suite with all sorts of permutations (leading/trailing/only/repeated encoded characters and thus far it seems to encode correctly.
The safe special characters are _ ~ . - and /. My inclusion of "/" on that list is probably non-standard, but fits the use case I have where the input text may be a path and I want that to remain.
CREATE OR REPLACE FUNCTION oseberg.encode_uri(input text) RETURNS text LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ DECLARE parsed text; safePattern text; BEGIN safePattern = 'a-zA-Z0-9_~/\-\.'; IF input ~ ('[^' || safePattern || ']') THEN SELECT STRING_AGG(fragment, '') INTO parsed FROM ( SELECT prefix || encoded AS fragment FROM ( SELECT COALESCE(match, '') AS prefix, COALESCE('%' || encode(match::bytea, 'hex'), '') AS encoded FROM ( SELECT regexp_matches( input, '([' || safePattern || ']*)([^' || safePattern || '])?', 'g') AS match ) matches ) parsed ) fragments; RETURN parsed; ELSE RETURN input; END IF; END; $function$