Friday, July 16, 2010

CLOB to VARCHAR2 conversion problems on UTF8 databases

Some well-known facts:
  • VARCHAR2 has a max size of 32767 characters
  • UTF8 is a multi-byte character set, one character uses up to 4 bytes
  • 32767/4 = 8191.75
It looks like there are conversion problems when copying from a CLOB to a VARCHAR2 on UTF8 databases. If the VARCHAR2 gets assigned a CLOB (either directly or via SUBSTR(clob,...)) of more than 8191 characters, this raises VALUE_ERROR (see example below). If reading via DBMS_LOB.SUBSTR, the result gets truncated to 8191 characters. Reading via DBMS_LOB.READ works as expected.

An annoying inconsistency. DB was 11.1.0.6 on Windows.

CN@cn> select value from v$nls_parameters where parameter='NLS_CHARACTERSET';



VALUE
----------------------------------------------------------------
WE8MSWIN1252
 

CN@cn> declare
2 c clob;
3 v varchar2(10000);
4 begin
5 for i in 8191 .. 8192 loop
6 v := rpad('abc',i);
7 c := v;
8 dbms_output.put_line('Trying length '||i||'...');
9 v := c;
10 dbms_output.put_line('...ok');
11 end loop;
12 end;
13 /
Trying length 8191...
...ok
Trying length 8192...
...ok
PL/SQL-Prozedur erfolgreich abgeschlossen.
 

CN@cn> conn cn/test@cnutf
Connect durchgef▒hrt.
CN@cnutf> select value from v$nls_parameters where parameter='NLS_CHARACTERSET';
 

VALUE
----------------------------------------------------------------
AL32UTF8
CN@cnutf> declare

2 c clob;
3 v varchar2(10000);
4 begin
5 for i in 8191 .. 8192 loop
6 v := rpad('abc',i);
7 c := v;
8 dbms_output.put_line('Trying length '||i||'...');
9 v := c;
10 dbms_output.put_line('...ok');
12 end;
13 /
Trying length 8191...
...ok
Trying length 8192...
declare
*
FEHLER in Zeile 1:
ORA-06502: PL/SQL: numerischer oder Wertefehler
ORA-06512: in Zeile 9

2 comments:

Patrick Wolf said...

Hi Chris,

if you are working on a database where APEX is installed then you can use wwv_flow_utilities.clob_to_varchar2 to take care of this problem.

Regards
Patrick

Chris said...

Hi Patrick!

Thanks a lot for the pointer. As you know, I'd be more than happy to use APEX. I've already found a work-around for that particular problem, however (btw, my eyes are burning from looking at the code of clob_to_varchar2 *g*).

Greetings,
Chris