Welcome to another ABAP Trapdoors article. If you are interested in the older articles, you can find a link list at the bottom of this post.
There are various ways to handle XML data in ABAP, all of them more or less well-documented. If you need a backward-compatible event-based parsing approach, for example, you might want to use the iXML library with its built-in SAX-style parser. (Note that iXML still constructs the entire document, so it's more like a DOM parser with a SAX event output attached to it. If you're looking for a strictly serial processing facility, check out the relatively new sXML library instead.)
The iXML documentation has a, let's say, distinctive writing style, and the library proudly distinguishes itself from the remaining ABAP ecosystem (for example, by using zero-based indexes instead of one-based lists in various places), but all things considered, it's a viable and stable solution. That is, if you observe the first rule of SAX: Size Does Matter. Consider the following example:
REPORT ztest_ixml_sax_parser.

CLASS lcl_test_ixml_sax_parser DEFINITION CREATE PRIVATE.
  PUBLIC SECTION.
    CLASS-METHODS run.
ENDCLASS.

CLASS lcl_test_ixml_sax_parser IMPLEMENTATION.

  METHOD run.

    CONSTANTS: co_line_length TYPE i VALUE 100.

    TYPES: t_line   TYPE c LENGTH co_line_length,
           tt_lines TYPE TABLE OF t_line.

    DATA: lt_xml_data       TYPE tt_lines,
          l_xml_size        TYPE i,
          lr_ixml           TYPE REF TO if_ixml,
          lr_stream_factory TYPE REF TO if_ixml_stream_factory,
          lr_istream        TYPE REF TO if_ixml_istream,
          lr_document       TYPE REF TO if_ixml_document,
          lr_parser         TYPE REF TO if_ixml_parser,
          lr_event          TYPE REF TO if_ixml_event,
          l_num_errors      TYPE i,
          lr_error          TYPE REF TO if_ixml_parse_error.

    DATA: lr_ostream TYPE REF TO cl_demo_output_stream.

    " prepare the output stream and display
    lr_ostream = cl_demo_output_stream=>open( ).
    SET HANDLER cl_demo_output_html=>handle_output FOR lr_ostream.

    " prepare the data to be parsed
    lt_xml_data = VALUE #( ( '<?xml version="1.0"?>' )
                           ( '<foo name="bar">' )
                           ( ' <baz number="1"/>' )
                           ( ' <baz number="2"/>' )
                           ( ' <baz number="4"/>' )
                           ( '</foo>' ) ).

    " determine the size of the table - since the lines have a fixed length, that should be easy
    l_xml_size = co_line_length * lines( lt_xml_data ).

    " initialize the iXML objects
    lr_ixml = cl_ixml=>create( ).
    lr_stream_factory = lr_ixml->create_stream_factory( ).
    lr_istream = lr_stream_factory->create_istream_itable( table = lt_xml_data
                                                           size  = l_xml_size ).
    lr_document = lr_ixml->create_document( ).
    lr_parser = lr_ixml->create_parser( stream_factory = lr_stream_factory
                                        istream        = lr_istream
                                        document       = lr_document ).
    lr_parser->set_event_subscription( if_ixml_event=>co_event_attribute_post +
                                       if_ixml_event=>co_event_element_pre +
                                       if_ixml_event=>co_event_element_post ).

    " the actual event handling loop.
    lr_ostream->write_text( iv_text   = 'iXML Parser Events'
                            iv_format = if_demo_output_formats=>heading
                            iv_level  = 1 ).
    DO.
      lr_event = lr_parser->parse_event( ).
      IF lr_event IS INITIAL.
        " if either the end of the document is reached or an error occurred
        EXIT.
      ENDIF.
      CASE lr_event->get_type( ).
        WHEN if_ixml_event=>co_event_element_pre.
          lr_ostream->write_text( |new element '{ lr_event->get_name( ) }'| ).
        WHEN if_ixml_event=>co_event_attribute_post.
          lr_ostream->write_text( |attribute '{ lr_event->get_name( ) }' = '{ lr_event->get_value( ) }'| ).
        WHEN if_ixml_event=>co_event_element_post.
          lr_ostream->write_text( |end of element '{ lr_event->get_name( ) }'| ).
      ENDCASE.
    ENDDO.

    " error handling
    l_num_errors = lr_parser->num_errors( ).
    IF l_num_errors > 0.
      lr_ostream->write_text( iv_text   = 'iXML Parser Errors'
                              iv_format = if_demo_output_formats=>heading
                              iv_level  = 1 ).
      DO l_num_errors TIMES.
        lr_error = lr_parser->get_error( sy-index - 1 ). " because iXML is 0-based
        lr_ostream->write_text( |{ lr_error->get_severity_text( ) } at offset { lr_error->get_offset( ) }: { lr_error->get_reason( ) }| ).
      ENDDO.
    ENDIF.

    lr_ostream->close( ).

  ENDMETHOD.

ENDCLASS.

START-OF-SELECTION.
  lcl_test_ixml_sax_parser=>run( ).
You can copy this program into your system and execute it; it doesn't do anything harmful. It simply assembles a small XML document (in a real application, you would get this from a file, a database, a network source - whatever), constructs an input stream around it, passes it to a parser and executes a parse-evaluate-print loop until either the end of the document is reached or something bad happens.
If your system is a non-Unicode (NUC) system (you can easily check if this is the case using System --> Status), the program will run just fine, producing an output similar to the following image:
If your system happens to be a Unicode (UC) system, the program won't behave quite the same way - you will get a rather nondescript error message (error at offset 0: unexpected symbol; expected '<', '</', entity reference, character data, CDATA section, processing instruction or comment).
It certainly does not help that the parser does not return a meaningful offset (or a line and column number) when assembling the error message: it always reports offset 0. However, the events logged prior to the error message provide a hint: The error always occurs after half of the lines of the table have been processed. You can easily verify this by changing the number of baz elements in the sample above. Since I've already mentioned that this issue occurs on UC systems only, it's now easy to deduce what went wrong here:
The iXML stream factory expects the size to be the number of bytes, not the number of characters. The code works as long as a character is represented by a single byte, but in UC systems, that's not the case: each character occupies two bytes there. The six lines of 100 characters in the example take up 1200 bytes, but the program only announces 600, so the parser stops reading after the third line. The solution - or maybe one of the solutions - is relatively simple:

" determine the size of the table for both UC and NUC systems
l_xml_size = co_line_length * lines( lt_xml_data ) * cl_abap_char_utilities=>charsize.
This trapdoor is a rather devious contraption because it will not be detected by the standard Unicode checks and the error message is about as misleading as it can get. Also, whether you get to see the message at all depends on the actual implementation of the parsing program. If the original developer thought that error handling might be left to be implemented by those who follow - well, it's a long way down...
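If the XML data does not have to live in a fixed-length table, another way around both the size calculation and the silent failure might look like the sketch below. It assumes that the whole document fits into a single string and that your release supports inline declarations. The report name ztest_ixml_string_stream and all variable names are made up for this illustration. Because create_istream_string takes no size argument, there is no byte/character mix-up to get wrong, and the error list is checked unconditionally after parsing. This is not a drop-in replacement for the event loop above, just a minimal illustration of the idea.

```abap
REPORT ztest_ixml_string_stream.

START-OF-SELECTION.
  " Assumption: the whole document fits into a single string.
  DATA(l_xml_string) = `<?xml version="1.0"?>` &&
                       `<foo name="bar">` &&
                       `<baz number="1"/>` &&
                       `<baz number="2"/>` &&
                       `<baz number="4"/>` &&
                       `</foo>`.

  DATA(lr_ixml)           = cl_ixml=>create( ).
  DATA(lr_stream_factory) = lr_ixml->create_stream_factory( ).

  " the string-based stream is created without an explicit size
  DATA(lr_istream) = lr_stream_factory->create_istream_string( string = l_xml_string ).

  DATA(lr_document) = lr_ixml->create_document( ).
  DATA(lr_parser)   = lr_ixml->create_parser( stream_factory = lr_stream_factory
                                              istream        = lr_istream
                                              document       = lr_document ).

  " parse the whole document in one go and always look at the error list
  DATA(l_rc)         = lr_parser->parse( ).
  DATA(l_num_errors) = lr_parser->num_errors( ).

  IF l_rc <> 0 OR l_num_errors > 0.
    DO l_num_errors TIMES.
      DATA(lr_error) = lr_parser->get_error( sy-index - 1 ). " because iXML is 0-based
      cl_demo_output=>write( |parse error: { lr_error->get_reason( ) }| ).
    ENDDO.
  ELSE.
    cl_demo_output=>write( |root element: { lr_document->get_root_element( )->get_name( ) }| ).
  ENDIF.

  cl_demo_output=>display( ).
```

If the data arrives as a table anyway (an upload, for example), the charsize correction shown earlier is probably the smaller change. The string variant is mainly worth considering when you control how the document is assembled.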
Older ABAP Trapdoors articles