A complete guide to OpenSQL statements - Step-by-step tutorial with screenshots

In my first blog post on SCN, I’d like to give an overview of the OpenSQL query syntax. Starting from the simplest SELECT statement I’d like to demonstrate all language elements of OpenSQL, gradually building up a very complex database query. For each step, I’ll provide a business example and a screenshot so you can see the result of each SELECT query. I have used the Flight demo database available in every SAP system, so you can test the queries for yourself.


Note: in this post I will not discuss performance-related topics in detail. That would make this post way too long. Maybe in a future post.


So let’s begin!


Example 1: the simplest SELECT statement


There are three mandatory parts of a SELECT statement, basically defining what you want to read from which table and where to put the result:


1, After the SELECT keyword, we must specify the fields of the so-called “result set” (“resulting set” in the SAP help). Here we define which fields we want to see in the result of the selection. In Example 1, we entered “*”, a special character that means that all fields from the defined table(s) will be returned. This is a handy feature, but keep in mind that if you select all fields from a table that consists of 250 columns and you use only five of them, you waste a lot of CPU, memory and network resources. In this case it is better to list the five fields explicitly (especially if you select a lot of records).


2, The FROM clause defines which table(s) to read the data from. Here you must specify at least one table or view that exists in the Data Dictionary. In the first example we will only access one table; the later examples will use more tables.


3, The INTO clause defines where to put the results. The target is usually called a "work area". Here you must define the data object that will hold the result set using one of the options below:


- INTO x, where x is a structure.

- INTO TABLE y, where y is an internal table.

- APPENDING TABLE z, where z is an internal table. In this case the result set is appended to the internal table (the existing contents are not cleared).

- INTO (a,b,c, ... ), where a, b, c etc. are variables with elementary types


There are several instructions in the SAP help regarding the prerequisites of work areas (data type etc.) and assignment rules (automatic conversions etc.) which I don't want to copy and paste here. In any case, I'd recommend either using a structure identical to the field selection, or using the "CORRESPONDING FIELDS OF" addition. The latter assigns the selected fields to the fields of the work area based on the field name (not from left to right).


For all constructs except INTO TABLE, if the result set is empty, the target remains unchanged.
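The INTO variants listed above can be sketched as follows. This is only an illustration in classic OpenSQL syntax: the report name and all variable names are made up for this sketch, and table SBOOK from the flight demo model is used as the data source.

```abap
REPORT zopensql_into_demo. " hypothetical report name

DATA: ls_booking TYPE sbook,                   " structure (one row)
      lt_booking TYPE STANDARD TABLE OF sbook, " internal table
      lv_carrid  TYPE sbook-carrid,
      lv_connid  TYPE sbook-connid.

" INTO structure: one row is placed into the work area per loop pass
SELECT * FROM sbook INTO ls_booking UP TO 5 ROWS.
  WRITE: / ls_booking-carrid, ls_booking-bookid.
ENDSELECT.

" INTO TABLE: the table contents are replaced by the result set
SELECT * FROM sbook INTO TABLE lt_booking UP TO 5 ROWS.

" APPENDING TABLE: existing rows are kept, new rows are added
SELECT * FROM sbook APPENDING TABLE lt_booking UP TO 5 ROWS.

" INTO (a, b): one elementary variable per selected field
SELECT SINGLE carrid connid FROM sbook INTO (lv_carrid, lv_connid).

" CORRESPONDING FIELDS OF: assignment by field name, not by position
SELECT carrid connid FROM sbook
  INTO CORRESPONDING FIELDS OF TABLE lt_booking
  UP TO 5 ROWS.
```

After the last statement only CARRID and CONNID are filled in each row of lt_booking; the remaining fields keep their initial values.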


Hint: the INTO clause can be omitted in a special case, when you use a group function to count the number of lines that match the WHERE clause. In this case system variable SY-DBCNT will hold the number of records found. See Example 5.


Note: The INTO clause is missing from the screenshots because the tool I’ve used automatically generates it. Anyway, all the examples would use the INTO TABLE variant to fetch all matching records at once.


Business requirement: we want to select all data for 100 bookings from the booking table. Really simple, right?


How to achieve this? Simply define "*" after the SELECT keyword, specify table SBOOK in the FROM clause and limit the number of records fetched using the “UP TO N ROWS” clause. This clause defines the maximum number of lines we want the query to return. It is typically used for existence checks (UP TO 1 ROWS), but in this example we have limited the query to return at most 100 records. Unless you specify the ORDER BY clause, this will return 100 arbitrary records, so you cannot know which 100 it will return.


1_simple_select.jpg

Screenshot 1: Simply select 100 bookings from table SBOOK
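Since the tool used for the screenshot generates the INTO clause automatically, here is a minimal sketch of how the same query could look inside an ABAP report. The report and variable names are assumptions made for this illustration.

```abap
REPORT zopensql_example_01. " hypothetical report name

DATA lt_booking TYPE STANDARD TABLE OF sbook.

" Read all fields of at most 100 arbitrary bookings
SELECT * FROM sbook
  INTO TABLE lt_booking
  UP TO 100 ROWS.

" SY-DBCNT holds the number of rows read by the last SELECT
WRITE: / 'Rows read:', sy-dbcnt.
```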


Example 2: Adding a WHERE clause and a table join


The WHERE clause


Usually SELECT statements do not read all the contents of a table (typical exceptions are small customizing tables holding a few records), but return only specific entries. The WHERE clause is used to filter the result set, or in other words tell the database which records to retrieve. Here you define a logical expression that the database will evaluate for each row in the database.


Business requirement: return only bookings of Lufthansa (field CARRID must contain “LH” ) and cancelled bookings are not needed (CANCELLED must be equal to space).


The database will evaluate this condition for each record in the table and if the condition is true, it will be placed into the result set. You can write much more complex logical conditions as we will see in the following examples. Any number of logical expressions can be linked to a logical expression using keywords AND or OR and the result of a logical expression can be negated using keyword NOT. Keep the order of evaluation in mind (ensure the proper use of parentheses). Simple operators used to compare field values are EQ, NE, GT, LT, GE and LE (equivalent to =, <>, >, <, >=, <=).


Table joins


The second interesting addition in this example is the table join. Often data required for a specific business process is stored in several tables. Although it could be an option to select data from each table using separate SELECT commands and combine the results using ABAP code executed on the application server, many times it is more convenient and performant to use only one SELECT statement to read all the tables at once.


Business requirement: read customer data from table SCUSTOM for the customer of each booking.


This is achieved using a so called “table join”: a construct that instructs the database to access a further table based on a condition. This so called “join condition” (or “join expression”) represents the logical link between the two tables. In this example it is the customer number: for each booking in table SBOOK the database will search for the customer data in table SCUSTOM based on the customer number.


Because we use “*” to define the fields of the result set and we read from two tables in this example, the result set contains all the fields from both tables.  If there is a field with the same name in both tables, only one will be returned (the one from the last table in the FROM clause - each join overwrites the field contents).


Note: the tool I’ve used for the screenshots automatically generates a separate field in this case. This is the reason why you can see duplicate fields.


The syntax of a join condition is almost the same as in a WHERE clause, with some notable differences which I don’t want to copy and paste here from the SAP Help (subqueries cannot be used, logical expressions must be linked with AND etc.).


There are two kinds of table joins in OpenSQL: inner joins and outer joins. We will discuss the difference in the next example.


Hint: SELECT statements using table joins bypass SAP buffering.


2_inner_join_and_where_clause.jpg

Screenshot 2: adding a WHERE clause and a table join
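A possible way to write this query in a report is sketched below. Unlike the screenshot, it lists a handful of fields explicitly instead of using “*”, so that the target structure can be declared in a few lines. The report name, the type name and the choice of fields are assumptions of this sketch.

```abap
REPORT zopensql_example_02. " hypothetical report name

TYPES: BEGIN OF ty_result,
         carrid   TYPE sbook-carrid,
         bookid   TYPE sbook-bookid,
         customid TYPE sbook-customid,
         name     TYPE scustom-name,
         city     TYPE scustom-city,
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

SELECT sbook~carrid sbook~bookid sbook~customid
       scustom~name scustom~city
  FROM sbook INNER JOIN scustom
    ON scustom~id = sbook~customid   " join condition: customer number
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE sbook~carrid    = 'LH'       " Lufthansa only
    AND sbook~cancelled = space.     " cancelled bookings are not needed
```

The field order in the SELECT clause matches the component order of ty_result, which is why plain INTO TABLE is enough here.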


Example 3: Adding data from another two tables


Fortunately OpenSQL allows joining multiple tables at once: a maximum of 25 tables can be joined in a SELECT statement. In this example we add data from two more tables: T005T, which holds country data, and GEOT005S, which contains geographic information of regions.


Business requirement: Display the country name instead of country code and display the latitude and longitude of the region of the customer.


One special thing in this example is the join condition of table T005T. This table is language-dependent, so a language key is needed to get the textual description of a country in a specific language. Remember: the join condition tells the database how to get a record from table B based on a record from table A (A being the left hand side and B being the right hand side table). What is special here is that we use the logon language of the current user from field SY-LANGU. All fields of the SY structure can be used in join conditions (current date, time, system ID etc.) as well as in WHERE and HAVING clauses.


What would happen if we omitted the language key specification? The database would return multiple rows with the same booking, customer and region information. Why? Simply because there is more than one entry in table T005T for the same country. Let’s say we have two entries for country ‘DE’: it is ‘Deutschland’ in German and it is ‘Germany’ in English. In case of a German customer, the database engine would evaluate the join condition (T005T-LAND1 = SCUSTOM-COUNTRY) for both records, and both would be true, so two rows would be returned: one with English text and one with German text.


3_another_inner_join_geocode.jpg

Screenshot 3: multiple table joins. Notice that all lines contain a latitude and a longitude.
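The two additional joins could be sketched like this. The field names LAND1, REGIO, LATITUDE and LONGITUDE of table GEOT005S are written from memory and should be treated as assumptions, as should the report and type names; please check them in the Data Dictionary before using the code.

```abap
REPORT zopensql_example_03. " hypothetical report name

TYPES: BEGIN OF ty_result,
         bookid    TYPE sbook-bookid,
         name      TYPE scustom-name,
         landx     TYPE t005t-landx,         " country name
         region    TYPE scustom-region,
         latitude  TYPE geot005s-latitude,   " assumed field name
         longitude TYPE geot005s-longitude,  " assumed field name
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

SELECT sbook~bookid scustom~name t005t~landx scustom~region
       geot005s~latitude geot005s~longitude
  FROM sbook
  INNER JOIN scustom  ON scustom~id     = sbook~customid
  INNER JOIN t005t    ON t005t~land1    = scustom~country
                     AND t005t~spras    = sy-langu   " logon language
  INNER JOIN geot005s ON geot005s~land1 = scustom~country
                     AND geot005s~regio = scustom~region
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE sbook~carrid    = 'LH'
    AND sbook~cancelled = space.
```

Removing the SPRAS comparison from the join condition would reproduce the duplicate rows described above, one per language maintained in T005T.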


Example 4:  Another kind of table join - Left outer join


Here comes the difference between the two kinds of table joins in OpenSQL: the inner join and the outer join (for ex. “SELECT * FROM A INNER JOIN B” / “SELECT * FROM A LEFT OUTER JOIN B”).


Basically the difference is the behavior in case there is no corresponding entry in table B for a record in table A. In case of an inner join, there would be no record placed into the result set. In case of a left outer join, there would be a record in the result set, but all fields coming from table B would be empty.


This behavior is very easy to see in this example: we added geographic coordinates to our query in the previous example using an inner join. Table GEOT005S contains coordinates for country regions. Whenever a record was not found in table GEOT005S for the region of a customer, the whole line was dropped from the result set. This is why you can only see customers with non-empty latitude and longitude.


In the current example we add the latitude and longitude of the region of the customer using a left outer join. As you can see in the screenshot, there is only one customer for whom coordinates are returned. For all other records, the DB did not find a suitable record in GEOT005S, so the coordinates are empty. If GEOT005S were accessed using an INNER JOIN, these records would be excluded from the result set.


4_left_outer_join_error.jpg

Screenshot 4: table join using a left outer join. Notice that now customers with empty latitude and longitude appear on the list.
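To isolate the effect of the join type, the sketch below reads only customers and their region coordinates. It is a reduced version of the query in the screenshot, not a copy of it; the GEOT005S field names and all report, type and variable names are assumptions.

```abap
REPORT zopensql_example_04. " hypothetical report name

TYPES: BEGIN OF ty_result,
         id        TYPE scustom-id,
         name      TYPE scustom-name,
         country   TYPE scustom-country,
         region    TYPE scustom-region,
         latitude  TYPE geot005s-latitude,   " assumed field name
         longitude TYPE geot005s-longitude,  " assumed field name
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

" Customers without a matching GEOT005S entry stay in the result set;
" their latitude and longitude simply remain initial.
SELECT scustom~id scustom~name scustom~country scustom~region
       geot005s~latitude geot005s~longitude
  FROM scustom
  LEFT OUTER JOIN geot005s
    ON geot005s~land1 = scustom~country
   AND geot005s~regio = scustom~region
  INTO TABLE lt_result
  UP TO 100 ROWS.
```

Replacing LEFT OUTER JOIN with INNER JOIN in this sketch would drop every customer whose region has no coordinates.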


Note: if you have carefully checked the screenshot, you may have noticed that there is a customer with a region defined (‘IL’) but no coordinates displayed. The reason could be that there is no entry in GEOT005S for country ‘US’ region ‘IL’, but that is not the case. Somehow the standard flight demo database contains incorrect entries having an unnecessary space. This is the reason why the DB does not find the matching record, since ‘IL’ <> ‘ IL’.


5_error_reason.jpg

Screenshot 4 b: incorrect entries for region ‘IL’ in table GEOT005S


I corrected these entries using this simple update statement:


6_error_correction.jpg

After correcting the entries in table SCUSTOM, the query fills the coordinates for all customers in region ‘IL’:


7_left_outer_join_corrected.jpg

Screenshot 4 c:  table join using a left outer join after correcting the problematic database entries. Notice that now every customer which has a region defined has the coordinates filled.


Example 5: adding a simple group function


Many times the business is not interested in all individual data records, but wants to see an aggregation based on a specific field. For example, the sum of all sales per salesperson, costs per project etc. In this case we have two options: either retrieve all relevant records from the database and perform the calculations on the application server using our own ABAP code, or perform the calculation using the database engine.


Business requirement: we want to see the number of bookings that match our selection criteria. (bookings of Lufthansa that are not cancelled).


In order to achieve this using the database, we have to use the aggregate function (also called group function) COUNT in the SELECT clause. A GROUP BY clause can be added to break the count down by one or more fields.


The GROUP BY clause combines groups of rows in the result set into a single row. In this very simple case, we want to see the total number of bookings, not broken down by any other field.


The COUNT( * ) function determines the number of rows in the result set or in the current group. Right now we don’t have any groups defined so in our case it returns the total number of bookings.


8_simple_count_function.jpg

Screenshot 5: a simple group function counting the number of bookings that match the selection criteria.


Note: The COUNT function can also be used to determine the number of different values of a specific field in the result set. In this case, simply put the field name in the parentheses and add the DISTINCT keyword. For example, to count how many countries the customers are from, use “COUNT( DISTINCT country )” in the SELECT clause.
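Both flavours of COUNT can be sketched as follows. The report and variable names are assumptions; the selection criteria are the ones from the business requirement.

```abap
REPORT zopensql_example_05. " hypothetical report name

DATA: lv_bookings  TYPE i,
      lv_countries TYPE i.

" Number of non-cancelled Lufthansa bookings
SELECT COUNT( * ) FROM sbook
  INTO lv_bookings
  WHERE carrid    = 'LH'
    AND cancelled = space.

" Number of different countries in the customer table
SELECT COUNT( DISTINCT country ) FROM scustom
  INTO lv_countries.

WRITE: / 'Bookings: ', lv_bookings,
       / 'Countries:', lv_countries.
```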


It is important to note that the group functions are always performed after the evaluation of the WHERE clause. The database engine first reads the records matching the WHERE clause, then forms groups (see next example) and then performs the group function.


Using the GROUP BY clause and aggregate functions ensures that aggregates and groups are assembled by the database system, not the application server. This can considerably reduce the volume of data that has to be transferred from the database to the application server. Of course on the other side, this needs more resources from the database.


Hint: a SELECT statement that uses GROUP BY bypasses SAP buffering.


Example 6: defining groups for the group functions


We’ve learned that group functions perform certain calculations on groups of database records. If we do not explicitly specify a group, then all the records in the result set are considered as one big group, as in the previous example.


Business requirement: display the number of (non-cancelled Lufthansa) bookings per customer.


In this case we form groups of database records based on the customer number by listing field ID in the group by clause. Also, we have to enter the ID field of table SCUSTOM in the SELECT clause before the group function.


9_group_by_clause_1.jpg

Screenshot 6: Grouping rows by customer ID. The COUNT group function is performed on every group to count the number of bookings for each customer.
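A sketch of the grouped query is shown below. The alias “bookings” for the COUNT result as well as the report and type names are assumptions of this illustration.

```abap
REPORT zopensql_example_06. " hypothetical report name

TYPES: BEGIN OF ty_result,
         id       TYPE scustom-id,
         bookings TYPE i,
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

" One result row per customer, with the number of matching bookings
SELECT scustom~id COUNT( * ) AS bookings
  FROM sbook INNER JOIN scustom
    ON scustom~id = sbook~customid
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE sbook~carrid    = 'LH'
    AND sbook~cancelled = space
  GROUP BY scustom~id.
```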


As you can see, now the database engine returns as many records as there are customer IDs in the result set, with the number of relevant bookings next to each. Exactly what we wanted.


Note: in all of our previous examples, we’ve used the “*” sign in the SELECT clause (field list). However, here we have to explicitly define the fields needed to create groups of records. It is mandatory to add the table name and the ~ sign before the name of a database field, if more than one table has a field with the same name.


Example 7: Adding extra information


Business requirement: display additional fields in the result list (name, city, country, region code, coordinates).


How to do it? Simply add them to the SELECT clause after the customer number before the COUNT function. Keep in mind to add the field to the GROUP BY clause too, otherwise you will encounter a syntax error. The fields you use to form groups must be in the SELECT clause, and nothing else should be in the SELECT clause that is not in the GROUP BY clause and is not a group function.


12_group_by_clause_4.jpg

Screenshot 6: additional fields are displayed in the list.


Note: You may wonder why the number of bookings did not change, since in the previous example I told you that adding fields before the group function is how we define groups. The reason is that we have added fields from tables that have a 1:1 relation to the customer. A customer has only one name, a city is in one country and region and has one pair of coordinates. If we had chosen to add the Flight Class field (First class / Business / Economy), then the result set could contain more than one line per customer: one line for each flight class the customer has booked. We will see how this works in example 15.


Example 8: Defining the order of records in the result set


You can use the ORDER BY clause in a query to define the sort order of the records returned. Simply list all the fields that you want to use as sort criteria. You can use keywords ASCENDING and DESCENDING for each field to specify the sort mode (ASCENDING is the default so it can be omitted).


Business requirement: display the customers with the most bookings.


What do we do now? We sort the list by the result of the COUNT function in descending order and then by the name of the customer.

 

13_order_by_clause.jpg

Screenshot 7: the result set is sorted using the ORDER BY clause.
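In classic OpenSQL an aggregate cannot be named directly in the ORDER BY clause, so this sketch gives the COUNT result an alternative column name with AS and sorts by that name. The alias “bookings” and the report and type names are assumptions.

```abap
REPORT zopensql_example_08. " hypothetical report name

TYPES: BEGIN OF ty_result,
         id       TYPE scustom-id,
         name     TYPE scustom-name,
         bookings TYPE i,
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

SELECT scustom~id scustom~name COUNT( * ) AS bookings
  FROM sbook INNER JOIN scustom
    ON scustom~id = sbook~customid
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE sbook~carrid    = 'LH'
    AND sbook~cancelled = space
  GROUP BY scustom~id scustom~name
  ORDER BY bookings DESCENDING   " most bookings first
           scustom~name.         " then by name, ascending by default
```

Because the ORDER BY clause is present, UP TO 100 ROWS now returns the first 100 rows of the sorted result instead of 100 arbitrary ones.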


As you can see, there are several customers who have more than ten bookings (non-cancelled, Lufthansa).


Note: If all key fields are in the field list and a single database table is specified after FROM (not a view or join expression), the addition PRIMARY KEY can be used to sort the result set in ascending order based on the primary key of the table.


Example 9: Filtering based on a group function


Business requirement: the boss is only interested in customers having more than ten non-cancelled Lufthansa bookings.


How do we do this? I guess the first idea would be to add a new condition to the WHERE clause to filter records where the COUNT function is higher than ten. However, this will not work because of the way OpenSQL (and SQL in general) evaluates a query.


The reason is that the WHERE clause filters the database records before the groups are created by the database engine. After the groups are created, the group functions are calculated and the result set is created. The WHERE clause cannot be used to filter based on the group function results.


In cases like these, the HAVING clause must be used. This is similar to the WHERE clause, but the difference is that it is evaluated after the groups are created and group functions are performed. Simply put: to filter based on the result of group functions, the HAVING clause (also called group condition) must be used.



14_having_clause.jpg

Screenshot 8: using the HAVING clause to filter the result set based on group functions.
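The following reduced sketch shows where the HAVING clause goes. It reads from SBOOK only and leaves out the joins of the full query to keep the focus on the group condition; the report, type and alias names are assumptions.

```abap
REPORT zopensql_example_09. " hypothetical report name

TYPES: BEGIN OF ty_result,
         customid TYPE sbook-customid,
         bookings TYPE i,
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

SELECT customid COUNT( * ) AS bookings
  FROM sbook
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE carrid    = 'LH'        " evaluated before the groups are built
    AND cancelled = space
  GROUP BY customid
  HAVING COUNT( * ) > 10.       " evaluated after the groups are built
```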


As you can see on the screenshot, now only 23 records are returned, although we have allowed up to 100 records by using the UP TO N ROWS addition. So this means that there are 23 customers having more than ten non-cancelled Lufthansa bookings.


Note: If you don’t specify any groups using the GROUP BY clause, the HAVING clause will consider the whole result set grouped into one line. For a quick example assume we have 10 records in table SCARR. The query “SELECT COUNT( * ) FROM scarr HAVING count( * ) GT 0” will return one single line with the number of records in the table, but the query “SELECT COUNT( * ) FROM scarr HAVING count( * ) GT 10” will not return any lines (empty result set).


Example 10: using subqueries


The possibility to combine multiple SELECT statements into one is a very handy feature in OpenSQL. This is mostly used when you don’t know the exact filter criteria at design time or when you have to use a comparison to a dynamically calculated value.


Business requirement: the list must include only customers who have never cancelled a flight (any airline, not only Lufthansa). At first glance, you could reasonably think that this is already done, since we have “cancelled EQ space” in our WHERE clause. This is not correct, because this only influences our group function so that only non-cancelled bookings are counted. This means that if a customer has 20 bookings with one cancelled, he/she will be on our list with 19 bookings. According to our requirement, we don’t want this customer to be on our list, so how do we achieve that?


An easy way to solve this is to add a so called subquery to our WHERE clause which checks if there is a cancelled booking for the customer.


Subqueries in general


Basically a subquery is a SELECT statement in parentheses used in a logical expression. There is no need to have an INTO clause because the result of the subquery will be processed by the database engine.


How are the results of subqueries evaluated? This depends on whether the subquery is correlated or not. If a subquery uses fields from the surrounding SELECT statement in its WHERE condition, it is called a correlated subquery. In this case, the result of the subquery will be evaluated for each line in the result set of the surrounding SELECT statement (the one whose WHERE clause contains the subquery). This implies that the subquery is executed for each record in the result set of the surrounding SELECT statement. On the other hand, a subquery without any reference to the surrounding SELECT statement is executed only once.


There are different operators that can be used with subqueries. We will use the “EXISTS” operator (negated) in this example. I’ll discuss the others in Example 13.


Subqueries can be nested which means that you can put a subquery in the WHERE clause of a subquery. We will see an example of this later in Example 13.


Now take the current example: we need to check if a customer has a cancelled booking, so we have to create a relation between the customer ID in the result set of our outer SELECT statement and the subquery (so we use a correlated subquery). This is done by adding a condition to the WHERE clause of the subquery to match the customer IDs.


15_subquery.jpg
Screenshot 9: using a subquery. As you can see, now only 20 records are returned, so three customers had a cancelled flight on our previous list.


Table aliases


Notice that here we must use so-called “table aliases” because we select from the same table in both the surrounding query and the subquery. This means that we must somehow explicitly define the table name for the fields to be compared, otherwise the DB engine would not know which table field we refer to. This is done with the use of table aliases. Basically you can give a name to your tables and refer to them using the alias. Here I’ve defined “sq” as an alias for the table in the subquery. You have to use the ~ character between the table alias and the field name (and of course between the table name and field name as in the previous examples).
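A reduced sketch of the correlated subquery with the alias “sq” is shown below. It reads from SBOOK only and omits the joins of the full query; the report, type and result alias names are assumptions, while the value 'X' for a cancelled booking is taken from the simplified example further down.

```abap
REPORT zopensql_example_10. " hypothetical report name

TYPES: BEGIN OF ty_result,
         customid TYPE sbook-customid,
         bookings TYPE i,
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

SELECT customid COUNT( * ) AS bookings
  FROM sbook
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE carrid    = 'LH'
    AND cancelled = space
    " correlated subquery: sq is the inner SBOOK, sbook the outer one
    AND NOT EXISTS ( SELECT bookid FROM sbook AS sq
                       WHERE sq~customid  = sbook~customid
                         AND sq~cancelled = 'X' )
  GROUP BY customid
  HAVING COUNT( * ) > 10.
```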


Notes:

  • Subqueries cannot be used when accessing pool tables or cluster tables.
  • the ORDER BY clause cannot be used in a subquery.
  • If a subquery is used, the Open SQL statement bypasses SAP buffering.


Note: subqueries can be used in the HAVING clause too as seen in example 14.


Hint: we could have solved the requirement without a correlated subquery. In this case, the subquery would select all the customers who had a cancelled booking, and the surrounding SELECT statement would check every customer if it is in the result set of the subquery:


Simplified example:


SELECT ... WHERE customid NOT IN ( SELECT customid FROM sbook WHERE cancelled EQ 'X' )
is equivalent to
SELECT ... WHERE NOT EXISTS ( SELECT bookid FROM sbook AS sq WHERE sq~customid EQ sbook~customid AND sq~cancelled EQ 'X' )

Example 11: Special operators


In all the previous examples we have only used the ‘EQ’ (same as ‘=’) and the ‘GT’ (same as ‘>’) operators to compare field values. However, there are more complex ones too.


BETWEEN


This operator is very simple: it is basically “>=” and “<=” together, so both boundary values are included in the range.


Business requirement: exclude only those customers from our list who have cancelled bookings in the first quarter of 2005.


The syntax for this is “field BETWEEN value1 AND value2”.


Hint: from the business perspective we expect more customers in the result set, since we exclude fewer customers due to a more restrictive subquery.


16_between_operator.jpg

Screenshot 10: using the BETWEEN operator. As you can see, there are 21 customers on our list, so there is one customer who appears again (had cancellation before or after Q1 of 2005).


LIKE


This expression is a very flexible tool for character string comparisons based on a pattern. The pattern can be defined using wildcard characters: "%" represents any character string (even an empty one) and "_" represents any single character. The LIKE expression is case sensitive, and blank characters at the end of the pattern are ignored (LIKE ‘__’ is equal to LIKE ‘__   ‘).


Business requirement (quite strange…): restrict our list further to only contain customers with a name starting with “A”.


We add “name LIKE ‘A%’” to the WHERE clause to achieve this.


17_like_operator.jpg

Screenshot 11: using the LIKE operator to filter the list based on the name of the customer.


Note: What to do if we want to search for records that contain ‘_’ in a specific field? Since this is a reserved symbol, we have to use the addition ESCAPE, which allows an escape character to be defined. This escape character cancels the special function of a wildcard character (simply place the escape character before the wildcard character to be cancelled).


A quick example: select all material numbers which contain an underscore:


Wrong approach (returns every material number):
SELECT matnr FROM mara WHERE matnr LIKE '%_%'


Good approach:
SELECT matnr FROM mara WHERE matnr LIKE '%~_%' ESCAPE '~'


Hint: It is not possible to specify a table field as a pattern.


Business requirement (even more strange): now we want to only see customers with a name starting with ‘A’ and having ‘d’ as the third letter.


18_like_operator_2.jpg

Screenshot 12: using the LIKE operator with both special characters “%” and “_”. As you can see, we still have three customers who match all our selection criteria.
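Reduced to the customer table, the pattern for this requirement can be sketched like this: “A” as the first letter, “_” for any single second character, “d” as the third letter and “%” for the rest of the name. The report, type and variable names are assumptions.

```abap
REPORT zopensql_example_11_like. " hypothetical report name

TYPES: BEGIN OF ty_customer,
         id   TYPE scustom-id,
         name TYPE scustom-name,
       END OF ty_customer.

DATA lt_customer TYPE STANDARD TABLE OF ty_customer.

" LIKE is case sensitive: 'A_d%' does not match names starting with 'a'
SELECT id name FROM scustom
  INTO TABLE lt_customer
  UP TO 100 ROWS
  WHERE name LIKE 'A_d%'.
```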


IN


This operator allows you to compare a field to a set of fixed values. This comes handy as it can replace a longer and much less readable expression:


“field IN (‘01’, ‘03’, ‘05’)” is equal to the much longer “field EQ ‘01’ OR field EQ ‘03’ OR field EQ ‘05’”


Business requirement: extend our selection to customers of American Airlines and United Airlines too.
19_in_operator.jpg

Screenshot 13: using the IN operator to count bookings of other airlines too. As expected, we now have many more customers who match the less restrictive selection criteria.
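A reduced sketch of the IN operator with a fixed value list follows. It assumes that the carrier IDs of American Airlines and United Airlines in table SCARR are ‘AA’ and ‘UA’; the report, type and alias names are assumptions as well.

```abap
REPORT zopensql_example_11_in. " hypothetical report name

TYPES: BEGIN OF ty_result,
         customid TYPE sbook-customid,
         bookings TYPE i,
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

" Same as: carrid = 'LH' OR carrid = 'AA' OR carrid = 'UA'
SELECT customid COUNT( * ) AS bookings
  FROM sbook
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE carrid IN ('LH', 'AA', 'UA')
    AND cancelled = space
  GROUP BY customid.
```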


Note: the IN operator can be used with a selection table too, as seen in chapter 17.


Example 12: Other group functions


So far we have only used the COUNT group function to count the number of bookings that match our selection criteria. There are a total of five group functions available in OpenSQL. The remaining four that we haven’t seen yet are all mathematical calculations: SUM, MIN, MAX and AVG that calculate the total, minimum, maximum and average of a field respectively.


There are some restrictions related to the use of group functions:

  • If the addition FOR ALL ENTRIES is used in front of WHERE, or if cluster or pool tables are listed after FROM, no other aggregate expressions apart from COUNT( * ) can be used.
  • Columns of the type STRING or RAWSTRING cannot be used with aggregate functions.
  • Null values are not included in the calculation for the aggregate functions. The result is a null value only if all the rows in the column in question contain the null value.


Business requirement: the boss wants to see the total, average, minimum and maximum of the booking price for each customer (in the currency of the airline).


20_other_group_functions.jpg

Screenshot 14: using all group functions (MIN, MAX, SUM, AVG, COUNT).
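The five group functions could be combined as in the sketch below. It assumes that LOCCURAM and LOCCURKEY are the SBOOK fields holding the price and currency key in the currency of the airline; the report, type and alias names are made up for this illustration.

```abap
REPORT zopensql_example_12. " hypothetical report name

TYPES: BEGIN OF ty_result,
         customid  TYPE sbook-customid,
         loccurkey TYPE sbook-loccurkey,
         total     TYPE sbook-loccuram,  " SUM keeps the field type
         average   TYPE f,               " AVG is returned as FLTP
         minimum   TYPE sbook-loccuram,
         maximum   TYPE sbook-loccuram,
         bookings  TYPE i,               " COUNT is returned as INT4
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

SELECT customid loccurkey
       SUM( loccuram ) AS total
       AVG( loccuram ) AS average
       MIN( loccuram ) AS minimum
       MAX( loccuram ) AS maximum
       COUNT( * )      AS bookings
  FROM sbook
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE carrid    = 'LH'
    AND cancelled = space
  GROUP BY customid loccurkey.
```

The currency key is part of the GROUP BY clause so that amounts in different currencies are never added up in one group.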


Note: just like with the COUNT function, the DISTINCT keyword can be used to perform the group function only on distinct values (so the result of SUM( DISTINCT A ) for two records having value 10 in a field A would be 10).


Note: the field used with AVG and SUM must have a numeric data type. The data type of the result of MIN, MAX and SUM is the data type of the corresponding table field in the ABAP Dictionary. Aggregate expressions with the function AVG have the data type FLTP, and those with COUNT have the data type INT4.


Hint: the tool I’ve used for demonstration replaces the field types of these aggregate functions for a more user friendly display (instead of the exponential notation).



Example 13: Nesting subqueries


Subqueries can be nested, which means that you can put a subquery in the WHERE clause of a subquery. A maximum of ten SELECT statements is allowed within one OpenSQL query (a SELECT statement may have at most nine subqueries).


Business requirement: exclude only customers who have cancelled bookings in Q1 of 2005 where the language of the travel agency at which the cancellation was made is English. Pretty awkward, but I had to figure out something.


We can implement this logic using a nested subquery: we add this criterion to the WHERE clause of the outer subquery (select all agency numbers where the language is English). This is different from our previous subquery because of the keyword we use for evaluating it.


Logical expressions for subqueries


- EXISTS: this is what we have used in our first subquery. This returns TRUE if the subquery returns any records (one or more) in its result set, otherwise this returns FALSE.


- EQ, GT, GE, LT, LE: these operators can be used to compare a field with the result of the subquery. If the subquery returns more than one row, obviously the database engine will not know which one to use for comparison: a non-catchable exception will occur.


- In order to use subqueries that return multiple rows, you either have to use the IN operator (checks if the field is equal to any of the values returned by the subquery) or one of the ALL, ANY, and SOME keywords together with EQ, GT, GE, LT or LE. These will influence the comparison in a pretty self-explaining way: the comparison will be carried out with all the records returned by the subquery, and the comparison will return TRUE if all (ALL) or at least one (ANY, SOME) records return TRUE for the comparison. There is no difference between keywords SOME and ANY.


What result do we expect? Since our outer subquery is used to filter out customers with cancelled bookings, a more restrictive subquery (achieved by the nested subquery) means more customers on our list. Practically: the fewer agencies we include in our search for cancellations, the more customers we get on the list.


21_nesting_subqueries.jpg

Screenshot 15: nesting subqueries. As you can see, we actually have one more customer on our list, who cancelled his booking at an agency where the language is not English.
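The nesting itself can be sketched on a reduced query that lists customers directly from SCUSTOM. Several details are assumptions of this sketch: ORDER_DATE is used as the date of the booking, ‘E’ is used as the language key for English in STRAVELAG-LANGU, and the report, type and variable names are made up.

```abap
REPORT zopensql_example_13. " hypothetical report name

TYPES: BEGIN OF ty_customer,
         id   TYPE scustom-id,
         name TYPE scustom-name,
       END OF ty_customer.

DATA lt_customer TYPE STANDARD TABLE OF ty_customer.

SELECT id name FROM scustom
  INTO TABLE lt_customer
  UP TO 100 ROWS
  " outer subquery: correlated, evaluated with EXISTS
  WHERE NOT EXISTS ( SELECT bookid FROM sbook
                       WHERE customid   = scustom~id
                         AND cancelled  = 'X'
                         AND order_date BETWEEN '20050101' AND '20050331'
                         " nested subquery: not correlated, evaluated with IN
                         AND agencynum IN ( SELECT agencynum FROM stravelag
                                              WHERE langu = 'E' ) ).
```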


Example 14: HAVING and GROUP BY in a subquery


Business requirement: only exclude customers who have at least three cancellations (Lufthansa flight in Q1 of 2005 at an English speaking agency).


Since we have to count the number of these bookings, we have to use group function COUNT and group the bookings by the customer ID. This way we get the number of matching bookings per customer. Then we simply add the HAVING clause to make sure we only exclude customers having more than two cancellations from our main query.


We can expect to have more customers in our result set, since we have a more restrictive subquery that we use to filter out customers.


 

22_having_gb_in_subquery.jpg

Screenshot 16: using the GROUP BY and HAVING clauses in a subquery. As you can see, we have two more customers on our list (who have one or two matching cancelled bookings).
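Continuing with a reduced query on SCUSTOM, the subquery below groups the matching cancellations per customer and only produces a row when there are more than two of them. As before, the use of ORDER_DATE, the language key ‘E’ and all report, type and variable names are assumptions of this sketch.

```abap
REPORT zopensql_example_14. " hypothetical report name

TYPES: BEGIN OF ty_customer,
         id   TYPE scustom-id,
         name TYPE scustom-name,
       END OF ty_customer.

DATA lt_customer TYPE STANDARD TABLE OF ty_customer.

SELECT id name FROM scustom
  INTO TABLE lt_customer
  UP TO 100 ROWS
  WHERE NOT EXISTS ( SELECT customid FROM sbook
                       WHERE customid   = scustom~id
                         AND carrid     = 'LH'
                         AND cancelled  = 'X'
                         AND order_date BETWEEN '20050101' AND '20050331'
                         AND agencynum IN ( SELECT agencynum FROM stravelag
                                              WHERE langu = 'E' )
                       GROUP BY customid
                       HAVING COUNT( * ) > 2 ).  " at least three cancellations
```

A customer with one or two matching cancellations produces no row in the subquery, so NOT EXISTS is true and the customer stays on the list.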


Example 15: Let’s extend the GROUP BY clause


Business requirement: include only customers who have more than 10 bookings for the same airline. It doesn’t matter which, but it should be more than ten.


So far we have counted all the bookings of the customers who satisfy our criteria, regardless of the airline (for example, more than 10 bookings in total). This could be someone with 5 bookings for Lufthansa, 5 for American Airlines and 2 for United Airlines (the total being higher than 10). Now we want to see something like 11 for Lufthansa.


It is very simple to solve this by adding the airline code (CARRID) to the field list of our main query and to its GROUP BY clause. Remember, the database engine creates groups of records based on the fields listed in the GROUP BY clause, and every field of the field list (SELECT clause) that is not wrapped in a group function must also appear there. If we add the airline code, groups will be made per airline for each customer and the COUNT function will count the number of bookings per customer and airline.


What changes do we expect in our result set? There should be far fewer customers on our list, because they must have more than ten bookings for the same airline.


23_add_carrid_to_group_by.jpg

Screenshot 17: adding the carrier ID to the GROUP BY clause (and the SELECT clause as well).


The result shows exactly this: we only have three (very loyal) customers who match our selection criteria. Notice that the highest number of bookings is now only 12, while in the previous example it was 19.
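Stripped of the other conditions built up in the earlier examples, the grouping part of such a query might look like the following sketch. The structure ty_result and the report name are assumptions made for illustration only.

```abap
REPORT z_demo_group_by_carrid.

TYPES: BEGIN OF ty_result,
         id       TYPE scustom-id,
         name     TYPE scustom-name,
         carrid   TYPE sbook-carrid,
         bookings TYPE i,
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

" One group per customer and airline. CARRID appears both in the
" field list and in the GROUP BY clause.
SELECT c~id c~name b~carrid COUNT( * )
  FROM scustom AS c
  INNER JOIN sbook AS b ON b~customid = c~id
  INTO TABLE lt_result
  GROUP BY c~id c~name b~carrid
  HAVING COUNT( * ) > 10.
```

Without b~carrid in both places, the count would again be the total per customer across all airlines.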


Example 16: Going back to the LEFT OUTER JOIN


In order to have some coordinates displayed in the result list, we make two changes:


- change the WHERE condition of the main query: instead of checking the name of the customer and the airline code, we select customers from the US. This way we will have many more customers on our list (100, the limit set by the UP TO n ROWS addition), and since they are from the US, we will see some region codes for the customers (coordinates are maintained for the US regions).


- remove the HAVING clause to include customers with fewer than 11 matching bookings.


24_remove_having_clause.jpg

Screenshot 18: first change: removing the check to have more than ten bookings.


25_change_where_clause.jpg

Screenshot 19: second change: select customers from the US.


Now you can see the behaviour of the LEFT OUTER JOIN again: coordinates are filled only for those records where the region code is filled and coordinates are found for the region in table GEOT005S. For all other records the coordinate fields stay empty.
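The join itself could be sketched as follows. Please treat the GEOT005S field names used here (LAND1, REGIO, LONGITUDE, LATITUDE) as my assumption and check them in SE11 before using the code. The booking count from the earlier examples is left out to keep the sketch short, and the report name is hypothetical.

```abap
REPORT z_demo_left_outer_join.

TYPES: BEGIN OF ty_result,
         id        TYPE scustom-id,
         name      TYPE scustom-name,
         region    TYPE scustom-region,
         longitude TYPE geot005s-longitude, " assumed field name
         latitude  TYPE geot005s-latitude,  " assumed field name
       END OF ty_result.

DATA lt_result TYPE STANDARD TABLE OF ty_result.

" Every US customer is returned. The coordinates stay initial when no
" matching region entry exists in GEOT005S.
SELECT c~id c~name c~region g~longitude g~latitude
  FROM scustom AS c
  LEFT OUTER JOIN geot005s AS g
    ON  g~land1 = c~country
    AND g~regio = c~region
  INTO TABLE lt_result
  UP TO 100 ROWS
  WHERE c~country = 'US'.
```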


Example 17: Using a selection table in the WHERE clause


Selection tables are used to define complex selections on a field. They are mostly used together with selection screens (using the statement SELECT-OPTIONS). Any selection the user makes on the user interface will be converted to a complex logical expression using the operators that we have worked with in this tutorial (EQ, GT, LE, GE, LT, NE, BETWEEN, NOT etc.). This conversion is made by the OpenSQL engine automatically.


In order to compare the values of a table field with a selection table, you have to use the “IN” operator.


Business requirement: only count bookings of Business Class and Economy Class (‘C’ and ‘Y’) in a complex time range.


26_use_range_tables.jpg

Screenshot 20: performing complex selections using selection tables. Notice the “IN” keyword used as a comparison operator.


Note: the tool used for this demonstration offers a UI to define the selection tables as on selection screens. Also, the generated WHERE clause is visible on the right side, next to the selection tables R_SBOOK_FLDATE and R_SBOOK_CLASS.
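Outside of the demo tool, the same kind of selection can be built by hand with ranges tables. In the sketch below the table names R_SBOOK_FLDATE and R_SBOOK_CLASS follow the screenshot, while the concrete date ranges are invented for illustration (the ranges used in the screenshot may differ).

```abap
REPORT z_demo_selection_tables.

DATA: r_sbook_fldate TYPE RANGE OF sbook-fldate,
      ls_fldate      LIKE LINE OF r_sbook_fldate,
      r_sbook_class  TYPE RANGE OF sbook-class,
      ls_class       LIKE LINE OF r_sbook_class,
      lv_count       TYPE i.

" Dates: include Q1 of 2005 ...
ls_fldate-sign   = 'I'.
ls_fldate-option = 'BT'.
ls_fldate-low    = '20050101'.
ls_fldate-high   = '20050331'.
APPEND ls_fldate TO r_sbook_fldate.

" ... but exclude February 2005.
ls_fldate-sign   = 'E'.
ls_fldate-option = 'BT'.
ls_fldate-low    = '20050201'.
ls_fldate-high   = '20050228'.
APPEND ls_fldate TO r_sbook_fldate.

" Classes: Business ('C') and Economy ('Y').
ls_class-sign   = 'I'.
ls_class-option = 'EQ'.
ls_class-low    = 'C'.
APPEND ls_class TO r_sbook_class.
ls_class-low    = 'Y'.
APPEND ls_class TO r_sbook_class.

SELECT COUNT( * ) FROM sbook
  INTO lv_count
  WHERE fldate IN r_sbook_fldate
    AND class  IN r_sbook_class.

WRITE: / 'Matching bookings:', lv_count.
```

Keep in mind that an empty selection table does not restrict the selection at all: the IN condition is then true for every record.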


Example 18: The “FOR ALL ENTRIES IN” construct


This construct is widely used in ABAP and is similar to a table join in that it is used to read data from a table for records we already have (typically selected from another table or returned by a Function Module). However, there are big differences between the two constructs.


The “FOR ALL ENTRIES IN internal table” construct allows you to use an internal table as the basis of your selection, but not like a selection table from Example 17. If you use this addition, you can (actually, must) refer to the fields of the internal table in the WHERE clause to compare them with the fields of the database table(s) being read. Naturally, the fields used in the comparison must have compatible data types.


As this construct is ABAP-specific, there is a mechanism that translates the OpenSQL command to one or more native SQL statements. The WHERE clause(s) passed to the database engine are generated from the contents of the internal table and the WHERE clause you define. There are several profile parameters that influence this conversion, which you can check in SAP Note 48230 - Parameterization for SELECT ... FOR ALL ENTRIES statement.


The main difference between table joins and this construct is that table joins are carried out by the database server and all data is passed to the application server at once. With the FOR ALL ENTRIES IN construct, on the other hand, the entire WHERE clause is logically evaluated for each individual row of the internal table. The result set of the SELECT statement is the union of the result sets produced by the individual evaluations. It is very important to note that duplicate records are automatically removed from the result set (but on the application server and not on the database server).


Syntactically, the difference is that in the WHERE clause you have to use the “-” sign instead of the “~” sign between the internal table name and the field name of the internal table.


Very important note: If the referenced internal table is empty, the entire WHERE clause is ignored and all lines from the database are placed in the result set. Always check that the internal table is not empty before executing a select query using this construct.


Business requirement: read airline information for all airlines that appear on our list.


How to implement this? We already have a SELECT statement from the previous example, so create a second SELECT statement using the FOR ALL ENTRIES IN construct. Simply add the carrier ID as a link into the WHERE clause (similar to the join condition in case of table joins) and that’s it.


27_for_all_entries.jpg

Screenshot 21: using the “FOR ALL ENTRIES IN internal table” construct.


Note: the tool I’ve used for demonstration uses “outer_table” as the name of the internal table. Its contents come from the select query of Example 17.
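In a hand-written program the same pattern could look like the sketch below. The name outer_table is kept from the screenshot. The first SELECT is only a simplified stand-in for the query of Example 17, and the structure ty_booking and the report name are assumptions for illustration.

```abap
REPORT z_demo_for_all_entries.

TYPES: BEGIN OF ty_booking,
         customid TYPE sbook-customid,
         carrid   TYPE sbook-carrid,
       END OF ty_booking.

DATA: outer_table TYPE STANDARD TABLE OF ty_booking,
      lt_carriers TYPE STANDARD TABLE OF scarr.

" Simplified stand-in for the main query of Example 17.
SELECT customid carrid FROM sbook
  INTO TABLE outer_table
  UP TO 100 ROWS
  WHERE class = 'C'.

" Guard against an empty driver table: without this check the WHERE
" clause would be ignored and all airlines would be read.
IF outer_table IS NOT INITIAL.
  SELECT * FROM scarr
    INTO TABLE lt_carriers
    FOR ALL ENTRIES IN outer_table
    WHERE carrid = outer_table-carrid.
ENDIF.
```

Note the “-” sign in outer_table-carrid. Even if the same airline occurs many times in outer_table, it appears only once in lt_carriers because duplicates are removed from the result set.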


Note: As of release 6.10, the same internal table can be specified after FOR ALL ENTRIES and after INTO. However, be careful because in this case all fields of the internal table that are not filled by the SELECT query will be cleared.


Note: performance-wise, there are endless discussions on SCN about whether a table join or the FOR ALL ENTRIES IN construct is better. It really depends on the buffering settings of the tables you select from, the fields you use for joins and selections, the indexes that are available, the number of records in both tables, profile parameters of the SAP system etc. In general I prefer joins, since a join is a “tool” designed especially for reading data from multiple tables at the same time, and it is executed on the database layer. Of course, certain situations speak against a table join. Also, you have no choice if you use a BAPI/Function Module/Class method to get records from the database: in that case you obviously cannot use a table join and have to use the FOR ALL ENTRIES IN construct.


Other keywords


SINGLE and FOR UPDATE


If you use the SINGLE addition, the result set will contain at most one record. If the remaining clauses of the SELECT statement would return more than one line, only one of them is returned (the first one found, and you cannot rely on which one that is).


The FOR UPDATE addition can be used only together with the SINGLE addition. It sets an exclusive lock on the selected record. However, it is rarely used, and I prefer to use lock function modules separately.


Note: The addition SINGLE is not permitted in a subquery and the ORDER BY clause cannot be used together with it.
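A minimal sketch of both additions on the airline table SCARR is shown below (the report name is hypothetical). Specifying the full primary key in the WHERE clause is what makes the result unambiguous.

```abap
REPORT z_demo_select_single.

DATA ls_carrier TYPE scarr.

" Read exactly one airline by its full primary key.
SELECT SINGLE * FROM scarr
  INTO ls_carrier
  WHERE carrid = 'LH'.

IF sy-subrc = 0.
  WRITE: / ls_carrier-carrid, ls_carrier-carrname.
ELSE.
  WRITE: / 'Airline LH not found'.
ENDIF.

" The same read with a database lock on the selected record.
SELECT SINGLE FOR UPDATE * FROM scarr
  INTO ls_carrier
  WHERE carrid = 'LH'.
```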


CLIENT SPECIFIED


This addition switches off the automatic client handling of Open SQL. When using the addition CLIENT SPECIFIED, the first column of the client-dependent database tables can be specified in the WHERE and ORDER BY clauses.
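As a small sketch, reading the airlines of another client could look like this. Client '000' is only an example value and the report name is hypothetical.

```abap
REPORT z_demo_client_specified.

DATA lt_carriers TYPE STANDARD TABLE OF scarr.

" Automatic client handling is switched off, so the client column
" MANDT has to be restricted explicitly in the WHERE clause.
SELECT * FROM scarr CLIENT SPECIFIED
  INTO TABLE lt_carriers
  WHERE mandt = '000'.
```

Without the condition on MANDT, the statement would return the records of all clients.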


BYPASSING BUFFER


This addition causes the SELECT statement to bypass SAP buffering and read directly from the database instead of the table buffer on the application server.
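The addition is written directly after the table name, as in this short sketch (hypothetical report name):

```abap
REPORT z_demo_bypassing_buffer.

DATA ls_carrier TYPE scarr.

" Always read the current state from the database, even if the
" table is buffered on the application server.
SELECT SINGLE * FROM scarr BYPASSING BUFFER
  INTO ls_carrier
  WHERE carrid = 'LH'.
```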


ENDSELECT


A SELECT statement may retrieve database records one by one (functioning as a loop when INTO is used with a work area), or all at once (this is called “array fetch” and is used with INTO TABLE or APPENDING TABLE). In the first case the ENDSELECT statement closes the loop started with SELECT. Both constructs retrieve the same result.
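The two variants side by side, as a minimal sketch with a hypothetical report name:

```abap
REPORT z_demo_endselect.

DATA: ls_carrier  TYPE scarr,
      lt_carriers TYPE STANDARD TABLE OF scarr.

" Variant 1: SELECT loop, one record per loop pass.
SELECT * FROM scarr
  INTO ls_carrier.
  WRITE: / ls_carrier-carrid, ls_carrier-carrname.
ENDSELECT.

" Variant 2: array fetch, all records at once, then a normal LOOP.
SELECT * FROM scarr
  INTO TABLE lt_carriers.

LOOP AT lt_carriers INTO ls_carrier.
  WRITE: / ls_carrier-carrid, ls_carrier-carrname.
ENDLOOP.
```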


Out of scope for this blog post


Database hints


Basically, hints are used to specify how the database engine should execute our query. If you omit them, the DB engine will use its own optimizer to determine the best strategy to execute the SELECT statement. Using hints to override this default strategy is quite frequent outside of SAP, but it is seldom done in ABAP.


One reason is that not many developers know that it is possible to use hints with OpenSQL statements (not only with native ones). Also, there are certain drawbacks (problems during a DB upgrade or a change of DB server) and there is more room for human error.


There is a very good overview of database hints in SAP Note 129385 - Database hints in Open SQL.


Dynamic token specification


It is possible to assemble OpenSQL statements during runtime. So instead of coding a static SELECT statement, you can use character-type variables to hold the SELECT, FROM, WHERE etc. clauses. This may come in handy in certain cases, but it has disadvantages regarding performance, maintainability and code readability. Also, there are certain SAP release dependent restrictions on the allowed syntax elements. There are some nice materials on SCN and other sites that deal with this topic.
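Just to give an idea of the syntax, here is a minimal sketch with a dynamic FROM and WHERE clause. The variable names, the report name and the condition are invented for illustration.

```abap
REPORT z_demo_dynamic_tokens.

DATA: lv_table    TYPE string,
      lv_where    TYPE string,
      lt_carriers TYPE STANDARD TABLE OF scarr,
      lx_error    TYPE REF TO cx_sy_dynamic_osql_error,
      lv_message  TYPE string.

" Table name and condition are only known at runtime.
lv_table = 'SCARR'.
lv_where = `CURRCODE = 'EUR'`.

TRY.
    SELECT * FROM (lv_table)
      INTO TABLE lt_carriers
      WHERE (lv_where).
  CATCH cx_sy_dynamic_osql_error INTO lx_error.
    " Syntax or semantic errors in the dynamic tokens end up here.
    lv_message = lx_error->get_text( ).
    WRITE: / lv_message.
ENDTRY.
```

Since errors in dynamic tokens are only detected at runtime, the TRY block is worth the extra lines. Never build such a condition from unchecked user input.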


Conclusion


As you can see, even though Open SQL is a very limited subset of the modern SQL language, it still allows you to execute quite complex queries. In most cases the whole business logic of a complex report cannot be mapped into a single select query. However, if you know what possibilities you have, you can write much more elegant and compact program code with better performance.


Thanks for reading and have fun using Open SQL.




