Skip to content. | Skip to navigation

Sections
Personal tools
You are here: Home Accelerate Articles External Database Access

Use External DB Access with U2

External Database Access (EDA) has been created to allow U2 users the flexibility of storing some or all of their data in external relational database management systems (RDBMS). While EDA documentation describes all the EDA features and their usages, the purpose of this article is to explain how you can use EDA features to satisfy various requirements and to discuss some common issues that arise when converting and using EDA files.

Level: Intermediate

Yinghong Fei (yfei@rs.com), Software Engineer, Rocket Software

07 Feb 2008

Introduction

EDA allows you to store U2 data in an external RDBMS, such as DB2® or Microsoft® SQL Server, and still access it from U2 applications that use the non-first normal form (NF2) data model.

U2 provides a variety of ways to expose the U2 data to relational applications through Open Database Connectivity (ODBC), JDBC, Object Linking and Embedding Database (OLE DB), ADO.NET, and so on. Such relational applications access U2 data as if it were first normal form (1NF) data. In addition, the Basic SQL Client Interface (BCI) allows U2 applications to access external RDBMS data using Basic programs and ECL/TCL commands through the ODBC interface.

Unlike the two methods above, EDA provides a U2 user with a transparent way to store and access the data in an external RDBMS. Once an EDA file is created, mapped, and converted, the file is used as a normal U2 file in Basic programs and ECL/TCL commands. The U2 database engine does all the mapping, command translation, and optimization.

This article focuses on how the U2 engine handles the data model mapping, command translation, and query and record I/O optimization.


EDA mapping

Data model

U2 databases support the NF2 data model, also called a multi-value data model, which allows one or more values to be stored in each attribute. Most external RDBMSs support the 1NF data model only, in which each attribute can only contain one value.

In order to store NF2 data in a 1NF database, EDA needs to do the data model mapping. A group of multi-valued attributes are mapped into a table in an external RDBMS. To preserve the order of multi-values, an index to these multi-values also needs to be stored in the RDBMS.

Each U2 file record contains its record ID (@ID) and 0 or more single-valued, multi-valued, and multi-subvalued attributes. The following sections describe the mapping of these attributes to external RDBMS tables and columns.

Single-valued attribute mapping

A primary table is created in the external RDBMS to contain the record ID (@ID) and single-valued attributes. The @ID column is the primary key of this table.

Multi-valued attribute mapping

Since in a 1NF database, each column of each row can only store one value, in order to store multiple values contained in a multi-valued attribute, a separate 1NF table is created in the external RDBMS, whose primary key consists of the record ID (@ID) and the index to the values in the multi-valued attribute (MV-POS). The @ID column of this table is a foreign key to the primary table. It connects multi-values stored in this table with the corresponding row of the primary table, and the MV_POS column stores the index to the values in the multi-valued attribute.

Multi-valued association

If several multi-valued attributes are associated, then all of the well-defined attributes of the same association can be defined in one RDBMS table. Non-associated, multi-valued attributes are independent and therefore must be stored in separate tables. Note that when two or more such non-associated attributes are used in the same query, each multi-valued attribute table is equal-joined with the primary table on the @ID column and a Cartesian product is produced between the multi-valued attributes tables for each @ID. This can negatively affect the performance. For more information, refer to the Performance considerations section.

Multi-subvalued attribute mapping

In U2, multi-subvalued attributes are always considered to have a child-parent relationship with multi-valued attributes of the same association. Similar to the multi-valued attributes mapping, multi-subvalued (MS) attributes are stored in a separate external RDBMS table, whose primary key consists of the record ID (@ID) and two columns containing the index to multi-valued attributes (MV_POS) and the index to multi-subvalued attributes (MS_POS). The combination of @ID and MV_POS columns is the foreign key to the multi-valued attributes table. It connects multi-subvalues stored in this table with the corresponding row of the multi-valued attribute table, since multi-subvalued attributes are always associated with multi-valued attributes of the same association.

Outer join

In order to retrieve U2 data stored in the external RDBMS, as described above, the EDA engine must join a primary table with multi-valued and multi-subvalued attribute tables on @ID. When there are no values in a multi-valued attribute of a particular record, there is no record stored in the multi-valued attributes table for this record ID. If an inner-join is used to retrieve the data, no data for this record ID is returned. Therefore, the EDA engine needs to use OUTER-JOIN between the primary table and the multi-valued attributes table and between the multi-valued attributes table and the corresponding multi-subvalued attributes table.

The following is a modified STUDENT file in the UniData DEMO account, where the single-valued attribute ADVISOR is replaced with a non-associated multi-valued attribute ADVISORS:

               
@ID......... TYP      LOC...    CONV   MNAME.........  FORMAT    SM     ASSOC.....
ID           D        0                STUDENT         12R###-## S
                                                       -####
LNAME        D        1                 Last Name      15T       S
FNAME        D        2                 First Name     10L       S
MAJOR        D        3                 Major          4L        S
MINOR        D        4                 Minor          4L        S
ADVISORS     D        5                 Advisor        8L        MV
SEMESTER     D        6                 Term           4L        MV     CGA
C_NBR        D        7                 Crs #          5L        MS     CGA
C_GRD        D        8                 GD             3L        MS     CGA
GPA1         V SUBR('GPA1',C  MD3       GPA            5R        S
               OURSE_HOURS
               ,COURSE_GRD)
 

If all dictionary attributes are chosen to be explicitly mapped to EDA, then the S-MV-MS mapping hierarchy, will look like Figure 1.

Note: The STUDENT record may contain more than eight attributes. The remaining attributes that are not described in the dictionary are stored in an extra column of the primary table, called RECORD_BLOB, whose purpose is to store unmapped attribute values.

There are four tables created in the external RDBMS, as shown in listings 2, 3, 4, and 5:

               
STUDENT                         primary table 
                ID              p_key
                LNAME         
                FNAME
                MAJOR
                MINOR
                RECORD_BLOB
               
STUDENT_ADVISORS                MV table
                ID              p_key    f_key (STUDENT)
                MV_POS          p_key
                ADVISORS
               
STUDENT_CGA_MV                  MV table
                ID              p_key   f_key (STUDENT)
                MV_POS          p_key
                SEMESTER
               
STUDENT_CGA_MS                  MS table
                ID              p_key   f_key (STUDENT_CGA_MV)
                MV_POS          p_key   f_key (STUDENT_CGA_MV)
                MS_POS          p_key
                C_NBR
                C_GRD
               
STUDENT..... Last Name...... First Name Major Minor Advisor. Term   Crs #    GD.

414-44-6545  Offenbach       Karl       CS    PY    Otis     FA93   CS104    D
                                                    Eades           MA101    C
                                                                    FA100    C
                                                             SP94   CS105    B
                                                                    MA102    C
                                                                    PY100    C
 

A U2 record like that shown in Listing 6, is inserted into the corresponding external RDBMS tables. as follows:

               
ID      LNAME             FNAME      MAJOR     MINOR    RECORD_CLOB
414446545       Offenbach         Karl           CS        PY
               
ID                MV_POS                ADVISORS
414446545         1                     Otis
414446545         2                     Eades
               
ID              MV_POS                SEMESTER
414446545       1                     FA93
414446545       2                     SP94
               
ID            MV_POS            MS_POS       C_NBR               C_GRD
414446545     1                 1            CS104               D
414446545     1                 2            MA101               C
414446545     1                 3            FA101               C
414446545     2                 1            CS105               B
414446545     2                 2            MA102               C
414446545     2                 3            PY100               C
 

To get the data from the external RDBMS into U2, the following SQL SELECT statement is sent to the RDBMS by the EDA engine:

               
SELECT  STUDENT.ID,
        STUDENT_ADVISORS.MV_POS,
        STUDENT_CGA_MV.MV_POS,
        STUDENT_CGA_MS.MS_POS,
        STUDENT.LNAME,
        STUDENT.FNAME,
        STUDENT.MAJOR,
        STUDENT,MINOR,
        STUDENT_ADVISORS.ADVISORS,
        STUDENT_CGA_MV.SEMESTER
        STUDENT_CGA_MS.C_NBR,
        STUDENT_CGA_MS.C_GRD
FROM  STUDENT  
        LEFT OUTER JOIN 
            STUDENT_ADVISORS
            ON (STUDENT.ID = STUDENT_ADVISORS.ID)
        LEFT OUTER JOIN
            STUDENT_CGA_MV
            ON (STUDENT.ID = STUDENT_CGA_MV.ID)
        LEFT OUTER JOIN 
            STUDENT_CGA_MS
            ON (STUDENT_CGA_MV.ID = TUDENT_CGA_MS.ID
                AND
                STUDENT_CGA_MV.MVPOS = STUDENT_CGA_MS.MV_POS)
ORDER BY 1, 2, 3, 4;
 

The returned result set is shown in Listing 12.

               
414446545  1 1 1 Offenbach       Karl          CS       PY    Otis   FA93   CS104   D
414446545  1 1 2 Offenbach       Karl          CS       PY    Otis   FA93   MA101   C
414446545  1 1 3 Offenbach       Karl          CS       PY    Otis   FA93   FA100   C
414446545  1 2 3 Offenbach       Karl          CS       PY    Otis   SP94   FA100   C
414446545  1 2 2 Offenbach       Karl          CS       PY    Otis   SP94   MA101   C
414446545  1 2 3 Offenbach       Karl          CS       PY    Otis   SP94   FA100   C
414446545  2 1 1 Offenbach       Karl          CS       PY    Eades  FA93   CS104   D
414446545  2 1 2 Offenbach       Karl          CS       PY    Eades  FA93   MA101   C
414446545  2 1 3 Offenbach       Karl          CS       PY    Eades  FA93   FA100   C
414446545  2 2 3 Offenbach       Karl          CS       PY    Eades  SP94   FA100   C
414446545  2 2 2 Offenbach       Karl          CS       PY    Eades  SP94   MA101   C
414446545  2 2 3 Offenbach       Karl          CS       PY    Eades  SP94   FA100   C
 

This result set is nested by the EDA engine to construct the U2 record shown above.

Strong data types and EDA non-conforming data

A U2 table consists of a data file and an associated dictionary. The dictionary provides formatting information for the query tools, but is optional and not used by U2 to enforce either the data typing or data integrity. There may be zero, one, or multiple definitions in the dictionary for a given U2 attribute.

EDA uses the conversion, formatting, and MV and MS information of a dictionary attribute to derive the corresponding external RDBMS column data type. The formatting and conversion information determines the corresponding external RDBMS data type. The MV and MS information determines the primary table, MV tables, and MS tables structure.

It is very important to create or choose a correct dictionary attribute for each U2 attribute intended to be mapped to EDA. A well defined U2 attribute should have accurate formatting, conversion, and MV and MS information in the dictionary.

All U2 data values are stored as character strings, even for numeric or date and time attributes. Attribute values have no length limitation. To store U2 data in external RDBMS strongly data typed columns, the EDA engine needs to map U2 data into the corresponding external RDBMS data types, and to handle the exceptions arising from the use of U2 data that is considered invalid for the specified external RDBMS data type or column length. If a U2 application tries to insert invalid data, the corresponding insert operation on the external RDBMS fails. Such cases are called nonconforming data cases in EDA.

EDA provides an option to store nonconforming data in an external RDBMS table in a CLOB format, so the data is not lost and all U2 operations, such as ECL commands and Basic statements, treat nonconforming data as normal, thus allowing a U2 application to continue running properly.

Nonconforming data cannot be comprehended by the external database applications though because the data is stored in a native U2 format, with possibly embedded value and sub-value marks. Users need to fix the invalid data in the external table in order to make it available to native RDBMS applications.

An ECL command SELECT.NONCONFORMING returns IDs of all records that contain nonconforming data.

U2 attributes that are not explicitly mapped

It is not required to map all U2 file attributes to distinct external database columns. There are various reasons for mapping only a subset of dictionary attributes. A U2 dictionary may only define attributes that are used in a U2 query, while the missing dictionary locations are used by U2 Basic applications that do not require a dictionary. Some attributes are intended to contain non-string type data, such as integer or date and time, and indeed contain proper values. However, other attributes of this kind may contain both integer and string data or a multi-valued attribute may have a valid date as its first value and a string as its second value. Both cases are easily handled by U2 applications, but the latter case produces an exception when the data is stored in a strongly typed external database.

EDA supports a selective mapping strategy. A user can begin the EDA mapping task by selecting attributes that are well defined and contain all valid data first. This help with reducing the amount of nonconforming data.

After a U2 file has been converted to an EDA file, a user can still work on attributes that have not been explicitly mapped by fixing the data, finding the correct data type, and mapping more attributes to EDA by using the remapping feature of the EDA Schema Manager tool.

Choosing carefully which U2 attributes to be mapped to EDA makes the conversion easier. You should first consider those attributes whose data need to be exposed to external RDBMS applications and also those that help to improve the performance, which is the topic of the next section.

Performance considerations

The EDA engine accesses the external database through a client-server interface, similar to ODBC. When compared to the performance of the U2 engine native access to data stored in U2 files, EDA is slower with reading, writing, and querying, since a client-server interface is slower than a direct database access. The NF2-to-1NF mapping is also a key part of this slowness. As shown in the example above, reading one record in U2 turns into a complex query on the external RDBMS side with outer join operations on several tables returning a set of result rows. The more non-associated multi-values a record has, the bigger the result set is, and the more associations are mapped in the EDA schema, the more tables are joined in the query. That is why selecting U2 attributes for mapping to EDA judiciously is important for performance considerations.

EDA also provides the WHOLE RECORD option to improve the performance of READ operations. The U2 Basic READ statement requires the retrieval of the entire record, which may result in a large SQL query being issued by the EDA engine, which in turn negatively affects the performance. When the WHOLE RECORD option is turned on for a U2 file, in addition to storing the mapped attributes data in the corresponding external table columns, the EDA engine stores the entire U2 record in an extra column, called RECORD_BLOB. For a Basic READ, the EDA engine retrieves the contents of the RECORD_BLOB column instead of executing a large outer join query.

As mentioned earlier, the EDA performance is also affected by the number of MV associations mapped to EDA. It is important to only select the absolutely necessary attributes for mapping to EDA and define their SM dictionary information correctly, so that the information, whether an attribute is single-valued, multi-valued, or multi-subvalued, is accurately represented in the dictionary. Such accuracy helps to improve performance.

EDA also provides a mechanism for mapping U2 virtual attributes. Mapping virtual attributes to EDA improves the performance of U2 queries.

 

 




EDA implementation details

Map a virtual attribute as a generated field

A generated field is an EDA mechanism for mapping a virtual attribute by evaluating it in U2 and storing the values in an extra column of the external RDBMS table.

In order to map a virtual attribute, drag it from the U2 File Dictionary pane to the EDA Map Schema pane and chooses the appropriate type in the Attribute Details pane, as described in the EDA manual. Choosing the type of DATA instructs the EDA Schema Manager tool to use the generated field mechanism to map this virtual attribute.

GPA1 is a virtual-type attribute in the dictionary. The EDA Schema Manager tool maps this attribute as a generated field when the type of DATA is specified in the Attribute Details pane, as shown above. GPA1 is evaluated by the U2 engine, and the values are stored in the external RDBMS. U2 access can be optimized when GPA1 is used in a query, especially if an index on the mapped column GPA1 is created on the external RDBMS side.

U2 index

If a U2 file has an index created on an attribute, it is advisable to map this attribute to EDA.

If an index exists on a data type attribute that has been selected for EDA mapping, then the EDA Schema Manager tool automatically creates an index on the corresponding column of the external database table during the conversion process.

If a virtual attribute is mapped as a generated field, then the EDA Schema Manager tool can create an index on the corresponding column of the external database table during the conversion process, if the full mapping is selected. In selective mapping mode, user can select the V-type attribute and map it to EDA as D-type column, and check the index box.

Handle an empty string

U2 databases can store a zero length string, called an empty sting, in attributes of all data types. In an RDBMS, an empty string cannot be stored in a numeric or a date/time column. EDA translates empty strings into NULL values when storing data in an external RDBMS; translates NULL values to empty strings when returning data to the U2 engine. U2 applications do not need to handle empty string specifically.

WITH/WHEN condition

If a U2 query has a WHEN/WITH condition on a mapped attribute, the condition is translated into a WHERE clause in the SQL SELECT statement shipped to the external RDBMS, which in turn improves the performance since the condition is applied on the engine that owns the data and the amount of data passed back to U2 maybe significantly reduced. The EDA engine translates U2 conditions according to the U2 condition semantics.

Empty strings in U2 conditions require a special handling. In U2, an empty string is smaller than any other value. All comparison operations, such as <, <=, and <>, include the empty string. This is different from SQL conditions, where all operations involving the NULL value produce a false outcome, except the IS NULL clause. Therefore, the EDA engine adds the IS NULL clause in order to retrieve NULL values and convert them to empty strings.

For example, the returned result set is:

LIST CUSTOMER NAME WITH DUE_DATE  < "01/01/07"
 

U2 returns the CUSTOMER record with the due date earlier than "01/01/07" and the due date that has no value.

The EDA engine translates this U2 query into:

SELECT NAME FROM   CUSTOMER   WHERE DUE_DATE  < "01/01/07" OR DUE_DATE IS NULL;
 

Visibility and updatability of data on the external RDBMS side

As mentioned earlier, all U2 data is stored in external tables. However, only EDA-mapped fields are visible to external RDBMS applications and tools. Only the columns containing data from the D-type attributes are updatable by external database applications.

Those U2 attributes that have not been mapped are stored in their native U2 format in a RECORD_BLOB column of the primary table. These attributes are not visible to external database applications, and cannot be updated by such applications, but only by U2 applications.

U2 virtual attributes that are mapped to generated fields are visible to external RDBMS applications and can be searched and indexed, but cannot be updated by such applications.

EDA files for which the WHOLE_RECORD option is turned on cannot be updated on the external database side because such updates result in a loss of data consistency due to different data being stored in the individual columns and in RECORD_BLOB.

When an EDA file contains a primary table as well as an MV table and an MS table, you need to be very careful during an INSERT or UPDATE from the external RDBMS side when providing the MV_POS and MS_POS value, and follow the specific order. For instance, insert into the primary table first, then insert into MV tables, then the MS tables.

The different ways to use EDA

In addition to the primary goal of EDA to convert and store U2 data in the external RDBMS tables and access it by both U2 and external database applications, there are other purposes for using EDA to simplify the task of accessing data stored in external database tables.

Normalization of U2 data

EDA provides a very easy to use and effective conversion tool for converting NF2 data into the 1NF format. There can be various reasons for needing to normalize U2 data.

Replicate a U2 file to an external RDBMS

If a user wants to make a copy of U2 data in an external database, then using the UniData replication might be a good choice. Convert the U2 file into EDA in the subscriber account. The updates performed on the original U2 file in the publisher account are then replicated to the EDA file in the subscriber account. The external database tables are then updated by this replication process.

Users only need to map D-fields that they want to be visible on the external database side since the data is only updated in the original U2 file and no updates are expected by external database applications. A U2 application operates on the original U2 file and the external database application operates on the external tables. There is no negative effect on the performance of either of these applications.

Import existing external database tables or views into EDA

BCI is the primary way to access existing external RDBMS tables from U2 applications. On the other hand, it is easier to use EDA than BCI since BCI requires the use of Basic extensions and a certain level of SQL knowledge. In order to import existing external database tables or views into EDA, you need to define a U2 dictionary containing the attributes matching the external table or view columns, run the EDA Schema Manager tool to generate the runtime mapping file, and the data becomes accessible by U2 Basic applications and ECL commands.


  

 Conclusion

This article discussed EDA mapping from NF2 to 1NF, the tips for getting better EDA performance, details about EDA implementation, and several creative ways that a U2 customer can use the EDA feature his or her business.

With EDA, U2 users can access the data stored in an external database much easier than before. When a U2 file is converted into an EDA file, the U2 application accesses the external data just like it would access U2 data normally, with no ODBC or SQL, like a statement involved in a BP program.

Back to top

Resources

Learn

Get products and technologies

 

About the authors

Yinghong Fei works for Rocket Software in Denver, Colo. She has more than 15 years of experience in developing Rocket U2 data servers. She covers several development areas, including data transfer from other PICK systems, internal memory usage, QUERY/SQL engine, query optimization, report formatting, UCI, the U2 ODBC server, and NF2/INF mapping mechanism. Recently, she has focused on External Database Access for U2.

Document Actions