Skip to content. | Skip to navigation

Sections
Personal tools
You are here: Home Accelerate Articles Data Encryption in UniData and UniVerse

Data Encryption in UniData and UniVerse

Recent government regulations and market demand have increased requirements for protecting data resources. Data encryption is an important tool to help application developers meet the need for greater security. This article describes the encryption options available in UniData® and UniVerse® database products and shows how they can be used to add data encryption into a U2 application.

Securing data in U2 applications

Level: Intermediate

Bill Winter ( bwinter@rs.com), U2 Sales Engineer, Rocket Software

12 Jan 2006

Over the last few years, data security has increasingly become a topic of interest for application developers and IT staff. For many companies, this interest is the result of new government regulations. In the USA, for example, federal legislation, such as the Health Insurance Portability and Accountability Act (HIPAA), and the Gramm-Leach-Bliley Act along with state and local legislation, such as California Senate Bill 1386, have made data security a major issue. Security needs are also driven by the marked rise in identity theft, by increased concerns about the privacy of personal data, and simply by general demand from customers and vendors. Companies are looking for ways to better protect their data resources against unauthorized access. Data encryption is a popular and powerful tool to help address the data security problem.

UniData and UniVerse database products (collectively known as U2) provide sophisticated encryption technologies to aid application developers in meeting data security requirements. These requirements can conveniently be divided between securing "data in transit" (for example, data moving between client and server or across a public or private network) and securing "data at rest" (data stored on disk media, backup tapes, CD-ROM, or other static media). The U2 database products provide encryption options appropriate for addressing both of these requirements.

What is encryption?

Encryption is the process of converting data from its original form, called the "plaintext," to an unreadable form, the "ciphertext," through the application of an algorithm called a cipher. Decryption is the reverse of this process; in decryption, a ciphertext is converted back into its corresponding plaintext. The purpose of encryption is to prevent unauthorized viewers from reading sensitive information, even if they have access to the ciphertext. Encryption has been used for thousands of years to secure data. Julius Caesar, for example, is known to have employed a simple substitution cipher to protect messages from unauthorized viewers. Modern cryptography uses mathematically sophisticated algorithms implemented in hardware or software to encrypt and decrypt data. These algorithms divide into various classes, two of which are of particular interest for securing data in transit and data at rest:

 

Symmetric key cryptography

In symmetric key cryptography, the plaintext is encrypted by submitting a fixed text, the "key," and the plaintext to the encryption algorithm, which then returns the ciphertext. Ciphertext is decrypted by submitting the key and the ciphertext to a decryption algorithm. The same key is used for both encryption and decryption, hence "symmetric key cryptography." Symmetric key cryptography is sometimes called secret key cryptography, as the security of the ciphertext depends upon the key being kept secret.

Symmetric key algorithms can further be divided into block ciphers and stream ciphers. Block ciphers work on blocks of data at a time, often in increments of 64 bits (eight bytes) of text. Stream ciphers, on the other hand, work on "streams" of bits -- theoretically in single-bit increments, though as implemented they usually operate on one byte of plaintext at a time -- combining a continuously generated "keystream" with the plaintext.

Block ciphers operate in various modes:

  • Electronic Code Book (ECB): In ECB mode, each block of plaintext is encrypted independently of any other block. While this is efficient (encryption of multiple data blocks can be performed in parallel), there are significant problems with this approach. For example, in ECB mode, encrypting identical blocks of plaintext always produces identical blocks of ciphertext, facilitating certain kinds of cryptanalytic attacks. ECB mode is generally considered unsuitable for securing sensitive data.
  • Cipher Block Chaining (CBC): In CBC mode, blocks of text are encrypted sequentially, with the result of the previous block encryption being used to alter the current plaintext block before it is encrypted. This process improves the characteristics of the cipher (for example, identical blocks of plaintext will not produce identical blocks of ciphertext), but due to the serial nature of the encryption process, CBC mode cannot support parallelization of the encryption. CBC mode uses an additional text called an Initialization Vector (IV) to begin the chaining process. The IV is used to alter the first plaintext block being encrypted.
  • Cipher Feedback (CFB): In CFB mode, the previous block is encrypted and then combined with the plaintext to produce the current block, effectively altering the key used to encrypt the current block. The result is a process similar to a stream cipher, as the key value is constantly changing, though performance is nowhere near as fast as a specialized stream cipher. As with CBC mode, an Initialization Vector is used to seed the encryption process.
  • Output Feedback (OFB): In OFB mode, a seed or IV is used to start the encryption process. This seed is encrypted, and the result is combined with a block of plaintext to produce the ciphertext. The encrypted seed is then encrypted again, and the process continues, propagating its way through the plaintext. Again, the result is similar to a stream cipher.

 

Public key cryptography

In public key cryptography, also known as asymmetric key cryptography, the process of encryption and decryption involves two keys: a "public" key used to encrypt data and a "private" key used to decrypt data. The public key is given to anyone who needs to send secured data to the holder of the private key. Since the ciphertext can only be decrypted by the private key, the encryption key generally doesn't need to be kept secret. This is a huge advantage over symmetric key cryptography in that there is no need for the sender of a message to exchange a secret key with the intended receiver. Only the recipient of a message needs to keep a secret key.

The primary disadvantage of public key cryptography is speed. Public key algorithms are very slow compared to symmetric key algorithms, making them impractical for applications where response time is a factor. Accordingly, encryption applications used to secure data in transit, such as the Secure Sockets Layer protocol, often use a combination of public key algorithms and symmetric key algorithms. Sender and receiver use a public key algorithm to negotiate and exchange a mutually acceptable secret key; that key is then used with a symmetric key algorithm to transmit messages between the two. For applications that are securing data at rest, public key algorithms offer no advantages over symmetric key algorithms, as the decryption key needs to be kept secret, just as it does in a symmetric key algorithm. Since the ability to retrieve data is normally as important as the ability to store data, access to the secret key is routinely required in securing data at rest. Public key algorithms can also be easily broken by chosen plaintext attacks when the domain of the encrypted data is relatively small.

For example, say that university grades are being encrypted with a public key algorithm. The grades have the usual values: A, B, C, D, and F. Since the encryption key is public, an attacker can simply encrypt these five values with the known encryption key and build a lookup table of the corresponding ciphertext to read anyone's grade without actually decrypting it. Applying a similar technique against credit card numbers or Social Security numbers is no more difficult -- it just takes more time and a larger lookup table. Consequently, symmetric key algorithms are normally used for securing data at rest.

Back to top


Securing data in transit

Using encryption to secure data in transit has become almost commonplace due to the growth of Internet-based applications. Customers have come to expect secure communications for transmitting credit card numbers, bank account numbers, and other such sensitive information across the Net.

The protocol most frequently used to secure communications across the Internet is Secure Sockets Layer (SSL), a protocol originally developed by Netscape and since widely adopted by software developers. SSL uses a public key infrastructure to negotiate a mutually acceptable symmetric key cipher and to agree upon and exchange encryption keys between two communication sources. These keys are then used to feed the agreed upon cipher to encrypt data being transmitted between the two sources, establishing a channel for secure communications.

The U2 databases have implemented SSL support for a number of their application programming interfaces (APIs) and communication protocols. U2 support for SSL is based on the set of libraries developed by the OpenSSL project. UniData 6.1 and UniVerse 10.1 support SSL secured communications for the following:

  • UniObjects for Java (UOJ): UOJ is a record-oriented programming toolkit for Java developers. UOJ can be configured to support SSL-secured connections between Java client programs and a U2 server.
  • JDBC: JDBC is a record set-oriented interface used by Java applications to access U2 using Structured Query Language (SQL).
  • CallHTTP: CallHTTP is an interface for BASIC applications to interact with a Web server through the standard Hypertext Transfer Protocol (HTTP).
  • Sockets: The U2 sockets interface is a low-level BASIC API for developing applications using the TCP/IP sockets protocol. The CallHTTP API is built on top of the sockets interface.
  • Telnet: The U2 SSL telnet service allows for secure telnet connections to a U2 database. Connection to this utility requires an SSL-aware telnet client, such as a current version (6.0 or higher) of Rocket U2 wIntegrate® terminal emulation product. Rocket U2 SBClient™ terminal emulation client will support SSL secure telnet at the 5.3.8 release.

 

UniData 7.1 extends client SSL support to include ODBC, OLE DB, and UniObjects. SSL support for these interfaces will be added to a future release of UniVerse.

Back to top

 


Securing data at rest

The employment of cryptographic methods for securing data at rest is still relatively uncommon. In addition, the techniques used to apply encryption to static data vary considerably, both in the selection of which algorithm to use and in where and how encryption is implemented.

Algorithms in use range from unsophisticated and easily compromised ones, like ROT13 (a transposition cipher that scrambles data by shifting alphabetic characters 13 positions), to industry standard algorithms, like DES (Data Encryption Standard) and AES (Advanced Encryption Standard). These algorithms may be applied at various levels in a computer system:

  • Hardware: Hardware-based encryption options for securing data have become significantly more popular in recent years. Servers using external storage systems, such as Network Attached Storage or Storage Area Networks, for example, have the option of installing encryption appliances that transparently encrypt and decrypt data moving to or from storage. Some of these devices also claim to operate with Direct Attached Storage (RAID arrays, for example) as well. These appliances, available from companies such as Decru, NeoScale, and Vormetric, implement very fast encryption -- they typically claim to operate at or close to "wire speed." They also usually include sophisticated key management facilities for controlling the creation and distribution of cipher keys, cycling of keys, and disposition of keys, besides handling general data encryption.

    There is a downside, of course. The major drawback to implementing hardware-based encryption is cost. Encryption appliances are currently priced in the tens of thousands of dollars, not including installation and training, making them an unattractive option for smaller companies.
  • Operating system: Many operating systems now support encrypting file systems. Microsoft, for example, introduced the Windows Encrypting File System (EFS) with the Windows 2000 operating system release. Linux and UNIX systems have a variety of encrypting file systems from which to choose, including the Cryptographic File System, the Transparent Cryptographic File System (TCFS) from the University of Salerno in Italy, and NCryptfs from Stony Brook University. Once configured, these file systems transparently encrypt and decrypt data stored on them based on keys assigned and maintained at a user level. Since these file systems encrypt anything passed to them and decrypt anything read from them, including all internal file structures as well as the data requested, application performance is a significant concern.
  • Database: A number of database manufacturers have made encryption functions available as part of the database engine. IBM DB2® Universal Database™, for example, includes a scalar function to encrypt or decrypt data that can be embedded in SQL SELECT, INSERT, and UPDATE statements, allowing developers to build encryption into their applications. Similarly, Oracle provides the DBMS_OBFUSCATION_TOOLKIT, a PL/SQL package that gives developers access to various encryption algorithms and, as of the Oracle 10g release, the DBMS_CRYPTO toolkit, an updated and enhanced cryptographic package. Microsoft SQL Server 2005 includes several Transact SQL functions that can be used to add encryption to an application. In addition to native database functions, various third-party software companies market add-on utilities that support encryption at the database level.
  • Application: Probably the most common layer for applying encryption to static data is for the application software to call appropriate cryptographic functions. These may be library functions included with the development toolset or add-in libraries purchased from third-party suppliers. Implementing encryption at the application level allows developers to specify which fields are to be encrypted, thus limiting the performance impact of securing data with encryption.

 

UniData and UniVerse provide encryption services at the database/application level. A BASIC function, ENCRYPT, allows developers to encrypt and decrypt data using any of several cryptographically strong symmetric key ciphers. ENCRYPT supports either direct specification of keys or the ability to generate keys from a user-supplied pass phrase. Encrypted data is returned either as a binary string or, optionally, as a Base64 encoded string, which is useful for ensuring that the encrypted string does not accidentally contain system delimiter characters (a topic that is discussed in more detail later in this article).

ENCRYPT calls the OpenSSL cryptographic library. Algorithms supported by the currently embedded version of OpenSSL (0.9.6e) include:

  • Blowfish: Blowfish is a block cipher created by Bruce Schneier, a computer security consultant and author of the standard reference Applied Cryptography. Blowfish is the fastest block cipher supported by OpenSSL. While the Blowfish cipher supports keys of up to 448 bits in length, OpenSSL implements 128-bit keys and a block size of 64 bits with Blowfish.
  • CAST: CAST is a block cipher authored by Carlisle Adams and Stafford Tavares. The cipher supports variable length keys, although the OpenSSL implementation of CAST is restricted to 128-bit keys. CAST is perhaps best known as the default symmetric key algorithm in current versions of the PGP e-mail security program. CAST is patented by Entrust Technologies, though Entrust makes it available royalty-free for commercial or non-commercial use.
  • DES: DES, the Data Encryption Standard, is probably the most commonly used symmetric key cipher in the world. Compared to other algorithms, however, it is comparatively slow and has the significant drawback of only supporting 56-bit keys (a 64-bit key, less one "parity" bit for each byte). For these reasons, it is generally being superseded by newer algorithms, such as 3DES or AES.
  • DESX: DESX (extended DES) was created by the cryptographic security firm RSA Security as a modification of the standard DES algorithm that supports additional key material, making the algorithm more resistant to brute force attacks than standard DES. Unlike the other block cipher algorithms supported by OpenSSL, DESX only operates in CBC mode.
  • 3DES: Triple DES (usually abbreviated 3DES or DES3) uses the same algorithm as DES, but increases the algorithm's security by employing longer keys of up to 192 bits (168 bits of effective length once the "parity" bits are removed) and iterative calls of the DES algorithm. The 192-bit key is split into three 64/56-bit keys, and the plaintext is encrypted with the first key, "decrypted" with the second, and then encrypted with the third key, passing it through DES encryption three times. Assuming that none of the subkeys are identical, the effect is to encrypt the data using standard DES three times. Optionally, users can supply a double length key (128 bits reduced to 112 bits of effective length) that is split into two DES keys; data is then encrypted with the first key, decrypted with the second, and then encrypted with the first key again. 3DES is considered very secure, but very slow (obviously it takes roughly three times as long as DES itself, which is already considered to be a slow algorithm).
  • RC2: RC2 (the "RC" stands for "Rivest Cipher" or "Ron's Cipher", after its designer, Ronald Rivest of MIT and RSA Security) is a block cipher supporting keys of up to 128 bits in length. In OpenSSL, RC2 supports keys of 128, 64, or 40 bits (64- and 40-bit keys in CBC mode only). The IBM DB2 scalar encryption and decryption functions use RC2 as the encryption algorithm.
  • RC4: RC4 is a stream cipher from RSA Security. It supports keys of up to 128 bits. In OpenSSL, RC4 can be called with a key length of 128, 64, or 40 bits. As stream ciphers work on streams of bits, rather than on blocks of data, there is only one mode in which RC4 can be called.

 

All of the block cipher algorithms (with the exceptions of DESX, RC2-64, and RC2-40 as noted above) support ECB, CBC, CFB, and OFB modes. Note that current U2 documentation only references DES, 3DES, RC2, RC4, and RC5. For details on the other algorithms, see the OpenSSL documentation available at www.openssl.org or a reference such as Network Security with OpenSSL by Viega, Messier, and Chandra. (See Resources.)

The OpenSSL libraries embedded in UniData were upgraded to OpenSSL, Version 0.9.7 at the UniData 7.1 release, and will be upgraded at the UniVerse 10.2 release. This upgrade adds support for the AES (Advanced Encryption Standard) symmetric key cipher to those accessible from ENCRYPT. AES -- also known as the Rijndael (usually pronounced something like "rhine doll") algorithm, after its developers Joan Daemen and Vincent Rijmen -- is rapidly becoming the most popular symmetric key cipher used to develop new applications. The AES/Rijndael algorithm is recognized by the US National Institute of Standards and Technology as a Federal Information Processing Standard (FIPS 197). OpenSSL 0.9.7 supports AES in ECB and CBC modes only, with key sizes of 128, 192, or 256 bits; block sizes in AES are matched to key size (in other words, invoking AES with 256-bit keys sets the encryption block size to 256 bits as well).

 

Back to top

The ENCRYPT function

The format of the ENCRYPT function is:

ENCRYPT(Algorithm, Action, Data, DataLoc, Key, KeyLoc, KeyAction, Salt, IV, Result, ResultLoc)

The function arguments are defined in Table 1:

 

Table 1. Arguments for the ENCRYPT function

 

Argument Input/Output Description
Algorithm Input A string indicating which cipher algorithm to use
Action Input 1 - Encrypt
2 - Encrypt, the Base64 encode
3 - Decrypt
4 - Base64 decode, then decrypt
Data Input Either the data to be processed or the name of a file to be processed
DataLoc Input 1 - Data supplied as a string
2 - Data supplied as a filename
Key Input Either the actual key or pass phrase, or the name of a file containing the key or pass phrase
KeyLoc Input 1 - Key supplied as string
2 - Key supplied as filename
KeyAction Input 1 - Use the supplied key
2 - Derive key from pass phrase
Salt Input String containing the salt value
IV Input String containing the IV value
Result Output The resulting data or the name of a file storing the result
ResultLoc Input 1 - Return result as string
2 - Return result to file

  

The strings used to specify the different cipher algorithms are given in the product documentation. These strings are not case sensitive. The default mode for any of the block ciphers is Cipher Block Chaining (CBC).

Base64 encoding is a technique for mapping binary data onto a printable character set. Base64 encoding uses a 65 character subset of ASCII, encompassing the upper and lower case letters, the numeric digits, and a few punctuation characters. The Base64 encode/decode options are useful to protect against accidentally generated system delimiter characters. Since the string being returned from the ENCRYPT function could potentially contain any binary character, it is possible to return a ciphertext containing a system delimiter where one did not previously exist. If the function is being used to encrypt one field in the middle of a record, the insertion of an "accidental" field mark into the record will throw off the field count, making consistent retrieval of fields following the error impossible. While other methods could be used to protect against this error (by converting the ciphertext to an ASCII hexadecimal representation, for example), the Base64 encode option is both an easy and efficient means of avoiding the problem.

One caution about using Base64 encoding is that the encoded string will be returned with the line feed character (x'0A') appended to the result and, if the returned string is longer than 65 bytes, line feed characters will be embedded in the string every 65 bytes. These line feed characters can affect how the encrypted data is stored or displayed. Developers may want to either strip out the line feeds or replace them with another character that is outside of the Base64 character set. The line feed characters should be placed back in the ciphertext string before decoding and decrypting the data to avoid errors.

Encryption and decryption can either be performed on a string supplied to the function or on a file. The file option is likely only appropriate for specialized applications where entire data files will be sent to another location, as encrypting a U2 file essentially renders it inaccessible to the database since its internal structure is scrambled by the encryption. Accordingly, it is recommended that the DataLoc and ResultLoc options be set to process and return strings rather than files.

Salt is an optional parameter used to protect against certain kinds of cryptanalytic attacks. If no Salt is specified, the function will generate one internally. If specified, it must be in hexadecimal format.

IV (Initialization Vector) is used to initialize certain block cipher modes and may be required for these modes if an actual key is supplied. If the key is being derived from a pass phrase, the IV can be generated automatically. As with Salt, IV must be specified in hexadecimal format.

Generating a suitable key is one of the thornier problems associated with encryption. Keys should be generated as random binary strings, making them obviously difficult to remember. Accordingly, it is probably more common for applications to supply a pass phrase to the ENCRYPT function and have the function internally generate the actual encryption key. Pass phrases are sentences or strings of words that the cryptographic function converts into a suitable key through a process called "key crunching." The advantage of pass phrases is that instead of trying to remember a cryptographic key like "e6cl43985ae90a9b," the user needs only remember a pass phrase like

My name is Ozymandias, King of Kings: Look on my works, ye Mighty, and despair.

which is much easier to memorize. This example, taken from Schneier's Advanced Cryptography, uses two lines from Percy Bysshe Shelley's sonnet "Ozymandias" as the pass phrase. As Schneier points out, this phrase is not a good choice as a pass phrase, as Shelley's complete works are available online and thus accessible to a dictionary attack on the key.

The following is a sample code segment that encrypts and encodes a test string (note that the pass phrase in this example is obviously not strong enough to insure adequate security):

 

Listing 1. Call the ENCRYPT function

 

Plaintext  = "THIS IS PLAINTEXT"      ;* Source to encrypt.
Algorithm  = "DES"                    ;* Use DES CBC algorithm.
Action     = 2                        ;* Encrypt, then encode.
DataLoc    = 1                        ;* Data in string.
Key        = "THIS IS A PASSPHRASE"   ;* Key (as pass phrase).
KeyLoc     = 1                        ;* Key in string.
KeyAction  = 2                        ;* Derive key from pass phrase.
Salt       = ''                       ;* Default Salt.
IV         = ''                       ;* Default IV.
Result     = ''                       ;* Initialize Result.
ResultLoc  = 1                        ;* Return as string.

ReturnCode = ENCRYPT(Algorithm,Action,Plaintext,DataLoc,Key,KeyLoc,KeyAction,
             Salt,IV,Result,ResultLoc)

 

If the function call is successful, the value returned by the function (through the variable ReturnCode in this example) will be zero. A non-zero result indicates that an error occurred during the encryption or decryption process; error codes are listed in the documentation for the ENCRYPT function.

Back to top

 


Implementing encryption in U2 applications

Implementing SSL for the supported U2 interfaces is thoroughly described in the appropriate manuals and will not be otherwise addressed here. There are, however, a number of issues related to implementing symmetric key ciphers to protect data at rest that should be discussed at some length.

What should be encrypted?

Adding encryption to a database application will almost certainly decrease performance, at least minimally. Data must be encrypted before being written to disk and unencrypted every time they are read. Note that this includes operations such as SELECT or LIST statements when a selection clause references an encrypted field. It is probably advisable to only encrypt data that requires special protection. For example, it may be sufficient to encrypt a customer's credit card information while leaving company name, address, contact name, and so on unencrypted.

Encrypted data also generally increases storage requirements -- another reason for restricting the amount of data to be encrypted. Since block ciphers work on several bytes of data at a time, the binary encrypted string is typically up to one block longer than the source string. Base64 encoding will increase this overhead significantly; a rough estimate would be to assume inflation in storage requirements by about 33% for encrypted and encoded data. Be especially cautious before encrypting data in files close to maximum file size.

Making encryption "transparent" to the application

Encrypted data will likely not only need to be accessed from BASIC programs, but from query languages (UniQuery, RetrieVe, SQL) and SQL-based APIs (ODBC, OLE DB, JDBC) as well. Accordingly, encryption needs to be implemented in such a manner as to insure that data retrieved from any application utility will decrypt data as needed.

An obvious solution to this problem is to code the encryption/decryption function as a subroutine, which can be called either from BASIC or from a dictionary record. A particularly elegant means of doing this, suggested by Joel Yates of the Rocket U2 technical support group, is to use a construct called a "user exit" to implement the encryption/decryption routine. User exits are included in UniData and UniVerse to provide backward compatibility with older Pick applications. The beauty of this technique is that the encrypt/decrypt subroutine is easily referenced from any of the above utilities (BASIC, UniQuery, RetrieVe, or SQL), essentially in the same manner as a conversion code. Calling the encryption/decryption routine becomes as simple as converting dates to and from internal or external format.

For example, let's say that the encryption/decryption routine is coded as the user exit UEC00 (user exit names are always "U" followed by four hexadecimal digits). Assuming the subroutine was written so that "input" conversion does encryption and "output" conversion does decryption, data (in this example, Social Security number) would be encrypted with a statement like

SSN = ICONV(SSN,'UEC00') ;* Encrypt SSN.

Decrypting data in BASIC would call the user exit with the OCONV function:

SSN = OCONV(SSN,'UEC00') ;* Decrypt SSN.

Listing encrypted data as plaintext using UniQuery, RetrieVe, or SQL requires a dictionary record that calls the user exit as a conversion code (assume SSN is field 5 in this example):

 

Listing 2. Dictionary record referencing a user exit

 

SSN
0001 D
0002 5
0003 UEC00
0004 Social Security Number
0005 11L
0006 S

 

For simple applications not requiring role-based access control and implementing only a single encryption key, a single user exit could be used for all encryption and decryption functions. More sophisticated applications may require multiple user exits to implement different functions.

Applications using SQL or SQL-based APIs to update data in U2 databases will need to use a different technique for encrypting data. Since U2 SQL relies upon dictionary records to determine what data is accessible at the column level, updating U2 data with SQL requires that the dictionary reference an "actual" (as opposed to a "virtual") field. Virtual fields, such as fields referencing BASIC subroutines, cannot be updated using SQL, as the database cannot reverse engineer the retrieval process. One possible solution would be to employ a file trigger to encrypt data (imposing encryption automatically, regardless of how the data were written to disk) and using a user exit to decrypt data. A particular caution: do not "hard code" the encryption key as a literal in the encryption/decryption subroutine, as it could easily be recovered by someone who had access to the source or object code.

Documentation on how to write and call user exits can be found in the Using UniData (UniData) and Guide for Pick Users (UniVerse) manuals.

Functionality restrictions on encrypted fields

Encrypting data can affect how data is processed. For example, it is generally not a good idea to build an index on an encrypted field. Depending upon how the index is built, it could hold either the plaintext value of the field or the ciphertext value. Obviously, if the index contains the plaintext values, the security of the data is compromised. (What is the point of encrypting SSN in the employee file if the SSN is stored as plaintext in an index?) On the other hand, if the index stores the encrypted value, it is much more difficult to use since the application would have to query the index using encrypted values. The next value/previous value functionality of the B-tree would be lost, as the index ordering would be based on the ciphertext value, not the plaintext value.

Similarly, encrypting a key field is not advisable. SELECT processing could be affected; translates definitely would be affected. Encrypting long keys could potentially exceed key length restrictions. In applications where a key field contains sensitive data (SSN, for example), it is probably better to redesign the file to store the data in an encrypted field in the record and to use a different value for the key.

A third problem area is partial match searches (for example, a command such as SELECT EMPLOYEE WITH NAME LIKE SM...). There is simply no way to perform a partial match search on an encrypted string. Unlike a complete value match, where the value to search for could be encrypted and the encrypted value used in the SELECT statement, there is no way to encrypt a partial match string. The values "SMITH", "SMYTH", and "SM" will return three very different encrypted results.

Choosing an algorithm

The most obvious concern in selecting a symmetric key algorithm is security. The protection provided by the cipher should be appropriate for the application for which it is being used. Given that a specific cipher has no major design flaws, key length is the primary determinant of security. In general, longer keys are more secure (in other words, more resistant to attack) than shorter ones. This does not mean, however, that one should merely select the algorithm that supports the longest key -- a particular application may not require the highest level of security. But key length should be a consideration in choosing which cipher to use.

Many governments regulate the use of strong encryption algorithms. The Bureau of Industry and Security (BIS) of the US Department of Commerce, for example, must approve the inclusion of strong cryptographic functions in software that will be exported outside of the United States. This may mean that only certain algorithms can legally be used or that key length must be restricted. Before selecting an algorithm, check that its use is legal in all jurisdictions where the software will be installed.

Some of the algorithms supported by the OpenSSL library are patented or otherwise protected and may need to be licensed for use. The IDEA algorithm, for example, is patented and must be licensed for commercial and noncommercial use in the US, Japan, and most European countries. RSA Security is generally protective of its algorithms, which are normally either patented or considered trade secrets by the company, so it is advisable that developers check with RSA before using RC2, RC4, or RC5. It is probably good practice to check on licensing issues before incorporating any encryption algorithm into your application.

If encrypted data is to be exchanged with other applications, compatibility may be an issue. Legacy applications often employ standard DES, for example, dictating its choice for new applications that must exchange data with the legacy system. Customers, vendors, or government regulators may dictate that only certain algorithms, encryption modes, or key lengths be used.

With certain applications, the speed of the algorithm may be an issue. As mentioned above, DES is considered to be a slow algorithm; 3DES is even slower. However, since most database applications of encryption should be working on relatively small amounts of data, raw performance may not be a significant factor affecting the choice of an algorithm. Testing different candidate algorithms should help determine whether performance need be a consideration. Schneier provides a table showing relative encryption speeds of various block cipher algorithms in his Applied Cryptography. There is also a benchmark command built into the OpenSSL executable that can be used to test relative performance of various ciphers.

Key management

Attackers know that the simplest way to break into a computer system is not to "crack" passwords or access codes. The easiest way is to steal them -- read them off the yellow sticky note pasted on a terminal or stuck inside a desk drawer, for example, or trick someone into giving them to you.

Similarly, the easiest way to compromise the security of a modern cryptographic cipher is to steal the key. Used properly -- with a key of sufficient length and an encryption mode that doesn't leak information about the plaintext -- most modern algorithms are reasonably safe against cryptanalytic attacks. Accordingly, most attacks are directed at trying to discover the key. Once the key is obtained, even if the algorithm isn't known, each of the potential algorithms can be tried in turn on a sample of ciphertext to determine the combination used to encrypt the plaintext.

"Key Management" is the term used to encompass the various actions related to the generation and protection of cryptographic keys. Specifically, key management concerns the generation of appropriate keys, the secure distribution of keys to users, the periodic cycling of keys, and the disposition of expired keys. Key management is actually a much more difficult, and hence more important, problem than selecting which encryption algorithm to use. Consider a system in which all information is transparently encrypted and decrypted by the database. An attacker with access to a complete backup of the system could restore that backup, activate the database, and immediately have full access to all encrypted data, without having to actually figure out either the key or the algorithm.

While a complete discussion of key management is well beyond the scope of this article, let's examine some of the problems that must be addressed:

  • Key generation: Just as you shouldn't use something like your name for a password, good cryptographic keys (or pass phrases) should be difficult to guess. Actual keys, in fact, should be random strings of bits to insure that an attacker cannot improve a brute force or dictionary attack by relying upon a limited domain of possible keys. Pass phrases should not only be difficult to guess, but quite lengthy. Schneier suggests a general rule of five English words for every four bytes of key, meaning you would need a phrase containing some 30 words to generate an appropriate 192-bit key for 3DES.
  • Key distribution: Keys must be distributed to users in a secure manner. Don't post the keys on a piece of paper in the computer room or, worse, tack them on the wall of the system administrator's cubicle. That is the equivalent of the yellow sticky note problem, nearly as bad is storing them as plaintext in a file on the system named something like KEYFILE. Yates suggests that U2 developers may want to have users input keys at login, then load these keys into a named COMMON block, so they are not landed on disk. Alternatively, a key hierarchy may be used to protect keys stored on disk. In the simplest implementation, a master key is used to protect the actual keys used for encryption and decryption, called the working keys. An attacker who obtains the file containing the working keys will not be able to use them to decrypt the ciphertext, as the keys are themselves encrypted by the master key.
  • Cycling keys: Keys should periodically be changed. The longer a particular key is in use, the greater the likelihood that it will be compromised. Note that changing an encryption key may require a significant maintenance window; all existing encrypted data will require decryption with the old key and encryption with the new key. If done on a phased basis, there will need to be some way for programs to know whether data in a particular record have been converted to the new key or not.
  • Disposition of keys: Keys no longer in use should be disposed of in a secure manner. One obvious reason is that an attacker who has possession of data encrypted with an older key (for example, by stealing an old backup tape) could potentially use a discarded key to decrypt the data. Also consider the inverse of this issue -- if old keys are forgotten and destroyed, you won't be able to retrieve the encrypted data on backup or history tapes.

 

As should by now be obvious, appropriate key management procedures should be part of the core design of any encryption implementation. Developers may want to include a computer security expert as part of the design team for implementing an enterprise encryption scheme.

Back to top

 


Summary

Encryption is not a panacea for resolving data security problems. It is, however, a valuable tool to employ as part of a comprehensive data security solution. Rocket U2's premier extended relational databases, UniData and UniVerse, include several powerful cryptographic tools as part of the native database functionality, enabling developers to build robust data encryption into their U2 applications.

Back to top

Resources

Learn

 

  • Cryptography: A Very Short Introduction (Oxford University Press, 2002) by Fred Piper and Sean Murphy: Provides a succinct and clear introduction to modern cryptography and cryptographic principles.

  • Bruce Schneier's Applied Cryptography (John Wiley and Sons, 1996): The standard reference on cryptography for application developers. Schneier's essay "Why Cryptography Is Harder Than It Looks" provides an interesting discussion about the difficulties of implementing a cryptographic solution.

  • OpenSSL.org: The home page of the OpenSSL Project. Developers can find news and documentation related to OpenSSL as well as access the OpenSSL mailing lists.

  • Network Security with OpenSSL (O'Reilly, 2002) by John Viega, Matt Messier, and Pravir Chandra: An excellent resource on the OpenSSL package.

  • The US government Computer Security Resource Center provides information on the Advanced Encryption Standard algorithm, including a link to the FIPS 197 standards document.

  • The US Department of Commerce Bureau of Industry and Security has information on the legal aspects of using cryptography in software.

About the author

 

Bill Winter provides technical sales support for the Rocket U2 product line to customers in the western USA and Canada. Bill joined Vmark Software in 1996, handling technical sales support for UniVerse, and expanded his role to the entire U2 line of products when Vmark and Unidata, Inc. merged to form Ardent Software in 1997. He has over 30 years of experience in the MultiValue database industry, primarily in pre- and post-sales technical support. Bill holds a Bachelor of Science degree in psychology from the University of Puget Sound.

Document Actions