Shostack + Friends Blog Archive

 

Bad advice on SSNs

Bad advice on the use of social security numbers abounds, often in technical documentation. Credit goes to reader Jonathan Conway for digging many of these out.
There are a few very common errors which we can find, thanks to Jonathan’s research:

  1. Social security numbers are unchanging. No, they are not. Victims of identity theft, domestic abuse, or other crimes can get new numbers. See here.
  2. Each person only has one social security number. If a person can get a new social security number, and databases are not designed with that in mind, it is entirely possible that someone has two numbers, and that those numbers are different in different databases. Given the security reasons for new numbers, you should never link numbers together.
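
One way to design with both errors in mind is to give each person a locally generated key and treat the SSN as an optional attribute that can be overwritten. What follows is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and helper functions are my own assumptions for illustration, and the numbers are placeholders in the never-issued 000 area.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,  -- surrogate key, generated locally
        name      TEXT NOT NULL,
        ssn       TEXT                  -- optional attribute, never a key
    );
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person(person_id)
    );
""")

def add_person(name, ssn=None):
    """Insert a person and return the locally generated key."""
    cur = conn.execute(
        "INSERT INTO person (name, ssn) VALUES (?, ?)", (name, ssn)
    )
    return cur.lastrowid

def replace_ssn(person_id, new_ssn):
    """Overwrite the stored SSN. The old number is deliberately not kept,
    so the old and new numbers are never linked."""
    conn.execute(
        "UPDATE person SET ssn = ? WHERE person_id = ?", (new_ssn, person_id)
    )

# Hypothetical person with placeholder numbers.
pid = add_person("Pat Example", "000-00-0001")
conn.execute("INSERT INTO account (person_id) VALUES (?)", (pid,))
replace_ssn(pid, "000-00-0002")
```

Because the account row points at person_id, issuing a new number touches exactly one column in one row, and nothing else in the database has to know that it happened.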

Other reasons SSNs make bad database keys include:

  1. The lack of a checksum/check digit makes it easy to mis-enter an SSN. If you do, given that there are 300 million Americans and a billion possible numbers, you have a roughly 1 in 3 chance of entering an SSN issued to someone else.
  2. The difficulty of validating a provided SSN if it’s not provided for employment purposes.
  3. Privacy issues. The SSN should be protected.
  4. Federal laws which apply to the private sector include GLB (summary) and HIPAA (summary). The FTC has enforced GLB, and HIPAA enforcement is probably coming soon.
  5. State laws such as California’s SB1386 and AB1950 (PDF) (summary). California law also allows a customer to request that your ongoing use of an SSN be changed, if your use is in violation of these new laws. While these only apply to California today, I expect these laws to spread, and they are centered not on businesses, but on California residents.
  6. SSNs are under the control of an external entity, and subject to change. Do you want your database keys under someone else’s control?
  7. Most people in the world are not Americans, and don’t have a social security number. Some other countries’ national ID number may also be 9 digits, and will likely overlap.
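
The arithmetic behind the first point, and what a check digit would buy, can be sketched in a few lines of Python. The figures are the rough ones from the text, and a real typo is not a uniformly random number, so treat the result as a ballpark. The Luhn algorithm here is my own choice of illustration (it is the scheme used on payment card numbers); SSNs have nothing like it.

```python
ISSUED = 300_000_000   # rough figure from the text: number of Americans
POSSIBLE = 10 ** 9     # nine decimal digits

def collision_chance(issued=ISSUED, possible=POSSIBLE):
    """Chance that a uniformly random nine-digit string is an issued number."""
    return issued / possible

def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:       # every second digit, starting at the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number):
    """True if the last digit is the correct check digit for the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]

def single_digit_typos(number):
    """Yield every string that differs from number in exactly one digit."""
    for pos, original in enumerate(number):
        for replacement in "0123456789":
            if replacement != original:
                yield number[:pos] + replacement + number[pos + 1:]

good = "7992739871" + luhn_check_digit("7992739871")
caught = all(not luhn_valid(typo) for typo in single_digit_typos(good))
```

With a check digit, every single-digit slip in the sample number is rejected at entry time. Without one, the only defense is the 70% or so of numbers that happen to be unissued.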

The final sort of issue is a more subtle one. It is the casual use of the SSN in samples, examples, and documentation, with no mention that this is, at best, a questionable idea. Such uses lead to discussions of the form: “Well, I was just copying the sample code. If it’s a bad idea, why is it in the docs?”

Some examples of vendor advice that should be revised, after the jump.

  • Oracle

    In Oracle 9i daily features:

    For example, a type inheritance hierarchy with PERSON_T and STUDENT_T types can be created in the database as follows,
    CREATE TYPE Person_T (SSN NUMBER,

    or the sample code:

    This model has a Person object which is the super type with attributes which are common to all persons such as SSN, Name, Dateofbirth, sex and Address.

    Neither page calls out the issues with SSNs.
    In an explicitly international example, this PowerPoint claims that all countries have social security numbers. It never explains why they are collected, or how they are used.

    To be fair to Oracle, newer documentation such as this security manual does discuss the SSN as sensitive information, but that approach needs to spread through the company.

    However, in Using Oracle HRMS: The Fundamentals, we find:

    Select the method of creating identifying numbers for employees
    and applicants. The choices are:
    • Automatic number generation
    • Manual entry
    • Automatic use of the national identifier (for example, the
      social security number in the US, and the NI number in the UK).
      This option is available for employees only.

    I believe that use of SSN as an employee number is forbidden in California under SB 168.

  • Sybase

    In this Transact-SQL User’s Guide, Sybase notices that duplicate SSNs could exist, but gives no advice on handling them:

    On the other hand, a unique index on a column holding social
    security numbers is a good idea. Uniqueness is a characteristic of
    the data–each person has a different social security
    number. Furthermore, a unique index serves as an integrity
    check. For instance, a duplicate social security number probably
    reflects some kind of error in data entry or on the part of the
    government.

    In the System Management Guide Clinical Gateway 2.3, the schema’s storage of SSN and driver’s license number passes without comment. The book has no mention of the word privacy [link to http://sybooks.sybase.com:80/onlinebooks/group-cg/cgg0230e/@Generic__CollectionView?DwebQuery=privacy no longer works]. It does return hits for HIPAA [link to http://sybooks.sybase.com:80/onlinebooks/group-cg/cgg0230e/@Generic__CollectionView?DwebQuery=hipaa no longer works], but the links don’t seem to take me anywhere useful.

    Another careless and carefree example using social security numbers is here [link to http://sybooks.sybase.com/onlinebooks/group-as/asg1251e/sprocs/@Generic__BookTextView/8068 no longer works].

  • Informix

    There are very few hits, and all are upwards of 4 years old. It seems that IBM is effectively purging the meme. (Same for DB2. I’ll do another post shortly with more on good advice.)

    (Interestingly, Google, given a query such as informix ssn site:www-306.ibm.com will highlight the words social security, even if SSN does not appear in the text.)

  • HP’s ALLBASE/SQL Database Administration Guide
    Chapter 2. Logical Design suggests:

    Another consideration is that the key must be unique. It can
    either be a unique single column value or a unique combination. In
    addition to being unique, a hash key should be non-volatile, that
    is, not subject to frequent update. Since you cannot use the
    UPDATE statement with a hash key column, you must do a DELETE
    followed by an INSERT when a key modification is necessary.

    An integer key such as a social security number is ideal.
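
To see why the unique index in the Sybase passage is less of an integrity check than it sounds, here is a small sketch using Python's built-in sqlite3 module. The table, index, and function names are hypothetical, and the numbers are placeholders in the never-issued 000 area. A clerk mistypes one person's number and happens to enter someone else's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        name TEXT NOT NULL,
        ssn  TEXT NOT NULL
    );
    CREATE UNIQUE INDEX patient_ssn ON patient (ssn);
""")

def register(name, ssn):
    """Return True if the row was stored, False if the unique index refused it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO patient (name, ssn) VALUES (?, ?)", (name, ssn)
            )
        return True
    except sqlite3.IntegrityError:
        return False

# The first entry is mistyped: it should have been 000-00-0001.
first_ok = register("First Patient", "000-00-0002")
# The second entry is correct, yet it is the one that gets rejected.
second_ok = register("Second Patient", "000-00-0002")
```

The index does notice the duplicate, but only when the second person shows up, and it flags the correct record while the wrong one sits in the table. Resolving it means asking two people for their SSNs again, which is the kind of handling the vendor text never discusses.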

However, vendors aren’t the only ones to blame:

  • O’Reilly’s OnRamp
    The “Key” to Good SQL by John Paul Ashenfelter

    The only way to make sure that each record in a given database
    table has a unique value is to designate a database field to
    contain a value that is unique across all of the records in that
    table. In some cases, you may choose an existing field in the
    database which you are guaranteed will be unique — a social
    security number would work for a U.S. citizen and an ISBN would
    work for a book.

  • SQL Essentials book by David Gallardo

    This is commonly the case, especially when we use an
    identification number of some kind to identify each record
    uniquely. In some cases, the items in the tables have a unique
    number already associated with them that make a good key—for
    individuals in the United States, a Social Security number is
    sometimes used in this way. (From the sample chapter online)


  • Illinois State University, University Policies, Procedures, and Guidelines

    8.2.3 USE OF SOCIAL SECURITY NUMBERS BY ILLINOIS STATE UNIVERSITY
    Additionally, the social security number is widely used as a
    “guaranteed ID” between agencies, such as other higher education
    institutions, test services, Illinois State University’s
    Retirement System, Central Management Services, and criminal
    records. By the spring of 1972, Illinois State University had
    identified the social security number as the internal individual
    unique identifier for all person-related databases.

    [Major motivation, but a stupid one and it shows how long the
    problem’s history is.]