Data Warehouse Object Naming Guidelines
It seems like a trial task, but standars naming convention are important. Good names for tables, columns and other database objects are particularly important for ad hoc users of the Data Warehouse (DW) & Business Intelligence (BI) system who need to find the objects they’re looking for. Object names should be oriented to the business users, not the data engineers and technical staff (e.g. data developers, data analyst etc). Consistency in naming is hugely valuable to your users.
A business name is an English phrase with a specific construction and length that describes a single data object (e.g., table, column name, etc.). Each business name comprises one or more prime words, optional modifying words, and one class word. It cannot exceed 100 characters in length. Systems developers assist end users in the construction of meaningful business names.
Business names should meet the following guidelines
- Singular for nouns and present tense for verbs
- As meaningful as possible
- Easily distinguishable
Abbreviations are not used in business names with few exceptions. One reason to do so is to limit the length of a business name, and then only if the abbreviation is for a universally known acronym or abbreviation such as NASA or IMF or IRS.
The class word is the highest level of qualification and the most important word in a business name. The class word is always the last word of a business name. It must be a noun identifying the general purpose of the data object. Data object types TABLE, VIEW, FACT_TABLE, etc., are also class words. Example class words for columns are NUMBER, NAME, TEXT, and CODE. A class word list is maintained in the naming repository. A class “word” may be a phrase like SQUARE_FEET or FACT_TABLE.
The class word list is carefully constructed and seldom extended, but entries can be added if there is a compelling reason to do so based on changes in the data environment.
A prime word can be a single word, or a phrase such as CAPITAL_ASSET. It is the most important modifier of the class word. It identifies the application area, major data category, table, or view, depending on the data object being named. Approximately 300 prime words are available to categorize institutional data. Some example prime words are ACCOUNT, COURSE, FINANCE, ORGANIZATION, SECTION, and STUDENT.
Modifying words are used to add important business information to a business name. Thus, addition of the modifying word PHONE to the business name CUSTOMER_NUMBER forms CUSTOMER_PHONE_NUMBER. Similarly, addition of the modifying word LAST to the business name STUDENT_NAME forms STUDENT_LAST_NAME. Modifying words can be any word or phrase needed to adequately describe a data object.
Occasionally a data object clearly belongs to more than one data category. In such cases, the business name should include multiple prime words. For example, the data element GRADE refers to the score received by a STUDENT (prime word #1) for work completed in COURSE (prime word #2). The modifier GRADE describes neither COURSE nor STUDENT, but the relationship between STUDENT and COURSE. Thus, an appropriate business name would be STUDENT_COURSE_GRADE, where STUDENT and COURSE are prime words modifying the GRADE class word.
Table Object Naming Guidelines
There are several types of tables: for example, permanent tables, temporary table, session table, transient table. On domain side we have fact tables and dimension tables. All tables have the same formatting requirements for the business name and the same length requirement for the generated technical name.
Types of tables
- Table _T
- External Table _ET
- Dimension Table _D
- Fact Table _F
Length of table name
The maximum Oracle table name length is 30. The generated technical name for a table must be 27 characters or less to reserve space for index names built from the table technical name. This requirement exists in the EDSS Data Warehouse since the original complete technical table name must be present in the names of indexes for the table.
Table names should follow the format: APPLICATION_NAME, MODIFIER words or phrases, CLASS_WORD. For example:
STUDENT_LOAN_BORROWER_TABLE STUDENT_ADVISING_CONTACT_TABLE PURCHASING_COUNTY_DIMENSION_TABLE HUMAN_RESOURCES_PERSONAL_DATA_DIMENSION_TABLE
Views Object Naming Guidelines
The Naming application recognizes two types of views. The CLASS_WORD identifies the view type and is shown with the technical name abbreviation:
- VIEW _V
- MATERIALIZED_VIEW _MV
Length of view names
The generated technical view name is limited to 27 characters or less as for generated table names.
View Formatting requirements
View names should follow the format: APPLICATION_NAME, MODIFIER words or phrases, CLASS_WORD. For example:
Here are suggestions for naming objects in your data warehouse (DW)/ business intelligence (BI) system.
Follow naming conventions for data warehouse objects
If you don’t have them, create (and document!) naming conventions that follow the rules as per your organizational need. If your organization already has naming conventions, you may be faced with a problem: most existing naming conventions were developed for technical people. But names in the DW/BI environment should be oriented to the business users. They become row and column names in ad hoc analyses and predefined reports. We will return to this issue later.
Each object has one name
Let’s not perpetuate the confusion around data definitions that already exists in our organizations. It is not OK to say that the sales team can call a column Geography and the marketing group calls the same entity Region. If it’s the same column, with the same content, it has to have the same name. If you can’t get agreement organization-wide on object names, enlist the help of your business sponsor.
Object names are descriptive
Users should not need 20 years tenure at your organization to decipher what a name means. This rule forbids a lot of silliness, like RBVLSPCD (we have more than 8 characters to work with!). It also forbids column names like NAME, which is non-descriptive outside the context of the table you are examining.
Abbreviations and acronyms are discouraged
Abbreviations and acronyms are endemic in the corporate world, and the non-corporate world is even worse. A lot of information can be encoded in an acronym, but it places a huge burden on newcomers. The most effective approach is to maintain a list of approved abbreviations, and try not to add to them without a good reason. You may even want to document that reason in the list. Examples include:
For most busienss, a samll set of approved abbreviations are reasonable.
Datatabase Object names are pretty
Remember that object names become headers in reports and analyses. Although beauty is in the eye of the beholder, I find all caps to be particularly annoying. The object names should contain a visual clue for where the words end.
- Spaces: [Column Name]
- Camel case: ColumnName
- Underscore: Column_Name or COLUMN_NAME
I recommend using spaces (unless underlying data warehouse system prohibitit). They look the best when displayed in reports. However many developer community prefer to have all lower case with underscore with abbreviations.
Datatabase Object names are unique
If two objects are different, they should have different names. This rule forbids column names such as [City]. A better name is [Customer Mailing Address City]. This rule is especially important for ad hoc use of the DW/BI system. Although the context of [City] may be obvious during the analytic process, once that analysis is saved and shared, that context is lost. Are we looking at the customer’s city or the store’s city, the mailing address or the shipping address? Although we can’t prevent users from changing object names once they export to Excel, we don’t want to force them to do so in order to be clear.
Object names are not too long
This rule conflicts with rules 3, 4, and 6. If we have unique, unabbreviated, descriptive object names, the odds are that some column names will be very long. These are reasonably descriptive column names for someone in the insurance business, but what will happen when the user or report designer drags that column into the body of a report or analysis? The name will wrap unattractively, making the header row very fat. Or it will shrink down to a font so tiny that no one can read it. And then the user will rename the column, violating our key rule that each object has one name.
I try to limit column names to 30 characters, though sometimes I go to 35. In order to achieve this goal, I have to register more abbreviations or acronyms than I would like.