Agile Data

A UML Profile for Data Modeling

www.agiledata.org: Bringing data professionals and application developers together.

Scott W. Ambler

 

This essay is taken from Chapter 3 of Agile Database Techniques

IMPORTANT: This profile is open to the public.  If you would like to contribute to it please post suggestions on The Agile Data Mailing List.  We can define this profile as a community, we do not have to wait for the OMG to do this important work.  Having said that, my hope is that at some point this work will in fact be adopted by the OMG for official inclusion in the UML.

 

The short story is that at the time of this writing I don’t have the benefit of simply adopting an industry standard, something that Agile Modeling (AM)’s Apply Modeling Standards practice advises, so I’m forced to present my own solution here.  Unfortunately data modeling is not yet covered by the Unified Modeling Language (UML), even though persistence-related issues are clearly an important aspect of object-oriented software project.  For several years I have argued that the UML needs a data model (Ambler 1997, Ambler 1997b, Ambler 2001a, Ambler 2002a) and have vacillated between various ways that it should be done.  Other methodologists have argued the same (Naiburg and Maksimchuk 2001, Rational Corporation 2000, Muller 1999) because they too recognize the clear need for a data modeling profile.  Unfortunately we have all come up with slightly different modeling notations, a problem that the UML is supposed to address if my memory serves me correctly, although the Object Management Group (OMG) has yet to address this issue.  My hope is that it will.

This page summarizes the data modeling profile for UML Class Diagrams, that I apply in Agile Database Techniques.  First some important definitions:

This profile follows the philosophy of separating core notation, the 20% that you are likely to use in practice, from supplementary notation that isn’t as common although still needed in some situations.  The notation presented here isn’t perfect but I truly believe that it’s the best source available to you today.  Nor is this profile complete – for the most part it focuses on the physical modeling of a relational database, although it does cover other aspects of data modeling as needed.  This profile also strays into style issues, something that is not appropriate for a proper UML profile, issues that in my opinion are critical to successful modeling and this in my opinion is the best place to present them.

 

Table of Contents

 

1. How Do I Indicate The Type Of Model?

The type of model should be indicated either using the appropriate stereotype listed in Table 1 or simply as free form text in a UML note.  In the case of a physical data model the type of storage mechanism should be indicated with one of the stereotypes listed in Table 2 . 

 

Table 1. Stereotypes to Indicate Model Types (Core notation).

Stereotype

Model Type

<<Class Model>>

Object-oriented or object-relational model

<<Conceptual Data Model>>

Conceptual data model

<<Domain Model>> Domain model

<<Logical Data Model>>

Logical data model (LDM)

<<Physical Data Model>>

Physical data model (PDM)

 

Table 2. Stereotypes for Various Persistent Storage Mechanisms (Supplementary Notation).

Stereotype

Storage Mechanism Type

<<File>>

File

<<Hierarchical Database>>

Hierarchical database

<<Object-Oriented Database>>

Object-oriented database (OODB)

<<Object-Relational Database>>

Object-relational database (ORDB)

<<Network Database>>

Network database

<<Relational Database>>

Relational database (RDB)

<<XML Database>>

XML database

 

2. How Do I Model Tables, Entities, And Views?

Tables, entities, and views are all modeled using class boxes, as you see in Figure 1 and Figure 2, and the appropriate stereotypes are listed in Table 3.  Class boxes that appear on conceptual and logical data models are by definition entities so the stereotype is optional.  Similarly, on a physical data model for a relational database it is assumed that any class box without a stereotype is a table.  In Figure 2 you see that views have dependencies on the table structures.  

Indices, shown in Figure 2, are also modeled using class boxes.  They are optionally dependent on either the table for which they are an index or on the actual columns that make up the index (this is more accurate although can be more complex to depict when the index implements a composite key).  In the model you see that IEmployee1 is dependent on the Employee_POID column whereas IEmployee2 is dependent on just the table, requiring you to list the columns for the index when you follow this style.  As you can see the notation used for IEmployee2 is wordier but less clumsy – if you’re going to model indices this should your preference with respect to style issues.

 

Figure 1. A logical data model.

Figure 2. A physical data model for a relational database.

 

Table 3. Stereotypes for Classes.

Stereotype

Diagram Type

Core Notation

Application

Style Issues

<<Associative Table>>

Physical

Yes

Apply this to associative tables in a PDM for a relational database.

 

<<Entity>>

Logical, Conceptual

No

Optional notation that is implied by the model type.

The stereotypes for LDMs and conceptual DMs on a diagram implies that all class boxes on the diagram are entities unless otherwise marked.

<<Index>>

Physical

No

Apply this when you are modeling an index that implements a table key within a relational database.  Doing so indicates a dependency from the index to the table or to the key column(s) that the index implements.

Indices are implied by keys, so you might not want to invest the time to model the index in the first place. 
<<Lookup Table>> Physical No Apply to tables that are used to store simple "lookup" lists.   Just because you are using a table for lookup values does not imply that everyone uses it that way.  Therefore you may not wish to mark the table with this stereotype as it may confuse people.

<<Stored Procedures>>

Physical

Yes

Apply this to a class that contains only the operation signatures for the stored procedures of the database.

 

<<Table>>

Physical

No

Optional notation that is implied by the model type.

The stereotype for PDMs on a diagram implies that all class boxes on the diagram are tables unless otherwise marked.

<<View>>

Physical

Yes

Apply this when you are modeling a view to a table.  Indicate a dependency to each table involved in the definition of the view.

 

 

3. How Do I Model Relationships?

Relationships are modeled using the notation for associations as you can see in Figure 1 and Figure 2.  Standard multiplicity (e.g. 0..1, 1..*, and 2..5) notation may be applied, as can roles.  Table 4 lists the potential stereotypes that you may apply to relationships, some of which have a common visual representation as well as a textual one.  In general I prefer to apply the visual stereotype over the textual one.

The notation for qualifiers shouldn’t be used.  Although it would be a valid option to model foreign keys in practice this often proves confusing when a single table is involved in many relationships. 

 

Table 4. Stereotypes for Associations.

Stereotype

Visual Stereotype

Diagram Type

Core Notation

Application

Style Issues

<<Subtype>>

Inheritance arrow

All

Yes

Indicate subtype/supertype or inheritance relationships between two entities.

 

<<Aggregation>>

Hollow diamond

All

No

Indicate an aggregation relationship between two entities.

Aggregation is not supported in UML 2.0.  I suspect that it will be reintroduced in a future version.

<<Composition>>

Filled diamond

All

No

Indicate a composition relationship between two entities.

 

<<Dependency>>

Dashed line with open arrowhead

Physical

Yes

Indicate a dependency of a view or index on the schema of a table.

I would model the dependency from the view/index to the table(s) it is dependent on.  I would not model the dependency at the column level, even though that is truly where it is, because your diagrams would become cluttered very quickly.

<<Identifying>>

None

Physical

No

Indicate an identifying relationship between two dependent tables (the child table cannot exist without the parent table).

Indicating whether a relationship is identifying or not really isn't all that useful in practice.

<<Uni-directional>>

Open arrowhead

All

No

Indicate that the relationship between two entities should only be traversed in a single direction.

 

<<Non-Identifying>>

None

Physical

No

Indicate a non-identifying relationship between two independent tables.

 

 

4. How Do I Model Data Attributes and Columns?

Data attributes on conceptual and logical data models, as well as columns on physical data models, are modeled using the standard attribute notation.  It is optional to model the type of an attribute on a conceptual or logical data model although in practice this is often done.  Stylistically, if the model is being used to model data requirements then the type should be indicated only when it is an actual requirement.  For example, if a customer number must be alphanumeric then indicate it as such, otherwise if it is optional how this attribute is implemented then do not indicate the type. 

Constraints, such as a column being not null, should be modeled using normal UML constraints (see below).

The notation for visibility shouldn’t be used – the assumption is that the data is publicly accessible.  Although visibility symbols could be used to indicate the need to indicate access control this is better done using constraints.

 

 

5. How Do I Model Keys?

In my opinion, the modeling of keys is the the most complicated issue addressed by this profile.  This is for several reasons:

As you can see in Figure 3 the notation for indicating keys can get quite complex.  Minimally, you should mark the attribute or column with one of the key-oriented stereotypes in Table 5 .  Although I would normally prefer stereotypes such as <<Primary Key>> over <<PK>>, I chose the abbreviated version because it reflects existing norms within the data community for indicating keys.  Furthermore, because some columns can be involved with several keys the longer form of the stereotype would become cumbersome.

It is optional to model the detailed information pertaining to keys using UML tagged values (described in in Table 6 ).  For example, in Figure 3 you see that:

In Figure 2 I indicated that Employee_POID is a surrogate key to provide an example of how to do this (had it been a natural key I would have applied the stereotype <<Natural>> instead).  I generally prefer to indicate whether a key auto generated, natural, or surrogate in the documentation instead of on the diagrams – this is an option for you although in my opinion this sort of information adds to much clutter.

 

Figure 3. Modeling keys, constraints, and behaviors on a physical data model.

Table 5. Stereotypes for columns.

Stereotype

Diagram Type

Core Notation

Application

Style Issues

<<AK>>

Physical

Yes

Indicates that a column is part of an alternate key, also known as a secondary key, for a table.

 

<<Auto Generated>>

Physical

No

Indicates that the column value is automatically generated by the database.

This is interesting information, but I don't think I'd clutter the diagram with it.

<<CK>>

Conceptual, Logical

Yes

Indicates that an attribute is part of a candidate key for an entity.

 
<<Column>> Physical No Indicates that an attribute is a column. Completely redundant information, I wouldn't even consider modeling this.

<<FK>>

Physical

Yes

Indicates that a column is part of a foreign key to another table.

 

<<Natural>>

All

No

Indicates that an attribute or column is part of a natural key.

Interesting information, but don't clutter your diagram with it.
<<Not Null>> Physical Yes Indicates at a column may not have null values.  
<<Nullable>> Physical Yes Indicates that a column can have null values.  

<<PK>>

Physical

Yes

Indicates that a column is part of a primary key for a table.

 

<<Surrogate>>

Physical

No

Indicates that a column is a surrogate key.

Interesting information, but don't clutter your diagram with it.
<<Unique Identifier>> Conceptual, Logical No Indicates that an attribute is part of a unique identifier for an entity.  Effectively an alternative to <<CK>>. Perfer <<CK>> over this stereotype.

 

Table 6. Tagged values for modeling keys (supplementary notation).

Value

Application

Examples

Style Issues

key

Indicate which candidate or alternate key an attribute/column belongs to.  When the column is part of several keys, for example it is part of two different foreign keys, then you need to indicate which one you are referring to.  In the second example the column is part of the third alternate key.

key = FK

key = AK-3

Only indicate this when the column is part of more than one key.

order

Indicate the order of appearance in which an attribute appears when it is part of a composite key.  In the example the column would be the fourth column in the key.

order = 4

 
source column Indicate the source column that a foreign key refers to. source column = SocialSecurityNumber Only use this when the names of the two columns are different.

table

Indicate the table that a foreign key refers to. 

table = Customer

This is optional as it can often be inferred from the diagram.

 

 

6. How Do I Model Constraints And Triggers?

Most constraints (domain, column, table, and database) can be modeled using the UML’s Object Constraint Language (OCL) where appropriate.  Examples of this are depicted in Figure 3, a domain constraint on the Order_Date is defined indicating that it must be later than January 1st 2000.  A column constraint is also defined, the Customer_POID column mustn’t be null.  

Table and database constraints (not shown) are be modeled the same way. For example Figure 3 depicts how a referential integrity (RI) constraint can be modeled between two tables using OCL notation.  You see that when an order is deleted the order items should also be deleted.  Although this implied by the fact that there is an aggregation relationship between the two tables the constraint makes this explicit.  However, too many RI constraints can quickly clutter your diagrams, therefore supporting documentation for your database design might be a better option for this information so as not to clutter your diagrams – remember AM’s Depict Models Simply practice.

In Figure 2 the Salary table includes an access control constraint, only people in the HR department are allowed to access this information. Other examples in this diagram include the read only constraint on the VEmployee view and the ordered by constraint on Employee_Number in this view.

Triggers are modeled using the notation for methods(operations).  In Figure 3 you see that the stereotype of <<Trigger>> was applied and tagged value of “after insert” and “before delete” were modeled to shown when the triggers would be fired.  Stereotypes for methods are listed in Table 7.

 

Table 7. Stereotypes for methods.

Stereotype

Diagram Type

Core Notation

Application

Style Issues

<<Stored Procedure>>

Physical, Relational Databases

No

Indicates that a method is a stored procedure.

Stored procedures should be modeled as part of a single class.  This class is marked with the stereotype <<Stored Procedures>>, therefore you are merely cluttering your diagram with extraneous information by also applying a stereotype to the method.

<<Trigger>>

Physical, Relational Databases

Yes

Indicates that the method is a trigger.

You should also model the event that triggers the method.  e.g. {event = before insert | after update, target = ColumnName}

 

7. How Do I Model Stored Procedures?

Stored procedures should be modeled using a single class with the stereotype <<Stored Procedures>> as shown in Figure 3 and described in Table 3. This class lists the operation signatures of the stored procedures using the standard UML notation for operation signatures.

Although it is standard UML practice for stereotypes to be singular, in this case the plural form makes the most sense.  The other alternative is to apply the stereotype <<Stored Procedure>> to each individual operation signature, something that would unnecessarily clutter the diagram.

Stylistically, the name of this class should either be the database or the name of the package within the database.

 

8. How do I Model Sections Within a Database?

Many database management systems provide the ability to segregate your database into sections.  In Oracle these sections are called tablespaces and other vendors call them partitions or data areas.  Regardless of the term, you should use a standard UML package with a stereotype which reflects the terminology used by your database vendor (e.g. <<Tablespace>>, <<Partition>>, and so on). 

 

9. How Do I Model Everything Else?

There is far more to data modeling than what is covered by this profile.  The approach that I’ve taken is to identify the type of information that you are likely to include on your diagrams, but this is only a subset of the information that you are likely to gather as you’re modeling.  For example, logical data attribute information and descriptions of relationships can be important aspects of logical data models.  Similarly replication info (e.g. which tables get replicated, how often, …), sizing information (average number of rows, growth rate, …), and archiving information can be critical aspects of your physical data model.  Complex business rules are applicable to all types of models.  Although this information is important, in my opinion it does not belong on your diagrams but instead in your documentation.  Follow AM’s practice of Depict Models Simply by keeping this sort of information out of your diagrams.

If you feel there is something missing from this profile, and there definitely is, then let's talk about it. 

 

10. Requirements For This Profile

I firmly believe that the requirements for something should be identified before it is built.  This is true of software-based systems and it should be true for UML profiles (even unofficial ones).  This section presents a bulleted list of requirements from which I worked when putting this profile together.  I have chosen to present the requirements last because I suspect most people are just interested in the profile itself and not how it came about.

This list isn’t complete nor is meant to be – it is just barely good enough to get the job done.  In other words I took an agile approach to requirements modeling. If anyone intends to extend this profile I highly suggest that they start at the requirements just as I have.  The high-level requirements are:

Need to support different types of models

Need to model entities and tables

Need to model the attributes and columns

Need to model relationships

Need to model keys

Need to model constraints and behaviors

   

11. Where Do We Go From Here: Evolving this Profile?

The way I see it, there are three viable futures for this work:

  1. Turn it into an official UML profile.  I am very eager to see this happen, although I do not want to lead this effort as I do not have the time to invest in it.  I’m happy to participate but not lead.  A good approach would be to flesh out the requirements, the Common Warehouse Metamodel (CWM) from the OMG as well as the existing notations within the data modeling community would be excellent sources of information, and then to finish the profile based on those requirements.

  2. Take a grass-roots approach.  Perhaps we don’t need to get the official blessing of the OMG for this profile.  Instead, if a wide variety of practitioners and tool vendors adopt it then we’ll have turned it into a defacto industry standard.  I’d be happy to see this happen as well and I invite other methodologists and any tool vendors to adopt this profile (I also invite them to provide feedback on the Agile Data mailing list).

  3. This profile gets ignored.  Perhaps someone else will come along and do a better job than I’ve done, or perhaps this isn’t needed at all.  I hope this isn’t the case, but time will tell.

 

12. Linking to this Page

If you find this information useful, or at least you think it is something that your colleagues may benefit from, please feel free to link to it.  This will help to get the word out within the community.  The more people that know about and use this notation the greater the chance that member of the OMG will take up and finish this work.

Suggested listing:

    Title: A UML Profile for Data Modeling (Scott W. Ambler)

    URL: www.agiledata.org/essays/umlDataModelingProfile.html 

 

13. Contributors to this Profile

On November 2, 2003 I added this section to identify who has provided input into this profile.

Best way to contribute to this profile is to post ideas on the Agile Data mailing list.  Alternatively you can contact me privately.

 

14. Interesting UML Data Modeling Products

XMI2SQL can generate relational tables in Structured Query Language (SQL) from a given model in Extensible Markup Language (XML) Metadata Interchange (XMI).


 

 

15. References and Suggested Online Readings

List of References

Agile Database Techniques This book describes the philosophies and skills required for developers and database administrators to work together effectively on project teams following evolutionary software processes such as Extreme Programming (XP), the Rational Unified Process (RUP), Feature Driven Development (FDD), Dynamic System Development Method (DSDM), or The Enterprise Unified Process (EUP).  In March 2004 it won a Jolt Productivity award.
The Object Primer 3rd Edition: Agile Model Driven Development (AMDD) with UML 2 This book presents a full-lifecycle, agile model driven development (AMDD) approach to software development.  It is one of the few books which covers both object-oriented and data-oriented development in a comprehensive and coherent manner.  Techniques the book covers include Agile Modeling (AM), Full Lifecycle Object-Oriented Testing (FLOOT), over 30 modeling techniques, agile database techniques, refactoring, and test driven development (TDD).If you want to gain the skills required to build mission-critical applications in an agile manner, this is the book for you.

 

16. Let Us Help

I work with Ronin International, Inc. helping numerous organizations to learn about and hopefully adopt agile techniques and philosophies.  Ronin offers both consulting and training offerings.  In addition we host Enterprise Unified Process (EUP) which may find of value.  You might find several of my books to be of interest, including The Object Primer 3/e, Agile Modeling, The Elements of UML Style, and Agile Database Techniques.  For more information please contact Michael Vizdos at 866-AT-RONIN (U.S. number) or via e-mail (michael.vizdos@ronin-intl.com).

 
Page last updated on January 25 2005
This site owned by Ambysoft Inc.


Copyright 2002-2005 Scott W. Ambler