Sunday, February 19, 2012

Help me design my DB

Hi all.

I'm going to create a big DB that will hold important info
(the usual stuff - Clients, Products, Orders...)

I wonder where should I use the IDENTITY field,
for example - on Orders I will have Order_ID...
(and where does SQL server the numbers of a deleted records)

My fear is that IDENTITY fields will go wrong somehow so I can
loose connections within the tables
(maybe when restoring my DB to some other locations... with DTS... or other issues)

1. When should I use IDENTITY field ?

2. If I do NOT - how can I lock a record when I add a new oneP.S.

How can I alter a table turning ON or OFF an IDENTITY field ?

for example:
if I had my DB with Client_ID as an I IDENTITY field and for some reason it has
changed to just INT (with no IDENTITY) - how can I tell it to be IDENTITY field again ?|||Identity fields are very useful for creating your primary key into the table. Order_ID is a great example. When a new record is created, it automatically gets the next number. Since SQL server handles this for you, you don't have to keep a table of IDs, and all the locking overhead that goes with assigning your own incrementing value, and there's no way to "accidently" get two records with the same ID. The downside is that though it's fairly easy to turn it off while you add a couple records with specific IDs, you'll have to lock the table while you do so to keep other people's inserts from being added while it's turned off. If you see the need to insert records with specific IDs, or the need to modify IDs happing very often, an identity field may not be the right choice for you.

Here's what I have had a lot of success with...
All data tables have an identity ID field that is the PK into that table
All relationships use that ID field as the foreign key.
All inserts are done through stored procs that return the identity value
All updates and deletes are done through stored procs with the ID passed in to be used in the WHERE part of the update/delete command

As for your question about turning identity on/off, use the command...
SET IDENTITY_INSERT [ database. [ owner. ] ] { table } { ON | OFF }
Though it's the opposite what you might think. Setting IDENTITY_INSERT ON will allow you to insert records and specify thier ID, or update records and modify thier IDs. Setting IDENTITY_INSERT OFF (the default) means that the identity fields will get their values automatically on insert and cannot be modified.

Also note, you can only turn on IDENTITY_INSERT for one table at a time.|||Great help - thanks a lot!

U gave me some real good points
and "All relationships use that ID field as the foreign key." that will solve my problem,

I'll just create a new field in some table and bind data using this field and not by using an IDENTITY field...|||be aware of the issues of using identity fields, for instance:

- limited portability.. identity/autoid fields are not standard sql.

- replication issues.

- data transfer issues (this is a big one if you need to move data).

- uniqueness (across the database and then object model).

- size. if you're using int, you could hit the limit pretty quickly if your tables get big.|||Hi m7,

in-fact - the problem started when I had problems import / export data using DTS,
I updated SP3a and all the fixes needed as Microsoft said but nothing helped,
even reInstalled SQL Server...

it was very strange, when I try to export / import using the first option (copy data) it worked
just fine (including creating the table and data), but when I try to import / export as OBJECT
all strange problems arises... and I need this for the IDENTITY fields...

my SQL Server account is on a remote computer...

Anyway - I searched for a tool to allow me import / export data but I could not find a working one...
Microsoft do haveSqlWebAdmin which is nice, but does not seem to work with UNICODE characters...
(I could not export/import tables that has some NTEXT data in it - files not saved as UTF-8 but ANSI)

so my conclusion from all this is NOT to use IDENTITY fields as my connection between tables
but create a custom unique field in all relevant tables so all tables can be binded with.|||a guid would allow your field to be unique across all tables.

it also takes a lot of headaches away when doing data transfer.

i used to think using guids would be slow.. but tested them and they're not *that* much slower than ints. you do HAVE to remember NOT to create a clustered index on the guid column tho for obvious reasons!!!|||And, of course, your last point is even more impacted by GUIDs than by INTs:

>>- size. if you're using int, you could hit the limit pretty quickly if your tables get big.|||> And, of course, your last point is even more impacted by GUIDs than by INTs

nope. i dont talk of size in bytes of your database.

i talk in the fact that what happens if you choose an int, and then end up with a billion rows?

answer: there aren't enough ints.

the problem goes away with guids.|||:: talk in the fact that what happens if you choose an int, and then end up with a billion rows?
::
::answer: there aren't enough ints.

There are. SQL Int is 4 byte signed :-)

But what do you do once you hit 4 billion and some change?|||id meant to write "...end up with a FEW billion rows"

you knew what i meant tho :)|||Hi again,

U said U did try this,
did U try to put an index on this field ?

does it not slow down things because of it's length ?
(did U use just char ?)|||i set up the fields as guids (uniqueidentifier) and let my object framework create the new guid each time i insert a record.

i found the performance to be within 10% (or less) of an int pk for everything (insert, update, retrieve) even with joins.

indexes is where guids have problems. if you dont have one, your retrieve could perform badly. if you do have one, your inserts will perform badly as it will need to re-index (guids arent incremental.. you could get any value).|||>>within 10%

Seems significant, no? Especially on high-load systems?

I've read prior posts on GUIDs as pks, but I would think this might be enough to scare me away...|||within 10% (so often less) is totally insignificant when you consider the benefits.

heck, you could use the guid to cache the object globally and save the cost of using the guid easily.

and that 10% is the performance difference for just the queries. remember, querying on a primary key (or joining on it) is only a small part of a high-load system... i.e. it wont give your performance a 10% reduction overall.

that 10% can easily be bought back (cheaply) with some extra hardware (memory or whatever).

No comments:

Post a Comment