diff --git b/9781565926424 a/9781565926424 new file mode 120000 index 0000000..14506b9 --- /dev/null +++ a/9781565926424 @@ -0,0 +1 @@ +9781565926424 \ No newline at end of file diff --git b/README.md a/README.md new file mode 100644 index 0000000..d75946f --- /dev/null +++ a/README.md @@ -0,0 +1,15 @@ +## Example files for the title: + +# Unix Backup and Recovery, by W. Preston + +[![Unix Backup and Recovery, by W. Preston](http://akamaicovers.oreilly.com/images/9781565926424/cat.gif)](https://www.safaribooksonline.com/library/view/title/1565926420//) + +The following applies to example files from material published by O’Reilly Media, Inc. Content from other publishers may include different rules of usage. Please refer to any additional usage rights explained in the actual example files or refer to the publisher’s website. + +O'Reilly books are here to help you get your job done. In general, you may use the code in O'Reilly books in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from our books does not require permission. Answering a question by citing our books and quoting example code does not require permission. On the other hand, selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Incorporating a significant amount of example code from our books into your product's documentation does require permission. + +We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. + +If you think your use of code examples falls outside fair use or the permission given here, feel free to contact us at permissions@oreilly.com. + +Please note that the examples are not production code and have not been carefully tested. They are provided "as-is" and come with no warranty of any kind. diff --git b/badtar.tar a/badtar.tar new file mode 100755 index 0000000..5c72651 Binary files /dev/null and a/badtar.tar differ diff --git b/hostdump.tar a/hostdump.tar new file mode 100755 index 0000000..957b0c4 Binary files /dev/null and a/hostdump.tar differ diff --git b/index.html a/index.html new file mode 100755 index 0000000..4083c4e --- /dev/null +++ a/index.html @@ -0,0 +1,43 @@ + + + + + + Unix Backup & Recovery Tools CD + + +  + + + + + + +
Unix Backup & +Recovery Tools & Procedures CD +

This CD Contains: +

Database Recovery Procedures for: +

  Informix +

  Oracle +

  Sybase +

An RFI for choosing backup software +

Free Backup Utilities: +

A one-step filesystem backup script: hostdump.sh +

A hot backup script for Oracle: oraback.sh +

A hot backup script for Informix: infback.sh +

A hot backup script for Sybase: syback.tar +

A fast implementation of tar: star +

A tool to read bad tar images: badtar

A tool to read pesky tapes: read_tape.sh +
  +
  +
 

+
  +
+

It is always a good idea to check http://www.backupcentral.com +for updated versions of the free software contained on this CD, as well +as other resources.

+
+ + + diff --git b/infback.tar a/infback.tar new file mode 100755 index 0000000..cdc77ef Binary files /dev/null and a/infback.tar differ diff --git b/informix.gif a/informix.gif new file mode 100755 index 0000000..5da9a03 Binary files /dev/null and a/informix.gif differ diff --git b/informix.html a/informix.html new file mode 100755 index 0000000..fa58c1c --- /dev/null +++ a/informix.html @@ -0,0 +1,949 @@ + + + + + + +
Recovering Informix
+ +
 Recovering +Informix is much easier than recovering other databases. One reason is +that the commands to actually perform the recovery are simple---there is +only one argument for ontape and only two arguments for onbar. +This section covers just about any recovery scenario that you might find +yourself in, yet it's only 20 main steps. +

Another reason that Informix recoveries +are simple is that the sysmaster database, physical log, and logical log +are considered critical. In order to recover one of them, you have to recover +all of them. In Oracle, for instance, there are four or five different +recovery tactics that you can take, depending on which one of these objects +is damaged. With Informix, there is only one recovery tactic. +

The Informix recovery procedure does +not assume that you know why your database went down. It does assume that +you have been making backups via ontape or onbar. It also +assumes that if you are using ontape, you know which tapes or files +contain the latest level 0 and/or level 1 backups, as well as the location +of all logical log backups since your last physical backup. If you are +using onbar, this procedure assumes you know how to use your storage +manager well enough that you know how to respond to its prompts for any +media that may not be in an autochanger. In short, all media management +is up to you. +

The examples below use ontape. +The example hostname is curtis, and the instance name is crash. +The archives are being sent to disk files named /informix/logical/crash/crash.level.level.Z, +and continuous backups are being sent to a disk file named /informix/logical/crash.log. +

You should start with Step 1, "Does +oninit work?" +

+

+ +

Step 1: Does +oninit Work? +

The obvious first step in determining whether an instance is in need of recovery is to try to start the instance. Do this by issuing the oninit command with no options. If it works, it simply returns you to the prompt. Otherwise, you will see one of two errors.

curtis$ oninit

WARNING: Cannot access configuration +file $INFORMIXDIR/etc/$ONCONFIG.

This error is pretty obvious: if you see it, you are missing your configuration file. If you see this error, proceed to Step 2. The second possible error looks like this:
oninit: Cannot open chunk '/informix/rootdbs.dbf'. errno = 2

oninit: Cannot open chunk '/informix/rootdbs_mirror.dbf'. +errno = 2 +

oninit: Fatal error in shared memory +initialization

+If any of your critical dbspaces has damaged +or missing chunks, you may see an error like the one above. If all critical +dbspaces are mirrored, and only one-half of the mirror is damaged, you +do not see this error. It also does not appear if a non-critical dbspace +is damaged. This error appears only if all chunks in a critical +dbspace are damaged. + +If you see an error like the one +above, proceed to Step 4. If you don't see either error, proceed to Step +7. + +Step 2: Is the +onconfig File Missing? +

The oninit utility uses the +onconfig +file to determine the basic information needed to start the instance. This +includes, but is not limited to, the following instance-specific information: +

The name and location of the root dbspace (ROOTNAME, ROOTPATH)
The name and number of the server (DBSERVERNAME, SERVERNUM)
The base address of shared memory (SHMBASE)
The devices used for physical and logical backups (TAPEDEV, LTAPEDEV)

If the onconfig file is missing or corrupted, proceed to Step 3. If this is not the reason the instance would not start, proceed to Step 4.

Step 3: Restore or Recreate the onconfig File

If you are running onbar, it +can automatically recreate the onconfig file. However, if this file +is the only one damaged, there's no need to do a full restore just to restore +this file. Restoring or recreating it is easy enough. +

If you are running infback.sh, +it makes a backup copy of the onconfig file before it changes it. +DBAs and other scripts often do the same. Look first to see if you have +such a backup copy. If not, try to restore the file from the nightly filesystem +backups. If you cannot find a backup copy, and cannot restore one from +backup, you will need to recreate it. If any of the following objects is +available, it will be easy: +

An intact root dbspace chunk
An ontape archive on disk
An ontape archive on tape

To recreate the onconfig file from the root dbspace chunk or an ontape archive on disk, run the following command, where filename is the name of the chunk or archive on disk:
$ strings filename | grep '^[A-Z][A-Z_]*' \

>$INFORMIXDIR/etc/$ONCONFIG

+If you have only an archive available +on tape, the command is similar: +$ dd if=$TAPEDEV bs=$TAPEBLK +\ +

| strings |grep '^[A-Z][A-Z_]*' +\ +

> $INFORMIXDIR/etc/$ONCONFIG
This creates an editable text file that contains all the parameters and their current values. The grep command does not completely remove extraneous lines, so you should edit the file and remove any lines that do not look like the following:
PARAMETER value
If you do not have one of these objects available, you need to manually recreate the file. Copy the onconfig.std file in $INFORMIXDIR/etc to the name of the onconfig file. The values that you must change are ROOTNAME, ROOTPATH, DBSERVERNAME, SERVERNUM, and SHMBASE. These values will allow you to restore the instance.
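For example, here is a minimal sketch of recreating the file from the template, assuming the instance's onconfig file is named onconfig.crash (the name is hypothetical):
$ cp $INFORMIXDIR/etc/onconfig.std $INFORMIXDIR/etc/onconfig.crash
$ vi $INFORMIXDIR/etc/onconfig.crash   # set ROOTNAME, ROOTPATH, DBSERVERNAME, SERVERNUM, SHMBASE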

Step 4: Is There an Inaccessible or Corrupted Critical Chunk?

If an Informix instance will not start, the most common cause is a missing or corrupt critical chunk. (If a non-critical chunk is damaged, the instance starts and records the problem in the online log file.) The error that you receive may look something like the following:

oninit: Cannot open chunk '/informix/rootdbs.dbf'. +errno = 2 +According to errno.h, an "Error +2" means "No such file or directory." This means that the chunk, or the +symbolic link that Informix uses to refer to that chunk, is missing. Another +common error is 13, which means "Permission denied." This means that someone +other than Informix owns the device. Any error other than those usually +means that the physical file is all right, but the data within it is corrupted. + +If the file is missing or its permissions +are wrong, proceed to Step 5. If not, proceed to Step 6. + +Step 5: Repair +or Replace the Missing Chunk +

This step is necessary only if the +physical file is somehow damaged. If it was a filesystem file, it might +be deleted or its permissions changed. If it was a raw device, the disk +drive could be damaged or missing, or its permissions could be wrong. Another +problem could be that you are using a symbolic link to the real chunk, +and the symbolic link was accidentally deleted. +

If the missing file is a symbolic link, you simply need to restore or recreate the file in its original location. The only difficult part is that Informix doesn't tell you which file it was symbolically linked to. Restoring the symbolic link from your regular filesystem backups is probably the easiest answer. Another method would be to consult any documentation that you may have about how you put the instance together. (Restoring from backup is obviously much easier.)

If it is not a symbolic link, the damaged file may be a filesystem file or a raw device. If it is a filesystem file and the filesystem itself is intact, simply create a new file with the touch command, as sketched below. After doing so, make sure that the file is read/write for the informix user and informix group. If the filesystem is not intact, you need to relocate the file. Hopefully, you followed the common practice of using symbolic links to point to the actual chunks. If you did, you can recreate the chunk file anywhere on the system and just change the symbolic link to point to the new location. If you did not, you need to make a symbolic link in the original location to point to the new file.
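A minimal sketch of recreating an empty chunk file and setting its ownership and permissions (the path /data2/testdbs.dbf is just an example):
$ touch /data2/testdbs.dbf
$ chown informix:informix /data2/testdbs.dbf
$ chmod 660 /data2/testdbs.dbf   # read/write for owner and group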

For example, assume that the filesystem +/data1 +is destroyed, and it contained chunk /data1/rootdbs.dbf. However, +you set up the Informix instance to point directly to /data1/rootdbs.dbf, +instead of to a symbolic link to that chunk. You create a new file called +rootdbs.dbf +in /data2, but you have to tell oninit to use the new file. +You need to unmount /data1 (although it probably is already) +and create a symbolic link in the old location with the following command: +

$ ln -s /data2/rootdbs.dbf /data1/rootdbs.dbf
This is, of course, a very bad solution, since repairing and remounting /data1 will overwrite the symbolic link. If you have to do this, consult your IDS Administration Manual about permanently relocating the file. (Use a symbolic link this time.)

Before continuing, you may wish to verify that all chunks are all right. If you don't have a complete list of filenames, you can obtain them by running the strings command on a root dbspace chunk or an ontape archive:

$ zcat /informix/logical/crash/crash.level.1.Z +\ +

| strings | grep '^/'

+Make sure that you have checked both of +the following conditions: +

Permissions +

Ensure that someone didn't accidentally +change the ownership or permissions of any chunks. If you are using symbolic +links to point to the actual chunks, the only permissions that matter are +those for the final file to which the symbolic link is pointing. For example, +suppose that you have a symbolic link called /informix/chunk1 that +points to /dev/dsk/c0t0d0s5. If you are running Solaris, and if +you run an ls -l on /dev/dsk/c0t0d0s5, you will find this: +lrwxrwxrwx 1 root other 84 Nov +5 02:29 /dev/dsk/c0t0d0s5 -> ../.. +

/devices/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000 +

/esp@f,800000/sd@0,0:f

It is the permissions of sd@0,0:f that matter, not those of the symbolic link /dev/dsk/c0t0d0s5. To verify its permissions, run the following command:
$ ls -lL /dev/dsk/c0t0d0s5
Adding the -L option causes ls to display the ownership and permissions of the file that the link points to, not those of the link itself. Make sure that both the owner and group are informix, and that the file is read/write for both the owner and the group.
Symbolic link
This problem usually happens when the DBA is trying to "clean house." If you're using symbolic links to point to the actual chunks, and someone deletes a symbolic link, you obviously have a problem. If you are using symbolic links, make sure that you remake the symbolic link instead of creating a chunk in its place.

If the instance is currently down because a critical dbspace could not be opened, return to Step 1 and run oninit again. If the instance is already up and you were sent here by Step 8, proceed to Step 9. If you have repaired or replaced any missing chunks, and the instance still won't start, proceed to Step 6.

Step 6: Performing a Cold Restore

You should be performing this step only if directed to do so by Step 5.

Make sure you need this step
This is not a step to be taken lightly. Depending on the size of your instance, a cold restore may take a considerable amount of time. Please take a moment to verify that you really do need to perform a restore; if you do, follow the appropriate section below.

Can't I just restore the critical dbspaces?

Both ontape and onbar +allow you to specify a list of dbspaces to restore. This even works with +critical dbspaces. However, if you restore just the critical dbspaces, +the restore leaves all other chunks in an "inconsistent" state, as specified +by the "I" flag that they display after the restore is done. Informix support +does have a tool that will change this flag to consistent, but your mileage +will vary on this one. If you are restoring a critical dbspace, you should +really restore the whole thing.

+
Restoring with ontape

Make sure to remove the current logical log "tape" prior to beginning a physical restore. If you're backing up to disk, this means moving the file to a different location. If you're backing up to tape, this means physically removing the current logical log tape. When you perform a physical restore with ontape, it asks whether you want to back up the current log, and you should always say yes. However, this performs an ontape -a backup, not an ontape -c backup. Remember that the primary difference between these two options is that ontape -a overwrites the "tape" currently in the drive. If it contains any logs other than the current log, they will disappear forever.

After removing the current logical log tape or file from its current location, place the latest level 0 archive in the device specified by $TAPEDEV. If you are backing up to disk, this may require uncompressing and moving a file to the appropriate location. If you're backing up to tape, it involves placing the latest level 0 archive in the tape drive. After doing so, execute the following command:
$ ontape -r
Ontape prompts you for everything that it needs. You need to know if you have a level 1 or 2 backup, since it will ask you that. You'll also need to know the location of all logical log backups since the latest archive was taken.

Informix asks you if you want to back up the logs. Always say yes, but make sure that you are giving it a fresh tape or file to back up to, since it will overwrite it -- not append to it.

The following example was done with an instance that is archiving to disk using infback.sh. Infback.sh now uses named pipes, so the value for $TAPEDEV will be a named pipe (e.g., /informix/logical/crash.level.1.fifo) that reflects the last level that was performed. Since infback.sh will recreate the named pipe when it needs to, we will overwrite it with symbolic links. The example instance also used rclogs.sh to continuously back up the logical logs to disk. The value for $LTAPEDEV is /informix/logical/crash.log, and when it switches logs, it copies them to /informix/logical/$ONCONFIG.year.month.day.hour.minute.second.

An ontape restore with archives on disk always requires more than one window, and this section shows both windows below to fully demonstrate the example. To reduce confusion, it uses a regular paragraph like this one when switching windows, and it uses comment lines beginning with # to explain the reasoning behind certain commands or answers within a window. There is also a heading on each body of computer output specifying either Restore Window or Alternate Window. The restore window is the window where the ontape -r command is being run, and the alternate window is the window where we perform other commands. We will start with Figure 13-20, the restore window.

[Restore Window] +

#The first thing that we need +to do is +

#uncompress the archive files +

curtis$ uncompress /informix/logical/crash/crash.level.*.Z +

curtis$ ls /informix/logical/crash +

crash.level.0 crash.level.1 +

#Now we need to remove the named pipe and replace it with a +

#symbolic link +

#to the actual backup file +

curtis$ rm /informix/crash.level.0.fifo +

curtis$ ln -s /informix/logical/crash/crash.level.0 /informix/crash.level.0.fifo +

#Now we can begin the restore +

curtis$ ontape -r +

Please mount tape 1 on /informix/logical/crash/crash.level.0 and +press Return to continue ... +

Archive Tape Information +

Tape type: Archive Backup Tape +

Online version: INFORMIX-OnLine Version 7.23.UC4 +

Archive date: Thu Jan 21 00:57:14 1999 +

User id: informix +

Terminal id: ? +

Archive level: 0 +

Tape device: /informix/crash.level.0.fifo +

Tape blocksize (in k): 16 +

Tape size (in k): 1024000 +

Tape number in series: 1 +

Spaces to restore: +

1 [rootdbs ] +

2 [plogdbs ] +

3 [llogdbs ] +

4 [testdbs ] +

Archive Information +

INFORMIX-OnLine Copyright(C) 1986-1995 Informix Software, Inc. +

Initialization Time 03/04/98 20:08:25 +

System Page Size 2048 +

Version 4 +

Archive CheckPoint Time 01/21/99 00:57:17 +

Dbspaces +

number flags fchunk nchunks flags owner name +

1 2 1 1 M informix rootdbs +

2 2 2 1 M informix plogdbs +

3 2 3 1 M informix llogdbs +

4 1 4 1 N informix testdbs +
  +
  +

Chunks +

chk/dbs offset size free bpages flags pathname +

1 1 0 10000 9051 PO- /informix/rootdbs.dbf +

1 1 0 10000 0 MO- /informix/rootdbs_mirror.dbf +

2 2 0 5000 2447 PO- /informix/physlog.dbf +

2 2 0 5000 0 MO- /informix/physlog_mirror.dbf +

3 3 0 5000 3447 PO- /informix/logiclog.dbf +

3 3 0 5000 0 MO- /informix/logiclog_mirror.dbf +

4 4 0 500 191 PO- /informix/testdbs.dbf +

#Ontape displays all this information to you so that you know +

#that this is the right tape to restore the right instance. +

#It doesn't actually do +

#anything until you respond "y" to the next question. +

Continue restore? (y/n)y +

#Always say "YES" to this next question. +

Do you want to back up the logs? (y/n)y +

Please mount tape 1 on /informix/logical/crash.log and press Return +to continue ... +

Would you like to back up any of logs 65 - 67? (y/n) y +

Figure 13-20: Starting an ontape +restore +
+This next section is from another window. We need to move the old logical +log "tape" out of the way so that the salvaging of the current log does +not overwrite it. In the example in Figure 13-21, we will use the same +naming convention as the other files. +[Alternate Window] +

curtis$ cp crash.log crash.log.1999.01.21.17.04.00 +

curtis$ compress crash.log.1999.01.21.17.04.00

curtis$ ls -l crash.log.1999.01.21* +

total 2424 +

-rw-rw---- 1 informix informix 73961 Jan 21 01:12 crash.log.1999.01.21.01.13.02.Z +

-rw-rw---- 1 informix informix 1949 Jan 21 01:13 crash.log.1999.01.21.01.14.08.Z +

-rw-rw---- 1 informix informix 557056 Jan 21 17:04 crash.log.1999.01.21.17.04.00.Z

Figure 13-21: Preparing disk-based +logical log backups +
Once we've copied the "tape" to another location, it is safe to tell Informix to salvage the current logical log. Note in Figure 13-22 that when it asks for the oldest log that we would like to back up, we answer with the oldest number available. (It never hurts to have too many logical log backups. If we were to answer "66," what would happen if the restore needed log 65 and it had not been backed up, or its backup had been damaged? We would be out of luck, that's what.)
[Restore Window]

Logical logs 65 - 67 may be backed up. +

Enter the id of the oldest log that you would like to backup? 65 +

Please label this tape as number 1 in the log tape sequence. +

This tape contains the following logical logs: +

65 - 67

Log salvage is complete, continuing restore of archive. +

#we do have a level one archive, so when it asks if we have one, +

#we will answer "yes." +

Restore a level 1 archive (y/n) y +

Ready for level 1 tape +
  +
  +

Figure 13-22: Backing up the current +logical logs +
+You may recall that prior to beginning the restore, we created a symbolic +link from the level 0 archive on disk to the location that Informix expects +the archive to be. Now that we are "swapping tapes," we need to remove +that link and create another one that points to the level 1 backup. (The +commands in Figure 13-23 are obviously being done in another window.) +[Alternate Window] +

curtis$ rm /informix/crash.level.0.fifo +

curtis$ ln -s /informix/logical/crash/crash.level.1 /informix/crash.level.0.fifo +

Figure 13-23: Simulating a tape +swap +
+Now that we have swapped tapes, we can respond to the prompt shown in +Figure 13-24. +[Restore Window] +

Please mount tape 1 on /informix/logical/crash/crash.level.0 and +press Return to continue ... +

Archive Tape Information +

Tape type: Archive Backup Tape +

Online version: INFORMIX-OnLine Version 7.23.UC4 +

Archive date: Thu Jan 21 01:10:13 1999 +

User id: informix +

Terminal id: ? +

Archive level: 1 +

Tape device: /informix/crash.level.1.fifo +

Tape blocksize (in k): 16 +

Tape size (in k): 1024000 +

Tape number in series: 1 +

#We do not have a level 2 archive, so we will answer no to the

#following prompt. +

Restore a level 2 archive (y/n) n +

#We do want to restore log tapes, though... +

Do you want to restore log tapes? (y/n)y +

Roll forward should start with log number 65 +

Figure 13-24: Responding to ontape's prompts
Again, we must move over to the other window and prepare the logical log "tape." First, we move the salvaged logs to an appropriately named file and compress it. Then we use the log concatenation method discussed in the previous sidebar. Since the logs are compressed, in Figure 13-25 we create a single concatenated log using zcat.
[Alternate Window]

curtis$ mv crash.log crash.log.1999.01.21.18.00.00 +

curtis$ compress crash.log.1999.01.21.18.00.00 +

curtis$ ls -l crash.log.1999.01.2* +

total 2424 +

-rw-rw---- 1 informix informix 73961 Jan 21 01:12 crash.log.1999.01.21.01.13.02.Z +

-rw-rw---- 1 informix informix 1949 Jan 21 01:13 crash.log.1999.01.21.01.14.08.Z +

-rw-rw---- 1 informix informix 557056 Jan 21 17:04 crash.log.1999.01.21.17.04.00.Z

-rw-rw---- 1 informix informix 557056 Jan 21 18:00 crash.log.1999.01.21.18.00.00.Z

curtis$ zcat *1999* >crash.log +

curtis$ chmod 664 crash.log

Figure 13-25: Creating one large +logical log backup +
+Now that we have created the single log, we can respond to the following +prompt in Figure 13-26. +[Restore Window] +

Please mount tape 1 on /informix/logical/crash.log and press Return +to continue ... +

#Since we put all logs into this single log, there are no +

#more logs to restore. +

Do you want to restore another log tape? (y/n)n +

Program over. +

#The next step is very important. You must bring the +

#instance online when you are done, or you will +

#need to do the restore all over again. +

curtis$ onmode -m +

Figure 13-26: Completing the restore +and starting the instance +
  +
  +
  +
  +
  +
  +

Make sure that you use onmode -m to bring the instance online after +doing a cold restore. If you do not, you will need to completely redo the +restore if you stop and start the instance before doing so.

+
+Restoring with onbar +

The first and simplest recovery with onbar is to enter onbar -r. This tells onbar to do a complete restore of any offline dbspaces. It automatically performs the following three steps for you:

    +
1. onbar -l -s (salvages the logical logs)
2. onbar -r -p [-w] (complete physical restore, which reads the archive only)
3. onbar -r -l (complete logical restore, which reads all available logical logs)
+You may also add an optional -w flag (onbar -r -w) that specifies using +the latest whole backup when performing the restore. If you have not been +performing -w backups, then you cannot use -w on the restore. If you have +been using -w on your backups, then you can use the same option on the +restore. You also have the option of not using the -w option on the restore, +even if you did use it to back up. +
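As a quick reference, here is how the backup and restore options pair up; a sketch, assuming your backups were taken with onbar's whole-backup flag:
$ onbar -b -w   # whole backup
$ onbar -r -w   # restore from the latest whole backup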

Unlike ontape, you do not even need to move files around or swap +tapes if you have an autochanger. It automatically retrieves the appropriate +volumes that it needs to write to or read from. Even if you do not have +an autochanger, it prompts you for the appropriate tapes by name. +

You also have the option of performing the three steps by yourself. +This allows you to use a number of flags to do different kinds of restores +based on your needs of the moment. The first thing you need to do, though, +is issue the onbar -l -s command to salvage any logical logs that have +not been backed up. +

After doing that, you have a number of options when performing the physical and logical restores. (As stated earlier, a physical restore is one that just reads the archive tape. It does not apply any logical logs. Applying the logical logs is called the logical restore.) The following onbar command represents your options when beginning the physical restore. Please note the grouping of the options: the -p, -n, and -t options are mutually exclusive, and so are the -w, dbspace_list, and noflags options.

$ onbar -r \ +

[ -p | -n xxx +| -t time ] [ -w | dbspace_list | noflags +]

+Here is a listing of the various options +and how they affect the restore: +

-p +

Adding the -p option to the onbar -r command tells it to perform only the physical restore. If you use this option, you need to run the onbar -r -l command to perform the logical restore. If you do not specify this option, onbar performs both a physical and a logical restore.

-n xxx or -t time

If you do not specify -p, you can also use these flags to decide how the logical restore is performed. For details on these flags, see the following logical restore section.

-w

This specifies to use the latest whole backup when restoring the database. Although using this flag will perform a restore of the whole database, this flag is actually telling onbar which backup to use, not what kind of restore to do. The reason this flag is here is that if you do restore from a whole backup, you have the option of not doing a logical restore, since you have restored the database to a consistent point in time. (This option is not available to you if you have not been making backups with the -w option.)

dbspace_list

If you do not use the -w option, you can optionally list the dbspaces that onbar should recover, separated by white space.

noflags

This is actually just a placeholder to demonstrate what would happen if you used no other flags at all. If you enter onbar -r or onbar -r -p without specifying -w or a list of dbspaces to recover, it automatically detects and recovers any dbspaces that are offline. The noflags option, as described here, is meant to reiterate the fact that you do not have to specify the -w flag to get a complete restore. The -w flag specifies the restore's source, not its destination.

If you specified just a physical restore, you may now perform the logical restore. When doing so, you have three options:

onbar -r -l +

This is the default method; it performs a logical restore using all logical logs that were created since the latest archive.

onbar -r -l -n lognumber

If you know the last log that you wish to use, you may use this option. You may specify the last log that onbar should read using -n lognumber.

onbar -r -l -t time

You've been waiting for this one. You know that you accidentally deleted a major table at 14:02. You would like to replay all transactions except that one. You may tell onbar to do this using the new -t time option. (In the previous deleted-table example, you would probably enter onbar -r -l -t 14:00.)

Make sure that you use onmode -m to bring the instance online after doing a cold restore. If you do not bring the instance online with onmode -m before the next time you stop and start the instance, you will need to completely redo the restore.

In summary, an onbar restore offers you the same simplicity as an ontape restore, since both have the all-encompassing -r option that means to do a complete physical and logical restore. However, if you need extra options like point-in-time or point-in-log restores, they are available. It also has the added benefit of working with your storage manager so that you no longer have to worry about what tape has what backup. If you have not begun to look at onbar, perhaps now is the time to start.

Nine out of every 10 restores will end right here. To make sure that everything is okay, return to Step 1 and restart the instance after first bringing it online with onmode -m.

Step 7: Are there Errors in the Online Log?

Perhaps the instance started on your +first try. Perhaps you needed to do a cold restore in order to get it started. +The next thing to do would be to check the online log for any errors. Examples +of the types of errors you may see are shown in Figure 13-27: +

23:27:34 Assert Failed: WARNING! +pthdrpage:ptalloc:bad bfget +

23:27:34 Who: Session(7, informix@curtis, +0, 169149316) +

Thread(13, fast_rec, a12ccd8, 1) +

23:27:34 Results: Cannot use TBLSpace +page for TBLSpace 4194305 +

23:27:34 Action: Run 'oncheck -pt +4194305' +

23:27:34 See Also: /tmp/af.d79e5 +

23:27:34 Cannot Open DBspace 4. +

Figure 13-27: Example errors in +the online log +
+If you see any errors like this, you should run an onstat -d to see +which chunk is having a problem: +Chunks +

address chk/dbs offset size free +bpages flags pathname +

# Output abbreviated... +

a12a508 4 4 0 500 191 PD- /informix/testdbs.dbf

+The flags above show you that the /informix/testdbs.dbf +chunk +is down. What you need to find out now is why it is down. + +If you start the instance and there +are no errors in the online log, then proceed to Step 16. If there are +errors in the log, proceed to Step 8. + +Step 8: Is There +an Inaccessible Non-Critical Chunk? +

If oninit is able to access +all critical chunks, it brings the instance online. If any non-critical +chunks are inaccessible, it just logs the problem in the online log. If, +after checking the online log and running an onstat -d, you have +verified that a non-critical chunk is inaccessible to Informix, you need +to repair or replace it. +

+If a non-critical chunk is inaccessible, +return to Step 5. If you have verified that the problem chunk is now accessible +and has the correct permissions, proceed to Step 10. + +Step 9: Is There +a Corrupted Non-Critical Chunk? + +You should be performing this step +only if directed to do so by Steps 5 or 8. + +You might not need a restore +

The best way to find out if your non-critical +chunks are corrupted is to try to bring them online. In order to be able +to do that, the following conditions must be true: +

+If all these conditions are true, you +can probably save yourself a restore by bringing the dbspaces online using +onspaces. +Run the following command for each chunk that was marked down: +$ onspaces -s dbspacename +-p chunkname -o offset -O +

Warning: Bringing chunk back online. +

Do you really want to continue? (y/n)y +

Verifying physical disk space, please +wait ... +

Chunk status successfully changed.

If you see successful messages like this one, you won't even need to do a restore. If it complains that the chunk is inconsistent, you have to do a restore to bring it to a consistent state.

If you were able to bring all down dbspaces online, proceed to Step 11. If not, proceed to Step 10 to restore them.

Step 10: Perform a dbspace Restore

There isn't much that can be said in +this step that wasn't already covered in Step 6. However, there are a few +differences between the restore discussed in Step 6 and this one: +

Read Step 6 in detail, then run one of the following commands:
$ ontape -r -D dbspace dbspace   # Will recover the dbspaces listed

$ onbar -r   # Will recover all down dbspaces

$ onbar -r dbspace   # Will recover the dbspaces listed

+Both onbar and ontape prompt +you with the same standard questions. The main differences are that you +may be warned that the affected dbspaces will be taken offline, and you +will not be asked the "Do you want to back up the logs?" question. A sample +output from an ontape restore can be found in Figure 13-28: +curtis$ onstat -d|grep testdbs.dbf +

a12a508 4 4 0 500 191 PD- /informix/testdbs.dbf +

curtis$ ontape -r -D testdbs +

DBspace 'testdbs' is online; restoring +'testdbs' will bring all chunks +

comprising the DBspace OFFLINE and +will terminate all active +

transactions and queries accessing +the DBspace. +

OK to continue?y +

Please mount tape 1 on /informix/logical/crash/crash.level.0 +and press Return to continue ... +

Archive Tape Information +

Tape type: Archive Backup Tape +

Online version: INFORMIX-OnLine Version +7.23.UC4 +

Archive date: Thu Jan 21 00:57:14 +1999 +

User id: informix +

Terminal id: ? +

Archive level: 0 +

Tape device: /informix/crash.level.0.fifo +

Tape blocksize (in k): 16 +

Tape size (in k): 1024000 +

Tape number in series: 1 +

Continue restore? (y/n)y +

Spaces to restore:1 [testdbs ] +

Restore a level 1 archive (y/n) n +

Do you want to restore log tapes? +(y/n)y +

Roll forward should start with log +number 65 +

Please mount tape 1 on /informix/logical/crash.log +and press Return to continue ... +

Do you want to restore another log +tape? (y/n)n +

Program over. +

curtis$ onstat -d|grep testdbs.dbf +

a12a508 4 4 0 500 191 PO- /informix/testdbs.dbf +

Figure 13-28: Sample ontape output +
+Once the restore of all down dbspaces is complete, they will be brought +online. + +Once all down dbspaces are restored and online, proceed to Step +11. + +Step 11: Are there +Wrong Values in the onconfig File? +

If you were forced to use an old onconfig +file backup, or create one from scratch, you may have some potentially +wrong values. Depending on which values are wrong, they may prevent the +instance from operating properly. If so, oninit logs them in the +online log. +

02:13:58 Onconfig parameter +LTAPEBLK modified from 32 to 16. +

02:14:46 Onconfig parameter +MIRROROFFSET modified from 1 to 0 +

If you see any errors like this, +proceed to Step 12. If not, then proceed to Step 13. +
+Step 12: Change +the Bad Values in the onconfig File +

This one is about as easy as they come. Change any bad values in the onconfig file back to their original values. For example, if you saw the errors displayed in Step 11, you need to change LTAPEBLK back to 32 and MIRROROFFSET back to 1. Unfortunately, most of these values are read only at startup, so the corrections will not take effect until the instance is restarted (see Step 13).

+Once you have changed any incorrect +values, proceed to Step 13. + +Step 13: Ensuring +that the Instance will Restart +

If you changed any values in Step 12, +you need to restart the instance to have oninit read the new values. Also, +depending on the number of steps that you had to follow to get to this +step, you may want to make sure that everything will start correctly the +next +time. The only way to be sure of that is to restart the instance now. +

+If you wish to restart the instance, +proceed to Step 14. If not, proceed to Step 15. + +Step 14: Taking +the Instance Offline +

If you had to do any restores to get +to this step, make sure that you bring the instance online before you take +it offline again. To make sure that it is online, run the following command: +

$ onstat -

INFORMIX-OnLine Version 7.23.UC4 +-- On-Line -- Up 00:00:29 -- 8976 Kbytes

+If you see the above output, then the +instance was brought online. If you see the output below, you need to bring +the instance online by running the command onmode -m. +INFORMIX-OnLine Version 7.23.UC4 +-- Quiescent -- Up 00:01:17 -- 8976 Kbytes +Once you are sure that the instance is +online, you can take it offline: +$ onmode -ky +Once you have done so, return to +Step 1. + +Step 15: Confirming +that dbspaces and Chunks are Online +

If you don't restart the database, you should make doubly sure that all dbspaces and chunks are online. To do so, run the command onstat -d, as shown in Figure 13-29:

curtis$ onstat -d +

INFORMIX-OnLine Version 7.23.UC4 +-- On-Line -- Up 00:06:45 -- 8976 Kbytes +

Dbspaces +

address number flags fchunk nchunks +flags owner name +

a12a100 1 2 1 1 M informix rootdbs +

a12a790 2 2 2 1 M informix plogdbs +

a12a800 3 2 3 1 M informix llogdbs +

a12a870 4 1 4 1 N informix testdbs +

4 active, 2047 maximum +

Chunks +

address chk/dbs offset size free +bpages flags pathname +

a12a170 1 1 0 10000 9307 PO- /informix/rootdbs.dbf +

a12a248 1 1 0 10000 0 MO- /informix/rootdbs_mirror.dbf +

a12a358 2 2 0 5000 2447 PO- /informix/physlog.dbf +

a12a5e0 2 2 0 5000 0 MO- /informix/physlog_mirror.dbf +

a12a430 3 3 0 5000 3447 PO- /informix/logiclog.dbf +

a12a6b8 3 3 0 5000 0 MO- /informix/logiclog_mirror.dbf +

a12a508 4 4 0 500 191 PO- /informix/testdbs.dbf +

4 active, 2047 maximum +

Figure 13-29: A sample onstat +output +
+Check the flags column in the Dbspaces section for flags P, L, or R. +Also check the flags column of the Chunks section for flags D or I. The +meanings of these flags are: +

P +

The dbspace has been physically recovered and is awaiting logical recovery.
L
The dbspace is being logically recovered.
R
The dbspace is being physically recovered.
D
The chunk is down.
I
The chunk is in an inconsistent state.

Once you know whether any of these flags are present, proceed to Step 18.

Step 16: Recovering a Deleted Table or Database

Perhaps the instance started OK, but there is a different problem. If a DBA accidentally deleted a dbspace, or a user accidentally deleted an important table, there is really only one way to recover that -- a point-in-time restore.

+If you need a point-in-time restore, +proceed to Step 17. If not, proceed to Step 18. + +Step 17: Performing +a Point-in-Time Restore +

In order to do a point-in-time restore, +you need to do a cold restore of the entire database. (Details on how to +do that are in Step 6.) +

If you are using ontape, you +will need to apply all logical logs until you reach the one during which +the user/DBA error occurred. Do not apply that logical log. +
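Using the disk-based naming convention from the earlier figures, selecting the logs might look like the following sketch (it assumes the mistake happened while log 67 was current and that log 67's backup is the 18.00.00 file; both details are hypothetical):
curtis$ zcat crash.log.1999.01.21.01.13.02.Z \
    crash.log.1999.01.21.01.14.08.Z \
    crash.log.1999.01.21.17.04.00.Z > crash.log
# crash.log.1999.01.21.18.00.00.Z is deliberately left out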

If you are using onbar, you +can use the -n xxx or -t time features of onbar -r +to recover up to a point in time just prior to the user/DBA error. +

+Once you have done this, proceed +to Step 19. + +Step 18: Is Everything +OK? + +If you saw any of the flags mentioned +in Step 15, return to Step 8. If not, proceed to Step 19. + +Step 19: Making +a Backup +

Every restore should be followed immediately by a full backup. Of course, Informix allows you to do so online. Don't consider the restore finished until you have completed this backup. diff --git b/logo.png a/logo.png new file mode 100644 index 0000000..a086b7f Binary files /dev/null and a/logo.png differ diff --git b/oraback.tar a/oraback.tar new file mode 100755 index 0000000..2162431 Binary files /dev/null and a/oraback.tar differ diff --git b/oracle.gif a/oracle.gif new file mode 100755 index 0000000..17dc52e Binary files /dev/null and a/oracle.gif differ diff --git b/oracle.html a/oracle.html new file mode 100755 index 0000000..99a270b --- /dev/null +++ a/oracle.html @@ -0,0 +1,1858 @@ + + + + Oracle Recovery Procedure + + +

Recovering Oracle
+ +

Since an Oracle database consists of several interrelated parts, recovering such a database is done through a process of elimination. Identify which pieces work, then recover the pieces that don't work. The following recovery guide follows that logic and works regardless of the chosen backup method. It consists of a flowchart and a procedure whose numbered steps correspond to the elements in the flowchart.

[Flowchart: the Oracle recovery decision tree; its elements correspond to the numbered steps below.]

Step 1: Try +Startup Mount +

The first step in verifying the condition +of an Oracle database is to attempt to mount it. This works because mounting +a database (without opening it) reads the control files but does not open +the data files. If the control files are mirrored, Oracle attempts to open +each of the control files that are listed in the initORACLE_SID.ora +file. If any of them is damaged, the mount fails. +

To mount a database, simply run svrmgrl, +connect to the database, and enter startup mount. +

$ svrmgrl +

SVRMGR > connect internal; +

Connected. +

SVRMGR > startup mount; +

Statement processed.

+If it succeeds, the output looks something +like this: +SVRMGR > startup mount; +

ORACLE instance started. +

Total System Global Area 5130648 +bytes +

Fixed Size 44924 bytes +

Variable Size 4151836 bytes +

Database Buffers 409600 bytes +

Redo Buffers 524288 bytes +

Database mounted. +

If the attempt to mount the database +succeeds, proceed to Step 10. +
+If the attempt to mount the database fails, +the output looks something like this: +SVRMGR > startup mount; +

Total System Global Area 5130648 +bytes +

Fixed Size 44924 bytes +

Variable Size 4151836 bytes +

Database Buffers 409600 bytes

Redo Buffers 524288 bytes +

ORACLE instance started. +

ORA-00205: error in identifying controlfile, +check alert log for more info +

If the attempt to mount the database +fails, proceed to Step 2. +
+Step 2: Are All +Control Files Missing? +

Don't panic if the attempt to mount +the database fails. Control files are easily restored if they were mirrored, +and can even be rebuilt from scratch if necessary. The first important +piece of information is that one or more control files are missing. +

Unfortunately, since Oracle aborts +the mount at the first failure it encounters, it could be missing one, +two, or all of the control files, but so far you know only about the first +missing file. So, before embarking on a course of action, determine the +severity of the problem. In order to do that, do a little research. +

First, determine the names of all of the control files. Do that by looking at the configORACLE_SID.ora file next to the words control_files. It looks something like this:

control_files = (/db/Oracle/a/oradata/crash/control01.ctl, +

/db/Oracle/b/oradata/crash/control02.ctl, +

/db/Oracle/c/oradata/crash/control03.ctl)
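Once you know the names, a quick sanity check is to list all three files side by side and compare their sizes and modification times; a minimal sketch, using the example names above:
$ ls -l /db/Oracle/a/oradata/crash/control01.ctl \
    /db/Oracle/b/oradata/crash/control02.ctl \
    /db/Oracle/c/oradata/crash/control03.ctl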

It's also important to get the name of the control file that Oracle is complaining about. Find this by looking for the phrase controlfile: in the alert log. (The alert log can be found in the location specified by the background_dump_dest value in the configinstance.ora file; typically, it is the ORACLE_BASE/ORACLE_SID/admin/bdump directory.) In that directory, there should be a file called alert_ORACLE_SID.log. In that file, there should be an error that looks something like this:
Sat Feb 21 13:46:19 1998

alter database mount exclusive +

Sat Feb 21 13:46:20 1998 +

ORA-00202: controlfile: '/db/a/oradata/crash/control01.ctl' +

ORA-27037: unable to obtain file +status +

SVR4 Error: 2: No such file or directory +

Warning! Some of the following procedures may have you overwrite a potentially corrupted control file. Since one never knows which file may be needed, always make backup copies of all of the control files before doing any of this. That offers an "undo" option that isn't possible otherwise. (Also make copies of the online redo logs as well.)
+With the names of all of the control files +and the name of the damaged file, it's easy to determine the severity of +the problem. Do this by listing each of the control files and comparing +their size and modification time. (Remember the game "Which one of these +is not like the other," on Sesame Street?) The following scenarios assume +that the control files were mirrored to three locations, which is a very +common practice. The possible scenarios are: +

The damaged file is missing, and at +least one other file is present +

If the file that Oracle is complaining about is just missing, that's an easy thing to fix.

If this is the case, proceed to Step 3.

The damaged file is not missing; it is corrupted

This is probably the most confusing one, since it's hard to tell whether a file is corrupted. What to do in this situation is a personal choice. Before going any further, make backup copies of all control files. Once you do that, try a "shell game" with the different control files. The shell game consists of taking one of the three control files and copying it to the other two files' locations. Then attempt to mount the database again. The "shell game" is covered in Step 3.

However, if all the online redo logs are present, it's probably easier at this point to just run the "create controlfile" script discussed in Steps 6 and 7. This rebuilds the control file in all locations automatically. (Before that, though, follow Steps 4 and 5 to verify that all the data files and log files are present.)

To rebuild the control file using the "create controlfile" script, proceed to Steps 4 through 7.

All of the control files are missing, or they are all different sizes and/or times

If all of the control files are corrupt or missing, they must be rebuilt, or the entire database must be restored. Hopefully your backup system has been running the backup control file to trace command on a regular basis. (The output of this command is a SQL script that will rebuild the control files automatically.)

If the backup control file to trace command has been running, proceed to Steps 4 through 7. If not, proceed to Step 8.

Step 3: Replace Missing Control File

If the file that Oracle is complaining +about is either missing or appears to have a different date and time than +the other control files, this will be easy. Simply copy another one of +the mirrored copies of the control file to the damaged control file's name +and location. (The details of this procedure are below.) Once this is done, +just attempt to mount the database again. +

+Warning! Make sure to make backup +copies of all of the control files before overwriting them! + +The first thing to do is to get the name +of the damaged control file. Again, this is relatively easy. Look in the +alert log for a section like the one below: +Sat Feb 21 13:46:19 1998 +

alter database mount exclusive +

Sat Feb 21 13:46:20 1998 +

ORA-00202: controlfile: '/db/a/oradata/crash/control01.ctl' +

ORA-27037: unable to obtain file +status +

SVR4 Error: 2: No such file or directory

+Always make backups of all the control +files before copying any of them on top of each other. The next step would +be to copy a known good control file to the damaged control file's location. + +Once that is done, return to Step +1 and try the startup mount again. + +"But I don't have a good control file!" +

It's possible that there may be no +known good control file, which is what would happen if the remaining control +files have different dates and/or sizes. If this is the case, it's probably +best to use the "create controlfile" script. +

To use the create controlfile script, proceed to Steps 4 through 7.

If that's not possible or practical, try the following procedure. First, make backups of all of the control files. Then, one at a time, try copying every version of each control file to all the other locations -- excluding the one that Oracle has already complained about, since it's obviously damaged.

Each time a new control file is copied to multiple locations, return to Step 1.

For example, assume there are three control files: /a/control1.ctl, /b/control2.ctl, and /c/control3.ctl. The alert log says that /c/control3.ctl is damaged, and since /a/control1.ctl and /b/control2.ctl have different modification times, there's no way to know which one is good. Try the following steps:

First, make backup copies of all the +files: +

$ cp /a/control1.ctl /a/control1.ctl.sav +

$ cp /b/control2.ctl /b/control2.ctl.sav +

$ cp /c/control3.ctl /c/control3.ctl.sav

+Second, try copying one file to all locations. +Skip control3.ctl, since it's obviously damaged. Try starting with control1.ctl: +$ cp /a/control1.ctl /b/control2.ctl +

$ cp /a/control1.ctl /c/control3.ctl

+Now attempt a startup mount: +$ svrmgrl +

SVRMGR > connect internal; +
Connected. +

SVRMGR > startup mount +

Sat Feb 21 15:43:21 1998 +

alter database mount exclusive +

Sat Feb 21 15:43:22 1998 +

ORA-00202: controlfile: '/c/control3.ctl'

ORA-27037: unable to obtain file +status

+This error says that the file that was +copied to all locations is also damaged. Now try the second file, control2.ctl: +$ cp /b/control2.ctl /a/control1.ctl +

$ cp /b/control2.ctl /c/control3.ctl

+Now attempt to do a startup mount: +SVRMGR > startup mount; +

ORACLE instance started. +

Total System Global Area 5130648 +bytes +

Fixed Size 44924 bytes +

Variable Size 4151836 bytes +

Database Buffers 409600 bytes +

Redo Buffers 524288 bytes +

Database mounted.

+It appears that control2.ctl was a good +copy of the control file. + +Once the attempt to mount the database +is successful, proceed to Step 10. + +Step 4: Are All +Data Files and Redo Logs OK? + +Steps 4 and 5 are required only prior +to performing Step 6. + +The "create controlfile" script described +in Step 7 works only if all the data files and online redo logs are in +place. The data files can be older versions that were restored from backup, +since they will be rolled forward by the media recovery. However, the online +redo logs must be current and intact for the "create controlfile" script +to work. +

The reason that this is the case is +that the rebuild process looks at each data file as it is rebuilding the +control file. Each data file contains a System Change Number (SCN) that +corresponds to a certain online redo log. If a data file shows that it +has an SCN that is more recent than the online redo logs that are available, +the control file rebuild process will abort. +

+If it's likely that one or more of +the data files or online redo logs is damaged, go to Step 5. If it's more +likely that they are all intact, go to Step 6. + +Step 5: Recover +Damaged Data Files or Redo Logs +

If one or more of the data files or +online redo logs are definitely damaged, follow all the instructions below +to see if there are any other damaged files. (A little extra effort now +will save a lot of frustration later.) If it's possible that all the data +files and online redo logs are okay, another option would be to skip this +step and try to recreate the control file now. (An unsuccessful attempt +at this will not cause any harm.) If it fails, return to this step. If +there is plenty of time, go ahead and perform this step first. +

+To try and recreate the control files +now, proceed to Step 6. + +The first thing to find out is where all +of the data files and redo logs are. To determine this, run the following +command on the mounted, closed database: +SVRMGR > connect internal; +
Connected. +

SVRMGR > select name from v$datafile; +

(Example output below) +

SVRMGR > select group#, member +from v$logfile; +

(Example output below)

+Figure B contains sample output from these +commands: +SVRMGR > select name from v$datafile; +

NAME +

-------------------------------------------------------------------------------- +

/db/Oracle/a/oradata/crash/system01.dbf +

/db/Oracle/a/oradata/crash/rbs01.dbf +

/db/Oracle/a/oradata/crash/temp01.dbf +

/db/Oracle/a/oradata/crash/tools01.dbf +

/db/Oracle/a/oradata/crash/users01.dbf +

/db/Oracle/a/oradata/crash/test01.dbf +

6 rows selected. +

SVRMGR > select group#, member +from v$logfile; +

MEMBER +

-------------------------------------------------------------------------------- +

1 /db/Oracle/a/oradata/crash/redocrash01.log +

3 /db/Oracle/c/oradata/crash/redocrash03.log +

2 /db/Oracle/b/oradata/crash/redocrash02.log +

1 /db/Oracle/b/oradata/crash/redocrash01.log +

2 /db/Oracle/a/oradata/crash/redocrash03.log +

3 /db/Oracle/c/oradata/crash/redocrash02.log +

6 rows selected. +

SVRMGR > +

+
Figure B: Sample v$datafile +and v$logfile output
+
+
+Look at each of the files shown by the above command. First, look at +the data files. Each of the data files probably has the same modification +time, or there might be a group of them with one modification time and +another group with a different modification time. The main thing to look +for is a missing file or a zero length file. Something else to look for +is one or more files that have a modification time that is newer than the +newest online redo log file. If a data file meets any one of these conditions, +it must be restored from backup. +
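One quick way to spot a missing, zero-length, or suspiciously new file is a combined listing; a sketch, using the example paths above:
$ ls -l /db/Oracle/*/oradata/crash/*.dbf /db/Oracle/*/oradata/crash/*.log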

Redo log files, however, are a little different. Each redo log file within a log group should have the same modification time. For example, the output of the example command above shows that /db/Oracle/a/oradata/crash/redocrash01.log and /db/Oracle/b/oradata/crash/redocrash01.log are in log group one. They should have the same modification time and size. The same should be true for groups two and three. There are a couple of possible scenarios:

One or more log groups has at least one good and one damaged log +

This is why redo logs are mirrored! Copy the good redo log to the damaged redo log's location. For example, if /db/Oracle/a/oradata/crash/redocrash01.log was missing, but /db/Oracle/b/oradata/crash/redocrash01.log was intact, issue the following command:

$ cp /db/Oracle/b/oradata/crash/redocrash01.log \

/db/Oracle/a/oradata/crash/redocrash01.log

All redo logs in at least one log group are damaged
This is a bad place to be. The "create controlfile" script in Step 6 requires that all online redo logs be present. If even one log group is completely damaged, it will not be able to rebuild the control file. This means that the only option available now is to proceed to Steps 23 and 24 -- a complete recovery of the entire database followed by an alter database open resetlogs.
Warning! This is a drastic step! Make sure that all members of at least one log group are missing. (In the example above, if both /db/Oracle/a/oradata/crash/redocrash01.log and /db/Oracle/b/oradata/crash/redocrash01.log were damaged, this database would require a complete recovery.)

If all the redo logs in at least one +group are damaged, and all the control files are damaged, proceed to Steps +23 and 24. +

If the redo logs are all right, but +all the control files are missing, proceed to Step 6. +

If the database will not open for some +other reason, proceed to Step 10.

+
Step 6: Is There a "create controlfile" Script?

Steps 4 and 5 must be taken prior to this Step.

The svrmgrl command alter database backup control file to trace creates a trace file that contains a "create controlfile" script. This command should be run from cron on a regular basis. To find out if there is such a script available, follow the instructions below. The first thing to find out is the destination of the trace files. This is specified by the user_dump_dest value in the configinstance.ora file, usually located in $ORACLE_HOME/dbs. (Typically, it is $ORACLE_BASE/$ORACLE_SID/admin/udump.) First cd to that directory, then grep for the phrase CREATE CONTROLFILE. For example:
$ cd $ORACLE_HOME/dbs; grep user_dump_dest configcrash.ora

user_dump_dest = /db/Oracle/admin/crash/udump +

$ cd /db/Oracle/admin/crash/udump +; grep 'CREATE CONTROLFILE' * \ +

|awk -F: '{print $1}'|xargs ls +-ltr +

-rw-r----- 1 Oracle dba 3399 Oct +26 11:25 crash_ora_617.trc +

-rw-r----- 1 Oracle dba 3399 Oct +26 11:25 crash_ora_617.trc +

-rw-r----- 1 Oracle dba 1179 Oct +26 11:29 crash_ora_661.trc +

+
Figure C: Locating the most +recent create controlfile script
+
+
+In the example in Figure C, crash_ora_661.trc is the most recent file +to contain the "create controlfile" script. + +If there is a create controlfile script, proceed to Step 7. If +there is not a create controlfile script, and all the control files are +missing, proceed to Step 8. + +Step 7: Run the +'create controlfile' Script +
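One way to run this from cron, as suggested above, is a small wrapper script (the name, schedule, and location here are hypothetical; adjust for your environment) invoked by an entry like this in the oracle user's crontab:

0 2 * * * /usr/local/bin/ctl2trace.sh

where ctl2trace.sh contains:

#!/bin/sh
# Write a fresh "create controlfile" script to user_dump_dest nightly
svrmgrl <<EOF
connect internal;
alter database backup controlfile to trace;
quit
EOF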

Step 7: Run the 'create controlfile' Script

First, find the trace file that contains the script. The instructions on how to do that are in Step 6. Once you find it, copy it to another filename, such as rebuild.sql. Edit the file, deleting everything above the phrase # The following commands will create, and anything after the last SQL command. The file should then look something like the one in Figure D:

# The following commands will create a new controlfile and use it
# to open the database.
# Data used by the recovery manager will be lost. Additional logs may
# be required for media recovery of offline data files. Use this
# only if the current version of all online logs are available.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "CRASH" NORESETLOGS ARCHIVELOG
    MAXLOGFILES 32
    MAXLOGMEMBERS 2
    MAXDATAFILES 30
    MAXINSTANCES 8
    MAXLOGHISTORY 843
LOGFILE
    GROUP 1 '/db/a/oradata/crash/redocrash01.log' SIZE 500K,
    GROUP 2 '/db/b/oradata/crash/redocrash02.log' SIZE 500K,
    GROUP 3 '/db/c/oradata/crash/redocrash03.log' SIZE 500K
DATAFILE
    '/db/a/oradata/crash/system01.dbf',
    '/db/a/oradata/crash/rbs01.dbf',
    '/db/a/oradata/crash/temp01.dbf',
    '/db/a/oradata/crash/tools01.dbf',
    '/db/a/oradata/crash/users01.dbf'
;
# Recovery is required if any of the data files are restored backups,
# or if the last shutdown was not normal or immediate.
RECOVER DATABASE
# All logs need archiving and a log switch is needed.
ALTER SYSTEM ARCHIVE LOG ALL;
# Database can now be opened normally.
ALTER DATABASE OPEN;
# Files in read-only tablespaces are now named.
ALTER DATABASE RENAME FILE 'MISSING00006'
    TO '/db/a/oradata/crash/test01.dbf';
# Online the files in read-only tablespaces.
ALTER TABLESPACE "TEST" ONLINE;

Figure D: Example create controlfile script
Once the file looks like the above example, add the following line just above the STARTUP NOMOUNT line:

connect internal;

After you add this line, run the following command on the mounted, closed database, substituting rebuild.sql with the appropriate name:

$ svrmgrl < rebuild.sql

If all of the data files and online redo log files are in place, this will work without intervention and completely rebuild the control files.

If any of this instance's data files are missing, return to Step 4. However, if any of this instance's online redo logs are damaged or missing, this option will not work. Proceed to Step 8.

Step 8: Restore Control Files and Prepare the Database for Recovery

This step is required only if Steps 2 through 7 have failed.

If the precautions mentioned elsewhere in this chapter were followed, there is really only one scenario that would result in this position -- loss of the entire system due to a cataclysmic event. Loss of a disk drive (or even multiple disk drives) is easily handled if the control files are mirrored. Even if all control files are lost, they can be rebuilt using the trace file created by running the backup controlfile to trace command. The only barrier to using that script is if all members of an online log group are missing. The only time that you could lose all mirrored control files and all members of a mirrored log group would be a complete system failure, such as a fire or other natural disaster. And if that is the case, then a complete database recovery would be more appropriate.
But I didn't mirror my control files or my online redo logs

Follow the steps below, starting with restoring the control files from backup. Chances are that the database files will need to be restored as well. This is because one cannot use a control file that is older than the most recent database file. (Oracle will complain and abort if this happens.) To find out if the control file is newer than the data files, try the following steps without overwriting the database files and see what happens.

Restore control files from backup

The very first step in this process is to find and restore the most recent backup of the control file. This would be the result of a backup controlfile to filename command. This is the only supported method of backing up the control file. Some people (oraback.sh included) also copy the control file manually. If there is a manual copy of the control file that is more recent than an "official" copy, try to use it first. However, if it doesn't work, use a backup copy created by the backup controlfile to filename command. Whatever backup control file is used, copy it to all of the locations and filenames listed in the configORACLE_SID.ora file after the phrase control_files:

control_files = (/db/Oracle/a/oradata/crash/control01.ctl,
                 /db/Oracle/b/oradata/crash/control02.ctl,
                 /db/Oracle/c/oradata/crash/control03.ctl)

Again, this backup control file must be more recent than the most recent database file in the instance. If this isn't the case, Oracle will complain.
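For example, if the restored backup control file were sitting in /db/Oracle/backup/control.bkp (a hypothetical staging location), it would be copied to the three locations above as follows:

$ cp /db/Oracle/backup/control.bkp /db/Oracle/a/oradata/crash/control01.ctl
$ cp /db/Oracle/backup/control.bkp /db/Oracle/b/oradata/crash/control02.ctl
$ cp /db/Oracle/backup/control.bkp /db/Oracle/c/oradata/crash/control03.ctl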

Startup mount

To find out if the control file is valid and has been copied to all of the correct locations, attempt to start up the database with the mount option. (This is the same command from Step 1.) To do this, run the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > startup mount;
Statement processed.
SVRMGR > quit
Take read-only tablespaces offline

Oracle does not allow read-only data files to be online during a recover database using backup controlfile action. Therefore, if there are any read-only data files, take them offline. To find out if there are any read-only data files, issue the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select enabled, name from v$datafile;
Statement processed.
SVRMGR > quit
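On an instance with many data files, the list can be narrowed to just the read-only ones (this assumes the ENABLED column reports READ ONLY for such files):

SVRMGR > select name from v$datafile where enabled = 'READ ONLY';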

For each read-only data file, issue the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database datafile 'filename' offline;
Statement processed.
SVRMGR > quit
Step 9: Recover the Database

This step is required only if Steps 2 through 7 have failed.

Once the control file is restored with a backup copy, attempt to recover the database using the backup control file.
Attempt to recover database normally

Since recovering the database with a backup control file requires the alter database open resetlogs option, it never hurts to try recovering the database normally first:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover database;

If the backup control file option is required, Oracle will complain:

SVRMGR > recover database
ORA-00283: Recover session cancelled due to errors
...
ORA-01207: file is more recent than controlfile - old controlfile

If the recover database command works, proceed to Step 10. If it doesn't, proceed to the next heading, "Attempt to recover database using backup control file."
Attempt to recover database using backup control file

Attempt to recover the database using the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover database using backup controlfile

If it works, the output will look something like Figure E:

ORA-00279: change 38666 generated at 03/14/98 21:19:05 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_494.dbf
ORA-00280: change 38666 for thread 1 is in sequence #494

Figure E: Sample output of recover database command
If Oracle complains, there are probably some missing or corrupted data files. If so, return to Steps 4 and 5. Once any missing or corrupted data files are restored, return to this step and attempt to recover the database again.

Sometimes one can get into a catch-22 when recovering databases, where Oracle complains that the data files are newer than the control file. The only way to get around this is to use a backup version of the data files that is older than the backup version of the control file. Media recovery will roll forward any changes that this older file is missing.
Apply all archived redo logs

Oracle will request all archived redo logs made since the time of the oldest restored data file. For example, if the backup that was used to restore the data files was from three days ago, Oracle will need all archived redo logs created since then. Also, the first log file that it asks for is the oldest log file that it wants.

The most efficient way to roll through the archived redo logs is to have all of them sitting uncompressed in the directory that Oracle suggests as the location of the first file. If this is the case, simply enter auto at the prompt. Otherwise, specify alternate locations, or hit enter as it asks for each one, giving yourself time to compress or remove the files that it no longer needs.

Apply online redo logs if they are available

If it is able to do so, Oracle will automatically roll through all the archived redo logs and the online redo log. Then it says, "Media recovery complete."

However, once Oracle rolls through all the archived redo logs, it may prompt for the online redo log. It does this by prompting for an archived redo log with a number that is higher than the most recent archived redo log available. This means that it is looking for the online redo log. Try answering its prompt with the names of the online redo log files that you have. Unfortunately, as soon as you give it a name it doesn't like, it will make you start the recover database using backup controlfile command again.
For example, suppose that you have the following three online redo logs:

/oracle/data/redolog01.dbf
/oracle/data/redolog02.dbf
/oracle/data/redolog03.dbf

When Oracle prompts for an archived redo log with a number higher than the highest-numbered archived redo log that you have, answer the prompt with one of these files (e.g., /oracle/data/redolog01.dbf). If the file that you give it does not contain the recovery thread it is looking for, you will see a message like the following:

ORA-00310: archived log contains sequence 2; sequence 3 required
ORA-00334: archive log: '/oracle/data/redolog01.dbf'

Oracle will then cancel the recovery, requiring you to start the recover database using backup controlfile command over. Once you get to the same prompt again, respond with a different filename, such as /oracle/data/redolog02.dbf. If it contains the recovery thread it is looking for, it will respond with a message like the following:

Log applied.
Media recovery complete.

If, after trying all the online redo logs, it is still asking for a log that you do not have, simply enter cancel.
Alter database open resetlogs

Once the media recovery is complete, the next step is to open the database. As mentioned earlier, when recovering the database using a backup control file, it must be opened with the resetlogs option. Do this by entering:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database open resetlogs;
SVRMGR > quit

Take a backup immediately after recovering the database with the resetlogs option! It is best if it is a cold backup after shutting down the database. Perform a hot backup if absolutely necessary, but realize that there is a risk in doing so.

If the database did not open successfully, return to Step 1 and start over.

If the database did open successfully, perform a backup of the entire database immediately -- preferably a cold one. Congratulations! You're done!
Step 10: Does "alter database open" work?

If the startup mount worked, this is actually only the second step that you will perform. Mounting the database only checks the presence and consistency of the control files. If that works, opening the database is the next step. Doing so will check the presence and consistency of all data files, online redo log files, and any rollback segments. To open the database, run the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database open;
SVRMGR > quit

If the attempt to open the database worked, Oracle will simply say, "Statement processed." If this is the first attempt to open the database, and no data files or rollback segments were taken offline, you're done!

If directed to this step by Step 26 or 28 (damaged log groups), and the attempt at opening the database failed, return to Step 23 to recover the entire database.

If the database did open, proceed to Step 15.
If the attempt to open the database did not work, the output will vary depending on the condition. Here is a listing of what those conditions may be, accompanied by what the error might look like when that condition occurs.

Missing data file

ORA-01157: cannot identify data file 1 - file not found
ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'

Corrupted data file

A corrupted data file can generate a number of different errors. For instance, it may mimic a missing data file:

ORA-01157: cannot identify data file 1 - file not found
ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'

It may also completely confuse Oracle:

ORA-00600: internal error code, arguments: [kfhcfh1_01], [0], [], [], [], [], [], []

A corrupted data file may also cause a "failed verification check" error:

ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'
ORA-01200: actual file size of 1279 is smaller than correct size of 40960 blocks

These are just a few examples of the types of errors that Oracle may give if a data file is corrupted.

Missing member of any online log group

If the redo logs are mirrored, and one or more of the mirrored copies are lost but at least one good copy of each online redo log remains, Oracle will open the database without displaying any errors to the terminal. The only error will be a message like the following one in the alert log:

Errors in file /db/Oracle/admin/crash/bdump/crash_lgwr_10302.trc:
ORA-00313: open failed for members of log group 2 of thread 1
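If the location of the alert log is not known, it can be found the same way user_dump_dest was found in Step 6: grep the parameter file for background_dump_dest (the file name below follows the earlier example):

$ grep background_dump_dest $ORACLE_HOME/dbs/configcrash.ora
background_dump_dest = /db/Oracle/admin/crash/bdump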

All members of any online log group are corrupted

However, if all members of any online log group are corrupted, Oracle will complain, and the database will not open. The error might look something like this:

ORA-00327: log 2 of thread 1, physical size less than needed
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash02.log'

Missing all members of any online log group

A similar problem occurs if all members of an online log group are missing. Oracle will complain, and the database will not open. The error looks something like this:

ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash02.log'

Damaged rollback segment

If a rollback segment is damaged, the error will be like the following one:

ORA-01545: rollback segment 'USERS_RS' specified not available
Cannot open database if all rollback segments are not available.
Damaged data file

A damaged data file is actually very easy to recover from. This is a good thing, because this will occur more often than any other problem. Remember that there is only one copy of each data file, unlike online redo logs and control files, which can be mirrored. So, statistically speaking, it's easier to lose one data file than to lose all mirrored copies of a log group or all mirrored copies of the control file.

Oracle also has the ability to recover parts of the database while other parts of the database are brought online. Unfortunately, this helps only if a partially functioning database is of any use to the users in your environment. Therefore, a database that is completely worthless unless all tables are available will not benefit from the partial online restore feature. However, if the users can use one part of the database while the damaged files are being recovered, this feature may help to save face by allowing at least partial functionality during an outage.

There are three types of data files as far as recovery is concerned: a data file in the SYSTEM tablespace, a data file in a tablespace that contains rollback segments, and a data file in an ordinary user tablespace.
Damaged log group

If all members of a log group are damaged, there is great potential for data loss. The entire database may have to be restored, depending on the status of the log group that was damaged and the results of some attempts at fixing it. This may seem like a broken record, but this is why mirroring the log groups is so important.

If the error refers to a damaged log group, one option is to proceed directly to Step 17. However, to verify that nothing else is wrong, read the following notes and proceed to the next step.

Damaged rollback segment

Since Oracle has to open the data files that contain this rollback segment before it can verify that the rollback segment is available, this error will not occur unless a data file has been taken offline. If Oracle encounters a damaged data file (whether or not it contains a rollback segment), it will complain about that data file and abort the attempt to open the database.

Remember that a rollback segment is a special part of a tablespace that stores rollback information. Rollback information is needed in order to undo (or roll back) an uncommitted transaction. Since a crashed database will almost always contain uncommitted transactions, recovering a database with a damaged rollback segment is a little tricky. As previously mentioned, a damaged data file may be taken offline, but Oracle will not open the database without the rollback segment.

The strategy for dealing with this is to make Oracle believe that the rollback segment doesn't exist. That will allow the database to be brought online. However, there will be transactions that need to be rolled back that require this rollback segment. Since Oracle believes this rollback segment is no longer available, these rollbacks cannot occur. This means that the database may be online, but portions of it will not be available.
For example, suppose that we created a table called data1 inside tablespace USERS. Tablespace USERS contains the data file /db/oracle/a/oradata/crash/users01.dbf. Unfortunately, the database crashed before this transaction was committed, and the data file that contains the rollback segment for this transaction was destroyed. In the process of recovering this database, we took that data file offline, convinced Oracle that the rollback segment it contained was not needed, and opened the database. If we run the command select * from data1, we will receive the error shown in Figure F:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select * from data1;
C1
------------
ORA-00376: file 7 cannot be read at this time
ORA-01110: datafile 7: '/db/oracle/a/oradata/crash/users01.dbf'

Figure F: Sample data file error
This is because Oracle does not know whether the uncommitted transactions in /db/oracle/a/oradata/crash/users01.dbf have been rolled back. In order to make this database fully functional, the damaged data file must be recovered and the rollback segment brought online.

Be aware, therefore, that if you bring a database online without all of its rollback segments, the database may be online -- but it probably will not be fully functional.

If the error indicates that there is a damaged rollback segment, proceed to Step 18.

Before going any farther

Remember that Oracle will stop attempting to open the database as soon as it encounters an error with one file. This means, of course, that there could be other files that are damaged. If there is at least one damaged data file, now is a good time to check whether there are other damaged files. Detailed instructions on how to do that are provided in Step 5.

Once you know the names of all the damaged files, proceed to the next section.
How media recovery works

If any data files are restored from backup, the svrmgr recover command will be needed. This command uses the archived and online redo logs to "redo" any transactions that have occurred since the time that the backup of a data file was taken. You can recover a complete database, a tablespace, or a data file by issuing the commands recover database, recover tablespace tablespace_name, and recover datafile datafile_name, respectively. These commands are issued inside a svrmgr shell. For example:

$ svrmgrl
SVRMGR > connect internal
SVRMGR > startup mount
SVRMGR > recover datafile '/db/Oracle/a/oradata/crash/datafile01.dbf'
These commands allow the restore of an older version of a data file, using redo to roll it forward to the point of failure. For example, if we took a backup of a data file on Wednesday night, and that data file was damaged on Thursday evening, we would restore that data file from Wednesday night's backup. Of course, many transactions would have occurred since Wednesday night, making changes to the data files that we restored. Running the command recover [database|tablespace|datafile] reapplies those transactions to the restored data file, rolling it forward to Thursday evening.

This recovery can work in a number of ways. After receiving the recover command, Oracle prompts for the name and location of the first archived redo log that it needs. If that log, and all logs that have been made since that log, are online, uncompressed, and in their original location, enter the word AUTO. This tells Oracle to assume that all files that it needs are online. It can therefore automatically roll through each log that it needs.

In order to do this, all files that Oracle will need must be online. First, get the name of the oldest file, since that is the first file it will need. That file name is displayed immediately after issuing the recover command:

ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
In the example above, the first file that Oracle needs is /db/Oracle/admin/crash/arch/arch.log1_481.dbf. Make sure that this file is online and not compressed or deleted. If it is deleted, restore it from backup. If it is compressed, uncompress it, along with any archived redo log files in that directory that are newer than it, because Oracle may need all of them to complete the media recovery. It might be necessary to delete some of the older archived redo logs to make enough room for the files that need to be uncompressed. Once all archived redo logs that are newer than the one requested by Oracle have been restored and uncompressed, enter AUTO at the "Specify log" prompt.

If there isn't enough space for all of the archived redo logs to be uncompressed, a little creativity may be required. Uncompress as many as possible, and then hit enter each time it suggests the next file. (Hitting enter tells Oracle that the file that it is suggesting is available. If it finds that it is not available, it prompts for the same file again.) Once it has finished with one archive log, compress that log, and uncompress a newer log, since it will be needed shortly. (Obviously, a second window is required, and a third window wouldn't hurt!)
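When there is enough space to uncompress everything at once, a loop like the following does it in one shot (a sketch only, assuming the logs were compressed with compress(1) and follow the naming convention shown in the examples above):

$ cd /db/Oracle/admin/crash/arch
$ for f in arch.log1_*.dbf.Z
> do
> uncompress $f
> done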

At some point, it may ask for an archived redo log that is not available. This could mean some of the archived redo logs or online redo logs are damaged. If the file cannot be located or restored, enter CANCEL.

More detail on media recovery is available in Oracle's documentation.
If the database did not open, proceed to Step 11 after reading the preceding notes. If it did open, proceed to Step 15.

Step 11: Damaged System File?

If the damaged file is part of the SYSTEM tablespace, an offline recovery is required. All other missing data files can be recovered with the database online. Unfortunately, Oracle complains only that the data file is missing -- without saying what kind of data file it is. Fortunately, even if Oracle is down, there is an easy way to determine which files belong to the SYSTEM tablespace. (Finding out whether a data file contains a rollback segment is a little more difficult, but it is still possible.) To find out which data files are in the SYSTEM tablespace, run the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$datafile where status = 'SYSTEM' ;
NAME
--------------------------------------------------------------------------------
/db/oracle/a/oradata/crash/system01.dbf
1 row selected.
This example report shows that the only file that is a member of the SYSTEM tablespace is /db/oracle/a/oradata/crash/system01.dbf. In your configuration, however, there may be multiple data files in the SYSTEM tablespace.

If any of the damaged data files is a member of the SYSTEM tablespace, proceed to Step 12. If none of them is a member of the SYSTEM tablespace, proceed to Step 13.

Step 12: Restore All Data Files in the SYSTEM Tablespace

Unlike other tablespaces, the SYSTEM tablespace must be available in order to open the database. Therefore, if any members of the SYSTEM tablespace are damaged, they must be restored now. Before doing this, make sure that the database is not open. (It is okay if it is mounted.) To make sure, run the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select status from v$instance;
STATUS
-------
MOUNTED
1 row selected.
(The example above shows that this instance is mounted, not open.)

If the database is not open, restore the damaged files from the most recent backup available. Once all damaged files in the SYSTEM tablespace are restored, run the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover tablespace system;
Media recovery complete.
Once this command has completed, the SYSTEM tablespace will be recovered to the time of failure.

If it completes successfully, and no other data files are damaged, return to Step 10. For more information about the recover tablespace command, read the earlier section "How media recovery works" at the end of Step 10. If there are other data files to recover, proceed to Step 13.

Step 13: Damaged Non-System Data File?

So far, we have mounted the database, which proves that the control files are okay. It may have taken some effort if one or more of the control files were damaged, but it succeeded. We have also verified that the SYSTEM tablespace is intact, even if it required a restore and recovery. Most of the rest of this procedure concentrates on disabling damaged parts of the database so that it may be brought online as soon as possible. The process of elimination will identify all damaged data files once the database is opened successfully. They can then be easily restored.
If there are damaged data files that are not part of the SYSTEM tablespace, proceed to Step 14. If there are no more damaged data files, proceed to Step 17.

Step 14: Take Damaged Data File Offline

To open a database with a damaged, non-system data file, take the data file offline. (If the file that is taken offline is part of a tablespace that contains rollback segments, there will be one other step, but we'll cross that bridge when we come to it.)

If this instance is operating in ARCHIVELOG mode, just take the data file offline. It can be restored and recovered later, after the instance has been brought online. The command to do this is:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database datafile 'filename' offline;
If the instance is operating in NOARCHIVELOG mode, that's a different problem. Oracle does not allow the data file to be taken offline, because it knows it can't be brought back online without media recovery. Without ARCHIVELOG mode, there is no media recovery. The only thing Oracle does allow is to drop the data file entirely. This means, of course, that the tablespace that contains this file will have to be rebuilt from scratch. This is but one of the many reasons why a production instance should not be operating in NOARCHIVELOG mode. The command to do this is:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database datafile 'filename' offline drop;
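If there is any doubt about which mode the instance was running in, the log mode can be checked on the mounted, closed database (the LOG_MODE column of v$database shows ARCHIVELOG or NOARCHIVELOG):

SVRMGR > select log_mode from v$database;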

Once any damaged files are taken offline, return to Step 10 and attempt to open the database again.

Step 15: Were Any Data Files Taken Offline?

Perform this step only if the database has been opened.

This step is really a very simple question!

If the database was opened without taking any data files offline, proceed to Step 29. If some data files were taken offline to open the database, proceed to Step 16. If unsure, proceed to Step 16.

Step 16: Bring Data File(s) Back Online
First, find out which data files were taken offline. To do this, run the following command:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$datafile where status = 'OFFLINE' ;
NAME
------------------
/db/oracle/a/oradata/crash/temp01.dbf
/db/oracle/a/oradata/crash/tools01.dbf
/db/oracle/a/oradata/crash/users01.dbf
Restore the damaged data files

Once the names of the data files that need to be restored are determined, restore them from the latest available backup. Once they are restored, recovery within Oracle can be accomplished in three different ways, which vary greatly in complexity and flexibility. Examine the following three media recovery methods and choose whichever one is best for you.

Datafile recovery

If there is a small number of data files to recover, this may be the easiest option. As each file is restored, issue the recover datafile command against it and then bring it online:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover datafile 'datafile_name' ;
Statement processed.
SVRMGR > alter database datafile 'datafile_name' online ;
Statement processed.
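If several files must be recovered this way, the same session can be scripted. The sketch below assumes the restored file names were saved one per line in a hypothetical list file, /tmp/damaged.txt; the set autorecovery on setting tells the recover command to apply the suggested logs without prompting:

$ for f in `cat /tmp/damaged.txt`
> do
> svrmgrl <<EOF
> connect internal;
> set autorecovery on
> recover datafile '$f';
> alter database datafile '$f' online;
> EOF
> done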

The downside to this method is that media recovery may take a while for each data file. If you are recovering multiple data files within a single tablespace, this probably wastes time.
Tablespace recovery

This is the hardest of the three methods, but it may work faster than the previous method if there are several damaged data files within a tablespace. If forced to leave the partially functional database open while recovering the damaged data files, and there are several of them to recover, this is probably the best option.

First, find out the names of all damaged data files and the tablespaces to which they belong. Since the database is now open, this can be done in one step, demonstrated in Figure G:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select file_name, tablespace_name from dba_data_files;
Statement processed.
FILE_NAME
TABLESPACE_NAME
--------------------------------------------------------------------------------
------------------------------
/db/oracle/a/oradata/crash/users01.dbf
USERS
/db/oracle/a/oradata/crash/tools01.dbf
TOOLS
/db/oracle/a/oradata/crash/temp01.dbf
TEMP
/db/oracle/a/oradata/crash/rbs01.dbf
RBS
/db/oracle/a/oradata/crash/system01.dbf
SYSTEM
/db/oracle/a/oradata/crash/test01.dbf
TEST

Figure G: Listing of dba_data_files
The only problem with this output is that it's not very easy to read, and it could be impossible to read if there are hundreds of data files. One way to make it easier to read is to modify the command, as shown in Figure H:

$ svrmgrl <<EOF |sed 's/  */ /' |sort >/tmp/files.txt
connect internal;
select file_name, tablespace_name from dba_data_files;
quit;
EOF
$ grep '^/' /tmp/files.txt
/db/oracle/a/oradata/crash/rbs01.dbf RBS
/db/oracle/a/oradata/crash/system01.dbf SYSTEM
/db/oracle/a/oradata/crash/temp01.dbf TEMP
/db/oracle/a/oradata/crash/test01.dbf TEST
/db/oracle/a/oradata/crash/tools01.dbf TOOLS
/db/oracle/a/oradata/crash/users01.dbf USERS

Figure H: Readable listing of data files
This way, the files are sorted in alphanumeric order, making it easy to find the necessary file(s).
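With /tmp/files.txt in this form, the tablespace for any damaged data file can be looked up directly, and the list of affected tablespaces built in one line:

$ grep '^/db/oracle/a/oradata/crash/users01.dbf' /tmp/files.txt
/db/oracle/a/oradata/crash/users01.dbf USERS
$ grep '^/' /tmp/files.txt | awk '{print $2}' | sort -u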

Once all of the data files are restored, and the names of all the tablespaces that contain them have been determined, issue the recover tablespace command against each of those tablespaces. Before doing so, however, each of those tablespaces must be taken offline, as shown in Figure I:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter tablespace tablespace_name1 offline;
Statement processed.
SVRMGR > recover tablespace tablespace_name1 ;
ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Auto
Log applied
Media Recovery Complete
SVRMGR > alter tablespace tablespace_name1 online;
Statement processed.
SVRMGR > alter tablespace tablespace_name2 offline;
Statement processed.
SVRMGR > recover tablespace tablespace_name2 ;
ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Auto
Log applied
Media Recovery Complete
SVRMGR > alter tablespace tablespace_name2 online;
Statement processed.

Figure I: Tablespace-based recovery
It's obvious that this method is quite involved! It's not pretty and it's not easy, but it allows recovery of multiple tablespaces while the instance continues to operate. If a partially functioning database is of any value to the users, this method may be their best friend.
Database recovery

This is actually the easiest method, but it requires that the database be shut down to perform it. After restoring all the data files that were taken offline, close the database and issue the recover database command.

Once all the database files are restored, issue the commands shown in Figure J:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database close ;
Statement processed.
SVRMGR > recover database ;
ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Auto
Log applied
Media Recovery Complete
SVRMGR > alter database open ;
Statement processed.

Figure J: Normal database recovery
To make sure that all tablespaces and data files have been returned to their proper status, run the commands shown in Figure K:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name, status from v$datafile ;
NAME
STATUS
--------------------------------------------------------------------------------
-------
/db/oracle/a/oradata/crash/system01.dbf
SYSTEM
/db/oracle/a/oradata/crash/rbs01.dbf
ONLINE
/db/oracle/a/oradata/crash/temp01.dbf
ONLINE
/db/oracle/a/oradata/crash/tools01.dbf
ONLINE
/db/oracle/a/oradata/crash/users01.dbf
ONLINE
/db/oracle/a/oradata/crash/test01.dbf
ONLINE
6 rows selected.
SVRMGR > select member, status from v$logfile
MEMBER
STATUS
--------------------------------------------------------------------------------
-------
/db/oracle/a/oradata/crash/redocrash01.log
/db/oracle/b/oradata/crash/redocrash01.log
/db/oracle/a/oradata/crash/redocrash02.log
/db/oracle/b/oradata/crash/redocrash02.log
/db/oracle/a/oradata/crash/redocrash03.log
/db/oracle/c/oradata/crash/redocrash03.log
6 rows selected.
SVRMGR > select * from v$controlfile;
STATUS
NAME
-------
--------------------------------------------------------------------------------
/db/oracle/a/oradata/crash/control01.ctl
/db/oracle/b/oradata/crash/control02.ctl
/db/oracle/c/oradata/crash/control03.ctl
3 rows selected.

Figure K: Obtaining the names of all data files, control files, and log files
The example above shows that all data files, control files, and log files are in good condition. (In the case of the log files and control files, no status is good status.)

Once any data files that were taken offline have been restored and recovered, proceed to Step 29.

Step 17: Is There a Damaged Log Group?
When we refer to a damaged log group, we mean that all members of a log group are damaged. If at least one member of a mirrored log group is intact, Oracle opens the database and simply puts an error message in the alert log. However, if all members of a log group are damaged, the database will not open, and the error will look something like this:

ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash02.log'

If there is no error like this, there is no damaged log group. Proceed to Step 18.
The first thing that must be determined is the status of the damaged log group. The three possibilities are current, active, and inactive. To determine the status of the damaged log group, run the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select group#, status from v$log;

The output looks something like this:

GROUP#     STATUS
---------- ----------------
1          INACTIVE
2          CURRENT
3          ACTIVE
3 rows selected.
The example above shows that log group 1 is inactive, group 2 is current, and group 3 is active. What follows is an explanation of the different statuses and how they affect the recovery.

Current

The current log group is the one to which Oracle was writing when the failure occurred. It will still be listed as current until the database is brought online and a log switch occurs.

Active

An active log group is usually the log group that Oracle just finished writing to. However, until a checkpoint occurs, this group is still needed for media recovery. Since a log switch always forces a checkpoint, a status of active is actually very rare. In fact, the only way to see this status (before the system crashed) is to run the above command while a checkpoint is in progress. (In a properly tuned database, this is a very short period of time.)

Inactive

An inactive log group is one that is not being used by Oracle in any way.

To determine what action to take next, first note the number of the log group whose log files are damaged. In the example error above, it reads open failed for members of log group 2. Reference this number against the log groups listed by the select group#, status from v$log command. In the example above, log group 2 was current at the time the database crashed.
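Since the error message gives only the group number, it can also save a step to list each member file alongside its group's status in one query -- a standard join of v$log and v$logfile on the mounted, closed database:

SVRMGR > select l.group#, l.status, f.member from v$log l, v$logfile f where l.group# = f.group#;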

If the damaged log group was current, proceed to Step 22. If it was active, proceed to Step 25. If it was inactive, proceed to Step 27.

Step 18: Are Any Rollback Segments Unavailable?

If a rollback segment is damaged, Oracle will complain when attempting to open the database. The error looks like the following:

ORA-01545: rollback segment 'USERS_RS' specified not available
Cannot open database if all rollback segments are not available.

If you haven't already read the note about damaged rollback segments in Step 10, do so now.

If the preceding error is displayed when attempting to open the database, proceed to Step 19. If not, return to Step 10.
Step 19: Does the Database Need to Be at Least Partially Up ASAP?

Because of the unique nature of damaged rollback segments, there are two choices for recovery. The first is to get the database open sooner, but it may remain only partially functional for a longer period of time. The second choice takes a little longer to open the database, but when it does open, it will not be missing the data files needed by this rollback segment. Which is more important: getting even a partially functional database open as soon as possible, or not opening the database until all rollback segments are available? The latter is more prudent, but the former may be more appropriate to the environment.
If the database needs to be partially open ASAP, proceed to Step 21. If it's more important to make sure all rollback segments are available prior to opening the database, proceed to Step 20.

Step 20: Recover Tablespace Containing Unavailable Rollback Segment

Perform this step only if directed to do so by Step 19.

The first thing that must be determined is which tablespace the damaged rollback segment is in. Unfortunately, there is no fixed view that contains this information. That means that it will have to be discovered through common sense and deduction. First, remember that this error is not displayed unless a data file has been taken offline. To get a complete list of files that were taken offline, run the following command on a mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select TS#, name from v$datafile where status = 'OFFLINE' ;
TS#        NAME
---------- --------------------------------------------------------------------------------
5          /db/oracle/a/oradata/crash/test01.dbf
1 row selected.
Then find out the name of the tablespace that contains this data file:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$tablespace where TS# = '5' ;
NAME
--------------------------------------------------------------------------------
TEST
1 row selected.
That was too easy!

Admittedly, the previous example was easy. There was only one data file offline, which made finding its tablespace pretty easy. What if there were multiple offline data files, contained within multiple tablespaces? How would we know which one contains the rollback segment? Unfortunately, there is no way to be sure while the database is closed. That is why it is very helpful to put rollback segments in dedicated tablespaces with names that easily identify them as such. It's even more helpful if the data files are named something helpful as well. For example, create a separate tablespace called ROLLBACK_DATA, and call its data files rollback01.dbf, rollback02.dbf, and so on. That way, anyone who finds himself in this scenario will know exactly which data files contain rollback data.
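For example (the size and path here are illustrative only), such a dedicated tablespace might be created with:

SVRMGR > create tablespace rollback_data datafile '/db/oracle/a/oradata/crash/rollback01.dbf' size 50M;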

The rest of this step is simple. Restore any files that were taken offline, and use either the recover datafile or recover tablespace command to roll them forward in time. If there are only one or two damaged data files, it's probably quicker to use the recover datafile command. If there are several damaged data files, especially if they are all in one tablespace, the recover tablespace command is probably easiest. Either way will work.
Once any data files that contain rollback segments have been restored and recovered, return to Step 10 and attempt to open the database again.

Step 21: Comment Out Rollback Segment Line(s) in the init.ora File

Perform this step only if directed to do so by Step 19.

There is a quicker way to open the database with damaged rollback segments. In order for Oracle to know what rollback segments to look for, the following line is inserted into the initORACLE_SID.ora file:

rollback_segments = (r01,r02,r03,r04,users_rs)

(The initORACLE_SID.ora file is usually found in $ORACLE_HOME/dbs.) Since the example error above says that it is the USERS_RS rollback segment that is unavailable, simply delete that part of the line. It is wise, of course, to comment out and copy the original line. First, shut down Oracle completely (this includes unmounting it as well). Then copy and comment out the rollback segment line in the initORACLE_SID.ora file:

#rollback_segments = (r01,r02,r03,r04,users_rs)
rollback_segments = (r01,r02,r03,r04)

Once this change has been made in the initORACLE_SID.ora file, return to Step 1 to mount the database.
Step 22: Is the Current Online Log Damaged?

Perform this step only if instructed to do so by Step 17. If not, return to Step 17 now.

If the current online log group is damaged, there will be a message like the following when attempting to open the database:

ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash02.log'

In the example above, a select group#, status from v$log command would also have shown that log group 2 was CURRENT at the time of failure.

This is the worst kind of failure to have, because there will definitely be data loss. That is because the current online log is required to restart even a fully functioning database. The current control file knows about the current online log and will attempt to use it. The only way around that is to restore an older version of the control file. Unfortunately, you can't restore only the control file, because the data files would then be more recent than the control file. The only remaining option is to restore the entire database.
For the procedure to restore the entire database, proceed to Step 23.

Step 23: Recover All Database Files from Backup

Warning! There are only two reasons to perform this step. The first is if instructed to do so by Step 22. The other is if there was an unsuccessful attempt to open the database after performing either Step 26 or Step 28. This step is the most drastic method of recovery and should not be performed unless absolutely necessary.

Perform this step only after verifying (or rebuilding or restoring) the control files, and verifying that all members of the current online log group are damaged. This step is relatively easy. Simply determine the names and locations of all of the data files and restore them from their latest backup.

Warning! Restore only the data files, not the control files. Do not restore or overwrite the control files unless instructed to do so by Step 9!

To determine the names of all the data files, run the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$datafile ;

Once all data files are restored, proceed to Step 24.
Step 24: Alter Database Open Resetlogs

Warning! Perform this step only if instructed to do so by Step 23. This is another drastic step that should be performed only if necessary!

This command causes Oracle to open the database after clearing all contents of the online redo log files. Since there is no way to undo this step, it is a good idea to make copies of the online redo log files now. To find out all their names, run the following command on a mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select member from v$logfile ;

To create an "undo" option, copy each of these files to filename.bak.
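As a sketch, if the member names reported by the query above were saved one per line to a hypothetical scratch file, /tmp/logs.txt, the copies could be made with a loop:

$ for f in `cat /tmp/logs.txt`
> do
> cp $f $f.bak
> done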

After making a backup of the online redo log files, run the following command on a mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database open resetlogs ;
Statement processed.
If the database opens, congratulations!

Make a backup of this database immediately, preferably with the database shut down. That is because Oracle cannot roll through this point in time using the redo logs. Oracle must have a full backup taken after the open resetlogs in order to restore this database using any logs that are made after the open resetlogs was performed.

Once that backup is complete, you're done!
Step 25: Is an Active Online Redo Log Damaged?

Perform this step only if instructed to do so by Step 17. If not, return to Step 17 now.

If an ACTIVE online log group is damaged, there will be a message like the following when attempting to open the database:

ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash02.log'

In the example above, a select group#, status from v$log command would also have shown that log group 2 was ACTIVE at the time of failure.

Remember that an ACTIVE log is one that is still needed for recovery. The reason that it is still needed is that a checkpoint has not yet flushed all changes from shared memory to disk. Once that happens, this log will no longer be needed.
To perform a checkpoint, proceed to Step 26.

Step 26: Perform a Checkpoint

The way to attempt to recover from the scenario in Step 25 is to perform a checkpoint. If it is successful, the database should open successfully. To perform a checkpoint, issue the following command on the mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter system checkpoint local ;
Statement processed.
Be patient. The reason that there is an ACTIVE log group is that the checkpoint took a long time in the first place. Wait for Oracle to say that the checkpoint succeeded or failed. If it succeeded, Oracle will simply say, "Statement processed." If it failed, there could be any number of Oracle errors.

After issuing the checkpoint, even if it was unsuccessful, return to Step 10 and attempt to open the database. If this attempt fails, return to Step 23 and recover the entire database.

Step 27: Is an Inactive Online Redo Log Damaged?

Perform this step only if instructed to do so by Step 17. If not, return to Step 17 now.

If an INACTIVE online log group is damaged, there will be a message like the following when attempting to open the database:

ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash02.log'
In the example above, a select group#, status from v$log command would also have shown that log group 2 was INACTIVE at the time of failure.

In comparison, this one should be a breeze. An INACTIVE log is not needed by Oracle. If it is not needed, simply drop it and add another in its place.
To drop and add an INACTIVE log group, proceed to Step 28.

Step 28: Drop/Add a Damaged, INACTIVE Log Group

Perform this step only if instructed to do so by Step 27.

In all the above examples, the damaged log group was group 2. Before we drop that group, we should make sure that we can add it back easily. Ensure that all the original redo log locations are still valid. To do this, get the names of all of the members of that log group:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select member from v$logfile where GROUP# = 2 ;

For this example, Oracle returned the values:

/logs1/redolog01.dbf
/logs2/redolog01.dbf
/logs3/redolog01.dbf
Verify that all these files' locations are still valid. For this example, assume /logs3 is completely destroyed, and we are relocating all of its contents to /logs4. Therefore, the future members of log group 2 will be /logs1/redolog01.dbf, /logs2/redolog01.dbf, and /logs4/redolog01.dbf.

To drop log group 2, issue the following command on a mounted, closed database:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database drop logfile group 2 ;
Statement processed.

Once that command completes successfully, add the log group back to the database. To do this, issue the following command (remember that we have replaced /logs3/redolog01.dbf with /logs4/redolog01.dbf):

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database add logfile group 2 ('/logs1/redolog01.dbf', '/logs2/redolog01.dbf', '/logs4/redolog01.dbf') size 500K ;
Statement processed.

Once this command completes successfully, return to Step 10 and attempt to open the database.
Step 29: Were Any Rollback Segment Lines Changed in init.ora?

There was an option in Step 19 to comment out rollback segments from the initORACLE_SID.ora file. If that option was taken, there should be lines in that file that look like the following:

#rollback_segments = (r01,r02,r03,r04,users_rs)
rollback_segments = (r01,r02,r03,r04)

If any rollback segments were taken offline, proceed to Step 30. If not, back up the database now. You're done!
Step 30: Return Offline Rollback Segments to Normal Condition

To check which rollback segments are offline, run the following command:

SVRMGR > select segment_name from dba_rollback_segs where status = 'OFFLINE' ;
SEGMENT_NAME
------------------------------
USERS_RS
1 row selected.
Since all data files and redo log files should be recovered by now, just return the offline rollback segments to an online status:

$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter rollback segment users_rs online ;
Statement processed.

Once this has been completed, make sure that any commented lines in the initORACLE_SID.ora file are returned to their original condition. The example in Step 29 used the suggested method of commenting out the original line and changing a copy of it. Return the line in initORACLE_SID.ora to its original condition:

rollback_segments = (r01,r02,r03,r04,users_rs)

This ensures that the next time this database is opened, the USERS_RS rollback segment will be used.

You're done!
If you've made it this far, you're +done! All data files, control files, and log files should be online. Take +a backup of the entire database immediately, preferably a cold one with +the database down. If that can't be done, then perform a hot backup. + + + diff --git b/read-tape.sh a/read-tape.sh new file mode 100755 index 0000000..a9ba650 --- /dev/null +++ a/read-tape.sh @@ -0,0 +1,66 @@ +#!/sbin/sh + +touch rawfile +# The rawfile might already be there, but just in case +while [ 1 -eq 1 ] # Whatever, just +do it + +do + +size=`ls -l rawfile | awk '{print $5}'` # Speaks +for itself + +blocks=`expr "$size" / 512` # Ditto. 512 was +a good blocksize for 4mm DAT. Just be consistent + +full=`df -k . | grep + | awk '{print $6}'` # Unfortunately, this only gets +checked once per glitch. Maybe a fork? + +echo $size +# Just so I know how it's going + +echo $blocks + +echo $full + +if [ $full -gt +90 ] + +then + + echo "filesystem is filling up" # You +get the point here + + exit 1 + +fi + +mt -f /dev/tape rewind +# Let's not take chances. Start at the beginning. + +sleep 60 +# The drive hates this tape as it is. Give it a rest. + +mt -f +/dev/rmt/tps1d6nrv fsr $blocks # However big rawfile is +already, we can skip that on the tape + +dd if=/dev/rmt/tps1d6nrv bs=512 >> +rawfile # Let's get as much as we can + +if [ $? -eq 0 ] + +then +# If dd got clipped by a tape error, there's still work to do, + + +echo "dd exited cleanly" # if not, it must have +gotten to the end of the file this time + + exit 0 +# without a hitch. We're done. + +fi + +done \ No newline at end of file diff --git b/rfitable.html a/rfitable.html new file mode 100755 index 0000000..a7e2de5 --- /dev/null +++ a/rfitable.html @@ -0,0 +1,2659 @@ + + + + + + Backup Software Request for Information + + +Backup Software +RFI +

The questions here are designed to help you find out as much as possible about a backup software product. The questions are not perfect, and any opinions that may be implied by their wording are unintentional. Please use this only as an information-gathering tool, and form your own opinions!

It is my hope to have a "live" version of this RFI available on http://www.backupcentral.com as soon as possible.
1.What is the name of this product?
2.Some backup products consist of a base product, then several "add-ons" for which the customer must pay extra. Please list all add-ons that this product has, and what additional benefits they provide.
3.Please list any third-party products that are required to use the add-ons above (e.g., St. Bernard).
4.For these questions, a "raw device" is defined as a disk, a slice of a disk, or a virtual disk composed of several disks (e.g., a RAID device). THIS DISK DOES NOT CONTAIN A FILE SYSTEM. It does contain some other type of structured data, though; for example, it may be a raw partition containing Oracle or Sybase data. If I shut down Oracle or whatever is writing to this raw partition, can this product back up that raw device?
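As a point of reference, a cold backup of such a raw device with nothing but native tools might look like the following sketch, after first shutting down the application (Oracle here) that writes to the partition; the device names are hypothetical:

# Stop the application writing to the raw partition, then copy the
# partition straight to tape.
$ dd if=/dev/rdsk/c0t1d0s4 of=/dev/rmt/0n bs=64k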
5.One problem with raw devices is that backup products see them as one big file, instead of thousands of blocks of data. Any backup of a raw device is therefore typically a full backup. However, a backup product could keep track of each block of data on the raw device, and back up only those blocks that have changed. This would allow a true incremental backup of this device. Can this product perform an incremental backup (as defined here) of a raw device (as defined above)?
6.If so, does it keep track of block contents by generating a [C] checksum for each block, by doing a [B] block comparison to an existing full backup, or by [A] analyzing the structured data on the raw device in some way (e.g., understanding how Oracle structures data on a raw device)?
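As a crude illustration of the [C] checksum approach, one could script it with dd and sum. This is only a sketch (the device name, device size, and 512-byte block size are all assumptions); real products track block maps internally:

#!/bin/sh
# Checksum each block of a raw device, compare with the previous run,
# and save only the blocks that changed.
DEV=/dev/rdsk/c0t1d0s4
NBLOCKS=204800                    # size of the device, in 512-byte blocks
touch sums.old

i=0
while [ $i -lt $NBLOCKS ]
do
    echo "$i `dd if=$DEV bs=512 skip=$i count=1 2>/dev/null | sum`"
    i=`expr $i + 1`
done > sums.new

# Any block whose checksum differs from last time gets backed up.
diff sums.old sums.new | awk '/^>/ {print $2}' > changed_blocks
while read b
do
    dd if=$DEV bs=512 skip=$b count=1 2>/dev/null
done < changed_blocks > incremental.blocks
mv sums.new sums.old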
7.If a +raw device (as defined above) is bigger than a tape, can this product spread +its backup over more than one tape?
8.***LARGE SYSTEMS/FILES*** Some operating systems now support large file systems (>1TB) and large files (>4GB). If the operating system supports large files (>4GB), does this product support backing them up?
9.If the +operating system supports large file systems (>1TB) does this product support +backing them up?
10.If a +file system is bigger than a tape, can this product spread its backup over +more than one tape?
11.If a +FILE is bigger than a tape, can this product spread its backup over more +than one tape?
12.***MANY SYSTEMS TO ONE DEVICE*** Streaming tape drives work well only if you provide them with a constant stream of data equivalent to their maximum transfer rate. In order to do this, it is often necessary to create multiple sources (or "threads") of data to the same drive simultaneously to keep it streaming. This is usually done by backing up several disks or systems to the same device simultaneously. Can this product back up several systems at once, and send them to the same tape drive simultaneously?
13.If the answer to the previous question was yes, each source of data getting backed up to this device would be called a thread. What is the greatest number of threads that can be sent to the same device at once [integer], or is this number [U] unlimited?
14.What is the maximum number of threads that can run at once per backup server [integer], or is this number [U] unlimited?
15.What is the maximum number of clients that can be backed up at once per backup server [integer], or is this number [U] unlimited?
16.What is the maximum number of tape drives that can be used simultaneously [integer], or is this number [U] unlimited?
17.How does +it determine how many "threads" to send to the device? Is it [D] dynamically +determined based on how well a given device is streaming compared to its +maximum throughput? Is it a [G] global value set by the administrator (i.e. +all devices get three threads each)? Or is the administrator able to determine +the maximum number of threads for each device [I] individually (i.e. this +device gets three threads, this one gets four, etc.)?
18.For this +RFI, this feature (of sending multiple threads to a device) will be called +concurrency. This is either done on the [F] file level or [R] record level. +File level concurrency does not split up the file. These products interleave +[F] files from different hosts (or different parts of the same host) together. +Other products do this by splitting up the file into multiple blocks or +[R] records, which are sent to the tape drive asynchronously. If this product +supports concurrency, is it [F] file-level or [R] record-level concurrency, +or [B] both at the option of the administrator?
19.If done +on the record level, will all the pieces of a given file normally go to +the same tape [Y] (unless EOT is reached), or are they automatically spread +across multiple tapes [N]?
20.If so, +is the number of tapes that this file will be split across configurable +by the administrator?
21.***ONE +SYSTEM TO MANY DEVICES*** (See Note 1 at the end of the RFI.) Can an administrator +using this product take a big host, split it up into smaller subsets and +back it up to multiple tape drives simultaneously (assuming the administrator +schedules it to do so)?
22.Can this +product do the above without requiring an administrator to create the logical +subsets? In other words, can the administrator simply tell this software +"backup apollo?" The product would then automatically start multiple simultaneous +threads on apollo during the backup, each of which would back up a subset +of the system, such as a file system. The product would then automatically +send these threads to multiple tape drives simultaneously. Can this product +automatically start multiple simultaneous threads per client?
23.If this +product has this capability, does it have logic in place to NOT start more +than one thread per file system at a time?
24.If so, +this could cause a limitation with very large file systems. However, if +this file system is sitting on a virtual partition consisting of several +physical disks striped together, it could easily support more than one +thread at a time. Does this product notice when a file system is residing +on more than one disk, and then automatically generate more than one thread +for that file system?
25.Another +way to solve the problem above would be to allow the administrator to say +that a given file system can support X number of threads. Does this product +allow the administrator to specify this value on a file system level?
26.If this +product can automatically start multiple simultaneous threads per client, +is the number of threads and drives being used [D] dynamic or [S] static? +(E.g. If the product is splitting a large host's backup over 5 drives, +and another drive becomes available, it will start using that drive if +drive allocation is dynamic. If drive allocation is static, it will continue +using 5.) Is drive allocation [D] dynamic or [S] static?
27.Another +difficult problem to solve is that of a very large raw device (as defined +above in the raw device section) which must be backed up in a short amount +of time. If this product is able to back up raw partitions, can it also +start multiple simultaneous threads per raw device (This would be independent +of a program like Backup Server or Oracle's EBU. An example of how this +might be used is using a script to shut down the Oracle instance, and then +using this feature to back up its raw devices in parallel as fast as possible +to multiple tape drives. If this product could do this, the administrator +could back up relatively large Oracle databases without having to buy an +EBU interface, as long as they could afford a little down time.) Can this +product do this?
28.Suppose +there was a very large file. Can this product start multiple simultaneous +threads for this file, sending these threads to multiple tape drives SIMULTANEOUSLY?
29.What +is the maximum number of threads that each client can generate at once? 
30.***REDUCING +NETWORK TRAFFIC*** Does this product allow the administrator to specify +how many Mb/s they want to send out on the network?
31.If the +answer to the above is yes, can this value be changed based on different +days, or times of the day?
32.If this product can change its use of the network, can it do so based on observed network traffic [V] volume, [T] throughput, or [L] latency [V, T, L]? (Multiple possible answers. Answer with all that apply.)
33.This +question is intended to determine what type(s) of backup architectures +this product supports. (See note 2 at the end of the RFI that defines four +architecture types.) Which architecture(s) does this product support: [S] +Standalone, [2] two-tier, [3] three-tier or [W] web administration (Multiple +possible answers. Answer with all that apply.)?
34.Assume +that this product supports device servers (as defined by Note 2). Will +this product allow the administrator to configure the backup definitions +in such a way that apollo's backup should normally go to a certain device +server?
35.Can this +product separate backups per subnet? (I.e. If the administrator desires, +systems on a subnet would normally backup to tape drives on that subnet.)
36.Does +this product have an algorithm that tries to determine the closest possible +server to send a given client's backups to (E.g. It might try to find a +device server on the same subnet first.) ?
37.Some +tape libraries have external SCSI connectors for each device (e.g. ATL/Odetics). +Therefore each tape drive could conceivably be connected to different systems. +Does this product allow putting each of these devices on separate hosts, +and still controlling this as one library?
38.If so, +does this library still cost the same in licensing?
39.Does +this product support compressing the files on the client before sending +them to a remote tape drive, WITHOUT MODIFYING THE ORIGINAL FILES? (It +has been suggested that one could run the compress command on the entire +file system prior to backing it up. That is not what I mean when I say +software compression.) Does this product support software compression?
40.If this product supports software compression, is this compression done by passing the data through a stream (or pipe) [S], or is each file compressed to a temporary location, such as /tmp, and then sent to the tape drive [T]? Some products do not pass the data through a compressing stream, and do not use a temporary disk location, but will still compress each file in [M] memory as it is on its way to the backup device (assuming memory is big enough to hold the file). Answer with the appropriate letter. Does this product use [S] stream compression, a [T] temporary file location, or [M] memory-based compression?
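The [S] stream style can be pictured with native tools alone (purely illustrative; the host "elvis" and the tape device are made up):

# Compress in a pipe on the client; no temporary file is written and the
# original files are never modified.
$ dump 0f - /home1 | compress | rsh elvis dd of=/dev/rmt/0n bs=32k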
41.If this +compresses files to a temporary location, what happens if that temporary +location's file system fills up? Does this product have a work around for +that?
42.***STANDARD +OR UNIQUE BACKUP FORMAT*** For this RFI, "unique backup format" is defined +as a format that cannot be read with native utilities, such as dump/restore, +ufsdump/ufsrestore, Microsoft Backup, tar or cpio. Does this product use +a standard backup format [Y], or a unique backup format [N]?
43.If so, +which does it use: [C] cpio, [T] tar, [D] dump, [M] Microsoft?
44.If not, +can a user choose not to use this unique format, and use a standard instead +(e.g. dump, tar, cpio)?
45.If it +uses (or allows the use of) a standard format, and one backs up multiple +hosts to the same device simultaneously, is the tape still readable by +the native utility?
46.If this +product allows using standard utilities, has it overcome their limitations +(e.g. 256 character path name in tar, no special files in tar and cpio, +file systems changing underneath dump)?
47.Can this product write SIDF (System Independent Data Format) compliant tapes?
48.Can this product read tapes written in SIDF?
49.Does this product have any plans to [W] write or [R] read the SIDF (Multiple possible answers.)?
50.***NETWORK +DISK DRIVES*** Can this product backup a file system mounted via NFS?
51.If so, +can it back up or skip selected NFS partitions [Y], or must you exclude +or backup all NFS partitions [N]?
52.Can this +product back up drives on a NetWare server that are really network mounted +from another NetWare server (i.e. data is passed NW server -> NW server +-> BU system)?
53.Once +again, can it backup or skip selective NetWare network file systems [Y], +or must one exclude/include all of them [N]?
54.Can this +product back up network drives that have been mounted via the SMB (A.K.A. +CIFS) protocol (e.g. NT network drives mounted onto a UNIX server using +SAMBA, an NT drive network mounted to another NT server, or an OS/2 network +mounted drive)?
55.Once +again, can it back them up selectively [Y], or must an administrator exclude/include +all of them [N]?
56.Let's say that I am already using this product to do a full backup of a particular system every week. Assume I want to save one of these full backups every month, and keep its data in the online catalog for 5 years, while allowing the other three full backups to expire. Can one of the regular weekly full backups be saved automatically for this period of time [Y], or must I run a separate backup that will be kept for a long period of time [N]? (In other words, is the retention period a value that can change per backup definition [Y], or is it a global value that always stays the same [N]?)
57.If this +product does not perform regularly occurring full backups, can it create +a full backup of a system at any point in time by copying the data from +the appropriate tapes?
58.***STORAGE +MANAGEMENT*** For this question, archiving is defined as saving a group +of files together using a definition that can be searched on years later. +Years from now, I will not know the name of the system that contained the +files for a given project, but hopefully I will know the project name from +which they came. So, can I create a backup definition that contains all +the data for a given project, back up that data using that definition, +then retrieve that data years from now using that project name?
59.Is archive +data stored on separate tapes from regular backups?
60.HSM (Hierarchical +Storage Management) is defined as a process that proactively monitors my +file systems based on parameters that I have set, such as access time. +It then proactively moves them off to less expensive media, such as tape, +and leaves behind 'ghost' files that, if accessed, cause the files to be +restored by the HSM system. A true HSM would cause them to be restored +automatically. Does this product have an HSM module?
61.Can this +product's HSM module work with [A] any type of file system that the client +supports, or does it require a [S] special type of file system, such as +Veritas' vxfs?
62.Are files +that are archived by the HSM system put on tape multiple times before they +are removed from disk?
63.Is an +HSM-migrated file automatically restored if it is accessed?
64.Is the +HSM user given a choice of restoring the file? In other words, if a user +looks at a file, does the user have a choice to restore it [Y], or will +it automatically be restored [N]?
65.Is HSM +data stored on special tapes that only store HSM data [Y], or are they +stored on the same tapes as normal backup data [N]?
66.All products +that read a Unix file through the file system will change its atime. If +this product backs up through the file system, does it reset the atime +of a file after it backs it up?
67.One of +the downsides of resetting atime is that it will automatically change ctime. +Some systems, such as a firewall, might value ctime more than atime. If +this product supports resetting atime, can the administrator choose NOT +to reset it?
68.If yes, +is this option on the backup definition level [B], or is it a global value +resulting in all atime's being reset or not [G]?
69.A "snapshot" +is a picture of a system in one instant of time. To be useful, a snapshot +takes only a few seconds to make, and can then be backed up by a backup +product. It allows the product to take many hours to back up a system, +but creates a backup that looks as if it was done all at the same instant. +(e.g. Snapshot from Network Appliance, or Veritas' vxfs) Does this product +support using any snapshot software products to automatically get consistent +backups of large file systems?
70.If the +answer to the above question is yes, please list the snapshot products +that this product can automatically use between this and the next question.
71.If a +backup fails, can this product restart from that point [Y], or must it +go back to the beginning [N]?
72.Can this +product append to tapes that have already been used?
73.Can this +product automatically reuse tapes that have expired?
74.***RUNNING USER SCRIPTS*** Can this product run a shell script or batch program automatically before and after a backup?
75.When that shell script fails, does that backup automatically stop [S], continue on anyway [C], or are its reactions in this scenario configurable by the administrator [A]?
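Even when a product lacks this feature, the effect can be approximated with a wrapper script such as this sketch (all paths are hypothetical, and here the administrator has chosen the [S] stop-on-failure behavior):

#!/bin/sh
# Run the site's pre-backup script; abort the backup if it fails.
/usr/local/etc/pre-backup.sh || { echo "pre-backup failed, aborting" >&2 ; exit 1 ; }
/usr/local/bin/run-backup.sh
status=$?
/usr/local/etc/post-backup.sh      # always run the post-backup script
exit $status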
76.***ADMINISTRATION*** +Regarding scheduling of full backups, backup products can be divided into +three categories. The first would be the traditional [S], "Scheduled Full +Backups." The administrator defines what days of the month or week that +each full backup runs. The second would be [A] "Automatic Full Backups." +The administrator defines that a full backup should occur every N days, +and the product automatically spreads all full backups out over that "rotation +period." The third would be [N] "no full backups." The product essentially +does one full backup, and then all backups from that point on are incremental +backups. Which category best fits this product, [S] Scheduled full backups, +[A] automatically scheduled full backups, or [N] no full backups?
77.Does +this product support multiple levels of incremental backups (e.g. 1-9)?
78.Can tapes +be automatically created for off-site storage?
79.If the +above is true, is it a copy of the data from one tape to another [Y], or +a second backup [N]?
80.Do the +off-site tapes have a unique name in the backup catalog?
81.Assume that a backup tape is made by this product, and this product is told to keep track of its data in the backup catalog for some period of time. Is this tape protected from being overwritten by this product for that period of time [Y], or is there some way to accidentally overwrite a backup tape with this product [N]? (Obviously, an administrator using a native backup utility could manually overwrite the tape, but this question is making sure that this product's tapes are safe from the product itself.)
82.Is there +some way to tell this product that a set of tapes are off-site?
83.If the +answer to the previous question is yes, what happens during a restore? +If the version of a file being asked for resides on two tapes, one of which +is off-site, will this product ask for the on-site tape first?
84.Another +method of providing off-site copies is "saveset cloning." A given night's +backups may span over many tapes, but they might fit onto one tape if combined. +Saveset cloning would mean that these several tapes can be automatically +copied onto one tape (or however many it takes) that can then be sent off-site. +This would be done automatically by the software, as defined by the administrator. +Does this product support "saveset cloning"?
85.Another +common problem with backups is that over time a number of tapes may contain +small amounts of useful data. They cannot be re-used because part of the +tape has not expired, and may not expire for a long time. However, they +contain so little unexpired data that if you put many of them together +they could fit on one tape. What some products allow you to do is to automatically +copy all the un-expired files from a number of tapes to one or more consolidated +tapes, and automatically free up, or "reclaim" the tapes from which the +files came. Does this product support "reclaiming" several tapes by automatically +copying their unexpired data onto a smaller set of tapes and expiring them?
86.Some +customers have also expressed an interest in "cycling" data that is stored +long term. This might mean moving from an older type of media to a newer +type that is faster and has more capacity. Or this could simply be moving +data from one tape to another to ensure that the media is in good condition. +Does this product have anything that could be used to accomplish such a +task?
87.Can this +product be told to consolidate a given host's backup? This is slightly +different than saveset cloning or media reclamation. The purpose of this +is to create one tape, or a small set of tapes, that would contain all +the files for a certain system. Can this product take all files for a given +system, and create a single tape, or set of tapes, that can be used to +restore that system?
88.Assume +that this product supports device servers (As defined in Note 2.), and +apollo is backing up to elvis's tape drive. What happens if elvis's drive/library +fails tonight? Can this product automatically reroute the backup from that +failed system to another device server?
89.If so, +how can an administrator determine which system's tape drive it will reroute +that backup to? Does the software [A] automatically pick one based on some +algorithm that determines the closest drive by looking at subnet information? +Does it allow the administrator to pre-define an alternate drive [M] manually? +Or in the case of a failed drive, will the backup simply go to the next +[N] available tape drive?
90.When +a product can reroute backups around a failed device, this is called fail-over +protection. Can the administrator designate more than one level of fail-over?
91.Assume +that this product has an installation with multiple master backup servers. +Does this product have a web-based management system that could manage +these servers remotely without having to login to them (i.e. without "telnet" +or "rlogin" or logging in from the system console)?
92.If the +answer to the above question is yes, what features among the following +list does this web-based administrator support: [S] Scheduling, [P] policy +enforcement, [F] additional level of fail-over, [D] automated software +distribution (Select all that apply. Multiple possible answers.)?
93.Does +this product allow the administrator to specify aliases for a given client?
94.If the +answer to the previous question is yes, how many aliases may be specified +for each client?
95.If an +administrator wants to refer to a client by its fully qualified DNS name, +can this product do so?
96.Will +this product allow NIS, DNS, or WINS (or some other hostname to address +translation program) to resolve the hostname into an IP/IPX address [Y], +or does it need to have a host's IP/IPX address in its database [N]?
97.(See Note 3 at the end of the RFI for a definition of the difference between file name expansion and regular expression syntax.) Does this product include files based on [R] regular expression syntax, [F] file-name expansion syntax, by [G] selecting them in a GUI, or [N] none of the above?
98.Can this product include files by some other means ([D] date, [F] 'find' results, [M] modes, [O] ownership)?
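To make the distinction concrete (an illustration only; the pattern and path are made up): with file-name expansion you would include /home1/*.c, while a regular expression could select the same set, perhaps via 'find' results as in question 98:

$ find /home1 -print | egrep '\.c$' > include_list    # regular expression syntax
$ ls /home1/*.c                                       # file-name expansion syntax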
99.Can this product exclude files based on [R] regular expression syntax, [F] file-name expansion syntax, by [G] selecting them in a GUI, or [N] none of the above?
100.If not, +can you exclude files by some other means?
101.Can this +product exclude special TYPES of files (e.g. core files, tmp files, etc.)?
102.Does +this product allow internal customers (as in a group of people with machines +that are being backed up with this product) to define their own backup +definitions (i.e decentralized control)?
103.Does +this product allow internal customers to schedule their own backups (i.e +decentralized scheduling)?
104.Does +this product have a default backup schedule that can be applied to new +backup definitions?
105.Can this product schedule its backups by [D] day (e.g., every Monday), day of the [M] month (e.g., the third day of the month), by [W] week (e.g., every sixth week), and [C] calendar date (e.g., July 23, 1997) (Multiple possible answers)?
106.If this +product supports remote tape drives, and it is copying data from /dev/rmt/2 +of apollo to /dev/rmt/3 on apollo, does the data traffic remain local to +apollo?
107.Does this product use the UNIX/NT/Novell login and password, or an additional password to use the administrator GUI or commands (e.g., if this product asks for a second password when someone logs into this product, then the answer to this question is NO)?
108.Once +installed, does this product require a person to have root (or NT administrator) +privileges to administer backups?
109.Can this +product give the administrator a summary report for last night's backups +[Y] or must the administrator scan several log files [N]?
110.Does +this product have a GUI that will run native on Win95 [9], Windows 3.1 +[3], Windows NT [N]?
111.Does +this product have a Motif [M] interface, Openlook [O] interface or both?
112.Does +this product have a curses (i.e. vt100 terminal menu) interface?
113.Does +this product have an HTML interface?
114.Do all +the above interfaces work essentially the same?
115.Approximately what percentage of commands that are in this product's GUI can also be done on the command line [0-100]?
116.Approximately what percentage of commands that this product requires MUST be done on the command line (e.g., some products do most of their work in a GUI, but require cron jobs to actually start the backup)?
117.Are the backups initiated by an internal scheduler of this product [I], initiated by cron (or NT & NetWare's equivalent) [C], or by a [T] third party, such as Norton Scheduler?
118.If initiated by cron, are these cron entries automatically added by the software [Y], or manually added by the administrator [N]?
119.If initiated by cron, are the cron entries only on the server [S], or are they also on the clients [N]?
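If the product relies on cron, the entry it (or the administrator) installs might look something like this purely illustrative example (the path and schedule are made up):

# Sample root crontab entry: incremental backup at 2:00 a.m., Monday through Friday
0 2 * * 1-5 /usr/local/backup/bin/backup_client incremental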
120.Is help available on-line?
121.Is the +online help in man-page [M] style, hypertext [H], Postscript [P], Adobe +PDF [A], simple text [T], or html [W] (multiple possible answers)?
122.Can this +product send messages about backups via email?
123.Does +it allow defining different email addresses for different types of messages?
124.Can this +product generate an SNMP trap?
125.If a modem is connected to the backup server, can this product send an Alpha page?
126.Does +this product support backing up a Windows client by an automatic connection +which "shares" the client's drive to the backup server (via the SMB/CIFS +protocol)?
127.If the +answer to the above question is yes, can it do this without needing the +administrator password on that client?
128.Does +this product support backing up a Windows client by installing software +on the client, communicating and transferring data via that software, and +NOT requiring the client to "share" (in Microsoft's definition of that +word) the client's drive with the backup server?
129.Does +this product require software to be installed on every client?
130.If this +product requires client side software, will all ports of client software +work with all ports of server software (e.g. an HP-UX client will backup +to an NT server, a Solaris server, or a NetWare server.)?
131.If this +product requires client side software, and the server software is upgraded, +will newer versions of the server software work with older versions of +the client software?
132.Assuming that the backup server can issue an rsh command to a new client, can the new client's software be installed automatically (e.g., the administrator only has to hit a button or run a script that says, "Which client do you want to install?")?
133.If the above is not true, is the following true? The administrator could ftp or rcp a tar file which, when un-tarred, would contain an install script. Once this install script is run, the client software will be automatically installed and started. This install script would automatically perform the steps necessary to ensure that the software will be started when the system reboots. Is this level of automation available?
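As an illustration of the rsh-based approach in question 132 (the hostnames and file names here are hypothetical):

#!/bin/sh
# Push the client package to each new client and run its install script.
for client in apollo elvis
do
    rcp client.tar $client:/tmp
    rsh $client "cd /tmp && tar xf client.tar && ./install.sh"
done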
134.Once +a client is installed and (if necessary) a daemon is running, can it be +upgraded automatically by this product (E.g. Several clients automatically +upgraded from a central point)?
135.Does +this product support installing software via the SVR4 package format on +operating systems that support that format?
136.Does this product support defining preferences by backup definition [Y], or are all preferences global [N]?
137.Can I +control all backups from one central server if I use device servers?
138.What +is the biggest block size that this product supports (in KB)?
139.Can this +product's block size be changed for performance?
140.If a +new block size is selected for performance, are the tapes made with the +old block size still readable?
141.Does +this product support using different block sizes for different types of +media?
142.Does +this product use standard device drivers?
143.Can this +product operate without kernel modifications (other than normal patches)?
144.If this +product has problems backing up a file, does it let me know?
145.Can this +product backup the whole system, just by giving its name [Y], or does the +administrator need to tell it which file systems to back up [N]?
146.Can this +product gracefully handle Physical End of Tape?
147.Please +list the reports that this product can automatically generate.
148.Does +this product have any way to give an automatic delta report when the backup +configuration is changed?
149.***SECURITY*** +Does this product communicate via sockets [Y] or rsh [N] (i.e Will it work +without .rhosts entries)?
150.Can this +product communicate via Kerberos [K] or SSH [S], or neither?
151.Can this +product back up systems that are on the other side of a firewall? (If so, +please list which firewalls this product is equipped to work with in the +comment field.)
152.If it +does require .rhosts entries, can backups be done as some user other than +root?
153.If this +product uses sockets for communication, someone might "spoof" the software, +as has been done with sendmail. This might allow them to perform an unauthorized +restore. If someone knew the socket calls that this software makes, they +could theoretically open up the appropriate port and issue those calls +manually. The only way that this could be prevented is some type of authentication. +Does this product have such authentication? (Please describe what this +product does to prevent this.) 
154.Does +this product support encryption on the tape?
155.If the +encryption key ever needs to be changed, are tapes made with the old key +still readable?
156.Does +this product support encryption while data is in transit? In other words, +the tape itself is not encrypted, but the data is encrypted while in transit +across the network? (This could allow secure backups across the Internet +or a T1 line.)?
157.***RECOVERY*** +Are the tapes made by this product platform independent? THIS IS A VERY +IMPORTANT QUESTION. In order to be considered platform independent, the +tapes made on any operating system that this product supports must be readable +by any client on any other operating system that this product supports. +(e.g. A Solaris box can read a Novell tape. A tape made on NT is readable +on an AIX box. An SGI backup tape is readable on a Novell server.) Are +this product's tapes platform independent?
158.Can a +user perform his/her own restores?
159.If so, +can the administrator turn that feature off?
160.If users +can restore their own files, and a user normally logs onto apollo, but +his home directory is NFS mounted from elvis, can he restore being logged +into apollo [Y], or does he have to log into elvis directly [N]?
161.If given +privileges by the administrator, can one user (such as a helpdesk person) +restore another user's files?
162.Can this +product read from multiple tapes simultaneously during a single restore? +In other words, a backup of a given file system may be spread over 5 tapes. +Can this product read all five tapes simultaneously to restore that file +system?
163.If the +answer to the above question is yes, what if the number of tape drives +available for restore is different than the number of tape drives available +for backup? Assume that the backup wrote to five tapes. Assume also that +only three drives are available during the restore. Can this product read +three of the five tapes simultaneously, and then ask for the other two +when drives become available?
164.If the +answer to the above question is yes, each of the five tapes in the example +above would result in a mount request. Can the operator respond to these +mount requests in any order?
165.During +a restore, can I restore files to a directory other than the directory +from which the files came?
166.(You might want to read the question more than once before answering it. It's not meant to be tricky, but it is.) I would like to restore /home1/curtis/bin/shells/src to /tmp. The ~/src directory contains several subdirectories. I would like to maintain those subdirectories during the restore. Can I restore /home1/curtis/bin/shells/src to /tmp/src, which will contain all the appropriate subdirectories [Y], or must I restore to /tmp/home1/curtis/bin/shells/src [N]?
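For reference, native restore re-creates everything relative to the directory it is run from, keeping the intermediate directories (a sketch; assumes a dump tape of the /home1 file system in /dev/rmt/0):

$ cd /tmp
$ restore -xvf /dev/rmt/0 ./curtis/bin/shells/src
# Files land under /tmp/curtis/bin/shells/src, with src's subdirectories
# intact, not under /tmp/src.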
167.Can I +restore files to a host different from the host from which the file was +backed up?
168.To restore +apollo's files, can I be logged into the master backup server and tell +IT to restore apollo's files [Y], or must I be logged into apollo to restore +apollo's files [N]?
169.During a restore, can I rename files one at a time, like in cpio?
170.Does +this product support Fast file search on DDS/8mm/DLT, etc.?
171.Can this +product call AIX's mksysb, and place its data on this product's tapes?
172.SCO has +a similar utility. Can this product call that utility and place its data +on this product's tapes?
173.HP has +a similar utility. Can this product call that utility and place its data +on this product's tapes?
174.Has this +product made any other special arrangements for restoring an entire system? +(This is a very hot issue lately. If this product is doing anything in +this area other than the AIX's mksysb, and HP & SCO's similar utilities, +please explain!) 
175.Does +this product report to a user requesting the restore if it failed, or cannot +be completed, and why?
176.If a +directory is selected for restore, will this product automatically restore +everything underneath it?
177.Is there +a GUI that displays the directory essentially the way it would look in +UNIX or NT, and which allows the user to point and click on the file(s) +that need restoring?
178.Does +this display above include the attributes of the file in the overall display, +without single clicking on a file (e.g. the equivalent of an 'ls -l' or +a 'dir')?
179.Can files +also be restored from the command line?
180.Can files +also be restored from the curses interface?
181.Time for another scenario. I do a full backup of apollo:/home1 on Monday. I do incremental backups all week long. On Wednesday, I delete the file apollo:/home1/curtis/deletefile. I continue doing incremental backups until Friday morning. Friday afternoon the disk containing /home1 dies, and I need to restore /home1. If I select all of /home1 to be restored, will it know that /home1/curtis/deletefile was deleted, and thus not restore it [Y], or will it restore deletefile because at one time it was backed up [N]? (i.e., does this product track deleted files?)
182.Assume +the same scenario as above. It is now Friday, but I want to restore /home1 +back to the way it looked on Wednesday night, which would include the file +that is not deleted. Can this product do that?
183.If users +can do their own restores, are there any controls that an administrator +can place on them to prevent them from interfering with backup operations?
184.Suppose someone asks for a restore while backups are going on. Will the restore take precedence when requesting a volume [R], or will backups get priority [B]?
185.Is this +product's default action to restore a file exactly as it appeared, including +permissions, modification times, and ownership?
186.What +overwriting options do this product have?
187.If this +product uses block-level parallelism (i.e files are split into multiple +blocks which are interleaved onto the tape), how does it read those interleaved +blocks during a restore? Does it read the tape continuously, disregarding +any un-needed blocks, or does it position-read-position-read? (Position-read-position-read +means that you position the tape to a certain block, using 'ioctl fsr' +or its equivalent, then read a block of data, then position it to the next +block, then read a block of data, and so on.) Another option would be that +the product dynamically [D] chooses one of the above methods, depending +on which type of drive it is using. (There is only one possible answer +to this question, either a continuous read [C], position-read-position +[P], or a dynamic choice between the two based on drive type [D].)
188.***PROTECTING +THE BACKUP CATALOG (Database, Index)*** Where does the backup catalog reside? +Is it all in one place on a disk on the backup server [S], spread out on +multiple clients [C], or can it be configured either [E] way by the administrator?
189.Each +time this product makes a backup, it adds new data to the backup catalog. +Is this new section also placed on the tape with the backup?
190.Catalog +information can be divided into two types of information. For this RFI, +the first is called "saveset" information. This is the minimum information +required to tell the backup software that apollo:/home1 was backed up on +volume VOL001. The second type of information would be "browse" information. +This information is required to allow the administrator to browse through +a file system in the GUI. Typically, both are needed for normal restores. +Both types of information are saved to the backup catalog. Does this product +allow the administrator to expire "browse" information, but retain "saveset" +information for a longer period of time?
191.(Here +is another question you might want to read more than once.) When a backup +runs, does this product ensure that both saveset and browse information +are saved to disk? (If this product copies the appropriate section of the +catalog to each tape, and then allows the administrator to expire the browse +information, part of the catalog is essentially residing on tape, but that +is not what this question is looking for. It has been suggested that some +backup software products allow part of the catalog to reside on tape -- +during the initial save of the catalog information about a given backup. +Under this scenario, the "saveset" information would be saved to disk, +but the "catalog" information could reside on the tape WITHOUT EVER GOING +TO DISK. If that is how this product operates, then the answer to this +question is "N.") 
192.How many bytes does the first backup of a given file add to this product's catalog?
193.How many bytes do subsequent backups of that same file add to this product's catalog?
194.If the +catalog is on disk, can it span more than one file system? (i.e If my backup +catalog is bigger than the biggest file system that my OS supports, can +this product spread the catalog out over more than one file system?) 
195.If a +client changes its hostname, can this product change its name in the catalog +so that its backup history does not get lost?
196.If yes, +and the administrator gives this client a new name, is it an alias [Y], +or must the client receive an all new name, and lose all reference to the +old name [N]?
197.Can this +product compress or archive parts of the catalog to save space?
198.Is the +backup catalog platform independent? (e.g. If I have been using an NT box +for my backup server, but decide I need to go to a UNIX box for more power, +can the backup catalog be moved easily? Or if I've been on a Solaris backup +server, but all I have available for disaster recovery is a HP, can the +catalog be restored to the HP, assuming I have the proper executables?) 
199.If the +answer to the above question is NO, are there any utilities that would +allow the administrator to move the catalogs from one OS to another?
200.If the +answer to the above question is NO, can a consultant from this product's +company perform a custom move of the data between OS's [Y], or is there +simply no way to move data between dissimilar OS's [N]?
201.Assume +that this product made a tape, but its saveset and browse information has +been completely deleted from the catalog. Can this product read this tape, +and see what is on it?
202.Is there a stand-alone command that could be used to read this product's tapes independent of the product's catalog? If the product is compatible with dump, tar, cpio, or Microsoft Backup, the answer to this question is obviously yes. If it is not compatible with any standard UNIX or NT command, is there a command that comes with the software that would allow an administrator to read the table of contents of the tape?
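For instance, if the product writes tar-compatible volumes, the native tools alone can produce a table of contents (the device name and file position on the tape are assumptions):

$ mt -f /dev/rmt/0n fsf 2        # skip forward to the third file on the tape
$ tar tvf /dev/rmt/0n            # list its contents; no catalog required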
203.If the +answer to the previous question is yes, could this command be used to restore +the data -- once again independent of the catalog. (i.e the catalog is +dead, and the customer needs to restore a really important file or file +system. Is there a way to do this without restoring the catalog)?
204.Assuming +this product can read the tape above, can it read the saveset and browse +information back into the catalog?
205.Can I +read in other backup tapes made with tar, cpio, or dump into this product's +catalog (If yes, please list which format this product reads.)?
206.Can this +product import tapes that were made with any other commercial backup products +into its catalog (If yes, please list them.)?
207.When +the server is down, or the backup catalog is corrupt, are there any capabilities +left [Y], or is the product completely down [N]?
208.Does this product allow backing up the entire catalog to tape as a separate operation? (Some products allow backing up the catalog in pieces as backups go along. What I'm looking for here is an operation that backs up the entire catalog to a tape or tapes, such as a Sybase 'dump database' or an Informix 'ontape -s.')
209.If the +backup of the catalog is bigger than a tape, can this product spread its +backup out over several tapes?
210.Can this +product do an incremental backup of its catalog [Y], or are all catalog +backups full backups [N]?
211.Can the +catalog backup be put on a special tape or set of tapes?
212.Can the +catalog backup be restored using standard utilities?
213.Assume +that the backup server is destroyed. How many steps are necessary to recover +the catalog to another system? (Please list them in the comment field)
214.Does +the backup catalog have a transaction log (like an RDBMS) that can be used +to roll back failed backups?
215.Does +this product use a 3rd party database for its catalog?
216.Can this +catalog be automatically mirrored to another system so that if the backup +server dies, I can automatically use the other server for backups and restores?
217.Does +the catalog use a separate, sub-catalog for each system [Y], or is the +catalog one big database [N]?
218.Can an +administrator restore just one host's part of the catalog?
219.***ROBUSTNESS*** +Other than issuing SNMP traps, has this product done any work to integrate +with Tivoli?
220.Can this +product interface with the HP Openview monitoring system?
221.Backup +devices and clients sometimes cause the backup processes to hang. Does +this product notice that a process is hung, and then kill it automatically?
222.If yes, +what if that process cannot be killed, such as a process hung on a system +call to a broken tape drive. Can you restart the backup with another tape +[Y], or will this hung process cause the backup to fail at that point [N]?
223.Can this +product keep track of a tape drive that continually gets write errors, +and somehow notify the administrator?
224.Can this product keep track of how many times a tape is used, and stop using it after a certain number of uses?
225.Can this +product keep track of how many times a certain tape (the media, not the +drive) gets write errors?
226.Can it +notify the administrator to clean a drive manually that is not in the library?
227.Does +it retry open or locked files automatically?
228.If it does retry open files, is it a set number of times [integer], or configurable by the administrator?
229.If it +does retry open files, can it keep trying for a certain amount of time?
230.A file +may be continually changing while it is backed up. Most products try to +put something on tape anyway. If this product tries to do so, will a file +that was continually changing during backup show up as a restorable file?
231.If the +answer to the above is yes, will it be marked as a suspect file, or some +other special note to indicate that it is a suspect file?
232.Can this +product cache backup data to disk prior to sending it to tape?
233.If the +answer to the previous question is yes, can the product be configured NOT +to use this cache disk [Y], or is it required for normal operation [N]?
234.***AUTOMATION*** +If a tape library has a bar code reader, will this product use it?
235.Does +this product keep track of a cleaning tape in a library and automatically +clean the library after a certain number of hours of operation?
236.Can this +product clean a drive in a library by specifying to do so from the GUI?
237.Can this +product read the bar code of a magnetically unlabeled tape, and label it +according to what the bar code says?
238.***COST*** +Does this product charge the same for all clients (of the same OS type) +[Y], or does it charge more for more powerful clients [N] (e.g. charge +more to back up a Sparc 2000 than a 1000.)?
239.Does +the SERVER product on a given OS cost the same regardless of the power +of the server [Y], or does the server software license cost more for more +powerful backup servers [N]?
240.If the +answer to either of the previous two questions is NO, then if I upgrade +the hardware on the server or client, does the product continue to work? +(It may notify me that I need to upgrade, and give me a grace period, but +it will continue to function for some time?) 
241.Does +this product charge the same for all clients (of the same OS type) [Y], +or does it change its price for a given client based on how many USERS +would be affected if it went down [N]?
242.If the answer to the above question is NO, how does this product determine compliance? Does it determine the number of users based on the [P] local password file, the number of users [L] logged into a server at a given point, the number of users in the [Y] YP password list, some [O] other monitoring method, or [N] no monitoring method?
243.Are "device +servers" (a client w/a tape drive who can back up itself or other machines) +part of the base price for a client [Y], or does it cost more to turn a +client into a "device server." [N]?
244.For tape +libraries, does this product charge based on the number of [S] slots, the +number of [D] drives, or a [U] uniform price regardless of the size of +the library?
245.Backup software companies continue to improve their products and add new features. Some of these features (e.g., HSM, or online backups for Oracle) are sometimes considered "add-ons," and the product without the add-ons is called the "base product." How these are priced varies from company to company. Some companies make all upgrades available for free to customers who are paying maintenance. Other companies would consider the new base product and its accompanying add-ons a new version, and believe it is not covered under maintenance; therefore, even existing customers who are paying maintenance would pay to upgrade to the new version. And finally, some companies would never charge a maintenance customer to upgrade the base product, but would charge extra to receive a new add-on. Which of the following best describes this product? All upgrades and add-ons are always [F] free to existing customers who are paying maintenance. Some upgrades are considered "major" upgrades and are made available to maintenance customers for a (possibly reduced) [P] price. Base products are upgraded free for maintenance customers, but significant new features become [A] add-ons which are available at an extra cost to the customer.
246.Does +the license expire? If a customer decides to stop using this product, or +to stop paying maintenance to the company, will they still be able to backup +and restore [Y], or must they continue paying maintenance in order for +the product to function normally [N]?
247.If licenses +expire, has some provision been made to provide license extensions in case +the company goes out of business?
248.***MISCELLANEOUS*** +Is Tech Support available via email?
249.Is Tech +Support available via phone?
250.What was the date of this product's last major release?
251.What is the expected date of this product's next major release?
252.What +platform was the product originally designed for, [M] MVS, [U] UNIX, [W] +NetWare, or [N] NT?
253.Is this +product (and its accompanying add-ons) this company's sole product?
254.What are the standard support hours for this product, in EST?
255.Can customers +get extended support hours if they purchase a support agreement?
256.Can customers +get on-site support?
257.If so, +is this done by contracting with a local service provider [Y], or is someone +flown out from a central site [N]?
258.Can customers +purchase the product directly from you [Y], or must they go through a VAR +[N]?
259.If the +customer purchased the product from a VAR, do they still call you for support +[Y], or must they deal with the VAR for support [N]?
260.Do you +have a knowledge base available on the web?
261.What +is the largest number of machines being backed up by one server at one +of your current clients?
262.What +is the largest size box being backed up by this product (at one of your +clients) [in GB]?
263.Is this +product sold under a different name by anyone else?
264.If the answer to the above question is yes, are those versions simply the same software under a different name [Y], or are the resold versions a [S] stripped-down version of your product, or an [O] older version of your product (Multiple answers possible)?
265.Are all +porting efforts supported by you [Y], or do you allow partners to port +your software to other platforms [N]?
266.If the answer to the above is yes, will these 3rd-party ported versions work with the "official" server product provided by your company?
267.Approximately +how many years has this product been on the market?
268.I would +like to request one reference customer. This should be a customer who can +verify a lot of the functionality described in this RFI. This reference +will NOT be listed in the book, and this person will only be contacted +by me during the research phase. 
269.***OPERATING SYSTEMS*** (See Note 4 at the end of this RFI for information about the following table.) What type of port does this product have to each operating system? There are three possible answers to this question. It can be a "client only" [C]. This means that this product CAN BACK IT UP natively, but it CANNOT HAVE TAPE DRIVES. (Some products can back up via NFS. That is a separate issue addressed in another question. Just because a product can NFS mount a partition from a client and then back it up, this does not mean that this product has a client for that OS.) The next level of functionality would be a "device server" [D]. This means that this product can back it up natively, but it CAN ALSO HAVE TAPE DRIVES. The final level of functionality would be the master backup server [S]. This is the central server that controls all the backups, and usually contains the backup catalogs. (Please note: if this product can be a device server or a server, do not list that it can also be a client. That is implied! However, not all servers can be device servers, so if a product can be both of them, please list both of them.)
270.Does +this product support the backup of all file types in UNIX (files, directories, +block special files, character special files, and named pipes)?
271.Is the +product compliant with the Distributed Computing Environment (DCE)?
272.Can this +product back up the Transarc DFS (Distributed File System), its ACL's, +and any other special data contained within it?
273.***DATABASES*** (See Note 5 at the end of this RFI.) Can this product back up Informix data to its volumes via the ontape utility?
274.Can this product back up Informix data to its volumes via the onbar utility?
275.Can this product back up Informix data to its volumes via the SQL Backtrack utility?
276.Can this product back up Oracle data to its volumes LIVE via standard commands (sqldba/svrmgrl) without the use of the EBU?
277.Can this product back up Oracle data to its volumes via the Enterprise Backup Utility/EBU (obackup)?
278.Can this product back up Oracle data to its volumes via the SQL Backtrack utility?
279.Can this product back up Sybase data to its volumes via the dump database utility (4.x)?
280.Can this product back up Sybase data to its volumes via the Backup Server (10.x and higher)?
281.Can this product back up Sybase data to its volumes via the SQL Backtrack utility?
282.Can this product back up SAP/R3 data to its volumes via the SAP API?
283.Can this product back up Lotus Notes data to its volumes?
284.Can this product back up DB2 data to its volumes?
285.Can this product back up SQL Server data to its volumes?
286.Can this product back up MS Access data to its volumes?
287.Can this product back up MS Exchange data to its volumes?
288.If this +product can back up any other special databases directly to its volumes, +please list them. (Please list them below the table. Also list the OS's +that this product works on.)
289.***MISCELLANEOUS +OS AND DATABASE*** (Answer the next few questions in the normal fashion.) +Is this product [S] supporting, or does it [P] plan to support the NDMP +initiative?
290.Does this product support the backup of the NDS (Novell Directory Services) in NetWare automatically?
291.Does +this product support the backup and restore of the entire Registry in NT?
292.Does +this product support the backup and restore of the entire Registry in Windows95?
293.Assume +that a particular user only wants to have her Windows95 workstation backed +up on demand. Does this product have software that can run on that workstation +and allow the user to do this?
294.If so, +must she run a separate [G] GUI, or can she [R] right click on the drive +in "My Computer" or "Windows Explorer" and select "Backup," or some equivalent?
295.Assume +that a particular user only wants to have her NT 4.0 workstation backed +up on demand. Does this product have software that can run on that workstation +and allow the user to do this?
296.If so, must she run a separate [G] GUI, or can she [R] right click on the drive in "My Computer" or "Windows Explorer" and select "Backup," or some equivalent?
297.Assume +that a particular user only wants to have her MacOS 8.0 workstation backed +up on demand. Does this product have software that can run on that workstation +and allow the user to do this?
298.If so, must she run a separate [G] GUI, or can she [R] right click on the drive and select "Backup," or some equivalent?
299.In NT, +files can be locked in such a way that even a backup product can't see +them. Can this product back up those files?
300.If the +answer to the previous question is yes, does this require the customer +to purchase the "St. Bernard" package?
301.When +backing up databases, can this product figure out which tables/files/devices +are in a database [Y], or does the administrator have to do this [N]?
+ + + diff --git b/star-1.2.tar a/star-1.2.tar new file mode 100755 index 0000000..501c9e8 Binary files /dev/null and a/star-1.2.tar differ diff --git b/syback.tar a/syback.tar new file mode 100755 index 0000000..11355ee Binary files /dev/null and a/syback.tar differ diff --git b/sybase.gif a/sybase.gif new file mode 100755 index 0000000..257e665 Binary files /dev/null and a/sybase.gif differ diff --git b/sybase.html a/sybase.html new file mode 100755 index 0000000..eb16c2e --- /dev/null +++ a/sybase.html @@ -0,0 +1,1271 @@ + + + + + + +

Recovering Sybase
+ +

Recovering from a database problem starts with diagnosing exactly what is wrong with the database. Maybe isql will not connect to the dataserver, maybe the database is marked suspect, or maybe an error message appears in the errorlog when the dataserver is started. In all of these cases and others, something has gone wrong where it had gone right before. Fortunately, with the proper backups, many of these problems can be fixed and the database restored to full working order.

Sybase has many parts that are inter-related, +but like Sherlock Holmes investigating a mystery, if we eliminate from +consideration the items that are working correctly, only the error-causing +parts will remain. This section provides step-by-step directions to diagnosing +and repairing all of these error-causing parts, and when completed, will +leave a fully functional dataserver. +

A flow-chart that appears at the beginning +of these steps should help you in the recovery process. Each item in the +chart is numbered the same as the steps and procedures below. The electronic +version of this procedure contains a flowchart that is an HTML image map. +Each decision or action box in the flow chart is a hyperlink to the appropriate +section of the printed procedure. For more detailed information about individual +steps, please consult Sybase's System Administration documentation, especially +the "Backup and Recovery" chapter. +

To begin the investigation, start the dataserver the way it is normally started. If problems show up in the Sybase error log file, or the dataserver does not come up, start with Step 1 to begin the recovery/diagnosis procedure.
  +

+
  +

Step 1: Runfile +Ok? +

Sybase starts up by using the startserver program. In both OSs, the program takes an additional parameter, -f runfilename, where runfilename is the file containing the server startup command and parameters. If this file is missing, or the path given for the runfile is incorrect, the startserver command will return the error "Cannot execute file runfilename". If the path is incorrect for the file, correct it and start again. If you are unsure of the path being used, change your directory to the location of the runfile, then run the command, specifying only the runfile name. Sybase will then look in your current directory for the file.

+If the runfile is OK, proceed to +step 3. If not, proceed to step 2. + +Step 2: Restore +or Recreate Runfile +

If the file is missing, it can be recreated +fairly easily. There is nothing magical about this text file. It contains +the command and parameters to start the server. Figure F contains a sample +runfile for a Unix system. +

#!/bin/sh
#
# SQL Server Information:
#   name:               SYB_MYDB
#   master device:      /sybdata/master.dbf
#   master device size: 10752
#   errorlog:           /opt/sybase/logs/SYB_MYDB.errorlog
#   interfaces:         /opt/sybase
#
/opt/sybase/bin/dataserver -d/sybdata/master.dbf -sSYB_MYDB \
    -e/opt/sybase/logs/SYB_MYDB.errorlog -i/opt/sybase

Figure F: Sample runfile
+
+
As seen in the comment lines, dataserver takes a number of different parameters. Again, check Sybase's documentation for your OS for more details. The most important parameter shown is the master device. The master device is the primary device used by the master database, and the master database is one of the keystones of a Sybase server. It contains a majority of the information about all the other databases, devices, and other dataserver objects, including the location of the master device. The -s dataservername parameter is how the dataserver knows what name to call itself. If this is omitted, Sybase assumes a dataserver with the name SYBASE is being started.

The -e errorlog parameter points to the full filename and path of the dataserver's errorlog. During an install, Sybase defaults to the $SYBASE/install directory. Because this directory contains a number of important files other than the error log, it is recommended that this be changed to point to another directory, perhaps $SYBASE/logs/. Also, if there is more than one dataserver on the system, the errorlog filename should be changed. Appending the ".errorlog" suffix to each dataserver name gives each server its own errorlog file, making it much simpler to track down errors.

The next parameter to look at is the interface parameter, -i interfacedir. This is the directory where Sybase can find the interface file. On some systems, there might be two interface files: one or more for the users to use, and one for the Sybase system to use. The different interface files could contain different selections of server entries, preventing the users or the dataserver from accessing all the Sybase servers available. If this parameter is omitted, Sybase will look in the directory pointed to by the $SYBASE environment variable.

One parameter that is not shown here +is the configuration file parameter -c configfile. As of version +11, Sybase can be started using a configuration file. This file contains +the text of all the configuration options on the dataserver, and their +values. When no -c parameter is specified, Sybase defaults to the file +servername.cfg +found in the directory where the dataserver is started. +

Two additional parameters can be used +with the dataserver program - the version parameter -v and the single +user mode parameter -m. The version number parameter is handy to +use when you need to see the current version of the database program, and +do not want to search for it in the errorlog. +

The single-user mode parameter -m, or maintenance mode, is used to bring the dataserver up so only the "sa" account can access it. This is required when doing certain recovery procedures and, in general, when there is a need to prevent user access while maintenance is being done.

It is easy to recreate this file by using a text editor to create a script like the one above, but with the values of your dataserver. Once the file is created, and the account that starts the dataserver has access to it, run the startserver -f runfile command.
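For example, assuming the recreated runfile was saved as /opt/sybase/install/RUN_SYB_MYDB (the path and name here are hypothetical), the sequence might look like this:

# make the recreated runfile usable by the account that starts Sybase,
# then start the dataserver with it
chmod 750 /opt/sybase/install/RUN_SYB_MYDB
startserver -f /opt/sybase/install/RUN_SYB_MYDB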

Once you've replaced the runfile, return to step 1.

Step 3: Able to Get Shared Memory?

Like most database products, Sybase +uses shared memory to communicate between dataserver processes, process +queries, and store sections of the database for quick access. If Sybase +is prevented from acquiring the minimum shared memory it needs, it will +fail to come up and instead will error out with a message like the following: +

00:1999/03/21 23:39:02.78 kernel  os_create_region: can't allocate 2147479552 bytes
00:1999/03/21 23:39:02.80 kernel  kbcreate: couldn't create kernel region.
00:1999/03/21 23:39:02.80 kernel  kistartup: could not create shared memory

If you are able to get to shared +memory, proceed to Step 6. If not, proceed to Step 4. +
+Step 4: Free Up +Shared Memory or Reconfigure Memory in Configuration File +

As mentioned above, changes in configuration parameters can cause the Sybase server to need additional memory. This in turn can require the Sybase server to request a larger shared memory segment from the OS. Two things can prevent this from happening: one, the maximum shared memory segment size is undersized, or two, there is not enough shared memory on the system.

In the first case, the solution is to increase the maximum shared memory segment value appropriately for the OS. Because Sybase is supported on many different OSs, how to change this is not shown here. Please refer to the Sybase installation manual for your operating system for further information on setting this value.
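As one illustration only: on Solaris, this limit has historically been raised by adding a line like the following to /etc/system and rebooting. The value shown is hypothetical; it must exceed the total memory configured for the Sybase server.

* hypothetical /etc/system entry (Solaris only); a reboot is required
set shmsys:shminfo_shmmax = 268435456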

In the second case, either shared memory +needs to be added to the system, or shared memory needs to be freed up +from other processes. Please contact a System Administrator for help in +accomplishing either of these. +

You can view the shared memory being +used by using the ipcs command. Check the man pages for the correct +usage of the command, but the output should look something like this: +

------ Shared Memory Segments --------
shmid   owner    perms   bytes       nattch  status
129     sybase   600     11964416    1
2       sybase   666     1024        3
56      curtis   606     33334342    3

+Here, the "curtis" process has also taken +some of the shared memory for itself. By stopping this process, the shared +memory should be freed up. If stopping the process does not free up the +memory, it can be freed using the Unix command ipcrm. Again, check +the Unix man pages for more information on this command on your operating +system. +
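For example, the orphaned segment owned by "curtis" above could be removed like this; ipcrm's flags vary by platform, so verify against your man page first:

ipcrm -m 56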

If none of these solutions helps the +problem, the changes in the configuration options will need to be reverted +to allow the dataserver to come up correctly. +

Helpful hint: if possible, make one change to the configuration options at a time. Sybase configuration options interact, some causing the system to need more memory, others less. By making one change at a time, you will know which configuration option prevents the system from restarting, and it can be adjusted accordingly.

Once you have corrected these problems, +return to step 1.

+
+Step 5: Able to +Connect to Dataserver? +

When Sybase starts, it needs the interface file so it +can set up all the interprocess and network communication parameters. This +interfaces file is usually named interfaces, interface, or sql.ini, depending +on the operating system. +

+If you are able to connect, proceed +to step 7. If not, proceed to step 6. + +Step 6: Check Interface +File +

If Sybase cannot find the file, or +there is something wrong with it, Sybase will error-out during startup +with a message like one of the following: +

00:1999/03/22 00:17:46.54 kernel  Could not open interface file '/opt/sybase/interfaces'
00:1999/03/22 00:22:16.15 kernel  Could not find name 'SYB_MYDB' in the interfaces file /opt/sybase/interfaces

+To correct the first error, make sure +the interface file is located where the runfile says it should be +located. If the interface file is in a different directory, either adjust +the runfile to point to the correct directory, or move the interface +file to the directory specified in the runfile. +

The second error means the lines for the dataserver being started (in this case SYB_MYDB) cannot be found in the interface file. To fix this problem, add an entry using the appropriate method for your operating system. Please see the operating-system-specific Sybase documentation for more information.
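As a sketch only, an entry on many Unix ports is a plain-text block like the following; the exact format varies by platform (Solaris, for instance, uses tli address strings), and the host name and port here are hypothetical:

SYB_MYDB
        master tcp ether dbhost 4100
        query tcp ether dbhost 4100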

Once the entry is in the correct interface +file, re-run the startserver command to see if this has fixed the +problem. +

+Once you have corrected any problems +with the interfaces file, return to step 1. + +Step 7: Master +Database Initialized? +

The master database is the most important +database in the system; not only does it contain all the information about +all the other databases, but it is also the repository of all information +regarding logins, roles, devices, usage of those devices, and configuration +options on the system. It is imperative that this database comes up correctly. +Otherwise, the rest of the dataserver would not work at all. +

+If the Master database did initialize, +proceed to step 14. If it did not, proceed to step 8. + +Step 8: Master +Device/File Missing? +

The first indication that something is wrong with the master database will most likely be the following error messages displayed during startup:

00:1999/03/22 00:53:48.44 kernel  kdconfig: unable to read primary master device
00:1999/03/22 00:53:48.44 kernel  kiconfig: read of config block failed

+These lines are preceded by a line that +will tell you why the system could not read the primary master device. +To fix these problems, go to the section on "Problems With Data Devices." +If none of the solutions in that section corrects the problem, you must +restore the master database (proceed to Step 9). +

If there are no problems accessing +the master data device, then the problem might be the master database has +been corrupted. This type of problem could show a number of strange symptoms +if the dataserver was already running. For example, isql will crash with +program errors when trying to connect, or currently running queries will +also crash. +

If strange things like these start happening, the best thing to do is to try to restart the dataserver. Because in situations like this almost all access to the dataserver will fail, the normal isql shutdown command cannot be used. Instead, the dataserver process must be stopped at the operating system level. In Unix, this can be done using the kill command. This should be done only in extreme situations, such as when the dataserver is consuming all the processing power of the system. Only use kill if nothing else will help, because it could corrupt the databases in the dataserver.
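A minimal sketch of that last-resort shutdown (the PID shown is hypothetical; get the real one from ps):

ps -ef | grep dataserver     # find the dataserver's process ID
kill 12345                   # try a catchable signal first
kill -9 12345                # last resort; risks database corruption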

Once the dataserver is down, check to make sure the memory used by Sybase has been freed. See Steps 3 and 4 for more information on shared memory. When the shared memory has been freed, try starting the dataserver using normal starting procedures.

If the dataserver fails to start, see if there are errors on the master device in the dataserver error log. Go to Step 15 and make sure there are no problems there. If there are, correct the problem with the device, and try restarting again.

If there are no problems with the physical +devices, the master database will need to be completely recovered from +backup. +

+If there are any problems with the +physical devices, proceed to step 9. Otherwise, proceed to step 10. + +Step 9: Restoring +Generic Master Database +

Follow these steps to restore the generic +master database: +

1. Backup Server Up?

Make sure your backup server is up and running. It will be used later to restore any backups of the master database, and to back up the restored database when the recovery is finished.

2. Create data device

Since the master device was corrupted, it must be created anew. Open the runfile used to start the Sybase server in a text editor. Note the master device path and the size of the master device; they will be used in the buildmaster command below to recreate the master device.

To recreate the master device, enter the following buildmaster command or the OS equivalent needed by the downed server:

buildmaster -d /sybdata/master.dbf -s 8704

The -d option is the master device path, and the -s option is the size in pages. For this example, the master device being created is 17MB (8704 2K pages) on /sybdata/master.dbf.

3. Startup in Master Recovery Mode

To start the dataserver in master recovery mode, first make a copy of the current runfile. Edit this copy, and add the -m master recovery mode parameter. When the change is done, use this file to start the dataserver process with the startserver -f master_runfile command. This mode allows the master database's system tables to be updated and prevents users from accessing the system.

4. Recreate Master's Entries in sysusages

Before anything more can be done to restore the data into the master database, the master device needs to be restored to the same usage it had when the last backup of the master database was performed. Using hardcopies of the system tables sysusages, sysdevices, and sysdatabases, the entries in sysusages can be recreated with the alter database and create database commands.

To decide whether anything needs to be done at this point, examine the dbid column in sysusages. If more than one entry has dbid equal to 1 (see sysdatabases; dbid 1 is the master database), additional space will need to be added to the master database to make the sysusages table in the server match the one on paper. If there is only one entry, sysusages does not need any work, and the next step is "Add Backup Server Entry into sysservers".

    Figures G-I contain example outputs +of the three tables needed for this restore. Note: the T-SQL statement +to extract this information is provided above the table output. +

select name, dbid from sysdatabases order by dbid

name                           dbid
------------------------------ ------
master                              1
tempdb                              2
model                               3
sybsystemprocs                      4
sybsyntax                           5
mydb                                6

Figure G: sysdatabases

select * from sysusages order by vstart

dbid   segmap   lstart   size   vstart     pad    unreservedpgs
------ -------- -------- ------ ---------- ------ -------------
1      7        0        1536   4          NULL   112
3      7        0        1024   1540       NULL   608
2      7        0        1024   2564       NULL   608
1      7        1536     1024   3588       NULL   920
5      7        0        1024   4612       NULL   272
1      7        2560     2048   5636       NULL   2048
4      7        0        8192   16777216   NULL   864
6      3        0        5120   67108864   NULL   4720
6      4        5120     2560   83886080   NULL   2552

Figure H: sysusages

select * from sysdevices

low        high       status  cntrltype  name         phyname
---------- ---------- ------- ---------- ------------ --------------------------
0          10751      3       0          master       d_master
67108864   67113983   2       0          mydbdev      /sybdata/mydbdev.dbf
83886080   83888639   2       0          mydblogdev   /sybdata/mydblogdev.dbf
16777216   16785407   2       0          sysprocsdev  /sybdata/sybprocs.dbf
0          1280       16      3          tapedump1    /dev/st0
0          20000      16      4          tapedump2    /dev/st1

Figure I: sysdevices


Since this is a recovery of the master database, we only have to restore up to the last master database entry in sysusages. These are the entries where dbid = 1 and vstart is less than or equal to the high value of the master device, master, in sysdevices. Notice that there are entries between the first and last master db entries on the master device. When restoring the master database, these entries must also be duplicated.

    Fortunately, buildmaster has already done part of the work by recreating +the first three entries that correspond with the master, tempdb, and model +databases. In this example, there are only three entries to restore. +

Now that the entries to be restored are known, the work of restoring them with the alter database and create database commands can begin. Use the dbid value in sysusages to find out which database entry needs to be added next. In this example, the master database was extended by an additional 1024 blocks, which equates to 2MB of space on the master device. Because the master database already exists, the alter database command must be used:

alter database master on master = 2

The next entry has a dbid that corresponds to the sybsyntax database. Its size is also 1024 blocks, so it is 2MB. Since this database has not been created yet, the following create database command must be used:

create database sybsyntax on master = 2

Continuing to the final entry, the dbid signifies that it is another entry for the master database. The size in this case is 2048 blocks, which equals 4MB (2048/512). The next command to run would therefore be:

alter database master on master = 4

    Now that all the entries for the master +database are finished, a load can be performed of the backup, but only +after the dataserver knows about the backup server. +

5. Add Backup Server Entry into sysservers

To be able to do the recovery, the dataserver needs to know the name of the backup server. If it is not SYB_BACKUP, an entry will need to be added to sysservers. Use the following T-SQL commands to update the entry in sysservers:

begin transaction
update sysservers
    set srvnetname = "BIG_BACKUP"
    where srvname = "SYB_BACKUP"
go

/* Make sure the change is correct */
select srvnetname, srvname from sysservers
    where srvname = "SYB_BACKUP"
go

/* If the change is correct */
commit transaction
/* If the change is not correct */
rollback transaction

Proceed to step 10. +
Step 10: Recent Dump of Master Database?

If there is a recent dump of the master database, proceed to step 11. If one does not exist, proceed to step 13.

Step 11: Restore Master from Dump

Use the following T-SQL command to load the backup of the master database into the system. You might have to change the "from" section of the command to match what is needed for your operating system and environment. Review the section on using the load command for more information. When the load finishes, the dataserver will automatically shut itself down.

load database master from "/dev/nrmt0" +

Proceed to step 12. +
Step 12: Update Number of Devices in the Configuration

When a buildmaster is performed, the configuration options of the system start at their default values. This can cause problems when the system next comes up, because the "number of devices" option is one of those values. It needs to be set to the value it had when the backup was done, or not all the original devices will come up.

If the Sybase version of the dataserver +is before version 11, follow the procedures in the Sybase System Administration +manual to set the value of "Number of Devices" to the value it was before +the recovery began. +

If the version is 11 or later, edit the dataserver's configuration file, and make sure the "number of devices" value is set correctly. Next, edit the runfile created earlier, adding or modifying the -c configuration file parameter to point to the proper configuration file.
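For example, the Figure F runfile could be extended with a -c parameter like this (the configuration file path is hypothetical):

/opt/sybase/bin/dataserver -d/sybdata/master.dbf -sSYB_MYDB \
    -e/opt/sybase/logs/SYB_MYDB.errorlog -i/opt/sybase \
    -c/opt/sybase/SYB_MYDB.cfg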

At this point, the recovery is nearly +done, but before the users are allowed back onto the system, it is best +to double check all the changes made during the recovery. Also, if additional +device or database changes were made since the time of the master database +backup, these changes will all need to be made. +

To prevent the users from accessing the system, and to allow the system tables to be updated, again start the dataserver in master recovery mode, using the master recovery runfile created earlier in a startserver command appropriate for your operating system.

Proceed to step 13. +
+Step 13: Restore +System Tables Using disk reinit and disk refit +

Once the system comes up, check the +system tables sysusages, +sysdevices, +and +sysdatabases +against the backup hardcopies. If there are devices on your hardcopy that +are not listed in the system, then an additional device was added since +the master backup was performed. In this case, a disk reinit will +update the sysdevices +table from information on the device. If the alter database or create +database commands were run since the last backup, disk refit +must be run to resync the dataserver with the additional devices and databases +available on the system. +

disk reinit +

The command to resync devices is disk reinit; it adds a device back into the sysdevices table without initializing the device itself. To run this command correctly, you will need the parameters that were used when the device was first created with disk init. The command syntax is shown in Figure J:

begin transaction
disk reinit
    name = "device_name",
    physname = "physical_name",
    vdevno = virtual_device_number,
    size = number_of_blocks
    [, vstart = virtual_address,
    cntrltype = controller_number]
go

/* Review the sysdevices table to make sure it matches with
   what is in hard copy */
select * from sysdevices
go

/* If everything matches */
commit transaction
/* If there are differences */
rollback transaction

Figure J: Syntax of disk reinit
+
+
+As you can see, the syntax is exactly like the disk init command except +for the word reinit. Make sure to run this only for devices that do not +already appear in the sysdevices +table. +
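For example, re-adding the mydbdev device recorded in the Figure I hardcopy might look like the following. The vdevno of 4 is derived from low/16777216 and the size from high - low + 1, so verify both against your own hardcopies before running it:

disk reinit
    name = "mydbdev",
    physname = "/sybdata/mydbdev.dbf",
    vdevno = 4,
    size = 5120
go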

disk refit

Once all devices that had been added to the system after the backup have been redefined in sysdevices, all database additions and changes made after the backup can be resynced. The command that does this is disk refit. It visits every device defined on the system and uses the system information stored on it to reset the sysusages and sysdatabases tables to what they should be. The syntax is simply:

disk refit

There are two restrictions on this command: it must be run as SA, and the system must be started in single-user mode or the command will not be allowed. Once the command finishes running, the system will automatically shut itself down.

Once either or both of these commands have been run, compare the +values in sysusages +and sysdatabases +to make sure they match the hardcopy of these tables. If they do not, the +system could come up without all devices and databases being online. +
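The comparison is easiest if you regenerate the same listings that produced the hardcopies, for example:

select name, dbid from sysdatabases order by dbid
go
select * from sysusages order by vstart
go
select * from sysdevices
go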

For more information on how to run +disk +refit and disk reinit, see the Sybase System Administration +Guide. +

Review sysusages/sysdatabases/sysdevices/syslogins/sysloginroles +

Because the master database contains +information about all the other databases, it is important to make sure +everything is in working order. Follow these checks to help confirm the +fitness of your newly restored master database. +

1. Check the sysdevices, sysusages, and sysdatabases system tables.

2. Review each database, checking all the major tables. Run common selects on these tables, verifying the data inside.

3. Run the dbcc checkalloc command on all databases. See the section "Database consistency checker: the dbcc utility" for information on how to run this command.

4. Double-check the ownership and permissions of all databases. If user logins were added or deleted since the backup of the master database, these changes will need to be run against the system again in the same order they were originally added. If these changes are done out of order, the suids of the logins will be different than when they were first added to the system. This will cause a mismatch between the suids controlling ownership and permissions in a database and those in the system logins, and there could be problems with access to the databases and objects in the dataserver.
To make sure the suids are the same, compare the hardcopy of the syslogins system table with the online version of the table. If they do not match, there could be problems. Refer to Sybase's Security Information Guide for more information on recreating changes to logins if the original scripts cannot be found.

Another part of Sybase's security is login roles. The sysloginroles system table contains information regarding these roles. It is important that this table also be checked to make sure all roles are the same as before the problem started; otherwise, users might not have the same abilities as before. To make sure they are the same, compare the hardcopy of the sysloginroles system table with the online version. Refer to Sybase's Security Information Guide for help recreating changes to the system roles if they do not match.
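One simple way to produce the online versions for comparison (the column lists here are trimmed for readability; adjust them to match whatever your hardcopies captured):

select suid, name from syslogins order by suid
go
select suid, srid from sysloginroles order by suid
go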

Proceed to step 25. +
+Step 14: Sybsystemprocs +Available? +

The sybsystemprocs +database is where Sybase locates all the system stored procedures. When +a stored procedure is created here, it is available from any database in +the dataserver. Therefore, when this database is unavailable, many important +system stored procedures will be unavailable, such as sp_help, +sp_helpdb, +and sp_helpdevices. +The rest of the dataserver databases will come up as long as they do not +have problems, but the functionality of the dataserver will be severely +degraded. +

+If sybsystemprocs is available, proceed +to Step 20. Otherwise, proceed to step 15. + +Step 15: Problem +With Data Devices? +

If the dataserver's error log file shows there are problems initializing a data device, follow these steps to check for OS device problems, fix them, and possibly replace the devices.

1. Check ownership and permissions on devices.

Use the OS commands to check permissions, and make sure the process running the Sybase dataserver program has permission to read and write to these devices. If it does not, either change the permissions on the devices, or change the user that is running the dataserver program to one that does have the correct permissions.

2. Is the device functioning correctly, or does the device even exist?

Sometimes, because of a change in the system, the operating system might no longer "see" the device. There also might be hardware failures or configuration errors that cause the device not to appear. Use the appropriate OS procedure to check the device's status, and review all error logs.

3. Unable to detect device at startup?

Sometimes, when the OS cannot detect a device when a Sybase server is started, the device will be unavailable, and Sybase will not be able to recover the database. In this case, the database will be marked "suspect". This flag tells Sybase not to waste any time trying to recover the database the next time the dataserver restarts. This is efficient, but if the device's problems are fixed with the data intact, the database should be able to come up the next time the system starts.

To get around this problem, the "suspect" flag must be cleared from the database. Unfortunately, there is no easy stored procedure to run to do this. If you are sure the database will come up OK on a restart, run the T-SQL commands in Figure K, replacing yourdbname with the name of the database the server needs to recover.

use master
go

/* tell dataserver to allow changes to system tables */
sp_configure "allow updates", 1
reconfigure with override
go

/* change the suspect status bit */
begin transaction
update sysdatabases              /* Only make the change        */
    set status = status - 256    /* Turn the suspect flag (2^8 bit) off */
    where name = "yourdbname"    /* only on this database       */
    and status & 256 = 256       /* and only if the flag is set */
go
commit transaction

/* tell dataserver to NOT allow changes to system tables */
sp_configure "allow updates", 0
reconfigure
go

Figure K: Removing the suspect flag
+
Restart the server at this point. If the database still does not come back online, then something else must be wrong with the database. Review the dataserver error log for more information, and possibly contact Sybase support.

If there are still problems with the device files, repeat Step 15, then proceed to Step 16. If the device files are fine, proceed to step 17.
+Step 16: Replace +Device/Disk File +

If a drive goes bad, but another one +of the same size is available, the new one can be used in the place of +the old in the Sybase database. This is because Sybase uses logical devices +that point to the actual devices. +

All databases that were using this +device will need to be restored. To find out which databases these are, +run the T-SQL command shown in Figure L: +

select sysdevices.name as DevName,
       sysdatabases.name as DBName,
       sysusages.size/512 as Size
from sysdatabases, sysusages, sysdevices
where sysdevices.name = "BadDeviceName" and
      sysdevices.low <= sysusages.vstart and
      sysdevices.high >= sysusages.vstart and
      sysusages.dbid = sysdatabases.dbid

Example Output:

DevName                       DBName                         Size
----------------------------- ------------------------------ -----------
BusDev1                       BillingDB                      3
BusDev1                       ClientDB                       2

Figure L: Locating databases that use a particular device
This command will only work if the master database at least comes +up.

+
With this information, the original device must first be deleted from the system before the new device can be added. Before this, all databases that were using this device must first be dropped. Use the drop database command or, if that fails, dbcc dbrepair(dbname, dropdb) to drop the databases. Once they are all gone, the device can be dropped using the sp_dropdevice stored procedure.
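Continuing the BusDev1 example from Figure L, the sequence would look something like this (the database and device names come from that example output):

drop database BillingDB
go
drop database ClientDB
go
sp_dropdevice BusDev1
go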

Once the device is dropped, the new device can be added in its place. Use the same disk init command that was used to create the bad device, but replace the physical name of the bad device with that of the good device, and use a different vdevno. For example, if the original disk's physical name was /dev/dsk/c0t2d1s0 with a disk init command of:

disk init
    name = "BusDev1",
    physname = "/dev/dsk/c0t2d1s0",
    vdevno = 6,
    size = 2048

The new command using the replacement device /dev/dsk/c1t3d0s1 would be:

disk init
    name = "BusDev1",
    physname = "/dev/dsk/c1t3d0s1",
    vdevno = 10,
    size = 2048

Once the device has been restored, all the databases that were using that device need to be restored.

Proceed to step 17.

Step 17: Restore base sybsystemprocs from sybsystemprocs script

When the sybsystemprocs +database +needs to be restored, follow these directions: +

1. Find out which devices sybsystemprocs was installed on.

First, try the command sp_helpdb sybsystemprocs to find out how large it is and what device it was created on. Most likely this will not work. In that case, run the T-SQL command in Figure M instead:

select sysdevices.name, sysusages.size/512
from sysdatabases, sysusages, sysdevices
where sysdatabases.name = "sybsystemprocs" and
      sysusages.dbid = sysdatabases.dbid and
      sysdevices.low <= sysusages.vstart and
      sysdevices.high >= sysusages.vstart

Example output:

name
------------------------------ -----------
sysprocsdev                              16

Figure M: Finding out size of device

2. Drop the sybsystemprocs database.

In case the sybsystemprocs database is corrupt, it is best to drop it and recreate it fresh. First, try the T-SQL command drop database sybsystemprocs. If that is not allowed by the system, use dbcc dbrepair(sybsystemprocs, dropdb) to drop the database.

3. Recreate the sybsystemprocs database on the device shown by the above command. (The size is given in megabytes.)

1> create database sybsystemprocs on sysprocsdev = 16

4. Run installmaster or restore from backup.

If there is a current backup of the sybsystemprocs database, it can be used to restore the database with the load command, like this:

load database sybsystemprocs from "device"

    Use the appropriate OS command for +the dataserver environment. +

If there is no backup of the sybsystemprocs database, it will need to be recreated using the T-SQL script installmaster. This script can be run safely without worry that it will affect other databases. To run the script, provide it as input to the isql program like this:

isql -Usa -SSYB_MYDB -Pmypasswd -i $SYBASE/scripts/installmaster

or

isql -Usa -SSYB_MYDB -Pmypasswd < $SYBASE/scripts/installmaster

    There will be a lot of output from +this command, but it can be safely ignored until the script ends with the +message "Loading of the master database is complete". +

5. Add any additional stored procedures/changes.

If additional stored procedures were added to the system since the backup or since the initial install, recreate them now, using the scripts originally used to create them or by entering the commands interactively.

6. Check stored procedures.

As with all the other recoveries, it is important to check out the result. The following command will show whether the sybsystemprocs database has been restored correctly:

    sp_helpdb sybsystemprocs +

If output describing the sybsystemprocs database is displayed when this is run, the system stored procedures have been restored correctly. Run any user-defined stored procedures to confirm that they work correctly.

7. Dump the sybsystemprocs database.

Once the database has been restored, make a complete backup of it with the dump command.

Proceed to step 18.

Step 18: Recent Dump of Database?

If there is no dump of the database to restore, go to step 25 to reapply any creation/alteration scripts and load any bcp files. If there is a recent dump, continue to step 19.

Step 19: Restore from Dump

If the database is still offline at +this point, then something is wrong with it internally, and it should be +restored from backup. Before anything else is done, retrieve information +about the database to use in the recreation. Enter the command in Figure +N to find out about the allocations used by this database. +

select sysdevices.name,
       size as Blocks,
       size/512 as Mbytes
from sysusages, sysdevices, sysdatabases
where sysdatabases.name = "baddbname" and
      sysusages.dbid = sysdatabases.dbid and
      sysdevices.low <= sysusages.vstart and
      sysdevices.high >= sysusages.vstart and
      sysdevices.cntrltype = 0
order by vstart

Example Output:

name                     Blocks      Mbytes
----------------------   ---------   ---------
device1                  1536        3
logdev1                  1024        2
device3                  2048        4

Figure N: Database allocations
1. Drop the database.

First, try to drop the database using this T-SQL command:

drop database baddbname

If this command fails, use this dbcc command to drop the database:

dbcc dbrepair(baddbname, dropdb)

To verify that the database has been dropped, run the stored procedure sp_helpdb. If the database is still shown in the output, something went wrong with the drop command.

2. Recreate the database.

Using the information from Figure N, recreate the database with the same allocations it had. Here is an example based upon the example output above:

create database baddbname
    on device1 = 3
    log on logdev1 = 2

alter database baddbname
    on device3 = 4

3. Bring the database online.

At this point, the database has been recreated, but the system will not bring it online until it is told to, because it has no way of knowing whether there are any more transaction logs to process. Tell Sybase the database should be brought up by running the online database baddbname command.

4. Load the database from dumps.

Now reload the database using the most recent database and transaction dumps. First, apply the full database backup. For our example database, here is an example Unix load command using the tape device /dev/nrmt0:

load database baddbname from /dev/nrmt0

After this completes, apply each transaction log, starting with the oldest and finishing with the newest. The system will not allow the transaction logs to be loaded out of order. In fact, if any of the logs are missing or corrupt, the rest of the logs cannot be applied. To load the transaction logs for our example, enter the following command, repeating it for each transaction dump:

load transaction baddbname from /dev/nrmt0

For more information on how to load dumps and transaction logs, please refer to the section on restoring from a hot backup.

5. Dump the database.

Whenever a database is restored, run a full backup on it, and back up the master database too. The master database needs a backup because a database was removed and created, and whenever there are major changes to the dataserver, the master database should be backed up. Continuing our example, the dump command would be:

dump database baddbname to /dev/nrmt0

+Step 20: Is tempdb +Available? +

Is tempdb online and available? To see, review the messages generated when trying to start the dataserver. If tempdb is not available, there will be error messages complaining about it, and the system will not be able to come up.

If there are error messages, proceed to Step 15 to make sure tempdb's devices are all available. If tempdb is available, continue to Step 23 to begin checking all the user-defined databases.

Step 21: Model Database Available?

If the model database is online and available, proceed to step 24. If it is not, continue to step 22 to recreate the model database.

Step 22: Recreate Generic Model Database

The model database is used by the dataserver +when creating all other databases. If a specific attribute (larger size, +tables, permissions, stored procedures) is required in all databases, make +the change first in the model database. Then, from that point on, every +database created will contain this change. +

If you have a backup of the model database and you can use the isql use command, the database can be restored directly from backup using the load command. If not, the database must be restored from scratch. Follow these steps to accomplish this:

1. Run buildmaster.

If there is no backup available of the model database, the buildmaster command can be used to restore the model database to the same state it was in when the dataserver was first installed. Here is a Unix example:

buildmaster -d /syback/master.dbf -x

In this example, the -d parameter is the master database file path, and the -x parameter tells buildmaster to restore the model database. Be sure to use the appropriate buildmaster command for your OS; this can be found in the Sybase Utilities manual.

2. If a backup of the model database is available, use it in a load command to restore the database:

load database model from "/sybackups/model.dmp"

3. If additional changes were made since the last backup, apply them at this point.

4. Back up the model database using the dump command.
Return to step 18.

Step 23: All Databases Online?

If, when the dataserver is coming up, an error occurs with any of the devices associated with a database, or something has corrupted the database, that non-system database could be prevented from coming online and will be marked "suspect".

To see whether all the non-system databases are online, review the messages generated when trying to start the dataserver. There will be error messages complaining about any databases that are not available. Fortunately, because all the system databases are available, the Sybase server will come up. If this happens, proceed to Step 15 to make sure all the user database devices are available. If all the databases are available, then the system is up and no recovery is required.

+If all the databases are online, +you are done. If not, return to step 15. + +Step 24: Contact +Sybase Support +

If you have reached this step, then some problem that might be specific to the OS, the version of Sybase, or other factors is occurring. Please refer to the Sybase System Administration guides for your OS, or contact Sybase support for additional help.

+Return to step 1. + +Step 25: Re-apply +any additional scripts or bcps. +

If any additional scripts had been run on the database since the last dump, these scripts need to be run again. Since they are user-created, you will need to refer to the user's instructions on how to run them.

On top of additional scripts, there +might be bcp files of data from these databases. Follow the steps needed +to restore these data files to the originating database tables. Please +refer to the Logical Backups section below on loading data into the system. +
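For example, a hypothetical reload of one table from a character-mode bcp file might look like this (the database, table, file, and login details are placeholders):

bcp mydb..customers in /sybackups/customers.bcp -c -Usa -SSYB_MYDB -Pmypasswd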

+Proceed to step 26. + +Step 26: Review +against hardcopies +

Using hardcopies of the system tables and any important user tables, compare the current data with the data captured in the hardcopies. If there are discrepancies, there could be additional problems with the dataserver. Refer to the Sybase Administration manuals to find out whether the differences need to be repaired or are OK. Double-check all permissions on the databases if any changes were made in the syslogins or sysloginroles tables; if there were, correct them before turning the system over to the user population.

+Proceed to step 27. + +Step 27: Dump +all Restored Databases +

Now that the database has been restored, dump it to a new dump file. This way a current dump of the database, including all script and data changes, will be saved, which will help facilitate recovery in case of future problems.

+Return to step 1. + + + + diff --git b/unixbr.gif a/unixbr.gif new file mode 100755 index 0000000..ae7f434 Binary files /dev/null and a/unixbr.gif differ