KBA-01471: Moving Cataloged Files to Archive

Question:

My Spitfire Catalog has grown quite large. How do I avoid backing it all up all the time?

Answer:

Both Spitfire and SQL have tools to deal with terabytes of data and the reality that most of it becomes effectively read-only.  For example, some editions of SQL Server support “table partitioning”—but this feature actually complicates backup and recovery scenarios. Spitfire‘s solution is “Archive Databases.”

Archive Databases

Spitfire supports a series of Archive Databases: you add each archive database as required as time goes on, perhaps one per year, but this is actually arbitrary and totally dependent upon your needs.  The “newest” archive database is the “current” archive database, and Spitfire can be configured to move versions of files in the catalog that have aged beyond a certain threshold into the current archive.

The reason you‘ll want more than one archive database is that all archives older than the current archive are marked “‘read-only” (a SQL feature), and these read-only databases need not be backed up daily and could even be stored on cheaper, slower storage devices.

Spitfire‘s main catalog keeps the indexing information, so accessing something that has been archived may take a second or two longer but is otherwise totally transparent to the end user.  The only loss of functionality is that archived versions cannot be deleted.

Version Pruning

In addition to archiving, Spitfire can prune versions of files generated from templates.  The pruning process finds all versions created within a certain number of hours (perhaps 3).  If any of these multiple versions are not routed to anyone and there are other versions before and after, then the file version is considered redundant and prunable.  Pruned versions are deleted and cannot be recovered.  A gap will appear in the file version sequence.  A file can only be pruned automatically if it is a template-generated file and other versions exist within the time window.

Q & A

Is data read-only?  Yes, archived versions are read only and cannot be deleted or removed.
Are there limitations to what we can do with archived data?   There aren’t supposed to be any, other than permanence.
What is pruning?  Pruning removes redundant versions of template generated files that were generated/refreshed in close time proximity, only if those interim versions were not used as routed content to anyone.

Narrative Example

Let’s use an example to help clarify pruning, version and final.  Lets say that a file has 8 versions, versions 1-7 may be moved to archive “quickly” (perhaps after 3 months).  Version 8 is “final” and will only be moved to archive after 13 months.   Before archiving, versions 1-7 are read only but it is possible for an admin (or user with permission) to go and remove some of those versions.  After archival, those versions are part of the permeant record.   Since it is so rare for anyone to think about manually removing file versions, the system pruning function will perhaps detect that versions 1-6 were created in series in the course of 2 hours.  Version 5 is routed.  The system would keep 1 and 5 and “prune” 2,3, and 4.   If after a couple of years someone edits Version 8, Version 9 will be created.

Archive Processing

For the job to run, the SQL Server Agent must be running.  The routine to move files into the current archive runs once a day and

  • selects the oldest versions based on configurable months of age and SQL Transaction log overhead limit, and moves them.
  • selects the oldest audit trail data and moves it. (See KBA-01587: Retention of Audit Trail Data.)
  • checks the health of statistics on the xsfFileVersion table and updates statics if necessary.  Statistics on this table are also set to not automatically recalculate to avoid long random delays.

Enabling Archiving

  1. Create an archive database using the SQL Archive tab in ICTool.
    1. Specify the database name, something like sfArchive2010.
    2. Click the CREATE button. If a prior archive databases exist, it is automatically made read-only.
  2. Review the parameters
    1. Prior Version Months—the number of months ago a file version that has been replaced by a new file version must have been cataloged before being archived.
    2. Final Version Months—the number of months ago the final (or only) version of a file must have been cataloged before being archived.
    3. Max Per Day—the number of MB of data that can be moved into archive daily.  Expect LDF file grown of 3x this value.  If the oldest eligible file exceeds this setting, only 1 file will be archived that day.
  3. On the SQL Job Tab in ICTool, create the job to move eligible file to archive daily.
  4. On the Finish Tab in ICTool,
    1. Verify that Apply sfPMS Package is checked.
    2. Notify users (or else they might be annoyed).
    3. Click the Publish Configuration button.

Adding a New Archive Target

  1. Create an new archive database using the SQL Archive tab in ICTool.
    1. Specify the database name, something like sfArchive2020.
    2. Click the CREATE button.  You will be asked for paths – these are on the SQL server, not the VM where you are running ICTool.
    3. If a prior archive databases exist, it is automatically made read-only.
  2. Use SSMS to review ownership of the new archive.  Compare the owner of this archive to the owner of the previous archive.  If necessary, use
    ALTER AUTHORIZATION ON DATABASE::new_archive_name TO new_owner_name;
  3. On the Finish Tab in ICTool,
    1. Verify that Apply sfPMS Package is checked.
    2. Notify users (or else they might be annoyed).
    3. Click the Publish Configuration button.

Enabling Pruning

In ICTool, on the SQL Archiving tab, review the parameters:

  1. Prune Version Months—the number of months before file versions can be pruned.  1-5 months can be specified.  0 disables the pruning feature.   This number must be less than the Prior Version Months value specified above.
  2. …Within hours—the number of hours within which multiple versions are considered for pruning.  A number between 1 and 23 can be specified.  Larger number will potentially allow more pruning.

For example, if you specify 4 months and 8 hours, then 4 months after a series of versions are created, the interim versions within the 8-hour time span will be discarded automatically.  Unlike strict archiving, pruning does destroy data,  but only carefully selected versions of files that are likely redundant in the big picture.

Long Term Backups and SQL Upgrades

In ICTool, on the SQL Archiving tab, all but the current archive target should be marked as is read only. This means you can make a final SQL backup and move the backup to long term storage for disaster recovery.

If you upgrade the SQL server, you will need to

  1. Make the Archive DB writable, e.g., ALTER DATABASE sfArchive2015 SET READ_WRITE.
  2. Change its compatibility level, e.g., ALTER DATABASE database_name SET COMPATIBILITY_LEVEL = xxx
  3. Rebuild its indexes (exec sp_updatestats may suffice).
  4. Make a new long-term backup and store it appropriately.
  5. Mark the database as read only, e.g., ALTER DATABASE sfArchive2015 SET READ_ONLY

Additional Comments:

Keep in mind that archiving file versions does not reduce accessibility of the data.If your test site uses a copy of your live databases, you can share the archive databases.  Just be sure *not* to enable move to archive jobs in the test site!

You can use these commands to set the archive range when creating prior year archives to ‘catch up‘:

exec dbo.p_WriteConfig @company=‘[ALL]‘, @key=‘ArchiveFinalAfterMonths‘,@NewValue=‘13‘
exec dbo.p_WriteConfig @company=‘[ALL]‘, @key=‘ArchiveVersionAfterMonths‘,@NewValue=‘12‘


KBA-01471; Last updated: January 4, 2024 at 16:14 pm;
Keywords:  backup files; file backup; file archive