In a perfect world, the data you are searching for is easily searchable in a single database and documentation is always current.  However, often you have a legacy system which has fields that are repurposed to store different types of data. For example, descriptors or memo fields can contain important business information or links to other side systems. Another issue that is commonly encountered is database fields are not consistently named – for example, OrderNo, OrderID, and OrderNumber.  This is often the result of siloed development or the functionality of the legacy system being extended beyond what the system is originally designed to do.

Based on a sample search term, I wanted a systematic way to discover what primary data entities are associated with a specific business process as well as the entity relationships. The solution also needed to account for scenarios where data can exist in mismatched in field names or in a set of information within a single database field.  For my specific case, the database was a SQL Server with 1000s of columns across many tables supporting a well-known ERP system with multiple application modules.

One way to handle this issue is to build SELECT queries using the metadata catalog and using the search term’s value and length to improve the probability of making a match.   This method works best with strings that are intuitively unique which improves the probability of the search.  So as an example, a search term “CAF-0006099” as an order number would work better than “6099”.

Below is the sample code I’ve written to aggregate the search results of dynamically generated metadata search queries.  The IsFound column will flag on the first wildcard match which is better for scanning large tables.  This method of search will not be 100% as a wildcard search can incorrectly make a positive match on both “CAF-0006099” or “CAF-0006099Z”.  But as a first pass, this code is still useful in shortlisting fewer columns from potentially 1000s of columns across many tables. From there, you can do a more detailed level of querying.

declare @search_term as varchar(255), @sql_template varchar(8000)
declare @max_rows int, @row_ctr int, @sql_stmt varchar(8000)

set @search_term ='CAF-0006099';

if object_id('tempdb..#SearchResults') is not null drop table #SearchResults

create table #SearchResults(
  IsFound int,
  ColName varchar(8000)
 );

if object_id('tempdb..#cmdlist') is not null drop table #cmdlist

create table #cmdlist(
    RowID int identity(1,1),
    cmd varchar(8000)
);

set @sql_template ='insert into #SearchResults(IsFound, ColName) '
+ 'select top 1 1 IsFound, ''[$schemaName$].[$tableName$].[$colName$] as ColName'' ColName '
+ 'from [$schemaName$].[$tableName$] with (Nolock) where [$colName$] like ''%$searchTerm$%'';'


insert into #cmdlist(cmd )
select
replace(replace(replace(replace(
    @sql_template ,'$schemaName$',c.Table_schema)
    ,'$tableName$',c.Table_name)
    ,'$colName$',c.Column_name)
	,'$searchTerm$',@search_term)
as SqlStmt
from information_schema.columns c
where c.DATA_TYPE like '%char%' and c.CHARACTER_MAXIMUM_LENGTH>= len(@search_term )

set @max_rows =@@rowcount
set @row_ctr =0

while @row_ctr < @max_rows
BEGIN
    select @sql_stmt = cmd from #cmdlist where RowID= (@row_ctr +1)
    exec (@sql_stmt);

    set @row_ctr=@row_ctr +1;
END

select ColName, IsFound
from #SearchResults;

Leave a Reply

Your email address will not be published. Required fields are marked *