An issue recently arose at work where I was asked to take a street address from one field, and split it into two for use with another system. The simple solutions available were:
- simply splitting on the first comma to appear in the string, and putting each side of this split into the respective fields.
- breaking the string down the middle.
- putting everything I could into the first field, then any remainder into the second.
However, none of these seemed to feel right; the first solution would work well, but didn’t take the destination fields’ sizes into account, and the second and third solutions could break half-way through a word / in the case of the third, may not use the second field.
As a result, I came up with the following function; it’s probably not the best solution (since this was written in half an hour on getting home from a night out, and with just a quick check that all worked OK this morning), but hopefully it’ll be useful to some people.
The function takes 3 arguments; the string to be split, the length of the first field to be populated, and the length of the second. It returns a table with an ID field to give the order of the elements (not really required), and a String field to give the two return values in separate rows. This function will always return exactly 2 rows.
The function is written in T-SQL for SQL Server, though I believe it should also work with Sybase.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION dbo.SplitSpecial
(
@stringToSplit nvarchar(max)
,@len1 int
,@len2 int
)
RETURNS @returnVal table (id int identity(1,1),string nvarchar(max))
AS
BEGIN
declare @i int = 0 --used as a counter when searching for chars
, @j int = 0 --another char counter
, @ldiff int = 0 --says how many characters we have spare to play with
, @mark int = 0 --a marker
, @tempmark int = 0 --a marker
, @back nvarchar(max) --stringToSplit in reverse
, @temp nvarchar(max) --for storing works in progress
, @forward bit = 0 --used to indicate the direction of the string from the marker
--remove any starting, ending or duplicate spaces should they exist, to try to cram in the characters
select @stringToSplit = ltrim(rtrim(@stringToSplit))
while (charindex(' ',@stringToSplit)>0)
select @stringToSplit = replace(@stringToSplit,N' ',N' ')
--find out what we have to play with / reduce max size of strings to size of input
select @tempmark = len(@stringToSplit)
select @len1 = 0.5*((@len1 + @tempmark) - abs(@len1 - @tempmark))
, @len2 = 0.5*((@len2 + @tempmark) - abs(@len2 - @tempmark))
select @ldiff = (@len1 + @len2) - @tempmark
if @ldiff > 0
begin
--get a good starting point / break point: no longer required
--select @mark = round(len(@stringToSplit)/2,0) --assume we can get away with halving the string
--select @mark = 0.5*((@mark + @len1) - abs(@mark - @len1)) --account for first field being too short (least function math hack)
--select @mark = len(@stringToSplit) - 0.5*(((len(@stringToSplit)-@mark) + @len2) - abs((len(@stringToSplit)-@mark) - @len2)) --account for 2nd field being too short (least function math hack)
--the two fields are big enough for us to try some funky stuff with
--is there a delimeter we can use?
select @temp = substring(@stringToSplit,len(@stringToSplit)-@len2,@ldiff+1)--get crossover chars; the +1 accounts for dropping the delim
while patindex('%[a-z0-9]%', @temp) > 0
set @temp = stuff(@temp, patindex('%[a-z0-9]%', @temp), 1, '')
if (len(@temp)+charindex(N' ',@temp)) > 0 --a suitable delim exists in the crossover area
begin
if charindex(N',',@temp)>0 --if commas are available, these are the preferred delimeter
select @temp = N','
else
if charindex(N' ',@temp)>0 --spaces are the second favourite
select @temp =N' '
select @back = reverse(@stringToSplit)
, @i=1
, @j = ceiling(len(@stringToSplit)/2)-1
, @forward = 1
, @mark = len(@stringToSplit)
while @i<=(len(@temp)+charindex(N' ',@temp))--try out all available delimeters (if comma or space are present, others are ignored)
begin
while @j>0
begin
select @tempmark = charindex(substring(@temp,@i,1),@stringToSplit,@j)
if @tempmark > 0 and @tempmark < @mark
select @mark = @tempmark, @forward = 1
select @tempmark = charindex(substring(@temp,@i,1),@back,@j)
if @tempmark > 0 and @tempmark < @mark
select @mark = @tempmark, @forward = 0
if(@mark<=@len1 and (len(@stringToSplit)-@mark)<=@len2)
break
select @j = @j - 1
end
select @i = @i + 1
end
--select @mark = case @forward when 0 then len(@stringToSplit)-(@mark+1) else @mark end
select @mark = case @forward when 0 then (1+len(@stringToSplit))-@mark else @mark end
end
end
--catch any issues & account for when we can't find a good break point / there isn't space
if (@mark <= 0 or @mark > @len1)
select @mark = @len1
insert into @returnVal (string) values (ltrim(rtrim(substring(@stringToSplit,1,@mark))))
insert into @returnVal (string) values (ltrim(rtrim(substring(@stringToSplit,1+@mark,@len2))))
return
END
GO
--examples of the function in action
select * from dbo.SplitSpecial('33 Rue De Maisons, Paris',100,100)
select * from dbo.SplitSpecial('33 Rue De Maisons, Paris, France 123456',100,100)
select * from dbo.SplitSpecial('33,Rue,De,Maisons,Paris,France,123456',100,100)
select * from dbo.SplitSpecial('33 Rue De Maisons Paris France 123456',100,100)
select * from dbo.SplitSpecial('33 Rue De Maisons, Paris, France 123456',8,50)
select * from dbo.SplitSpecial('this string is too big for the fields',10,10)
At some point, I hope to update this to allow it to take any number of fields of different lengths, to make it more general (e.g. should you ever have a full address to be split into 4 or 5 lines), but for now, it works (so far, at least)