Intelligently Splitting an Address

Filed under: SQL Server, Technology, Uncategorized — Tags: , , , , , , , , — Developer42 @ 08:05

An issue recently arose at work where I was asked to take a street address from one field, and split it into two for use with another system. The simple solutions available were:

  • simply splitting on the first comma to appear in the string, and putting each side of this split into the respective fields.
  • breaking the string down the middle.
  • putting everything I could into the first field, then any remainder into the second.

However, none of these seemed to feel right; the first solution would work well, but didn’t take the destination fields’ sizes into account, and the second and third solutions could break half-way through a word / in the case of the third, may not use the second field.
As a result, I came up with the following function; it’s probably not the best solution (since this was written in half an hour on getting home from a night out, and with just a quick check that all worked OK this morning), but hopefully it’ll be useful to some people.

The function takes 3 arguments; the string to be split, the length of the first field to be populated, and the length of the second. It returns a table with an ID field to give the order of the elements (not really required), and a String field to give the two return values in separate rows. This function will always return exactly 2 rows.

The function is written in T-SQL for SQL Server, though I believe it should also work with Sybase.





CREATE FUNCTION dbo.SplitSpecial


      @stringToSplit nvarchar(max)

      ,@len1 int

      ,@len2 int


RETURNS @returnVal table (id int identity(1,1),string nvarchar(max)) 



      declare @i int = 0                        --used as a counter when searching for chars

      , @j int = 0                              --another char counter

      , @ldiff int = 0                    --says how many characters we have spare to play with

      , @mark int = 0                     --a marker

      , @tempmark int   = 0                     --a marker

      , @back nvarchar(max)               --stringToSplit in reverse

      , @temp nvarchar(max)               --for storing works in progress

      , @forward bit = 0                        --used to indicate the direction of the string from the marker


      --remove any starting, ending or duplicate spaces should they exist, to try to cram in the characters

      select @stringToSplit = ltrim(rtrim(@stringToSplit))

      while (charindex('  ',@stringToSplit)>0)

            select @stringToSplit = replace(@stringToSplit,N'  ',N' ')


      --find out what we have to play with / reduce max size of strings to size of input

      select @tempmark = len(@stringToSplit)

      select @len1  = 0.5*((@len1 + @tempmark) - abs(@len1 - @tempmark))

            ,  @len2  = 0.5*((@len2 + @tempmark) - abs(@len2 - @tempmark))

      select @ldiff = (@len1 + @len2) - @tempmark


      if @ldiff > 0


            --get a good starting point / break point: no longer required

            --select @mark = round(len(@stringToSplit)/2,0) --assume we can get away with halving the string

            --select @mark = 0.5*((@mark + @len1) - abs(@mark - @len1)) --account for first field being too short (least function math hack)

            --select @mark = len(@stringToSplit) - 0.5*(((len(@stringToSplit)-@mark) + @len2) - abs((len(@stringToSplit)-@mark) - @len2)) --account for 2nd field being too short (least function math hack)


            --the two fields are big enough for us to try some funky stuff with

            --is there a delimeter we can use?

            select @temp = substring(@stringToSplit,len(@stringToSplit)-@len2,@ldiff+1)--get crossover chars; the +1 accounts for dropping the delim

            while patindex('%[a-z0-9]%', @temp) > 0

                  set @temp = stuff(@temp, patindex('%[a-z0-9]%', @temp), 1, '')

            if (len(@temp)+charindex(N' ',@temp)) > 0 --a suitable delim exists in the crossover area


                  if charindex(N',',@temp)>0 --if commas are available, these are the preferred delimeter

                        select @temp = N','


                        if charindex(N' ',@temp)>0  --spaces are the second favourite

                              select @temp =N' '

                  select @back = reverse(@stringToSplit)

                  , @i=1

                  , @j = ceiling(len(@stringToSplit)/2)-1

                  , @forward = 1

                  , @mark = len(@stringToSplit)

                  while @i<=(len(@temp)+charindex(N' ',@temp))--try out all available delimeters (if comma or space are present, others are ignored)


                        while @j>0


                              select @tempmark = charindex(substring(@temp,@i,1),@stringToSplit,@j)

                              if @tempmark > 0 and @tempmark < @mark

                                    select @mark = @tempmark, @forward = 1

                              select @tempmark = charindex(substring(@temp,@i,1),@back,@j)

                              if @tempmark > 0 and @tempmark < @mark

                                    select @mark = @tempmark, @forward = 0

                              if(@mark<=@len1 and (len(@stringToSplit)-@mark)<=@len2)


                              select @j = @j - 1


                        select @i = @i + 1


                  --select @mark = case @forward when 0 then len(@stringToSplit)-(@mark+1) else @mark end

                  select @mark = case @forward when 0 then (1+len(@stringToSplit))-@mark else @mark end



      --catch any issues & account for when we can't find a good break point / there isn't space

      if (@mark <= 0 or @mark > @len1)

            select @mark = @len1


      insert into @returnVal (string)     values (ltrim(rtrim(substring(@stringToSplit,1,@mark))))

      insert into @returnVal (string)     values (ltrim(rtrim(substring(@stringToSplit,1+@mark,@len2))))








--examples of the function in action

select * from dbo.SplitSpecial('33 Rue De Maisons, Paris',100,100)

select * from dbo.SplitSpecial('33 Rue De Maisons, Paris, France 123456',100,100)

select * from dbo.SplitSpecial('33,Rue,De,Maisons,Paris,France,123456',100,100)

select * from dbo.SplitSpecial('33 Rue De Maisons Paris France 123456',100,100)

select * from dbo.SplitSpecial('33 Rue De Maisons, Paris, France 123456',8,50)

select * from dbo.SplitSpecial('this string is too big for the fields',10,10)

At some point, I hope to update this to allow it to take any number of fields of different lengths, to make it more general (e.g. should you ever have a full address to be split into 4 or 5 lines), but for now, it works (so far, at least)


2010 Scripting Games

Filed under: Microsoft, Technology — Tags: , , , , — Developer42 @ 15:40

I’ve just discovered a coding competition for people who write small but useful utility scripts. Since I my projects tend to get affected by my short attention span, but I enjoy writing useful & helpful stuff, this seems the ideal arena for my work. If you’re also interested in the games, please see the site, linked below:

2010 Scripting Games

Grab this badge here!

Create a free website or blog at

%d bloggers like this: