Home > Blog

How to split strings and select which elements you want

Extracting First and Last Elements from Delimited Strings in Excel Using TEXTSPLIT and TEXTJOIN

If you have a string separated by a delimiter, how can you pull the first and last elements out?

What we have

What we want

AA_1234_FG

AA_FG

BBBB_DDD_22_334_AB

BBBB_AB

DD_EE

DD_EE

GG

GG

 

The old ways

One way would be to use the Text to Columns method to split the string at the “-“ and place it in separate columns.  You would then need to find a way to pull out the 1st and last from a variable number of columns.

Another way would be to use the Left and Right commands. However, you will need to find out where the right element starts.  This is covered in a blog “find the nth instance of text in a cell” so it would be

Left section=left(A1,find(“_“,A1)

Right section =RIGHT(A1,LEN(A1)-FIND(CHAR(1),SUBSTITUTE(A1,"_",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"_","")))))

Putting these together gives us:

=Left(A1,find(“_“,A1)&”_”&RIGHT(A1,LEN(A1)-FIND(CHAR(1),SUBSTITUTE(A1,"_",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"_","")))))

A more elegant way

Which is not the most elegant way to do this.   It is easier (and cleaner) to use a combination of

  • TextSplit – splits the text at the delimiter and spills the results
  • ChooseCols – allows you to choose which columns you want to keep in an array
  • TextJoin – joins text with a delimiter of your choice

= TEXTJOIN("-",,CHOOSECOLS(TEXTSPLIT(A1,"_"),1,LEN(A1)-LEN(SUBSTITUTE(A1,"_",""))+1))

So how does this work?

  • Textsplit(A1,"_") splits the AA_1234_FG into three columns using the "_" delimiter (but does so only in the formula not spilling in the sheet)
  • LEN(A1)-LEN(SUBSTITUTE(A1"-","")) determines the number of “_” by comparing the length of the string with and without the “_”.
  • ChooseCols then picks the 1st column and the last column (as counted by the above+1).
  • And finally, textjoin stitches them back together with a new delimiter.

Error handling

Notice that there will be an error if there are no “_”, as in the last example, the entry is repeated.  To avoid this we can check for the error using a Find statement:

=IF(ISERROR(FIND("_",A1)),A1,TEXTJOIN("-",,CHOOSECOLS(TEXTSPLIT(A1,"_"),1,LEN(A1)-LEN(SUBSTITUTE(A1,"_",""))+1)))

 

 

Miles Goodchild

Miles Goodchild has been a Program and Programme Planner, PMO lead and Project Manger since 1998.  He enjoys using MS Office, especially Project, Excel and Visio to make life easier and simpler.  In the course of this he created SummaryPro.  In this blog he shares some of the tips and techniques he has learnt over the years in the hope that they will be useful to you.