There’s Metadata in those File Names

By Daniel Antion posted 07-09-2013 20:31


Long before we decided to use SharePoint for document management, we knew that there would be some benefit in being able to figure out what was in a document without having to open it. Unfortunately, since we started creating our particular digital landfill back in 1988, we had to be very creative. Those of you who remember that period understand the significance of ‘8-dot-3’, and since the ‘3’ was often sacrosanct, you were really left with 8 characters to guide your successors, not to mention your future self, to the right file. You could extend or amplify that with folders, but sadly their names were also limited to 8 characters. Still, intelligent people could pack a lot into those 8 characters.

When Windows 95 arrived, we felt empowered. With all those characters (255) we could almost write an abstract as a file name – “1998-annual-report-this-is-the-version-before-we-sent-it-to-the-printer.doc” but some still favored brevity. Our accountants, earliest among our staff to cozy up to the PC as a business tool, eagerly accepted characters 9, 10, 11 …15, but didn’t seem interested in characters 16 – 255; just enough to get the job done. Now that we are moving some of those historic documents into SharePoint, we are finding that they included exactly what was needed. The combination of their well-designed folder hierarchy, an intelligent naming convention and SharePoint’s workflow capability has enabled us to rapidly move several hundred reports into SharePoint libraries and set enough metadata columns to render truly useful results.

The process was pretty straightforward. The files were consistently named, following the convention illustrated by “99999_L_Q1_2013.pdf.” We used a “substring extraction” action to:

  • Grab the filename (or name field) of the document and set it to a local variable “File Name” (this local variable is used in subsequent steps to extract strings)
  • Grab the first 5 characters of “File Name” and save to local variable “Member Number”
  • Starting at index 7, grab 1 character and save to local variable “Pool”
  • Starting at index 9, grab 2 characters and save to local variable “Period”
  • Starting at index 12, grab 4 characters and save to local variable “Fiscal Year”
  • Set each of the related list fields to their corresponding local variables.
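
For readers who think better in code, the steps above can be sketched in a few lines of Python. This is only an illustration of the slicing logic, not the SharePoint workflow action itself. The function name parse_report_name is made up for this example, and it assumes the index values in the steps are 1-based character positions within the file name.

```python
from pathlib import PurePath


def parse_report_name(filename: str) -> dict:
    """Slice a name such as 99999_L_Q1_2013.pdf into its metadata fields."""
    # Drop the extension so only "99999_L_Q1_2013" is sliced.
    stem = PurePath(filename).stem

    def substring(start: int, length: int) -> str:
        # Assumption: 'start' is a 1-based position, matching the steps above.
        return stem[start - 1 : start - 1 + length]

    return {
        "Member Number": substring(1, 5),
        "Pool": substring(7, 1),
        "Period": substring(9, 2),
        "Fiscal Year": substring(12, 4),
    }


fields = parse_report_name("99999_L_Q1_2013.pdf")
print(fields)
```

Fixed positions only work because the convention is rigid: a member number that is not exactly 5 characters long would shift every later field.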

One of the list columns was “Member Name.” To set that field, the workflow looked up the “Member Number” local variable in a separate “Member Lookup” list that pairs each “Member Number” with its corresponding “Member Name.” When it found the Member Number in that lookup list, it returned the corresponding Member Name, and we used that value to set the “Member Name” field.
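
The lookup is essentially a key-to-value match, which might look like this in Python. The dictionary stands in for the “Member Lookup” list; its contents and the function name member_name_for are invented for illustration and are not real data.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the "Member Lookup" list (made-up entries).
MEMBER_LOOKUP: Dict[str, str] = {
    "99999": "Example Member",
    "12345": "Another Member",
}


def member_name_for(member_number: str,
                    lookup: Dict[str, str] = MEMBER_LOOKUP) -> Optional[str]:
    """Return the Member Name for a Member Number, or None if it is not listed."""
    return lookup.get(member_number)


print(member_name_for("99999"))
```

Returning None for an unknown number leaves room to flag the document for review rather than setting a wrong name.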

The important lesson here isn’t how to parse a filename; you can probably figure that out as easily as we did. The important thing is that naming conventions that make (or made) sense to a specific group of employees can easily be exploited to yield great benefit to subsequent generations within a department, or to employees outside the department who may not be aware of the coding scheme. Also, keep in mind that products like MetaVis can extend this benefit by mining additional metadata from the folder hierarchy in which the documents are stored. I would also point out that if you have more elaborate naming conventions, the HarePoint Workflow Extensions can parse the entire string into an array, from which the individual variables can be referenced by index. I have written about both of these products on my SharePoint Stories blog.
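
To make the “parse to an array” and “mine the folder hierarchy” ideas concrete, here is a rough Python sketch of the general technique. It does not use or imitate the MetaVis or HarePoint APIs. The function name split_name, the underscore delimiter and the example folder path are assumptions made for this illustration.

```python
from pathlib import PurePosixPath


def split_name(path: str, delimiter: str = "_") -> dict:
    """Split a document path into folder names and delimited name tokens."""
    p = PurePosixPath(path)
    return {
        # Each folder in the hierarchy is a candidate metadata value.
        "folders": list(p.parent.parts),
        # The file name (minus extension) becomes an array referenced by index.
        "tokens": p.stem.split(delimiter),
    }


# Hypothetical folder layout, for illustration only.
parts = split_name("Reports/2013/99999_L_Q1_2013.pdf")
print(parts["folders"])
print(parts["tokens"][2])
```

Splitting on a delimiter tolerates fields of varying length, which fixed character positions do not.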

#sharepoint #filenames #workflow #metadata #namingconventions