Advanced | Flow of The Week: Convert Office documents to PDF on the fly using Microsoft Flow

Whats up Flow Fans! 

This week we have a special FOTW From one of our MVP's – Paul Culmsee!

Paul Culmsee () is a management and technology consultant, Microsoft MVP and award winning author from Perth, Western Australia. He co-founded Seven Sigma Business Solutions () and specialises in sensemaking, helping organisations (re)discover their purpose, knowledge management, strategic planning, IT governance, facilitation and all facets of SharePoint and Office365 delivery

 

Sympathy for the on-site technician

Hi everyone.

Imagine a company where service technicians perform on-site repair of equipment. Inevitably, in doing this sort of work, the technician will need to refer to equipment drawings, service history, past photos, specifications and/or operating manuals.

These days PowerApps is fast-becoming a great option for such a scenario because many field workers prefer to use their phone or a tablet. I have made many apps like this and PowerApps is a great solution for this use-case. But PowerApps also has some limitations, and right now that is around the display of documents from SharePoint. For a start, it is impossible to display office documents natively in PowerApps at this time, and there are authentication-related issues in certain circumstances when pulling content from SharePoint.

But fear not… with a 6-step flow, it is possible to solve this problem. This flow allows a remote user to securely request a document from SharePoint, but importantly, converts that document to a PDF on the fly.

There are two big benefits from this:

  1. A reduction in time and effort for document controllers. If a document frequently changes, it is most likely in word, excel or PowerPoint format. They do not have to worry about converting it to PDF.
  2. It allows the document to be viewed in PowerApps natively (As a result of #2, on top of some Flow kung-fu, we will learn some PowerApp tricks in this article too :-).

Now in the (admittedly large article to follow, I go into detail on how to set this up, but if you prefer to see this in video form, we also have you covered…

Step 1: Setting up a SharePoint library

Since this is a field-worker scenario, let’s make a SharePoint library and be good digital citizens by throwing in an extra couple of columns so we can tag documents by their type. I’ll keep it deliberately simple for the purpose of this post, but you can extend this type of library setup in whatever manner suits you.

I created a new SharePoint site, and added the following columns to the default document library:

  • Equipment – Single Line of text
  • DocumentType – Single Line of text

 

Now upload some documents and tag them by type and by equipment. In the diagram below, take note that I’ve also thrown in a folder with some documents inside. While I won’t win any information architecture awards for doing this, it is very much the reality for many organisations and it demonstrates that the flow we will build accommodates complex folder structures also.

 

Oh… also keep the documents relatively simple. I can’t guarantee this auto-pdf goodness works with crazy large documents or those with embedded bits and pieces.

Building the Flow

The flow we are going to build is going to be triggered from PowerApps. PowerApps will pass in the ID and folder path of a particular file, and the flow will do a small bit of data cleansing before using a very powerful action called Send an HTTP request to SharePoint to bring it back as a PDF.

Please note this flow is deceptively simple as it only requires as little as 5 actions, but we are using some nice trickery so take your time…

Step 1: Create a new Blank Flow, and choose the PowerApps trigger

Step 2: Add an Initialize Variable Action.

This variable will hold the ID of the file sent from PowerApps. For reasons that will become apparent later, rename this action to FileID, add a variable called ID, make it String format.

For the Value, click Ask in PowerApps from the Dynamic Content panel (click see more if this is missing)

If you do this right, you will see a FileID_Value parameter name in the textbox. Now when we invoke this flow from PowerApps, this is the name of the parameter that the user will see. Eg:

Why is this? Well, behind the scenes, PowerApps used the name of the action to generate this parameter name, hence why it made sense to rename it something simple so you don’t make the PowerApps view confusing.

Note: I tend to add comments to my flow actions to make the intention clear as shown below.

 

Step 3: Add another Initialize Variable Action.

This variable will hold the folder path of the file and will also be sent from PowerApps. Rename this action to FolderPath, add a variable of the same name. This time make it a string and once, again use “Ask In PowerApps” to set a parameter called “FolderPath_Value” as shown below…

Step 3a. Sanity check interlude!

Please note this common trap for new players… if you clicked Ask in PowerApps more than once, then you might have some additional unwelcome parameters. I recommend being really (really) careful here to ensure you only have two parameters defined, otherwise it will make working with PowerApps a bit of a pain later. So, check that in Dynamic content, you only see the two parameters as shown below…

Step 4: Clean up the data.

A common occurrence in the world of messing with low-level stuff is that data is not always in the format we need it to be. In this case, it turns out that when PowerApps sends the folder path to Flow, it will send it in this format.

DocumentLibraryName/Folder/

Unfortunately, when we send this to SharePoint to get the PDF URL, it needs to be in a format that includes the site collection URL and the removal of the trailing slash.

/Sites/SiteName/DocumentLibraryName/Folder

Adding the site collection URL is easy – and we will do that later. But the trailing slash needs a flow expression to strip the trailing slash from the path. The expression will look like this:

substring(<path>,0,sub(length(<path>),1))

If you are new to Flow expressions, then yes – I agree that they are ugly. But trust me, they do become quite intuitive over time and actually can really bring your flows to life. For the record, what we are doing here is using a substring function to grab all except the final character of FolderPath variable. To exclude the final character, I take the length of FolderPath and subtract it by 1.

Now we could sort this out by adding another initialize variable action to reformat the path in the way we want it. But that means an extra step and variable. So, let’s be cool and modify the previous step to do it in one hit. So, go back to your FolderPath action and delete the FolderPath_Value reference in the Value textbox.

Now click Expression in the Dynamic content panel and check out this little-known trick. Type in the word substring( and you will see a somewhat annoying pop-up like so…

Now, make sure your cursor position is in between the two brackets after the word substring! Then click the Dynamic Content tab, and you can now choose the FolderPath parameter. It will add it into your expression. Neat huh?

Now your expression will look like the following:

substring(triggerBody()['FolderPath_Value'])

 

This neat trick allows you to write flow expressions and refer to variables or parameters without manually having to type them in. Now all we need to do is fill out the rest of the expression using this reference to the FilePath parameter.

From: substring(<path>,0,sub(length(<path>),1)) we replace <path> with triggerBody()['FolderPath_Value']:

 

substring(triggerBody()['FolderPath_Value'],0,sub(length(triggerBody()['FolderPath_Value']),1))

Your flow action will now look like the screenshot below… note that I completed this task by adding the above expression to the comment for the action to make it clear what is going on…

Step 5: Get the PDF Information from SharePoint

Now if you want to just get down to business, feel free to skip to bit where I show the next action to add. But if you want to know what we are going to do, read on!

This Flow leverages a little-known capability of SharePoint that among other things, allows us to generate image thumbnails and PDF’s of documents. This capability is an API with the nerdy name of RenderListDataAsStream. In a nutshell, it is possible to pass a reference to a document and it will dutifully spit out the URL to a PDF version.

To do this, we need to pass 4 things to the API.

  1. The document library where we want to get the file from
  2. The folder where this file resides in the library
  3. The SharePoint ID of the file we want to PDF
  4. A special code that tells the API to bring back a URL of the generated PDF

For reference, a sample API call for a file with an ID of 14 in the default SharePoint document library in a folder called “test” would look like this.

https://culmsee.sharepoint.com/sites/flowoftheweek/_api/web/lists/GetbyTitle(‘Documents’)/RenderListDataAsStream?FilterField1=ID&FilterValue1=14

Don’t try the above link in the browser, as it is a POST request. Additionally, we need to send some information in the request body too, namely the folder where the file resides and the code to get the PDF URL. The body looks like this:

    "parameters": {

       "RenderOptions" : 4103,

       "FolderServerRelativeUrl" : "/Sites/FlowOfTheWeek/Documents/test"

    }

}

Ok, so what is the deal with the RenderOption number above? Well, this API does lots more than just generate a PDF, and that parameter allows you to specify what information you want back. The documentation includes a table of different interesting things you can return, which you can do by adding the values together.

Label

Description

Value

ContextInfo

Return list context information

1

ListData

Return list data

2

ListSchema

Return list schema

4

EnableMediaTAUrls

Enables URLs pointing to Media TA service, such as .thumbnailUrl, .videoManifestUrl, .pdfConversionUrls.

4096

 

So, we are asking this API not just to bring back the data associated with a list item, but also some additional useful stuff. The last entry is particularly interesting as it mentions a mysterious beast known as the Media TA service which I assume means either “translation” or “totally awesome” :-). Basically, what happens is if we total the numbers listed in the above table (4103), we will end up all the data we need to do PDF conversion.

Okay enough talk!

Add a SharePoint action called Send an HTTP request to SharePoint. Set the Site Address to the site that contains your document library and set the Method to POST. Set the URI to _api/web/lists/GetbyTitle(<docLib>)/RenderListDataAsStream?FilterField1=ID&FilterValue1=, where <Doclib> is the name you specified for the document library

( for example, mine is _api/web/lists/GetbyTitle(‘Documents’)/RenderListDataAsStream?FilterField1=ID&FilterValue1= ).

Finally, on the end of the URI, click Dynamic Content and choose the ID variable as shown below:

In the Body section, paste the following configuration (watch the quotes when pasting from this article):

    "parameters": {

       "RenderOptions" : 4103,

       "FolderServerRelativeUrl" : "/<your site collection URL>/"

    }

}

In my case the FolderServerRelativeURL was “/sites/flowoftheweek/” but if you use the root site collection, it will simply be a slash “/”.

Finally, place your cursor just after the slash in the FolderServerRelativeURL parameter and from Dynamic content, choose the FolderPath variable.

Step 6: Save and Test the Flow

At this point, click the Test icon in the top right of the screen. Choose the option I’ll perform the trigger action and click the Save & Test button. On the popup that follows, click the Continue button and on the next screen, type in the ID number of one of the documents in your library and the folder path the document resides in.

For example: The first document uploaded to the library will likely be ID 1 and if is the default SharePoint document library, the folder will be Shared Documents/

Click the Run Flow button. Your flow will start and you can click Done. Assuming it worked, you will see a green tick of happiness in the history.

Click on the Send an HTTP Request to SharePoint action to expand it. We need to grab the output from the API call for the next action. Find the OUTPUTS section and copy the entire contents to the clipboard….

Note: If you made an error (e.g., you asked for a file that does not exist or the document library is empty), then the subsequent steps will fail. You can quickly sense-check this by looking at the clipboard output in notepad.

If there is no file found, you will see a line “Row”: [] in the ListData section like so…

{

  "wpq": "",

  "Templates": {},

  "ListData": {

    "Row": []

 

But if you did retrieve a file, there will be heaps of data inside Row like so…

{

  "wpq": "",

  "Templates": {},

  "ListData": {

    "Row": [

      {

        "ID": "1",

        "PermMask": "0x7ffffffffffbffff",

        [snip a heap of stuff]

      }

Note: The output needs to have row data before moving to the next step

Step 6: Add an action to help us work with APi output

Go back to edit mode and add a Data Operations action called Parse JSON to your flow. This action will allow us to make use of the output of the API call in the subsequent flow step.

Click the Use sample payload to generate schema link, paste your clipboard contents into the window and click the Done button.

In the Content field, go to Dynamic content panel and choose Body from the Send an HTTP Request to SharePoint action.

 

Step 7: Construct the PDF URL

Now when this API is called, a lot of data is returned. The purpose of the Parse JSON action in step 6 was to process all this output and turn it into flow objects to make our life easier here. This allows us to quickly grab only certain data from the output of the previous flow step without having to parse the output ourselves via more complex expressions.

At this point, you might be thinking that one of those is a nicely formatted PDF URL all done for us. Unfortunately, this is not the case. Microsoft give you all the bits you need, but it is up to you to put it all together. As a result, we need to do some more work to create the URL by trawling through some of the data returned by the previous step.

Of all the output currently in your clipboard, the main one that interests us is this entry…

“.pdfConversionUrl”: “{.mediaBaseUrl}/transform/pdf?provider=spo&inputFormat={.fileType}&cs={.callerStack}&docid={.spItemUrl}&{.driveAccessToken}”

This parameter is basically a template for generating the PDF URL. All the stuff in curly braces are tokens that have to be replaced by the actual values that are also returned as part of the API call. For example, if I search the clipboard content for the first token in .pdfConversionURL called  {.mediabaseURL}, I find this entry…

“.mediaBaseUrl”: “https://australiasoutheast1-mediap.svc.ms”

Now go and look at .pdfConversionUrl again. Replace {.mediabaseUrl} with https://australiasoutheast1-mediap.svc.ms and now we have

“.pdfConversionUrl”: “https://australiasoutheast1-mediap.svc.ms /transform/pdf?provider=spo&inputFormat={.fileType}&cs={.callerStack}&docid={.spItemUrl}&{.driveAccessToken}

Get the idea? We need to replace the remaining tokens ( {.fileType}, {.callerStack}, {spItemUrl} and {driveAccessToken} and replace them. Once we have done this, we finally have created our PDF URL. When we subsequently access that URL, we will receive the converted document in PDF format without needing to store the PDF. The source document can stay in its native office format!

Phew! Now let’s get this done…

Add an Initialize Variable action to your flow, name the variable PDFURL (or something similar) and set its Type to String format. image_thumb[35]_thumb

Now we come to the most complex bit of the flow where we have to substitute the tokens we just examined. Be careful here as this is the most likely place to make an error. In the Value textbox, click the Dynamic content flyout and find .mediaBaseUrl from the Parse JSON action…

Next, add the following text to the Value textbox, taking care not to delete what you just added in the previous step.

/transform/pdf?provider=spo&inputFormat=

Now we come to a slightly tricky bit. The next bit of content we need is the file type of the document we are dealing with. The bit that is tricky about this even though we are only asking for a single file when we call the API, it comes back as an array. Why? Well this API allows you to process multiple files in one go, so it always returns an array in the output, even if you only requested a single file.

The offending bit of data returned by the API is shown below. Inside a ListData object, we have a an array of Row objects:

  "ListData": {

    "Row": [

      {

        "ID": "1",

We need to get the file type of the document, so the PDF converter knows what it is dealing with. Like the way we dealt with removing the trailing slash from the FilePath variable in step 4, we can use another expression to handle it. Click the Expression tab and type in the following:

first(body('Parse_JSON')?['ListData']?['Row'])?['File_x0020_Type']                                                          

 

What this expression is doing is assuming we are only handling one file at a time and grabbing the first element of the Rows array and then grabbing the File_x0020_Type property.

Next, add the following text to the Value textbox, taking care not to delete what you just added in the previous step.

&cs=

Also, be super careful here because at the time of writing, the cursor in this textbox can randomly move and wipe out your edits…

Now in the Value textbox, click the Dynamic content panel and find .callerStack from the Parse JSON action…

Next, add the following text to the Value textbox, taking care not to delete what you just added in the previous step.

&docid=

Now we come to another array that needs to be handled. This is the URL of the document we are dealing with. Click the Expression tab and type in the following:

first(body('Parse_JSON')?['ListData']?['Row'])?['.spItemUrl']

 

Okay we are almost done… Add an ampersand ( & ) to the Value textbox, and then click the Dynamic content panel and find .driveAccessToken from the Parse JSON action…

Whew! We are done. I realise that was a bit of effort for one flow action, but it will all be worth it in the end. As a final step, rename this task to “Generate PDF URL”.

Step 8: Send the PDF URL back to PowerApps

Add the PowerAppsRespond to PowerApps action to the flow. Click the Add an output icon and choose Text from the list of output types. Name the output PDFURL and set the value to the variable you created in step 12 (PDFURL).

Ok we completed the flow. Save it and give it a quick review. It should look something like this…

If you have not done so already, save your Flow and give it a short, sharp name like GetMyPDF.

Building a sample PowerApp

Now let’s build a sample field worker iPad app. This app will allow a field worker to choose an equipment type, and then based on the equipment type, choose the available document types for it. While we will not win any design awards for this proof of concept, the added-bonus is you can learn how to do cascading dropdowns using SharePoint metadata.

The concept is shown below. First our trusty technician picks the equipment they are dealing with (step 1), and then they further refine by document type (step 2). Since the dropdowns cascade, only document types that exist for a chosen piece of equipment will be selectable.

This in turn refines a list of matching documents, irrespective of which folder they actually live in. From there the user clicks the “go” button (step 3) to display the PDF version (step 4).

To create the app, in the PowerApps portal, make a canvas app from blank and choose the Tablet form factor…

Next, let’s connect the app to the SharePoint document library we used at the start of this article. From the View menu, choose Data Sources and click the Add data source button. Make or use a SharePoint connection and choose the site collection where you uploaded the documents.

Click the Go button and manually type in the name of your document library and click Connect (at the time of writing, you still have to link to SharePoint document libraries by typing in their name).

From the Insert menu, choose Gallery and add a Blank vertical gallery. Set the data source to the connection you just made…

Click on the pencil in the gallery, and then from the Insert menu, choose Label. Click on this label and in the Text property, change it to ThisItem.'{FilenameWithExtension}'. You should now see the files and folders from the document library listed.

Note: {FilenameWithExtension} is one of a number of properties of a SharePoint library that PowerApps provides. We will soon use another built-in one as well…

Now let’s make use of the SharePoint columns we created at the start of this article. Recall that I made a column called DocumentType and one called Equipment. We are going to use these to filter this gallery so that only certain files are shown. After all, our field workers want to quickly get to the documents they need…

Click outside of the gallery, and then from the Insert menu, choose Controls and then pick Dropdown from the list

Set the Items property for this dropdown to Distinct(Documents, Equipment). If you then hold the Alt key, you will be able to click the dropdown and see the equipment. This list has been built from the SharePoint document library. The distinct function returns all unique values for the specified column (Equipment) without hardcoding the values.

Note”: We are seeing a blank value in the dropdown, because of the folder I created in this library which has not been tagged to a device or document type. But the empty value is seen as something unique by the Distinct function. If you want to make it go away, try this alternative setting for the Items property. This will remove the empty value…

Filter(Distinct(Documents, Equipment), !IsBlank(Result))
 

Insert another dropdown and place it under the first one. Set the Items property for this one to

Distinct(Filter(Documents, Equipment = Dropdown1.Selected.Value ), DocumentType )

What this does is filter the document library to only items that match the currently selected equipment. Then it uses distinct to get all the unique document types for that equipment. This pretty much means you now have a cascading set of dropdowns. The values in the second dropdown are based on what was selected in the first dropdown.

Now let’s return to our gallery and modify it so that it only shows items based on the dropdowns. In the Items property of the gallery, change it to:

Filter(Documents, Equipment=Dropdown1.Selected.Value, DocumentType=Dropdown2.Selected.Value)

Note that depending on the name of your dropdown controls, you might need to modify the text above.

Test by changing the values of the dropdowns and confirm that different documents are listed in the gallery.

Next let’s add a button to the gallery so a user can load the PDF. Select your gallery and click the pencil icon. From the Insert menu, add a Button and set the label to “go”. Position it to the right of the file name. Also double check that you placed the button inside the gallery and not on the screen by confirming the button sits as a child control of the gallery in the left hand navigation.

Click on the newly minted button and let’s now call our PDF generation flow. From the Action menu, click on Flows and in the data panel, find your Flow.

If this worked, the flow will be asking you for the two “Ask in PowerApps” parameters that you set up earlier. For the first parameter, enter Gallery1.Selected.ID and for the second, enter Gallery1.Selected.'{Path}'. Your formulae should look something like this…

E.g.: GetMyPDF.Run(Gallery1.Selected.ID, Gallery1.Selected.'{Path}')

Note: I am assuming your Gallery is called Gallery1.

We are not quite done though, because we need to capture the output of running this flow as a variable. This will give us the PDF URL. To achieve this, modify your expression so that the flow is inside the Set function …

Set(MyPDF, GetMyPDF.Run(Gallery1.Selected.ID, Gallery1.Selected.'{Path}') )

 

If you want to test things at this point, simply preview the app and press the button. If flow has worked, you can see the variable via the View menu so see if a URL has been created.

Finally, let’s display the PDF.

From the Insert menu, choose Controls and add a PDF Viewer control. Place it to the right of the gallery and set the Document property to MyPDF.pdfurl as shown below:

If all things go to plan, you should be viewing PDF’s. Use the dropdowns to find a different document and click the button to see the PDF. It may take a few seconds to load the first time, particularly if the original document is large, but it should happily display.

Try some different document types… For example, below is a sample excel document that I loaded into the library…

Taking it further (and caveats)

In the real world, I have used this technique to deliver documents to field staff with much more sophisticated navigation and searching capability. It has been used on large construction sites to share safety bulletins and it has been integrated with more extensive PowerApps solutions for managing callouts/jobs and managing assets. Field staff were stoked that they were able to quickly call up and view key information without having to navigate through a file share or SharePoint.

By the way, this method can also can be used to generate image thumbnails too, making it excellent for apps that are photo heavy and bandwidth sensitive. I even utilised this the thumbnail variation of this technique to add photos to activity feeds in apps and have even used to create rich action cards in Microsoft Teams.

Another use for this approach is its utility as a free PDF conversion tool. Paul O'Flaherty pointed this out to me that is more elegant than the commonly used OneDrive Flow action that many people use. Using a variation of the flow I outlined in this article, we can save a html file directly to a SharePoint library, and then use that file’s ID and Path to get the PDF URL of it. Finally, we can use the flow HTTP action to get a hold of that PDF and save it into SharePoint. Neat eh? We don’t need to use OneDrive connector anymore.

So what’s the catch? The main one is the common caveat – potentially lots of flow runs. Remember that with this method, each time a user clicks a button, a flow run is used to generate the PDF URL. One can improve this by leveraging the fact that we can call the RenderListDataAsStream API and send it multiple documents to convert to PDF. Thus, a single flow run can actually generate a lot of PDF (and thumbnail) URL’s.

Finally, Ashlee and I recorded a video of this technique so you can follow along with that over at YouTube.

Phew! We are finally done. Thank for sticking with me and please let us know your feedback and how you plan to utilise this approach. This feedback often gives me ideas and new directions to explore.

Till next time

Paul Culmsee

Company: www.sevensigma.com.au

Books: www.hereticsguidebooks.com

Blog: www.cleverworkarounds.com