I released a number of changes to my projects workflows, making sure I had released them as a new dll version in the GAC. The workflows use a DelayActivity to “sleep” for about four hours, before waking and checking if the due dates of the tasks involved are out of date. If so, then an email is sent to the task owner. After I released the new workflow version, new workflows instances would start fine, but after the four hour delay, an ominous error was found in the Workflow History and now email escalations occurred:
“Workflow Failed to run”.
Checking the ULS log, there was also the following error:
“OWSTIMER.EXE (0x07EC) 0x0E58 Windows SharePoint Services Workflow Infrastructure 98d8 Unexpected System.NullReferenceException: Object reference not set to an instance of an object. at Microsoft.SharePoint.Workflow.SPWorkflowManager.RunWorkflowElev(SPWorkflow originalWorkflow, SPWorkflow workflow, Collection`1 events, SPRunWorkflowOptions runOptions)”
A quick Google search brought up this old Microsoft Blog post where a number of users where experiencing the same problems. Although the bug has been fixed in later hotfixes and service packs (which my system already had), one of the causes stood out as the reason for my problem. That is; when releasing new versions of code into the GAC, the timer service is still using old versions which are non existent and so it falls over.
One recommendation to prevent this problem is to restart the Timer service, which is fine, but it doesn’t fix the workflows already running which are experiencing the problem. However, reading through the blog someone mentioned being able to view the status of the timer jobs in the content database using the following SQL command:
SELECT * FROM ScheduledWorkItems Where id = ’20EF2AC4-64D1-47E4-91C1-4E5F4BFC7CD4′ – where this is the problem workflow ID.
When you run this command, you will notice the DeliveryDate. This is the time the system will wake up the workflow. There is also the column “InternalState” which for problem workflows seemed to always be set to “9″ and for ok workflows set to “8″.
Using a broken test workflow, I did some hacking around, and changed the internal state value of the problem workflow to 8, and set the DeliveryDate value to sometime shortly in the future. I also, noted that the time originally in this field was 1 hour behind what I expected, and assume this is because I am running under GMT +1 timezone. Therefore I set the DeliveryDate to be one hour behind the time in the future, or else I would be waiting another hour longer than I expected! Hope that makes sense!
I then restarted the Timer service on the SharePoint server and low and behold, the problem workflow shortly “woke up” and did its job as expected when the DeliveryDate time had passed.
I changed all the other problem workflows in the same way and they all came back to life with no problems.
I have now made sure I have amended my deployment procedures of new workflow versions to restart the timer service so this problem shouldn’t happen again.