Fix race condition (COMPLETED/BOT_DIED) in task_runner.

Per b/69462084 and a TODO comment in this file, there is a race
condition when an external process polls the status of a task that
fails with BOT_DIED: for a short window the task is marked COMPLETED
before it is updated to BOT_DIED. This change sends the task update
without the exit code (so that the COMPLETED status is never set),
and then sends a task_error message. The default value of
must_signal_internal_failure is also changed so that a default error
is reported in case run_isolated fails.
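
The "default error, cleared on success" handling of internal_failure
described above can be sketched roughly as follows. This is only an
illustration, not the code in this change: run_and_report and sink
are hypothetical names, and the real result dict has more fields.

```python
def run_and_report(command, sink):
    """Runs `command` (a callable returning an exit code) and publishes a result.

    Hypothetical helper for illustration only; `sink` is any list that
    stands in for wherever the result would really be written.
    """
    # Start pessimistic: if we never reach the line that clears it, the
    # published result still reports an internal failure.
    result = {
        'exit_code': None,
        'internal_failure': 'run_isolated did not complete properly',
    }
    try:
        result['exit_code'] = command()
        # The command ran to completion. Even with a non-zero exit code,
        # that is a task failure, not an internal error.
        result['internal_failure'] = None
    except Exception as e:
        # An internal error occurred; report it explicitly.
        result['internal_failure'] = str(e)
    finally:
        # Publish even when something other than Exception (for example
        # SystemExit) unwinds the stack; the default message survives.
        sink.append(result)
    return result
```

With this shape, a caller that only looks at internal_failure can tell
"the command failed" apart from "the runner itself did not finish".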

TESTED=Ran unit tests in task_runner_test and they pass.

Any other tests?

Change-Id: I1db3e15ccbdac3da9181b273681180211b07841c
Reviewed-on: https://chromium-review.googlesource.com/834397
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Cr-Mirrored-From: https://chromium.googlesource.com/infra/luci/luci-py
Cr-Mirrored-Commit: 0b027452e658080df1f174c403946914443d2aa6
diff --git a/run_isolated.py b/run_isolated.py
index 0cdf3a6..96c2b0e 100755
--- a/run_isolated.py
+++ b/run_isolated.py
@@ -583,7 +583,7 @@
     'duration': None,
     'exit_code': None,
     'had_hard_timeout': False,
-    'internal_failure': None,
+    'internal_failure': 'run_isolated did not complete properly',
     'stats': {
     # 'isolated': {
     #    'cipd': {
@@ -704,6 +704,10 @@
                 command, cwd, env, data.hard_timeout, data.grace_period)
         finally:
           result['duration'] = max(time.time() - start, 0)
+
+    # We successfully ran the command, set internal_failure back to
+    # None (even if the command failed, it's not an internal error).
+    result['internal_failure'] = None
   except Exception as e:
     # An internal error occurred. Report accordingly so the swarming task will
     # be retried automatically.