author     Jack Zhang <Jack.Zhang1@amd.com>  2021-05-12 15:06:35 +0800
committer  Alex Deucher <alexander.deucher@amd.com>  2021-08-16 15:16:58 -0400
commit     c530b02f39850a639b72d01ebbf7e5d745c60831
tree       274c664d421fc31914f5b6a241094cd29231e011 /drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
parent     554594567b1fa3da74f88ec7b2dc83d000c58e98
drm/amd/amdgpu: embed hw_fence into amdgpu_job
Why: Previously, the hw fence was allocated separately from the job.
This caused long-standing lifetime issues and corner cases.
Ideally, the fence should manage both the job's lifetime and its own,
which also simplifies the design of the GPU scheduler.
How:
We propose to embed the hw_fence into amdgpu_job (see the sketch
after this list).
1. Normal job submission is covered by this method.
2. For ib_test, and for submissions without a parent job, keep the
   legacy way of creating a hw fence separately.
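For illustration, a minimal sketch of the two paths. This is simplified
from the patch, not the exact hunks; field and helper names follow the
driver:

    /* amdgpu_job now carries the hw fence by value: */
    struct amdgpu_job {
        struct drm_sched_job    base;
        /* ... */
        struct dma_fence        hw_fence;   /* embedded hw fence */
        /* ... */
    };

    /* In amdgpu_fence_emit(), use the embedded fence when a parent job
     * exists; otherwise keep the legacy separate allocation:
     */
    if (job == NULL) {
        /* legacy path: ib_test and submissions without a parent job */
        am_fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
        if (!am_fence)
            return -ENOMEM;
        fence = &am_fence->base;
        am_fence->ring = ring;
    } else {
        fence = &job->hw_fence;
    }

With the fence embedded, the final dma_fence_put() on the hw fence is
what ultimately frees the job, tying the two lifetimes together.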
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to indicate that the fence is
embedded in a job.
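A rough sketch of how the bit is used on both sides (simplified from
the patch):

    /* producer, in amdgpu_fence_emit(): tag job-embedded fences */
    if (job != NULL)
        set_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, &fence->flags);

    /* consumer: recover the owner of an arbitrary hw fence */
    if (test_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, &f->flags))
        job = container_of(f, struct amdgpu_job, hw_fence);
    else
        am_fence = to_amdgpu_fence(f);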
v3:
remove the redundant ring variable from amdgpu_job.
v4:
add TDR sequence support for this feature. Add a job_run_counter to
indicate whether the job is a resubmitted job.
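A sketch of the resubmit handling (simplified): the scheduler bumps the
counter on every run, and the emit path avoids re-initializing an
already-live embedded fence on resubmission:

    /* in amdgpu_job_run(): count how many times this job has run */
    job->job_run_counter++;

    /* in amdgpu_fence_emit(): a resubmitted job (TDR) already has an
     * initialized embedded fence, so only refresh its sequence number
     */
    seq = ++ring->fence_drv.sync_seq;
    if (job && job->job_run_counter) {
        fence->seqno = seq;
    } else {
        dma_fence_init(fence, &amdgpu_fence_ops,
                       &ring->fence_drv.lock,
                       adev->fence_context + ring->idx, seq);
    }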
v5:
add missing handling in amdgpu_fence_enable_signaling
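The gap here was that amdgpu_fence_enable_signaling can no longer
assume every hw fence is a separately allocated amdgpu_fence with its
own ring pointer; for job-embedded fences the ring has to come from the
job's scheduler. A sketch of the fixed lookup (simplified):

    static bool amdgpu_fence_enable_signaling(struct dma_fence *f)
    {
        struct amdgpu_ring *ring;

        if (!test_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, &f->flags)) {
            ring = to_amdgpu_fence(f)->ring;
        } else {
            struct amdgpu_job *job;

            job = container_of(f, struct amdgpu_job, hw_fence);
            ring = to_amdgpu_ring(job->base.sched);
        }

        if (!timer_pending(&ring->fence_drv.fallback_timer))
            amdgpu_fence_schedule_fallback(ring);

        return true;
    }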
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.c')
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 65ccd1144863..395779ee44d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4449,7 +4449,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
 int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 				 struct amdgpu_reset_context *reset_context)
 {
-	int i, r = 0;
+	int i, j, r = 0;
 	struct amdgpu_job *job = NULL;
 	bool need_full_reset =
 		test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
@@ -4473,6 +4473,17 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 		if (!ring || !ring->sched.thread)
 			continue;
 
+		/* clear job fences from fence_drv to avoid force_completion
+		 * leaving NULL and vm flush fences in fence_drv */
+		for (j = 0; j <= ring->fence_drv.num_fences_mask; j++) {
+			struct dma_fence *old, **ptr;
+
+			ptr = &ring->fence_drv.fences[j];
+			old = rcu_dereference_protected(*ptr, 1);
+			if (old && test_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, &old->flags)) {
+				RCU_INIT_POINTER(*ptr, NULL);
+			}
+		}
 		/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
 		amdgpu_fence_driver_force_completion(ring);
 	}
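For context, force_completion writes the ring's latest sync_seq as the
completed value and then processes fences, signaling everything still
in the ring buffer; roughly:

    void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring)
    {
        amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
        amdgpu_fence_process(ring);
    }

Clearing the job-embedded fences beforehand leaves only the separately
allocated fences (e.g. VM flush fences) for force_completion to signal,
presumably so that fences owned by jobs awaiting resubmission are not
signaled and dropped prematurely.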