AMDGPU: Avoid most waitcnts before calls

Currently you get extra waits before a call: waits are inserted for the
register dependencies of the call, and then the callee's function prolog
waits on everything anyway.

Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.
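
The redundancy described above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not the code of this change: the
names `Kind`, `Inst`, `countWaits` and `exampleProgram` are hypothetical,
every wait is assumed to drain all outstanding loads, and the waits that
must remain before a call (the title says "most", not "all") are not
modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; hypothetical, not LLVM's MachineInstr.
enum class Kind { Load, Use, Call, Return };

struct Inst {
  Kind kind;
  int reg; // register written by Load, read by Use/Call; ignored for Return
};

constexpr std::size_t NumRegs = 8;

// Counts the waits executed for a straight-line toy program, including the
// wait in the callee's prolog. Simplification (assumption): every wait
// drains all outstanding loads.
int countWaits(const std::vector<Inst> &insts, bool skipWaitsBeforeCalls) {
  std::vector<bool> pending(NumRegs, false); // registers with a load in flight
  int waits = 0;
  auto drain = [&] {
    ++waits;
    pending.assign(NumRegs, false);
  };
  auto anyPending = [&] {
    for (bool p : pending)
      if (p)
        return true;
    return false;
  };

  for (const Inst &inst : insts) {
    if (inst.kind != Kind::Return)
      assert(inst.reg >= 0 && static_cast<std::size_t>(inst.reg) < NumRegs);

    switch (inst.kind) {
    case Kind::Load:
      pending[inst.reg] = true;
      break;
    case Kind::Use:
      if (pending[inst.reg])
        drain();
      break;
    case Kind::Call:
      // Old behavior: wait for the call's register dependency first.
      if (!skipWaitsBeforeCalls && pending[inst.reg])
        drain();
      // Either way, the callee's prolog waits on everything.
      drain();
      break;
    case Kind::Return:
      // Waits are still inserted on returns.
      if (anyPending())
        drain();
      break;
    }
  }
  return waits;
}

// A small program: load r0, call with r0, load r1, return.
std::vector<Inst> exampleProgram() {
  return {{Kind::Load, 0}, {Kind::Call, 0}, {Kind::Load, 1}, {Kind::Return, -1}};
}
```

In this model, `countWaits(exampleProgram(), false)` counts one wait more
than `countWaits(exampleProgram(), true)`: the wait before the call is
subsumed by the one in the callee's prolog.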

llvm-svn: 363465
3 files changed
tree: a8e610d7ea94ae9da85fc9d43609ba037ab9f202
README.md

The LLVM Compiler Infrastructure

This directory and its subdirectories contain source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and runtime environments.