PyTorch MPS slow. 79 GB, other allocations: 388.