Common Problems

  Q: Why can’t I connect to the system after logging in to the VPN?

  A: Please confirm that no error message appeared during the login process and that the "SSL VPN Client" control in the lower-right corner of the desktop is open.

  Q: Why do I get the message "you have reached the maximum account" when logging in to the VPN?

  A: Each VPN account has a maximum number of simultaneous connections. Make sure the account is not already logged in from the maximum number of terminals at the same time. Before leaving, always click the "Exit" button first and then close the page; otherwise the session stays active until it times out, and it keeps occupying one of the available connections in the meantime.

  Q: Can users change the VPN password?

  A: At present, users cannot change their VPN password themselves. If you really need to change it, please contact us, and make sure the new password is sufficiently complex and secure.

  Q: Where can I find MPI on the TH-1A system?

  A: The system's MPI is installed under /usr/local on the login nodes. Several versions are available, so you can select the one you need.

  Q: Can users download and use their own MPI on TH-1A?

  A: Yes, provided that it supports the IP network protocol.

  Q: What is the difference between TH-1A’s MPI and Intel MPI, MPICH2, etc.?

  A: TH-1A relies on its own high-speed interconnect system, which offers high communication efficiency and ultra-low latency and fully supports parallel communication between tasks in the system. The system’s MPI was developed specifically for this interconnect, so using it delivers the best performance. Because the high-speed interconnect is also compatible with the IP protocol, you can download and use other MPI implementations, but they will suffer a considerable loss in performance. We therefore recommend that users use the system’s MPI.

  Q: What is the difference between submitting jobs with yhrun, yhbatch and yhalloc?

  A: On the Tianhe system, running a job is divided into two steps: resource allocation and task loading.

         The yhrun command is interactive: it performs both resource allocation and task loading. When you run yhrun in a login shell, it first submits a job request to the system and waits for resources to be allocated, then loads the tasks onto the assigned nodes. The job exits when the login shell disconnects.

         The yhbatch command runs jobs in batch mode. It handles resource allocation and, once the nodes have been allocated, runs the submitted script on the first allocated node. The job keeps running after the login shell disconnects.

         The yhalloc command is the allocation mode. The main difference from yhbatch is that, once resources are allocated, yhalloc lets you run jobs directly from the node where the command was submitted. It is suited to working with specified nodes or running special commands. The job exits and the allocated resources are lost when the login shell disconnects.

         You can find more information about these three commands in the user manual. We recommend yhbatch, because both resource allocation and job execution are independent of the login shell.

  Q: Why did my job exit automatically after running for two days, before it had finished?

  A: TH-1A is a very large system, and many factors, such as network or storage problems, can affect your job. We have therefore set a limit on the continuous running time of jobs in each partition. A job exits automatically when the time limit is reached, to avoid greater losses for the user. You can add checkpointing to your program so that an interrupted job can be resumed.
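  A minimal checkpoint/restart sketch in Python, assuming the program's state fits in a small dictionary and that the checkpoint file (the name checkpoint.json is hypothetical) is on storage that survives the job. The state is saved after every step with an atomic rename, and on startup the program resumes from the last saved step instead of starting over.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical file name; place it on shared storage


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write the state to a temp file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    # os.replace is atomic on POSIX, so a job killed mid-write
    # never leaves a half-written checkpoint behind.
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT, stop_after=None):
    """Hypothetical workload: sum 0..n_steps-1, checkpointing after each step.

    stop_after simulates the job being killed when the time limit is reached.
    Returns the final total, or None if the run was interrupted.
    """
    state = load_state(path)
    done_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # interrupted; progress so far is in the checkpoint
        state["total"] += state["step"]  # the "real" work for this step
        state["step"] += 1
        save_state(state, path)
        done_this_run += 1
    return state["total"]
```

  For example, run(10, stop_after=3) stops after three steps; calling run(10) again picks up at step 3 and returns 45, the same result as an uninterrupted run. In a real program, checkpointing every few minutes rather than every step keeps the I/O overhead low.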

  Q: What should I do if I encounter an error while running a job?

  A: For common errors such as "No enough the endpoint resources" or "Job credential expired", find the faulty node in the log, exclude it with the "-x node" option, and submit the job again. For example, "-x cn1" means the cn1 node is not included when resources are requested. We also ask that you contact us as soon as you encounter such an error, preferably providing the error log, so that we can analyze and resolve the problem efficiently.
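  The "find the faulty node, exclude it, resubmit" procedure can be sketched in Python as below. This is only an illustration under stated assumptions: the log layout is hypothetical, node names are assumed to look like "cn1" (as in the example above), and -x is assumed to accept a comma-separated node list the way Slurm's sbatch --exclude does. Please check the TH-1A user manual before relying on it.

```python
import re
import shlex

# Error messages quoted in the FAQ answer above.
KNOWN_ERRORS = ("No enough the endpoint resources", "Job credential expired")

# Assumption: compute nodes are named like "cn1" or "cn1024".
NODE_RE = re.compile(r"\bcn\d+\b")


def find_bad_nodes(log_text):
    """Collect node names that appear on log lines containing a known error.

    The log format is hypothetical; adapt the matching to your real logs.
    """
    bad = set()
    for line in log_text.splitlines():
        if any(msg in line for msg in KNOWN_ERRORS):
            bad.update(NODE_RE.findall(line))
    # Sort numerically so that cn2 comes before cn10.
    return sorted(bad, key=lambda name: int(name[2:]))


def resubmit_command(script, bad_nodes):
    """Build a yhbatch command line that excludes the given nodes.

    Assumes -x takes a comma-separated list, as in Slurm's sbatch.
    """
    cmd = ["yhbatch"]
    if bad_nodes:
        cmd += ["-x", ",".join(bad_nodes)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    sample_log = (
        "cn10: Job credential expired\n"
        "cn2: No enough the endpoint resources\n"
        "cn5: step finished normally\n"
    )
    bad = find_bad_nodes(sample_log)
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(resubmit_command("job.sh", bad)))
```

  With the sample log, this prints yhbatch -x cn2,cn10 job.sh. Printing rather than executing the command keeps a person in the loop, which matters because the same node may be failing for a reason the support team should hear about.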

