Troubleshooting Bevel
Like any software, Bevel may not work correctly the first time; this guide helps you troubleshoot some of the common problems faced when using Bevel for the first time.
Tip
When deploying a network using Bevel, it is common for various Kubernetes pods and jobs to undergo an initialization phase, where they take some time to start. Bevel's automation, powered by Ansible, waits for these components to reach a state of "Running" or "Succeeded" before proceeding with subsequent deployment steps. During this process, you might encounter messages like "FAILED - RETRYING: ...". This is not a bug.
Troubleshooting in Bevel involves understanding and managing the retry mechanism embedded in the Ansible components. Each component has a retry count, which can be configured to suit the specific requirements of your network in the configuration file network.yaml. Typically, when everything is operating as expected, the components transition to the desired state within 10-15 retries.
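For illustration, in Bevel's sample configurations the retry count typically lives in the env section of network.yaml (the exact field location may vary between releases):

```yaml
network:
  env:
    retry_count: 100   # maximum number of Ansible retries before a task is marked failed
```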
To facilitate efficient troubleshooting, it's advisable to monitor the components during the retry attempts.
Example
From a new terminal window, watch the pods and jobs while the playbook is running. If any pod or job shows an error or keeps restarting, check its logs.
This proactive approach allows you to inspect the status of the components before the maximum retry count is reached, potentially avoiding unnecessary wait times until an error or failure message appears in the Ansible logs. By leveraging this understanding of the retry mechanism, you can streamline the troubleshooting process and ensure a smoother deployment experience with Bevel.
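A minimal monitoring sketch (pod and namespace names are placeholders to substitute with your own):

```shell
# List all pods and jobs across namespaces; re-run (or use watch) while Ansible retries
kubectl get pods,jobs --all-namespaces

# Follow the logs of any pod that errors or keeps restarting
kubectl logs -f <pod-name> -n <namespace>

# For pods stuck in an Init state, describe the pod to see init-container status
kubectl describe pod <pod-name> -n <namespace>
```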
Important
Before starting to troubleshoot, check that you are using supported versions of all software/tools.
Flux Errors
invalid apiVersion "client.authentication.io/v1xxxx
- Check that you have compatible Kubernetes and kubectl versions, and the correct kubeconfig file.
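A quick sketch for checking version compatibility and the active kubeconfig (standard kubectl commands):

```shell
# Compare client and server versions; a large skew can trigger apiVersion errors
kubectl version

# Confirm which kubeconfig and context the automation will pick up
echo $KUBECONFIG
kubectl config current-context
```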
Unable to mount config-map git-auth-xxxx
- The Gitops private key file path is wrong or the file is unreadable by the Ansible controller. Check that the gitops.private_key value in network.yaml is an absolute path and that the file is readable by the Ansible controller.
Unable to clone repository
- Correct permissions have not been given to the gitops public key or your access token. Check that the correct permissions are given as per this tutorial. Also, check that gitops.git_url is either the correct SSH (ssh://git@github.com/<username>/bevel.git) or HTTPS (https://github.com/<username>/bevel.git) clone address of the git repository. It may be that SSH is blocked on your Kubernetes cluster, in which case use HTTPS instead.
No such file or directory
- Files are not getting committed to the git repo from the Ansible controller. Check that there are no errors when you run git commit manually from the bevel directory on your Ansible controller or the docker container. Also, check that the git branch is the same as the one configured in network.yaml.
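The checks above can be sketched from the Ansible controller as follows (the key path, repository URL and branch are placeholders for your own values):

```shell
# Verify the private key file is readable by the controller
test -r /absolute/path/to/gitops_private_key && echo "key readable"

# Verify the clone address and credentials work at all
GIT_SSH_COMMAND="ssh -i /absolute/path/to/gitops_private_key" \
  git ls-remote ssh://git@github.com/<username>/bevel.git

# From the bevel directory, confirm commits and pushes succeed on the configured branch
cd bevel
git checkout <branch-from-network.yaml>
git commit --allow-empty -m "gitops connectivity test"
git push origin <branch-from-network.yaml>
```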
Hyperledger Fabric Errors
CA pod is in Init:Crashloopbackoff state
- Issue with Vault connectivity. If the pod ca-xxxx-xxx has status Init:Crashloopbackoff, check the logs of the init container ca-certs-init of this pod. This can be checked using the command kubectl logs ca-xxxx-xxx -c ca-certs-init -n supplychain-net. Ensure the Vault service is running and the correct Vault address was used (do not use 127.0.0.1 or localhost).
- Issue with Vault authentication. If the logs mention "access denied", make sure that the xxx-vaultk8s-job-yyyy and xxx-cacerts-job-yyyy jobs have completed successfully. Vault authentication problems are usually caused by running different network.yaml configurations against the same Vault. Please ensure that you reset the network before re-running with a different network.yaml.
pod has unbound immediate PersistentVolumeClaims
- Storageclass was not created or is not working. Check that you haven't modified any storage class templates, then check network.organization.cloud_provider for an incorrect cloud provider. If you have modified the storage class, please make sure that it works with the cloud provider mentioned under network.organization.cloud_provider.
Create Channel pod is in Crashloopbackoff
- Non-accessibility of proxy URL(s). Check the logs of the pod createchannel-CHANNEL_NAME-xxx. This can be checked using the command kubectl logs createchannel-CHANNEL_NAME-xxx -n ORG_NAME-net. If the logs end with Error: failed to create deliver client: orderer client failed to connect to ORDERER_NAME.EXTERNAL_URL_SUFFIX:443: failed to create new connection: context deadline exceeded, check that the external URL suffix is available and accessible through the security groups of the VPC.
Ansible playbook failed but no createchannel/joinchannel pod is visible
- The job failed more than 6 times due to an error. All jobs in Bevel disappear if they fail 6 times. To re-run the jobs, follow this:

```shell
# Get all the helmreleases for that namespace
kubectl get hr -n ORG_NAME-net
# Then delete the corresponding helmrelease
kubectl delete hr createchannel-xxx -n ORG_NAME-net
```

and then wait for the pod createchannel-CHANNEL_NAME-xxx to come up. Now, you can check the failure logs and debug.
chaincode-install-xxxx-yyyy pod is in Crashloopbackoff
- Check the logs as per above, and check whether the chaincode git credentials are wrong or absent. Check the git credentials under network.organization.services.peer.chaincode.repository for possible incorrect credentials.
genesis block file not found open allchannel.block: no such file or directory
- The orderer certificates aren't provided, are non-accessible, or are incorrect. This error occurs when the orderer certificate mentioned in the orderer block network.orderers[*].certificate is invalid, the path is not readable, or it contains the wrong tls certificate of the orderer. Fix the errors, then reset and re-run the playbook.
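Several of the errors above reduce to Vault reachability from inside the cluster. One way to check is to hit Vault's HTTP health endpoint from a throwaway pod (the image, namespace and Vault address are illustrative):

```shell
kubectl run vault-check --rm -it --restart=Never --image=curlimages/curl \
  -n supplychain-net -- curl -s http://<vault-address>:8200/v1/sys/health
# A JSON response with "sealed": false indicates Vault is up and unsealed;
# a timeout indicates a connectivity problem between the cluster and Vault
```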
Hyperledger Indy Errors
Ansible vars or dict object not found, domain genesis was not created
- network.yaml not properly configured. Check that organisation.service.trustees, organisation.service.stewards and organisation.service.endorsers are properly configured for the failing organisation in your network.yaml. Refer to the Indy config file for more details.
Vault Access denied, Root Token invalid, Vault Sealed
- If the logs mention "access denied", make sure that the Vault authentications were created correctly by checking all the tabs on the Vault UI. Vault authentication problems are usually caused by running different network.yaml configurations against the same Vault. Please ensure that you reset the network before re-running with a different network.yaml.
Ansible vars or dict object not found, pool genesis was not created
- network.yaml not properly configured. Check that organisation.service.trustees, organisation.service.stewards and organisation.service.endorsers are properly configured for the failing organisation in your network.yaml. Refer to the Indy config file for more details.
nodes cannot connect with each other | Port/IP blocked from firewall
- You can check the logs of node pods using kubectl logs -f -n university university-university-steward-1-node-0. Properly configure the required outbound and inbound rules in the firewall settings for the Ambassador Pod. For example, if you are using AWS, the firewall setting for the Ambassador Pod will be the K8s cluster's worker-sg Security Group.
Not able to connect to the indy pool
- The Ambassador IP does not match the PublicIps provided in network.yaml. Check the Ambassador host's IP using host <Ambassador Public URL> and verify that it is present in the PublicIps: section of your network.yaml. The Port/IP may also be blocked by the firewall; properly configure the required outbound and inbound rules in the firewall settings for the Ambassador Pod. For example, if you are using AWS, the firewall setting for the Ambassador Pod will be the K8s cluster's worker-sg Security Group.
Resource Temporarily Unavailable
- Insufficient memory leads to RocksDB getting locked. The steward node pods are not getting sufficient memory to start the RocksDB service, hence the DB gets locked. The recommendation is to either scale up the k8s nodes or increase the memory of the existing k8s nodes.
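To compare the Ambassador address against network.yaml and verify that the node ports are open, a sketch (the hostname and port are placeholders; the actual node ports depend on your configuration):

```shell
# Resolve the Ambassador public URL; the IP must appear under PublicIps in network.yaml
host <ambassador-public-url>

# Check that a steward node port is reachable through the firewall
nc -vz <ambassador-public-url> <node-port>
```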
R3 Corda Checks
Destination directory, example: /home/bevel/build/corda/doorman/tls, does not exist
- The folder to copy tls certs to does not exist. The network.network_services.certificate value is either misspelled or the directory doesn't exist.
Vault timeout or related error
- Vault was unavailable due to connection issues. Please verify all Vault configuration fields in the network.yaml. Additionally, check that the Vault service/instance is online and reachable from the Kubernetes cluster.
Doorman/NMS are unreachable HTTP error
- The URI/URL could be misconfigured in the network.yaml, or the services themselves are not running. Check the logs of the doorman/networkmap pod and debug.
AnsibleUndefinedVariable: 'dict object' has no attribute 'corda-X.X'
- The Corda version is not supported. The network.version value must be a supported Corda version.
Notary DB Failed
- Notary registration has not happened properly or storing the Notary certificates failed. Check the notary registration container logs with kubectl logs <podname> -n <namespace> -c notary-initial-registration. Check the Vault path '/credentials' for nodekeystore, sslkeystore and truststore certificates, or check for errors in the logs of the store-certs container of the notary-registration job with kubectl logs <podname> -n <namespace> -c store-certs.
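Before digging into pod logs, it can help to rule out plain network reachability of the Doorman and NetworkMap endpoints (the URLs are whatever you configured in network.yaml):

```shell
# Any HTTP status line (even an error code) proves the service is reachable;
# a timeout points to DNS, proxy or firewall problems instead
curl -kIs https://<doorman-url> | head -n 1
curl -kIs https://<networkmap-url> | head -n 1
```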
Quorum Errors
Destination directory does not exist
- The build directory does not exist or is not accessible. Ensure the network.config.genesis and network.config.staticnodes folder paths are accessible (read and write) by the Ansible controller and are absolute paths.
Error enabling kubernetes auth: Error making API request
- Vault authentication issue. Ensure the Vault credentials are properly mentioned in the network.yaml file.
Retries exhausted/ pods stuck
- Issue with Vault connectivity. If the pod PEER_NAME-0 has the status Init:Crashloopbackoff, check the logs of the init container certificates-init of this pod using the command kubectl logs PEER_NAME-0 -n ORG_NAME-quo -c certificates-init. If the logs mention non-accessibility of the Vault, make sure that Vault is up and running and is accessible from the cluster.
Could not locate file in lookup: network.config.genesis
- Genesis file not present at the location or not added in the configuration file. Ensure the path of the genesis file of the existing network is correct, accessible (read and write) by the Ansible controller, and an absolute path.
Could not locate file in lookup: network.config.staticnodes
- Staticnodes file not present at the location or not added in the configuration file. Ensure the path of the staticnodes file of the existing network is correct, accessible (read and write) by the Ansible controller, and an absolute path.
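The two file-lookup errors above can be caught early with a pre-flight check on the Ansible controller; a sketch, with placeholder paths:

```shell
# Both paths must be absolute, and readable and writable by the Ansible controller
for f in /absolute/path/to/genesis.json /absolute/path/to/static-nodes.json; do
  if [ -r "$f" ] && [ -w "$f" ]; then
    echo "$f OK"
  else
    echo "$f NOT accessible"
  fi
done
```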