How We Migrated RCSB.org at the San Diego Supercomputer Center to Kubernetes: Lessons Learned

by Igor Khokhriakov aka Ingvord
for Kubernetes Community Day, SCaLE 20x, March 9-12 '23, Pasadena, CA

A little bit about me
- Well-seasoned, CEO-minded Software Developer/Architect with 15+ years of experience;
- Invited to the SDSC RCSB PDB group to design and implement a K8s platform and a distributed reactive backend for their search API;
- Previously worked as a Scientific Software Developer/Architect at DESY, Hamburg, Germany and ESRF, Grenoble, France, with a main focus on designing SCADA and DCS systems as well as high-level metadata acquisition systems;
- Even before that, was a Full Stack Developer for a web-based trading/analytics system.

Research Collaboratory for Structural Bioinformatics PDB aka rcsb.org
- Feb 1st '23: ~3.5M requests daily

Lesson 1: careful design requires a lot (like A LOT) of learning/researching

Motivation:
- Retire legacy in-house solutions
- Move towards true CI/CD
- Easily set up development environments
- Perform experiments with new features

Things to consider:
- Migration process: A/B switch vs smooth migration
- Monitoring
- CI/CD
- Development: required knowledge base, tools/utilities, new mindset
- Infrastructure: storage, NFS -> CephFS*
- Technical challenges
- Choosing 3rd party tools/utilities/infrastructure

Concept maps help greatly!

Lesson 2: choose your provider carefully if you can
- Two main categories to choose from: on-premises vs cloud provider
- Doing cost estimations properly ain't an easy process: salaries + delays vs e.g. AWS
- Another important aspect: k8s differs from anything else we used before, so even experienced teams may have issues

In our case:
- We are bound to SDSC, which basically means an on-premises deployment, with its pros and cons
- K8s on bare metal took much longer
- The Calico setup ruined internal networking: the GitHub webhook could not communicate with deployments

Lesson 3: monitoring/controlling k8s resources
- IDEA and Octant
- Elastic

Lesson 4: troubleshooting issues -> learn kubectl
- kubectl is your friend
- know namespaces
- get events
- get logs

Lesson 5: choosing the right tools ain't an easy process but rewarding
- Harbor
- Ceph
- Skaffold: dev + devops
- Helm over Kustomize (IDEA support)

Lesson 6: try with some non-critical/non-continuous tasks like CI/CD
- We ended up testing our K8s cluster with our CI/CD
- Otherwise what works today may stop working tomorrow

Lesson 7: self-hosted GitHub/GitLab runners are great
- Reuse open-source Helm charts
- Allows quick learning in Helm, setting up secrets, etc.
- Unlimited possibilities in configuring/customizing runners

Lesson 8: be prepared to invest time/resources into internal learning
- Developers must know the basics
- Architects must leverage new design principles
- DevOps must be pretty advanced
- Using k8s implies architectural changes
- Using k8s implies understanding of its capabilities

Challenge 1: Multicluster vs cluster federation vs single cluster
- Multicluster

Challenge 2: Storage
- Ongoing effort

Challenge 3: Coastal distributed Docker registry
- Harbor; also tests storage, etc.

Challenge 4: Preserve existing log infrastructure
- Have a dedicated Elastic cluster

Challenge 5: Ingress routing
- Sub-domain aka host-based routing, e.g. arches.k8s.rcsb.org
- Alas, path-based routing, which does not require external DNS, ain't supported by our system

Conclusion
- Using carefully chosen 3rd party tools does make our transition less painful for sure.
- Even though there were some complications, we are very happy with all the new possibilities wide open to us.

Many thanks to
- Ryan; Danny
- Jeremy Henry; Henry Chao; Jose Duarte
- SDSC team
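Challenge 5 settles on host-based routing, with arches.k8s.rcsb.org as the example host. A minimal sketch of what such a rule could look like as a standard networking.k8s.io/v1 Ingress is below. The resource name, backend service name, and port are assumptions made for illustration; they are not taken from the RCSB setup, and a real cluster may also need an ingress class and TLS settings.

```shell
#!/bin/sh
# Sketch only: render a host-based Ingress manifest for a single service.
# The name, service, and port passed in are hypothetical placeholders.
render_ingress() {
  name="$1"   # Ingress and backend Service name (assumed to be the same)
  host="$2"   # sub-domain that routes to this service
  port="$3"   # Service port to forward to
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${name}
spec:
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${name}
                port:
                  number: ${port}
EOF
}

# Print the manifest for the example host from the slides.
render_ingress arches arches.k8s.rcsb.org 80
```

The output can be piped into `kubectl apply -f -` once reviewed. Each service gets its own host, which is why this style depends on DNS entries for every sub-domain.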
* not affiliated with any of them

Deployment issues
- Or how we learned the hard way that "deployments do not create pods, replicasets do"
- Yet another annoying thing: "too many open files"
- Calico and internal networking, as reported at the time:
  "Communication between services internally seems to be blocked. The new cluster looks like it's using a networking library called calico to handle network security, and there might be some ports through that library they'll need to open for our deployments to work. Those deployments are for integrating with Vault for secrets management and generating valid TLS certs, so they're definitely needed before we can get other things deployed."

Useful tip
Prepare an image with your favorite tools on board and do:
$ kubectl run -it networktest --image=my-favorite --restart=Never --rm -- /bin/ash

IDEA: service tab -> k8s -> configuration -> open in new tab -> switch namespaces -> follow log -> show yaml -> console -> port forwarding -> iterate through resources -> switch context

Octant: cluster overview -> namespaces -> nodes -> applications -> peak java -> namespace overview -> terminal -> log

Self-hosted runners: unlimited configuration possibilities!!!

RCSB.org in production:
- ~20 services
- ~80 instances per coast
- ~160 running production service instances in total
- ~1126 vCPUs
- 9.95703125 TB of memory

cert-manager*
[Diagram, borrowed from cert-manager/docs: Issuers (letsencrypt-staging, letsencrypt-prod, hashicorp-vault, venafi-as-a-service, venafi-tpp) plus cert-manager credentials -> cert-manager -> Certificates (foo.bar.com, Issuer: venafi-tpp; example.com and www.example.com, Issuer: letsencrypt-prod) -> Kubernetes Secrets (signed keypair)]
Makes complete end-to-end automation possible!
* not affiliated with any of them

Acknowledgments
- RCSB PDB Team: Jose M. Duarte, Henry Chao, Jeremy Henry
- SDSC Team: Alyssa, Colby, Gavin

Thanks for listening!
And now to the fun part... Questions and Answers, Comments...

My contacts:
- ingvord.ru
- ikhokhryakov
- ingvord
- igor.khokhriakov@rcsb.org
- ikhokhriakov@ucsd.edu
- ingvord.mail@gmail.com
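The "deployments do not create pods, replicasets do" lesson suggests a fixed order for looking at a broken rollout: Deployment, then its ReplicaSets, then the pods, then events and logs. The sketch below walks that order using standard kubectl subcommands. The function name `triage`, the `KUBECTL` override, and the namespace and deployment names in the usage note are assumptions for illustration, not part of the original workflow.

```shell
#!/bin/sh
# Sketch only: triage walk for a deployment that is not producing healthy pods.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

triage() {
  ns="$1"       # namespace to inspect
  deploy="$2"   # deployment name
  # A Deployment only manages ReplicaSets; check its rollout status first.
  $KUBECTL get deployment "$deploy" -n "$ns"
  # ReplicaSets are the objects that actually create pods.
  $KUBECTL get replicasets -n "$ns"
  # Pods that exist (or are missing) in the namespace.
  $KUBECTL get pods -n "$ns"
  # Recent events often explain why a ReplicaSet could not create pods.
  $KUBECTL get events -n "$ns" --sort-by=.lastTimestamp
  # Finally, the application logs from a pod of the deployment.
  $KUBECTL logs "deployment/$deploy" -n "$ns" --tail=50
}
```

Usage, with hypothetical names: `triage demo web` runs the commands against the current context, while `KUBECTL=echo` followed by `triage demo web` only prints them, which is handy for reviewing the sequence first.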
RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances
Journal of Molecular Biology · Feb 2, 2023
Igor Khokhriakov aka Ingvord
"The RCSB PDB research-focused RCSB.org web portal serves more than 6M unique users annually across academic, government, industry, and public domains."
  1. Title
  2. ChatGPT
  3. ChatGPT
  4. ChatGPT
  5. About me
  6. We are here
  7. SDSC
  8. RCSB
  9. RCSB.intro.1
  10. RCSB.intro.2
  11. RCSB.intro.3
  12. Ln1
  13. Ln1.motivation
  14. Ln1
  15. K8s landscape
  16. Migration
  17. Concept maps
  18. K8s platform project design
  19. K8s platform
  20. Technical challenges
  21. CICD workflows
  22. Ln2
  23. Ln2.thoughts
  24. Ln3
  25. IDEA.integration
  26. IDEA.deployment
  27. Octant
  28. Elastic
  29. Ln4
  30. deployment issue
  31. deployment issue
  32. deployment issue
  33. useful tip
  34. Ln5
  35. Ln5.tools
  36. IDEA+Helm
  37. IDEA+Skaffold
  38. IDEA+Skaffold2
  39. cert-manager
  40. cert-manager
  41. Ln6
  42. GitHub actions failure
  43. GitHub actions failure zoom
  44. GitHub actions failure reason
  45. GitHub actions failure reason zoom
  46. Ln7
  47. Ln7.zoom
  48. Ln8
  49. Ln8
  50. workshop
  51. workshop.zoom.1
  52. workshop.zoom.2
  53. Ch1
  54. Ch2
  55. Ch2.zoom
  56. Ch3
  57. Ch4
  58. Ch5
  59. Ch5
  60. Conclusion
  61. Thanks
  62. Acknowledgments
  63. Contacts