Recovering a Fully Replicated SolrCloud Node After Data Loss
In my spare time, I maintain a multi-tenant, high-scale SolrCloud with indices measured in terabytes per node.
The SolrCloud deployment consists of dozens of collections, each configured with two shards and a replication factor of two. This turned out to be quite lucky for us recently: when one of our nodes went down, every collection remained available, and we did not experience a service outage of any kind.
One of our nodes experienced a disk failure that resulted in total data loss. This was denormalized data, but re-analysis is time-consuming. Since the data was already replicated to other nodes that were working just fine, we were mostly interested in restoring the cluster to its former health, with a fresh instance of Solr taking the place of the downed node. The new node would be built from an identical image, down to the same hostname and IP.
I was somewhat surprised to find that there wasn't really an off-the-shelf solution to this problem, and not much came up when Googling. I searched for things like "solrcloud restore lost replicas", "solrcloud recover nodes", and there were few actionable results.
I was able to use Solrcloudpy, a client library for Python, to identify downed nodes through the new Collections API, instruct the cluster to remove the orphaned replica listings, and then re-create those replicas in the same location. You can see how I did it in this Gist.
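For readers who want the general shape of that workflow without opening the Gist, here is a minimal sketch. It is not the code from the Gist and it does not use Solrcloudpy; it calls the Collections API directly over HTTP with Python's standard library, using the CLUSTERSTATUS, DELETEREPLICA, and ADDREPLICA actions. The base URL, the node name, and the function names are assumptions made up for illustration, and treating every non-active replica on the rebuilt node as lost is a simplification you may want to tighten for your own cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL of any healthy node in the cluster.
SOLR_URL = "http://localhost:8983/solr"


def collections_api(action, **params):
    """Call the Collections API and return the decoded JSON response."""
    query = urlencode({"action": action, "wt": "json", **params})
    with urlopen(f"{SOLR_URL}/admin/collections?{query}") as resp:
        return json.load(resp)


def find_lost_replicas(cluster_status, node_name):
    """Return (collection, shard, replica) for each non-active replica on node_name."""
    lost = []
    collections = cluster_status["cluster"]["collections"]
    for coll_name, coll in collections.items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                on_node = replica["node_name"] == node_name
                if on_node and replica["state"] != "active":
                    lost.append((coll_name, shard_name, replica_name))
    return sorted(lost)


def restore_node(node_name):
    """Drop each stale replica listing, then re-create the replica on the same node."""
    status = collections_api("CLUSTERSTATUS")
    for collection, shard, replica in find_lost_replicas(status, node_name):
        collections_api("DELETEREPLICA", collection=collection,
                        shard=shard, replica=replica)
        collections_api("ADDREPLICA", collection=collection,
                        shard=shard, node=node_name)
```

Calling `restore_node("solr1:8983_solr")` (a made-up node name in Solr's `host:port_context` format) would walk every collection and rebuild whatever the cluster state still lists on that node. Each new replica then syncs its index from the shard leader, which is what spares you the re-analysis.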
This kind of robustness and fault tolerance is what makes SolrCloud one of my favorite distributed data stores to work with. Even for APIs without power-packed off-the-shelf clients, you can easily interact with them and understand how they should behave. This is largely thanks to the many talented maintainers and community members who continue to use and support it.
In many cases, I have simply used Python's requests library for simple JSON interactions with Solr, but sometimes it's nice to have better data modeling in your code. If you're a Python developer looking to get into SolrCloud, feel free to check out Solrcloudpy, an intuitive client for working with multi-server, multi-tenant search deployments.