<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://robertelwell.info/feed.xml" rel="self" type="application/atom+xml" /><link href="http://robertelwell.info/" rel="alternate" type="text/html" /><updated>2026-01-04T00:17:08+00:00</updated><id>http://robertelwell.info/feed.xml</id><title type="html">Language Hacker: Robert Elwell’s Personal Website</title><subtitle>Bob Elwell is an engineering leader in Atlanta, Georgia. He&apos;s an expert in scaling software and organizations.
</subtitle><entry><title type="html">Brewing Success: Making the Most of Annual Reviews</title><link href="http://robertelwell.info/2024/02/10/annual-reviews.html" rel="alternate" type="text/html" title="Brewing Success: Making the Most of Annual Reviews" /><published>2024-02-10T00:00:00+00:00</published><updated>2024-02-10T00:00:00+00:00</updated><id>http://robertelwell.info/2024/02/10/annual-reviews</id><content type="html" xml:base="http://robertelwell.info/2024/02/10/annual-reviews.html"><![CDATA[<h1 id="brewing-success-making-the-most-of-annual-reviews">Brewing Success: Making the Most of Annual Reviews</h1>

<p>We recently wrapped up an annual review season at my previous role. Between self-review,
manager review, calibration, and delivery, the process took close to three months!
The annual review process can be a big challenge for everyone involved. Introspection can be difficult,
and for many, it can be a stressful affair. Executed well, annual reviews are a fantastic way to
celebrate successes, advocate for your team, connect the dots on important conversations, and
get better alignment on how someone’s long-term growth goals fit into the overall business strategy.</p>

<p>This year involved closely collaborating with my managers in advance of the review period, as
well as during our calibration period after self-reviews and manager reviews were written.
I shared advice on how to make their reviews produce the outcomes we want for our people,
and on how to use these discussions to effect the right kinds of change. Now that everything
is wrapped up, I wanted to memorialize some of this advice here, both to invite comment and
to help newer engineering managers who may be daunted by this task for the first time.</p>

<h2 id="feedback-is-like-tea">Feedback Is Like Tea</h2>

<p>I once worked with an HR business partner who prefaced review season with this analogy, 
and it always stuck with me – probably because I’ve got Harney &amp; Sons on speed dial.
“Feedback is like tea – if you steep it for too long, it may become bitter, but if you don’t
steep it for long enough, it has no flavor.” In other words, when giving feedback in any form –
be it a peer review, a self-review, or a manager review – you should spend an appropriate
amount of time forming your thoughts, and do so with appropriate tact and care. Avoid
ruminating on the negatives, but make sure that constructive feedback is clear, unequivocal,
and adequately balanced. Conversely, even a glowing review should include appropriate detail
and consideration of areas for improvement.</p>

<h2 id="feedback-is-a-gift">Feedback Is a Gift</h2>

<p>What’s better than getting someone to seriously, honestly tell you how you’re doing?
The answer is getting that same feedback along with the knowledge that the person giving
it has your best interests at heart. So long as you go into the process with that in mind,
I encourage you to be bold and candid in your discussions. Every single one of us has something
we need specific, actionable feedback on in order to get better; otherwise, we’d be perfect!
Don’t be afraid to say important things that you know can help make a person you think is
great even greater.</p>

<p>An oft-repeated bit of advice is to use something called “sandwich feedback”, where you
say something positive, then deliver the critical piece of advice, and then finish with
another positive. Current thinking increasingly treats this approach as an
anti-pattern for delivering important feedback. The reasons range from
the observation that it can come across as manipulative, to the fact that the positives
outnumber the negatives, setting off a subliminal <em>Two out of
three ain’t bad</em> response from the recipient. In other words, don’t use sandwich feedback
to soften the blow of what should be an important conversation. Hedging this way
makes your feedback more ambiguous and less effective.</p>

<p>Ideally, the format of your annual review should be informed by the tool that you are
using or the standards put in place by your HR business partner. When it comes to 
pacing, structure, and the like, stick to the customary script provided. Be direct,
clear, and fair at the appropriate juncture – don’t hide the things you want to see improve
behind the things that are going fine.</p>

<p>Taken to its logical extreme, overly positive feedback becomes the
“all fives” school of reviews (or whatever your maximum rating may be), and there is no place for it. This is another example
of the “tea” being too “weak”. If you’re feeling pressure to give these kinds of
reviews to protect your reports, that’s a big culture red flag at your company. If you
feel the need to give these kinds of reviews because delivering difficult feedback 
makes you uncomfortable, you should bear in mind that this is an important competency
for a manager. You owe it to your reports to get comfortable delivering constructive feedback, 
and doing it in a way that keeps them feeling listened to and supported. 
Giving the highest possible ratings in every category without providing
thoughtful advice isn’t only a disservice to your reports; it also signals to your HR business partner
and your manager that you aren’t engaged in the process.</p>

<h2 id="no-surprises">No Surprises</h2>

<p>If you’re having a performance conversation about something for the first time during
an annual review, that should give you pause. An annual review should not be the first time
important feedback is provided. Annual reviews are a time to synthesize, summarize, and reflect
on the trajectory you can extrapolate. In fact, for sticky performance issues, you should
be able to say whether the problem has improved, persisted, or gotten worse.
You should be able to say something along the lines of:</p>

<blockquote>
  <p>We initially discussed this problem back in July, and talked about how we could improve it.
You agreed that you’d try to work on it by using strategies such as X, Y, and Z.
I’ve commended you in past one-on-ones on implementing those strategies on occasion,
such as in November when you did X instead of M. But we did have a discussion
in December after I observed you did N, when I hoped you would have done Z instead.
So we’re seeing growth in this area, but continued growth is still important here,
because this issue still keeps you from being successful in ABC, which we agree is one of your goals.</p>
</blockquote>

<p>The above is an example of how performance discussions are <em>iterative</em>, with the annual
review cycle giving us the opportunity to neatly contextualize those trends, successes, and challenges
over a broader period of time.</p>

<p>Along this theme, if you’re tying promotions to the annual review cycle, don’t keep reports with
a vested interest in their development in the dark about where they are tracking. Recurring
conversations should note the current deltas between how the individual is performing
and what the expectations of their desired role look like. Make sure that you are
giving clear, actionable feedback on how to more regularly exhibit the expectations of that role.
In day-to-day interactions, don’t be cryptic – if there’s an opportunity to step up, make
the reasoning behind your request clear. Remember that part of your responsibility
as a manager is to help intentionally develop your reports along their desired career path.
In short, your reports should have a reasonable level of certainty about whether they will
be awarded that promotion, based on how they have performed against the concrete milestones
you’ve set for them.</p>

<h2 id="use-data-to-drive-discussions">Use Data to Drive Discussions</h2>

<p>Would you rather have a nice, organic Japanese green tea brewed at a perfect 180°F with
12 ounces of cold, filtered water, or a generic green tea bag brewed with water you boiled
from the tap? Even if they tasted identical to you, science suggests that you’d like the
first more, largely because of the detail you’ve been given.</p>

<p>Detail makes things more memorable and more compelling, and data is the ultimate form of detail.
Data includes more than the record of previous discussions mentioned in the previous
section; it also includes the metrics you track for your team.</p>

<p>While you should be using a variety of indicators to measure your team’s performance
throughout the entire year, annual reviews are a fantastic time to sit down and zoom out on
those metrics. Think about it like a long-term investment. It can be difficult to watch your
money grow on a day-to-day basis in an index fund, but taking a look at it over a year can
tell a valuable story, especially compared to other index funds you may have tracked over the same
period of time.</p>

<p>For individuals at the same level, you should be able to identify clear trends in throughput based
on key metrics that can be gleaned from version control, documentation, and issue tracking.
Ideally, these will correlate to the expectations of the job description, which you hopefully have
on file. You can then use that information to address outliers, giving those individuals clear metrics to improve on.
An outlier isn’t necessarily a bad thing, but outliers are always valuable
conversation starters, and outliers that correlate across several metrics often tell an interesting story.</p>

<p>Here are some suggestions for how to use data as a storytelling device instead of a cudgel.
First, establish the person’s value for the metric of interest, and then share the delta from their peers observationally, without judgment.
Explain what that delta indicates, and then explain why improving the metric is valuable. For example:</p>

<blockquote>
  <p>Other team leads are performing N code reviews per month for a team with similar pull request volume,
whereas you’re averaging M. This is important, because the folks on the team you’re leading
deserve your feedback and guidance on the work they’re doing. Engaging in code review is
also a great way to stay on top of what everyone is doing, push on quality,
and ensure your team is meeting our overall architectural goals.
What are some ways we could get you more involved in your team’s code reviews?</p>
</blockquote>

<p>When using data to drive performance discussions, I encourage you to provide enough context
to make the numbers meaningful, and to be as objective as possible when laying out the details.
Hearing constructive criticism is hard in general, and it can be very uncomfortable
for the individual on the receiving end of this concrete evidence. Because of this, 
when using data to underscore room for growth, you should make doubly sure that 
this isn’t the first time that you’ve had the discussion, and that you’ve used 
gentler strategies in the past. In this case, I would hope that you’ve got notes
from previous one-on-ones where you’ve written down that you asked the lead in question
to get more involved in the team’s code reviews.</p>

<h2 id="set-the-stage-for-the-next-year">Set the Stage for the Next Year</h2>

<p>I like to conclude my reviews with where I hope to see the person in a year’s time,
and what support I hope to give to get them there. Since so much of an annual
review is rehashing what happened in the last 12 months, I find that it’s
a little uplifting – even in the face of some difficult conversations – 
to reconfirm your belief in the potential of the person you’re delivering the
review to. So take the time at the end to build folks up, get them excited
about what’s next, and make sure they know they are an important part of
your vision for the team’s success. This will hopefully not just inspire them,
but also reinforce the important sense of belonging that every manager should foster
in their team.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Annual reviews are a fact of life in most companies. I like to face such inevitabilities
head-on, with lots of preparation so that I can bring my best self to the occasion.
There are plenty of other approaches to take, but I hope that there’s room for a few
of my recommendations in those approaches. What is most important is to strike a delicate balance:
be fact-based and candid, support growth, and end each discussion with the recipient
holding a clear understanding of how to move forward, ideally feeling good about their future.
Executing on this successfully is a responsibility and a privilege that can make a huge
impact on company morale and culture, so seize the opportunity to be a positive force
that helps your team get ever better.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[Brewing Success: Making the Most of Annual Reviews]]></summary></entry><entry><title type="html">Hiring For Growth: Tips for Engineering Managers</title><link href="http://robertelwell.info/2024/01/09/hiring-tips.html" rel="alternate" type="text/html" title="Hiring For Growth: Tips for Engineering Managers" /><published>2024-01-09T00:00:00+00:00</published><updated>2024-01-09T00:00:00+00:00</updated><id>http://robertelwell.info/2024/01/09/hiring-tips</id><content type="html" xml:base="http://robertelwell.info/2024/01/09/hiring-tips.html"><![CDATA[<p>Over the course of my career, I’ve been involved in several high-growth engineering organizations.
Having recently helped Vultr more than double the size of its engineering organization (which is still growing),
and with January always being the start of what can be a busy hiring season, I thought I’d start
2024 off right with some tips on how to balance a thoughtful process with speed.</p>

<h2 id="know-what-youre-hiring-for">Know What You’re Hiring For</h2>

<p>Whenever possible, you should be the one writing the job description for the role in question.
When you aren’t, you should work closely with the original author of the JD, and make sure to provide
any feedback that might be useful in finding the right candidate. As the hiring manager, you should be
ready with a clear sense of what 30, 60, and 90-day goals would look like for the role you’re looking to hire.</p>

<p>Consider looking at the competencies of engineers already in roles comparable to the one you’re searching for.
If there’s a big delta in skill or background based on what your description calls out, you may
want to consider further refining your criteria, or reconsidering how the role is titled or leveled.</p>

<h2 id="be-first-point-of-contact">Be First Point of Contact</h2>

<p>As hiring manager, you should conduct the first interview after the phone screen.
Since you’re the person responsible for fostering team culture, you should
have the strongest barometer for whether a candidate’s values align with your team’s and your company’s.</p>

<p>The candidate deserves to get to know their manager as early in the process as possible, to judge whether
it would be the right relationship. Since your team’s time comes at a premium, if you or the candidate
determine it’s not a good fit after the first round, you’re also preserving the limited bandwidth of your engineers.</p>

<p>If you’re a very soft yes or you’re on the fence without any strong red flags, this is a great time to defer to your team,
and still a solid use of their cycles.</p>

<h2 id="build-a-hiring-framework">Build a Hiring Framework</h2>

<p>Work with your HR business partner to introduce a consistent process for hiring. Having these processes spelled out is
important for training others to interview (a valuable skill for engineers interested in growing into either a managerial role
or a senior/staff/principal IC role), and also for setting clear standards for what a successful candidate looks like.</p>

<p>Spell these criteria out clearly across a handful of well-rounded interview phases, tailored to
assess technical, organizational, and interpersonal competencies.
Knowing what you’re evaluating – and how – allows you to evaluate individuals consistently across interviewers.
A consistent approach to interviewing helps minimize both conscious and unconscious bias,
even more so if you develop these hiring rounds collaboratively.</p>

<p>Don’t just include your peers in engineering. These standards should be established as part of your process for 
defining any new role, so it’s important to get HR buy-in when building out the components of this process.</p>

<h2 id="identify-a-successful-rejection">Identify a Successful Rejection</h2>

<p>In testing, we evaluate the happy path as well as the most common degenerate cases. It should be no different when it comes to interviewing. 
You should work with your counterparts in HR to make sure you’ve identified the most effective way to give fast feedback to candidates you don’t intend
to move forward with – even if it’s just a form email.</p>

<p>The impression any and all interviewees receive when going through your process can impact your company’s reputation whether or not the candidate 
receives an offer. Because of this, you should work with your hiring team to make sure that all candidates feel listened to and respected. Make sure to express
gratitude for every candidate’s time, and for the opportunity to get to know them, even if they clearly won’t be a fit for the role. Whenever you can, carry
candidates through the full interview they are scheduled for. Don’t cut an interview short just because you’ve run into a skill mismatch or some other
clear deal breaker – use it as an opportunity to get to know the person, and try to understand whether they may be a fit in another role. At the very least, 
use this as a teaching opportunity so that the candidate leaves the interview having learned something valuable for a future interview.</p>

<p>Early in my career, and even further on, I’ve had some very valuable interviews where – despite not receiving an offer – I learned something 
important that helped me be more effective in my current role, and ultimately more successful in future interviews.</p>

<h2 id="roll-out-the-welcome-wagon">Roll Out The Welcome Wagon</h2>

<p>Anyone in Customer Success can tell you that a deal doesn’t stop when the ink dries. This is just as applicable in hiring.
I’ve seen great candidates rescind their offer acceptance because of poor follow-through, or simply because there was room for second thoughts to creep in.
Once you have a start date squared away, you should begin a countdown. Around a week out, send an email to the candidate
with your team CC’ed, letting them know how excited you are for them to join the team. Encourage your team
to reach out sharing their enthusiasm.</p>

<p>Make sure that you have an onboarding buddy assigned who can act as a peer point-of-contact to show this person the ropes,
and give them the opportunity to build a positive relationship with this teammate early on.</p>

<p>Build out a consistent 30/60/90-day plan that includes links to onboarding docs, key contacts within engineering and
outside of engineering when necessary, and projects of increasing complexity. Having these spelled out will help the 
candidate establish a foundation and start delivering compounding benefits in their areas of responsibility.</p>

<p>Meet with your new hire frequently in the first 90 days to make sure that they are on track for your 30/60/90 plan. Use that
process not only to make sure that they are acclimatizing and acculturating properly, but to receive feedback on
your processes, and the state of your team and organization as a whole. I was once told that there is nothing more
valuable than “fresh pain” when it comes to surfacing friction in processes that everyone else on your team has long since
adapted to and forgotten about. Make sure to collect that fresh pain, empathize with it, and consider possible solutions.</p>

<h2 id="rinse-and-repeat">Rinse and Repeat</h2>

<p>You will find that if you follow these approaches, you’ll be able to hire the kind of people who align with
your goals and your team’s values. With continued investment in their growth and success, you will have developed
an individual who won’t just help you meet your business goals, but will support you further down the line. 
Onboarded properly, they too will develop techniques that use empathy, preparation, and communication to continue 
to build an organization of the kind of people who you would be proud to work with.</p>

<p>In other words, imagine every candidate you speak with interviewing a
future candidate to join the team you hope to build in six to twelve months.
That philosophy will help you succeed in hiring for growth in more ways than one.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[Over the course of my career, I’ve been involved in several high-growth engineering organizations. Having recently helped Vultr more than double its engineering organization (and growing), and with January always being the start of what can be a busy hiring season, I thought I’d start 2024 off right with some tips for how to balance a thoughtful process with speed.]]></summary></entry><entry><title type="html">Building a Recipe App on Vultr’s Platform</title><link href="http://robertelwell.info/2022/06/25/enough-recipes.html" rel="alternate" type="text/html" title="Building a Recipe App on Vultr’s Platform" /><published>2022-06-25T00:00:00+00:00</published><updated>2022-06-25T00:00:00+00:00</updated><id>http://robertelwell.info/2022/06/25/enough-recipes</id><content type="html" xml:base="http://robertelwell.info/2022/06/25/enough-recipes.html"><![CDATA[<h2 id="taking-flight-with-vultr">Taking Flight with Vultr</h2>
<p>Back in February, I made a very exciting move by joining <a href="https://vultr.com">Vultr</a> 
as Senior Director of Engineering. Vultr is an independent cloud provider
that has been in the industry for roughly 20 years. Over the last several years,
this has become a very compelling space. Digital transformation has 
brought more and more businesses into the cloud, and many more businesses 
have started their lives as cloud native over the last decade. With the growing sentiment of
cloud agnosticism, the fear of vendor lock-in, and the availability of highly configurable approaches to
deploying infrastructure as code, there’s never been a better opportunity to seize
a portion of a growing and exciting market.</p>

<p>Not hitching your wagon to the “big three” isn’t just a matter of cost, though price
arbitrage will always be a compelling reason. Diversifying
your providers and data centers is one of the most effective ways to avoid
service-impacting outages. It helps you reduce latency by keeping the edge closer to customers.
It also helps maturing companies meet a growing laundry list of data-residency regulations
that vary from location to location. Vultr has <strong>dozens</strong> of points of presence,
and the number keeps growing – thanks to our excellent deployment operations team and
the sysadmins who continue to build and grow our data centers.</p>

<p>It’s easier than ever to take advantage of Vultr’s platform
thanks to our broad adoption of the industry standards that have 
evolved over the last several years. We support many of the tools and interfaces that
most cloud developers already use. I thought it’d be fun to
create a reference implementation that took advantage of our available
tooling, showing how easy it is to build modern web applications
on Vultr’s offerings.</p>

<h2 id="enough-recipes">Enough Recipes!</h2>
<p>To kick the tires on everything, I created a simple website called <a href="https://enough.recipes">enough.recipes</a>.
Why call it that? I noticed countless recipe websites on Hacker News, and while
they were all interesting proofs of concept, most of them just felt like another
To-Do List App. The problem, to me, was usually that they expected people to
add recipes themselves – as if there weren’t already enough recipes on the web!</p>

<p>I used my background as a search engineer at Wikia (now known as Fandom) to 
scrape the <a href="https://recipes.fandom.com/">Recipes Wiki</a> and create a search engine
to make its over 40,000 pages discoverable from a simple search. Back in my day,
the company embraced open source a little bit more and made it far easier to extract
content from their site. We even contributed to open source tools to make it easier to
interact with data from a given wiki. That seems to have changed over the last decade,
but no biggie – I could still use the core MediaWiki API to enumerate the URLs for all pages,
and then extract the relevant portion of the DOM from the HTML returned by a plain HTTP request.</p>
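<p>For readers who want to try the enumeration step, here is a minimal Python sketch that pages through every title with the standard MediaWiki <code class="language-plaintext highlighter-rouge">list=allpages</code> query and follows the API’s continuation tokens. It assumes the wiki exposes the API at <code class="language-plaintext highlighter-rouge">https://recipes.fandom.com/api.php</code>. The helper names (<code class="language-plaintext highlighter-rouge">iter_page_titles</code>, <code class="language-plaintext highlighter-rouge">title_to_url</code>) are hypothetical. This is an illustration, not the code from the project repository.</p>

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint: Fandom wikis generally expose the standard MediaWiki API at /api.php.
API_URL = "https://recipes.fandom.com/api.php"


def fetch_json(url: str) -> dict:
    """Perform a GET request and decode the JSON response."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_page_titles(
    api_url: str = API_URL,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[str]:
    """Yield every page title via list=allpages, following continuation tokens."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": "500",  # the per-request maximum for ordinary clients
        "format": "json",
    }
    while True:
        data = fetch(f"{api_url}?{urllib.parse.urlencode(params)}")
        for page in data["query"]["allpages"]:
            yield page["title"]
        # The API returns a "continue" object until the listing is exhausted.
        if "continue" not in data:
            break
        params.update(data["continue"])


def title_to_url(title: str, base: str = "https://recipes.fandom.com/wiki/") -> str:
    """Build a page URL from a title (MediaWiki uses underscores for spaces)."""
    return base + urllib.parse.quote(title.replace(" ", "_"))
```

<p>Passing in <code class="language-plaintext highlighter-rouge">fetch</code> keeps the paging logic testable without network access. In production, the default <code class="language-plaintext highlighter-rouge">fetch_json</code> performs the real HTTP request.</p>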

<p>On a daily basis (using a Kubernetes CronJob), I would iterate over the page list in
the MediaWiki API, publishing each URL to my message bus. A consumer would then
request each URL and store the relevant
data from the DOM in both a database and a search engine.</p>
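<p>The producer/consumer flow described above can be sketched in a few lines of Python. In this hypothetical version:</p>
<ul>
  <li>a standard-library <code class="language-plaintext highlighter-rouge">queue.Queue</code> stands in for Kafka;</li>
  <li>plain dictionaries stand in for the MySQL-backed Django models and the Elasticsearch index;</li>
  <li>the consumer keeps only the text inside MediaWiki’s <code class="language-plaintext highlighter-rouge">mw-parser-output</code> content container. That choice is an assumption about which “relevant portion of the DOM” matters.</li>
</ul>

```python
import queue
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List


class ContentExtractor(HTMLParser):
    """Collect text inside MediaWiki's article container (div.mw-parser-output)."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # div nesting depth inside the container; 0 means outside it
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested div inside the container
        elif "mw-parser-output" in (dict(attrs).get("class") or "").split():
            self.depth = 1  # entering the container

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the whitespace-joined text of the article body."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def publish_urls(urls: Iterable[str], bus: queue.Queue) -> None:
    """Producer (the daily cron job): publish one message per page URL."""
    for url in urls:
        bus.put(url)


def consume(
    bus: queue.Queue,
    fetch_html: Callable[[str], str],
    database: Dict[str, str],
    search_index: Dict[str, List[str]],
) -> None:
    """Consumer: fetch each URL, extract the article text, and write it to both stores."""
    while not bus.empty():
        url = bus.get()
        text = extract_text(fetch_html(url))
        database[url] = text  # stand-in for saving a Django model
        search_index[url] = text.lower().split()  # stand-in for indexing into Elasticsearch
```

<p>In a real deployment, the producer would run inside the CronJob. The consumer would block on the Kafka topic rather than stopping when the queue is empty.</p>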

<h2 id="the-vultr-ecosystem">The Vultr Ecosystem</h2>

<p>So what are all the bits and pieces that I used to create this site?</p>

<p>I built a containerized application using Django, with all the pieces working
in a simple Docker Compose definition before proceeding to get it deployed to 
a production environment.</p>

<p>You can easily deploy containerized applications to a Kubernetes cluster,
and Vultr’s <a href="https://www.vultr.com/docs/vultr-kubernetes-engine/">Kubernetes Engine</a>
is a major achievement that we’ve GA’ed this year. Having used a variety of cloud-based 
Kubernetes offerings, I’m quite pleased with 
what the team has delivered. Being able to back our clusters with a variety of
customizable instance types gave me a great deal of flexibility around both cost and performance.</p>

<p>I was able to create the Kubernetes cluster using Vultr’s
<a href="https://registry.terraform.io/providers/vultr/vultr/latest/docs">Terraform provider</a>,
which was seamless to use. It’s built on top of our <a href="https://www.vultr.com/api/">API</a>,
which is nicely documented, in part because it conforms to the OpenAPI spec.</p>
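<p>To show roughly what the Terraform provider does on your behalf, here is a hedged Python sketch that builds (but does not send) a raw API request for a new Kubernetes cluster. The endpoint path, JSON field names, and bearer-token header are assumptions based on Vultr’s v2 API reference; check them against the published OpenAPI spec. The label, region, plan, and version values used with it are placeholders.</p>

```python
import json
import urllib.request

# Assumption: Vultr's v2 base URL and Kubernetes endpoint as described in its
# public API reference; verify against the OpenAPI spec before relying on them.
VULTR_API = "https://api.vultr.com/v2"


def build_create_cluster_request(
    api_key: str, label: str, region: str, version: str, plan: str, nodes: int
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a Kubernetes cluster."""
    body = {
        "label": label,
        "region": region,
        "version": version,
        "node_pools": [
            # One pool of identical worker nodes; the plan sets their size and price.
            {"node_quantity": nodes, "label": f"{label}-pool", "plan": plan},
        ],
    }
    return urllib.request.Request(
        f"{VULTR_API}/kubernetes/clusters",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # your personal API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

<p>Sending it is a matter of calling <code class="language-plaintext highlighter-rouge">urllib.request.urlopen(req)</code>. Terraform adds state tracking on top of calls like this one, which is why the provider is the more comfortable way to manage a cluster over time.</p>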

<p>The Docker image for the Django app served as the basis for the
Kubernetes deployment and service definitions that acted as a backend to
a simple nginx image, which provided SSL termination and allowed
scaling across one or more deployments of the Django WSGI app served by Gunicorn.</p>
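<p>As a sketch of the Gunicorn side of that setup, here is a minimal <code class="language-plaintext highlighter-rouge">gunicorn.conf.py</code>. Gunicorn reads Python configuration files like this one. The module path <code class="language-plaintext highlighter-rouge">enough_recipes.wsgi:application</code>, the port, and the environment variable names are hypothetical.</p>

```python
# gunicorn.conf.py: Gunicorn loads this file (by default from the working
# directory) and reads its module-level variables as settings.
import multiprocessing
import os

# Listen on all interfaces so the Kubernetes Service in front of the pod can reach it.
bind = os.environ.get("GUNICORN_BIND", "0.0.0.0:8000")

# Common starting heuristic: (2 x CPU cores) + 1 workers. cpu_count() reports
# the node's cores rather than the pod's CPU limit, so an override is provided.
workers = int(os.environ.get("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))

# The Django WSGI callable to serve (hypothetical project module).
wsgi_app = "enough_recipes.wsgi:application"

# Trust X-Forwarded-* headers set by the nginx proxy that terminates SSL.
forwarded_allow_ips = "*"

# Log to stdout/stderr so Kubernetes can collect the logs.
accesslog = "-"
errorlog = "-"
```

<p>On the Django side, <code class="language-plaintext highlighter-rouge">SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")</code> is the usual companion setting. It tells Django that the original request used HTTPS, provided nginx sets that header.</p>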

<p>Vultr also provides a <a href="https://www.vultr.com/products/load-balancers/">Load Balancer</a>
that can be provisioned from within a Kubernetes cluster by declaring a Service of type <code class="language-plaintext highlighter-rouge">LoadBalancer</code>.
This automatically exposed a static IP for public ingress, and annotations
provided the ability to properly handle TLS and port-forwarding.</p>
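<p>Below is a minimal sketch of such a Service, written as a Python dictionary and printed as JSON, which <code class="language-plaintext highlighter-rouge">kubectl apply -f -</code> accepts. The name <code class="language-plaintext highlighter-rouge">nginx-lb</code> and the <code class="language-plaintext highlighter-rouge">app: nginx</code> selector are hypothetical. The Vultr-specific TLS annotations are left as an empty parameter because their exact keys should come from Vultr’s cloud controller manager documentation.</p>

```python
import json
from typing import Mapping, Optional


def load_balancer_service(
    name: str, app_label: str, annotations: Optional[Mapping[str, str]] = None
) -> dict:
    """Return a Service manifest that asks the cloud provider for an external load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            # Provider-specific settings (e.g. TLS handling) are passed as annotations.
            "annotations": dict(annotations or {}),
        },
        "spec": {
            "type": "LoadBalancer",  # the cloud controller provisions the external LB
            "selector": {"app": app_label},  # route traffic to the nginx pods
            "ports": [
                {"name": "http", "port": 80, "targetPort": 80},
                {"name": "https", "port": 443, "targetPort": 443},
            ],
        },
    }


if __name__ == "__main__":
    # Pipe this output into `kubectl apply -f -` to create the Service.
    print(json.dumps(load_balancer_service("nginx-lb", "nginx"), indent=2))
```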

<p>I was even able to use <a href="https://www.vultr.com/docs/introduction-to-vultr-dns/">Vultr’s DNS</a>
by pointing the domain I purchased to their nameservers, and then setting the LB’s
external IP as the A record for the domain.</p>

<p>Since this was a Django app, I would need to get a database set up.
Vultr has recently rolled out MySQL as a beta <a href="https://www.vultr.com/products/managed-databases/">Database as a Service</a>
offering. It was super exciting to get a chance to preview its functionality.
My favorite thing about how our DBaaS works is that the UI provides the ability to
“click to copy” the database URL (e.g. <code class="language-plaintext highlighter-rouge">mysql://user:pass@some-ip:3306/your_db</code>).
The database URL has become the lingua franca of many ORMs, and
I’ve become quite accustomed to having to compose this string myself to work with
<a href="https://github.com/jazzband/dj-database-url">dj-database-url</a>. The convenience
was fantastic. Remember that <em>this URL contains secrets</em>, and so should be handled
as sensitive information.</p>
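<p>To illustrate what dj-database-url does with that string, here is a simplified, standard-library-only sketch that turns a MySQL URL into a Django <code class="language-plaintext highlighter-rouge">DATABASES</code> entry. It is not the library’s implementation; in the real app, <code class="language-plaintext highlighter-rouge">dj_database_url.config()</code> reads the <code class="language-plaintext highlighter-rouge">DATABASE_URL</code> environment variable for you. Keeping the URL in the environment (for example, injected from a Kubernetes Secret) also keeps those credentials out of source control. The helper names here are hypothetical.</p>

```python
import os
from typing import Mapping
from urllib.parse import unquote, urlparse

# Map URL schemes to Django database backends (only MySQL is needed here).
ENGINES = {"mysql": "django.db.backends.mysql"}


def parse_database_url(url: str) -> dict:
    """Simplified stand-in for dj_database_url.parse(): URL -> Django DATABASES entry."""
    parts = urlparse(url)
    return {
        "ENGINE": ENGINES[parts.scheme],
        "NAME": parts.path.lstrip("/"),
        # Credentials may be percent-encoded in the URL, so decode them.
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": parts.port or "",
    }


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build Django's DATABASES setting from the secret DATABASE_URL variable."""
    return {"default": parse_database_url(env["DATABASE_URL"])}


# In settings.py this would be: DATABASES = database_settings()
```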

<p>Since I was doing some basic styling with Tailwind, I needed a place to store
my static assets. I was able to very easily use
the <a href="https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html">django-storages</a> 
S3 backend with Vultr’s <a href="https://www.vultr.com/products/object-storage/">S3-Compatible Object Storage</a>
by simply plugging in the right credentials and configurations.</p>
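<p>The settings involved look roughly like the following <code class="language-plaintext highlighter-rouge">settings.py</code> excerpt. The <code class="language-plaintext highlighter-rouge">AWS_*</code> names are django-storages’ S3 options. The environment variable names, the bucket name, and the <code class="language-plaintext highlighter-rouge">ewr1.vultrobjects.com</code> endpoint are illustrative assumptions, and the endpoint should match the region of your bucket.</p>

```python
# settings.py (excerpt): pointing django-storages' S3 backend at S3-compatible
# object storage. Environment variable names and defaults are hypothetical.
import os

AWS_ACCESS_KEY_ID = os.environ.get("OBJECT_STORAGE_ACCESS_KEY", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("OBJECT_STORAGE_SECRET_KEY", "")
AWS_STORAGE_BUCKET_NAME = os.environ.get("OBJECT_STORAGE_BUCKET", "enough-recipes-static")

# Send requests to the S3-compatible endpoint instead of AWS (example hostname).
AWS_S3_ENDPOINT_URL = os.environ.get("OBJECT_STORAGE_ENDPOINT", "https://ewr1.vultrobjects.com")

# Static assets such as the compiled Tailwind CSS must be publicly readable,
# and plain URLs (without signed query strings) are easier to cache.
AWS_DEFAULT_ACL = "public-read"
AWS_QUERYSTRING_AUTH = False

# Django 4.2+ configures storage backends through STORAGES; older versions use
# STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage" instead.
STORAGES = {
    "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    "staticfiles": {"BACKEND": "storages.backends.s3boto3.S3Boto3Storage"},
}
```

<p>With these in place, <code class="language-plaintext highlighter-rouge">python manage.py collectstatic</code> uploads the compiled assets to the bucket.</p>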

<p>Some of my deployments needed block storage, as did the helm charts
I needed for both Kafka (my message queue) and Elasticsearch. I was able
to use our <a href="https://www.vultr.com/products/block-storage/">Scalable Block Storage</a>
product to support these use cases. It’s worth noting that this required specifying the 
<a href="https://kubernetes.io/docs/concepts/storage/storage-classes/">storage class name</a>
as well as the desired size. Without both of these configured for each deployment
the helm chart maintains, you wouldn’t be able to successfully run those instances in 
your cluster. Here’s an example of the helm command used to get Kafka running:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Make the Bitnami charts available to Helm (only needed once)</span>
helm repo add bitnami https://charts.bitnami.com/bitnami
helm <span class="nb">install</span> <span class="nt">-n</span> enough-recipes <span class="se">\</span>
  broker bitnami/kafka <span class="se">\</span>
  <span class="nt">--set</span><span class="o">=</span>persistence.storageClass<span class="o">=</span>vultr-block-storage <span class="se">\</span>
  <span class="nt">--set</span><span class="o">=</span>persistence.size<span class="o">=</span>10Gi <span class="se">\</span>
  <span class="nt">--set</span><span class="o">=</span>zookeeper.persistence.storageClass<span class="o">=</span>vultr-block-storage <span class="se">\</span>
  <span class="nt">--set</span><span class="o">=</span>zookeeper.persistence.size<span class="o">=</span>10Gi
</code></pre></div></div>

<h2 id="bringing-it-all-together">Bringing it All Together</h2>

<p>I’ve built plenty of things with EKS that involved a lot
of banging my head against the wall trying to figure out the nuances of IAM roles
and various permissions settings. Vultr’s default behavior eschews many of the arcane
things that make the bigger cloud providers so hard to work with. That’s one of the reasons we call it
the <strong>Developer-First Platform</strong>. By enabling fast prototyping at a competitive price,
we should be on the tip of your tongue when developing MVPs or building contract applications
for cost-conscious clients.</p>

<p>You can view <a href="https://github.com/relwell/enough-recipes">Enough Recipes on GitHub</a>
for all of the application code and definitions.</p>

<p>It was a lot of fun to build this site and understand how all the great pieces of
the Vultr platform fit together. I talked about this with our developer advocate,
<a href="https://waltrib.com/">Walt Ribeiro</a>, for Vultr’s YouTube channel. You can check it out here:</p>

<p><a href="https://www.youtube.com/watch?v=DVIW7pNlovY"><img src="https://img.youtube.com/vi/DVIW7pNlovY/0.jpg" alt="Enough Recipes Vultr" /></a></p>

<p>So what’s next for Vultr? Well, without giving too much away,
we just had a very exciting H2 planning session with lots of great takeaways.
You’ll be able to do a lot more on our platform with many more of the conveniences
you may have come to expect from the bigger guys.</p>

<p>Did I mention <strong>we’re hiring</strong>? Solving interesting problems on a daily basis
is just par for the course. We occupy a space that’s not going away any time soon,
and will only gain more attention as cost-conscious companies revisit their cloud
costs during the upcoming business cycle. If you’re interested, drop me a line or 
<a href="https://www.vultr.com/jobs">check out our jobs page</a>!</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[Taking Flight with Vultr Back in February, I made a very exciting move by joining Vultr as Senior Director of Engineering. Vultr is an independent cloud provider that has been in the industry for roughly 20 years. Over the last several years, this has become a very compelling space. Digital transformation has brought more and more businesses into the cloud, and many more businesses have started their lives as cloud native over the last decade. With the growing sentiment of cloud agnosticism, the fear of vendor lock-in, and the availability of highly configurable approaches to deploying infrastructure as code, there’s never been a better opportunity to seize a portion of a growing and exciting market.]]></summary></entry><entry><title type="html">Building a Yacht Rock Song Title Generator</title><link href="http://robertelwell.info/2019/04/15/yachtkov.html" rel="alternate" type="text/html" title="Building a Yacht Rock Song Title Generator" /><published>2019-04-15T00:00:00+00:00</published><updated>2019-04-15T00:00:00+00:00</updated><id>http://robertelwell.info/2019/04/15/yachtkov</id><content type="html" xml:base="http://robertelwell.info/2019/04/15/yachtkov.html"><![CDATA[<p>Since moving to Atlanta and getting a place with a pool,
I like to spend the warmer months coding out on the deck with my
Jack Russell Terrier, jumping into the water while things
build, and then diving back into the code. I’ve found
Yacht Rock to be a genre that accompanies this lifestyle quite nicely.
The smooth West Coast sounds of the mid ’70s through early ’80s
are just the kind of thing you can sing along to when you’re in one mood,
and not pay too much attention to when you really need to ruminate on a problem.</p>

<p>When the weather gets cold, I’ve tended to stay in that frame of mind.
So I decided to make a fun, little website that pays tribute to these
songs by exploring what characterizes them based on their title.
Thanks to the folks who originated the term
with the Yacht Rock web series, and then the Beyond Yacht Rock podcast,
getting the data for this project was a breeze.</p>

<p><a href="http://yachtkov.herokuapp.com">Yachtkov</a> is a Yacht Rock song title
generator that uses a Hidden Markov Model trained on song titles
with a rating greater than 50 on the <a href="http://www.yachtornyacht.com/">Yachtski Scale</a>.</p>

<p>Using Beautiful Soup, I simply pulled the data from the site and fed it into
<a href="https://github.com/jsvine/markovify">Markovify</a>,
which is a high-level tool for working with hidden Markov models
in Python. Something I like a lot about this language generation tool is that
it has configurations for trying to create content that is sufficiently unlike
the source data. With short things like song titles, this helps avoid
duplicating the source content – meaning you shouldn’t see song titles
that are identical to titles in the source data.</p>
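<p>To make the idea concrete, here is a minimal, dependency-free sketch of word-level Markov chain text generation, the technique Markovify implements. This is not Markovify's code: the titles below are a tiny hypothetical sample, and rejecting only exact copies is a simplified stand-in for Markovify's own overlap test, which rejects output that shares too long a run of words with the source.</p>

```python
import random
from collections import defaultdict

# Hypothetical sample corpus; the real project scraped titles rated above 50.
TITLES = [
    "What a Fool Believes",
    "Show Me the Way",
    "Tell Me What You Want",
    "Caught Up in You",
]

BEGIN, END = "__BEGIN__", "__END__"


def build_chain(titles):
    """Map each word to the words observed right after it (a state size of one)."""
    chain = defaultdict(list)
    for title in titles:
        words = [BEGIN] + title.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng):
    """Random walk from BEGIN, picking successors in proportion to how often they occurred."""
    words, word = [], BEGIN
    while True:
        word = rng.choice(chain[word])
        if word == END:
            return " ".join(words)
        words.append(word)


def make_novel_title(chain, titles, rng, tries=100):
    """Resample until the output is not an exact copy of a source title."""
    known = {title.lower() for title in titles}
    for _ in range(tries):
        candidate = generate(chain, rng)
        if candidate.lower() not in known:
            return candidate
    return None  # give up, much as Markovify's make_sentence() can return None


if __name__ == "__main__":
    print(make_novel_title(build_chain(TITLES), TITLES, random.Random()))
```

<p>Markovify's default state size is two words; the sketch uses one so that a four-title sample still has room to recombine.</p>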

<p>This was also an opportunity to work with
<a href="https://reactjs.org/docs/hooks-intro.html">React Hooks</a>. I used
the state and effect hooks to handle asynchronous requests to a simple Flask
backend.</p>

<p>The highly throwback look and feel wasn’t just a function of
being easy to code; I think it pays solid tribute to what the moniker
<em>Yacht Rock</em> evokes on its own.</p>

<p>I would probably be remiss if I didn’t take a minute to acknowledge that
I’ve “been the fool before” with these sorts of projects,
albeit in a different genre of music. You may remember my
<a href="http://robertelwell.info/2013/03/19/pycon-presentation-video.html">PyCon 2013 presentation on building a Media Takeout Headline Generator</a>,
where I discussed probabilistic language modeling within the context of
some of my favorite hip hop artists at the time.</p>

<p>While it should be noted that my car’s still blasting Hip Hop Nation
most of the time (don’t get me started on Yacht Rock Radio),
a bunch of things have changed since then.
And I don’t just mean I’m a little older and wiser.</p>

<p>The MTO Headline Generator generated a bunch of text with Python,
and then served up the pre-processed text with PHP.
This project uses a Flask backend that builds the language model once, stores
it in memory, and serves it just as fast as retrieving a random value from
a large flat file.</p>
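<p>The build-once pattern is easy to sketch. The post's backend is Flask; to keep this example free of dependencies, it uses the standard library's <code>wsgiref</code> instead (a Flask app is also a WSGI app). The corpus, <code>build_model</code>, and the <code>/api/title</code> route are hypothetical stand-ins: the point is only that the expensive step runs once at import time, and each request just samples from memory.</p>

```python
import json
import random
from wsgiref.simple_server import make_server

# Hypothetical corpus; the real app trains a Markovify model on scraped titles.
TITLES = ["What a Fool Believes", "Ride Like the Wind", "Sailing"]


def build_model(titles):
    """Stand-in for the expensive training step."""
    return [title.split() for title in titles]


# Built once at import time and kept in memory for every request.
MODEL = build_model(TITLES)


def app(environ, start_response):
    """WSGI app: each request only samples from the in-memory MODEL."""
    if environ.get("PATH_INFO") != "/api/title":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = json.dumps({"title": " ".join(random.choice(MODEL))}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```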

<p>The PHP site was, of course, intentionally modeled after Media Takeout’s
styling at the time, and so used entirely server-side rendering (if memory
serves me correctly). This Flask app serves an extremely simple React app,
which makes async requests to retrieve data from a separate endpoint on
the same Flask app.</p>

<p>And of course, from a computational linguistics standpoint, HMMs produce
far better output than the somewhat more naive N-Gram-based approach
I used in the past.</p>

<p>I think there’s something special in creating art from your code,
especially when you can do it in a way that lays bare the endemic
characteristics of another form of art. It lets you engage as both a
craftsman and observer, providing commentary and critique simultaneously.</p>

<p>I encourage you to imagine what songs like <em>Show Me The Night</em>,
<em>You Made a Fool Believes</em>, <em>Caught Up in the Business</em>, and
<em>Tell Me What You Won’t Do For Love</em> sound like. Rest assured,
they’ll have that Doobie bounce, smooth production, a little something
extra melodically, and probably a Porcaro or two in the personnel.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[Since moving to Atlanta and getting a place with a pool, I like to spend the warmer months coding out on the deck with my Jack Russell Terrier, jumping into the water while things build, and then diving back into the code. I’ve found Yacht Rock to be a genre that accompanies this lifestyle quite nicely. The smooth West Coast sounds of the mid ’70s through early ’80s are just the kind of thing you can sing along to when you’re in one mood, and not pay too much attention to when you really need to ruminate on a problem.]]></summary></entry><entry><title type="html">Building a Sentiment Ticker with Raspberry Pi and NLTK</title><link href="http://robertelwell.info/2017/02/12/building-a-sentiment-ticker-with-raspberry-pi.html" rel="alternate" type="text/html" title="Building a Sentiment Ticker with Raspberry Pi and NLTK" /><published>2017-02-12T00:00:00+00:00</published><updated>2017-02-12T00:00:00+00:00</updated><id>http://robertelwell.info/2017/02/12/building-a-sentiment-ticker-with-raspberry-pi</id><content type="html" xml:base="http://robertelwell.info/2017/02/12/building-a-sentiment-ticker-with-raspberry-pi.html"><![CDATA[<p>I recently got my hands on a 
<a href="https://www.dexterindustries.com/grovepi-starter-kit/">GrovePi+ Starter Kit</a>
and the latest Raspberry Pi.
This toolkit is specifically designed to do IoT computing with a Raspberry Pi.
It gave me some inspiration to do something a little bit more 
outside-the-box than the kinds of projects they provide in the starter 
kit booklet – a little bit more art than science.</p>

<h3 id="visualizing-color-and-text-with-groves-lcd">Visualizing Color and Text with Grove’s LCD</h3>
<p>One of the components you get is an RGB LCD. Once you’ve plugged it into 
the Grove and have everything connected to the Pi, you can 
hop into a terminal on your Pi and use the libraries 
that the Grove comes with to control the screen.</p>

<p>The libraries are impressively simple. Setting the backlight color and sending text to the screen is as easy as:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">grove_rgb_lcd</span> <span class="k">as</span> <span class="n">screen</span>

<span class="n">screen</span><span class="p">.</span><span class="n">setRGB</span><span class="p">(</span><span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>  <span class="c1"># all red, for instance
</span><span class="n">screen</span><span class="p">.</span><span class="n">setText</span><span class="p">(</span><span class="s">"I'm going to show up on the screen!"</span><span class="p">)</span>
</code></pre></div></div>

<p>So what are some of the things you can do with a screen like this?
Grove has some examples related to its other sensors, such as 
temperature and humidity, which is cool. I’m definitely 
interested in integrating my <a href="https://github.com/SoCo/SoCo">Sonos</a> 
with the screen as a “now playing” display. But I wanted 
to find a more interesting way to play with the three 
dimensions of color.</p>

<h3 id="finding-and-visualizing-sentiment">Finding and Visualizing Sentiment</h3>
<p>This is where sentiment analysis comes in. 
There are a handful of different ways to analyze sentiment.
For instance, <a href="http://nlp.stanford.edu/sentiment/code.html">Stanford CoreNLP’s deep learning model</a>
scores sentiment on a one-to-five scale. Last time I checked, 
it was among the best in class. But there’s a different kind of 
sentiment analysis that evaluates text along three different axes 
of intensity:
positive language, neutral language, and negative language.
Each value will range from zero to one.
This is great, because an RGB screen has three different values
we can play with, each ranging from a minimum of zero to a maximum of 255. I used CJ Hutto’s
<a href="https://github.com/cjhutto/vaderSentiment">Vader Sentiment Analysis Project</a>, 
a reasonably recent and easy-to-use project geared towards 
social media text. Another perk of this project is that it has 
been incorporated into NLTK, which tends to be 
my go-to NLP toolkit for hobbyist stuff like this.</p>

<p>Using NLTK, I can get the sentiment data for a bit of 
text like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># assuming you already have the VADER lexicon downloaded;
# otherwise, you will need to call nltk.download('vader_lexicon')
</span><span class="kn">from</span> <span class="nn">nltk.sentiment.vader</span> <span class="kn">import</span> <span class="n">SentimentIntensityAnalyzer</span>
<span class="n">analyzer</span> <span class="o">=</span> <span class="n">SentimentIntensityAnalyzer</span><span class="p">()</span>
<span class="n">scores</span> <span class="o">=</span> <span class="n">analyzer</span><span class="p">.</span><span class="n">polarity_scores</span><span class="p">(</span><span class="s">"This is the text I want to analyze"</span><span class="p">)</span>

<span class="c1"># scores is a dict with intensities from 0.0 to 1.0 for:
# 'neg' (negative)
# 'pos' (positive)
# 'neu' (neutral)
# plus a normalized 'compound' score from -1.0 to 1.0
</span></code></pre></div></div>

<h3 id="twitter-streaming">Twitter Streaming</h3>
<p>Now I need some kind of textual source to analyze, of course.
<a href="https://github.com/sixohsix/twitter">The Python Twitter Client</a>
is perfect for this kind of a project. In particular, 
I found it very easy to work through the issues related to 
OAuth2 when attempting to initiate an authenticated session.</p>

<p>Once you have handled authentication, you 
can use a <code class="language-plaintext highlighter-rouge">TwitterStream</code> instance to sample random 
tweets or filter against a specific topic. For my 
project, I’m filtering for the term “Atlanta”, the 
city I just moved to. Assuming you have an <code class="language-plaintext highlighter-rouge">OAuth</code> 
object correctly configured, that’s as easy as:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">twitter</span> <span class="kn">import</span> <span class="n">TwitterStream</span>
<span class="k">for</span> <span class="n">tweet</span> <span class="ow">in</span> <span class="n">TwitterStream</span><span class="p">(</span><span class="n">auth</span><span class="o">=</span><span class="n">oauth_object</span><span class="p">).</span><span class="n">statuses</span><span class="p">.</span><span class="nb">filter</span><span class="p">(</span><span class="n">track</span><span class="o">=</span><span class="s">"atlanta"</span><span class="p">):</span>
    <span class="k">if</span> <span class="s">'text'</span> <span class="ow">in</span> <span class="n">tweet</span><span class="p">:</span>  <span class="c1"># skip delete notices and other non-tweet messages
</span>        <span class="nb">print</span><span class="p">(</span><span class="n">tweet</span><span class="p">[</span><span class="s">'text'</span><span class="p">])</span>  <span class="c1"># or anything else you want to do with a tweet!
</span></code></pre></div></div>

<h3 id="bringing-it-all-together">Bringing It All Together</h3>
<p>I’ve posted some code called the
<a href="https://github.com/relwell/grove-sentiment-scanner">Grove Sentiment Scanner</a>
that shows exactly how these parts fit together.
We search for the term “Atlanta” in the Twitter stream, and 
for each tweet, we can parse the sentiment. We take the 
zero to one intensity values for each axis, and scale them 
to zero to 255 values corresponding to red (negative), 
green (positive), or blue (neutral).</p>
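<p>Here is a minimal sketch of that mapping, assuming the score dict shape NLTK's VADER returns (<code>neg</code>, <code>neu</code>, <code>pos</code>, <code>compound</code>). It is not the code from the repository above; the function names are hypothetical, and the rounding and clamping are assumptions of this sketch.</p>

```python
def intensity_to_channel(value):
    """Scale a 0.0-1.0 intensity to a 0-255 color channel, clamping stray values."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * 255)


def scores_to_rgb(scores):
    """Map VADER-style scores to an (R, G, B) tuple.

    red = negative, green = positive, blue = neutral; 'compound' is ignored.
    """
    return (
        intensity_to_channel(scores["neg"]),
        intensity_to_channel(scores["pos"]),
        intensity_to_channel(scores["neu"]),
    )


if __name__ == "__main__":
    # Example scores in the shape polarity_scores() returns (values made up).
    print(scores_to_rgb({"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.4}))
```

<p>On the Pi, the result could be passed straight to the LCD with something like <code>screen.setRGB(*scores_to_rgb(scores))</code>. Because VADER's <code>neg</code>, <code>neu</code>, and <code>pos</code> proportions sum to roughly 1.0, mostly neutral tweets will tend to show up blue.</p>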

<h3 id="artists-statement">Artist’s Statement</h3>
<p>I’m hesitant to say there’s a ton of intrinsic utility in 
a project like this. But it’s a lot of fun, and I think there’s 
a bit of art involved here.</p>

<h4 id="generating-empathy">Generating Empathy</h4>
<p>There is an internally consistent sense of synesthesia here between
what we see and what a bit of text is intended to make us feel.
An approach like this has the capacity to encourage empathy in unexpected ways,
as we adapt the visual components of our mind to reason over 
emotional and verbally symbolic components at the same time.
We can use the colors that we see to understand the underlying
emotion of the text before we finish processing a sentence.
This could serve as a tool for disambiguating the emotional 
content of a text for individuals who have more difficulty 
than average divining emotion from text.</p>

<h4 id="color-and-meaning">Color and Meaning</h4>
<p>We are grounded in some very basic sense of semiotics by 
mapping positivity to green and negativity to red. 
The stop light most immediately comes to mind as 
an artifact we use every day that takes advantage of this 
opposition of color and emotional intent.
Blue, as one of the three channels RGB uses to produce its range 
of colors, is in many regards incidental. But if we look at 
how blue is used, we can see that it often 
communicates neutrality quite effectively. This may be part of the reason,
for instance, why blue is so often used as the color 
scheme for retail locations.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[I recently got my hands on a GrovePi+ Starter Kit and the latest Raspberry Pi. This toolkit is specifically designed to do IoT computing with a Raspberry Pi. It gave me some inspiration to do something a little bit more outside-the-box than the kinds of projects they provide in the starter kit booklet – a little bit more art than science.]]></summary></entry><entry><title type="html">Recovering a Fully Replicated SolrCloud Node After Data Loss</title><link href="http://robertelwell.info/2016/07/25/solrcloud-data-recovery.html" rel="alternate" type="text/html" title="Recovering a Fully Replicated SolrCloud Node After Data Loss" /><published>2016-07-25T01:12:14+00:00</published><updated>2016-07-25T01:12:14+00:00</updated><id>http://robertelwell.info/2016/07/25/solrcloud-data-recovery</id><content type="html" xml:base="http://robertelwell.info/2016/07/25/solrcloud-data-recovery.html"><![CDATA[<p>In my spare time, I maintain a multi-tenant, high-scale SolrCloud with indices measured in the terabytes per node.</p>
<p>The SolrCloud deployment consists of dozens of collections, each configured to have two shards with a replication factor of two. This was quite lucky for us recently, because when one of our nodes went down, all of our collections remained available, and we did not experience a service outage of any kind.</p>
<p>One of our nodes experienced a disk failure that resulted in total data loss. The data was denormalized, but regenerating it through re-analysis is time-consuming. We already had the data replicated to other nodes that were working just fine. For the most part, we were interested in restoring the cluster to its former health with a fresh instance of Solr taking the place of the downed node. The replacement node would have an identical image, down to the same hostname and IP.</p>
<p>I was somewhat surprised to find that there wasn't really an off-the-shelf solution to this problem, and not much came up when Googling. I searched for things like "solrcloud restore lost replicas", "solrcloud recover nodes", and there were few actionable results.</p>
<p>I was able to use <a href="https://github.com/solrcloudpy/solrcloudpy">Solrcloudpy</a>, a client library for Python, to identify downed nodes using the <a href="https://cwiki.apache.org/confluence/display/solr/Collections+API">new Collections API</a>, send instructions to the cluster to remove the orphaned replica listings, and then re-create those replicas in the same location. You can see how I did it in this <a href="https://gist.github.com/relwell/51aecaf7a435c68a1651872f0febbb5b">Gist</a>.</p>
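<p>The recovery steps map onto a few Collections API actions. The sketch below is not the code from the Gist: it is a standard-library-only illustration that reads the parsed JSON from a <code>CLUSTERSTATUS</code> call, flags replicas that are marked down or whose node is missing from <code>live_nodes</code>, and builds a <code>DELETEREPLICA</code> and then an <code>ADDREPLICA</code> request for each one, targeting the same node name because the replacement machine reused the hostname and IP. The function names and example URLs are hypothetical, and it only builds URLs rather than sending them. Deleting a replica is only safe here because every shard still had a healthy copy elsewhere.</p>

```python
from urllib.parse import urlencode


def find_lost_replicas(cluster_status):
    """List (collection, shard, replica, node) for replicas that are down or on dead nodes.

    cluster_status is the parsed JSON from the Collections API CLUSTERSTATUS action.
    """
    cluster = cluster_status["cluster"]
    live = set(cluster.get("live_nodes", []))
    lost = []
    for coll_name, coll in cluster["collections"].items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                node = replica["node_name"]
                if node not in live or replica.get("state") == "down":
                    lost.append((coll_name, shard_name, replica_name, node))
    return lost


def recovery_urls(solr_base, lost):
    """Build a DELETEREPLICA, then an ADDREPLICA, request URL per lost replica."""
    endpoint = solr_base.rstrip("/") + "/admin/collections?"
    urls = []
    for collection, shard, replica, node in lost:
        # Remove the orphaned replica entry from the cluster state...
        urls.append(endpoint + urlencode({
            "action": "DELETEREPLICA", "collection": collection,
            "shard": shard, "replica": replica,
        }))
        # ...then recreate a replica of that shard on the rebuilt node (same node name).
        urls.append(endpoint + urlencode({
            "action": "ADDREPLICA", "collection": collection,
            "shard": shard, "node": node,
        }))
    return urls
```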
<p>This kind of robustness and fault-tolerance is what makes SolrCloud one of my favorite distributed data stores to work with. Even for APIs without power-packed off-the-shelf clients, you can easily interact with them and understand how they should behave. This is largely thanks to the many talented maintainers and community members that continue to use and support it.</p>
<p>In many cases, I have simply used Python's requests library for simple JSON interactions with Solr, but sometimes it's nice to have better data modeling in your code. If you're a Python developer looking to get into SolrCloud, feel free to check out <a href="https://github.com/solrcloudpy/solrcloudpy">Solrcloudpy</a>, an intuitive client for working with multi-server, multi-tenant search deployments.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[In my spare time, I maintain a multi-tenant, high-scale SolrCloud with indices measured in the terabytes per node. The SolrCloud deployment consists of dozens of collections, each configured to have two shards with a replication factor of two. This was quite lucky for us recently, because when one of our nodes went down, all of them continued to be available, and we did not experience a service outage of any kind. One of our nodes experienced disk failure that resulted in total data loss. This was denormalized data, but re-analysis is time-consuming. We already had the data replicated to other nodes that were working just fine. For the most part, we were interested in restoring the previous cluster to its former health with a fresh instance of Solr taking the place of the downed node. This node would have an identical image up to even the same hostname and IP. I was somewhat surprised to find that there wasn't really an off-the-shelf solution to this problem, and not much came up when Googling. I searched for things like "solrcloud restore lost replicas", "solrcloud recover nodes", and there were few actionable results. 
I was able to use Solrcloudpy, a client library for Python, to identify downed nodes using the new collections API, and send instructions to the cluster to manually remove the orphaned replica listings, and then manually re-create those replicas to the same location. You can see how I did it in this Gist. This kind of robustness and fault-tolerance is what makes SolrCloud one of my favorite distributed data stores to work with. Even for APIs without power-packed off-the-shelf clients, you can easily interact with them and understand how they should behave. This is largely thanks to the many talented maintainers and community members that continue to use and support it. In many cases, I have simply used Python's requests library for simple JSON interactions with Solr, but sometimes it's nice to have better data modeling in your code. If you're a Python developer looking to get into SolrCloud, feel free to check out Solrcloudpy, an intuitive client for working with multi-server, multi-tenant search deployments.]]></summary></entry><entry><title type="html">Solrcloudpy 1.8 Released</title><link href="http://robertelwell.info/2016/03/13/solrcloudpy-1-8-released.html" rel="alternate" type="text/html" title="Solrcloudpy 1.8 Released" /><published>2016-03-13T03:42:55+00:00</published><updated>2016-03-13T03:42:55+00:00</updated><id>http://robertelwell.info/2016/03/13/solrcloudpy-1-8-released</id><content type="html" xml:base="http://robertelwell.info/2016/03/13/solrcloudpy-1-8-released.html"><![CDATA[<p>I do a lot of work with Solrcloudpy. It has a fantastic API for handling collections in Solrcloud, allowing you to shard your business logic across intelligent pivots and horizontally scale to an arbitrary size. This is super important stuff when you're in a multi-tenant, service-oriented world.</p>
<p>Keeping vital libraries like this up to date is important. As Solr evolves, we want to make sure you can continue to use Solrcloudpy with a minimum of disruption. There are other Solr libraries such as <a href="https://github.com/moonlitesolutions/SolrClient">SolrClient</a>, but this one in particular supports Python 3 only. If you have a high-scale infrastructure primarily in Python, you may be using Celery. A popular backend for Celery is RabbitMQ. The Celery community has no plans to add Python 3 support to their RabbitMQ driver, keeping many committed to 2.7 without changing backends, which has significant ramifications in operationalization, monitoring, and other production acceptance concerns. In other words, we've got a real hairy yak on our hands.</p>
<p>For this reason, I reached out to Didier Deshommes, the former maintainer of Solrcloudpy. His work was moving away from Solr, and he was looking for new maintainers. I worked with him to build out the <a href="https://github.com/solrcloudpy/solrcloudpy">Solrcloudpy organization on GitHub</a>. We collaborated on delivering version 1.8.</p>
<p>Version 1.8 includes the following features:</p>
<ul>
<li>Compatibility with later versions of Solr 5.x</li>
<li>Improved documentation for the specific purpose of adding code completion to popular IDEs</li>
<li>An improved test plan for multiple versions of Solr</li>
<li>Conformance to PEP8 standards</li>
</ul>
<p>Now as one of the library's primary maintainers, I plan on doing some work in the future to add functionality in a backwards-compatible and well-tested manner. We welcome contributors and look forward to everyone's involvement in the future of this project. Thanks again to Didier Deshommes for the great work that he's done with Solrcloudpy, and to all its other former contributors.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[I do a lot of work with Solrcloudpy. It has a fantastic API for handling collections in Solrcloud, allowing you to shard your business logic across intelligent pivots and horizontally scale to an arbitrary size. This is super important stuff when you're in a multi-tenant, service-oriented world. Keeping vital libraries like this up to date is important. As Solr evolves, we want to make sure you can continue to use Solrcloudpy with a minimum of disruption. There are other Solr libraries such as SolrClient, but this one in particular supports Python 3 only. If you have a high-scale infrastructure primarily in Python, you may be using Celery. A popular backend for Celery is RabbitMQ. The Celery community has no plans to add Python 3 support to their RabbitMQ driver, keeping many committed to 2.7 without changing backends, which has significant ramifications in operationalization, monitoring, and other production acceptance concerns. In other words, we've got a real hairy yak on our hands. For this reason, I reached out to Didier Deshommes, the former maintainer of Solrcloudpy. His work was moving away from Solr, and he was looking for new maintainers. I worked with him to build out the Solrcloudpy organization in Github. 
We collaborated on delivering version 1.8. Version 1.8 includes the following features: Compatibility with later versions of Solr 5.x Improved documentation for the specific purpose of adding code completion to popular IDEs An improved test plan for multiple versions of Solr Conformance to PEP8 standards Now as one of the library's primary maintainers, I plan on doing some work in the future to add functionality in a backwards-compatible and well-tested manner. We welcome contributors and look forward to everyone's involvement in the future of this project. Thanks again to Didier Deshommes for the great work that he's done with Solrcloudpy, and to all its other former contributors.]]></summary></entry><entry><title type="html">Deep Learning on Long-Form UGC at Scale</title><link href="http://robertelwell.info/2016/02/19/deep-learning-ugc.html" rel="alternate" type="text/html" title="Deep Learning on Long-Form UGC at Scale" /><published>2016-02-19T09:14:19+00:00</published><updated>2016-02-19T09:14:19+00:00</updated><id>http://robertelwell.info/2016/02/19/deep-learning-ugc</id><content type="html" xml:base="http://robertelwell.info/2016/02/19/deep-learning-ugc.html"><![CDATA[<p>Not too long ago, I was a senior software engineer at Wikia -- the <em>other</em> company run by Jimmy Wales. This was the company that took the MediaWiki platform that Wikipedia uses, and scaled it to create communities for countless interests.</p>
<p>I had just completed a stint focusing on improving Wikia's <a href="https://github.com/Wikia/app/tree/dev/extensions/wikia/Search">search functionality</a>.  What I really wanted to focus on was finding ways to deliver interesting product features directly from the content provided by our user communities. This would allow me to apply my talents in machine learning and natural language processing to put a greater degree of intelligence behind the MediaWiki platform.</p>
<p>I led a very small team, and built a lot of fun technologies. Little made it to production, but we had some very interesting results and learned a great deal from what did. During this time, Grant Ingersoll, one of the authors of Taming Text, reached out to us to talk about how we were using Solr and other natural language processing technologies as part of the data processing pipeline we were building. This prompted us to write a chapter for a possible second edition of Taming Text. We got great feedback from Grant on the chapter, but after about two years, the project has stalled a bit.</p>
<p>With Grant's approval, I'm making this chapter available here for free for the first time: <a href="http://robertelwell.info/assets/doc/ElwellKunerChongWikiaTamingText.pdf" target="_blank">A High-Scale Deep Learning Pipeline for Identifying Similarities in Online Communities</a>.</p>
<p>It's always fun to look back at projects like this and ask what I'd do differently. I probably should have used a Storm topology for the CoreNLP parsing component. That would have made things a bit more reliable (about as reliable as Storm, though, I guess). I definitely think a lot of the threaded functions should have probably gone into Celery or some other worker processing platform. The amount we got done with just <a href="https://github.com/boto/boto">boto</a>, the current literature, and tens of millions of English articles was tons of fun, though.</p>
<p>I'd also like to give a quick thank you to Murad Salahi, whose article <a href="https://scripted.com/scripted-updates/nlp-hacking-in-python/" target="_blank">Teaching a Computer How To Read</a> provided much inspiration for what we ultimately accomplished at Wikia. Not to mention, thanks to my co-authors John Kuner and Tristan Chong for providing valuable contributions to the paper and the work it outlined.</p>
<p>If you want to take a deeper dive, most of the code for this project is still available online as part of Wikia's <a href="https://github.com/Wikia/data-science-toolkit">Data Science Toolkit</a> project.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[Not too long ago, I was a senior software engineer at Wikia -- the other company run by Jimmy Wales. This was the company that took the MediaWiki platform that Wikipedia uses, and scaled it to create communities for countless interests. I had just completed a stint focusing on improving Wikia's search functionality.  What I really wanted to focus on was finding ways to deliver interesting product features directly from the content provided by our user communities. This would allow me to apply my talents in machine learning and natural language processing to put a greater degree of intelligence behind the MediaWiki platform. I led a very small team, and built a lot of fun technologies. Little made it to production, but we had some very interesting results and learned a great deal from what did. During this time, Grant Ingersoll, one of the authors of Taming Text, reached out to us to talk about how we were using Solr and other natural language processing technologies as part of the data processing pipeline we were building. This prompted us to write a chapter for a possible second edition of Taming Text. We got great feedback from Grant on the chapter, but after about two years, the project has stalled a bit. With Grant's approval, I'm making this chapter available here for free for the first time: A High­Scale Deep Learning Pipeline for Identifying Similarities in Online Communities. 
It's always fun to look back at projects like this and ask what I'd do differently. I probably should have used a Storm topology for the CoreNLP parsing component. That would have made things a bit more reliable (about as reliable as Storm, though, I guess). I definitely think a lot of the threaded functions should have probably gone into Celery or some other worker processing platform. The amount we got done with just boto, the current literature, and tens of millions of English articles was tons of fun, though. I'd also like to give a quick thank you to Murad Salahi, whose article Teaching a Computer How To Read provided much inspiration for what we ultimately accomplished at Wikia. Not to mention, thanks to my co-authors John Kuner and Tristan Chong for providing valuable contributions to the paper and the work it outlined. If you want to take a deeper dive, most of the code for this project is still available online as part of Wikia's Data Science Toolkit project.]]></summary></entry><entry><title type="html">Getting All Distinct Values From Solr</title><link href="http://robertelwell.info/2015/09/25/solr-facet-all-distinct-values.html" rel="alternate" type="text/html" title="Getting All Distinct Values From Solr" /><published>2015-09-25T06:47:47+00:00</published><updated>2015-09-25T06:47:47+00:00</updated><id>http://robertelwell.info/2015/09/25/solr-facet-all-distinct-values</id><content type="html" xml:base="http://robertelwell.info/2015/09/25/solr-facet-all-distinct-values.html"><![CDATA[<p>If you're trying to get all of the distinct values for a field, Solr's <a href="https://cwiki.apache.org/confluence/display/solr/Faceting">faceting</a> functionality does most of the work for you.</p>
<p>If you're working with a sufficiently large dataset, where a field has thousands or hundreds of thousands of distinct values, getting all of them can be hard.</p>
<p>The traditional approach is to iterate serially, paginating with a fixed limit and a growing offset. This becomes a problem at any reasonable scale: serializing the requests puts the latencies of all of your queries end to end, which gives you an unnecessarily long worst-case running time.</p>
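<p>For comparison, here is a rough sketch of that serial approach in Python. It assumes Solr's standard faceting parameters (<code>facet.field</code>, <code>facet.limit</code>, <code>facet.offset</code>, <code>facet.mincount</code>, <code>facet.sort</code>) and the default flat JSON layout for <code>facet_fields</code>, in which values and counts alternate in a single list. <code>fetch_page</code> is a hypothetical stand-in for whatever HTTP client you use to send the parameters to your Solr select handler and decode the JSON. Note that each request has to wait for the previous one to finish.</p>

<pre><code class="language-python">def facet_params(field, offset, limit, query="*:*"):
    """Build one page's worth of Solr faceting parameters."""
    return {
        "q": query,
        "rows": 0,                # we only want facet counts, not documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.offset": offset,
        "facet.mincount": 1,      # skip values that don't occur in this result set
        "facet.sort": "index",    # page through values in a stable, lexicographic order
        "wt": "json",
    }


def parse_facet_values(response, field):
    """Pull the values out of Solr's default flat facet_fields list."""
    # With json.nl=flat (the default) this looks like ["v1", count1, "v2", count2, ...].
    return response["facet_counts"]["facet_fields"][field][0::2]


def fetch_all_serially(fetch_page, field, limit=1000, query="*:*"):
    """Page through every facet value one request at a time.

    fetch_page is a hypothetical callable that sends the parameters to Solr
    and returns the decoded JSON response.
    """
    values = []
    offset = 0
    while True:
        page = parse_facet_values(fetch_page(facet_params(field, offset, limit, query)), field)
        values.extend(page)
        if len(page) != limit:
            return values  # a short (or empty) page means we've reached the end
        offset += limit
</code></pre>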
<p>There are much smarter ways to do this these days, and each of them boils down to asynchronous processing. It's easy to process a paginated dataset asynchronously and keep it in order, as long as you know the size of the data up front.</p>
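<p>A minimal sketch of that idea, assuming Python's <code>asyncio</code> and a hypothetical coroutine <code>fetch_page(offset, limit)</code> that returns one page of facet values: with the total known, every offset can be computed up front and all of the pages requested concurrently. Order is preserved because <code>asyncio.gather</code> returns results in the order its awaitables were passed in, not the order in which they finished. The semaphore is an optional extra that caps how many requests are in flight at once.</p>

<pre><code class="language-python">import asyncio


async def fetch_all_concurrently(fetch_page, total, limit=1000, max_in_flight=8):
    """Fetch every page of facet values concurrently, preserving their order.

    fetch_page(offset, limit) is a hypothetical coroutine returning one page.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded_fetch(offset):
        # Limit how many requests hit Solr at the same time.
        async with semaphore:
            return await fetch_page(offset, limit)

    # Knowing the total lets us compute every page offset up front.
    offsets = range(0, total, limit)
    # gather() returns results in the order the awaitables were passed in,
    # regardless of which request finishes first.
    pages = await asyncio.gather(*(bounded_fetch(offset) for offset in offsets))
    return [value for page in pages for value in page]
</code></pre>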
<p>The problem then becomes determining, on the client side, how many distinct values a document set contains without first iterating over each value (or asking Solr for that number directly, which is complicated and often difficult to implement client-side). Instead, we can make a series of educated guesses that should work faster than iterating serially.</p>
<p>I've created the following quick-and-dirty heuristic for finding the number of distinct values in a result set whose total document count you can easily determine:</p>
<p>Start with the total number of documents in the result set as your assumed number of distinct values (we'll call this the expected max). Request facet values from Solr using the expected max as your <code>facet.offset</code> and a sufficiently large number of values as your <code>facet.limit</code>. If you don't get any values back, cut the expected max in half and start over with that as the new expected max. If you do get values back, and there are as many of them as the limit you requested, increase the expected max to 115% of its current value and start over.</p>
<p>Once you get back a page with fewer values than the limit you requested (but more than zero), add the number of values on that page to the expected max, and you now know the number of distinct values. You should manage to do this with far fewer requests than paging through the values serially.</p>
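<p>Here is a rough Python sketch of that heuristic. <code>fetch_page(offset, limit)</code> is a hypothetical callable that returns the list of facet values Solr gives back for that offset and limit. One caveat: halving and growing by 15% can bounce back and forth around the true count for many rounds without landing within one page of it, so this version also remembers the tightest lower and upper bounds seen so far and bisects between them whenever a guess would fall outside that range. That bookkeeping is an addition to the heuristic as described above, not part of it.</p>

<pre><code class="language-python">def count_distinct_values(fetch_page, total_docs, limit=1000, growth=1.15):
    """Find how many distinct facet values exist, using facet offsets as probes.

    fetch_page(offset, limit) is a hypothetical callable that returns the list
    of facet values Solr sends back for that facet.offset and facet.limit.
    """
    lo = 0                 # the true count is known to be at least this
    hi = None              # ...and at most this (unknown until a probe comes back empty)
    expected = total_docs  # first guess: one distinct value per document
    while True:
        found = len(fetch_page(expected, limit))
        if found == limit:
            # Full page: there are at least `limit` more values past this offset.
            lo = max(lo, expected + limit)
            guess = int(expected * growth)
        elif found:
            # Short page: this offset plus what came back is the exact count.
            return expected + found
        else:
            # Empty page: we overshot, so cut the guess in half.
            hi = expected if hi is None else min(hi, expected)
            guess = expected // 2
        if hi == lo:
            return lo  # the bounds have met, so the count is pinned down
        guess = max(guess, lo)
        if hi is not None and guess not in range(lo, hi):
            guess = (lo + hi) // 2  # bisect the known range instead
        expected = guess
</code></pre>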
<p>Once you have the max, you can retrieve all of the distinct values asynchronously on the client side, because you can now derive the offset and limit for every page up front, then invoke or enqueue a request for each page and handle the results as they come back from Solr.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[If you're trying to get all of the distinct values for a field, Solr's faceting functionality does most of the work for you. If you're working with a sufficiently large dataset, where a field has thousands or hundreds of thousands of distinct values, getting all of them can be hard. The traditional approach is to iterate serially, paginating with a fixed limit and a growing offset. This becomes a problem at any reasonable scale: serializing the requests puts the latencies of all of your queries end to end, which gives you an unnecessarily long worst-case running time. There are much smarter ways to do this these days, and each of them boils down to asynchronous processing. It's easy to process a paginated dataset asynchronously and keep it in order, as long as you know the size of the data up front. The problem then becomes determining, on the client side, how many distinct values a document set contains without first iterating over each value (or asking Solr for that number directly, which is complicated and often difficult to implement client-side). Instead, we can make a series of educated guesses that should work faster than iterating serially. 
I've created the following quick-and-dirty heuristic for finding the number of distinct values in a result set whose total document count you can easily determine: Start with the total number of documents in the result set as your assumed number of distinct values (we'll call this the expected max). Request facet values from Solr using the expected max as your facet.offset and a sufficiently large number of values as your facet.limit. If you don't get any values back, cut the expected max in half and start over with that as the new expected max. If you do get values back, and there are as many of them as the limit you requested, increase the expected max to 115% of its current value and start over. Once you get back a page with fewer values than the limit you requested (but more than zero), add the number of values on that page to the expected max, and you now know the number of distinct values. You should manage to do this with far fewer requests than paging through the values serially. 
Once you have the max, you can retrieve all of the distinct values asynchronously on the client side, because you can now derive the offset and limit for every page up front, then invoke or enqueue a request for each page and handle the results as they come back from Solr.]]></summary></entry><entry><title type="html">Analyzing CoreNLP XML Output with Python</title><link href="http://robertelwell.info/2014/06/05/corenlp-python.html" rel="alternate" type="text/html" title="Analyzing CoreNLP XML Output with Python" /><published>2014-06-05T19:53:49+00:00</published><updated>2014-06-05T19:53:49+00:00</updated><id>http://robertelwell.info/2014/06/05/corenlp-python</id><content type="html" xml:base="http://robertelwell.info/2014/06/05/corenlp-python.html"><![CDATA[<p>After the discussion generated from my <a href="http://robertelwell.info/blog/exist-db-not-ready-for-high-scale/">last post</a>, I've come to realize that the solution I have in place for analyzing millions of pages of XML parse data is pretty useful, and relatively performant. Because of this, I've decided to share the library I've been using with a broader audience.</p>
<p>Today I'm celebrating two events: my 30th birthday (woop woop), and the 1.0 release of the <a href="http://corenlp-xml-library.readthedocs.org/en/latest/index.html">Python CoreNLP XML Library</a>.<br />
<!--more--></p>
<p><a href="https://github.com/relwell/corenlp-xml-lib"><b>corenlp_xml</b> is a Python library that provides a data model on top of Stanford CoreNLP's XML output</a>. You can install it using pip. <b>corenlp_xml</b> uses lxml and lazy loading for high-performance querying and data access. It also uses NLTK's tree parsing to support additional operations on the S-expression stored in each sentence's parse node.</p>
<p>I've used <b>corenlp_xml</b> to solve the following problems at high scale:</p>
<ul>
<li>Recursively identify all noun phrases for each sentence in a document</li>
<li>Cross-reference proper noun phrases with coreference mentions, getting a full mention count in a document for a given entity</li>
<li>Identify interactions between subject and object using dependency parse data</li>
<li>Get all semantic heads for a given document</li>
</ul>
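<p>The library's own API is documented on ReadTheDocs. As a rough, standard-library-only illustration of the third item above (this is not <b>corenlp_xml</b>'s API), the sketch below pulls subject/verb/object triples straight out of CoreNLP's XML with <code>xml.etree.ElementTree</code>. It assumes the <code>&lt;dependencies type="basic-dependencies"&gt;</code> layout and the <code>nsubj</code>/<code>dobj</code> relation names; element and relation names have varied across CoreNLP versions, so check them against your own output.</p>

<pre><code class="language-python">import xml.etree.ElementTree as ET


def subject_verb_object_triples(xml_text, dep_type="basic-dependencies"):
    """Return (subject, verb, object) word triples from CoreNLP XML output."""
    root = ET.fromstring(xml_text)
    triples = []
    # Only walk the parsed sentences; coreference mentions also contain
    # sentence elements, so a bare root.iter("sentence") would pick them up.
    for sentence in root.iterfind(".//sentences/sentence"):
        subjects, objects = {}, {}
        for deps in sentence.iter("dependencies"):
            if deps.get("type") != dep_type:
                continue
            for dep in deps.iter("dep"):
                governor = dep.find("governor")
                dependent = dep.find("dependent")
                # Key on the governor's token index so repeated words stay distinct.
                key = (governor.get("idx"), governor.text)
                if dep.get("type") == "nsubj":
                    subjects.setdefault(key, []).append(dependent.text)
                elif dep.get("type") == "dobj":
                    objects.setdefault(key, []).append(dependent.text)
        # A verb that governs both an nsubj and a dobj links subject to object.
        for key, subject_words in subjects.items():
            for subject in subject_words:
                for obj in objects.get(key, []):
                    triples.append((subject, key[1], obj))
    return triples
</code></pre>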
<p>More information is available on <a href="http://corenlp-xml-library.readthedocs.org/en/latest/index.html">corenlp_xml's ReadTheDocs page</a>. This library should be a great help to anyone who wants the power and accuracy of CoreNLP's parsing output, but is interested in using Python's fast numerical computing affordances for further analytics, data science, or machine learning.</p>
<p>If you think the library would be useful for you, please feel free to <a href="https://github.com/relwell/corenlp-xml-lib">contribute back to the project</a>, or <a href="http://lucenerevolution.uservoice.com/forums/202013-use-case-track/suggestions/5995369-high-scale-search-and-data-science">vote for my upcoming Lucene/Solr Revolution talk</a>.</p>]]></content><author><name>{&quot;login&quot;=&gt;&quot;relwell_admin&quot;, &quot;email&quot;=&gt;&quot;relwell@robertelwell.info&quot;, &quot;display_name&quot;=&gt;&quot;Robert&quot;, &quot;first_name&quot;=&gt;&quot;Robert&quot;, &quot;last_name&quot;=&gt;&quot;Elwell&quot;}</name><email>relwell@robertelwell.info</email></author><summary type="html"><![CDATA[After the discussion generated from my last post, I've come to realize that the solution I have in place for analyzing millions of pages of XML parse data is pretty useful, and relatively performant. Because of this, I've decided to share the library I've been using with a broader audience. Today I'm celebrating two events: my 30th birthday (woop woop), and the 1.0 release of the Python CoreNLP XML Library. corenlp_xml is a Python library that provides a data model on top of Stanford CoreNLP's XML output. You can install it using pip. corenlp_xml uses lxml and lazy loading for high-performance querying and data access. It also uses NLTK's tree parsing to support additional operations on the S-expression stored in each sentence's parse node. I've used corenlp_xml to solve the following problems at high scale: recursively identify all noun phrases for each sentence in a document; cross-reference proper noun phrases with coreference mentions, getting a full mention count in a document for a given entity; identify interactions between subject and object using dependency parse data; and get all semantic heads for a given document. More information is available on corenlp_xml's ReadTheDocs page. 
This library should be a great help to anyone who wants the power and accuracy of CoreNLP's parsing output, but is interested in using Python's fast numerical computing affordances for further analytics, data science, or machine learning. If you think the library would be useful for you, please feel free to contribute back to the project, or vote for my upcoming Lucene/Solr Revolution talk.]]></summary></entry></feed>