When Solr crashes it throws errors saying "Too many open files". Running lsof showed that the problem wasn't open files at all but thousands of orphaned sockets left open. The sockets looked like this in the lsof output:
java 2428 root 2173u sock 0,7 0t0 123291433 can't identify protocol
Nothing will be listed in netstat, because these sockets don't have open connections to anything. The Solr log file will start showing errors similar to these:
SEVERE: java.io.FileNotFoundException: /usr/local/apache-solr-3.5.0/example/solr/data/index/_dgf.frq (Too many open files)
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock
Initially we dealt with this problem by monitoring the number of open files for the java process and running a reindex when it got close to the limit. Not a great solution, but at the time there weren't enough hours in the day to put a bunch of effort into figuring this out. In my case Solr blew up at 4000 open sockets. Once it had that many sockets open it would just throw HTTP 500 errors.
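The monitoring described above could be sketched roughly like this. It is a minimal sketch, not the script we actually ran: the function name `count_orphaned_sockets`, the `THRESHOLD` value, and the `SOLR_PID` variable are all made up for illustration. It simply counts the lsof lines marked "can't identify protocol".

```shell
#!/bin/sh
# Count orphaned sockets in lsof output read from stdin.
# Orphaned sockets are the lines lsof marks "can't identify protocol".
count_orphaned_sockets() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function from reporting a failure.
    grep -c "can't identify protocol" || true
}

# Hypothetical threshold, chosen to sit below the 4000 sockets where Solr failed.
THRESHOLD=3500

# Example use against a live process (SOLR_PID is an assumption, supply your own):
#   n=$(lsof -p "$SOLR_PID" | count_orphaned_sockets)
#   [ "$n" -ge "$THRESHOLD" ] && echo "WARNING: $n orphaned sockets"
```

Run from cron, something like this would give a warning before Solr starts returning errors.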
Usually the answer to a situation like this is to upgrade Solr to a newer version. Unfortunately I couldn't do that in this case because we have a Ruby gem that depends on Solr version 3.5. My research pointed to Jetty, not Solr, as the source of the problem. Once I found this post I knew for sure Jetty was causing the orphaned sockets. Solr 3.5.0 is packaged with Jetty 6.1.26, which has a bug that causes the orphaned sockets under certain conditions. Because Jetty 6 is fairly old, the developers are not going to fix it. At this point I set about upgrading Jetty to version 7.
The first thing I had to figure out was what stuff was Solr and what stuff was Jetty. Turns out most of the package is Jetty. Solr is contained in apache-solr-3.5.0/example/solr and apache-solr-3.5.0/example/webapps/solr.war. So I decided to try and stuff Solr 3.5.0 into Jetty 7.6.13. Later I may try moving to the latest version of Jetty 9 but I'm just trying to solve this orphaned socket problem right now and was worried the older version of Solr might have problems with a newer Jetty.
Upgrading Jetty
Here are the steps I took to move Solr 3.5.0 to Jetty 7:
Download the latest Jetty 7 (jetty-distribution-7.6.13.v20130916.tar.gz at the time this was written) from http://download.eclipse.org/jetty/7.6.13.v20130916/dist/
tar xfvz jetty-distribution-7.6.13.v20130916.tar.gz
Create the destination directory for all the new files (/usr/local/apache-solr-3.5.0-jetty-7.6.13/example in the commands below)
Copy the contents of jetty-distribution-7.6.13.v20130916 to the new directory
cp -a jetty-distribution-7.6.13.v20130916/* /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
Copy the Solr files from the old Solr installation to the new Jetty directory
cp -a /usr/local/apache-solr-3.5.0/example/solr /usr/local/apache-solr-3.5.0-jetty-7.6.13/example
cp -a /usr/local/apache-solr-3.5.0/example/webapps/solr.war /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/webapps/
Edit the jetty.xml config file to change the listening port
Change this line
<Set name="port"><Property name="jetty.port" default="8080"/></Set>
to this
<Set name="port"><Property name="jetty.port" default="8983"/></Set>
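If you would rather script the port change than edit the file by hand, something along these lines should work. This is only a sketch: `set_jetty_port` is a name I made up, and the path in the usage comment assumes jetty.xml sits in the etc directory of the new install, so check where yours is first.

```shell
#!/bin/sh
# set_jetty_port FILE
# Switch the default jetty.port from 8080 to 8983 in the given jetty.xml.
set_jetty_port() {
    # -i.bak edits in place and leaves the original next to it as FILE.bak.
    sed -i.bak 's/name="jetty.port" default="8080"/name="jetty.port" default="8983"/' "$1"
}

# Example use (path is an assumption, adjust to your install):
#   set_jetty_port /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/etc/jetty.xml
```

The .bak copy makes it easy to diff the result or roll back if the pattern matched something unexpected.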
At this point Solr will run, but there are some example war files and config files that Solr doesn't need and that should be cleaned up.
- Edit /usr/local/apache-solr-3.5.0-jetty-7.6.13/example/start.ini
Comment out the line
so it reads
- Clean up example war files
mv test.war spdy.war BAK
mv jetty-spdy.xml jetty-spdy-proxy.xml jetty-testrealm.xml BAK
mv test.xml BAK
I use a symbolic link for the installation directory so the start script doesn't have to be modified. Before restarting I have to switch that symlink to point at the new directory.
service solr stop
ln -sfn apache-solr-3.5.0-jetty-7.6.13 solr
service solr start
Then you can test by hitting the service locally.
It should return HTML that looks something like this:
<title>Welcome to Solr</title>
<h1>Welcome to Solr!</h1>
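The same check can be scripted. The sketch below is an illustration only: `is_solr_welcome` is a made-up helper, and the URL in the usage comment assumes Solr is listening on localhost at port 8983 under /solr/, which may not match your setup.

```shell
#!/bin/sh
# Reads HTML on stdin and succeeds if it contains the Solr welcome title.
is_solr_welcome() {
    grep -q '<title>Welcome to Solr</title>'
}

# Example use (URL is an assumption, adjust host, port and path as needed):
#   curl -s http://localhost:8983/solr/ | is_solr_welcome && echo "Solr is up"
```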
You will probably need to run a reindex if transactions took place while Solr was down for the upgrade.