WSO2 BAM: Fixing a Broken Pipe Error

In some high-load tests with WSO2 BAM version 2.4.0 you can run into a "broken pipe" error, and I would like to share the fix here.

First, here are the logs that were generated, to help you identify the problem:
TID: [0] [BAM] [2014-02-03 10:17:39,593] ERROR {org.apache.hadoop.mapred.MapTask} - IO error in map input file file:/opt/wso2bam-2.4.0/repository/data/hive/warehouse-1234/mappingjmxdatatable {org.apache.hadoop.mapred.MapTask}
TID: [0] [BAM] [2014-02-03 10:17:42,063] WARN {org.apache.hadoop.mapred.LocalJobRunner} - job_local_0001 {org.apache.hadoop.mapred.LocalJobRunner}
java.io.IOException: IO error in map input file file:/opt/wso2bam-2.4.0/repository/data/hive/warehouse-1234/mappingjmxdatatable
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:242)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:211)
Caused by: java.io.IOException: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:275)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
    ... 5 more
Caused by: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at org.apache.hadoop.hive.cassandra.input.ColumnFamilyRowRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRowRecordReader.java:619)
    at org.apache.hadoop.hive.cassandra.input.ColumnFamilyRowRecordReader$StaticRowIterator.computeNext(ColumnFamilyRowRecordReader.java:624)
    at org.apache.hadoop.hive.cassandra.input.ColumnFamilyRowRecordReader$StaticRowIterator.computeNext(ColumnFamilyRowRecordReader.java:556)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
    at org.apache.hadoop.hive.cassandra.input.ColumnFamilyRowRecordReader.nextKeyValue(ColumnFamilyRowRecordReader.java:242)
    at org.apache.hadoop.hive.cassandra.input.CassandraHiveRecordReader.nextKeyValue(CassandraHiveRecordReader.java:182)
    at org.apache.hadoop.hive.cassandra.input.CassandraHiveRecordReader.next(CassandraHiveRecordReader.java:75)
    at org.apache.hadoop.hive.cassandra.input.CassandraHiveRecordReader.next(CassandraHiveRecordReader.java:22)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:273)
    ... 9 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
    at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:157)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
    at org.apache.cassandra.thrift.Cassandra$Client.send_get_range_slices(Cassandra.java:686)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:675)
    at org.apache.hadoop.hive.cassandra.input.ColumnFamilyRowRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRowRecordReader.java:583)
    ... 18 more
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
    ... 23 more
TID: [0] [BAM] [2014-02-03 10:17:42,296] ERROR {org.apache.hadoop.hive.ql.exec.ExecDriver} - Ended Job = job_local_0001 with errors {org.apache.hadoop.hive.ql.exec.ExecDriver}
TID: [0] [BAM] [2014-02-03 10:17:42,302] ERROR {org.apache.hadoop.hive.ql.exec.ExecDriver} - Error during job, obtaining debugging information... {org.apache.hadoop.hive.ql.exec.ExecDriver}
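The revealing part of the trace is the Thrift layer: the call to Cassandra$Client.get_range_slices dies with a Broken pipe while writing through TFramedTransport, which points at the Thrift transport limits of the Cassandra server embedded in BAM. Before changing anything you can confirm the limits currently in effect; here is a minimal check, assuming the installation path /opt/wso2bam-2.4.0 seen in the logs (adjust it to your own setup):

# Show the Thrift transport limits configured for the embedded Cassandra.
# The repository/conf location is an assumption; cassandra.yaml may live
# elsewhere depending on how BAM is packaged.
grep -rn -e thrift_framed_transport_size_in_mb \
         -e thrift_max_message_length_in_mb \
         /opt/wso2bam-2.4.0/repository/conf/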
SOLUTION:

The most likely cause is that, under heavy load, the Hive job pulls rows from Cassandra whose Thrift messages exceed the configured frame size; the server drops the connection mid-write and the client surfaces it as a broken pipe. The fix is really simple: locate the cassandra.yaml file and change these two lines:
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
and set them as follows:
thrift_framed_transport_size_in_mb: 60
thrift_max_message_length_in_mb: 64
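Here is a minimal sketch of applying the change from the command line, again assuming the installation path from the logs; the find step is there because the exact location of cassandra.yaml can vary between setups, so substitute the path it reports before running the sed commands:

BAM_HOME=/opt/wso2bam-2.4.0    # installation path, taken from the logs above

# Locate the cassandra.yaml of the embedded Cassandra server
find "$BAM_HOME" -name cassandra.yaml

# Hypothetical location used for the rest of the sketch; replace it with
# whatever find reported on your installation
CONF="$BAM_HOME/repository/conf/etc/cassandra.yaml"

cp "$CONF" "$CONF.bak"         # keep a backup before editing

# Raise the Thrift frame and message limits
sed -i 's/^thrift_framed_transport_size_in_mb:.*/thrift_framed_transport_size_in_mb: 60/' "$CONF"
sed -i 's/^thrift_max_message_length_in_mb:.*/thrift_max_message_length_in_mb: 64/' "$CONF"

Restart the BAM server afterwards so the embedded Cassandra picks up the new values. Keeping thrift_max_message_length_in_mb slightly above thrift_framed_transport_size_in_mb follows the guidance in the cassandra.yaml comments: the message limit has to cover the frame plus Thrift's own overhead, which is why the pair is 60/64 rather than two equal values.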