Why we change index state to PENDING_DISABLE on RegionMovedException

Batyrshin Alexander
As far as I know, RegionMovedException is not a problem at all; it's just a notification that we need to update the meta information about table regions and retry.
Why do we do the extra work of changing the index state?

2019-09-10 22:35:00,764 WARN  [hconnection-0x4a63b6ea-shared--pool10-t961] client.AsyncProcess: #41, table=IDX_TABLE, attempt=1/1 failed=1ops, last exception: org.apache.hadoop.hbase.exceptions.RegionMovedException: Region moved to: hostname=prod023 port=60020 startCode=1568139705179. As of locationSeqNum=93740117. on prod027,60020,1568142287280, tracking started Tue Sep 10 22:35:00 MSK 2019; not retrying 1 - final failure
2019-09-10 22:35:00,789 INFO  [RpcServer.default.FPBQ.Fifo.handler=170,queue=10,port=60020] index.PhoenixIndexFailurePolicy: Successfully update INDEX_DISABLE_TIMESTAMP for IDX_TABLE due to an exception while writing updates. indexState=PENDING_DISABLE
org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException:  disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_TABLE]
        at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:236)
        at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
        at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
        at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
        at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:614)
        at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:589)
        at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:572)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3677)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3138)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3080)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2406)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)

Re: Why we change index state to PENDING_DISABLE on RegionMovedException

Vincent Poon-2
Normally you're right, this should get retried at the HBase layer and would be transparent. However, as part of PHOENIX-4130, we have the HBase client try the write only once, so there's no chance to retry at that layer. We did that to avoid tying up RPC handlers on the server.
Instead, we retry the entire Phoenix mutation from the client side. The index is put into "PENDING_DISABLE", so that if the next write succeeds, it can flip back to "ACTIVE".
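
To make the flow above concrete, here is a minimal toy model in Java. It is only a sketch and not Phoenix's actual code: the class `IndexRetrySketch`, its methods, and the `maxAttempts` parameter are hypothetical names. Moving to `DISABLE` once the client runs out of retries is also an assumption for illustration; this thread only describes the PENDING_DISABLE to ACTIVE transition.

```java
import java.util.function.BooleanSupplier;

// Toy model of the retry flow described above. NOT Phoenix's implementation;
// all class and method names here are hypothetical.
final class IndexRetrySketch {
    enum IndexState { ACTIVE, PENDING_DISABLE, DISABLE }

    private IndexState state = IndexState.ACTIVE;

    IndexState state() {
        return state;
    }

    // Server side: a single attempt at the index write, with no HBase-level retry.
    // A failure (for example a RegionMovedException) marks the index PENDING_DISABLE;
    // a later success flips it back to ACTIVE.
    private boolean writeIndexOnce(BooleanSupplier indexWrite) {
        boolean ok = indexWrite.getAsBoolean();
        if (!ok) {
            state = IndexState.PENDING_DISABLE;
        } else if (state == IndexState.PENDING_DISABLE) {
            state = IndexState.ACTIVE;
        }
        return ok;
    }

    // Client side: retry the whole mutation up to maxAttempts times.
    boolean commit(BooleanSupplier indexWrite, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (writeIndexOnce(indexWrite)) {
                return true;
            }
        }
        // Assumption for illustration: give up and disable the index.
        state = IndexState.DISABLE;
        return false;
    }

    public static void main(String[] args) {
        IndexRetrySketch index = new IndexRetrySketch();
        int[] calls = {0};
        // The first attempt fails (simulating the moved region), the second succeeds.
        boolean committed = index.commit(() -> ++calls[0] > 1, 3);
        System.out.println(committed + " " + index.state() + " after " + calls[0] + " attempt(s)");
    }
}
```

In this model the index is only briefly in PENDING_DISABLE: the failed first attempt sets it, and the successful client-side retry restores ACTIVE.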


Re: Why we change index state to PENDING_DISABLE on RegionMovedException

Geoffrey Jacoby
Just wanted to add that in the new index architecture recently introduced in Phoenix 4.14.3 and the forthcoming 4.15, the index stays in ACTIVE state even if there's a write failure, and the index will be transparently repaired the next time someone reads from the affected key range. From the client's perspective, indexes will always be in sync. Indexes created using the older index framework will still work, but will need to be upgraded to the new framework with the IndexUpgradeTool in order to benefit from the new behavior.
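
The repair-on-read idea described above can be illustrated with a small Java sketch. This is a simplified model, not the Phoenix implementation: the per-row `verified` flag, the three-phase write, and every name in `ReadRepairSketch` are assumptions made for the example. PHOENIX-5156 and PHOENIX-5211 have the real design.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "index stays usable, rows are repaired on read".
// NOT Phoenix code; the verified flag and all names are hypothetical.
final class ReadRepairSketch {
    enum FailAt { NONE, BEFORE_DATA_WRITE, BEFORE_VERIFY }

    private static final class IndexRow {
        final String dataKey;
        boolean verified;

        IndexRow(String dataKey) {
            this.dataKey = dataKey;
        }
    }

    private final Map<String, String> dataTable = new HashMap<>();     // data key -> indexed value
    private final Map<String, IndexRow> indexTable = new HashMap<>();  // indexed value -> index row

    // Write in three phases; failAt simulates a failure partway through.
    void write(String dataKey, String value, FailAt failAt) {
        indexTable.put(value, new IndexRow(dataKey));  // phase 1: unverified index row
        if (failAt == FailAt.BEFORE_DATA_WRITE) {
            return;
        }
        dataTable.put(dataKey, value);                 // phase 2: data row
        if (failAt == FailAt.BEFORE_VERIFY) {
            return;
        }
        indexTable.get(value).verified = true;         // phase 3: mark index row verified
    }

    // Read through the index, repairing any unverified row against the data table.
    String readByIndexedValue(String value) {
        IndexRow row = indexTable.get(value);
        if (row == null) {
            return null;
        }
        if (!row.verified) {
            if (value.equals(dataTable.get(row.dataKey))) {
                row.verified = true;        // data row exists: repair in place
            } else {
                indexTable.remove(value);   // no matching data row: drop the stale index row
                return null;
            }
        }
        return row.dataKey;
    }

    boolean isVerified(String value) {
        IndexRow row = indexTable.get(value);
        return row != null && row.verified;
    }

    public static void main(String[] args) {
        ReadRepairSketch table = new ReadRepairSketch();
        table.write("row1", "alice", FailAt.BEFORE_VERIFY);
        System.out.println(table.readByIndexedValue("alice") + " verified=" + table.isVerified("alice"));
    }
}
```

The point of the sketch is that a failed write never changes a table-wide index state; the reader checks each unverified row against the data table and either repairs or discards it.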

We'll be updating the docs on the website soon to reflect that; in the meantime you can look at PHOENIX-5156 and PHOENIX-5211 if you'd like more details. 

Geoffrey


Re: Why we change index state to PENDING_DISABLE on RegionMovedException

Batyrshin Alexander
It looks promising, but we need to evaluate the write performance of the old indexes versus the new ones before we can go with this update.
