Apache phoenix problem with order by and offset giving duplicate results in paging

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Apache phoenix problem with order by and offset giving duplicate results in paging

Lalit Jadhav
Hello,

I'm using order by with offset for paging in apache phoenix. I am getting a duplicate result on the next page.

I have a view of the HBase table in Apache phoenix. I am using TO_NUMBER() to convert my stringified column into an integer. I want to sort this converted column in descending order.

Example: I have an Hbase view like :

 ------------------------------
| TITLE      |   COUNT          |
  ------------------------------
| t8         | 10               |
  ------------------------------
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |

My Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 0;

Result :

+------------+------------------+
| TITLE      |   COUNT          |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t7         | 1                |
| t10        | 1                |

Next Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 5;

result :

+------------+------------------+
| TITLE      |        COUNT     |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Here t7 is repeated on both the result.

Expected result :

1st Query:-
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |

2nd Query
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Please help me to know what is the exact issue? Also, I want to know, is CURSOR can be used for the same purpose?.


---
Lalit Jadhav
Reply | Threaded
Open this post in threaded view
|

Re: Apache phoenix problem with order by and offset giving duplicate results in paging

Thomas D'Silva-2
Try including the title in the order by clause as well {order by TITLE, TO_NUMBER(COUNT) desc}. 
Using offset and limit for paging is not efficient when the table has a lot of rows. 
You try using row value constructors and ordering by the primary key column of the table (see https://phoenix.apache.org/paged.html)

On Sun, Nov 10, 2019 at 11:04 PM Lalit Jadhav <[hidden email]> wrote:
Hello,

I'm using order by with offset for paging in apache phoenix. I am getting a duplicate result on the next page.

I have a view of the HBase table in Apache phoenix. I am using TO_NUMBER() to convert my stringified column into an integer. I want to sort this converted column in descending order.

Example: I have an Hbase view like :

 ------------------------------
| TITLE      |   COUNT          |
  ------------------------------
| t8         | 10               |
  ------------------------------
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |

My Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 0;

Result :

+------------+------------------+
| TITLE      |   COUNT          |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t7         | 1                |
| t10        | 1                |

Next Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 5;

result :

+------------+------------------+
| TITLE      |        COUNT     |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Here t7 is repeated on both the result.

Expected result :

1st Query:-
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |

2nd Query
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Please help me to know what is the exact issue? Also, I want to know, is CURSOR can be used for the same purpose?.


---
Lalit Jadhav
Reply | Threaded
Open this post in threaded view
|

Re: Apache phoenix problem with order by and offset giving duplicate results in paging

Daniel Wong
Lalit,

I agree with what Thomas recommended though you'd likely want to append the PK to the sort so that you get a consistent result back.  That is order by TO_NUMBER(COUNT) desc, PK.  As the PK is totally ordered by definition your entire result set is totally ordered. Assuming COUNT is not part of the PK there is no consistent tie breaking amongst values with the same count.  This means that the result set you get back may have differences in order causing the behavior you noticed above.

As long as you can do these queries from a single thread, a Cursor could be more performant as you will not have to scan the required data multiple times, but today is similar to doing this in a single query without limit if I recall.
If you can order by an index or by PK, Thomas' suggestion will be the right way forward though there are some currently known issues there which I am addressing in https://issues.apache.org/jira/browse/PHOENIX-4845



On Tue, Nov 12, 2019 at 4:27 PM Thomas D'Silva <[hidden email]> wrote:
Try including the title in the order by clause as well {order by TITLE, TO_NUMBER(COUNT) desc}. 
Using offset and limit for paging is not efficient when the table has a lot of rows. 
You try using row value constructors and ordering by the primary key column of the table (see https://phoenix.apache.org/paged.html)

On Sun, Nov 10, 2019 at 11:04 PM Lalit Jadhav <[hidden email]> wrote:
Hello,

I'm using order by with offset for paging in apache phoenix. I am getting a duplicate result on the next page.

I have a view of the HBase table in Apache phoenix. I am using TO_NUMBER() to convert my stringified column into an integer. I want to sort this converted column in descending order.

Example: I have an Hbase view like :

 ------------------------------
| TITLE      |   COUNT          |
  ------------------------------
| t8         | 10               |
  ------------------------------
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |

My Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 0;

Result :

+------------+------------------+
| TITLE      |   COUNT          |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t7         | 1                |
| t10        | 1                |

Next Query:

Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5 offset 5;

result :

+------------+------------------+
| TITLE      |        COUNT     |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Here t7 is repeated on both the result.

Expected result :

1st Query:-
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t8         | 10               |
| t9         | 2                |
| t4         | 1                |
| t6         | 1                |
| t10        | 1                |

2nd Query
+------------+------------------+
| TITLE      |COUNT             |
+------------+------------------+
| t7         | 1                |
| t43        | 0                |
| t14        | 0                |
| t11        | 0                |
| t42        | 0                |
+------------+------------------+

Please help me to know what is the exact issue? Also, I want to know, is CURSOR can be used for the same purpose?.


---
Lalit Jadhav


--
Daniel Wong
Salesforce
Mobile: 628.217.1808