Issue using ANY ARRAY feature

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Issue using ANY ARRAY feature

Simon Mottram
Phoenix Version: 4.14.2-HBase-1.4
HBase Version: AWS EMR
Release
label:emr-5.24.1
Hadoop distribution:Amazon
Applications:Phoenix 4.14.1,
Hue 4.4.0, HBase 1.4.9


Having an issue where ANY(ARRAY[]) stops the query returning any
results when used in a 'AND' conjunction

e.g (fielda = 'a') AND (fieldb = any(array['a','b','c]))

Always returns zero results, if i change to disjunction (OR) it works
fine but obviously isn't what's wanted here.  Excuse the long post but
I wanted to be as clear as possible.

It's quite possible I have misunderstood the way ANY() works but...

Here's a simple query that returns a correct number of results:

SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
 FROM DEV_OAPI.OBSERVATION
WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME = 'TRIAL00015')  
 ORDER BY OBSERVATIONDATE DESC
 LIMIT 10
 OFFSET 0

So there's definitely records where biomaterial name and trialname have
these values

If I change it to

SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
 FROM DEV_OAPI.OBSERVATION
WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))  
 ORDER BY OBSERVATIONDATE DESC
 LIMIT 10
 OFFSET 0

I get valid results

So the ANY(ARRAY[]) function works

Here's the explain which looks very odd to more, but it works

PLAN                                                                  
                                                                       
CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN OVER
DEV_OAPI.OBSERVATION                                                  
                                                                       
SERVER FILTER BY
org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
[children=[ARRAY['TRIAL00015'], TRIALNAME =
org.apache.phoenix.expression.function.ArrayElemRefExpression
[children=[ARRAY['TRIAL00015'], 1]]]]
    SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE
DESC]                                                                 C
CLIENT MERGE
SORT                                                                  
CLIENT LIMIT 10    

So far so good, BUT.

However if I combine the ARRAY expression with any expression using AND
I get zero results, even tho as above both sides of the conjunction
return true.

e.g.

SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
 FROM DEV_OAPI.OBSERVATION
WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
 ORDER BY OBSERVATIONDATE DESC
 LIMIT 10
 OFFSET 0

Explain (newlines added):
 SERVER FILTER BY
 (BIOMATERIALNAME = 'SCION00424'
 AND
 org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
[children=[ARRAY['TRIAL00015','SOMETHING ELSE'],
 TRIALNAME =
org.apache.phoenix.expression.function.ArrayElemRefExpression
[children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]])

Just out of interest I tried with strings only in the array check

SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
 FROM DEV_OAPI.OBSERVATION
WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' =
ANY(ARRAY['TRIAL00015']))  
 ORDER BY OBSERVATIONDATE DESC
 LIMIT 10
 OFFSET 0

This works fine (in a kind of unhelpful way)

I have tested using the thick client:
        <!-- HBASE THICK CLIENT DEPS -->
        <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-core</artifactId>
            <version>4.14.2-HBase-1.4</version>
        </dependency>

and the thin client

         <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-queryserver-client</artifactId>
            <version>4.14.2-HBase-1.4</version>
        </dependency>

I've tried using braces and re-ordering but any query of the form:
<anyclause> AND field = ANY(ARRAY['value1'...])  
Returns zero results regardless of values

We can't change version as we are using the Amazon AWS EMR managed
stack and no other phoenix libraries work.

Thanks for taking the time to read this far!

Cheers

Simon
Reply | Threaded
Open this post in threaded view
|

Re: Issue using ANY ARRAY feature

Simon Mottram
Update:

Just in case anyone hits this issue in future with the AWS managed
HBase, the fix is to use a very specific version of the driver

For thick client:

        <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-core</artifactId>
            <version>4.14.1-HBase-1.4</version>
        </dependency>

For thin client:

        <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-queryserver-client</artifactId>
            <version>4.14.1-HBase-1.4</version>
        </dependency>

This required support help from AWS as this driver version is not
mentioned on the official Apache Phoenix download page

Regards

Simon


On Sun, 2019-11-17 at 21:03 +0000, Simon Mottram wrote:

> Phoenix Version: 4.14.2-HBase-1.4
> HBase Version: AWS EMR
> Release
> label:emr-5.24.1
> Hadoop distribution:Amazon
> Applications:Phoenix 4.14.1,
> Hue 4.4.0, HBase 1.4.9
>
>
> Having an issue where ANY(ARRAY[]) stops the query returning any
> results when used in a 'AND' conjunction
>
> e.g (fielda = 'a') AND (fieldb = any(array['a','b','c]))
>
> Always returns zero results, if i change to disjunction (OR) it works
> fine but obviously isn't what's wanted here.  Excuse the long post
> but
> I wanted to be as clear as possible.
>
> It's quite possible I have misunderstood the way ANY() works but...
>
> Here's a simple query that returns a correct number of results:
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
> 'TRIAL00015')  
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
>
> So there's definitely records where biomaterial name and trialname
> have
> these values
>
> If I change it to
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))  
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
>
> I get valid results
>
> So the ANY(ARRAY[]) function works
>
> Here's the explain which looks very odd to more, but it works
>
> PLAN                                                                
>  
>                                                                      
>  
> CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN
> OVER
> DEV_OAPI.OBSERVATION                                                
>  
>                                                                      
>  
> SERVER FILTER BY
> org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
> [children=[ARRAY['TRIAL00015'], TRIALNAME =
> org.apache.phoenix.expression.function.ArrayElemRefExpression
> [children=[ARRAY['TRIAL00015'], 1]]]]
>     SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE
> DESC]                                                                
>  C
> CLIENT MERGE
> SORT                                                                
>  
> CLIENT LIMIT 10    
>
> So far so good, BUT.
>
> However if I combine the ARRAY expression with any expression using
> AND
> I get zero results, even tho as above both sides of the conjunction
> return true.
>
> e.g.
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
> ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
>
> Explain (newlines added):
>  SERVER FILTER BY
>  (BIOMATERIALNAME = 'SCION00424'
>  AND
>  org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'],
>  TRIALNAME =
> org.apache.phoenix.expression.function.ArrayElemRefExpression
> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]])
>
> Just out of interest I tried with strings only in the array check
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>  FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' =
> ANY(ARRAY['TRIAL00015']))  
>  ORDER BY OBSERVATIONDATE DESC
>  LIMIT 10
>  OFFSET 0
>
> This works fine (in a kind of unhelpful way)
>
> I have tested using the thick client:
>         <!-- HBASE THICK CLIENT DEPS -->
>         <dependency>
>             <groupId>org.apache.phoenix</groupId>
>             <artifactId>phoenix-core</artifactId>
>             <version>4.14.2-HBase-1.4</version>
>         </dependency>
>
> and the thin client
>
>          <dependency>
>             <groupId>org.apache.phoenix</groupId>
>             <artifactId>phoenix-queryserver-client</artifactId>
>             <version>4.14.2-HBase-1.4</version>
>         </dependency>
>
> I've tried using braces and re-ordering but any query of the form:
> <anyclause> AND field = ANY(ARRAY['value1'...])  
> Returns zero results regardless of values
>
> We can't change version as we are using the Amazon AWS EMR managed
> stack and no other phoenix libraries work.
>
> Thanks for taking the time to read this far!
>
> Cheers
>
> Simon
Reply | Threaded
Open this post in threaded view
|

Re: Issue using ANY ARRAY feature

Josh Elser-2
Hi Simon,

Thanks for replying back with your fix. We appreciate when folks do this
so that others can also see the solution.

http://phoenix.apache.org/download.html only publishes the "latest"
release for a line that we're maintaining. That's why you'll see 4.14.3
listed on the website, not 4.14.0/1/2.

If you see a release x.y.z, you can reasonably assume[1] that you'll
also find release x.y.z' where z'=[0,z). We expect that compatibility is
maintained in the bugfix releases to some line, so there is no reason to
not update to the latest version.

- Josh

[1] The one caveat here is that if there is a security-issues, we may
explicitly pull a release from being downloaded.

On 12/8/19 6:12 PM, Simon Mottram wrote:

> Update:
>
> Just in case anyone hits this issue in future with the AWS managed
> HBase, the fix is to use a very specific version of the driver
>
> For thick client:
>
>          <dependency>
>              <groupId>org.apache.phoenix</groupId>
>              <artifactId>phoenix-core</artifactId>
>              <version>4.14.1-HBase-1.4</version>
>          </dependency>
>
> For thin client:
>
>          <dependency>
>              <groupId>org.apache.phoenix</groupId>
>              <artifactId>phoenix-queryserver-client</artifactId>
>              <version>4.14.1-HBase-1.4</version>
>          </dependency>
>
> This required support help from AWS as this driver version is not
> mentioned on the official Apache Phoenix download page
>
> Regards
>
> Simon
>
>
> On Sun, 2019-11-17 at 21:03 +0000, Simon Mottram wrote:
>> Phoenix Version: 4.14.2-HBase-1.4
>> HBase Version: AWS EMR
>> Release
>> label:emr-5.24.1
>> Hadoop distribution:Amazon
>> Applications:Phoenix 4.14.1,
>> Hue 4.4.0, HBase 1.4.9
>>
>>
>> Having an issue where ANY(ARRAY[]) stops the query returning any
>> results when used in a 'AND' conjunction
>>
>> e.g (fielda = 'a') AND (fieldb = any(array['a','b','c]))
>>
>> Always returns zero results, if i change to disjunction (OR) it works
>> fine but obviously isn't what's wanted here.  Excuse the long post
>> but
>> I wanted to be as clear as possible.
>>
>> It's quite possible I have misunderstood the way ANY() works but...
>>
>> Here's a simple query that returns a correct number of results:
>>
>> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>>   FROM DEV_OAPI.OBSERVATION
>> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
>> 'TRIAL00015')
>>   ORDER BY OBSERVATIONDATE DESC
>>   LIMIT 10
>>   OFFSET 0
>>
>> So there's definitely records where biomaterial name and trialname
>> have
>> these values
>>
>> If I change it to
>>
>> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>>   FROM DEV_OAPI.OBSERVATION
>> WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
>>   ORDER BY OBSERVATIONDATE DESC
>>   LIMIT 10
>>   OFFSET 0
>>
>> I get valid results
>>
>> So the ANY(ARRAY[]) function works
>>
>> Here's the explain which looks very odd to more, but it works
>>
>> PLAN
>>    
>>                                                                      
>>    
>> CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN
>> OVER
>> DEV_OAPI.OBSERVATION
>>    
>>                                                                      
>>    
>> SERVER FILTER BY
>> org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
>> [children=[ARRAY['TRIAL00015'], TRIALNAME =
>> org.apache.phoenix.expression.function.ArrayElemRefExpression
>> [children=[ARRAY['TRIAL00015'], 1]]]]
>>      SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE
>> DESC]
>>   C
>> CLIENT MERGE
>> SORT
>>    
>> CLIENT LIMIT 10
>>
>> So far so good, BUT.
>>
>> However if I combine the ARRAY expression with any expression using
>> AND
>> I get zero results, even tho as above both sides of the conjunction
>> return true.
>>
>> e.g.
>>
>> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>>   FROM DEV_OAPI.OBSERVATION
>> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
>> ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
>>   ORDER BY OBSERVATIONDATE DESC
>>   LIMIT 10
>>   OFFSET 0
>>
>> Explain (newlines added):
>>   SERVER FILTER BY
>>   (BIOMATERIALNAME = 'SCION00424'
>>   AND
>>   org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
>> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'],
>>   TRIALNAME =
>> org.apache.phoenix.expression.function.ArrayElemRefExpression
>> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]])
>>
>> Just out of interest I tried with strings only in the array check
>>
>> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
>>   FROM DEV_OAPI.OBSERVATION
>> WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' =
>> ANY(ARRAY['TRIAL00015']))
>>   ORDER BY OBSERVATIONDATE DESC
>>   LIMIT 10
>>   OFFSET 0
>>
>> This works fine (in a kind of unhelpful way)
>>
>> I have tested using the thick client:
>>          <!-- HBASE THICK CLIENT DEPS -->
>>          <dependency>
>>              <groupId>org.apache.phoenix</groupId>
>>              <artifactId>phoenix-core</artifactId>
>>              <version>4.14.2-HBase-1.4</version>
>>          </dependency>
>>
>> and the thin client
>>
>>           <dependency>
>>              <groupId>org.apache.phoenix</groupId>
>>              <artifactId>phoenix-queryserver-client</artifactId>
>>              <version>4.14.2-HBase-1.4</version>
>>          </dependency>
>>
>> I've tried using braces and re-ordering but any query of the form:
>> <anyclause> AND field = ANY(ARRAY['value1'...])
>> Returns zero results regardless of values
>>
>> We can't change version as we are using the Amazon AWS EMR managed
>> stack and no other phoenix libraries work.
>>
>> Thanks for taking the time to read this far!
>>
>> Cheers
>>
>> Simon