Performance degradation on query analysis

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Performance degradation on query analysis

Stepan Migunov
Hi
We have an issue with our production environment - from time to time we notice a significant performance degradation for some queries. The strange thing is that the EXPLAIN operator for these queries takes the same time as queries execution (5 minutes or more). So, I guess, the issue is related to query's analysis but not data extraction. Is it possible that issue is related to SYSTEM.STATS access problem? Any other ideas?
Reply | Threaded
Open this post in threaded view
|

Re: Performance degradation on query analysis

Josh Elser-2
Can you share the output you see from the EXPLAIN? Does it differ
between times it's "fast" and times it's "slow"?

Sharing the table(s) DDL statements would also help, along with the
shape and version of your cluster (e.g. Apache Phoenix 4.14.2 with 8
RegionServers).

Spit-balling ideas:

Could be reads over the SYSTEM.CATALOG table or the SYSTEM.STATS table.

Have you looked more coarsely at the RegionServer logs/metrics? Any
obvious saturation issues (e.g. handlers consumed, JVM GC pauses, host
CPU saturation)?

Turn on DEBUG log4j client side (beware of chatty ZK logging) and see if
there's something obvious from when the EXPLAIN is slow.

On 9/17/19 3:58 AM, Stepan Migunov wrote:
> Hi
> We have an issue with our production environment - from time to time we notice a significant performance degradation for some queries. The strange thing is that the EXPLAIN operator for these queries takes the same time as queries execution (5 minutes or more). So, I guess, the issue is related to query's analysis but not data extraction. Is it possible that issue is related to SYSTEM.STATS access problem? Any other ideas?
>
Reply | Threaded
Open this post in threaded view
|

RE: Performance degradation on query analysis

Stepan Migunov
Thanks, Josh. The problem was really related to reading the SYSTEM.STATS
table.
There were only 8,000 rows in the table, but COUNT took more than 10
minutes. I noticed that the storage files (34) had a total size of 10 GB.

DELETE FROM SYSTEM.STATS did not help - the storage files are still 10 GB,
and COUNT took a long time.
Then I truncated the table from the hbase shell. And this fixed the
problem - after UPDATE STATS for each table, everything works fine.

Are there any known issues with SYSTEM.STATS table? Apache Phoenix 4.13.1
with 15 Region Servers.

-----Original Message-----
From: Josh Elser [mailto:[hidden email]]
Sent: Tuesday, September 17, 2019 5:16 PM
To: [hidden email]
Subject: Re: Performance degradation on query analysis

Can you share the output you see from the EXPLAIN? Does it differ between
times it's "fast" and times it's "slow"?

Sharing the table(s) DDL statements would also help, along with the shape
and version of your cluster (e.g. Apache Phoenix 4.14.2 with 8
RegionServers).

Spit-balling ideas:

Could be reads over the SYSTEM.CATALOG table or the SYSTEM.STATS table.

Have you looked more coarsely at the RegionServer logs/metrics? Any obvious
saturation issues (e.g. handlers consumed, JVM GC pauses, host CPU
saturation)?

Turn on DEBUG log4j client side (beware of chatty ZK logging) and see if
there's something obvious from when the EXPLAIN is slow.

On 9/17/19 3:58 AM, Stepan Migunov wrote:
> Hi
> We have an issue with our production environment - from time to time we
> notice a significant performance degradation for some queries. The strange
> thing is that the EXPLAIN operator for these queries takes the same time
> as queries execution (5 minutes or more). So, I guess, the issue is
> related to query's analysis but not data extraction. Is it possible that
> issue is related to SYSTEM.STATS access problem? Any other ideas?
>
Reply | Threaded
Open this post in threaded view
|

Re: Performance degradation on query analysis

Ankit Singhal
Please schedule compaction on SYSTEM.STATS table to clear the old entries.

On Thu, Sep 19, 2019 at 1:48 PM Stepan Migunov <[hidden email]> wrote:
Thanks, Josh. The problem was really related to reading the SYSTEM.STATS
table.
There were only 8,000 rows in the table, but COUNT took more than 10
minutes. I noticed that the storage files (34) had a total size of 10 GB.

DELETE FROM SYSTEM.STATS did not help - the storage files are still 10 GB,
and COUNT took a long time.
Then I truncated the table from the hbase shell. And this fixed the
problem - after UPDATE STATS for each table, everything works fine.

Are there any known issues with SYSTEM.STATS table? Apache Phoenix 4.13.1
with 15 Region Servers.

-----Original Message-----
From: Josh Elser [mailto:[hidden email]]
Sent: Tuesday, September 17, 2019 5:16 PM
To: [hidden email]
Subject: Re: Performance degradation on query analysis

Can you share the output you see from the EXPLAIN? Does it differ between
times it's "fast" and times it's "slow"?

Sharing the table(s) DDL statements would also help, along with the shape
and version of your cluster (e.g. Apache Phoenix 4.14.2 with 8
RegionServers).

Spit-balling ideas:

Could be reads over the SYSTEM.CATALOG table or the SYSTEM.STATS table.

Have you looked more coarsely at the RegionServer logs/metrics? Any obvious
saturation issues (e.g. handlers consumed, JVM GC pauses, host CPU
saturation)?

Turn on DEBUG log4j client side (beware of chatty ZK logging) and see if
there's something obvious from when the EXPLAIN is slow.

On 9/17/19 3:58 AM, Stepan Migunov wrote:
> Hi
> We have an issue with our production environment - from time to time we
> notice a significant performance degradation for some queries. The strange
> thing is that the EXPLAIN operator for these queries takes the same time
> as queries execution (5 minutes or more). So, I guess, the issue is
> related to query's analysis but not data extraction. Is it possible that
> issue is related to SYSTEM.STATS access problem? Any other ideas?
>
Reply | Threaded
Open this post in threaded view
|

Re: Performance degradation on query analysis

Josh Elser-2
Did you change your configuration to prevent compactions from regularly
happening, Stepan?

By default, you should have a major compaction run weekly which would
have fixed this for you, although minor compactions would have run
automatically as well to rewrite small hfiles as you are creating new
one (generating new stats).

On 9/19/19 4:50 PM, Ankit Singhal wrote:

> Please schedule compaction on SYSTEM.STATS table to clear the old entries.
>
> On Thu, Sep 19, 2019 at 1:48 PM Stepan Migunov
> <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Thanks, Josh. The problem was really related to reading the SYSTEM.STATS
>     table.
>     There were only 8,000 rows in the table, but COUNT took more than 10
>     minutes. I noticed that the storage files (34) had a total size of
>     10 GB.
>
>     DELETE FROM SYSTEM.STATS did not help - the storage files are still
>     10 GB,
>     and COUNT took a long time.
>     Then I truncated the table from the hbase shell. And this fixed the
>     problem - after UPDATE STATS for each table, everything works fine.
>
>     Are there any known issues with SYSTEM.STATS table? Apache Phoenix
>     4.13.1
>     with 15 Region Servers.
>
>     -----Original Message-----
>     From: Josh Elser [mailto:[hidden email] <mailto:[hidden email]>]
>     Sent: Tuesday, September 17, 2019 5:16 PM
>     To: [hidden email] <mailto:[hidden email]>
>     Subject: Re: Performance degradation on query analysis
>
>     Can you share the output you see from the EXPLAIN? Does it differ
>     between
>     times it's "fast" and times it's "slow"?
>
>     Sharing the table(s) DDL statements would also help, along with the
>     shape
>     and version of your cluster (e.g. Apache Phoenix 4.14.2 with 8
>     RegionServers).
>
>     Spit-balling ideas:
>
>     Could be reads over the SYSTEM.CATALOG table or the SYSTEM.STATS table.
>
>     Have you looked more coarsely at the RegionServer logs/metrics? Any
>     obvious
>     saturation issues (e.g. handlers consumed, JVM GC pauses, host CPU
>     saturation)?
>
>     Turn on DEBUG log4j client side (beware of chatty ZK logging) and see if
>     there's something obvious from when the EXPLAIN is slow.
>
>     On 9/17/19 3:58 AM, Stepan Migunov wrote:
>      > Hi
>      > We have an issue with our production environment - from time to
>     time we
>      > notice a significant performance degradation for some queries.
>     The strange
>      > thing is that the EXPLAIN operator for these queries takes the
>     same time
>      > as queries execution (5 minutes or more). So, I guess, the issue is
>      > related to query's analysis but not data extraction. Is it
>     possible that
>      > issue is related to SYSTEM.STATS access problem? Any other ideas?
>      >
>
Reply | Threaded
Open this post in threaded view
|

RE: Performance degradation on query analysis

Stepan Migunov
Thanks Josh,  you are right, we have actually disabled automatic major
compaction. Now we added SYSTEM.STATS to weekly compaction and I hope this
resolve the issue.


-----Original Message-----
From: Josh Elser [mailto:[hidden email]]
Sent: Tuesday, September 24, 2019 6:39 PM
To: [hidden email]
Subject: Re: Performance degradation on query analysis

Did you change your configuration to prevent compactions from regularly
happening, Stepan?

By default, you should have a major compaction run weekly which would have
fixed this for you, although minor compactions would have run automatically
as well to rewrite small hfiles as you are creating new one (generating new
stats).

On 9/19/19 4:50 PM, Ankit Singhal wrote:

> Please schedule compaction on SYSTEM.STATS table to clear the old entries.
>
> On Thu, Sep 19, 2019 at 1:48 PM Stepan Migunov
> <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Thanks, Josh. The problem was really related to reading the
> SYSTEM.STATS
>     table.
>     There were only 8,000 rows in the table, but COUNT took more than 10
>     minutes. I noticed that the storage files (34) had a total size of
>     10 GB.
>
>     DELETE FROM SYSTEM.STATS did not help - the storage files are still
>     10 GB,
>     and COUNT took a long time.
>     Then I truncated the table from the hbase shell. And this fixed the
>     problem - after UPDATE STATS for each table, everything works fine.
>
>     Are there any known issues with SYSTEM.STATS table? Apache Phoenix
>     4.13.1
>     with 15 Region Servers.
>
>     -----Original Message-----
>     From: Josh Elser [mailto:[hidden email] <mailto:[hidden email]>]
>     Sent: Tuesday, September 17, 2019 5:16 PM
>     To: [hidden email] <mailto:[hidden email]>
>     Subject: Re: Performance degradation on query analysis
>
>     Can you share the output you see from the EXPLAIN? Does it differ
>     between
>     times it's "fast" and times it's "slow"?
>
>     Sharing the table(s) DDL statements would also help, along with the
>     shape
>     and version of your cluster (e.g. Apache Phoenix 4.14.2 with 8
>     RegionServers).
>
>     Spit-balling ideas:
>
>     Could be reads over the SYSTEM.CATALOG table or the SYSTEM.STATS
> table.
>
>     Have you looked more coarsely at the RegionServer logs/metrics? Any
>     obvious
>     saturation issues (e.g. handlers consumed, JVM GC pauses, host CPU
>     saturation)?
>
>     Turn on DEBUG log4j client side (beware of chatty ZK logging) and see
> if
>     there's something obvious from when the EXPLAIN is slow.
>
>     On 9/17/19 3:58 AM, Stepan Migunov wrote:
>      > Hi
>      > We have an issue with our production environment - from time to
>     time we
>      > notice a significant performance degradation for some queries.
>     The strange
>      > thing is that the EXPLAIN operator for these queries takes the
>     same time
>      > as queries execution (5 minutes or more). So, I guess, the issue is
>      > related to query's analysis but not data extraction. Is it
>     possible that
>      > issue is related to SYSTEM.STATS access problem? Any other ideas?
>      >
>