Oracle 8i performance question - a cry for help!

  • Oracle 8i performance question - a cry for help!

    Hello Folks

    Got a problem, which is giving me major grief.

I have an Oracle database, for a live implementation of the software I have overall responsibility for (I am in no way an Oracle guru, but I know a few bits'n'pieces...)

    The problem is this....

The database is a fairly large one (sized at around 30GB) with about 200 registered users... We have the box configured (apparently) for 466 concurrent users.

When I connect > 50 users, the system starts to drag really badly... with 70 users, queries start taking about 90 seconds (where they took <<2 seconds before).

I have looked at the load on the processors (circa 60%), but page file writes are quite high.....

    anyone got any ideas?

I have trebled the size of the SGA, checked the fragmentation of the extents (they weren't bad) and checked for rogue processes, both in Oracle and at an OS level.

I intend to replace the box this instance is on - but it won't happen for at least another 2 months - and I am afraid that may just mask a configuration issue.....

The box is a Data General, triple-processor Intel Xeon, with 1GB RAM and a CLARiiON disk array....

    It is not, as far as I know running ANYTHING else...

    ANY help would be appreciated

    Regards
    RedRed
Don't just swallow the blue pill.

  • #2
RedRed, a while ago I was "chatting" with Oracle for info about a system I had to set up.

For our standard database, a bit less than 1GB, with 2 read/write users and about 30 read-only users, they recommended a cluster of 2 special PCs from Oracle, each one costing €50,000...


Dunno if the guy from Oracle was a bit "drunk" or what, but...
Sat on a pile of the dead, I enjoy my oysters.

    • #3
For a DB that size, you need a huge disk-array server or clustered servers. With that many users and that much data, it can't be helped. Some small things you can try:

      - Add more RAM.
      - Add a second NIC.
      - Move it to a faster network connection.
      - Move it to a big server with a disk array and multiple fast network connections.

      Jammrock
      “Inside every sane person there’s a madman struggling to get out”
      –The Light Fantastic, Terry Pratchett

      • #4
        Oracle

        RedRed,

I agree with Jammrock...

I maintain an Oracle server running on an IBM RS/6000/p6 server (AIX 4.3). It has an SSA disk array with bypass, with 16 9GB drives (2 for rootvg), and 4 533MHz processors with 512MB memory per processor. The server configuration also has 4 NICs per server (2 servers in this cabinet).

All users (150) connect through Cisco 2900 switches. As of now I have had no latency issues.

..knock...knock!

Do you have HACMP (High Availability Clustering for Multi-Processors) or an equivalent running?

        cc

        • #5
I'd suggest that you check whether the system is running out of memory. It sure sounds like it.

Your DB is (IMHO) fairly small, only 30GB, but that 30GB will not fit in memory, nor should it. That is what the caches are for. Of course, fast disks/disk arrays can be necessary. There are several basic mistakes people make when configuring the Oracle caches (SGA). Here are a few hints:

1. Configure SHARED_POOL_SIZE and DB_BLOCK_BUFFERS so that they are big enough. You want to keep as much data as possible in memory. The size of the shared pool depends on the application: if you use lots of PL/SQL or many different SQL statements, you might need a larger shared pool. Remember that DB_BLOCK_BUFFERS is configured NOT in bytes, but in blocks (block = DB_BLOCK_SIZE).

2. Do NOT allocate too much memory for the SGA! This will cause paging, which is obviously counter-productive! If you have 1GB RAM, then you can allocate maybe 700MB for database block buffers, 100MB for the shared pool, and leave 200MB for the OS and session-specific information. You can always show the size of the SGA in SQL*Plus with the command SHOW SGA.

3. Note that SORT_AREA_SIZE is allocated PER SESSION, not per instance. I have seen Oracle instances where people have configured a 10MB SORT_AREA_SIZE. With ~100 sessions that is 1GB. The right size for SORT_AREA_SIZE depends, of course, on the application.
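You can read the current per-session settings straight from SQL*Plus (v$parameter is a standard dynamic view; which parameters you care about is the only choice to make here):
Code:
select name, value
  from v$parameter
 where name in ('sort_area_size', 'sort_area_retained_size', 'hash_area_size');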

          4. There are other tricks, but those are the ones I have seen most.


You should also run the normal statistics scripts to see where the bottlenecks are. The order of optimization, however, is always from application to DB to OS to hardware (i.e. application optimization usually has the highest effect on performance). Without knowing the statistics for the DB, it is very hard to estimate where the problem actually is.

          You can also try to check for locks.
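A minimal sketch for that lock check (v$lock is a standard view; a row with BLOCK = 1 is a session holding a lock that other sessions are waiting on):
Code:
select sid, type, lmode, request
  from v$lock
 where block = 1;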

One thing: I believe that when CPU load is 60-70% the box is usually very heavily loaded, but I think your box should be able to cope with 70 users. It depends highly on the nature of your application, i.e.: type of application (OLTP or DW)? Types of users (% read-only, % read-write, etc. - e.g. some people only input data, some people run reports, some background processes load huge amounts of data)? How much data is inserted per day? Etc...

          If you can give more detailed information, I might be able to help more.
          "Time is fun when you are having flies"
          - Kermit the Frog

          "You know it's good when all your friends who know nothing about electronic music say 'what the hell is that crap!?'"
          -hotlizard@discogs.com

          • #6
            Hello There - thanks very much, one and all......

My main problem is that Oracle is 'not my bag', and currently the system is being maintained by the application developers (set up before my time as Project Manager). The problem is 'WHO WATCHES THE WATCHMEN?' I don't believe they have carried out any reasonable maintenance since they started! We have no formal resource within my organisation for this kind of thing....

BTW SHOW SGA returns:

            Total System Global Area 766792356 bytes
            Fixed Size 74404 bytes
            Variable Size 356945920 bytes
            Database Buffers 409600000 bytes
            Redo Buffers 172032 bytes

The setup parameters are as follows:
            background_dump_dest = /oracle8/admin/secomm/bdump
            compatible = 8.1.7
            control_files = "/app/ost/8.1.7/oradata/parist3/cntlparist302.ctl"
            control_files = "/app/ost/8.1.7/oradata/parist3/cntlparist301.ctl"
            control_files = "/app/ost/8.1.7/oradata/parist3/cntlparist303.ctl"
            core_dump_dest = /oracle8/admin/secomm/cdump
            db_block_buffers = 200000
            db_domain = world
            db_file_multiblock_read_count = 32
            db_files = 250
            db_name = parist3
            dml_locks = 500
            enqueue_resources = 1020
            global_names = TRUE
            hash_area_size = 24020000
            log_archive_dest_1 = "LOCATION=/user4/arch/parist3"
            log_archive_start = FALSE
            log_buffer = 163840
            log_checkpoint_interval = 10000
            max_dump_file_size = 10240
            open_cursors = 300
            optimizer_mode = CHOOSE
            processes = 500
            rollback_segments = ROLLBACK5
            rollback_segments = ROLLBACK1
            rollback_segments = ROLLBACK3
            rollback_segments = ROLLBACK7
            rollback_segments = ROLLBACK9
            rollback_segments = ROLLBACK2
            rollback_segments = ROLLBACK6
            rollback_segments = ROLLBACK8
            rollback_segments = ROLLBACK4
            shared_pool_size = 300000000
            sort_area_retained_size = 100000
            sort_area_size = 1020000
            user_dump_dest = /oracle8/admin/secomm/udump

I can't get the full init.ora as I don't have root access.... (I don't want it either - can't be blamed for any f*ups!)

I honestly don't know if these look reasonable -

I forgot to mention that the DB is sitting on a 10-disk CLARiiON array - I don't think there are any problems with its performance... This application is a DCOM three-tier application - with next to no usage on the client or the middle tier..... (probably a 'design issue' there!)

            RedRed
Don't just swallow the blue pill.

            • #7
1 gig does not sound like enough memory.
Our DBs are around that size and they have 4 procs and 4 gigs each.

But it could be your parse-to-execute ratio.
What language is the front end written in?
I am looking for a script to show your parse-to-execute ratio now.

              Here is a generic link to get you started.

              chuck

              [edit] well, I can't get the #$%^ links to work!?
              Just search the site for "hard parse"
              Last edited by cjolley; 17 February 2003, 06:54.
              Chuck
              秋音的爸爸

              • #8
                I'm not having much luck finding stuff here at home.
                When I get to work tomorrow I can get you more specific information.
                chuck
                Chuck
                秋音的爸爸

                • #9
Thanks, CJ - I am searching now....

Apparently, the Civil Service IT branch have just got on to our IT department - they have been looking at OS-level performance, and found a problem....

Apparently the CLARiiON array has up to 200 ms access time! OUCH! It's a nine-disk RAID 5 array - I was told by the software company it was adequate (though they didn't supply it).

That certainly won't be helpful......

I have now been tasked with replacing the box - I am going to outsource the advice - I don't know enough about the subject.....

I have either IBM or HP to choose from on our regional supply contract... I would have had to replace the box sooner or later anyway, as Oracle doesn't support DG/UX beyond Oracle 8i...

It will be interesting to find out more... It will probably take me 2 months to get it approved - nothing I can do there....

Can anyone explain why the system appears to perform OK until I hit a certain number of users - and then simply flops? It doesn't crash exactly, but its performance probably halves per 5 users beyond a particular threshold..... Before that it seems fine??? I thought page file usage - especially after hearing about the CLARiiON - but that wouldn't explain such a dramatic change....

Because I am tied into a support contract with the Data General mafia - (not my choice!) - adding more memory is going to cost me - perhaps a couple of thousand UK pounds for 4 gig.... and the box can't go beyond that anyway.....

                  Regards
                  RedRed
Don't just swallow the blue pill.

                  • #10
Actually, it is probably the paging that is the reason for your problem, but I cannot be sure since I don't have access to your server. (Remember that pagefile access from disk is something like 10,000 times slower than memory access!)

I went through your initialization parameters and came up with this: the SGA is mostly configured OK, BUT, if I recall correctly, the parameter HASH_AREA_SIZE is also per session, not per instance (I have to check this when I have more time. I only remember that it is badly documented...).

If I am right, this means that your HASH_AREA_SIZE will drain 24,020,000 bytes * 90 users = 2GB! Total memory for your instance is then nearly 3GB!

                    Try this:

Monitor your system performance throughout the day. Notice when the system starts to swap, and check how many active sessions are connected to the DB and what the response times for your queries are. If these match, it is the paging that is killing your performance (and I believe it is). You can also do this backwards: keep monitoring the system performance in the background, and immediately when you notice the queries starting to slow down, check the paging status and how many sessions are active in the DB. Also try to measure the memory size of the Oracle-related processes.
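For the active-session count in that comparison, something like this should do (v$session is a standard view; the USERNAME filter just excludes the background processes):
Code:
select count(*)
  from v$session
 where status = 'ACTIVE'
   and username is not null;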

If you think it is the paging (and I am right about HASH_AREA_SIZE), try reducing HASH_AREA_SIZE. Set it to e.g. 2MB. It will require a DB restart, but you can probably do this in the morning or at other off-hours. If it didn't help, you can put the value back.
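The init.ora change itself would look something like this (2MB is just a hedged starting value; the parameter is static in 8i, so it only takes effect after the instance is bounced):
Code:
# init.ora - cut per-session hash area from ~24MB to 2MB
hash_area_size = 2097152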

Hope this helps. If it doesn't, send more information and I'll try to think of something else. (One question: what kind of application are you running on your DB?)

                    Edited some typos
                    Last edited by Pyrophos; 17 February 2003, 11:42.
                    "Time is fun when you are having flies"
                    - Kermit the Frog

                    "You know it's good when all your friends who know nothing about electronic music say 'what the hell is that crap!?'"
                    -hotlizard@discogs.com

                    • #11
Another thing just came to my mind: RAID 5 configurations are REALLY bad for redo logs. The reasoning is that RAID 5 is OK for random access, such as normal datafiles, but very slow when writing lots of sequential data, such as redo logs.
There is an Oracle paper about this, but I was (what a surprise...) unable to find it just now.

In really performance-hungry situations, I have seen lots of systems using RAID 0+1 or RAID 1 configurations, so that e.g. one table is on one mirrored disk. Sure, they are a pain to manage, but the performance is excellent and you can tune the performance of the application by moving tables and tablespaces to different disks to reduce IO contention.

                      In most cases, however, I think that "plug-and-play" disk arrays are just fine.
                      "Time is fun when you are having flies"
                      - Kermit the Frog

                      "You know it's good when all your friends who know nothing about electronic music say 'what the hell is that crap!?'"
                      -hotlizard@discogs.com

                      • #12
I checked HASH_AREA_SIZE. It seems to be per session, although the explanation was not exact. Thus, 24MB * 90 users = 2GB of additional memory...

This would also explain the behavior of your system: as more users connect to the DB, they eat up all the memory. Then the system starts to swap. At first, the swapping will not slow the system down much, but as more users connect the swapping will rapidly kill the performance. As long as the system can keep all the SGA & PGA data in memory, performance suffers only a little per additional session. But after physical memory is exceeded, the performance degradation is almost exponential...

If I were the DBA of that instance, I would try reducing HASH_AREA_SIZE. HASH_AREA_SIZE defaults to 2 x SORT_AREA_SIZE, and I believe that the default value is adequate for most purposes.
                        "Time is fun when you are having flies"
                        - Kermit the Frog

                        "You know it's good when all your friends who know nothing about electronic music say 'what the hell is that crap!?'"
                        -hotlizard@discogs.com

                        • #13
Thanks guys - I think that between you, you've hit the nail on the head!

I am going through my SAR logs to check page file usage now,
and I've got TOAD to check the parse-to-hit ratio....

I was concerned that the box might not have sufficient RAM, but I can get the developer to tweak HASH_AREA_SIZE down for the time being (until I get my new server - and I won't make the same mistake!)

I REALLY, REALLY appreciate your time and effort.... anyone coming to Belfast can have lunch on me!

                          Regards
                          Red Red
Don't just swallow the blue pill.

                          • #14
                            Red,
                            I'm working up some info for you now.
                            But, I did want to ask: Do you have a development server?
It sounds like the developers have DBA rights on your production box.
This is very dangerous!
I am the designer of all of our databases and I write almost all of the SQL and PL/SQL that is used against them, and I don't have, or want, any write or compile rights in our production DB.
If you can, keep the box production is on now to accommodate a development instance of your DB when you get the new server.
Just an FYI warning.
                            chuck
                            Chuck
                            秋音的爸爸

                            • #15
                              Red,
                              Here is a way to find out if your app is crushing the SQL parser.

                              first:
                              Code:
-- Normalize a SQL statement so that literals no longer distinguish it:
-- string literals become '#, runs of digits become @, runs of spaces collapse.
function remove_constants( p_query in varchar2 ) return varchar2 as
    l_query long;
    l_char  varchar2(1);
    l_in_quotes boolean default FALSE;
begin
    for i in 1 .. length( p_query )
    loop
        l_char := substr(p_query,i,1);
        if ( l_char = '''' and l_in_quotes )
        then
            l_in_quotes := FALSE;
        elsif ( l_char = '''' and NOT l_in_quotes )
        then
            l_in_quotes := TRUE;
            l_query := l_query || '''#';
        end if;
        if ( NOT l_in_quotes ) then
            l_query := l_query || l_char;
        end if;
    end loop;
    -- replace every digit with @, then collapse repeated @s and repeated spaces
    l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );
    for i in 0 .. 8 loop
        l_query := replace( l_query, lpad('@',10-i,'@'), '@' );
        l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );
    end loop;
    return upper(l_query);
end;
                              Then:
                              Code:
                              create table t1 as select sql_text from v$sqlarea;
                               
                              alter table t1 add sql_text_wo_constants varchar2(1000);
                               
                              update t1 set sql_text_wo_constants = remove_constants(sql_text);
                              
                              select sql_text_wo_constants, count(*) 
                                from t1 
                               group by sql_text_wo_constants 
                              having count(*) > 100 
                               order by 2 desc;
The number of rows you get back shows how many copies of essentially identical SQL it is parsing. The ones with counts under 100 are no big deal.

Your developers can then fix this problem by identifying the program code that is producing these queries.

Here's how it works.
Each query to your DB is like a little program that the DB engine has to compile before it can be run.
This compile overhead (called parsing) usually takes more computing power than the actual retrieval of the information.
To allow the DB to service a high number of concurrent users, one of the tricks Oracle uses is to re-use parsed queries if it can.
However, the queries must be identical for this to work.
So there is a process called binding that allows for the use of variables in this small "program".
It goes like this inside the database:
1. bind 'Fred Blotnic' to variable v_var.
2. parse "select value from table where column = :v_var".
3. substitute 'Fred Blotnic' for v_var.
4. run query.
Next user to query "select value from table where column = :v_var":
1. bind 'John Doe' to v_var.
2. whoops! don't need to parse.
3. substitute 'John Doe' for v_var.
4. run query.
Next user queries with the variable hard coded:
1. oops! no bind.
2. parse "select value from table where column = 'Jane Roe'".
3. no bind to substitute.
4. run query.

                              Imagine the parse step as being as costly as having to re-compile Word in order to edit a document.

Some tools (particularly web-based ones) are really bad about submitting queries with the variables hard coded instead of bound.
This can also happen to apps that are written with a lot of dynamically (and poorly) generated SQL.
The symptoms are just what you have.
The app does great until active users are added to the point where the DB gets overloaded doing parsing.
It's as if you started opening lots of applications in Windows: it does fine until you reach a saturation point, then its performance collapses.
This is a very common and insidious bug in DB front-end programming against Oracle.
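To make the contrast concrete, here is a hedged PL/SQL sketch of the two styles (table t and column col are made up for illustration; 8i's native dynamic SQL takes binds through the USING clause):
Code:
declare
    l_value varchar2(100);
begin
    -- hard coded literal: the DB must parse this afresh for every distinct value
    execute immediate
        'select value from t where col = ''Jane Roe''' into l_value;

    -- bind variable: one shared parse, re-used no matter what value is supplied
    execute immediate
        'select value from t where col = :v_var' into l_value using 'Fred Blotnic';
end;
/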
                              Chuck

PS I'd be interested to hear what result you get if you run the code I gave you.
Our apps use bind variables a lot, so we only got two rows back, both of them off of the ASP stuff.
In fact, that's what I was working on just in the last couple of days.

                              [edit] for clarity
                              Last edited by cjolley; 18 February 2003, 14:00.
                              Chuck
                              秋音的爸爸
