
Active Directory go BOOM!!!!


  • Active Directory go BOOM!!!!

    The entire M$ active directory forest at work just crashed and BURNED, BABY, BURNED!! This of course means that no one can:

    1) Log on if they need to reboot, or have just come in.

    2) Access email, since we use Exchange.

    3) Access any network devices, like shared drives.

    4) Use our internal exchange IM service.

    5) Use various tech tools.

    Oh yeah, someone's head will roll over this one.

    Jammrock
    “Inside every sane person there’s a madman struggling to get out”
    –The Light Fantastic, Terry Pratchett

  • #2
    I hope you have an alibi
    We have enough youth - What we need is a fountain of smart!


    i7-920, 6GB DDR3-1600, HD4870X2, Dell 27" LCD



    • #3
      Backups?
      Main: Dual Xeon LV2.4Ghz@3.1Ghz | 3X21" | NVidia 6800 | 2Gb DDR | SCSI
      Second: Dual PIII 1GHz | 21" Monitor | G200MMS + Quadro 2 Pro | 512MB ECC SDRAM | SCSI
      Third: Apple G4 450Mhz | 21" Monitor | Radeon 8500 | 1,5Gb SDRAM | SCSI



      • #4
        Ouch

        But you can still get on MURC to post this
        Why is it called tourist season, if we can't shoot at them?



        • #5
          I guess it was a rain forest.

          Time for a holiday.
          Chief Lemon Buyer no more Linux sucks but not as much
          Weather nut and sad git.

          My Weather Page



          • #6
            Considering my job is "Active Directory Planning and Engineering" for a financial institution with ~40,000 employees, I'd be very interested in more details on this. I certainly hope your backups were up to date...



            May want to forward that link to the AD guys
            Lady, people aren't chocolates. Do you know what they are mostly? Bastards. Bastard coated bastards with bastard filling. But I don't find them half as annoying as I find naive, bubble-headed optimists who walk around vomiting sunshine. -- Dr. Perry Cox



            • #7
              Rumor has it that they ran a "Microsoft critical update" that the servers didn't like, or some sort of upgrade, because the AD servers were randomly rebooting. Well ... the patch made things worse. Within 7 hours of applying the fix, the servers began rebooting almost non-stop.

              With all the AD servers out of action from the reboots, all AD services, like AD lookup, authentication for network shares and email, etc., went bye-bye.

              Internet, on the other hand, was unaffected since it doesn't rely on AD authentication.

              It took about 3.5 hours before all services were restored.

              Jammrock
              “Inside every sane person there’s a madman struggling to get out”
              –The Light Fantastic, Terry Pratchett



              • #8
                If you could find out which update it was, that would be really handy information to have

                Good thing they managed to fix it without resorting to restoring the entire forest from backup. That involves shutting down all the domain controllers, restoring one from backup in each domain, then formatting and rebuilding the rest of them from scratch and replicating the restored data from the primary. It could take a few days if there are lots of them (we have a total of 22 DCs under our control (root domain and two subdomains), plus another 100+ in other domains that we don't manage). Very nasty process, luckily it hasn't happened to us...
                Lady, people aren't chocolates. Do you know what they are mostly? Bastards. Bastard coated bastards with bastard filling. But I don't find them half as annoying as I find naive, bubble-headed optimists who walk around vomiting sunshine. -- Dr. Perry Cox



                • #9
                  Originally posted by agallag
                  If you could find out which update it was, that would be really handy information to have

                  Good thing they managed to fix it without resorting to restoring the entire forest from backup. That involves shutting down all the domain controllers, restoring one from backup in each domain, then formatting and rebuilding the rest of them from scratch and replicating the restored data from the primary. It could take a few days if there are lots of them (we have a total of 22 DCs under our control (root domain and two subdomains), plus another 100+ in other domains that we don't manage). Very nasty process, luckily it hasn't happened to us...
                  You know, that sounds like a very nasty weakness of AD. Methinks Microshaft ought to think of a way of rapidly restoring AD if a normal restore would take days.
                  Chief Lemon Buyer no more Linux sucks but not as much
                  Weather nut and sad git.

                  My Weather Page



                  • #10
                    Possibly this is of a totally different order of magnitude than such an MS product, but when I was a console operator on a Unisys 7200-2 mainframe (not sure about the model, 3 yrs ago), we had to be able to rebuild from scratch within 60 minutes.... Which we practiced at least quarterly. For critical systems, 3 days downtime is a joke.
                    Join MURCs Distributed Computing effort for Rosetta@Home and help fight Alzheimers, Cancer, Mad Cow disease and rising oil prices.
                    [...]the pervading principle and abiding test of good breeding is the requirement of a substantial and patent waste of time. - Veblen



                    • #11
                      That kind of outage doesn't mean 3 days of downtime. As soon as the first DC is restored, services will continue, although with seriously reduced performance. That could take a maximum of 15-30 minutes per domain, since we do hourly backups of the entire directory to the local drive on every single DC (we do daily, weekly, and monthly backups to tape as well). Once the PDC emulator box is restored in a domain, that domain is once again live and providing authentication. Each subsequent DC you rebuild will improve the performance, until you're fully back to normal.
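For anyone who wants the ordering spelled out, here is a minimal Python sketch of the idea above: in each domain the PDC emulator is restored first, and the remaining DCs are rebuilt afterwards. The `DomainController` type, the `recovery_order` helper, and the server names are all hypothetical and only illustrate the sequencing; this is not an actual AD restore tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainController:
    # Hypothetical record describing one DC; not a real AD API.
    name: str
    domain: str
    is_pdc_emulator: bool = False


def recovery_order(dcs):
    """Group DCs by domain and put the PDC emulator first in each group."""
    by_domain = {}
    for dc in dcs:
        by_domain.setdefault(dc.domain, []).append(dc)
    # sorted() is stable, so the non-PDC DCs keep their original order.
    return {
        domain: [dc.name for dc in sorted(members, key=lambda d: not d.is_pdc_emulator)]
        for domain, members in by_domain.items()
    }


if __name__ == "__main__":
    example = [
        DomainController("dc2", "root"),
        DomainController("dc1", "root", is_pdc_emulator=True),
        DomainController("sub1", "child", is_pdc_emulator=True),
        DomainController("sub2", "child"),
    ]
    for domain, order in recovery_order(example).items():
        print(domain, "->", ", ".join(order))
```

The first name in each list is the box whose restore brings the domain back to life; everything after it only adds capacity.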

                      Edit: Also, I haven't heard of this kind of outage on a real production AD forest ever happening outside of a test lab. It's really quite reliable. Our forest has been live for more than two years now with not a minute of downtime. That's right, 100.0% uptime. The only restores we've done so far were as a result of accidental deletion of objects by support people.
                      Last edited by agallag; 27 September 2003, 08:46.
                      Lady, people aren't chocolates. Do you know what they are mostly? Bastards. Bastard coated bastards with bastard filling. But I don't find them half as annoying as I find naive, bubble-headed optimists who walk around vomiting sunshine. -- Dr. Perry Cox



                      • #12
                        Our AD domain has been more or less reliable too with only one major problem in the 3 years it has been running.
                        That was when we lost a DC and it just happened to be running all five FSMO roles - the person who planned the network in the first place forgot to distribute those tasks between two DCs.
                        Oh and it just happened the same server was also running the Global Catalogue for the Exchange Server too.

                        However, we just transferred three of the FSMO roles over through the GUI tools, forced the other two with "Seize" commands and created a new Global Catalogue on another server.
                        Total downtime around 2hrs and we then spent as long as we liked getting the failed server back up and running - new RAID controller required.
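A rough Python sketch of the check that would have caught this early: flag any DC that holds all five FSMO roles. The rule of thumb in `move_action` (a graceful transfer while the current holder is reachable, a seize only when it is gone) is general practice rather than a record of what was done here. The helper names and server names are hypothetical, and nothing below talks to a real directory.

```python
from collections import Counter

# The five FSMO roles in an AD forest.
FSMO_ROLES = (
    "Schema Master",
    "Domain Naming Master",
    "RID Master",
    "PDC Emulator",
    "Infrastructure Master",
)


def dcs_holding_all_roles(holders):
    """Return DCs that hold every FSMO role (a single point of failure).

    `holders` maps role name -> DC name (hypothetical inventory data).
    """
    counts = Counter(holders[role] for role in FSMO_ROLES)
    return sorted(dc for dc, n in counts.items() if n == len(FSMO_ROLES))


def move_action(holder_reachable):
    """Transfer gracefully if the old holder is reachable, otherwise seize."""
    return "transfer" if holder_reachable else "seize"


if __name__ == "__main__":
    holders = {role: "dc1" for role in FSMO_ROLES}
    print("All roles on one box:", dcs_holding_all_roles(holders))
    print("Old holder is dead, so:", move_action(holder_reachable=False))
```

Running a check like this against the role inventory after any DC change is cheap insurance against ending up with everything on one server again.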

                        Other than that one incident the AD has been absolutely excellent.
                        It cost one penny to cross, or one hundred gold pieces if you had a billygoat.
                        Trolls might not be quick thinkers but they don't forget in a hurry, either



                        • #13
                          ROFLMAO!!!!

                          It turns out that the server reboots were caused by loading some sort of Symantec network security program. Applying the M$-recommended patches and fixes only escalated the problem and made things significantly worse.

                          Removing the Symantec product fixed the reboot problem.

                          Jammrock
                          “Inside every sane person there’s a madman struggling to get out”
                          –The Light Fantastic, Terry Pratchett



                          • #14
                            Yeah, that sounds about right. Don't they have a test lab for that sort of thing?
                            Lady, people aren't chocolates. Do you know what they are mostly? Bastards. Bastard coated bastards with bastard filling. But I don't find them half as annoying as I find naive, bubble-headed optimists who walk around vomiting sunshine. -- Dr. Perry Cox



                            • #15
                              They do and they did. But the problem took a while before it started, then got progressively worse.
                              “Inside every sane person there’s a madman struggling to get out”
                              –The Light Fantastic, Terry Pratchett
