Diagnostics of adrenal incidentaloma

Reviewed: 07-05-2024

Clinical question

What is the optimal diagnostic work-up and follow-up of an adrenal incidentaloma found on a CT scan?

Recommendation

The following recommendations assume an adrenal incidentaloma >1 cm in a patient without an oncological history.

Perform biochemical evaluation of every incidentaloma, have the patient clinically examined by an endocrinologist, and arrange follow-up if abnormalities are found.
Omit further imaging follow-up if the incidentaloma has evidently benign characteristics (predominantly macroscopic fat, density < 10 HU, or benign calcification) and is smaller than 4 cm.
When prior imaging is available, evaluate whether the incidentaloma shows growth. If the diameter/volume has been unchanged for > 1 year and the features are benign, further follow-up can be omitted.
Regard nonspecific adrenal nodules of 1-2 cm as most likely benign. For HU 11-20 and a size of 1-4 cm, consider additional FDG-PET or multiphase CT, or perform follow-up CT or MRI after 1 year. MRI is preferred in children and pregnant women.
Discuss in a multidisciplinary meeting when the adrenal lesion shows growth of more than 20% (at least 5 mm), HU >20, or a diameter > 4 cm with HU 11-20.
Decide together with the patient whether to proceed to adrenalectomy when a non-hormonally active incidentaloma is larger than 4 cm, depending on clinical symptoms and further imaging features such as density, heterogeneity and necrosis.
Discuss in a multidisciplinary meeting when malignancy is suspected, if the incidentaloma is hormonally active and/or surgery is considered. Preferably hold this meeting in centers specialized in this field (adrenocortical carcinoma).
See flowchart Diagnostics
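The imaging-related recommendations above can be read as a small decision procedure. The following is a minimal sketch of that reading, not the guideline's official flowchart. The class and function names are hypothetical, and several choices are assumptions made only for illustration: the order in which the rules are checked, treating a density of exactly 10 HU as benign, and treating a diameter of exactly 4 cm as "not larger than 4 cm". It applies only to patients without an oncological history and does not replace biochemical evaluation or clinical judgement.

```python
from dataclasses import dataclass
from enum import Enum


class Advice(Enum):
    OUT_OF_SCOPE = "outside the scope of these recommendations"
    NO_IMAGING_FOLLOW_UP = "no further imaging follow-up"
    ADDITIONAL_IMAGING = "consider FDG-PET, multiphase CT, or follow-up CT/MRI after 1 year"
    SHARED_DECISION = "shared decision with the patient on adrenalectomy"
    MDT = "discuss in a multidisciplinary meeting"


@dataclass
class Lesion:
    diameter_cm: float
    hu: float                         # density on non-contrast CT
    evidently_benign: bool = False    # e.g. macroscopic fat or benign calcification
    hormonally_active: bool = False
    significant_growth: bool = False  # growth > 20% and at least 5 mm


def imaging_advice(lesion: Lesion) -> Advice:
    """Illustrative reading of the recommendations; rule order is an assumption."""
    if lesion.diameter_cm <= 1:
        return Advice.OUT_OF_SCOPE
    if lesion.hormonally_active or lesion.significant_growth:
        return Advice.MDT
    large = lesion.diameter_cm > 4
    if lesion.evidently_benign and not large:
        return Advice.NO_IMAGING_FOLLOW_UP
    if lesion.hu > 20 or (large and lesion.hu > 10):
        return Advice.MDT
    if large:
        return Advice.SHARED_DECISION
    if lesion.hu <= 10:
        return Advice.NO_IMAGING_FOLLOW_UP
    return Advice.ADDITIONAL_IMAGING


if __name__ == "__main__":
    for lesion in (Lesion(2.0, 5), Lesion(3.0, 15), Lesion(5.0, 15)):
        print(lesion.diameter_cm, lesion.hu, "->", imaging_advice(lesion).value)
```

Writing the rules out this way mainly exposes the boundary cases (exactly 10 HU, exactly 4 cm) that the prose leaves open; a local protocol would have to settle these explicitly.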

Considerations

Benefits and harms of the intervention and quality of the evidence

The search yielded one systematic review (Sabet, 2016), which included eleven studies (Birsen, 2014; Reginelli, 2014; Allan, 2013; Henning, 2009; Meyer, 2006; Mantero, 2000; Sworczak, 2000; Bergestrom, 2000; Kasperlik, 1997; Herrera, 1991; Hubbard, 1989), and two individual studies (Foo, 2018; Corwin, 2022).

The included studies examined various factors on a CT scan that can distinguish between malignancy and a benign adrenal incidentaloma. The factors described were tumor size, tumor heterogeneity, tumor (ir)regularity, tumor margin, density and washout on the CT scan. The certainty of the evidence is low because of the design of the included studies: the systematic review included studies of both functional and non-functional tumors, different reference tests were used, and test interpretation and the timing of the CT scan and reference test were unclear. This means that we cannot be certain about the outcomes of the studies.

Looking at the individual factors on the CT scan, the systematic review by Sabet (2016) shows a sensitivity of 0.91 (95%CI 0.83-0.95) for a tumor size of 3 centimeters and 0.91 (95%CI 0.82-0.96) for a tumor size of 4 centimeters. For these tumor sizes the confidence interval is narrower than for the sensitivity at tumor sizes of 5 and 6 centimeters, 0.78 (0.67-0.87) and 0.74 (0.63-0.82) respectively. This suggests that the certainty about the sensitivity of tumor size is greater at a cut-off of 3 or 4 centimeters. Accordingly, for patients without a known history of malignancy, an adrenal incidentaloma smaller than 4 cm with a density of 10 HU or lower would be regarded as benign. Interestingly, the authors advise follow-up in this case, without naming features that fit a benign lesion so well that further follow-up can be omitted. Examples are an evident myelolipoma, a lipid-rich adenoma or a cyst. It is precisely through appropriate discharge from follow-up that a cost-efficient policy could be achieved. The specificity for malignant lesions also rises sharply from the chosen threshold of 4 cm, which supports earlier studies on this value. However, the positive and negative Likelihood Ratios (LR) found for size neither confirm nor exclude malignancy, which is why the authors already indicate that other variables should be weighed for a definitive diagnosis. Of these variables, the density measured in Hounsfield Units (HU) appears to be the strongest contributing candidate. For this measurement, a Region of Interest should be placed covering about two-thirds of the incidentaloma. The morphology of the lesions themselves (heterogeneity, margins, irregular shape and calcifications) shows less significant Likelihood Ratios.
This means that benign incidentalomas, for example, can also show an irregular shape or contain intralesional calcifications. This and other studies do show that a maximum diameter of less than 4 cm in adrenal nodules carries such a low probability of malignant etiology that follow-up alone could suffice. This holds for a pretest probability of malignancy averaging about 5% in patients without an oncological history.
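To make the remark about likelihood ratios concrete, the sketch below derives LR+ and LR- from a sensitivity and specificity and converts the roughly 5% pretest probability into a post-test probability. It uses the pooled point estimates for the 4 cm cut-off summarized in this module (sensitivity 0.91, specificity 0.71). The likelihood ratios computed here are derived from those point estimates for illustration only and are not the values reported by Sabet (2016); the function names are hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Pooled point estimates for the 4 cm cut-off as summarized in this module.
    lr_pos, lr_neg = likelihood_ratios(sensitivity=0.91, specificity=0.71)
    pretest = 0.05  # approximate pretest probability without oncological history
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
    print(f"post-test if >= 4 cm: {post_test_probability(pretest, lr_pos):.1%}")
    print(f"post-test if <  4 cm: {post_test_probability(pretest, lr_neg):.1%}")
```

Under these assumptions a lesion above the cut-off moves the probability of malignancy from about 5% to roughly 14%, and a lesion below it to under 1%. This is consistent with the text: size alone does not confirm malignancy, but a small diameter makes it unlikely.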

In addition, the studies included in the review by Sabet (2016) that examined tumor heterogeneity show sensitivities of 0.79, 0.93 and 0.75 respectively. The studies included in the review by Sabet (2016) that examined tumor density show sensitivities of 1, 0.95 and 1 respectively.

The study by Foo (2018), which examined an algorithm for predicting malignancy, takes two factors into account: tumor size and density. This study shows a sensitivity of 0.75. A Cleveland Clinic risk stratification model was applied for this purpose. The eventual occurrence of malignancy in this retrospective analysis of patients without a history of malignancy was 8%. A definitive preoperative diagnosis of ACC is not well possible on the basis of cytology or biochemistry alone, so features such as size and density on CT become important discriminating variables for estimating the probability of a malignant process. Foo et al. are in line with recent guidelines that suggest a cut-off value of 4 cm for surgical treatment. Hormonally active tumors were excluded from this study. Points were assigned for size (1, 2 or 3 points for a diameter <4, 4-6 or > 6 cm respectively) and for density (1, 2 or 3 points for a density on non-contrast CT of <10 HU, 10-20 HU or >20 HU respectively). No malignancies were identified at scores 2 or 4. One ACC, however, had a score of 3; at scores 5 and 6 the incidence of ACC was 27%. The highest sensitivity (75%) and specificity (87%) were found at a cut-off value of 5. In addition, an association with an increased risk of malignancy was found for heterogeneity of the incidentaloma (p=0.0016) and for a relative washout percentage below 40% (p=0.0178). Because one ACC was misclassified by the stratification model, the authors proposed extending the algorithm with an additional parameter, for which relative washout could be well suited. This corresponds with the currently proposed European ESE and ENSAT guidelines, which advise an adrenal-specific washout CT when features on non-contrast or post-contrast CT are not discriminatory (Fassnacht, 2016).
The timing of this CT then depends on the initial size of the lesion.

If relative washout is included as a factor, it should be borne in mind that tumors with large areas of necrosis or internal hemorrhage give a less reliable contrast washout result.

The study by Corwin (2022) examined the diagnostic accuracy of the factors tumor size and washout for distinguishing between malignant and benign nodules. This study reported a sensitivity of 0.75 (95%CI 0.70-0.80) for the factor >60% washout and a sensitivity of 0.77 (95%CI 0.72-0.82) for the combined factors >60% washout and < 4 centimeters tumor size. The negative predictive value reported by this study was very low for the single and the combined factors, 4.8% and 1.4% respectively.

Despite the uncertainty about the results, the factors tumor size (< 4 centimeters), tumor density and tumor heterogeneity do recur as sensitive when it comes to distinguishing between malignancy and benignity of an adrenal incidentaloma.

Values and preferences of patients (and, where applicable, their caregivers)

When the CT features of the incidentaloma clearly point to a benign or a malignant lesion, that suspicion will often be decisive for the management to be followed. It is of course important to discuss any follow-up imaging, possible surgery or, conversely, discharge from follow-up thoroughly with patients. When the incidentaloma has nonspecific features, follow-up imaging can help in establishing a diagnosis. Where several modalities are possible, these options should be discussed with the patient as part of informed consent. In some patient groups (children, pregnant women) it is advisable to limit the radiation exposure from CT or PET as much as possible and to opt for an MRI scan as the next step. However, some patients cannot tolerate the longer scan duration or are severely hindered by claustrophobia. Other imaging options may then be preferable.

Costs (resource use)

Because the prevalence of adrenal incidentalomas is high and imaging by CT and/or MRI is still increasing, choosing an efficient strategy that prevents unnecessary diagnostics or operations and still diagnoses the rarer malignancies in time has important implications for healthcare costs. In addition, preventing unnecessary radiation exposure from follow-up CT scans should play a role in choosing the right follow-up strategy. Chomsky-Higgins (2018) examined the costs and health outcomes (in QALYs) of different surveillance strategies in patients with non-functional incidentalomas smaller than 4 cm. Above an adrenocortical carcinoma prevalence of 0.7%, one-time surveillance of incidentalomas was found to be the most effective strategy. More frequent follow-up yielded no significant improvement in Quality Adjusted Life Years (QALYs) and also led to higher costs, higher cumulative radiation doses and more false-positive test results. Given the substantial percentage of benign lesions among incidentalomas under 4 cm, reduced surveillance can therefore be suggested. For older patients above 60 years of age, the authors even reported forgoing further surveillance as the more cost-effective strategy. Biochemical evaluation can form part of the one-time follow-up for nonspecific adrenal nodules. The ESE and ENSAT guidelines (2016) are already broadly in line with limiting surveillance, as they recommend forgoing further surveillance for hormonally inactive lesions with a density below 10 HU regardless of size. Chomsky-Higgins (2018) intentionally used a model that overestimates the prevalence of ACC. A lower prevalence is associated with no surveillance as the optimal strategy.
Given the known low percentage of malignancies among incidentalomas in the Netherlands, the outcomes of the study can reasonably be translated to Dutch healthcare. In the future, large datasets could help further in stratifying risk groups to shape follow-up optimally.

Acceptability, feasibility and implementation

The recommendations formulated in this module help to identify features on CT that facilitate differentiation between benign adrenal lesions and malignancies. The working group assumes sufficient access to CT in general Dutch healthcare. Parameters such as a size above 4 cm, density measurement and evaluation of washout are easy to apply with the right instructions. Targeted recommendations for further follow-up should prevent unnecessary surveillance and unnecessary operations in this large group. The imaging modality and techniques mentioned can also be applied in non-specialized centers. For multiphase adrenal CT, widely varying protocols can be found in the literature, which makes the available literature only moderately comparable. The recommendation uses the more commonly reported timings after the contrast bolus. Because of the frequent selection bias in older studies and the limited value of absolute washout above 60% in the study by Corwin, the working group considers that the feature washout should be interpreted with more caution than the features size and density. Foo et al. do, however, report a significant association between relative washout percentages below 40% and an increased probability of malignant lesions. The European ESE/ENSAT guidelines mention the option of performing washout CT when uncertainty persists after the initial CT, but are also cautious about the value of washout profiles, based on the available literature (Fassnacht, 2016). According to the working group, access to (non-contrast) CT for initial risk stratification should be guaranteed for everyone in the Netherlands. Discussion in a multidisciplinary team, for cases that require it, should also be sufficiently feasible in general Dutch hospitals.
When ACC is strongly suspected, the working group prefers that such cases, because of their rarity, be referred to centers specialized in this condition. The CT cut-off points named in this guideline have long been part of the radiology training curriculum. Better familiarity with the applicable guidelines could help to increase their use in standard reporting.

Although the accessibility of CT as an imaging modality and its high spatial resolution often make it the scan of choice for characterizing adrenal lesions, other modalities can also add value to this evaluation. Examples are MRI and FDG-PET. For children and pregnant women, MRI can even be considered as the first-choice scan, since unnecessary radiation dose should be avoided. Signal loss due to a chemical shift artifact on an out-of-phase MRI scan can help to diagnose a lipid-rich adenoma. Visual comparison of in-phase and out-of-phase scans is often sufficient to notice a possible adenoma. However, false-positive results have been described here as well, in which fat-containing metastases of, for example, a hepatocellular carcinoma or a renal cell carcinoma can mimic an adenoma. The sensitivity of MRI appears to be higher at HU values below 30.

Rationale of the recommendation: weighing of arguments for and against the interventions

When an adrenal incidentaloma is detected, it is important to determine whether it is a benign or a malignant entity, also in order to prevent unnecessary and costly follow-up when the lesion is benign. Such follow-up can also lead to anxiety and uncertainty for the patient. A rare malignancy such as ACC, however, often has a poor prognosis and should be detected and operated on without delay. In the absence of related symptoms, the radiologist often has a gatekeeper role in the detection and adequate description of these lesions. Features such as size, density and, if performed, washout should therefore be stated accurately in the radiology report. If the management of an incidentaloma needs to be discussed, this should preferably be done in a multidisciplinary team. The endocrinologist, surgeon, radiologist and pathologist are essential members.

Because by far the most adrenal incidentalomas are lipid-rich adenomas, a non-contrast CT acquisition will often suffice. Many of these lesions are already detected on index imaging performed for other clinical reasons, for example a chest CT. A density < 10 HU is highly specific for lipid-rich adenoma (98%), so that, certainly when the lesion is small, a benign lesion may be assumed. About 30% of adenomas are lipid-poor, however, so that their density can overlap with that of other lesions such as ACC and pheochromocytoma. For small adrenal nodules (1-2 cm), a single follow-up scan (6-12 months after detection) can be considered to check for growth. For larger adrenal lesions and uncertain features, imaging follow-up at shorter notice by means of a multiphase CT is preferred. Although protocols differ, the most established method consists of three scan phases: a non-contrast CT, a CT 60-90 seconds after contrast bolus injection, and a delayed contrast phase after 15 minutes. Washout profiles can support confirming or excluding a malignancy. It should be kept in mind, however, that the value of washout in homogeneous nodules > 10 HU in patients without a known history of malignancy is relatively limited (Corwin et al. 2022). Earlier studies reporting on washout of adrenal incidentalomas appear to be subject to selection bias because they also included patients with known metastases. As a result, the reported prevalence of malignant nodules may not be representative of the prevalence of such malignancies in the general population (without cancer). In addition, the value of washout is more limited in smaller nodules and the risk of erroneous measurements is greater. Sabet et al. also report low positive and negative Likelihood Ratios for washout of adrenal nodules.
The working group therefore regards multiphase washout CT as an option for follow-up imaging, but acknowledges that this method has limitations, for both relative and absolute contrast washout.
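The module refers to absolute and relative washout without stating how they are calculated. The sketch below uses the commonly used definitions from the radiology literature, which is an assumption since this module does not give them: absolute washout relates the attenuation drop between the enhanced and delayed phases to the enhancement over the non-contrast phase, while relative washout relates it to the enhanced attenuation only. Function names and the example HU values are hypothetical.

```python
def absolute_washout(unenhanced_hu: float, enhanced_hu: float, delayed_hu: float) -> float:
    """Absolute percentage washout; needs all three phases of the multiphase CT."""
    enhancement = enhanced_hu - unenhanced_hu
    if enhancement <= 0:
        raise ValueError("lesion does not enhance; washout is undefined")
    return 100.0 * (enhanced_hu - delayed_hu) / enhancement


def relative_washout(enhanced_hu: float, delayed_hu: float) -> float:
    """Relative percentage washout; usable when no non-contrast phase exists."""
    if enhanced_hu <= 0:
        raise ValueError("enhanced attenuation must be positive")
    return 100.0 * (enhanced_hu - delayed_hu) / enhanced_hu


if __name__ == "__main__":
    # Hypothetical nodule: 25 HU non-contrast, 85 HU at 60-90 s, 45 HU at 15 min.
    apw = absolute_washout(25, 85, 45)
    rpw = relative_washout(85, 45)
    print(f"absolute washout: {apw:.1f}%")  # compared with the 60% threshold
    print(f"relative washout: {rpw:.1f}%")  # compared with the 40% threshold
```

For this hypothetical nodule the absolute washout is about 67% and the relative washout about 47%. As the text stresses, such values are supportive at best, especially in small nodules or lesions with necrosis or hemorrhage.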

Recently, the value of FDG-PET in the risk stratification of adrenal incidentalomas has been recognized. Salgues et al. (2021) report a sensitivity of 90%, a specificity of 92.6%, a PPV of 69.2% and an NPV of 98%, which supports that the absence of increased FDG uptake can reliably exclude malignancy. This and other studies also show the possibility of false positives, however, so conclusions from FDG-PET images should preferably be considered in combination with the results of other diagnostic tests. For both MRI and FDG-PET, the certainty of the evidence is still relatively lower because of the small numbers in the reported studies, and the working group advises discussing their results in a shared decision process. These techniques can, however, be an alternative to washout CT scans for nonspecific incidentalomas larger than 1 cm. The characteristics of these other imaging techniques fall outside the scope of this module and clinical question.

In a recent update of the ESE guideline by Fassnacht et al. (2023), more weight is given to a density below 10 HU, and the 4 cm diameter cut-off is less leading. The working group endorses these recommendations. The 4 cm limit can still be used to stratify patient groups with adrenal nodules that show a density of 11-20 HU. For the smaller nodules of 1 to 4 cm, direct additional imaging in the form of FDG-PET, multiphase CT or MRI can help to differentiate. The choice of imaging type can depend on local experience and preference. More than 90% of incidentalomas in this group are also benign. Alternatively, follow-up CT or MRI can be performed after 6-12 months. In case of growth of more than 20% (at least 5 mm), discussion in a multidisciplinary meeting (MDO) is advised. Conversely, in the absence of growth the patient can be discharged from imaging follow-up. The certainty of the evidence for the continued interval follow-up advised in the new European guideline is low. Taking into account rising healthcare costs and possibly more limited scan capacity in the Dutch setting in the future, the working group therefore considers it defensible to view any follow-up for growth of less than 20% critically, and to consult within an MDO where necessary.
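The growth criterion combines a relative and an absolute threshold, which is easy to misapply for small nodules. A minimal sketch follows, assuming growth is measured on the largest diameter in millimeters and that "at least 5 mm" is inclusive; both are assumptions, and the function name is hypothetical.

```python
def significant_growth(previous_mm: float, current_mm: float,
                       min_pct: float = 20.0, min_mm: float = 5.0) -> bool:
    """True when growth exceeds min_pct AND is at least min_mm in absolute terms."""
    if previous_mm <= 0:
        raise ValueError("previous diameter must be positive")
    increase_mm = current_mm - previous_mm
    increase_pct = 100.0 * increase_mm / previous_mm
    return increase_pct > min_pct and increase_mm >= min_mm


if __name__ == "__main__":
    print(significant_growth(20, 26))  # 30% and 6 mm: both thresholds met
    print(significant_growth(15, 19))  # about 27% but only 4 mm
    print(significant_growth(40, 46))  # 6 mm but only 15%
```

Requiring both conditions prevents a 3 mm measurement difference in a 12 mm nodule from counting as growth, while still catching substantial enlargement of larger lesions.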

Although the specific endocrinological work-up falls outside the scope of this module, the working group considers it important to emphasize that every patient with an incidentaloma larger than 1 cm should undergo biochemical evaluation and clinical examination for signs of hormonal overproduction. For hormonally active incidentalomas and clinical abnormalities, discussion within an MDO is advised when surgery is considered. For mild autonomous cortisol secretion (MACS), defined as a cortisol concentration above 50 nmol/L after a 1 mg dexamethasone suppression test without clinical features of Cushing's syndrome, follow-up by the endocrinologist is advised.

Since in many cases the initial detection of these incidentalomas will take place on a contrast-enhanced CT (usually in the portal venous phase), size and density will determine whether further work-up in the form of a dedicated follow-up scan is needed. The radiologist can further help to streamline adequate care by using standardized reporting (see module 'Radiologieverslag'), in which clear mention of features that exclude malignancy or make it more likely is important.

Evidence base

Background

At present there is no Dutch guideline that can give direction to the management and follow-up of adrenal incidentaloma. The prevalence of incidentalomas found in the population is considerable (0.35-9% of all abdominal CT scans, rising to as much as 10% in the elderly) and may increase further as imaging increases (Sabet, 2016). Given that imaging studies are still increasing and the population is aging, the number of adrenal incidentalomas found is also expected to increase. Overdiagnosis and overtreatment of wrongly suspected malignancy are a real risk here. A definitive preoperative diagnosis of adrenocortical carcinoma (ACC) is not well possible on the basis of cytology or biochemical tests alone. It is therefore important that imaging provides leads for a treatment or follow-up policy. The literature names specific criteria on CT and MRI that are more predictive of a possible malignancy. This module seeks to answer the question of which imaging features on CT are important to distinguish in an incidentaloma. After all, timely exclusion of, for example, an adrenocortical carcinoma is of great importance. In this way the group of patients with incidentalomas with suspicious features can be identified sooner, and unnecessary follow-up imaging can be prevented for incidentalomas with favorable features.

Conclusions

No GRADE

No evidence was found regarding factors on CT scan on overall survival in patients with an adrenal incidentaloma found on the CT scan.

Source: -

Low GRADE

The evidence suggests there is low confidence in the sensitivity regarding factors tumor size, tumor heterogeneity, tumor irregularity, tumor margin, density and washout on a CT-scan in differentiating malignancy from benignancy in patients with an adrenal incidentaloma.

Source: Sabet, 2016; Foo, 2018; Corwin, 2022

Low GRADE

The evidence suggests there is low confidence in the negative predictive value regarding factors tumor size and washout on a CT-scan in differentiating malignancy from benignancy in patients with an adrenal incidentaloma.

Source: Corwin, 2022

Low GRADE

The evidence suggests there is low confidence in the specificity regarding factors tumor size, tumor heterogeneity, tumor irregularity, tumor margin, density and washout on a CT-scan in differentiating malignancy from benignancy in patients with an adrenal incidentaloma.

Source: Sabet, 2016; Foo, 2018; Corwin, 2022

Low GRADE

The evidence suggests there is low confidence in the positive predictive value regarding factors tumor size and washout on a CT-scan in differentiating malignancy from benignancy in patients with an adrenal incidentaloma.

Source: Corwin, 2022

Low GRADE

The evidence suggests there is low confidence in the area under the curve regarding factors tumor size and density on a CT-scan in differentiating malignancy from benignancy in patients with an adrenal incidentaloma.

Source: Foo, 2018

Summary of literature

Description of studies

Sabet (2016) included 36 cohort studies in the systematic review. For the scope of this module, however, only the eleven studies that discussed patients without a prior history of malignancy were included (Birsen, 2014; Reginelli, 2014; Allan, 2013; Henning, 2009; Meyer, 2006; Mantero, 2000; Sworczak, 2000; Bergestrom, 2000; Kasperlik, 1997; Herrera, 1991; Hubbard, 1989). Cohort studies were included if the CT scan was discussed as the diagnostic test, a gold standard test (operation, biopsy, Fine Needle Aspiration (FNA) or follow-up for more than six months) was performed, the imaging procedure was fully explained, and the criteria for the index test were clearly described with accepted thresholds. In total 1985 patients were included.

The included studies in the review described factors which diagnosed malignancy of the adrenal incidentaloma on the CT scan. Eleven studies described the factor size, three studies described the factor mass appearance and one study described the factor density. There were different reference tests used: Operation, follow-up, biopsy and FNA.

Sabet (2016) reported pooled estimates for sensitivity, specificity, positive and negative likelihood ratios for size cut-offs of the adrenal gland. Sabet (2016) also reported sensitivity, specificity, positive and negative likelihood ratios for different appearance characteristics and for different densities of the adrenal masses. Interestingly, they included a group of patients with a known malignancy.

Foo (2018) conducted a retrospective analysis of prospectively collected data from the Endocrine Surgery Database. Patients referred for evaluation of an AI in the period between 2004 and 2014 were included. Patients with symptoms presenting for investigation of adrenal tumors and patients with a known extra-adrenal primary cancer screened for metastatic disease were excluded. In total 96 patients were included in the study, with a median age of 59 years and a mean tumor diameter of 34 mm. Foo (2018) performed a univariate analysis of factors associated with malignancy. Secondly, the diagnostic accuracy of the Scaled Score Algorithm, developed by the research group of the Cleveland Clinic, was measured. The Scaled Score Algorithm contains two factors: tumor size and tumor density (Birsen, 2014). Both factors are scored. Tumor size <40 mm, between 40 and 60 mm or > 60 mm scored 1, 2 or 3 respectively. Tumor density on non-contrast CT <10 HU, between 10 and 20 HU and >20 HU scored 1, 2 or 3 respectively. Sum scores were calculated.
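The scoring rule described above can be sketched in a few lines. This is an illustration based only on the thresholds given in this summary, not the original authors' implementation. The function names are hypothetical, and treating the boundary values (exactly 40 or 60 mm, exactly 10 or 20 HU) as belonging to the middle category is an assumption.

```python
def size_points(diameter_mm: float) -> int:
    """1, 2 or 3 points for <40 mm, 40-60 mm, >60 mm."""
    if diameter_mm < 40:
        return 1
    if diameter_mm <= 60:
        return 2
    return 3


def density_points(hu: float) -> int:
    """1, 2 or 3 points for <10 HU, 10-20 HU, >20 HU on non-contrast CT."""
    if hu < 10:
        return 1
    if hu <= 20:
        return 2
    return 3


def scaled_score(diameter_mm: float, hu: float) -> int:
    """Sum score, ranging from 2 to 6."""
    return size_points(diameter_mm) + density_points(hu)


def flagged_as_suspicious(diameter_mm: float, hu: float, cutoff: int = 5) -> bool:
    """Apply the cut-off of 5 at which Foo (2018) evaluated accuracy."""
    return scaled_score(diameter_mm, hu) >= cutoff


if __name__ == "__main__":
    for diameter, hu in [(34, 8), (45, 25), (65, 30)]:
        print(diameter, hu, scaled_score(diameter, hu), flagged_as_suspicious(diameter, hu))
```

A 34 mm nodule with 8 HU scores 2 and is not flagged, whereas a 45 mm nodule with 25 HU scores 5 and is flagged. Because only two variables enter the score, a small but dense lesion cannot reach the cut-off of 5, which is the kind of limitation that led the authors to propose an additional parameter.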

The reference test comprised histopathological examination for surgical cases or follow-up for at least six months for non-surgical cases.

The overall prevalence of malignancy in the study of Foo (2018) was 8%.

Foo (2018) calculated sensitivity, specificity and area under the ROC curve for the total algorithm score of 5 or higher.

Corwin (2022) performed a retrospective cohort study including data from six institutions in the United States between 2003 and 2017. Patients aged eighteen years or older who underwent an adrenal washout CT examination were included. In total 336 nodules in 299 patients were included in the analysis. No patient characteristics were reported. The diagnostic accuracy of the factors nodule size (<4 cm versus ≥ 4 cm) and washout (presence versus absence of washout ≥ 60%) on CT was calculated.

The reference standard comprised different tests. First, any available pathological specimen was used as the reference standard. In the absence of pathological specimens, abdominal CT, chest CT, lumbar spine CT, lumbar spine MRI or PET/CT examinations performed at least one year before or after the washout CT examination were used as the reference standard. In patients with no pathology or imaging follow-up, medical records were reviewed to identify clinical notes.

The overall prevalence of malignancy in the study of Corwin (2022) was 1.5%.

Corwin (2022) calculated sensitivity, specificity, positive and negative predictive value of the factors washout and nodule size on CT.

It should be noted that the negative predictive value in this study described the predictive value of presence of adrenal malignancy. The positive predictive value described the predictive value of presence of benign nodules. A washout >60% and a nodule size of <4 cm were the chosen characteristics for absence of disease.
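Because Corwin (2022) treats the benign finding as the "positive" result, the reported predictive values read inverted compared with the usual malignancy-oriented convention. The sketch below illustrates this with a 2x2 table. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function names are assumptions.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2x2 accuracy measures for whichever class is called 'positive'."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def flip_target(tp: int, fp: int, fn: int, tn: int) -> tuple[int, int, int, int]:
    """Re-express the same table with the other class as the target condition."""
    return tn, fn, fp, tp  # new (tp, fp, fn, tn)


if __name__ == "__main__":
    # Hypothetical counts, target = benign, test positive = washout >= 60%.
    benign_view = dict(tp=225, fp=1, fn=75, tn=4)
    print("benign as target:   ", diagnostic_metrics(**benign_view))

    tp, fp, fn, tn = flip_target(**benign_view)
    # Same nodules, target = malignant, test positive = washout < 60%.
    print("malignant as target:", diagnostic_metrics(tp, fp, fn, tn))
```

Flipping the target swaps sensitivity with specificity and PPV with NPV. With these hypothetical counts, the "NPV" of about 5% in the benign-oriented view is the same quantity as the PPV for malignancy in the conventional view: few nodules without sufficient washout turn out to be malignant when prevalence is low.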

Results

Overall survival

No studies reported overall survival.

Diagnostic accuracy

Three studies reported diagnostic accuracy (Sabet, 2016; Foo, 2018; Corwin, 2022).

Sensitivity

All three studies reported sensitivity (Sabet, 2016; Foo, 2018; Corwin, 2022). The systematic review of Sabet (2016) reported sensitivity of different factors on CT: Tumor size, mass appearance, tumor density. Foo (2018) reported sensitivity of the Scaled Score Algorithm which includes the factors tumor size and tumor density. Corwin (2022) reported sensitivity of washout on CT (in combination with nodule size).

All results regarding sensitivity are summarized in table 1.

Table 1. Sensitivity regarding factors on CT for diagnosis of adrenal malignancy (Sabet, 2016; Foo, 2018; Corwin, 2022)

Study	Factor(s)	Sensitivity (95%CI)
Sabet (2016)	Cut-off 3 cm	0.91 (0.83-0.95) ^a
	Cut-off 4 cm	0.91 (0.82-0.96) ^a
	Cut-off 5 cm	0.78 (0.67-0.87) ^a
	Cut-off 6 cm	0.74 (0.63-0.82) ^a
	Tumor heterogeneity (3 studies included)	0.79 0.93 0.75
	Tumor irregularity (2 studies included)	0.41 0.50
	Tumor rough margin (1 study included)	0.56
	10 HU density (1 study included)	1
	16 HU density (1 study included)	0.95
	20 HU density (1 study included)	1
Foo (2018)	Scaled score algorithm (tumor size and tumor density)	0.75
Corwin (2022)	≥ 60% washout	0.75 (0.70-0.80) ^b
Corwin (2022)	≥ 60% washout and nodule size < 4 cm	0.77 (0.72-0.82) ^b
^a Pooled data from SR; ^b diagnostic accuracy in nodules

Negative Predictive Value (NPV)

One study reported NPV regarding factors on CT for diagnosis of adrenal malignancy (Corwin, 2022). Corwin (2022) reported an NPV for ≥ 60% washout on CT of 4.8% (95%CI 3.0-7.5) and an NPV for ≥ 60% washout and nodule size < 4 cm on CT of 1.4% (95%CI 1.1-1.8).

Specificity

All three studies reported specificity (Sabet, 2016; Foo, 2018; Corwin, 2022). The systematic review of Sabet (2016) reported specificity of different factors on CT: Tumor size, mass appearance, tumor density. Foo (2018) reported specificity of the Scaled Score Algorithm which includes the factors tumor size and tumor density. Corwin (2022) reported specificity of washout on CT (in combination with nodule size).

All results regarding specificity are summarized in table 2.

Table 2. Specificity regarding factors on CT for diagnosis of adrenal malignancy (Sabet, 2016; Foo, 2018; Corwin, 2022)

Study	Factor(s)	Specificity (95%CI)
Sabet (2016)	Cut-off 3 cm	0.44 (0.28-0.62) ^a
	Cut-off 4 cm	0.71 (0.55-0.83) ^a
	Cut-off 5 cm	0.82 (0.65-0.91) ^a
	Cut-off 6 cm	0.85 (0.69-0.94) ^a
	Tumor heterogeneity (3 studies included)	0.71; 1; 0.78
	Tumor irregularity (2 studies included)	0.93; 0.98
	Tumor rough margin (1 study included)	0.90
	10 HU density (1 study included)	0.65
	16 HU density (1 study included)	1
	20 HU density (1 study included)	0.81
Foo (2018)	Scaled score algorithm (tumor size and tumor density)	0.87
Corwin (2022)	≥ 60% washout	0.80 (0.28-0.99) ^b
Corwin (2022)	≥ 60% washout and nodule size < 4 cm	1 (0.02-1) ^b
^a Pooled data from SR; ^b diagnostic accuracy in nodules
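The likelihood ratios listed in the evidence table for Sabet (2016) follow from sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A minimal Python sketch, in which the function name `likelihood_ratios` is illustrative and not taken from any of the included studies, applies these definitions to the density thresholds from tables 1 and 2:

```python
import math

def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    # LR+ is infinite when specificity equals 1 (no false positives).
    lr_pos = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    # LR- is undefined when specificity is 0; inf is returned as a placeholder.
    lr_neg = math.inf if specificity == 0 else (1 - sensitivity) / specificity
    return lr_pos, lr_neg

# Density thresholds from tables 1 and 2 (one study each)
for label, sens, spec in [("10 HU", 1.0, 0.65), ("16 HU", 0.95, 1.0), ("20 HU", 1.0, 0.81)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The computed values should agree with those in the evidence table up to rounding. A specificity of 1 gives an infinite positive LR, which is how the 16 HU threshold is reported.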

Positive Predictive Value (PPV)

One study reported PPV regarding factors on CT for diagnosis of adrenal malignancy (Corwin, 2022). Corwin (2022) reported a PPV for ≥ 60% washout on CT of 99.6% (95%CI 97.9-99.9) and a PPV for ≥ 60% washout and nodule size < 4 cm on CT of 100% (95%CI not available).
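Predictive values depend on prevalence as well as on sensitivity and specificity, which helps when reading the Corwin (2022) figures. The Python sketch below applies Bayes' rule. It assumes, as an interpretation that is not stated explicitly in this text, that the target condition ("test positive") is a benign nodule with washout ≥ 60%, so that the prevalence of the target condition is 331/336 (5 malignant nodules among 336 according to the evidence table). The function name is illustrative.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV); all inputs are fractions between 0 and 1."""
    tp = sensitivity * prevalence              # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Assumption: target condition = benign nodule; 331 of 336 nodules were not malignant.
prevalence_benign = 331 / 336
ppv, npv = predictive_values(0.755, 0.80, prevalence_benign)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under this assumption the sketch yields a PPV close to the reported 99.6% and an NPV just under 5%, close to the reported 4.8%. With so few malignant nodules, both values are dominated by the prevalence.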

Area under the ROC curve

One study reported the area under the ROC curve (AUC) for the diagnostic accuracy of the Scaled Score Algorithm, which includes the factors tumor size and tumor density (Foo, 2018). Foo (2018) reported an AUC of 0.81 (95%CI 0.52-1.00).
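The Scaled Score Algorithm evaluated by Foo (2018) is described in the evidence table as the sum of a size score and a density score, with diagnostic accuracy calculated at a total score cut-off of 5. A minimal Python sketch of that description follows. The function names are hypothetical, and treating a total of 5 or 6 as "test positive" is an assumption based on the "Score 5-6" entry in the evidence table.

```python
def size_score(diameter_mm: float) -> int:
    """Tumor size: <40 mm scores 1, 40-60 mm scores 2, >60 mm scores 3."""
    if diameter_mm < 40:
        return 1
    return 2 if diameter_mm <= 60 else 3

def density_score(hounsfield_units: float) -> int:
    """Non-contrast CT density: <10 HU scores 1, 10-20 HU scores 2, >20 HU scores 3."""
    if hounsfield_units < 10:
        return 1
    return 2 if hounsfield_units <= 20 else 3

def scaled_score(diameter_mm: float, hounsfield_units: float) -> int:
    """Total score, ranging from 2 to 6."""
    return size_score(diameter_mm) + density_score(hounsfield_units)

def is_high_risk(diameter_mm: float, hounsfield_units: float, cutoff: int = 5) -> bool:
    """Assumption: a total score at or above the cut-off counts as test positive."""
    return scaled_score(diameter_mm, hounsfield_units) >= cutoff

print(scaled_score(34, 8), is_high_risk(34, 8))
print(scaled_score(65, 15), is_high_risk(65, 15))
```

A 34 mm lesion of 8 HU receives the minimum score of 2, whereas a 65 mm lesion of 15 HU reaches the cut-off of 5.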

Level of evidence of the literature

The level of evidence regarding the outcome measure sensitivity was downgraded to low GRADE because of study limitations (-1; risk of bias regarding possible selection bias and use of different reference standards in the studies), applicability (-1; bias due to indirectness because the systematic review included study populations with functional and non-functional tumors) and number of included patients (-1; imprecision because of wide confidence intervals).

The level of evidence regarding the outcome measure negative predictive value was downgraded to low GRADE because of study limitations (-2; risk of bias regarding blinded test interpretation, use of different reference standards and flow and timing) and number of included patients (-1; imprecision because of small sample size).

The level of evidence regarding the outcome measure specificity was downgraded to low GRADE because of study limitations (-1; risk of bias regarding possible selection bias and use of different reference standards in the studies), applicability (-1; bias due to indirectness because the systematic review included study populations with functional and non-functional tumors) and number of included patients (-1; imprecision because of wide confidence intervals).

The level of evidence regarding the outcome measure positive predictive value was downgraded to low GRADE because of study limitations (-2; risk of bias regarding blinded test interpretation, use of different reference standards and flow and timing) and number of included patients (-1; imprecision because of small sample size).

The level of evidence regarding the outcome measure area under the ROC curve was downgraded to low GRADE because of study limitations (-2; risk of bias regarding unclear methods of index and reference test interpretation and flow and timing) and number of included patients (-1; imprecision because of small sample size).

Search and select

A systematic review of the literature was performed to answer the following question:

What is the diagnostic accuracy and effect on overall survival of a diagnostic model or multiple diagnostic factors on CT scan to diagnose malignancy in patients with an adrenal incidentaloma discovered on a CT?

P (Patients)	Patients with an incidentaloma suspected of malignancy discovered on a CT-scan and without prior history of malignancy
I (Intervention)	Diagnostic model or multiple diagnostic factors on CT-scan to diagnose malignancy of the adrenal incidentaloma
C (Control)	No use of a diagnostic model
R (Reference)	Histologic or pathological examination of the removed adrenal gland or follow-up (clinical or imaging)
O (Outcomes)	Overall survival, diagnostic accuracy (sensitivity, specificity, positive predictive value, negative predictive value, area under the ROC curve)

Relevant outcome measures

The guideline development group considered overall survival, sensitivity and negative predictive value as critical outcome measures for decision making, and specificity, positive predictive value and clinical outcomes as important outcome measures for decision making.

A priori, the working group did not define the outcome measures listed above but used the definitions used in the studies.

The working group defined a maximum of ten false negatives per 1000 patients as clinically (patient) important.

The working group defined the following as the minimal clinically (patient) important difference regarding overall survival: an effect of >5%, or >3% combined with HR <0.70, was considered clinically relevant (BOM, 2018).
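This threshold can be checked against a reported sensitivity once a prevalence is fixed, because the expected number of false negatives per 1000 patients equals 1000 × prevalence × (1 − sensitivity). The Python sketch below is an illustration only: it combines the sensitivity (75%) and prevalence (8.2%) reported for Foo (2018) in the evidence table, and the function name is hypothetical.

```python
def false_negatives_per_1000(sensitivity: float, prevalence: float) -> float:
    """Expected number of missed cases per 1000 tested patients."""
    return 1000 * prevalence * (1 - sensitivity)

MAX_FALSE_NEGATIVES = 10  # threshold defined by the working group

fn = false_negatives_per_1000(sensitivity=0.75, prevalence=0.082)
print(f"{fn:.1f} false negatives per 1000; within threshold: {fn <= MAX_FALSE_NEGATIVES}")
```

With these inputs the expected number is about 20 per 1000, which would exceed the threshold. A different prevalence would change the result proportionally.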

Search and select (Methods)

The databases Medline (via OVID) and Embase (via Embase.com) were searched with relevant search terms until 18-8-2022. The detailed search strategy is depicted under the tab Methods. The systematic literature search resulted in 218 hits. Studies were selected based on the following criteria:

The study population had to meet the criteria as defined in the PICRO;
The index test had to be as defined in the PICRO;
One or more reported outcomes had to be as defined in the PICRO;
Research type: systematic review, randomized controlled trial, observational cohort study or cross-sectional study;
Articles written in English or Dutch.

Twenty-one studies were initially selected based on title and abstract screening. After reading the full text, eighteen studies were excluded (see the table with reasons for exclusion under the tab Methods), and three studies were included.

Results

Three studies were included in the analysis of the literature, one systematic review and two individual studies. Important study characteristics and results are summarized in the evidence table. The assessment of the risk of bias is summarized in the risk of bias tables.

References

Allan BJ, Thorson CM, Van Haren RM, Parikh PP, Lew JI. Risk of concomitant malignancy in hyperfunctioning adrenal incidentalomas. J Surg Res. 2013 Sep;184(1):241-6. doi: 10.1016/j.jss.2013.03.032. Epub 2013 Mar 31. PMID: 23562276.
Bergström M, Juhlin C, Bonasera TA, Sundin A, Rastad J, Akerström G, Långström B. PET imaging of adrenal cortical tumors with the 11beta-hydroxylase tracer 11C-metomidate. J Nucl Med. 2000 Feb;41(2):275-82. PMID: 10688111.
Birsen O, Akyuz M, Dural C, Aksoy E, Aliyev S, Mitchell J, Siperstein A, Berber E. A new risk stratification algorithm for the management of patients with adrenal incidentalomas. Surgery. 2014 Oct;156(4):959-65. doi: 10.1016/j.surg.2014.06.042. PMID: 25239353.
Chomsky-Higgins K, Seib C, Rochefort H, Gosnell J, Shen WT, Kahn JG, Duh QY, Suh I. Less is more: cost-effectiveness analysis of surveillance strategies for small, nonfunctional, radiographically benign adrenal incidentalomas. Surgery. 2018 Jan;163(1):197-204. doi: 10.1016/j.surg.2017.07.030. Epub 2017 Nov 9. PMID: 29129360.
Corwin MT, Badawy M, Caoili EM, Carney BW, Colak C, Elsayes KM, Gerson R, Klimkowski SP, McPhedran R, Pandya A, Pouw ME, Schieda N, Song JH, Remer EM. Incidental Adrenal Nodules in Patients Without Known Malignancy: Prevalence of Malignancy and Utility of Washout CT for Characterization-A Multiinstitutional Study. AJR Am J Roentgenol. 2022 Nov;219(5):804-812. doi: 10.2214/AJR.22.27901. Epub 2022 Jun 22. PMID: 35731098.
Fassnacht M, Arlt W, Bancos I, Dralle H, Newell-Price J, Sahdev A, Tabarin A, Terzolo M, Tsagarakis S, Dekkers OM. Management of adrenal incidentalomas: European Society of Endocrinology Clinical Practice Guideline in collaboration with the European Network for the Study of Adrenal Tumors. Eur J Endocrinol. 2016 Aug;175(2):G1-G34. doi: 10.1530/EJE-16-0467. PMID: 27390021.
Fassnacht M, Tsagarakis S, Terzolo M, Tabarin A, Sahdev A, Newell-Price J, Pelsma I, Marina L, Lorenz K, Bancos I, Arlt W, Dekkers OM. European Society of Endocrinology clinical practice guidelines on the management of adrenal incidentalomas, in collaboration with the European Network for the Study of Adrenal Tumors. Eur J Endocrinol. 2023 Jul;189(1):G1-G42. doi: 10.1093/ejendo/lvad066.
Foo E, Turner R, Wang KC, Aniss A, Gill AJ, Sidhu S, Clifton-Bligh R, Sywak M. Predicting malignancy in adrenal incidentaloma and evaluation of a novel risk stratification algorithm. ANZ J Surg. 2018 Mar;88(3):E173-E177. doi: 10.1111/ans.13868. Epub 2017 Jan 24. PMID: 28118677.
Hennings J, Hellman P, Ahlström H, Sundin A. Computed tomography, magnetic resonance imaging and 11C-metomidate positron emission tomography for evaluation of adrenal incidentalomas. Eur J Radiol. 2009 Feb;69(2):314-23. doi: 10.1016/j.ejrad.2007.10.024. Epub 2007 Dec 20. PMID: 18082990.
Herrera MF, Grant CS, van Heerden JA, Sheedy PF, Ilstrup DM. Incidentally discovered adrenal tumors: an institutional perspective. Surgery. 1991 Dec;110(6):1014-21. PMID: 1745970.
Hubbard MM, Husami TW, Abumrad NN. Nonfunctioning adrenal tumors. Dilemmas in management. Am Surg. 1989 Aug;55(8):516-22. PMID: 2764401.
Kasperlik-Zeluska AA, Rosłonowska E, Słowinska-Srzednicka J, Migdalska B, Jeske W, Makowska A, Snochowska H. Incidentally discovered adrenal mass (incidentaloma): investigation and management of 208 patients. Clin Endocrinol (Oxf). 1997 Jan;46(1):29-37. doi: 10.1046/j.1365-2265.1997.d01-1751.x. PMID: 9059555.
Mantero F, Terzolo M, Arnaldi G, Osella G, Masini AM, Alì A, Giovagnetti M, Opocher G, Angeli A. A survey on adrenal incidentaloma in Italy. Study Group on Adrenal Tumors of the Italian Society of Endocrinology. J Clin Endocrinol Metab. 2000 Feb;85(2):637-44. doi: 10.1210/jcem.85.2.6372. PMID: 10690869.
Meyer A, Behrend M. Indications and results of surgery for incidentally found adrenal tumors. Urol Int. 2006;77(2):173-8. doi: 10.1159/000093915. PMID: 16888426.
Reginelli A, Di Grezia G, Izzo A, D'andrea A, Gatta G, Cappabianca S, Squillaci E, Grassi R. Imaging of adrenal incidentaloma: our experience. Int J Surg. 2014;12 Suppl 1:S126-31. doi: 10.1016/j.ijsu.2014.05.029. Epub 2014 May 23. PMID: 24862667.
Sabet FA, Majdzadeh R, Mostafazadeh Davani B, Heidari K, Soltani A. Likelihood ratio of computed tomography characteristics for diagnosis of malignancy in adrenal incidentaloma: systematic review and meta-analysis. J Diabetes Metab Disord. 2016 Apr 21;15:12. doi: 10.1186/s40200-016-0224-z. PMID: 27104171; PMCID: PMC4839087.
Sworczak K, Babińska A, Stanek A, Lewczuk A, Siekierska-Hellmann M, Błaut K, Drobińska A, Basiński A, Lachiński AJ, Czaplińska-Kałas H, Gruca Z. Clinical and histopathological evaluation of the adrenal incidentaloma. Neoplasma. 2001;48(3):221-6. PMID: 11583293.

Evidence tables

Evidence table for diagnostic test accuracy studies

Research question: What is the diagnostic accuracy and effect on overall survival of a diagnostic model or multiple diagnostic factors on CT scan to diagnose malignancy in patients with an adrenal incidentaloma discovered on a CT?

Study reference	Study characteristics	Patient characteristics	Index test (test of interest)	Reference test	Follow-up	Outcome measures and effect size	Comments
Sabet, 2016	Systematic review of cohort studies Literature search up to January 2016 A: Birsen, 2014 B: Reginelli, 2014 C: Allan, 2013 D: Henning, 2009 E: Meyer, 2006 F: Mantero, 2000 G: Sworczak, 2000 H: Bergestrom, 2000 I: Kasperlik, 1997 J: Herrera, 1991 K: Hubbard, 1989 Design and country: A: Retrospective cohort study, USA B: Retrospective cohort study, Italy C: Prospective cohort study, USA D: Retrospective and prospective cohort study, Sweden E: Retrospective cohort study, Germany F: Retrospective cohort study, Italy G: Prospective cohort study, Poland H: Prospective cohort study, Sweden I: Prospective cohort study, Poland J: Retrospective cohort study, USA K: Retrospective cohort study, USA Funding and conflicts of interest: Authors of the review declare to have no competing interests.	Inclusion criteria SR: - Original articles - Published after 1970 in English - Discussed CT scan as diagnostic test - Gold standard test (operation, biopsy, FNA or follow-up for more than 6 months) was performed - Presence of full explanation of imaging procedure that follows standard method of CT scanning - Presence of clearly described criteria for index test with accepted thresholds Exclusion criteria SR: - Articles overlapping with others - Articles without any case of malignancy or benign mass - Case report or case series articles 36 studies included Important patient characteristics at baseline: Number of patients, mean age (range) in years: A: N=157, NR B: N=35, NR (25-89) C: N=49, 51 (NR) D: N=38, 67.5 (45-81) 60 (24-77) E: N=52, 56.4 (NR) F: N=1004, 58 (15-86) G: N=57, 54.7 (34-79) H: N=15, NR (42-78) I: N=208, 52 (14-76) J: N=342, 62 (2-86) K: N=28, NR (22-74) Description of the mass: A: Non-functional B: Non-functional C: Functional D: Non-functional E: Non-functional F: Non-functional G: Non-functional H: >1 cm functional, non-functional I: Non-functional J: Non-functional K: Non-functional	Describe index factors: Size (11 studies), mass appearance (3 studies), density (1 study)	Describe reference test¹: A: Operation, follow-up B: Operation, follow-up C: Operation D: Operation, follow-up E: Operation F: Operation G: Operation H: Operation, biopsy I: Operation (>4 cm), biopsy J: Operation, FNA, follow-up K: Operation, FNA, follow-up Prevalence (%) [based on reference test at specified cut-off point]: Not reported	Endpoint of follow-up: Not reported For how many participants were no complete outcome data available: Not reported	Outcome measures and effect size (include 95%CI and p-value if available)⁴: Outcome measure-1 Defined as LR for size: Pooled estimate of sensitivity, specificity, positive and negative LR for different size cut-offs of the adrenal gland mass: Cut-off 3 cm Co-sensitivity [95% CI]: 0.91 [0.83-0.95] Co-specificity [95% CI]: 0.44 [0.28-0.62] Pooled positive LR [95% CI]: 1.6 [1.2-2.2] Pooled negative LR [95% CI]: 0.21 [0.10-0.42] Cut-off 4 cm Co-sensitivity [95% CI]: 0.91 [0.82-0.96] Co-specificity [95% CI]: 0.71 [0.55-0.83] Pooled positive LR [95% CI]: 3.1 [2-4.9] Pooled negative LR [95% CI]: 0.13 [0.06-0.25] Cut-off 5 cm Co-sensitivity [95% CI]: 0.78 [0.67-0.87] Co-specificity [95% CI]: 0.82 [0.65-0.91] Pooled positive LR [95% CI]: 4.3 [2.1-8.9] Pooled negative LR [95% CI]: 0.26 [0.16-0.44] Cut-off 6 cm Co-sensitivity [95% CI]: 0.74 [0.63-0.82] Co-specificity [95% CI]: 0.85 [0.69-0.94] Pooled positive LR [95% CI]: 5.0 [2.4-10.8] Pooled negative LR [95% CI]: 0.31 [0.22-0.43] Outcome measure-2 Defined as LR for mass appearance: Sensitivity, specificity, positive and negative LR for different appearance characteristics of the adrenal mass: Heterogeneity (3 included studies) Sensitivity: 0.79; 0.93; 0.75 Specificity: 0.71; 1; 0.78 Positive LR: 2.72; ∞; 3.4 Negative LR: 0.29; 0.07; 0.32 Irregularity (2 studies included) Sensitivity: 0.41; 0.50 Specificity: 0.93; 0.98 Positive LR: 5.85; 0.45 Negative LR: 0.63; NR Rough margin (1 study included) Sensitivity: 0.56 Specificity: 0.90 Positive LR: 5.6 Negative LR: 0.48 Outcome measure-3 Defined as LR for mass density: Sensitivity, specificity, positive and negative LR for different densities of the adrenal mass: 10 HU (1 study included) Sensitivity: 1 Specificity: 0.65 Positive LR: 2.85 Negative LR: 0 16 HU (1 study included) Sensitivity: 0.95 Specificity: 1 Positive LR: ∞ Negative LR: 0.05 20 HU (1 study included) Sensitivity: 1 Specificity: 0.81 Positive LR: 5.26 Negative LR: 0	Study quality (ROB): QUADAS score (out of 14) A: 13 B: 11 C: 12 D: 13 E: 12 F: 11 G: 12 H: 11 I: 12 J: 12 K: 12 Authors conclusion: As a conclusion, an evidence-based flowchart is suggested in which among the patients without history of malignancy adrenal masses smaller than 4 cm or the ones larger than 4 cm with density of less than 10 HU can be just followed up but the lesions larger than 4 cm with density more than 10 HU should be gone under additional diagnostic procedure. Different appearances of the mass do not show a potent positive or negative LR.
Foo, 2018	Type of study²: Retrospective analysis of prospective cohort Setting and country: Single center study, Australia Funding and conflicts of interest: Not reported	Inclusion criteria: - Consecutive patients referred for evaluation of AI between 2004 and 2014 Exclusion criteria: - Symptomatic patients presenting for investigation of adrenal tumors - Patients with known extra-adrenal primary cancer screened for metastatic disease N=96 Prevalence: 8.2% Median age in years (range): 59 (25-77) Sex: 48% male / 52% female Other important characteristics: Mean tumor diameter in mm (SD): 34 (18.8)	Describe index factors: Age, gender, previous history of malignancy, tumor size, density, percentage washout on contrast CT, mass appearance, calcification on CT and cortisol levels Description of Scaled Score Algorithm published by research group from the Cleveland Clinic: Tumor size <40, 40-60 or >60 mm scored 1, 2 or 3 respectively. Tumor density on non-contrast CT <10 HU, 10-20 HU or >20 HU scored 1, 2 or 3. Cut-off point(s): Sensitivity, specificity and positive likelihood ratios were calculated for total score cut-off 5.	Describe reference test³: Histopathology for surgical cases or follow-up for at least six months. Cut-off point(s): Not reported	Time between the index test and reference test: Depending on reference test (follow-up at least six months) For how many participants were no complete outcome data available: N=32 (33%) N=23: Not both size and density measurements on CT N=9: Hormonally active tumors	Outcome measures and effect size (include 95%CI and p-value if available)⁴: Diagnostic accuracy Scaled Score Algorithm: Score 5-6: N=11 (17%) Sensitivity: 75% Specificity: 87% Area under ROC curve [95% CI]: 0.81 [0.52-1.00]	Authors conclusion: We propose that a combination of variables, including size, density and percentage washout on contrast CT, need to be included in order to improve on current risk stratification models for the management of AI. Factors age, gender, history of malignancy, tumor size, density, percentage washout on contrast CT, mass appearance, calcification on CT and cortisol levels are in the univariate analysis and therefore results are not displayed.
Corwin, 2022	Type of study: Retrospective cohort study Setting and country: Six institutions, United States Funding and conflicts of interest: The author declares there are no disclosures relevant to the subject matter of this article	Inclusion criteria: - Patients 18 years or older who underwent adrenal washout CT examination - CT examinations between 2003 and 2017 Exclusion criteria: - CT report did not describe adrenal nodules or nodules measuring less than 1 cm in short-axis diameter - History of malignancy or clinical suspicion for a functional adrenal tumor at time of washout CT - Clear evidence of metastatic malignancy on washout CT - Artifact on washout CT - Adrenal nodule characteristics: Unenhanced attenuation < 10 HU, absence of enhancement > 10 HU, heterogeneity on unenhanced images, other suspicious features including cystic or necrotic appearance - No reference standard available N= 336 nodules (in 299 patients) Prevalence: N=5 (1.5%) No patient characteristics reported.	Describe index factors: Nodule size (<4 cm versus ≥ 4 cm) and washout (presence versus absence of washout ≥ 60%) Cut-off point(s): See above for cut-off points for different factors.	Describe reference test: Any available pathologic specimen from surgical resection or percutaneous biopsy In absence of pathologic specimen: Abdominal CT, chest CT, lumbar spine CT, lumbar spine MRI or PET/CT examinations performed at least one year before or after the washout CT examinations In patients with no pathology or image follow-up or indeterminate growth rate (4-7 mm per year) EMR was reviewed to identify clinical notes Cut-off point(s): Benignancy was defined as either no change in size or growth of 3 mm per year or less. Malignancy was defined as growth of 8 mm per year or more. In patients with no pathology or image follow-up or indeterminate growth rate (4-7 mm per year) EMR was reviewed to identify clinical notes. Benignancy was defined as no clinical evidence of adrenal malignancy documented at least 5 years after the wash-out CT examination.	Time between the index test and reference test: Not reported For how many participants were no complete outcome data available: No missing data reported.	Outcome measures and effect size (include 95%CI and p-value if available): Diagnostic performance of 60% washout or more for differentiating benign versus malignant nodules: Sensitivity: 75.5% (95%CI 70.4-80.1) Specificity: 80% (95%CI 28.4-99.5) PPV: 99.6% (95%CI 97.7-99.9) NPV: 4.8% (95%CI 3.0-7.5) Diagnostic performance of 60% washout or more and nodules < 4 cm for differentiating benign versus malignant nodules: Sensitivity: 77.5% (95%CI 72.4-82.1) Specificity: 100% (95%CI 2.5-100) PPV: 100% (95%CI NA) NPV: 1.4% (95%CI 1.1-1.8)	Authors conclusion: Our findings suggest that washout CT has limited utility in the evaluation of incidental adrenal nodules smaller than 4 cm in patients without known malignancy. Very low prevalence of malignancy and therefore wide confidence intervals regarding specificity and positive predictive value (PPV). Pheochromocytomas were excluded from the diagnostic performance analysis.
∞ Indicates significantly high positive LR

Study reference

Study characteristics

Patient characteristics

Index test

(test of interest)

Reference test

Follow-up

Outcome measures and effect size

Comments

Sabet, 2016

Systematic review of cohort studies

Literature search up to January 2016

A: Birsen, 2014

B: Reginelli, 2014

C: Allan, 2013

D: Henning, 2009

E: Meyer, 2006

F: Mantero, 2000

G: Sworczak, 2000

H: Bergestrom, 2000

I: Kasperlik, 1997

J: Herrera, 1991

K: Hubbard, 1989

Design and country:

A: Retrospective cohort study, USA

B: Retrospective cohort study, Italy

C: Prospective cohort study, USA

D: Retrospective and prospective cohort study, Sweden

E: Retrospective cohort study, Germany

F: Retrospective cohort study, Italy

G: Prospective cohort study, Poland

H: Prospective cohort study, Sweden

I: Prospective cohort study, Poland

J: Retrospective cohort study, USA

K: Retrospective cohort study, USA

Funding and conflicts of interest:

Authors of the review declare to have no competing interests.

Inclusion criteria SR:

- Original articles

- Published after 1970 in English

- Discussed CT scan as diagnostic test

- Gold standard test (operation, biopsy, FNA or follow-up for more than 6 months) was performed

- Presence of full explanation of imaging procedure that follows standard method of CT scanning

- Presence of clearly described criteria for index test with accepted thresholds

Exclusion criteria SR:

- Articles overlapping with others

- Articles without any case of malignancy or benign mass

- Case report or case series articles

36 studies included

Important patient characteristics at baseline:

Number of patients, mean age (range) in years:

A: N=157, NR

B: N=35, NR (25-89)

C: N=49, 51 (NR)

D: N=38, 67.5 (45-81) 60 (24-77)

E: N=52, 56.4 (NR)

F: N=1004, 58 (15-86)

G: N=57, 54.7 (34-79)

H: N=15, NR (42-78)

I: N=208, 52 (14-76)

J: N=342, 62 (2-86)

K: N=28, NR (22-74)

Description of the mass:

A: Non-functional

B: Non-functional

C: Functional

D: Non-functional

E: Non-functional

F: Non-functional

G: Non-functional

H: >1 cm functional, non-functional

I: Non-functional

J: Non-functional

K: Non-functional

Describe index factors:

Size (11 studies), mass appearance (3 studies), density (1 study)

Describe reference test¹:

A: Operation, follow-up

B: Operation, follow-up

C: Operation

D: Operation, follow-up

E: Operation

F: Operation

G: Operation

H: Operation, biopsy

I: Operation (>4 cm), biopsy

J: Operation, FNA, follow-up

K: Operation, FNA, follow-up

Prevalence (%)

[based on refence test at specified cut-off point]:

Not reported

Endpoint of follow-up:

Not reported

For how many participants were no complete outcome data available:

Not reported

Outcome measures and effect size (include 95%CI and p-value if available)⁴:

Outcome measure-1

Defined as LR for size: Pooled estimate of sensitivity, specificity, positive and negative LR for different size cut-offs of the adrenal gland mass:

Cut-off 3 cm

Co-sensitivity [95% CI]: 0.91 [0.83-0.95]

Co-specificity [95% CI]: 0.44 [0.28-0.62]

Pooled positive LR [95% CI]: 1.6 [1.2-2.2]

Pooled negative LR [95% CI]: 0.21 [0.10-0.42]

Cut-off 4 cm

Co-sensitivity [95% CI]: 0.91 [0.82-0.96]

Co-specificity [95% CI]: 0.71 [0.55-0.83]

Pooled positive LR [95% CI]: 3.1 [2-4.9]

Pooled negative LR [95% CI]: 0.13 [0.06-0.25]

Cut-off 5 cm

Co-sensitivity [95% CI]: 0.78 [0.67-0.87]

Co-specificity [95% CI]: 0.82 [0.65-0.91]

Pooled positive LR [95% CI]: 4.3 [2.1-8.9]

Pooled negative LR [95% CI]: 0.26 [0.16-0.44]

Cut-off 6 cm

Co-sensitivity [95% CI]: 0.74 [0.63-0.82]

Co-specificity [95% CI]: 0.85 [0.69-0.94]

Pooled positive LR [95% CI]: 5.0 [2.4-10.8]

Pooled negative LR [95% CI]: 0.31 [0.22-0.43]

Outcome measure-2

Defined as LR for mass appearance: Sensitivity, specificity, positive and negative LR for different appearance characteristics of the adrenal mass:

Heterogeneity (3 included studies)

Sensitivity: 0.79; 0.93; 0.75

Specificity: 0.71; 1; 0.78

Positive LR: 2.72; ¥ ; 3.4

Negative LR: 0.29; 0.07; 0.32

Irregularity (2 studies included)

Sensitivity: 0.41; 0.50

Specificity: 0.93; 0.98

Positive LR: 5.85; 0.45

Negative LR: 0.63; NR

Rough margin (1 study included)

Sensitivity: 0.56

Specificity: 0.90

Positive LR: 5.6

Negative LR: 0.48

Outcome measure-3

Defined as LR for mass density: Sensitivity, specificity, positive and negative LR for different densities of the adrenal mass:

10 HU (1 study included)

Sensitivity: 1

Specificity: 0.65

Positive LR: 2.85

Negative LR: 0

16 HU (1 study included)

Sensitivity: 0.95

Specificity: 1

Positive LR: ¥

Negative LR: 0.05

20 HU (1 study included)

Sensitivity: 1

Specificity: 0.81

Positive LR: 5.26

Negative LR: 0

Study quality (ROB):
QUADAS score (out of 14)
A: 13

B: 11

C: 12

D: 13

E: 12

F: 11

G: 12

H: 11

I: 12

J: 12

K: 12

Authors conclusion:
As a conclusion, an evidence-based flowchart is suggested in which among the patients without history of malignancy adrenal masses smaller than 4 cm or the ones larger than 4 cm with density of less than 10 HU can be just followed up but the lesions larger than 4 cm with density more than 10 HU should be gone under additional diagnostic procedure.

Different appearances of the mass do not show a potent positive or negative LR.

Foo, 2018

Type of study²: Retrospective analysis of prospective cohort

Setting and country: Single center study, Australia

Funding and conflicts of interest: Not reported

Inclusion criteria:

- Consecutive patients referred for evaluation of AI between 2004 and 2014

Exclusion criteria:

- Symptomatic patients presenting for investigation of adrenal tumors

- Patients with known extra-adrenal primary cancer screened for metastatic disease

N=96

Prevalence:8.2%

Median age in years (range): 59 (25-77)

Sex: 48% male / 52% female

Other important characteristics:

Mean tumor diameter in mm (SD): 34 (18.8)

Describe index factors:

Age, gender, previous history of malignancy, tumor size, density, percentage washout on contrast CT, mass appearance, calcification on CT and cortisol levels

Description of Scaled Score Algorithm published by research group from the Cleveland Clinic:

Tumor size <40, 40-60 or >60 mm scored 1,2 or 3 respectively.

Tumor density on non-contrast CT <10 HU, 10-20 HU or >20 HU scored 1, 2 or 3.

Cut-off point(s):

Sensitivity, specificity and positive likelihood ratios were calculated for total score cut-off 5.

Describe reference test³:

Histopathology for surgical cases or follow-up for at least six months.

Cut-off point(s): Not reported

Time between the index test and reference test: Depending on reference test (follow-up at least six months)

For how many participants were no complete outcome data available:

N=32 (33%)

N=23: Not both size and density measurements on CT
N=9: Hormonally active tumors

Outcome measures and effect size (include 95%CI and p-value if available)⁴:

Diagnostic accuracy Scaled Score Algorithm:

Score 5-6: N=11 (17%)

Sensitivity: 75%

Specificity: 87%

Area under ROC curve [95% CI]: 0.81 [0.52-1.00]

Authors conclusion:

We propose that a combination of variables, including size, density and percentage washout on contrast CT, need to be included in order to improve on current risk stratification models for the management of AI.

Factors age, gender, history of malignancy, tumor size, density, percentage washout on contrast CT, mass appearance, calcification on CT and cortisol levels are in the univariate analysis and therefore results are not displayed.

Corwin, 2022

Type of study: Retrospective cohort study

Setting and country: Six institutions, United States

Funding and conflicts of interest: The author declares there are no disclosures relevant to the subject matter of this article

Inclusion criteria:

- Patients 18 years or older who underwent adrenal washout CT examination

- CT examinations between 2003 and 2017

Exclusion criteria:

- CT report did not describe adrenal nodules or nodules measuring less than 1 cm in short-axis diameter

- History of malignancy or clinical suspicion for a functional adrenal tumor at time of washout CT

- Clear evidence of metastatic malignancy on washout CT

- Artifact on washout CT

- Adrenal nodule characteristics: Unenhanced attenuation < 10 HUR, absence of enhancement > 10 HU, heterogeneity on unenhanced images, other suspicious features including cystic or necrotic appearance

- No reference standard available

N= 336 nodules (in 299 patients)

Prevalence: N=5 (1.5%)

No patient characteristics reported.

Describe index factors:

Nodule size (<4 cm versus ≥ 4 cm) and washout (presence versus absence of washout ≥ 60%)

Cut-off point(s):

See above for cut-off points for different factors.

Describe reference test:

Any available pathologic specimen from surgical resection or percutaneous biopsy
In absence of pathologic specimen: Abdominal CT, chest CT, lumbar spine CT, lumbar spine MRI or PET/CT examinations performed at least one year before or after the washout CT examinations
In patients with no pathology or image follow-up or indeterminate growth rate (4-7 mm per year) EMR was reviewed to identify clinical notes

Cut-off point(s):

Benignancy was defined as either no change in size or growth of 3 mm per year or less.

Malignancy was defined as growth of 8 mm per year or more.

In patients with no pathology or image follow-up or indeterminate growth rate (4-7 mm per year) EMR was reviewed to identify clinical notes. Benignancy was defined as no clinical evidence of adrenal malignancy documented at least 5 years after the wash-out CT examination.

Time between the index test and reference test: Not reported

For how many participants were no complete outcome data available: No missing data reported.

Outcome measures and effect size (include 95%CI and p-value if available):

Diagnostic performance of 60% washout or more for differentiating benign versus malignant nodules:

Sensitivity: 75.5% (95%CI 70.4-80.1)

Specificity:80% (95%CI 28.4-99.5)

PPV: 99.6% (95%CI 97.7-99.9)

NPV: 4.8% (95%CI 3.0-7.5)

Diagnostic performance of 60% washout or more and nodules < 4 cm for differentiating benign versus malignant nodules:

Sensitivity: 77.5% (95%CI 72.4-82.1)

Specificity:100% (95%CI 2.5-100)

PPV: 100% (95%CI NA)

NPV: 1.4% (95%CI 1.1-1.8)

Authors conclusion:

Our findings suggest that washout CT has limited utility in the evaluation of incidental adrenal nodules smaller than 4 cm in patients without known malignancy.

Very low prevalence of malignancy and therefore wide confidence intervals regarding specificity and positive predictive value (PPV).

Pheochromocytomas were excluded from the diagnostic performance analysis.

¥ Indicates significantly high positive LR

² In the case of a case-control design, the patient characteristics should be described per group (cases and controls). NB: case-control studies will overestimate accuracy (Lijmer et al., 1999).

³ The reference standard is the test that definitively establishes whether or not someone has the disease. Ideally, the reference standard is the gold standard (100% sensitive and 100% specific). Note: this is not the "comparison test/index 2".

⁴ Describe the statistical parameters for the comparison of the index test(s) with the reference test, and for the comparison between the index tests themselves (if two or more index tests are compared).

Risk of bias tables

Table of quality assessment for systematic reviews of diagnostic studies

Based on AMSTAR checklist (Shea et al., 2007, BMC Med Res Methodol 7: 10; doi:10.1186/1471-2288-7-10) and PRISMA checklist (Moher et al., 2009, PLoS Med 6: e1000097; doi:10.1371/journal.pmed.1000097)

Study First author, year	Appropriate and clearly focused question?¹ Yes/no/unclear	Comprehensive and systematic literature search?² Yes/no/unclear	Description of included and excluded studies?³ Yes/no/unclear	Description of relevant characteristics of included studies?⁴ Yes/no/unclear	Assessment of scientific quality of included studies?⁵ Yes/no/unclear	Enough similarities between studies to make combining them reasonable?⁶ Yes/no/unclear	Potential risk of publication bias taken into account?⁷ Yes/no/unclear	Potential conflicts of interest reported?⁸ Yes/no/unclear
Sabet, 2016	Yes	Yes	No, no description of excluded studies	No, no clear description of prevalence, incomplete outcome data and end-point of follow-up	Yes, QUADAS	Unclear, individual study populations included functional and non-functional tumors	No	No, no potential conflicts of interest reported of individual studies

Risk of bias assessment diagnostic accuracy studies (QUADAS II, 2011)

Study reference	Patient selection	Index test	Reference standard	Flow and timing	Comments with respect to applicability
Foo, 2018	Was a consecutive or random sample of patients enrolled? Unclear, due to single-center study a selection bias might have occurred Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Yes	Were the index test results interpreted without knowledge of the results of the reference standard? Unclear, no specification of interpretation method of index test. If a threshold was used, was it pre-specified? Yes, tumor size and density thresholds were specified	Is the reference standard likely to correctly classify the target condition? Yes Were the reference standard results interpreted without knowledge of the results of the index test? Unclear, no specification of interpretation method of reference test	Was there an appropriate interval between index test(s) and reference standard? Unclear, no definitions of intervals Did all patients receive a reference standard? Yes Did patients receive the same reference standard? No Were all patients included in the analysis? No, but clear reasons for exclusion	Are there concerns that the included patients do not match the review question? No Are there concerns that the index test, its conduct, or interpretation differ from the review question? No Are there concerns that the target condition as defined by the reference standard does not match the review question? No
	CONCLUSION: Could the selection of patients have introduced bias? RISK: UNCLEAR	CONCLUSION: Could the conduct or interpretation of the index test have introduced bias? RISK: UNCLEAR	CONCLUSION: Could the reference standard, its conduct, or its interpretation have introduced bias? RISK: UNCLEAR	CONCLUSION: Could the patient flow have introduced bias? RISK: HIGH
Corwin, 2022	Was a consecutive or random sample of patients enrolled? Yes Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Yes	Were the index test results interpreted without knowledge of the results of the reference standard? No, only the first investigator at each institution reviewed both the adrenal washout CT examinations and reference standard If a threshold was used, was it pre-specified? Yes	Is the reference standard likely to correctly classify the target condition? Unclear, different reference standards were used. Were the reference standard results interpreted without knowledge of the results of the index test? Unclear	Was there an appropriate interval between index test(s) and reference standard? Yes Did all patients receive a reference standard? Yes Did patients receive the same reference standard? No Were all patients included in the analysis? No, patients with pheochromocytomas were excluded from the analysis.	Are there concerns that the included patients do not match the review question? Unclear, no patient characteristic reported Are there concerns that the index test, its conduct, or interpretation differ from the review question? No Are there concerns that the target condition as defined by the reference standard does not match the review question? No
	CONCLUSION: Could the selection of patients have introduced bias? RISK: LOW	CONCLUSION: Could the conduct or interpretation of the index test have introduced bias? RISK: HIGH	CONCLUSION: Could the reference standard, its conduct, or its interpretation have introduced bias? RISK: HIGH	CONCLUSION: Could the patient flow have introduced bias? RISK: HIGH

Table of excluded studies

Reference	Reason for exclusion
Marty M, Gaye D, Perez P, Auder C, Nunes ML, Ferriere A, Haissaguerre M, Tabarin A. Diagnostic accuracy of computed tomography to identify adenomas among adrenal incidentalomas in an endocrinological population. Eur J Endocrinol. 2018 May;178(5):439-446. doi: 10.1530/EJE-17-1056. Epub 2018 Feb 21. PMID: 29467231.	Wrong target condition: Diagnosis of benign adrenal incidentalomas
Cho YY, Suh S, Joung JY, Jeong H, Je D, Yoo H, Park TK, Min YK, Kim KW, Kim JH. Clinical characteristics and follow-up of Korean patients with adrenal incidentalomas. Korean J Intern Med. 2013 Sep;28(5):557-64. doi: 10.3904/kjim.2013.28.5.557. Epub 2013 Aug 14. PMID: 24009451; PMCID: PMC3759761.	Wrong study population: Nonfunctional and functional adrenal incidentalomas
Moawad AW, Ahmed A, Fuentes DT, Hazle JD, Habra MA, Elsayes KM. Machine learning-based texture analysis for differentiation of radiologically indeterminate small adrenal tumors on adrenal protocol CT scans. Abdom Radiol (NY). 2021 Oct;46(10):4853-4863. doi: 10.1007/s00261-021-03136-2. Epub 2021 Jun 3. PMID: 34085089.	Wrong study population: Indeterminate incidentalomas
Wale DJ, Wong KK, Viglianti BL, Rubello D, Gross MD. Contemporary imaging of incidentally discovered adrenal masses. Biomed Pharmacother. 2017 Mar;87:256-262. doi: 10.1016/j.biopha.2016.12.090. Epub 2017 Jan 4. PMID: 28063406.	Wrong study design
Dinnes J, Bancos I, Ferrante di Ruffano L, Chortis V, Davenport C, Bayliss S, Sahdev A, Guest P, Fassnacht M, Deeks JJ, Arlt W. MANAGEMENT OF ENDOCRINE DISEASE: Imaging for the diagnosis of malignancy in incidentally discovered adrenal masses: a systematic review and meta-analysis. Eur J Endocrinol. 2016 Aug;175(2):R51-64. doi: 10.1530/EJE-16-0461. Epub 2016 Jun 2. PMID: 27257145; PMCID: PMC5065077.	Wrong tests: MRI, PET and wrong target condition: Detection ACC or adrenal metastases
Kahramangil B, Kose E, Remer EM, Reynolds JP, Stein R, Rini B, Siperstein A, Berber E. A Modern Assessment of Cancer Risk in Adrenal Incidentalomas: Analysis of 2219 Patients. Ann Surg. 2022 Jan 1;275(1):e238-e244. doi: 10.1097/SLA.0000000000004048. PMID: 32541223.	Univariate analysis
Cyranska-Chyrek E, Szczepanek-Parulska E, Olejarz M, Ruchala M. Malignancy Risk and Hormonal Activity of Adrenal Incidentalomas in a Large Cohort of Patients from a Single Tertiary Reference Center. Int J Environ Res Public Health. 2019 May 27;16(10):1872. doi: 10.3390/ijerph16101872. PMID: 31137898; PMCID: PMC6571894.	Wrong study design
Ohno Y, Sone M, Taura D, Yamasaki T, Kojima K, Honda-Kohmo K, Fukuda Y, Matsuo K, Fujii T, Yasoda A, Ogawa O, Inagaki N. Evaluation of quantitative parameters for distinguishing pheochromocytoma from other adrenal tumors. Hypertens Res. 2018 Mar;41(3):165-175. doi: 10.1038/s41440-017-0002-4. Epub 2018 Jan 18. PMID: 29348428.	Wrong comparison: Pheochromocytomas versus other adrenal tumors
Zekan D, King RS, Hajiran A, Patel A, Deem S, Luchey A. Diagnostic dilemmas: a multi-institutional retrospective analysis of adrenal incidentaloma pathology based on radiographic size. BMC Urol. 2022 Apr 30;22(1):73. doi: 10.1186/s12894-022-01024-5. PMID: 35501776; PMCID: PMC9063092.	Wrong comparison: Radiology versus pathological factors after adrenalectomy
Haan RR, Visser JBR, Pons E, Feelders RA, Kaymak U, Hunink MGM, Visser JJ. Patient-specific workup of adrenal incidentalomas. Eur J Radiol Open. 2017 Sep 7;4:108-114. doi: 10.1016/j.ejro.2017.08.002. PMID: 28932767; PMCID: PMC5596359.	Wrong design: Prediction model for overall factors (not specific diagnostic factors on CT)
Crimì F, Quaia E, Cabrelle G, Zanon C, Pepe A, Regazzo D, Tizianel I, Scaroni C, Ceccato F. Diagnostic Accuracy of CT Texture Analysis in Adrenal Masses: A Systematic Review. Int J Mol Sci. 2022 Jan 7;23(2):637. doi: 10.3390/ijms23020637. PMID: 35054823; PMCID: PMC8776161.	Wrong outcomes
Schloetelburg W, Ebert I, Petritsch B, Weng AM, Dischinger U, Kircher S, Buck AK, Bley TA, Deutschbein T, Fassnacht M. Adrenal wash-out CT: moderate diagnostic value in distinguishing benign from malignant adrenal masses. Eur J Endocrinol. 2021 Dec 10;186(2):183-193. doi: 10.1530/EJE-21-0650. PMID: 34813495; PMCID: PMC8679842.	Univariate analysis
Al-Waeli DK, Mansour AA, Haddad NS. Reliability of adrenal computed tomography in predicting the functionality of adrenal incidentaloma. Niger Postgrad Med J. 2020 Apr-Jun;27(2):101-107. doi: 10.4103/npmj.npmj_156_19. PMID: 32295940.	Wrong study population: Functional adrenal incidentalomas
Clark TJ, Hsu LD, Hippe D, Cowan S, Carnell J, Wang CL. Evaluation of diagnostic accuracy: multidetector CT image noise correction improves specificity of a Gaussian model-based algorithm used for characterization of incidental adrenal nodules. Abdom Radiol (NY). 2019 Mar;44(3):1033-1043. doi: 10.1007/s00261-018-1871-y. PMID: 30600378.	Wrong target condition: Detection of adrenal incidentaloma
Iñiguez-Ariza NM, Kohlenberg JD, Delivanis DA, Hartman RP, Dean DS, Thomas MA, Shah MZ, Herndon J, McKenzie TJ, Arlt W, Young WF Jr, Bancos I. Clinical, Biochemical, and Radiological Characteristics of a Single-Center Retrospective Cohort of 705 Large Adrenal Tumors. Mayo Clin Proc Innov Qual Outcomes. 2017 Dec 21;2(1):30-39. doi: 10.1016/j.mayocpiqo.2017.11.002. PMID: 30225430; PMCID: PMC6124341.	Wrong study population: Patients with adrenal tumors > 4 centimeters
Hanna FWF, Issa BG, Sim J, Keevil B, Fryer AA. Management of incidental adrenal tumours. BMJ. 2018 Jan 18;360:j5674. doi: 10.1136/bmj.j5674. PMID: 29348269.	Wrong study design
Ahn SH, Kim JH, Baek SH, Kim H, Cho YY, Suh S, Kim BJ, Hong S, Koh JM, Lee SH, Song KH. Characteristics of Adrenal Incidentalomas in a Large, Prospective Computed Tomography-Based Multicenter Study: The COAR Study in Korea. Yonsei Med J. 2018 Jun;59(4):501-510. doi: 10.3349/ymj.2018.59.4.501. PMID: 29749133; PMCID: PMC5949292.	Wrong index test: HPA axis test and wrong comparison: COAR versus SIE cohort
Helck A, Hummel N, Meinel FG, Johnson T, Nikolaou K, Graser A. Can single-phase dual-energy CT reliably identify adrenal adenomas? Eur Radiol. 2014 Jul;24(7):1636-42. doi: 10.1007/s00330-014-3192-z. Epub 2014 May 8. PMID: 24804633.	Univariate analysis

Accountability

Authorization date and validity

Last reviewed: 07-05-2024

Last authorized: 07-05-2024

Planned reassessment: 01-01-2025

Initiative and authorization

Initiative:

Nederlandse Vereniging voor Heelkunde

Authorized by:

Nederlandse Internisten Vereniging
Nederlandse Vereniging voor Anesthesiologie
Nederlandse Vereniging voor Heelkunde
Nederlandse Vereniging voor Pathologie
Nederlandse Vereniging voor Radiologie
Nederlandse Vereniging voor Radiotherapie en Oncologie
Nederlandse Vereniging voor Urologie
Vereniging Klinische Genetica Nederland

General information

The development/revision of this guideline module was supported by the Kennisinstituut van de Federatie Medisch Specialisten (www.demedischspecialist.nl/kennisinstituut) and was funded by the Kwaliteitsgelden Medisch Specialisten (SKMS). The funder had no influence whatsoever on the content of the guideline module.

Composition of the working group

To develop the guideline module, a multidisciplinary working group was established in 2021, consisting of representatives of all relevant specialties and patient representatives (see the Composition of the working group) who are involved in the care of patients with adrenal tumors.

Working group

Prof. dr. M.R. (Menno) Vriens, endocrine oncological surgeon, working at UMC Utrecht in Utrecht, NVvH (chair)
Prof. dr. S. (Schelto) Kruijff, endocrine oncological surgeon, working at UMCG in Groningen, NVvH (chair)
Prof. dr. R.A. (Richard) Feelders, internist-endocrinologist, working at Erasmus MC in Rotterdam, NIV
Prof. dr. H.R. (Harm) Haak, internist, working at Máxima MC in Eindhoven, NIV
Drs. J.F. (Julia) Heusdens, anesthesiologist, working at UMC Utrecht in Utrecht, NVA
Prof. dr. R.R. (Ronald) de Krijger, pathologist, working at UMC Utrecht/Prinses Máxima Centrum in Utrecht, NVVP
Drs. J. (Jeroen) Vister, radiologist, working at Universitair Medisch Centrum Groningen in Groningen, NVvR
Dr. M.R. (Max) Dahele, radiation oncologist, working at Amsterdam UMC in Amsterdam, NVRO
Dr. J.F. (Hans) Langenhuijsen, urologist, working at Radboudumc in Nijmegen, NVU
Dr. B.P.M. (Bernadette) van Nesselrooij, clinical geneticist, working at UMC Utrecht in Utrecht, VKGN
J.G. (Johan) Beun, manager/coordinator BijnierNET, BijnierNET
D.D. (Diana) Kwast-Hoekstra, MScN, RN, nursing scientist and patient representative, Bijniervereniging NVACP (until 31-12-2022)
Drs. N.T.M. (Nick) van der Meij, nurse practitioner, working at UMC Utrecht in Utrecht, LWEV

With support from

dr. A. (Anja) van der Hout, advisor, Kennisinstituut van de Federatie Medisch Specialisten
drs. S. (Sarah) van Duijn, advisor, Kennisinstituut van de Federatie Medisch Specialisten
drs. M. (Miriam) te Lintel Hekkert, advisor, Kennisinstituut van de Federatie Medisch Specialisten
drs. I. (Ingeborg) van Dusseldorp, medical information specialist, Kennisinstituut van de Federatie Medisch Specialisten

Declarations of interest

The Code ter voorkoming van oneigenlijke beïnvloeding door belangenverstrengeling (code for the prevention of improper influence due to conflicts of interest) was followed. All working group members declared in writing whether, in the past three years, they had direct financial interests (employment with a commercial company, personal financial interests, research funding) or indirect interests (personal relationships, reputation management). During the development or revision of a module, changes in interests are reported to the chair. The declaration of interest is reconfirmed during the comment phase.

An overview of the interests of the working group members and the assessment of how any interests were handled can be found in the table below. The signed declarations of interest can be requested from the secretariat of the Kennisinstituut van de Federatie Medisch Specialisten.

Working group member	Position	Ancillary positions	Reported interests	Action taken
Vriens (chair)	Surgeon, UMC Utrecht	Board member NVvH (until May 2021)	None	No restrictions
Kruijff (chair)	Endocrine surgeon, UMCG Groningen	None	None	No restrictions
Feelders	- Professor, internist-endocrinologist, Erasmus MC - Adjunct Professor of Medicine, New York University, U.S.A.	- Medical advisor NVACP, unpaid - Board member Dutch Adrenal Network, unpaid - Consultant Recordati, paid	None	No restrictions
Beun	Coordinator of Stichting BijnierNET, part-time	None	None	No restrictions
Langenhuijsen	Urologist, Radboudumc, Nijmegen	Board member of Radboudumc Expertisecentrum Bijnierziekten; Chair of eUROGEN WS 3 Rare genito-urological cancers and Expertise Area coordinator Adrenal tumours	ZonMw-funded efficiency research (DoelmatigheidsOnderzoek) "Pentixafor PET/CT vs adrenal venous sampling for subtyping primary hyperaldosteronism", in collaboration with PentixaPharm GmbH	No restrictions
De Krijger	- Pathologist, UMC Utrecht, 0.2 FTE - Pathologist, Prinses Máxima Centrum voor kinderoncologie, 0.7 FTE	- Board member of Perined, Dutch organization supporting perinatal registries (attendance fee) - Council Member European Society of Pathology (unpaid) - International Panel Member of Wilms tumor panel of SIOP Renal Tumor Study Group (unpaid) - Chair International (European) pediatric liver tumor panel (PHITT trial) (unpaid) - Chairman Dutch/Belgian working group on Pediatric Pathology (unpaid) - Associate editor Pediatric and Developmental Pathology (unpaid) - Member editorial board Endocrine Pathology (unpaid) - Member editorial board Virchows Archiv (unpaid) - Member editorial board Frontiers in Endocrinology (unpaid) - Editor-in-Chief Cancers, section Pediatric Oncology (honorarium) - Member editorial board WHO Endocrine and Neuroendocrine Tumors, 5th edition (unpaid)	None	No restrictions
Heusdens	Anesthesiologist, UMC Utrecht	None	None	No restrictions
Haak	- Internist-endocrinologist, Máxima MC until 01-09-2023, thereafter zero-hours appointment and retirement - Professor of acute internal medicine, MUMC/UM, until 01-02-2024	- Member of the general board of BijnierNET - Chair of Bijniernetwerk Nederland D.A.N. - Supervisory Board of Kempenhaeghe, paid	Occasional grant from HRA	No restrictions
Dahele	Radiation oncologist/VHD, department of radiotherapy, Amsterdam UMC (VUmc location)	None	Research funding from: Varian Medical Systems (not related to adrenal tumors)	No restrictions
Van Nesselrooij	Clinical genetics, UMC Utrecht (0.8 FTE)	Secretary of the VKGN (until 01-01-2023)	None	No restrictions
Kwast (until 13-12-2022)	Board member of Bijniervereniging NVACP in Nijkerk (unpaid) (until 13-12-2022)	Editorial board member of Bijniervereniging NVACP (unpaid) (until 13-12-2022)	None	No restrictions
Vister	Radiologist, UMCG	None	None	No restrictions
van der Meij	Nurse practitioner AGZ, UMC Utrecht, department of endocrine oncology	None	None	No restrictions

Patient perspective input

The patient perspective was addressed by inviting Patiëntenfederatie Nederland, BijnierNET, Bijniervereniging NVACP, Nederlandse Federatie van Kankerpatiëntenorganisaties (NFK), Nierstichting, and Nierpatiëntenvereniging Nederland to the invitational conference, and by including delegates from BijnierNET and Bijniervereniging NVACP in the working group. The report of the invitational conference (see appendix) was discussed in the working group. The input obtained was taken into account in formulating the clinical questions, choosing the outcome measures, and drafting the considerations. The draft guideline was also submitted for comment to the patient organizations Bijniervereniging NVACP, Patiëntenfederatie Nederland, BijnierNET, Nederlandse Federatie van Kankerpatiëntenorganisaties (NFK), Nierstichting, Nierpatiëntenvereniging Nederland, and Nederlandse Hypofyse Stichting, and any comments submitted were reviewed and incorporated.

Wkkgz & qualitative estimate of possible substantial financial consequences

Qualitative estimate of possible financial consequences under the Wkkgz

In accordance with the Wet kwaliteit, klachten en geschillen zorg (Wkkgz), a qualitative estimate was carried out for the guideline to determine whether the recommendations might lead to substantial financial consequences. For this assessment, guideline modules were tested on several domains (see the flowchart on the Richtlijnendatabase).

The qualitative estimate indicates that there are probably no substantial financial consequences; see the table below.

Module	Outcome of estimate	Explanation
Module Diagnosis of Conn's syndrome	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Treatment of Conn's syndrome	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Treatment of Cushing	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Treatment of pheochromocytoma	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module ACC expertise center	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Biopsy of an undefined retroperitoneal mass	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module CT scan characteristics of incidentaloma	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Autonomous cortisol (hyper)secretion (subclinical Cushing)	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Treatment of adrenal metastases	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Minimally invasive surgery	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Genetic testing and surgical management	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Pathology report	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Radiology report	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Follow-up	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.
Module Attention to adrenocortical insufficiency	no financial consequences	The assessment shows that the recommendation is not broadly applicable (<5,000 patients) and is therefore not expected to have substantial financial consequences for collective expenditure.

Methods

AGREE

This guideline module was drawn up in accordance with the requirements stated in the report Medisch Specialistische Richtlijnen 2.0 of the advisory committee on guidelines of the Raad Kwaliteit. This report is based on the AGREE II instrument (Appraisal of Guidelines for Research & Evaluation II; Brouwers, 2010).

Bottleneck analysis and clinical questions

During the preparatory phase, the working group inventoried the bottlenecks in the care of patients with adrenal tumors. In addition, bottlenecks were put forward via an invitational conference by the NVvH, NVU, NOV, NVRO, VKGN, Bijniervereniging NVACP, IKNL, NAPA (internal medicine section), and Belangenvereniging Von Hippel-Lindau. A report of this conference is included in the appendix.

Based on the results of the bottleneck analysis, the working group drafted the clinical questions and then finalized them.

Outcome measures

After drawing up the search question belonging to the clinical question, the working group inventoried which outcome measures are relevant to the patient, considering both desired and undesired effects. A maximum of eight outcome measures was used. The working group rated these outcome measures according to their relative importance for decision-making on recommendations, as crucial (critical for decision-making), important (but not crucial), or unimportant. The working group also defined, at least for the crucial outcome measures, which differences they considered clinically (patient) relevant.

Method of literature summary

A detailed description of the strategy for searching and selecting literature can be found under 'Search and select' in the evidence base section. Where possible, the data from different studies were pooled in a random-effects model. Review Manager 5.4 was used for the statistical analyses. The assessment of the strength of the scientific evidence is explained below.

Assessing the strength of the scientific evidence

The strength of the scientific evidence was determined according to the GRADE method. GRADE stands for 'Grading Recommendations Assessment, Development and Evaluation' (see http://www.gradeworkinggroup.org/). The basic principles of the GRADE methodology are: naming and prioritizing the clinically (patient) relevant outcome measures, a systematic review per outcome measure, and an assessment of the certainty of the evidence per outcome measure based on the eight GRADE domains (domains for downgrading: risk of bias, inconsistency, indirectness, imprecision, and publication bias; domains for upgrading: dose-effect relationship, large effect, and residual plausible confounding).

GRADE distinguishes four levels for the quality of the scientific evidence: high, moderate, low, and very low. These levels refer to the degree of certainty about the literature conclusion, in particular the degree of certainty that the literature conclusion adequately supports the recommendation (Schünemann, 2013; Hultcrantz, 2017).

GRADE	Definition
High	there is high certainty that the true effect of treatment lies close to the estimated effect of treatment; it is very unlikely that the literature conclusion will change in a clinically relevant way when results of new large-scale research are added to the literature analysis.
Moderate	there is moderate certainty that the true effect of treatment lies close to the estimated effect of treatment; it is possible that the conclusion will change in a clinically relevant way when results of new large-scale research are added to the literature analysis.
Low	there is low certainty that the true effect of treatment lies close to the estimated effect of treatment; there is a real chance that the conclusion will change in a clinically relevant way when results of new large-scale research are added to the literature analysis.
Very low	there is very low certainty that the true effect of treatment lies close to the estimated effect of treatment; the literature conclusion is very uncertain.

When assessing (grading) the strength of the scientific evidence in guidelines according to the GRADE methodology, thresholds for clinical decision-making play an important role (Hultcrantz, 2017). These are the thresholds that, when crossed, would give reason to change the recommendation. To determine the thresholds for clinical decision-making, all relevant outcome measures and considerations must be weighed. The thresholds for clinical decision-making are therefore not directly comparable with the Minimal Clinically Important Difference (MCID). Particularly in situations where an intervention has no important disadvantages and the costs are relatively low, the threshold for clinical decision-making regarding the effectiveness of the intervention may lie at a lower value (closer to the null effect) than the MCID (Hultcrantz, 2017).

Considerations (from evidence to recommendation)

To arrive at a recommendation, other aspects besides (the quality of) the scientific evidence are also important and are weighed, such as additional arguments from, for example, biomechanics or physiology, patient values and preferences, costs (resource use), acceptability, feasibility and implementation. These aspects are systematically listed and assessed (weighed) under the heading 'Considerations' and may be (partly) based on expert opinion. A structured format was used, based on the evidence-to-decision framework of the international GRADE Working Group (Alonso-Coello, 2016a; Alonso-Coello, 2016b). This evidence-to-decision framework is an integral part of the GRADE methodology.

Formulating recommendations

The recommendations answer the clinical question and are based on the available scientific evidence, the most important considerations, and a weighing of the favourable and unfavourable effects of the relevant interventions. The strength of the scientific evidence and the weight assigned by the working group to the considerations together determine the strength of the recommendation. In accordance with the GRADE methodology, a low certainty of evidence for conclusions in the systematic literature analysis does not a priori rule out a strong recommendation, and weak recommendations are also possible when the certainty of evidence is high (Agoritsas, 2017; Neumann, 2016). The strength of the recommendation is always determined by weighing all relevant arguments together. For each recommendation, the working group has recorded how it arrived at the direction and strength of the recommendation.

The GRADE methodology distinguishes between strong and weak (or conditional) recommendations. The strength of a recommendation refers to the degree of certainty that the benefits of the intervention outweigh the harms (or vice versa), considered across the full spectrum of patients for whom the recommendation is intended. The strength of a recommendation has clear implications for patients, clinicians and policymakers (see the table below). A recommendation is not a dictate: even a strong recommendation based on high-quality evidence (GRADE rating HIGH) will not always apply under all possible circumstances and for every individual patient.

Implications of strong and weak recommendations for different guideline users
	Strong recommendation	Weak (conditional) recommendation
For patients	Most patients would choose the recommended intervention or approach and only a small number would not.	A substantial proportion of patients would choose the recommended intervention or approach, but many patients would not.
For clinicians	Most patients should receive the recommended intervention or approach.	There are several suitable interventions or approaches. The patient should be supported in choosing the intervention or approach that best fits his or her values and preferences.
For policymakers	The recommended intervention or approach can be regarded as standard policy.	Policy-making requires extensive discussion with the involvement of many stakeholders. There is a greater likelihood of local policy differences.

Organisation of care

In the bottleneck analysis and during the development of the guideline module, explicit attention was paid to the organisation of care: all aspects that are preconditions for providing care (such as coordination, communication, (financial) resources, staffing and infrastructure). Preconditions that are relevant to answering this specific clinical question are mentioned under the considerations.

Comment and authorisation phase

The draft guideline module was submitted to the (scientific) societies and (patient) organisations involved for comment. The comments were collected and discussed with the working group. In response to the comments, the draft guideline module was revised and finalised by the working group. The final guideline module was submitted to the participating (scientific) societies and (patient) organisations for authorisation and was authorised or approved by them.

References

Agoritsas T, Merglen A, Heen AF, Kristiansen A, Neumann I, Brito JP, Brignardello-Petersen R, Alexander PE, Rind DM, Vandvik PO, Guyatt GH. UpToDate adherence to GRADE criteria for strong recommendations: an analytical survey. BMJ Open. 2017 Nov 16;7(11):e018593. doi: 10.1136/bmjopen-2017-018593. PubMed PMID: 29150475; PubMed Central PMCID: PMC5701989.

Alonso-Coello P, Schünemann HJ, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Rada G, Rosenbaum S, Morelli A, Guyatt GH, Oxman AD; GRADE Working Group. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 1: Introduction. BMJ. 2016 Jun 28;353:i2016. doi: 10.1136/bmj.i2016. PubMed PMID: 27353417.

Alonso-Coello P, Oxman AD, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Vandvik PO, Meerpohl J, Guyatt GH, Schünemann HJ; GRADE Working Group. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines. BMJ. 2016 Jun 30;353:i2089. doi: 10.1136/bmj.i2089. PubMed PMID: 27365494.

Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, Fervers B, Graham ID, Grimshaw J, Hanna SE, Littlejohns P, Makarski J, Zitzelsberger L; AGREE Next Steps Consortium. AGREE II: advancing guideline development, reporting and evaluation in health care. CMAJ. 2010 Dec 14;182(18):E839-42. doi: 10.1503/cmaj.090449. Epub 2010 Jul 5. Review. PubMed PMID: 20603348; PubMed Central PMCID: PMC3001530.

Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, Alper BS, Meerpohl JJ, Murad MH, Ansari MT, Katikireddi SV, Östlund P, Tranæus S, Christensen R, Gartlehner G, Brozek J, Izcovich A, Schünemann H, Guyatt G. The GRADE Working Group clarifies the construct of certainty of evidence. J Clin Epidemiol. 2017 Jul;87:4-13. doi: 10.1016/j.jclinepi.2017.05.006. Epub 2017 May 18. PubMed PMID: 28529184; PubMed Central PMCID: PMC6542664.

Medisch Specialistische Richtlijnen 2.0 (2012). Adviescommissie Richtlijnen van de Raad Kwaliteit. http://richtlijnendatabase.nl/over_deze_site/over_richtlijnontwikkeling.html

Neumann I, Santesso N, Akl EA, Rind DM, Vandvik PO, Alonso-Coello P, Agoritsas T, Mustafa RA, Alexander PE, Schünemann H, Guyatt GH. A guide for health professionals to interpret and use recommendations in guidelines developed with the GRADE approach. J Clin Epidemiol. 2016 Apr;72:45-55. doi: 10.1016/j.jclinepi.2015.11.017. Epub 2016 Jan 6. Review. PubMed PMID: 26772609.

Schünemann H, Brożek J, Guyatt G, et al. GRADE handbook for grading quality of evidence and strength of recommendations. Updated October 2013. The GRADE Working Group, 2013. Available from http://gdt.guidelinedevelopment.org/central_prod/_design/client/handbook/handbook.html.

Search strategy

Search strategies are available on request. Please contact the Richtlijnendatabase.
