How we help you

1. Higher AI performance

2. Shorter R&D cycle

3. Lower R&D cost
1. Higher AI performance
We offer millions of high-quality radiology and pathology scans, with optional annotations, plus clinical and molecular data for training and validating AI models, helping improve model accuracy, reliability, and functionality. Our commercial-use datasets, collected from Japanese hospitals and clinics, are particularly well suited to R&D targeting Japanese or other Asian populations. Longitudinal data for pharmaceutical and life-science applications are also available.

Radiological and pathological scans
Radiological scans (X-ray, CT, MRI, mammography, PET, etc.), ultrasound scans, and pathology whole-slide images (WSI).

Annotations
Disease segmentations and radiology/pathology reports, annotated and verified by expert radiologists/pathologists. Lung field segmentations are also available. Note: Annotations are generally available only for major diseases.

Clinical information
Age, sex, height, weight, medical and family history, surgical history, chief complaint, major symptoms and clinical course, nursing observation records, referring department, diagnosis, etc. Molecular diagnostic results may also be available, such as ER, PgR, and HER2 status (positive/negative) for breast cancer.


2. Shorter R&D cycle
Our commercial-use datasets are ethically sourced, de-identified, meticulously curated, and ready to use for your R&D in medical AI, drug discovery AI, and clinical research. We can provide dataset samples, dataset customization, and data-related consulting services, including medical feedback, customized AI development, and regulatory support for entering the Japanese medical device market.

Ready for secondary use
Ethically sourced and prepared for secondary use.

Anonymized
Rigorously de-identified for legal compliance, reducing your compliance risk.

Carefully curated
Data unsuitable for medical AI, such as cases with complications or excessive noise, are excluded. Our comprehensive standardization of images, clinical data, molecular data, and annotations minimizes additional pre-processing and verification.
3. Lower R&D cost
You do not need to invest heavily in collecting, selecting, annotating, and pre-processing medical data yourself. Our datasets are versatile and ready for use across a wide range of R&D applications, such as medical AI, drug discovery AI, and clinical research.
Medical AI
Examples: Image diagnosis, automatic contour extraction, and dose distribution creation.
Drug discovery AI
Examples: Drug screening, biomarker discovery, and therapeutic target identification.
Clinical research
Examples: Clinical trial design, drug toxicity/effectiveness evaluation, and histopathological assessment.


Get started now
Looking for de-identified datasets from Japan?
Looking for high-quality annotated medical image datasets?
Looking for radiological/pathological scans and clinical/molecular data, for specific diseases?
To celebrate DataHub's launch, we are offering a free lung cancer CT dataset (152 cases) with lesion and lung segmentations. Annotations have been added to a public dataset, and commercial use is permitted.
Additionally, for DataHub's 2nd anniversary, we are offering a free chest X-ray dataset of suspected lung cancer (50 cases) with lesion bounding boxes, and a free prostate cancer MRI dataset (PI-RADS 4 and 5, 50 cases) with lesion segmentations. Both are Japanese datasets available for commercial use.
FAQ
Radiological scans: DICOM or NIfTI
Radiology image annotations: NRRD or NIfTI for segmentations, and JSON for localization (e.g., bounding boxes)
Pathological scans: TIFF, DICOM, iSyntax, or NDPI
Pathology image annotations: GeoJSON
Clinical data (including molecular data and radiology/pathology reports): Excel
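Because pathology annotations are delivered as GeoJSON (a standard format defined in RFC 7946), they can be read with an ordinary JSON parser. The sketch below is a minimal illustration, assuming polygon annotations stored as Features in a FeatureCollection with 2D coordinates and a hypothetical `label` property holding the class name; the property names in actual delivered files may differ. It computes each polygon's area with the shoelace formula, in the squared coordinate units of the file.

```python
import json


def ring_area(ring):
    """Shoelace formula for a closed 2D ring [[x, y], ..., [x, y]].

    abs() makes the result independent of winding direction.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_areas(geojson_text, label_key="label"):
    """Return (label, area) for each Polygon feature; holes are subtracted.

    `label_key` is a hypothetical property name used for illustration.
    """
    collection = json.loads(geojson_text)
    results = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Polygon":
            continue  # this sketch skips points, lines, and MultiPolygons
        outer, *holes = geometry["coordinates"]  # first ring is the exterior
        area = ring_area(outer) - sum(ring_area(h) for h in holes)
        label = (feature.get("properties") or {}).get(label_key, "unlabeled")
        results.append((label, area))
    return results


if __name__ == "__main__":
    # A 10 x 10 square with a 2 x 2 hole (clockwise, per RFC 7946).
    square_with_hole = {
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [
                [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
                [[2, 2], [2, 4], [4, 4], [4, 2], [2, 2]],
            ]},
            "properties": {"label": "tumor"},
        }],
    }
    print(polygon_areas(json.dumps(square_with_hole)))  # [('tumor', 96.0)]
```

For whole-slide images, coordinates are often pixel positions, so converting areas to physical units would additionally require the scan's pixel spacing.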
We check the consistency of image quality, imaging conditions, diagnosis names, radiology/pathology findings, clinical data and annotation content. When necessary, we exclude inappropriate cases, such as those with noise or missing data, and standardize the data format. Annotations are performed by specialists such as radiologists, pathologists, orthopedic surgeons, or radiologic technologists with expertise in medical imaging, depending on the disease and modality. Depending on the difficulty and requirements, primary annotation may be performed by technologists and then reviewed by specialist physicians. Annotation types vary by dataset and may include segmentation, bounding boxes, classification labels, diagnosis labels, and findings information.
All datasets are de-identified. Before delivery, we remove or process information that could identify individuals, such as patient IDs, names, dates of birth, examination dates, and other identifiers contained in clinical data, metadata, and image data. If personal information is embedded in the images, or if the images contain facial features or other information that could lead to individual identification, we apply masking or other appropriate processing as necessary while minimizing the impact on analysis.
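Metadata de-identification can be pictured as dropping or replacing identifying attributes. The sketch below is a simplified illustration, not the procedure described above: it assumes DICOM metadata already loaded into a plain dictionary keyed by DICOM keywords, uses a hypothetical and deliberately short list of identifying keywords, drops examination dates rather than shifting them, and sets the standard `PatientIdentityRemoved` attribute. It does not handle identifiers embedded in pixel data, which require masking.

```python
# Hypothetical, deliberately incomplete list of identifying DICOM keywords.
# A production pipeline would follow a complete de-identification profile.
IDENTIFYING_KEYWORDS = {
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "OtherPatientIDs",
    "PatientAddress",
}
# Examination dates are dropped here; a real pipeline might shift them instead.
DATE_KEYWORDS = {"StudyDate", "SeriesDate", "AcquisitionDate", "ContentDate"}


def deidentify(metadata: dict, pseudonym: str) -> dict:
    """Return a new dict without identifying keywords or dates.

    The input dict is not modified.
    """
    cleaned = {
        key: value
        for key, value in metadata.items()
        if key not in IDENTIFYING_KEYWORDS and key not in DATE_KEYWORDS
    }
    # Replace the real patient ID with a dataset-specific pseudonym.
    cleaned["PatientID"] = pseudonym
    # Standard DICOM attribute (0012,0062) flagging that identity was removed.
    cleaned["PatientIdentityRemoved"] = "YES"
    return cleaned


if __name__ == "__main__":
    record = {
        "PatientName": "YAMADA^TARO",
        "PatientID": "12345",
        "PatientBirthDate": "19600101",
        "StudyDate": "20200101",
        "Modality": "CT",
        "SliceThickness": "1.0",
    }
    print(deidentify(record, "ANON-0001"))
```

Keeping non-identifying acquisition fields such as `Modality` and `SliceThickness` intact is what preserves the data's value for analysis.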
The data are obtained and provided in a form suitable for secondary use, based on appropriate procedures at each medical institution. For each dataset, we confirm the necessary acquisition conditions, such as ethics review, opt-out procedures, informed consent, and internal approval at the medical institution, before providing the data.
The datasets can be used for R&D purposes, including medical AI, AI-driven drug discovery, medical devices, and clinical research. They can also be used in collaboration with external contractors, and in regulatory submissions, academic publications, and marketing materials. We also provide datasets to companies outside Japan.
Pricing varies based on data volume, rarity, and whether the dataset includes clinical data, molecular data, or annotations. Please contact us for further details.
We issue an invoice for payment via bank transfer through Wise or SMBC Direct.
For custom datasets, you can specify requirements such as disease, modality, body part, number of cases, imaging conditions, manufacturer, slice thickness, clinical data, molecular data, annotation type, and exclusion criteria. In addition to extracting cases from existing datasets, we can also discuss additional annotations or the construction of new datasets based on your requirements.
We offer samples of 2–5 cases so you can evaluate our scans, DICOM headers, clinical information, and annotation methodology against your specific needs.
For existing datasets, delivery is typically possible within approximately 1–2 weeks. For custom datasets or datasets requiring additional annotation, the timeline depends on the number of cases, disease, modality, annotation requirements, and availability of clinical data; as a general guideline, such deliveries take approximately 1–2 months. Data are generally provided as password-protected files through secure cloud storage such as Box.
We provide medical feedback, customized AI development, and regulatory support for entering the Japanese medical device market. Our team comprises medical AI and MLOps engineers, software developers, diagnostic radiologists, radiation oncologists, pathologists, and dataset managers. Our CEO, Changhee Han (Kallis), is a leading young medical AI researcher in Japan with over 2,000 first-author citations.
Callisto DataHub is provided by Callisto Inc., which operates a medical imaging data platform for medical AI and clinical research.
Need a custom dataset?
Tell us what you're looking for.
