HyperAI
Fine-Grained Image Classification
Fine Grained Image Classification On Oxford 2
Metric: Accuracy (%)
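The accuracy figures below are standard top-1 classification accuracy: the percentage of test images whose highest-scoring predicted class matches the ground-truth label. A minimal sketch of that computation (function and variable names here are illustrative, not from any benchmark's official evaluation code):

```python
def top1_accuracy(logits, labels):
    """Top-1 accuracy as a percentage.

    logits: list of per-class score lists, one per image.
    labels: list of ground-truth class indices, one per image.
    """
    correct = 0
    for scores, label in zip(logits, labels):
        # Predicted class = index of the highest score.
        pred = max(range(len(scores)), key=scores.__getitem__)
        if pred == label:
            correct += 1
    return 100.0 * correct / len(labels)

# Example: one correct prediction out of two images -> 50.0
print(top1_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))
```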
Results
Performance results of various models on this benchmark
| Model Name | Accuracy (%) | Paper Title |
| --- | --- | --- |
| EffNet-L2 (SAM) | 97.10 | Sharpness-Aware Minimization for Efficiently Improving Generalization |
| BiT-L (ResNet) | 96.62 | Big Transfer (BiT): General Visual Representation Learning |
| µ2Net+ (ViT-L/16) | 95.5 | A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems |
| µ2Net (ViT-L/16) | 95.3 | An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems |
| BiT-M (ResNet) | 94.47 | Big Transfer (BiT): General Visual Representation Learning |
| Assemble-ResNet-FGVC-50 | 94.3 | Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network |
| NAT-M4 | 94.3 | Neural Architecture Transfer |
| NAT-M3 | 94.1 | Neural Architecture Transfer |
| NAT-M2 | 93.5 | Neural Architecture Transfer |
| ResNet-152-SAM | 93.3 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| ViT-B/16-SAM | 93.1 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| ViT-S/16-SAM | 92.9 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| Mixer-B/16-SAM | 92.5 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| ResNet-50-SAM | 91.6 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| Mixer-S/16-SAM | 88.7 | When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations |
| SE-ResNet-101 (SAP) | 86.011 | Stochastic Subsampling With Average Pooling |
| PreResNet-101 | 85.5897 | How to Use Dropout Correctly on Residual Networks with Batch Normalization |
| ResNet-101 (ideal number of groups) | 77.076 | On the Ideal Number of Groups for Isometric Gradient Propagation |
| ViT-B/16 | - | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale |