Fixing ValueError: Expected input batch_size (40) to match target batch_size (8).

Published: 2023-04-20 | Category: Computer Encyclopedia | Contributor: 赵颖

Solved! When you hit a bug, do not give up: trace it carefully back to its source. Spending some time on it is perfectly normal.

1: Where the bug occurs

From the error message, we can locate the problem at the loss call losses = loss_function_train(pred_scales, target_scales), and in the definition of that loss, class CrossEntropyLoss2d(nn.Module):

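2: What causes the bug:

It is caused by a dimension mismatch. But which dimension is mismatched, and between which two tensors?

①: After a long search online, I found that most reports of this error come from a misused view call, after which the tensor fed to nn.Linear no longer has the shape the layer expects. The usual advice is to go back to the model and print the tensor shape right before the view call. I checked carefully and printed the shape after every layer, and the model's dimensions were all correct, so this approach did not apply to my case. I still mention it here so that you can check your own model for a batch-dimension mismatch.

②: Next, I added print statements just before the losses line to print the shapes of pred_scales and target_scales.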

        # print(pred_scales.shape) #torch.Size([8, 40, 448, 448])
        # print(target_scales.shape) #torch.Size([8, 448, 448])
        losses = loss_function_train(pred_scales, target_scales)

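A small detour: at first the size of target_scales could not be printed, because target_scales was a list containing a torch tensor. After removing the square brackets around target_scales, it could be printed.

Let's look at what pred_scales and target_scales actually are: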
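A small experiment hints at how the exact numbers in the title, (40) and (8), could arise. This is one possible reading and an assumption, not something the article confirms: it supposes that target_scales was still wrapped in a list when it reached a loss that zips over its two arguments. The 16x16 tensors and the variable names are hypothetical and only keep the sketch fast.

```python
import torch
import torch.nn as nn

# Hypothetical small stand-ins for the real batch (448x448 in the article).
pred_scales = torch.rand(8, 40, 16, 16)        # (N, C, H, W)
label = torch.randint(0, 40, (8, 16, 16))      # (N, H, W), class indices
target_scales = [label]                        # assumption: a list wrapping one tensor

# A list has no .shape attribute, which is why printing the shape fails.
has_shape = hasattr(target_scales, "shape")
print(has_shape)

ce = nn.CrossEntropyLoss()
messages = []
# zip walks dim 0 of the tensor but the elements of the list, so the only
# pair is a (40, 16, 16) slice against the full (8, 16, 16) label.
for inputs, targets in zip(pred_scales, target_scales):
    try:
        ce(inputs, targets)
    except (ValueError, RuntimeError) as err:
        messages.append(str(err))
print(messages)
```

On the PyTorch versions I am familiar with, the captured message compares an input batch size of 40 with a target batch size of 8, the same pair of numbers as in the title.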

pred_scales = model(image, depth)

        if modality in ['rgbd', 'rgb']:
            image = sample['image'].to(device)
            # print(image.shape) #torch.Size([8, 3, 448, 448])
            batch_size = image.data.shape[0]
        if modality in ['rgbd', 'depth']:
            depth = sample['depth'].to(device)
            # print(depth.shape) #torch.Size([8, 1, 448, 448])
            batch_size = depth.data.shape[0]
            # print(batch_size) # 8
        target_scales = sample['label'].to(device)

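model is our instantiated network. It takes the RGB image and the depth map as input, and pred_scales is its output, here of shape (8, 40, 448, 448); target_scales is the label. As the code shows, target_scales is the data stored under the 'label' key of sample, and likewise image and depth are the data under the 'image' and 'depth' keys.

And what is sample?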

train_data = Dataset(
       data_dir=args.dataset_dir,
       split='train',
       depth_mode=depth_mode,
       with_input_orig=with_input_orig,
       **dataset_kwargs)
train_loader = DataLoader(train_data,
                          batch_size=args.batch_size,
                          num_workers=args.workers,
                          drop_last=True,
                          shuffle=True)
train_loader, valid_loader = data_loaders
for i, sample in enumerate(train_loader):

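Let's follow the data flow. First the data path is read; Dataset loads the images; DataLoader then groups them into batches, giving train_loader; iterating over train_loader yields the index i and the batch sample. Because train_loader uses a batch size of 8, sample contains image, depth and label, with sizes torch.Size([8, 3, 448, 448]), torch.Size([8, 1, 448, 448]) and torch.Size([8, 448, 448]) respectively. That is:

pred_scales has size (8, 40, 448, 448), since we have 40 classes, and target_scales has size torch.Size([8, 448, 448]).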
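The data flow described above can be sketched with a toy dataset. ToyRGBDDataset is a hypothetical stand-in for the article's Dataset class (whose internals are not shown), and the 32x32 size is chosen only to keep the example light; the point is that DataLoader's default collation stacks each dict entry along a new batch dimension.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyRGBDDataset(Dataset):
    """Hypothetical dataset that returns one dict per item."""

    def __init__(self, n_items=16, size=32, n_classes=40):
        self.n_items = n_items
        self.size = size
        self.n_classes = n_classes

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        s = self.size
        return {
            "image": torch.rand(3, s, s),                            # RGB
            "depth": torch.rand(1, s, s),                            # one depth channel
            "label": torch.randint(0, self.n_classes + 1, (s, s)),  # 0 = void
        }


loader = DataLoader(ToyRGBDDataset(), batch_size=8, drop_last=True, shuffle=True)
sample = next(iter(loader))

# Every entry gains a leading batch dimension of 8.
shapes = {key: tuple(value.shape) for key, value in sample.items()}
print(shapes)
```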

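As an aside, here is how PyTorch computes this loss:

The label has no channel dimension: each pixel holds one class index, and its spatial size matches the input image. No one-hot encoding is needed because nn.CrossEntropyLoss accepts class indices directly. There is one pitfall: when the loss is computed between a segmentation prediction and its label, both must carry a batch dimension. Otherwise a (C, H, W) prediction is read as (N, C, d1), its first dimension is taken as the batch size, and the call fails.

Here is an example: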
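To make the "no one-hot needed" point concrete, here is a minimal sketch comparing nn.CrossEntropyLoss on class-index labels with a hand-written version that builds the one-hot tensor explicitly. The tensor sizes and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 5, 4, 4)           # (N, C, H, W)
target = torch.randint(0, 5, (2, 4, 4))    # (N, H, W), class indices

loss_builtin = nn.CrossEntropyLoss()(logits, target)

# Manual version: one-hot the target along the class dimension and average
# the negative log-probability of the true class over all pixels.
one_hot = F.one_hot(target, num_classes=5).permute(0, 3, 1, 2).float()
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -(one_hot * log_probs).sum(dim=1).mean()

print(loss_builtin.item(), loss_manual.item())
```

The two values should agree up to floating-point error.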

import torch
import torch.nn as nn

inputs_scales = torch.rand(8,40,448,448)
targets_scales = torch.rand(8,448,448)
# iterating over a tensor walks its first (batch) dimension, so each item loses it
for inputs, targets in zip(inputs_scales, targets_scales):
    # inputs = inputs.unsqueeze(0)
    # targets = targets.unsqueeze(0)
    print(inputs.shape)
    print(targets.shape)
    loss2 = nn.CrossEntropyLoss()
    result2 = loss2(inputs, targets.long())
    print(result2)

torch.Size([40, 448, 448])
torch.Size([448, 448])
Traceback (most recent call last):
  File "/tmp/pycharm_project_346/kong.py", line 816, in <module>
    result2 = loss2(inputs, targets.long())
  File "/home/software/anaconda3/envs/pycharm329/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/software/anaconda3/envs/pycharm329/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1166, in forward
    label_smoothing=self.label_smoothing)
  File "/home/software/anaconda3/envs/pycharm329/lib/python3.7/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
ValueError: Expected input batch_size (40) to match target batch_size (448).

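Looks a lot like the bug in the title, doesn't it?

After adding the batch dimension: the batch size is 8, so the loop runs eight times and computes the loss each time.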
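Why 40 and 448? A 3-D input is interpreted as (N, C, d1), so a (40, 448, 448) slice is read as a batch of 40, while the 2-D target is read as (N, d1), a batch of 448. The sketch below reproduces this with hypothetical 12x12 tensors and captures the message instead of crashing.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
inputs = torch.rand(40, 12, 12)                    # meant as (C, H, W)
targets = torch.zeros(12, 12, dtype=torch.long)    # meant as (H, W)

try:
    ce(inputs, targets)
    unbatched_error = None
except (ValueError, RuntimeError) as err:
    # The first dimensions (40 and 12) are compared as batch sizes.
    unbatched_error = str(err)
print(unbatched_error)

# With an explicit batch dimension the shapes are read as intended.
batched_loss = ce(inputs.unsqueeze(0), targets.unsqueeze(0))
print(batched_loss.shape)
```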

import torch
import torch.nn as nn

inputs_scales = torch.rand(8,40,448,448)
targets_scales = torch.rand(8,448,448)
for inputs, targets in zip(inputs_scales, targets_scales):
    # restore the batch dimension that iteration removed
    inputs = inputs.unsqueeze(0)
    targets = targets.unsqueeze(0)
    print(inputs.shape)
    print(targets.shape)
    loss2 = nn.CrossEntropyLoss()
    result2 = loss2(inputs, targets.long())
    print(result2)

torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7298)
torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7283)
torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7302)
torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7282)
torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7296)
torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7296)
torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7289)
torch.Size([1, 40, 448, 448])
torch.Size([1, 448, 448])
tensor(3.7292)

In the loss definition:

inputs_scales and targets_scales have sizes torch.Size([8, 40, 448, 448]) and torch.Size([8, 448, 448]). Iterating over inputs_scales and targets_scales yields items with the shapes below, and the loss cannot be computed on them.

            print(targets.shape)  # torch.Size([448, 448])
            print(inputs.shape)  # torch.Size([40, 448, 448])
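The shape change comes purely from how tensors iterate. A minimal sketch, with hypothetical 16x16 tensors, shows that zip over two batched tensors yields per-sample slices without dim 0, while a length-1 slice keeps it:

```python
import torch

inputs_scales = torch.rand(8, 40, 16, 16)
targets_scales = torch.zeros(8, 16, 16, dtype=torch.long)

# Iterating a tensor yields tensor[0], tensor[1], ... along dim 0.
pairs = list(zip(inputs_scales, targets_scales))
item_shapes = (tuple(pairs[0][0].shape), tuple(pairs[0][1].shape))
print(len(pairs), item_shapes)

# Slicing with a length-1 range keeps the batch dimension instead.
kept = (tuple(inputs_scales[0:1].shape), tuple(targets_scales[0:1].shape))
print(kept)
```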

class CrossEntropyLoss2d(nn.Module):
    def __init__(self, device, weight):
        super(CrossEntropyLoss2d, self).__init__()
        self.weight = torch.tensor(weight).to(device)
        self.num_classes = len(self.weight) + 1  # +1 for void
        if self.num_classes < 2**8:
            self.dtype = torch.uint8
        else:
            self.dtype = torch.int16
        self.ce_loss = nn.CrossEntropyLoss(
            torch.from_numpy(np.array(weight)).float(),
            reduction='none',
            ignore_index=-1
        )
        self.ce_loss.to(device)
    def forward(self, inputs_scales, targets_scales):
        losses = []
        for inputs, targets in zip(inputs_scales, targets_scales):
            # mask = targets > 0
            # clone() returns a tensor with the same shape, dtype and device as the source; it does not share memory with the source but stays in the autograd graph
            targets_m = targets.clone()
            targets_m -= 1
            print(inputs.size())
            print(targets_m.size())
            loss_all = self.ce_loss(inputs, targets_m.long())
            number_of_pixels_per_class = \
                torch.bincount(targets.flatten().type(self.dtype),
                               minlength=self.num_classes)
            divisor_weighted_pixel_sum = \
                torch.sum(number_of_pixels_per_class[1:] * self.weight)   # without void
            losses.append(torch.sum(loss_all) / divisor_weighted_pixel_sum)
            # losses.append(torch.sum(loss_all) / torch.sum(mask.float()))
        return losses
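The normalization inside forward can be traced on a tiny label map. In this sketch the three class weights and the 2x3 label are made-up values; the logic mirrors the class above: label 0 is void, subtracting 1 turns void into -1 (the ignore_index), and the divisor is the weighted count of non-void pixels.

```python
import torch

weight = torch.tensor([1.0, 2.0, 0.5])     # hypothetical weights for 3 classes
num_classes = len(weight) + 1              # +1 for void, as in the class above

# Labels use 0 for void and 1..3 for the real classes.
targets = torch.tensor([[0, 1, 1],
                        [2, 3, 3]])

shifted = targets - 1                      # void becomes -1 and is ignored by the loss
counts = torch.bincount(targets.flatten(), minlength=num_classes)
divisor = torch.sum(counts[1:] * weight)   # weighted number of non-void pixels

print(shifted.tolist(), counts.tolist(), divisor.item())
```

For this toy label the counts are [1, 2, 1, 2] and the divisor is 2*1.0 + 1*2.0 + 2*0.5 = 5.0. Dividing the summed per-pixel loss by this value gives a class-weighted mean over the non-void pixels.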

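3: How to fix it?

So we add a dimension to each of the two items produced by the loop. In other words, when iterating over [8, 40, 448, 448] we want each item to be [1, 40, 448, 448], and adding the dimension directly achieves that. After this change the code runs.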

            inputs = inputs.unsqueeze(0)
            targets = targets.unsqueeze(0)
            targets_m = targets.clone()
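As a sanity check on this fix, the sketch below compares the per-sample loop with unsqueeze(0) against a single batched call followed by a per-sample sum. It uses a plain nn.CrossEntropyLoss with reduction='none' and hypothetical 16x16 tensors rather than the article's weighted CrossEntropyLoss2d, so treat it as an illustration of the shapes, not a drop-in replacement.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(8, 40, 16, 16)             # (N, C, H, W)
label = torch.randint(0, 40, (8, 16, 16))     # (N, H, W)
ce = nn.CrossEntropyLoss(reduction="none")

# Per-sample loop: restore the batch dimension before each call.
looped = torch.stack([
    ce(p.unsqueeze(0), t.unsqueeze(0)).sum()
    for p, t in zip(pred, label)
])

# One batched call, then a per-sample sum over the pixel dimensions.
batched = ce(pred, label).sum(dim=(1, 2))

print(torch.allclose(looped, batched, rtol=1e-4))
```

Both routes give one summed loss per sample, so the loop is only needed when each sample requires its own normalization, as in the class above.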

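Summary: the prediction and the label must both have a batch dimension before they can match and be passed to the loss function. That corresponds exactly to the bug: a batch mismatch.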
