Find duplicate strings in a txt file and delete the entire row
I need to create a .bat file to modify txt files, but first I need to remove rows that are semi-duplicated, because I will later import those txt files into a .jar executable.
Original file
The text file is called descaga_2017 and looks like this:
642926 |128 |C012644 |99 |3661351 |0160348B |2 |0.00 |0.00 |0.00 | | |
642926 |128 |C012644 |99 |3661352 |0160348 |6 |1.533 |0 |9.19800000 | |21 |
642926 |128 |C012644 |99 |3661353 |0160348B |2 |0.000 |0 |0.00000000 | |21 |
642949 |128 |C010241 |99 |3661485 |84155616B |2 |0.00 |0.00 |0.00 | | |
642949 |128 |C010241 |99 |3661486 |84154530 |4 |4.025 |0 |16.10000000 | |21 |
642949 |128 |C010241 |99 |3661487 |575427 |2 |4.025 |0 |8.05000000 | |21 |
642949 |128 |C010241 |99 |3661488 |0160348B |2 |0.00 |0.00 |0.00 | | |
642949 |128 |C010241 |99 |3661489 |0160348 |6 |1.533 |0 |9.19800000 | |21 |
642949 |128 |C010241 |99 |3661490 |0160348B |2 |0.000 |0 |0.00000000 | |21 |
642949 |128 |C010241 |99 |3661491 |0070181 |1 |27.254 |0 |27.25400000 | |11 |
642950 |128 |C010241 |99 |3661492 |101032 |1 |46.900 |0 |46.90000000 | |31 |
642980 |128 |C014433 |99 |3661655 |0040232B |1 |0.00 |0.00 |0.00 | | |
642980 |128 |C014433 |99 |3661656 |0040232 |2 |20.246 |0 |40.49200000 | |21 |
642980 |128 |C014433 |99 |3661657 |0040232b |1 |0.000 |0 |0.00000000 | |21 |
643043 |128 |C010278 |99 |3662001 |4700001b |2 |0.00 |0.00 |0.00 | | |
643043 |128 |C010278 |99 |3662002 |4700001 |1 |8.474 |0 |8.47400000 | |21 |
643043 |128 |C010278 |99 |3662003 |4700001B |2 |0.000 |0 |0.00000000 | |21 |
........
......
...
The same file with numbered rows, for the explanation:
1. 642926 |128 |C012644 |99 |3661351 |0160348B |2 |0.00 |0.00 |0.00 | | |
2. 642926 |128 |C012644 |99 |3661352 |0160348 |6 |1.533 |0 |9.19800000 | |21 |
3. 642926 |128 |C012644 |99 |3661353 |0160348B |2 |0.000 |0 |0.00000000 | |21 | << delete the entire row
4. 642949 |128 |C010241 |99 |3661485 |84155616B |2 |0.00 |0.00 |0.00 | | |
5. 642949 |128 |C010241 |99 |3661486 |84154530 |4 |4.025 |0 |16.10000000 | |21 |
6. 642949 |128 |C010241 |99 |3661487 |575427 |2 |4.025 |0 |8.05000000 | |21 |
7. 642949 |128 |C010241 |99 |3661488 |0160348B |2 |0.00 |0.00 |0.00 | | |
8. 642949 |128 |C010241 |99 |3661489 |0160348 |6 |1.533 |0 |9.19800000 | |21 |
9. 642949 |128 |C010241 |99 |3661490 |0160348B |2 |0.000 |0 |0.00000000 | |21 | << delete the entire row
10. 642949 |128 |C010241 |99 |3661491 |0070181 |1 |27.254 |0 |27.25400000 | |11 |
11. 642950 |128 |C010241 |99 |3661492 |101032 |1 |46.900 |0 |46.90000000 | |31 |
12. 642980 |128 |C014433 |99 |3661655 |0040232B |1 |0.00 |0.00 |0.00 | | |
13. 642980 |128 |C014433 |99 |3661656 |0040232 |2 |20.246 |0 |40.49200000 | |21 |
14. 642980 |128 |C014433 |99 |3661657 |0040232b |1 |0.000 |0 |0.00000000 | |21 | << delete the entire row
15. 643043 |128 |C010278 |99 |3662001 |4700001b |2 |0.00 |0.00 |0.00 | | |
16. 643043 |128 |C010278 |99 |3662002 |4700001 |1 |8.474 |0 |8.47400000 | |21 |
17. 643043 |128 |C010278 |99 |3662003 |4700001B |2 |0.000 |0 |0.00000000 | |21 | << delete the entire row
There are too many lines to do this by hand, so I need to check the file line by line. The lines that need to be checked have one of these two forms:
<numero1> |<numero2> |<string1> |<numero3> |<numero4> |<string2> |<numero5> |0.00 |0.00 |0.00 | | |
or
<numero1> |<numero2> |<string1> |<numero3> |<numero4> |<string2> |<numero5> |0.000 |0 |0.00000000 | |<numero:11,12,21,22> |
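As a rough illustration of the two forms above, here is a minimal Python sketch that splits a row on "|" and reports which form it matches. The function names `split_fields` and `row_kind` and the labels "first", "repeat" and "other" are hypothetical, chosen only for this example; it also assumes every row has at most 12 fields, as in the sample data.

```python
# Values of the three amount fields in each of the two forms described above.
ZERO_FIRST = ("0.00", "0.00", "0.00")
ZERO_REPEAT = ("0.000", "0", "0.00000000")
REPEAT_CODES = {"11", "12", "21", "22"}


def split_fields(row):
    """Split a pipe-delimited row into exactly 12 stripped fields."""
    fields = [f.strip() for f in row.rstrip("\r\n").split("|")]
    # Pad or trim so that a missing or extra trailing "|" does not matter.
    return (fields + [""] * 12)[:12]


def row_kind(row):
    """Return "first", "repeat" or "other" (hypothetical labels)."""
    f = split_fields(row)
    amounts = tuple(f[7:10])
    if amounts == ZERO_FIRST and f[10] == "" and f[11] == "":
        return "first"
    if amounts == ZERO_REPEAT and f[11] in REPEAT_CODES:
        return "repeat"
    return "other"
```

Only rows labelled "repeat" are candidates for deletion; whether one is actually deleted still depends on finding a matching earlier row.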
Example of the comparison that is needed; here row 3 has to be deleted:
1. 642926 |128 |C012644 |99 |3661351 |0160348B |2 |0.00 |0.00 |0.00 | | |
3. 642926 |128 |C012644 |99 |3661353 |0160348B |2 |0.000 |0 |0.00000000 | |21 |
What matters is finding duplication in the fields up to |<numero5> | (apart from <numero4>, which differs from row to row in the sample data) and then checking whether the next number is 0.000. If it is, stop comparing, delete the whole of row 3 and keep row 1.
Another example: compare line 15 and line 17
15. 643043 |128 |C010278 |99 |3662001 |4700001b |2 |0.00 |0.00 |0.00 | | |
17. 643043 |128 |C010278 |99 |3662003 |4700001B |2 |0.000 |0 |0.00000000 | |21 |
Line 15 and line 17:
643043 |128 |C010278 |99 |3662003 |4700001B |2 <<< compare up to here
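The comparison "up to here" could be sketched in Python as a key built from the leading fields. This is only an assumption about the intended rule: the fifth field (3662001 versus 3662003) is left out of the key because it differs between lines 15 and 17, and the text is case-folded so that 4700001b and 4700001B match. The name `compare_key` is hypothetical.

```python
def compare_key(row):
    """Build a case-insensitive key from fields 1-4, 6 and 7 of a row.

    Field 5 is skipped on the assumption that it is a running number
    that never repeats between the rows being compared.
    """
    fields = [f.strip().casefold() for f in row.split("|")]
    return tuple(fields[:4] + fields[5:7])


line_15 = "643043 |128 |C010278 |99 |3662001 |4700001b |2 |0.00 |0.00 |0.00 | | |"
line_16 = "643043 |128 |C010278 |99 |3662002 |4700001 |1 |8.474 |0 |8.47400000 | |21 |"
line_17 = "643043 |128 |C010278 |99 |3662003 |4700001B |2 |0.000 |0 |0.00000000 | |21 |"

same_15_17 = compare_key(line_15) == compare_key(line_17)  # True
same_15_16 = compare_key(line_15) == compare_key(line_16)  # False
```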
Then continue with option 1 or option 2.
Option 1: In line 17, check whether the next field is the number " |0.000 |"; if so, delete that entire row without leaving a blank row.
Option 2: In line 17, if the last field contains a number |<numero:11,12,21,22> |, delete that entire row without leaving a blank row.
The file would then end up like this:
642926 |128 |C012644 |99 |3661351 |0160348B |2 |0.00 |0.00 |0.00 | | |
642926 |128 |C012644 |99 |3661352 |0160348 |6 |1.533 |0 |9.19800000 | |21 |
642949 |128 |C010241 |99 |3661485 |84155616B |2 |0.00 |0.00 |0.00 | | |
642949 |128 |C010241 |99 |3661486 |84154530 |4 |4.025 |0 |16.10000000 | |21 |
642949 |128 |C010241 |99 |3661487 |575427 |2 |4.025 |0 |8.05000000 | |21 |
642949 |128 |C010241 |99 |3661488 |0160348B |2 |0.00 |0.00 |0.00 | | |
642949 |128 |C010241 |99 |3661489 |0160348 |6 |1.533 |0 |9.19800000 | |21 |
642949 |128 |C010241 |99 |3661491 |0070181 |1 |27.254 |0 |27.25400000 | |11 |
642950 |128 |C010241 |99 |3661492 |101032 |1 |46.900 |0 |46.90000000 | |31 |
642980 |128 |C014433 |99 |3661655 |0040232B |1 |0.00 |0.00 |0.00 | | |
642980 |128 |C014433 |99 |3661656 |0040232 |2 |20.246 |0 |40.49200000 | |21 |
643043 |128 |C010278 |99 |3662001 |4700001b |2 |0.00 |0.00 |0.00 | | |
643043 |128 |C010278 |99 |3662002 |4700001 |1 |8.474 |0 |8.47400000 | |21 |
Thanks and greetings from PERU. Note: the comparison is case-insensitive (uppercase and lowercase are treated as the same).
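For reference, the whole filter could look roughly like the Python sketch below, which follows option 1. It is not a .bat solution and rests on stated assumptions: the fifth field is ignored when comparing, the comparison is case-insensitive, a row is dropped only when its eighth field is exactly 0.000 and an earlier row has the same key, and lines with fewer than eight fields (such as "...") are passed through untouched. The names `remove_semi_duplicates` and `filter_file` are hypothetical.

```python
def remove_semi_duplicates(lines):
    """Return the lines with the repeated zero-amount rows removed."""
    seen = set()   # keys of the rows kept so far
    kept = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 8:
            kept.append(line)  # not a data row; leave it as it is
            continue
        # Key: fields 1-4, 6 and 7, case-folded (field 5 is assumed to vary).
        key = tuple(f.casefold() for f in fields[:4] + fields[5:7])
        if fields[7] == "0.000" and key in seen:
            continue  # semi-duplicate: drop the row, leaving no blank line
        seen.add(key)
        kept.append(line)
    return kept


def filter_file(src, dst, encoding="utf-8"):
    """Read src, filter it and write the result to dst (encoding is a guess)."""
    with open(src, encoding=encoding) as f:
        lines = f.readlines()
    with open(dst, "w", encoding=encoding) as f:
        f.writelines(remove_semi_duplicates(lines))
```

On the 17 sample rows above this keeps the 13 rows of the expected result. A .bat file could simply call a script like this before passing the txt on to the .jar.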